[Survey / RL] Action Masking: related material

2020. 10. 24. 18:46 | Topics of interest / RL


 

When there are invalid actions, there is no need to compute anything for them, so they should be removed from the set of choices up front so that the agent can learn well.

So action masking seems best suited to problems with such constraints on the action space.

I am currently going through the related research.
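
Below is a minimal sketch of what "removing invalid actions from the choice set" looks like for a value-based agent such as DQN; the Q-values, the mask, and the variable names are made up for illustration only. The idea is simply to push the Q-values of invalid actions to minus infinity so they can never be selected.

```python
import numpy as np

# Hypothetical Q-values for K = 4 actions in some state.
q_values = np.array([2.0, 5.0, 1.5, 0.7])    # index 1 has the highest value...
valid = np.array([True, False, True, True])  # ...but action index 1 is invalid here

# "Remove invalid actions from the choice set": push their Q-values to -inf
# so they can never win the argmax.
masked_q = np.where(valid, q_values, -np.inf)
greedy_action = int(np.argmax(masked_q))
print(greedy_action)  # 0, the best *valid* action
```

For a policy-gradient agent there is one extra step, the probability renormalization described in the PPO notes below.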

Related examples

  • video game of snake

  • automated stock trading

Use cases and brief paper notes

  • DQN
    • Existing applications of action masking can be found

  • PPO
    • No existing application, which led to a paper on it
      • Apply the action mask, then renormalize the probabilities
    • Only valid actions are used in the collection of trajectory T.
    • During stochastic gradient descent, again only valid actions are used in the calculation of the objective.
    • Example with softmax (a code sketch follows this list)
      • K = 4
      • actions 1 and 2 are invalid
      • $y_k = \frac{\exp(p_k)}{\exp(p_3) + \exp(p_4)}$ for $k \in \{3, 4\}$
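
Below is a minimal NumPy sketch of this renormalization; the logit values and the helper name `masked_softmax` are placeholders of mine, not taken from the paper. Pushing the logits of invalid actions to minus infinity before the softmax gives them exactly zero probability, which reproduces the formula above, and sampling from the result guarantees that only valid actions enter the collected trajectory.

```python
import numpy as np

def masked_softmax(logits, valid_mask):
    """Softmax restricted to the valid actions.

    Invalid actions have their logits pushed to -inf before the softmax,
    so they get exactly zero probability -- the same as dropping them
    from the normalizing sum in the formula above.
    """
    masked_logits = np.where(valid_mask, logits, -np.inf)
    # Subtract the max over valid actions for numerical stability.
    z = masked_logits - masked_logits[valid_mask].max()
    exp_z = np.exp(z)              # exp(-inf) == 0.0 for the invalid actions
    return exp_z / exp_z.sum()

# The K = 4 example above: actions 1 and 2 (indices 0 and 1) are invalid,
# so all probability mass is shared between actions 3 and 4.
logits = np.array([1.2, 0.3, 0.8, -0.5])      # hypothetical policy logits p_k
valid = np.array([False, False, True, True])  # True = valid action
probs = masked_softmax(logits, valid)
print(probs)                                  # first two entries are exactly 0

# Sampling from the masked distribution: invalid actions can never be drawn,
# so only valid actions appear in the collected trajectory.
action = np.random.choice(len(probs), p=probs)
print(action)                                 # always 2 or 3 (0-indexed)
```

In practice the mask is usually applied like this directly to the policy network's output logits, so the same masked distribution is used both when sampling actions during rollouts and when computing the PPO objective.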

Reference links

  • Action Masking with RLlib
    • towardsdatascience.com/action-masking-with-rllib-5e4bec5e7505
    • "RL algorithms learn via trial and error. The agent searches the state space early on and takes random actions to learn what leads to a…"

  • Implementing action mask in proximal policy optimization (PPO) algorithm
    • www.sciencedirect.com/science/article/pii/S2405959520300746
    • "The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask i…"

  • A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
    • arxiv.org/abs/2006.14171
    • "In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action space will typically be…"

 
