[Survey / RL] Resources on Action Masking

2020. 10. 24. 18:46 · Topics of Interest/RL

 

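When some actions are invalid, there is no need to compute anything for them, so they should be removed from the set of choices up front, which lets the agent learn more effectively.

Action masking therefore looks like the best fit for problems with constraints on which actions are allowed.

I am currently surveying related work on it.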
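As a minimal sketch of the idea above (not taken from any of the resources listed in this post), the snippet below removes invalid actions from the choices before selecting one. The function names `masked_greedy_action` and `sample_valid_action`, and the example values, are assumptions made for illustration.

```python
import math
import random


def masked_greedy_action(values, valid_mask):
    """Return the index of the highest-valued action among the valid ones."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions are set to -inf so they can never win the comparison.
    masked = [v if ok else -math.inf for v, ok in zip(values, valid_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


def sample_valid_action(valid_mask, rng=random):
    """Sample uniformly among valid actions, e.g. for random exploration."""
    valid_indices = [i for i, ok in enumerate(valid_mask) if ok]
    if not valid_indices:
        raise ValueError("at least one action must be valid")
    return rng.choice(valid_indices)


# Hypothetical example: action 0 has the highest value but is invalid.
values = [5.0, 1.0, 3.0, 2.0]
valid_mask = [False, True, True, False]
best = masked_greedy_action(values, valid_mask)  # index 2
explore = sample_valid_action(valid_mask)        # always 1 or 2
```

The point is that the invalid actions never enter the comparison or the sampling, so the agent does not spend experience discovering that they are unusable.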

Related examples

  • Snake (video game)

  • automated stock trading
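To make the two examples above concrete, here is a rough sketch of what a mask could look like in each case. The rules used (a snake cannot reverse direction; a trader cannot buy without enough cash or sell without holding shares) and all names and action encodings are assumptions for illustration, not details from the linked resources.

```python
# Hypothetical action encodings for a snake game.
UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3
OPPOSITE = {UP: DOWN, DOWN: UP, LEFT: RIGHT, RIGHT: LEFT}


def snake_action_mask(current_direction):
    """Return 1 for valid moves and 0 for the move that reverses direction."""
    blocked = OPPOSITE[current_direction]
    return [0 if action == blocked else 1 for action in (UP, RIGHT, DOWN, LEFT)]


# Hypothetical action encodings for a single-stock trading agent.
HOLD, BUY, SELL = 0, 1, 2


def trading_action_mask(cash, shares_held, price):
    """Holding is always valid; buying needs cash, selling needs shares."""
    can_buy = cash >= price
    can_sell = shares_held > 0
    return [1, int(can_buy), int(can_sell)]


snake_mask = snake_action_mask(UP)  # [1, 1, 0, 1]
trade_mask = trading_action_mask(cash=50.0, shares_held=0, price=100.0)  # [1, 0, 0]
```

In both cases the environment already knows which actions are illegal in the current state, so the mask can be computed cheaply and passed to the agent alongside the observation.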

Use cases and brief paper notes

  • DQN
    • Action masking has already been applied

  • PPO
    • No prior application existed, so a paper was published on it
      • Apply the action mask, then renormalize the probabilities
    • Only valid actions are used in the collection of trajectory T.
    • During stochastic descent, again only valid actions are used in the calculation of Eq.
    • Example when using softmax
      • K=4
      • Actions 1 and 2 are invalid
      • $y_k = \frac{\exp(p_k)}{\exp(p_3)+\exp(p_4)}$ for $k \in \{3, 4\}$
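The softmax example in the PPO notes can be sketched as follows. This is an illustrative reading of the note, not the paper's implementation; `masked_softmax` and the logit values are hypothetical. Indices are zero-based in the code, so actions 1 and 2 in the note correspond to indices 0 and 1.

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax over valid actions only; invalid actions get probability 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_valid = max(p for p, ok in zip(logits, valid_mask) if ok)
    exps = [
        math.exp(p - max_valid) if ok else 0.0
        for p, ok in zip(logits, valid_mask)
    ]
    total = sum(exps)  # the sum runs over valid actions only
    return [e / total for e in exps]


# K = 4, the first two actions are invalid (hypothetical logits).
logits = [2.0, 1.0, 0.5, 0.5]
valid_mask = [False, False, True, True]
probs = masked_softmax(logits, valid_mask)  # [0.0, 0.0, 0.5, 0.5]
```

A commonly used equivalent is to replace the logits of invalid actions with a very large negative number before an ordinary softmax, which drives their probabilities to (numerically) zero.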

Implementing action mask in proximal policy optimization (PPO) algorithm

towardsdatascience.com/action-masking-with-rllib-5e4bec5e7505

 

Action Masking with RLlib

RL algorithms learn via trial and error. The agent searches the state space early on and takes random actions to learn what leads to a…


www.sciencedirect.com/science/article/pii/S2405959520300746

 

Implementing action mask in proximal policy optimization (PPO) algorithm

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask i…


arxiv.org/abs/2006.14171

 

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action space will typically be
