[RL] References for Continuous Action Spaces (A2C)

2020. 9. 6. 13:12 · Topics of Interest/RL

A2C Continuous Action (Online)

https://github.com/hermesdt/reinforcement-learning/blob/master/a2c/pendulum_a2c_online.ipynb

hermesdt/reinforcement-learning

github.com

https://medium.com/deeplearningmadeeasy/advantage-actor-critic-continuous-case-implementation-f55ce5da6b4c

Advantage Actor Critic continuous case implementation

Whoa! This one has been quite tough! Also, having a beautiful one-year-old kid doesn’t make writing articles and having side projects easy…

medium.com

github.com/colinskow/move37/blob/master/actor_critic/a2c_continuous.py

colinskow/move37

Coding Demos from the School of AI's Move37 Course - colinskow/move37

github.com

incredible.ai/reinforcement-learning/2019/07/20/Advantage-A2C/

N-Step Advantage Actor Critic Model

Improving Policy Gradients with a baseline. The problem of the PG: the Policy Gradient is as follows (see the REINFORCE method).

\[\nabla_{\theta} J(\theta) = \sum^{T}_{t=1} \nabla_{\theta} \log \pi_{\theta} (a_t \mid s_t) \cdot R_t\]

With PG, the game's end…

incredible.ai
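To make the policy gradient formula in the excerpt above concrete for a continuous action, here is a minimal sketch in plain Python. It is not taken from any of the linked posts. It assumes a one-dimensional Gaussian policy whose mean is `theta * state` with a fixed `sigma`, and it assumes that R_t is the discounted reward-to-go. The function names and the optional `baselines` argument are hypothetical, added only for illustration. Subtracting a baseline such as a value estimate V(s_t) from R_t gives the advantage that A2C uses.

```python
def returns_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go: R_t = r_t + gamma * R_{t+1} (assumed definition of R_t)."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def grad_log_gaussian(theta, state, action, sigma=1.0):
    """d/dtheta of log N(action; mean=theta*state, std=sigma).

    Toy linear-Gaussian policy, assumed here only for illustration.
    """
    return (action - theta * state) * state / sigma ** 2


def reinforce_gradient(theta, states, actions, rewards,
                       gamma=0.99, sigma=1.0, baselines=None):
    """Sum over t of grad log pi(a_t | s_t) * (R_t - b_t).

    With baselines=None this is the plain REINFORCE estimate for one episode.
    Passing value estimates as baselines turns the weight into an advantage.
    """
    R = returns_to_go(rewards, gamma)
    if baselines is None:
        baselines = [0.0] * len(R)
    return sum(
        grad_log_gaussian(theta, s, a, sigma) * (r_t - b)
        for s, a, r_t, b in zip(states, actions, R, baselines)
    )


if __name__ == "__main__":
    # One short hand-made episode (made-up numbers).
    g = reinforce_gradient(0.0, [1.0, 2.0], [1.0, -1.0], [0.0, 1.0], gamma=1.0)
    print(g)
```

In practice the linked implementations use a neural network for the mean (and often the standard deviation) and let an autodiff library compute the gradient of the log-probability. The hand-written derivative here only works for this toy policy.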

www.datahubbs.com/policy-gradients-and-advantage-actor-critic/
