2020. 9. 6. 13:12ㆍTopics of Interest/RL
A2C Continuous Reward (Online)
https://github.com/hermesdt/reinforcement-learning/blob/master/a2c/pendulum_a2c_online.ipynb
hermesdt/reinforcement-learning
Advantage Actor Critic continuous case implementation
medium.com
github.com/colinskow/move37/blob/master/actor_critic/a2c_continuous.py
colinskow/move37
Coding Demos from the School of AI's Move37 Course - colinskow/move37
incredible.ai/reinforcement-learning/2019/07/20/Advantage-A2C/
N-Step Advantage Actor Critic Model
Improving Policy Gradients with a baseline. The problem of the PG: Policy Gradient is as follows (see the REINFORCE method).
\[\nabla_{\theta} J(\theta) = \sum_{t=1}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \cdot R_t\]
With PG, the game end…
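The REINFORCE gradient quoted above, and the baseline idea in the linked article's title, can be sketched in plain Python for a continuous action. This is a minimal illustration, not code from any of the linked repositories. It assumes a one-dimensional Gaussian policy whose only parameter is a state-independent mean, a discount factor `gamma` (the quoted formula does not specify one), and a list of critic values `values` standing in for the baseline V(s_t). All function names are hypothetical.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """R_t = r_t + gamma * r_{t+1} + ... for every step t (gamma is an assumption)."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def advantages(returns, values):
    """Baseline-subtracted weights A_t = R_t - V(s_t); values come from a critic."""
    return [R - v for R, v in zip(returns, values)]


def gaussian_log_prob(action, mean, std):
    """log N(action | mean, std^2) for a 1-D continuous action."""
    var = std ** 2
    return -0.5 * math.log(2 * math.pi * var) - (action - mean) ** 2 / (2 * var)


def grad_log_prob_mean(action, mean, std):
    """Derivative of gaussian_log_prob with respect to the mean."""
    return (action - mean) / std ** 2


def policy_gradient(actions, weights, mean, std):
    """sum_t grad_mean log pi(a_t) * weight_t.

    With weights = returns this is the REINFORCE estimate quoted above;
    with weights = advantages it is the baseline (actor-critic) variant.
    """
    return sum(
        grad_log_prob_mean(a, mean, std) * w for a, w in zip(actions, weights)
    )


# Toy episode with made-up numbers, only to show how the pieces fit together.
rewards = [1.0, 1.0, 1.0]
actions = [0.5, -0.2, 0.1]
values = [2.0, 2.0, 2.0]  # hypothetical critic outputs V(s_t)

returns = discounted_returns(rewards, gamma=1.0)
grad_reinforce = policy_gradient(actions, returns, mean=0.0, std=1.0)
grad_baseline = policy_gradient(actions, advantages(returns, values), mean=0.0, std=1.0)
```

Subtracting the baseline leaves the expected gradient unchanged but usually lowers its variance, which is the motivation for the advantage in A2C. In a real implementation the mean (and often the standard deviation) would be the output of a neural network and the gradient would come from automatic differentiation rather than the hand-written derivative used here.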
www.datahubbs.com/policy-gradients-and-advantage-actor-critic/