Comparing the Code for N-Step On-Policy SARSA, N-Step Off-Policy SARSA with Importance Sampling, and N-Step Expected SARSA
N-Step SARSA (On-Policy)

def gen_epsilon_greedy_policy(n_action, epsilon):
    def policy_function(state, Q):
        probs = torch.ones(n_action) * epsilon / n_action
        best_action = torch.argmax(Q[state]).item()
        probs[best_action] += 1.0 - epsilon
        action = torch.multinomial(probs, 1).item()
        return action
    return policy_function

from collections import defaultdict

def n_step_sarsa(env, gamma, n_episode, alpha ,..
2020.07.19
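As a rough illustration of how the three methods named in the title differ, the sketch below computes the quantity each one bootstraps from. It assumes the standard textbook definitions of the n-step targets and is not the post's own code: every function name here is hypothetical, and plain Python lists stand in for the torch tensors used in the excerpt.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_values)
    probs = [epsilon / n] * n
    best = max(range(n), key=lambda a: q_values[a])
    probs[best] += 1.0 - epsilon
    return probs


def discounted_sum(rewards, gamma):
    """Sum of gamma^k * r_{t+k+1} over the n observed rewards."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def n_step_sarsa_target(rewards, gamma, q_next_sa):
    """On-policy target: bootstrap from the sampled Q(s_{t+n}, a_{t+n})."""
    n = len(rewards)
    return discounted_sum(rewards, gamma) + gamma ** n * q_next_sa


def n_step_expected_sarsa_target(rewards, gamma, q_next, pi_next):
    """Expected SARSA target: bootstrap from the policy-weighted mean of Q(s_{t+n}, .)."""
    n = len(rewards)
    expected_q = sum(p * q for p, q in zip(pi_next, q_next))
    return discounted_sum(rewards, gamma) + gamma ** n * expected_q


def importance_ratio(target_probs, behavior_probs):
    """Off-policy correction: product of pi(a|s) / b(a|s) over the sampled actions."""
    rho = 1.0
    for p, b in zip(target_probs, behavior_probs):
        rho *= p / b
    return rho


# Hypothetical two-step example with two actions.
rewards = [1.0, 0.0]
gamma = 0.5
q_next = [2.0, 4.0]                       # Q(s_{t+n}, .) for both actions
pi_next = epsilon_greedy_probs(q_next, 0.2)

sarsa_g = n_step_sarsa_target(rewards, gamma, q_next[0])
expected_g = n_step_expected_sarsa_target(rewards, gamma, q_next, pi_next)
rho = importance_ratio([0.9, 0.5], [0.5, 0.5])
```

In the off-policy variant, the on-policy target is kept and the update step is scaled by `rho`, so that experience gathered under the behavior policy is reweighted toward the target policy. Expected SARSA instead removes the sampling noise of the final action by averaging over the policy's action probabilities.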