To go over once more: stats.stackexchange.com/questions/243384/deriving-bellmans-equation-in-reinforcement-learning