There are a few caveats when you want to use a Recurrent Neural Network (RNN) policy with policy gradient algorithms. This repository explains them and provides a solution for them. Please see the blog for more details.
I want to incorporate RL into my RNN model, and your code is a great starting point for me. I have a question about the gradient calculation in your '/pg_rnn.py' file. As far as I know, when we calculate the gradient in RL, we need to multiply the 'log probability' by the 'reward', but I cannot find the reward part in your code :( Could you please give me some guidance? Thanks a lot!
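For reference, here is a minimal NumPy sketch of the weighting the question describes: the log probability of each taken action is multiplied by a return, and the negated sum is used as the loss. This is not taken from '/pg_rnn.py'. The names `log_softmax`, `reinforce_loss`, `logits`, `actions` and `returns` are hypothetical and chosen only for illustration, and the repository may apply the reward in a different place or form.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the action dimension (axis 1).
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def reinforce_loss(logits, actions, returns):
    # Hypothetical helper: REINFORCE-style surrogate loss for one episode.
    # logits:  (T, num_actions) policy outputs, one row per time step
    # actions: (T,) integer index of the action taken at each step
    # returns: (T,) return (or reward) used to weight each step
    log_probs = log_softmax(logits)[np.arange(len(actions)), actions]
    # Minimizing this loss follows the policy gradient estimate
    # sum_t grad log pi(a_t | s_t) * R_t.
    return -np.sum(log_probs * returns)

# Toy episode: 3 steps, 2 actions, uniform policy (all logits zero).
logits = np.zeros((3, 2))
actions = np.array([0, 1, 0])
returns = np.array([3.0, 2.0, 1.0])
loss = reinforce_loss(logits, actions, returns)
```

In an autodiff framework the same expression would usually be built from the framework's own ops, so that differentiating the loss with respect to the policy parameters yields the weighted gradient.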
Hi. When I ran your code, I found a mistake in run_pg_rnn.py.
The last line print("reward is {0}".format(np.sum(episode["rewards"])))
should be print("reward is {0}".format(np.sum(episode["returns"])))
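To make the difference between the two keys concrete, here is a small sketch. It assumes that `episode["returns"]` holds per-step discounted returns, which this thread does not confirm. `discounted_returns` and `gamma` are hypothetical names, not taken from run_pg_rnn.py.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    # Hypothetical helper: return at step t is r_t + gamma * (return at t+1).
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Toy episode with a reward of 1 at each of 3 steps.
episode = {"rewards": np.array([1.0, 1.0, 1.0])}
episode["returns"] = discounted_returns(episode["rewards"], gamma=1.0)

total_reward = np.sum(episode["rewards"])    # 3.0
sum_of_returns = np.sum(episode["returns"])  # 6.0, since returns are [3, 2, 1]
```

Under this assumption the two sums report different quantities, so which one the final print statement should use depends on how the script fills `episode["returns"]`.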