We rewrote the code from floodsung's framework, based on Double Q-Learning (but using a Q-network in place of a tabular Q-function) and DDQN, with these changes:
- Every t episodes, draw a figure of the terminal rewards.
- Try another epsilon-greedy schedule to better fit this game.
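The exact schedule is in the code; as an illustrative sketch of what an alternative epsilon-greedy scheme can look like (the function names and constants below are assumptions for illustration, not the repo's actual values):

```python
import random

def epsilon_by_step(step, eps_start=1.0, eps_end=0.01, anneal_steps=100000):
    """Linearly anneal epsilon from eps_start down to eps_end over anneal_steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, eps):
    """Epsilon-greedy: random action with probability eps, otherwise argmax Q."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Annealing more slowly (or restarting epsilon after a plateau) is one way to push the agent out of the local optima mentioned below.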
Known issues and open questions:

1. The agent's behavior is still not good enough after training (we may try a policy-based algorithm in the future).
2. How can we train the agent with fewer episodes?
3. Training easily falls into a local optimum (we changed the epsilon-greedy schedule, but it improved only a little).
4. Do the training results follow some distribution? The results are so scattered that we cannot guarantee a promising agent within a fixed number of episodes.
5. There seems to be an over-fitting problem.
As a reinforcement learning problem, the agent needs to take in observations and output actions, while the 'brain' does the processing in between. With that in mind, BrainDDQN.py is easy to follow. It exposes three interfaces:
- getInitState() for initialization
- getAction() to select the next action
- setPerception(nextObservation, action, reward, terminal) to record the transition
The game interface only needs to accept the chosen action and return the resulting observation, reward, and terminal flag.
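Putting the three interfaces together, the training loop is wired roughly like this (a minimal sketch: the `Game` and `DummyBrain` classes below are stand-ins, not the repo's actual classes; only the three brain method names come from the list above):

```python
class Game:
    """Toy stand-in environment: reward 1.0 per step until a fixed horizon."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def frame_step(self, action):
        self.t += 1
        terminal = self.t >= self.horizon
        observation = [float(self.t)]  # placeholder observation
        return observation, 1.0, terminal

class DummyBrain:
    """Stand-in brain exposing the three interfaces listed above."""
    def __init__(self):
        self.transitions = []

    def getInitState(self):
        return [0.0]

    def getAction(self):
        return 0  # always "do nothing"; a real brain is epsilon-greedy over Q

    def setPerception(self, nextObservation, action, reward, terminal):
        # A real brain would store the transition in replay memory and train.
        self.transitions.append((nextObservation, action, reward, terminal))

game = Game()
brain = DummyBrain()
state = brain.getInitState()
total_reward = 0.0
while True:
    action = brain.getAction()
    observation, reward, terminal = game.frame_step(action)
    brain.setPerception(observation, action, reward, terminal)
    total_reward += reward
    if terminal:
        break
```

Any game that can consume an action and emit (observation, reward, terminal) can be dropped into this loop unchanged.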
This work is based on the repo floodsung/DRL-FlappyBird.