
FlappyBird in DQN with Target Network

Overview

This repo is based on yenchenlin/DeepLearningFlappyBird.

On top of that repo, I:

  • add a freezing target network to the DQN algorithm
  • rewrite the network using tflearn, a wrapper of TensorFlow (you can also write it with plain TensorFlow)
  • write a function that lets a human play the game (press the space key to fly; do nothing to fall), sketched below
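
A minimal sketch of such a human-play loop, assuming the game wrapper from yenchenlin/DeepLearningFlappyBird (game/wrapped_flappy_bird.py), whose GameState.frame_step(action) takes a one-hot action and returns (frame, reward, terminal); this repo's actual function may differ.

```python
import numpy as np
import pygame
import game.wrapped_flappy_bird as game

def human_play():
    game_state = game.GameState()
    while True:
        action = np.array([1, 0])             # default: do nothing, the bird falls
        for event in pygame.event.get():
            if event.type == pygame.KEYDOWN and event.key == pygame.K_SPACE:
                action = np.array([0, 1])     # space key: flap upward
        _, reward, terminal = game_state.frame_step(action)
        # the wrapper restarts the game by itself when terminal is True
```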

Installation Dependencies:

  • Python3
  • TensorFlow >= 1.1.0
  • tflearn
  • pygame
  • OpenCV-Python

How to Install

  • My system is Windows 10; Linux/macOS is a little different but easier.
pip install tensorflow
pip install tflearn
pip install opencv-python

  For pygame, click here: PyGame
  For TensorFlow with GPU support, click here: Installing the official native GPU build of TensorFlow on 64-bit Windows 10

How to Run

  • Play the game by yourself
python DQN_with_target_network.py --human_play True
  • Test the model
python DQN_with_target_network.py
  • Continue training
python DQN_with_target_network.py --train True
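
For reference, here is one way the --human_play and --train flags shown above could be parsed; this is an illustrative sketch, not the repo's actual code.

```python
import argparse

def str2bool(s):
    # argparse's plain bool() would treat the string "False" as True
    return s.lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser()
parser.add_argument('--human_play', type=str2bool, default=False,
                    help='play the game yourself (space key to fly)')
parser.add_argument('--train', type=str2bool, default=False,
                    help='continue training instead of testing the model')
args = parser.parse_args()

mode = 'human_play' if args.human_play else ('train' if args.train else 'test')
print('running in %s mode' % mode)
```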

DQN Algorithm

This algorithm follows the paper Human-level control through deep reinforcement learning.


  • The only difference in this code is that I don't fix C = 10000; instead, I adjust the target-network update interval based on the network's cost (a sketch follows).
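
A hedged sketch of the freezing target network in TF 1.x style (matching the repo's dependencies): every C steps the online parameters are copied into the frozen target network that supplies the targets y = r + gamma * max Q_target(s', a'). The scope names 'online'/'target' and the adaptation hook are illustrative, not taken from the repo.

```python
import tensorflow as tf

online_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='online')
target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='target')
# ops that overwrite the frozen target parameters with the online ones
sync_ops = [t.assign(o) for o, t in zip(online_vars, target_vars)]

C = 10000            # steps between target updates (fixed at 10000 in the paper)
steps_since_sync = 0

def maybe_sync_target(sess, cost):
    """Copy online -> target every C steps. The author adapts C from the
    training cost instead of fixing it; the exact rule isn't given in this
    README, so this sketch only marks where that adjustment would happen."""
    global C, steps_since_sync
    steps_since_sync += 1
    if steps_since_sync >= C:
        sess.run(sync_ops)
        steps_since_sync = 0
        # adjust C based on the recent cost here (repo-specific rule)
```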

Details

The environment, network architecture, and training parameters are nearly all the same as in yenchenlin/DeepLearningFlappyBird.

  • preprocess (a sketch follows this list)

  • network

  • game
    I remove the FPS restriction during training so the game runs as fast as your computer allows.
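
A minimal preprocessing sketch in the spirit of yenchenlin/DeepLearningFlappyBird: resize each frame to 80x80, convert to grayscale, binarize, and stack the 4 most recent frames as the network input. The exact sizes and thresholds are assumptions if this repo differs.

```python
import cv2
import numpy as np

def preprocess(frame):
    # 80x80 grayscale, then binarize so the bird and pipes stand out
    gray = cv2.cvtColor(cv2.resize(frame, (80, 80)), cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)
    return binary

def initial_state(frame):
    x = preprocess(frame)
    return np.stack([x, x, x, x], axis=2)          # shape (80, 80, 4)

def next_state(state, frame):
    x = preprocess(frame).reshape(80, 80, 1)
    return np.append(state[:, :, 1:], x, axis=2)   # drop the oldest frame
```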

Train Tricks

We know that RL algorithms often trade stability for performance; sometimes we can't get a good result even with the same code. The outcome depends heavily on the algorithm's parameters, the network, and even the random seed. When we update some states' actions toward more accurate Q values, other actions may get worse, and this is often unavoidable. So when we see curves like the ones below, we should be aware that the network may have fallen into a bad state and can't improve anymore.

  • Unbalanced samples

We can see from the cost plot that the network fits well, while the max_value plot shows that the Q value of the best action has converged into a very low range (about -1 to 0.1). When you train from scratch, you may run into this, and it is caused by unbalanced samples. The bird keeps flying to the top of the game screen and finally bumps into the pipe in a clumsy way. The memory D stores only this kind of data, doesn't know there is a better way to go further, and treats it as the terminal state of the MDP. The network then updates again and again toward lower and lower Q values, yet the best action never changes because of the backpropagation algorithm and the unbalanced samples; in other words, the updates are useless for choosing a better action.

For example, suppose a state S has two actions, 0 and 1. In truth, 0 is the better action, but right now it has the lower Q value; say Q(0) = 0.2 and Q(1) = 0.3. Unfortunately, the memory D contains only data about action 1 with reward -1, so backpropagation can only use action 1's data to update the network. The update pushes action 1's Q value toward -1, but because the network's weights are shared, action 0's Q value happens to be dragged down to -1.1 at the same time and is still lower than action 1's. So the update is useless for choosing a better action; this is the failure caused by the backpropagation algorithm and unbalanced samples. A small numeric illustration follows.
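
A tiny numeric illustration of this failure mode, assuming the two Q estimates share parameters so a gradient step moves both together (the coupling is simplified here to an identical shift):

```python
import numpy as np

q = np.array([0.2, 0.3])       # Q(S,0)=0.2, Q(S,1)=0.3; action 0 is truly better
for _ in range(500):
    td_error = -1.0 - q[1]     # memory D only holds action 1 with reward -1
    q += 0.1 * td_error        # shared weights: both estimates shift together
print(q)                       # ~[-1.1, -1.0], matching the values above
print('greedy action:', np.argmax(q))   # still action 1; the updates never help
```

The gap Q(1) - Q(0) never changes, so no amount of such updates makes the agent try action 0; only new data about action 0 (or a restart) can fix it.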

How to solve it? The crudest but usually effective method is to restart training and re-initialize the parameters (ideally initializing the network so that the bird does nothing at first). There must be better ways, which need more study.

Differences Between Training With and Without the Target Network

Future Work to Try

There is still some work that may improve the performance:

  • Training for longer. This model reaches a score of more than 5,000 after training for about 2 days on 4 million frames with a Core i7 and a 1080 Ti (long enough for the bird to fly quite far before bumping into a pipe), which is still far less than the paper's 50 million frames and 38 days.
  • The memory D here holds only 50,000 frames in total, while the paper keeps the 1 million most recent frames. A larger memory stores more data and fits a more stable network, at the cost of more training time.
  • The frame-skipping technique. Right now the network takes an action at every frame; this is very flexible for adjusting the bird's state, but it also enlarges the action/state space, which needs more data to fit, and it forces the network to compute and output an action more often. The original paper states it as follows (see the sketch after this list):
    More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames. Because running the emulator forward for one step requires much less computation than having the agent select an action, this technique allows the agent to play roughly k times more games without significantly increasing the runtime. We use k=4 for all games.
  • We use Adam to update the network in this game, while RMSProp is used in the original paper. I think either works, but maybe there are some differences?
  • More details are in the following picture; maybe you can find some further improvements.
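
A minimal frame-skipping sketch (proposed above, not implemented in this repo): pick an action every K-th frame and repeat it on the skipped frames, accumulating the reward. frame_step follows the wrapped_flappy_bird interface assumed earlier.

```python
K = 4  # as in the paper: "We use k=4 for all games"

def step_with_frame_skip(game_state, action):
    total_reward, terminal = 0.0, False
    for _ in range(K):
        frame, reward, terminal = game_state.frame_step(action)
        total_reward += reward
        if terminal:
            break              # stop repeating once the episode ends
    return frame, total_reward, terminal
```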
