async-deep-flappybird's Introduction

Asynchronous Deep ReinFlappyBird

This repository contains an implementation of Asynchronous Advantage Actor-Critic (A3C) that teaches an agent to play Flappy Bird.

Performance

Coming soon!

Technical Details

In my tests, training ran at the following speeds on a CPU (Intel Xeon E5620, 2.40 GHz) and a GPU (NVIDIA GTX 1070).

	FF	LSTM
CPU	57 steps/s	TBA steps/s
GPU	400 steps/s	300 steps/s
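
To put these numbers in perspective, the following is a minimal sketch that estimates how long a full run would take at each measured speed. It assumes the default max_time_step of 40 000 000 listed under Training and Optimizer settings and a constant throughput for the whole run; the helper name training_hours is hypothetical and not part of a3c.py.

```python
# Throughput figures taken from the table above (steps per second).
STEPS_PER_SECOND = {
    ("CPU", "FF"): 57,
    ("GPU", "FF"): 400,
    ("GPU", "LSTM"): 300,
}

# Default max_time_step from the settings (assumed unchanged).
MAX_TIME_STEP = 40 * 1000 * 1000


def training_hours(total_steps, steps_per_second):
    """Wall-clock hours needed to run total_steps at a constant rate."""
    return total_steps / float(steps_per_second) / 3600.0


if __name__ == "__main__":
    for (device, agent), rate in sorted(STEPS_PER_SECOND.items()):
        hours = training_hours(MAX_TIME_STEP, rate)
        print("{} {}: {:.1f} h".format(device, agent, hours))
```

At 400 steps/s a full run comes to a little under 28 hours, while 57 steps/s on the CPU would take roughly eight days, so real throughput on your own hardware is worth measuring before committing to a long session.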

Settings

Here are some of the available flags you can set when you train an agent. For the full list, see a3c.py.

Agent settings

mode / [train, display, visualize] - Which mode you want to activate when you start a session.
use_gpu / [True, False] - Whether to use a GPU to speed up the training process.
parallel_agent_size - Number of parallel agents to use during training.
action_size - Number of available actions.
agent_type / [FF, LSTM] - What type of A3C to train the agent with.

Training and Optimizer settings

The current settings are based on or borrowed from the [implementation](https://github.com/miyosuda/async_deep_reinforce) by @miyosuda. They have not yet been tuned for Flappy Bird and are used as is for now. If you find settings that perform better than the current ones, please let me know!

max_time_step - 40 000 000 - Maximum training steps.
initial_alpha_low - -5 - LogUniform low limit for learning rate (represents x in 10^x).
initial_alpha_high - -3 - LogUniform high limit for learning rate (represents x in 10^x).
gamma - 0.99 - Discount factor for rewards.
entropy_beta - 0.01 - Entropy regularization constant.
grad_norm_clip - 40.0 - Gradient norm clipping.
rmsp_alpha - 0.99 - Decay parameter for RMSProp.
rmsp_epsilon - 0.1 - Epsilon parameter for RMSProp.
local_t_max - 5 - Repeat step size.
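
As a rough illustration of how some of these values are commonly used in A3C, the sketch below draws a learning rate log-uniformly between 10^initial_alpha_low and 10^initial_alpha_high, and computes discounted returns with gamma over a short rollout of local_t_max steps. The function names sample_learning_rate and n_step_returns are hypothetical and the rewards are made up; a3c.py may pick the learning rate and compute returns differently.

```python
import random

# Values copied from the settings list above.
INITIAL_ALPHA_LOW = -5    # exponent x in 10^x
INITIAL_ALPHA_HIGH = -3   # exponent x in 10^x
GAMMA = 0.99
LOCAL_T_MAX = 5


def sample_learning_rate(low_exp, high_exp, rng=random):
    """Draw a learning rate log-uniformly from [10^low_exp, 10^high_exp]."""
    # Sampling the exponent uniformly makes the rate uniform on a log scale.
    exponent = rng.uniform(low_exp, high_exp)
    return 10.0 ** exponent


def n_step_returns(rewards, bootstrap_value, gamma):
    """Discounted returns R_t = r_t + gamma * R_{t+1}.

    bootstrap_value is the value estimate of the state after the last
    reward (0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    # Walk the rollout backwards so each step reuses the next step's return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    alpha = sample_learning_rate(INITIAL_ALPHA_LOW, INITIAL_ALPHA_HIGH)
    print("sampled learning rate:", alpha)

    # Hypothetical rollout of LOCAL_T_MAX steps with made-up rewards.
    rewards = [0.1] * LOCAL_T_MAX
    print(n_step_returns(rewards, bootstrap_value=0.0, gamma=GAMMA))
```

Sampling the exponent rather than the rate itself spreads trials evenly across orders of magnitude, which is the usual reason for specifying the limits as powers of ten.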

Logging

log_level - Log level [NONE, FULL]
average_summary - How many episodes to average summary over.

Display

display_episodes - Number of episodes to display.
average_summary - How many episodes to average summary over.
display_log_level / [NONE, MID, FULL] - Display log level. NONE prints an end summary, MID prints a summary for each episode, and FULL prints the π-values, state value and reward for every state.

Getting started

To start a training session with the default parameters, run:

$ python a3c.py

To check your progress, and to compare different experiments in real time, navigate to your async-deep-flappybird folder and start TensorBoard by running:

$ tensorboard --logdir summaries/

Enjoy!

Credit

A3C - The A3C implementation used is a modified version by @miyosuda.

Flappy Bird - The Flappy Bird implementation is based on a version by @yenchenlin with some minor adjustments.

—

2016, Babak Toghiani-Rizi

async-deep-flappybird's Issues

What is the magic to make PyGame capable for multithreading?

Hi,

Thank you for your wonderful work on FlappyBird A3C!

I surveyed some other solutions for training FlappyBird with A3C on GitHub, and tried out my own as well. I found that many people argue that PyGame does not support multithreading (here), so if you want to use A3C you have to use multiprocessing and implement the troublesome communication between processes. But I see you are just using multithreading here and it works well. Would you please share how you achieved it?

Thank you!

How did you run 400 steps/sec with GTX 1070?

Hi,

Recently I ran this on a Tesla P40 and got only 237 steps/sec. If a GTX 1070 can reach 400 steps/sec, I may have made a mistake somewhere.

I simply ran the code with python a3c.py --use_gpu True. Is that all, or did I miss something?