sergia-ch / deephack.rl

Playing ATARI games using a convolutional autoencoder and an evolutionary algorithm. Team name: bad_skiers_evolved (DeepHack.RL)

Jupyter Notebook 98.45% Python 1.55%
deep-learning reinforcement-learning gym autoencoder neat-python python theano keras

Atari games are usually solved with a Deep Q-Network (DQN) [1]:
1. Convolutional layers
2. Fully connected layers
3. Input: raw image; output: Q(s,a)
4. Training: gradient updates based on the Bellman equation.
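
For reference, item 4 can be illustrated with a minimal sketch of the one-step Q-learning target that DQN-style training regresses towards. The function name `td_targets`, the discount factor and the toy numbers are assumptions made for illustration; none of them come from this repository.

```python
import numpy as np

def td_targets(rewards, next_q, dones, gamma=0.99):
    """One-step Bellman targets: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states."""
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q = np.asarray(next_q, dtype=np.float64)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * not_done * next_q.max(axis=1)

# Toy batch of 3 transitions with 2 actions each (made-up numbers).
rewards = [0.0, 0.0, -5.0]
next_q = [[1.0, 2.0], [0.5, 0.0], [3.0, 4.0]]
dones = [False, False, True]
targets = td_targets(rewards, next_q, dones, gamma=0.5)
print(targets)
```

In the first two transitions the reward is zero, so the target consists only of the bootstrapped estimate from the next state. Only the last, terminal transition carries an actual reward.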

We use a different approach to train the fully connected layers: a genetic algorithm.

- Explanation of choice - 
In the game Skiing, the reward is given only at the end of the episode. Therefore, Bellman updates are useless for 99.9% of frames.
Some techniques try to overcome this obstacle, such as prioritized experience replay [2], but, as that article shows, the improvement is insignificant.

Therefore, we use an older approach to Atari games: neuroevolution [3], [4]. Specifically, we use the NEAT algorithm [5].
It represents the fully connected part of the network as a "genome" and evolves genomes in the following way:
1. Create a random set of neural networks
2. Evaluate their fitness (i.e. the sum of rewards)
3. Choose the best ones
4. Cross over and mutate them (possibly adding new neurons)
5. Repeat from step 2 with the resulting population.
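
The loop above can be sketched with a simplified, fixed-topology genetic algorithm. This is not NEAT itself (NEAT also evolves the network structure and uses speciation), and it is not the code from Evolution.ipynb. The toy fitness function, population size, elite size and mutation scale are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome):
    # Toy stand-in for "sum of rewards over an episode":
    # higher is better, with a maximum of 0 when the genome equals the target vector.
    target = np.linspace(-1.0, 1.0, genome.size)
    return -np.sum((genome - target) ** 2)

def evolve(pop_size=30, genome_size=8, n_generations=40, n_elite=6, sigma=0.1):
    # 1. Create a random set of genomes (here: flat weight vectors).
    population = rng.normal(size=(pop_size, genome_size))
    best_history = []
    for _ in range(n_generations):
        # 2. Evaluate the fitness of every genome.
        scores = np.array([fitness(g) for g in population])
        best_history.append(scores.max())
        # 3. Choose the best ones.
        elite = population[np.argsort(scores)[::-1][:n_elite]]
        # 4. Crossover (uniform) and mutation (Gaussian noise).
        children = []
        while len(children) < pop_size - n_elite:
            a, b = elite[rng.choice(n_elite, size=2, replace=False)]
            mask = rng.random(genome_size) < 0.5
            children.append(np.where(mask, a, b) + rng.normal(scale=sigma, size=genome_size))
        # 5. Keep the elite, add the children, and repeat from step 2.
        population = np.vstack([elite, np.array(children)])
    return population, best_history

population, best_history = evolve()
print(best_history[0], best_history[-1])
```

Because the elite genomes are copied unchanged into the next generation, the best fitness in this sketch never decreases from one generation to the next.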

We train the convolutional part of the network before running neuroevolution.
Specifically, we use a convolutional autoencoder:

inp -> conv -> encoded -> deconv -> out

1. Sample 10000 frames from the environment using random actions: action_space.sample()
2. Train the autoencoder in a supervised way (the target output is the input frame)
3. Remove the deconv and out parts of the autoencoder
4. Use the 'encoded' features as the description of 'inp'
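
The train-then-truncate idea in the steps above can be sketched with a tiny dense, linear autoencoder in NumPy. This is a deliberately simplified stand-in: the project uses convolutional layers and real Atari frames, whereas the random "frames", layer sizes, learning rate and all names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for step 1: random low-rank "frames" instead of real Atari observations.
n_frames, n_pixels, n_code = 256, 16, 4
latent = rng.normal(size=(n_frames, n_code))
frames = latent @ rng.normal(size=(n_code, n_pixels)) / np.sqrt(n_code)

# Encoder and decoder weights (dense and linear here, not conv/deconv).
W_enc = rng.normal(scale=0.1, size=(n_pixels, n_code))
W_dec = rng.normal(scale=0.1, size=(n_code, n_pixels))

def reconstruction_loss(x):
    return np.mean((x @ W_enc @ W_dec - x) ** 2)

loss_before = reconstruction_loss(frames)

# Step 2: gradient descent so that the output reproduces the input.
lr = 0.02
for _ in range(500):
    encoded = frames @ W_enc
    err = encoded @ W_dec - frames                    # reconstruction error
    grad_dec = encoded.T @ err / n_frames
    grad_enc = frames.T @ (err @ W_dec.T) / n_frames
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

loss_after = reconstruction_loss(frames)

# Steps 3-4: drop the decoder and keep only the encoder as a feature extractor.
def encode(x):
    return x @ W_enc

features = encode(frames)
print(loss_before, loss_after, features.shape)
```

After training, only `encode` is needed: its output plays the role of the 'encoded' features that are passed on to the evolved fully connected network.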

Code:
1. collect/ -- Autoencoder training & weights:
   gym-collect.ipynb -- supervised autoencoder training & saving results
   *.pkl -- saved weights
2. neat_python/ -- Neuroevolution using the neat-python library
   Evolution.ipynb -- load the autoencoder, run neuroevolution, send results to OpenAI
   fc.config -- configuration file for NEAT
   visualize.py -- used for plotting the resulting FC network
3. old -- old stuff
4. keyboard_agent.py -- human agent (used for debugging)

Additionally, the autoencoder receives not the raw observation but a smoothed frame difference, which roughly follows the idea of motion vectors in video compression [6]:

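  import numpy as np

  alpha = 0.6
  prev_o = env.reset().astype(float)  # assumes the classic gym API: reset() returns the first frame
  diff = np.zeros_like(prev_o)

  # on every step (classic gym API: step() takes an action and returns a 4-tuple)
  o, reward, done, info = env.step(action)
  o = o.astype(float)                 # uint8 frames would wrap around on subtraction
  diff = (1 - alpha) * diff + alpha * (o - prev_o)
  prev_o = o

'diff' is used as the input to the autoencoder.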
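
The same update rule can be wrapped in a small stateful helper that is independent of any environment. The class name `MotionFilter` and the choice to return an all-zero difference for the first frame are assumptions of this sketch, not code from the repository.

```python
import numpy as np

class MotionFilter:
    """Exponentially smoothed frame difference, following the update rule above."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.prev = None
        self.diff = None

    def __call__(self, observation):
        # Cast to float so that subtracting uint8 frames cannot wrap around.
        o = np.asarray(observation, dtype=np.float32)
        if self.prev is None:
            # First frame: there is no previous observation, so the difference starts at zero.
            self.diff = np.zeros_like(o)
        else:
            self.diff = (1 - self.alpha) * self.diff + self.alpha * (o - self.prev)
        self.prev = o
        return self.diff

# Toy 2x2 grayscale "frames" (made-up values, stored as uint8 like Atari observations).
motion = MotionFilter(alpha=0.6)
frames = [np.full((2, 2), v, dtype=np.uint8) for v in (10, 20, 20)]
outputs = [motion(frame).copy() for frame in frames]
print([float(out[0, 0]) for out in outputs])
```

With alpha = 0.6 a change between two frames shows up immediately at 60% of its size and then decays by a factor of 0.4 per step, so a single frame of input still carries a short history of recent motion.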

[1] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
[2] https://arxiv.org/pdf/1511.05952.pdf
[3] http://people.idsia.ch/~koutnik/papers/koutnik2014gecco.pdf
[4] http://www.cs.utexas.edu/users/pstone/Papers/bib2html-links/TCIAIG13-mhauskn.pdf
[5] http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf
[6] https://en.wikipedia.org/wiki/Motion_vector
