sergia-ch / deephack.rl
Playing ATARI games using a convolutional autoencoder and an evolutionary algorithm. Team name: bad_skiers_evolved (DeepHack.RL)
Usually, ATARI games are solved using a DQN network [1]:
1. Convolutional layers
2. Fully connected layers
3. Input: raw image; output: Q(s,a)
4. Training: gradient updates via the Bellman equation

We use a different approach to train the fully connected layers: genetic algorithms.

- Explanation of choice -

In the game Skiing, the reward is given only at the end. Therefore, Bellman updates are useless for 99.9% of frames. Some techniques address this obstacle, such as advanced (prioritized) experience replay [2], but, as that article shows, the improvement is insignificant. Therefore, we use an older approach to Atari games: neuroevolution [3], [4]. Specifically, we use the NEAT algorithm [5]. It uses a specific representation of the fully connected part of the network, the "genome". The algorithm evolves genomes in the following way:
1. Create a random set of NNs
2. Evaluate their fitness (i.e. the sum of rewards)
3. Choose the best ones
4. Cross over and mutate them (possibly adding new neurons)
5. Repeat from step 2 with the resulting population

We train the convolutional part of the network before running neuroevolution. Specifically, we use a convolutional autoencoder:

    inp -> conv -> encoded -> deconv -> out

1. Sample 10000 frames from the environment using random actions: action_space.sample()
2. Train the autoencoder in a supervised way
3. Remove the deconv and out parts of the autoencoder
4. Use the 'encoded' features as the description of 'inp'

Code:
1. collect/ -- Autoencoder training & weights:
     gym-collect.ipynb -- autoencoder supervised training & saving results
     *.pkl -- saved weights
2. neat_python/ -- Neuroevolution using the neat-python library
     Evolution.ipynb -- open autoencoder, train neuroevolution, send results to OpenAI
     fc.config -- configuration file for NEAT
     visualize.py -- used for plotting the resulting FC network
3. old -- old stuff
4. keyboard_agent.py -- human agent (used for debugging)

Additionally, the autoencoder receives not the raw observation but a frame that roughly follows the idea of "motion vectors" in video estimation [6]:

    alpha = 0.6
    diff = zeros
    o = env.step()
    diff = (1 - alpha) * diff + alpha * (o - prev_o)
    prev_o = o

'diff' is used as the input for the autoencoder.

[1] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
[2] https://arxiv.org/pdf/1511.05952.pdf
[3] http://people.idsia.ch/~koutnik/papers/koutnik2014gecco.pdf
[4] http://www.cs.utexas.edu/users/pstone/Papers/bib2html-links/TCIAIG13-mhauskn.pdf
[5] http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf
[6] https://en.wikipedia.org/wiki/Motion_vector
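Illustration of the five-step evolutionary loop described in this README (create, evaluate, select, cross over and mutate, repeat). This is a minimal sketch of a plain genetic algorithm over fixed-length weight vectors, not NEAT itself: it has no speciation and never adds neurons, and it is not the code in Evolution.ipynb. The names `evolve` and `toy_fitness` and all default parameters are assumptions made for the example; in the project, the fitness would be the sum of rewards from playing an episode.

```python
import numpy as np


def evolve(fitness, genome_size, pop_size=20, n_elite=5, generations=30,
           sigma=0.1, seed=0):
    """Fixed-size genetic algorithm: select the best, cross over, mutate, repeat."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial population (each genome is a flat weight vector).
    population = rng.normal(size=(pop_size, genome_size))
    best_genome, best_score = None, -np.inf
    for _ in range(generations):
        # Step 2: evaluate fitness (in the README, the sum of rewards).
        scores = np.array([fitness(g) for g in population])
        # Step 3: choose the best genomes.
        order = np.argsort(scores)[::-1]
        elite = population[order[:n_elite]]
        if scores[order[0]] > best_score:
            best_score = scores[order[0]]
            best_genome = population[order[0]].copy()
        # Step 4: uniform crossover of two random elite parents, then
        # Gaussian mutation of every weight.
        children = []
        for _ in range(pop_size - n_elite):
            a, b = elite[rng.integers(n_elite, size=2)]
            mask = rng.random(genome_size) < 0.5
            child = np.where(mask, a, b) + rng.normal(scale=sigma, size=genome_size)
            children.append(child)
        # Step 5: the next generation is the elite plus their offspring.
        population = np.vstack([elite, np.array(children)])
    return best_genome, best_score


def toy_fitness(genome):
    # Hypothetical stand-in for an episode return: peaks when all weights equal 1.
    return -float(np.sum((genome - 1.0) ** 2))


genome, score = evolve(toy_fitness, genome_size=4)
print(genome, score)
```

Because the elite genomes are copied unchanged into the next generation, the best score found never decreases from one generation to the next. NEAT additionally mutates the network topology, which this sketch leaves out.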
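Illustration of the smoothed frame difference that this README feeds to the autoencoder. This is a sketch under stated assumptions rather than the project's code: the class name `MotionFrame` is hypothetical, observations are assumed to be NumPy-compatible arrays, and the first frame is assumed to yield an all-zero `diff` because it has no predecessor.

```python
import numpy as np


class MotionFrame:
    """Exponentially smoothed difference between consecutive observations."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.diff = None      # running smoothed difference
        self.prev_obs = None  # previous raw observation

    def update(self, obs):
        # Cast to float first: Atari frames are usually uint8, and subtracting
        # unsigned integers would wrap around instead of going negative.
        obs = np.asarray(obs, dtype=np.float32)
        if self.prev_obs is None:
            # Assumption: no previous frame yet, so the difference starts at zero.
            self.diff = np.zeros_like(obs)
        else:
            self.diff = (1 - self.alpha) * self.diff + self.alpha * (obs - self.prev_obs)
        self.prev_obs = obs
        return self.diff


motion = MotionFrame(alpha=0.6)
dark = np.zeros((2, 2), dtype=np.uint8)
bright = np.full((2, 2), 10, dtype=np.uint8)
for frame in (dark, bright, dark):
    print(motion.update(frame))
```

With alpha = 0.6, the most recent change dominates, while older motion decays by a factor of 0.4 per step, so pixels that stop changing fade toward zero over a few frames.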