fwdeng / atari

This project forked from bhaktipriya/atari

1-step Q Learning from the paper "Asynchronous Methods for Deep Reinforcement Learning"

Home Page: https://bhaktipriya96.wordpress.com/deep-atari/

Jupyter Notebook 75.41% Python 24.59%

Deep Atari

Atari games are some of the coolest games out there and have gained widespread mainstream popularity. Breakout is one of my personal favorites. Pong, the first game ever developed by Atari Inc., was also one of the most influential video games ever created. In 2013, DeepMind released its paper “Playing Atari with Deep Reinforcement Learning”, which is widely cited in the reinforcement learning literature. My project implements 1-step Q-learning, as described in the later paper “Asynchronous Methods for Deep Reinforcement Learning”, using the deep Q-network ideas introduced in the 2013 paper.

Environment:

Here, I've used the Atari environments from OpenAI Gym, a toolkit for developing and comparing RL algorithms. Changing the environment is as simple as changing the value of a string variable.

Q learning:

In Q-learning, we define a function Q(s, a) representing the maximum discounted future reward when we perform action a in state s and continue optimally from that point on. We can think of Q(s, a) as the best possible score at the end of the game after performing action a in state s. It is called the Q-function because it represents the “quality” of a certain action in a given state.
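To make the definition above concrete, here is a minimal sketch of the two quantities involved: the discounted future reward, and the one-step target that Q-learning moves Q(s, a) towards. The function names and the discount factor `gamma` are my own assumptions for illustration; they are not taken from the project's code.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward k steps ahead is scaled by gamma**k.

    gamma is an assumed discount factor, not a value from the project.
    """
    total = 0.0
    # Work backwards so each step is: reward + gamma * (return of the rest).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def one_step_q_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    If the episode ended at this transition there is no future reward,
    so the target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

For example, `one_step_q_target(1.0, [0.5, 2.0], gamma=0.9)` picks the best next action (value 2.0) and adds its discounted value to the immediate reward.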

Deep Q network:

We use a CNN which takes in the state S and predicts the Q-values for all the possible actions from that state.

The network architecture that DeepMind used is a classical convolutional neural network with three convolutional layers, followed by two fully connected layers. There are no pooling layers, since pooling layers buy us translation invariance, which is not something we want when we train our bots for games: the position of objects on the screen matters.

The input to the network is four 84×84 grayscale game screens; we use the 4 most recent screens as the environment state. The outputs of the network are Q-values, one for each possible action. This is a regression task, since Q-values can be any real values. The loss function of this network is a simple squared error loss.
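As a rough check on the architecture described above, the sketch below traces how an 84×84 input shrinks through three convolutional layers with no pooling and no padding. The filter counts, kernel sizes and strides are assumptions taken from DeepMind's published DQN configuration, since the text above does not state them, and all names are hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Spatial size after a convolution with no padding."""
    return (input_size - kernel_size) // stride + 1


# (filters, kernel size, stride) per layer.
# ASSUMPTION: values from DeepMind's published DQN setup, not from this text.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def feature_map_shapes(input_size=84, layers=CONV_LAYERS):
    """Return (channels, height, width) after each convolutional layer."""
    shapes = []
    size = input_size
    for filters, kernel_size, stride in layers:
        size = conv_output_size(size, kernel_size, stride)
        shapes.append((filters, size, size))
    return shapes


def squared_error(target, predicted_q):
    """Squared error between the target and the predicted Q-value."""
    return (target - predicted_q) ** 2


shapes = feature_map_shapes()
channels, height, width = shapes[-1]
flattened_size = channels * height * width  # input size of the first dense layer
print(shapes)          # [(32, 20, 20), (64, 9, 9), (64, 7, 7)]
print(flattened_size)  # 3136
```

Under these assumed layer settings, the first fully connected layer would see 3136 features per stacked-frame state.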

Training the network:

[Figure: training the network]

Experience Replay:

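During gameplay, all the experiences <s, a, r, s’> are stored in a replay memory. When training the network, we use random minibatches sampled from the replay memory instead of the most recent transition. This breaks the similarity between consecutive training samples and makes the training task similar to usual supervised learning.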
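A replay memory like the one described above can be sketched with a fixed-size buffer and uniform random sampling. This is only an illustration using the standard library; the class name, the `capacity` parameter and the method names are my own assumptions, not the project's actual code.

```python
import random
from collections import deque, namedtuple

# One stored experience <s, a, r, s'>.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayMemory:
    """Fixed-size store of past transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch, drawn without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a training loop, one would call `store(...)` after every environment step and `sample(batch_size)` once the memory holds at least `batch_size` transitions.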

[Figure: experience replay]

Exploitation vs Exploration:

[Figure: exploitation vs exploration]
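A common way to balance exploitation and exploration in Q-learning is an epsilon-greedy policy: with probability epsilon pick a random action, otherwise pick the action with the highest Q-value. The sketch below assumes that scheme, which this README does not confirm in its text. The function names and the schedule values (1.0 down to 0.1 over one million steps) are also assumptions.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda action: q_values[action])


def linear_epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from start to end, then hold it at end.

    ASSUMPTION: the default values are illustrative, not from the project.
    """
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)
```

Early in training epsilon is high, so the agent mostly explores; as epsilon decays, it increasingly exploits what the network has learned.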

Results:

https://www.youtube.com/watch?v=0KRVL-VkMGw

https://www.youtube.com/watch?v=0-ATaiFjzi8

https://www.youtube.com/watch?v=48HNdmfGEjE
