
adibyte95 / cartpole-openai-gym


Solutions to the CartPole problem from OpenAI Gym using different approaches

License: MIT License

Jupyter Notebook 15.54% Python 84.46%
openai-gym cartpole-v0 keras dnn dqn reinforcement-learning

cartpole-openai-gym's Introduction


A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
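The episode-ending rule above can be written as a small check. This is only a sketch: the function name `episode_over` and the constants are illustrative, the thresholds are copied from the description above, and the environment's actual implementation may use slightly different values.

```python
import math

# Thresholds taken from the description above (assumed, not read from Gym).
ANGLE_LIMIT_DEG = 15.0
POSITION_LIMIT = 2.4


def episode_over(cart_position, pole_angle_rad):
    """Return True when the pole has tipped too far or the cart has left the track."""
    too_far = abs(cart_position) > POSITION_LIMIT
    tipped = abs(math.degrees(pole_angle_rad)) > ANGLE_LIMIT_DEG
    return too_far or tipped


# A cart near the center with a nearly upright pole keeps the episode alive.
print(episode_over(0.1, math.radians(3.0)))
```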

Approaches

1. random movements

In this approach we choose a random action (left or right) regardless of the current state of the environment. Needless to say, this approach performs very poorly because it does not take the present state into account.
Because of its random nature, this approach is quite unpredictable. Over 10 trial runs, the maximum survival time was 118 timesteps and the average survival time was about 21 timesteps, which is pretty bad.
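A random policy and the 10-run measurement could look roughly like the sketch below. It assumes the classic Gym interface (`reset()` returns a state, `step(action)` returns `(state, reward, done, info)`). `StubEnv` is a hypothetical stand-in so the example runs without Gym installed; it does not simulate the pole, so the printed numbers say nothing about real CartPole performance.

```python
import random


def random_policy(state):
    """Ignore the state and pick left (0) or right (1) uniformly at random."""
    return random.choice([0, 1])


def run_episode(env, policy, max_steps=200):
    """Run one episode and return the number of timesteps survived."""
    state = env.reset()
    steps = 0
    for _ in range(max_steps):
        state, reward, done, info = env.step(policy(state))
        steps += 1
        if done:
            break
    return steps


class StubEnv:
    """Hypothetical placeholder environment: ends after a fixed number of steps."""

    def __init__(self, length=20):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        return [0.0, 0.0, 0.0, 0.0], 1.0, self.t >= self.length, {}


# Ten trial runs, reporting the maximum and average survival time.
scores = [run_episode(StubEnv(), random_policy) for _ in range(10)]
print(max(scores), sum(scores) / len(scores))
```

With Gym installed, `StubEnv()` would be replaced by the real CartPole environment, whose API may differ between Gym versions.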

random rendering gif

2. using weight vector

In this approach we take a random weight vector of size 4, which equals the dimension of the environment's state. We take the dot product of the weight vector and the state, and choose an action (left or right) depending on the value of the result. This method outperforms the previous one even though it does not use any machine learning algorithm. The results of this approach are very impressive: with a sufficient number of games played, it can last for more than 1000 timesteps.
Over 10 trial runs of this algorithm, the maximum score achieved was 762 and the average score was about 315.
Note that these numbers can change from one trial run to the next, and even better results are possible with appropriate parameter tuning.
bruteforce rendering gif
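The weight-vector idea can be sketched as follows. Several details are assumptions rather than the repository's code: the sign convention (non-negative dot product means push right), the sampling range of the weights, and the plain random search that keeps the best-scoring vector. `evaluate` is a caller-supplied function that would normally play one or more CartPole episodes with the given weights and return the score.

```python
import random


def choose_action(weights, state):
    """Return 1 (push right) if the dot product is non-negative, else 0 (push left)."""
    score = sum(w * s for w, s in zip(weights, state))
    return 1 if score >= 0 else 0


def random_search(evaluate, n_candidates=100, dim=4, seed=0):
    """Sample random weight vectors and keep the one with the highest score."""
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_candidates):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        score = evaluate(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy scoring function for illustration only; a real run would score
# each weight vector by the timesteps survived in the environment.
best_weights, best_score = random_search(lambda w: -abs(w[2] - 0.5), n_candidates=50)
print(best_weights, best_score)
```

Playing more games corresponds to raising `n_candidates`: more sampled vectors means a better chance of finding one that balances the pole for a long time.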

3. using deep neural networks

In this approach we generate training data by taking random actions in the environment. If a run is successful, that is, the pole stays balanced on the cart for more than 100 timesteps, we add it to our training set. The idea is to learn how to balance the pole from good training examples. We then fit the model to this training data and use it to predict the outcome, that is, the action, for any new observation.
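The data-filtering step described above might look like this minimal sketch. It assumes each episode has been recorded as a list of `(state, action)` pairs; the function name `build_training_set` and that recording format are illustrative, not taken from the repository. Fitting the network itself is left out.

```python
def build_training_set(episodes, min_steps=100):
    """Keep (state, action) pairs only from episodes longer than min_steps."""
    states, actions = [], []
    for episode in episodes:
        if len(episode) > min_steps:
            for state, action in episode:
                states.append(state)
                actions.append(action)
    return states, actions


# Two synthetic episodes: one short (discarded) and one long (kept).
short_run = [([0.0, 0.0, 0.0, 0.0], 0)] * 30
long_run = [([0.0, 0.1, 0.0, -0.1], 1)] * 120
X, y = build_training_set([short_run, long_run])
print(len(X), len(y))
```

The resulting `X` (states) and `y` (actions) would then be passed to whatever classifier is being trained.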

neural network rendering gif

4. using deep Q networks

This approach uses a technique in which the model is rewarded if it takes the correct action given the observations of a state and penalized otherwise. Initially the model is not very good at guessing the output, but it gradually gets better at predicting it. Exploration and exploitation are carried out simultaneously: exploration to find new, improved solutions, and exploitation to find good solutions within the already explored search space.
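Two building blocks commonly used in deep Q-learning illustrate the ideas above: epsilon-greedy action selection (the exploration/exploitation trade-off) and the one-step Q-learning target that the network is trained towards. This is a generic sketch, not the repository's implementation; the function names and the discount factor `gamma=0.95` are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit (best action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, done, gamma=0.95):
    """One-step Q-learning target: reward plus discounted best future value."""
    if done:
        # No future value after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


# Predicted Q-values for the two actions (left, right) in some state.
q_values = [0.2, 0.7]
action = epsilon_greedy(q_values, epsilon=0.1)
target = q_target(reward=1.0, next_q_values=[0.5, 0.9], done=False)
print(action, target)
```

Training typically starts with a high `epsilon` and lowers it over time, so the agent explores early and exploits what it has learned later.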

Comparison of how the model performs in the beginning and after a few epochs

We can see that initially the model did not perform very well, but it eventually learns from its mistakes and performs very well (1199 is the upper time limit; after this the game is forcibly closed). An even higher average score can be achieved by training longer and increasing the time limit.

plot of score during various episodes

The pole stayed balanced on the cart for more than 2000 timesteps, outperforming all the approaches used above.

reinforcement rendering gif

references

Sentdex
Machine Learning with Phil
Medium blog

Links to other OpenAI Gym environments

mountain car
