License: MIT License

Pole Balancer

About Pole Balancer

Pole being balanced

Pole Balancer is a Python program that uses reinforcement learning (RL) to automatically design a policy for the classic control problem of balancing a pole on a cart. By working within the Markov decision process (MDP) framework, it can learn without any explicit knowledge of the physics of the underlying system, in this case the pole on the cart.

Requirements

  • Ubuntu 18.04+, macOS 10.15+, or Windows 10+ (64-bit)
  • At least 5 GB of memory
  • Anaconda/Miniconda
  • Python 3.6 or above
  • A Python IDE (Jupyter/PyCharm)

Getting Started

Install the following Python packages:

  • matplotlib
  • numpy
  • scipy
  • pillow
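
Before running the program, it can help to confirm that these packages are importable. The snippet below is a minimal sketch that is not part of the project; the helper name missing_packages is hypothetical. Note that the pillow package is imported under the module name PIL.

```python
from importlib.util import find_spec

# Package name -> module name used in an import statement.
# pillow is the one package here whose import name differs (PIL).
REQUIRED_MODULES = {
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "scipy": "scipy",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the names of packages whose module cannot be found (hypothetical helper)."""
    return [package for package, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```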

Clone

git clone https://github.com/avrumnoor/PoleBalancer.git

Run

python polebalancer.py

Model

A thin pole is hinged to a cart. The cart moves laterally on a smooth table surface. An attempt fails if the pole’s angle deviates from the vertical by more than a set amount (i.e., the pole falls over) or if the cart’s position goes out of bounds (i.e., it falls off the end of the table).
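
The failure condition can be sketched as a simple check on two quantities. The function name and both limits below are assumptions chosen for illustration; the thresholds actually used by polebalancer.py are not stated here.

```python
import math

# Hypothetical limits, chosen only for illustration.
MAX_POLE_ANGLE = math.radians(12.0)  # largest allowed tilt from vertical, in radians
MAX_CART_POSITION = 2.4              # largest allowed distance from the table centre


def has_failed(cart_position, pole_angle,
               max_position=MAX_CART_POSITION, max_angle=MAX_POLE_ANGLE):
    """Return True if the pole has fallen over or the cart has left the table."""
    pole_fell = abs(pole_angle) > max_angle
    cart_out_of_bounds = abs(cart_position) > max_position
    return pole_fell or cart_out_of_bounds
```

For example, has_failed(0.0, 0.0) is False for a centred cart with an upright pole, while has_failed(3.0, 0.0) is True because the cart is beyond the assumed position limit.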

Program Objective

Keep the pole balanced within these constraints by accelerating the cart left or right as appropriate.

Algorithm

  • Estimate a model (i.e., transition probabilities and rewards) for the underlying MDP.
  • Solve Bellman’s equations for this estimated model to obtain a value function.
  • Act greedily with respect to this value function.
  • Initially, each state has an estimated reward of zero, and the estimated transition probabilities are uniform.
  • As the program takes actions, it gathers observations of transitions and rewards, which it uses to improve its estimate of the MDP model.
  • Store the state transitions and reward observations at every step, but update the model and the value function/policy only periodically.
  • Each time a failure occurs, re-estimate the transition probabilities and rewards as the averages of the observed values (if any).
  • Repeat the previous steps until convergence. Learning is considered converged once several consecutive attempts to solve Bellman’s equations (the number is set by the parameter NO_LEARNING_THRESHOLD) each converge in the first iteration, since this implies that the estimated model has stopped changing significantly.
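
The steps above can be sketched with NumPy as follows. This is an illustrative outline under stated assumptions, not the code in polebalancer.py: the state-space size, discount factor, tolerance, function names, and the choice to credit each reward to the state that was reached are all hypothetical.

```python
import numpy as np

NUM_STATES = 4     # hypothetical size of a discretised state space
NUM_ACTIONS = 2    # 0 = accelerate left, 1 = accelerate right
GAMMA = 0.995      # hypothetical discount factor
TOLERANCE = 0.01   # hypothetical convergence tolerance


def init_model(num_states=NUM_STATES, num_actions=NUM_ACTIONS):
    """Start with zero rewards, uniform transitions, and empty observation counts."""
    shape = (num_states, num_actions, num_states)
    return {
        "trans_counts": np.zeros(shape),
        "trans_probs": np.full(shape, 1.0 / num_states),
        "reward_sums": np.zeros(num_states),
        "reward_counts": np.zeros(num_states),
        "rewards": np.zeros(num_states),
        "value": np.zeros(num_states),
    }


def record(model, state, action, next_state, reward):
    """Store one observation; the reward is credited to the state reached (assumption)."""
    model["trans_counts"][state, action, next_state] += 1
    model["reward_sums"][next_state] += reward
    model["reward_counts"][next_state] += 1


def update_model(model):
    """Re-estimate the model as averages of observations, where any exist."""
    counts = model["trans_counts"]
    totals = counts.sum(axis=2, keepdims=True)
    seen = totals[..., 0] > 0  # (state, action) pairs tried at least once
    model["trans_probs"][seen] = (counts / np.maximum(totals, 1))[seen]
    visited = model["reward_counts"] > 0
    model["rewards"][visited] = (model["reward_sums"][visited]
                                 / model["reward_counts"][visited])


def value_iteration(model, gamma=GAMMA, tolerance=TOLERANCE):
    """Solve Bellman's equations iteratively; return the number of sweeps used."""
    sweeps = 0
    while True:
        sweeps += 1
        expected = model["trans_probs"] @ model["value"]  # shape (states, actions)
        new_value = model["rewards"] + gamma * expected.max(axis=1)
        delta = np.max(np.abs(new_value - model["value"]))
        model["value"] = new_value
        if delta < tolerance:
            return sweeps


def greedy_action(model, state):
    """Pick the action with the highest expected value of the next state."""
    return int(np.argmax(model["trans_probs"][state] @ model["value"]))
```

In a training loop built on this sketch, record would be called at every step, while update_model and value_iteration would be called after each failure. A counter of consecutive calls for which value_iteration returns 1 could then be compared against NO_LEARNING_THRESHOLD to decide when to stop.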

Results

Graph of the results

Author

Avrum Noor

Acknowledgements

Anand Avati

Stanford Machine Learning Coursework
