Pole Balancer

About Pole Balancer

Pole Balancer is a Python program that uses reinforcement learning (RL) to automatically design a policy for the classic control problem of a cart balancing a pole. By framing the task as a Markov decision process (MDP), we can perform reinforcement learning without any explicit knowledge of the physics of the underlying system, in our case the pole on the cart.

Requirements

Ubuntu 18.04+, macOS 10.15+ or Windows 10+ (64-bit)
At least 5GB of memory
Anaconda/Miniconda
Python 3.6 or above
A Python IDE (Jupyter/PyCharm)

Getting Started

Install the following Python packages:

matplotlib
numpy
scipy
pillow

Clone

git clone https://github.com/avrumnoor/PoleBalancer.git

Run

python polebalancer.py

Model

A thin pole is hinged to a cart. The cart moves laterally on a smooth table surface. A trial fails if either the angle of the pole deviates from the vertical position by more than a set threshold (i.e., the pole falls over), or the cart’s position goes out of bounds (i.e., it falls off the end of the table).
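
The two failure conditions can be sketched as a single check. This is a minimal illustration only: the function name `has_failed` and both limits (12 degrees, 2.4 units) are assumptions made for the example, not values taken from polebalancer.py.

```python
import math

# Hypothetical limits chosen for illustration; the real program may use other values.
MAX_ANGLE_RAD = math.radians(12.0)  # assumed maximum pole deviation from vertical
MAX_POSITION = 2.4                  # assumed half-length of the table


def has_failed(x, theta):
    """Return True if the cart left the table or the pole tipped too far.

    x     -- cart position, measured from the centre of the table
    theta -- pole angle in radians, measured from the vertical
    """
    return abs(x) > MAX_POSITION or abs(theta) > MAX_ANGLE_RAD
```

For example, `has_failed(0.0, 0.0)` is false for a centred cart with an upright pole, while `has_failed(3.0, 0.0)` is true because the cart is past the assumed table edge.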

Program Objective

Keep the pole balanced within these constraints by accelerating the cart left or right as appropriate.

Algorithm

Estimate a model (i.e., transition probabilities and rewards) for the underlying MDP.
Obtain a value function by solving Bellman’s equations for this estimated model.
Act greedily with respect to this value function.
Initially, each state has estimated reward zero, and the estimated transition probabilities are uniform.
As the program goes along taking actions, it will gather observations on transitions and rewards, which it can use to get a better estimate of the MDP model.
Store the state transitions and reward observations each time, and update the model and value function/policy only periodically.
Each time a failure occurs, re-estimate the transition probabilities and rewards as the average of the observed values (if any).
Repeat the previous steps until convergence. The program stops once several consecutive attempts to solve Bellman’s equations (the number is set by the parameter NO LEARNING THRESHOLD) each converge in the first iteration, since this implies that the estimated model has stopped changing significantly.
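
The steps above can be sketched with NumPy as follows. This is an illustrative outline under stated assumptions, not the code in polebalancer.py: the tiny state space, the discount factor `GAMMA`, the tolerance `TOLERANCE`, the choice to attach rewards to the state that was reached, and all function names are hypothetical.

```python
import numpy as np

NUM_STATES = 4    # assumed size of a (tiny) discretized state space
NUM_ACTIONS = 2   # push left / push right
GAMMA = 0.995     # assumed discount factor
TOLERANCE = 0.01  # assumed convergence tolerance for value iteration


def init_model():
    """Zero estimated rewards and uniform transition probabilities."""
    shape = (NUM_STATES, NUM_ACTIONS, NUM_STATES)
    return {
        "counts": np.zeros(shape),                  # observed (s, a, s') transitions
        "probs": np.full(shape, 1.0 / NUM_STATES),  # estimated P(s' | s, a)
        "reward_sum": np.zeros(NUM_STATES),         # total reward seen on reaching s'
        "reward_n": np.zeros(NUM_STATES),           # number of times s' was reached
        "reward": np.zeros(NUM_STATES),             # estimated R(s)
        "value": np.zeros(NUM_STATES),              # estimated V(s)
    }


def record(model, s, a, s_next, r):
    """Store one observation without touching the model estimates."""
    model["counts"][s, a, s_next] += 1
    model["reward_sum"][s_next] += r
    model["reward_n"][s_next] += 1


def update_model(model):
    """Re-estimate probabilities and rewards as averages of the observations."""
    totals = model["counts"].sum(axis=2)
    seen = totals > 0  # (s, a) pairs never tried keep their old estimate
    model["probs"][seen] = model["counts"][seen] / totals[seen][:, None]
    visited = model["reward_n"] > 0
    model["reward"][visited] = (
        model["reward_sum"][visited] / model["reward_n"][visited]
    )


def value_iteration(model):
    """Solve Bellman's equations; return True if converged in one iteration."""
    iterations = 0
    while True:
        iterations += 1
        q = model["probs"] @ model["value"]  # expected next value, shape (S, A)
        new_value = model["reward"] + GAMMA * q.max(axis=1)
        delta = np.max(np.abs(new_value - model["value"]))
        model["value"] = new_value
        if delta < TOLERANCE:
            return iterations == 1


def greedy_action(model, s):
    """Pick the action with the highest expected next-state value."""
    return int(np.argmax(model["probs"][s] @ model["value"]))
```

In this sketch, a driver loop would call `record` on every step, call `update_model` and `value_iteration` after each failure, and count how many consecutive calls to `value_iteration` return `True`; once that count reaches the threshold, learning is considered finished.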

Results

Author

Avrum Noor

Acknowledgements

Anand Avati

Stanford Machine Learning Coursework
