oneupwallstreet / td-gammon

Implementation of TD Gammon algorithm by Gerald Tesauro at IBM's Thomas J. Watson Research Center in Python.

TD-Gammon

This is an implementation of the original TD-Gammon algorithm by Gerald Tesauro at IBM's Thomas J. Watson Research Center. The agent learns through self-play reinforcement learning with a nonlinear function approximator, i.e. a neural network. I trained the program over 30,000 games using a single hidden layer of 80 units. The original program used 2-ply search to select its moves, but because this program currently runs only on a CPU, searching deeper than 1 ply is far too expensive: selecting a move takes a long time, even with alpha-beta pruning.
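
A minimal sketch of 1-ply move selection as described above: each legal move is applied, the resulting position is scored by the value function, and the best-scoring move is played. The names `select_move`, `apply_move`, and `value_fn` are hypothetical and are not taken from this repository; the sketch assumes `value_fn` returns the estimated chance of winning for the player who just moved.

```python
def select_move(state, legal_moves, apply_move, value_fn):
    """Return the legal move whose resulting position scores highest (1-ply lookahead)."""
    best_move, best_value = None, float("-inf")
    for move in legal_moves:
        # Score the position reached by playing this move.
        value = value_fn(apply_move(state, move))
        if value > best_value:
            best_move, best_value = move, value
    if best_move is None:
        raise ValueError("no legal moves")
    return best_move


# Toy usage (not backgammon): states are integers, a move adds to the state,
# and the stand-in value function prefers states close to 10.
toy_value = lambda s: -abs(10 - s)
chosen = select_move(7, [1, 2, 3, 4], lambda s, m: s + m, toy_value)
```

A deeper search would repeat this evaluation for every dice roll and reply, which is why the cost grows so quickly beyond 1 ply.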

TD-Gammon Learning Methodology
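
TD-Gammon learns with TD(λ): after each move the weights are nudged so that the value of the current position moves toward the value of the next one, with eligibility traces spreading credit back over earlier positions. The following is a minimal sketch under stated assumptions, not this repository's code: it uses pure Python rather than PyTorch, omits bias terms, and applies no discounting. The class `TinyValueNet`, its method names, the learning rate, the λ value, and the toy three-position episode are all illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyValueNet:
    """One hidden layer of sigmoid units and a sigmoid output in (0, 1)."""

    def __init__(self, n_inputs, n_hidden=80, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.reset_traces()

    def reset_traces(self):
        """Clear the eligibility traces; call this at the start of every game."""
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        value = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)))
        return value, hidden

    def td_step(self, x, target, alpha=0.1, lam=0.7):
        """Apply one TD(lambda) update for position x and return the TD error."""
        value, hidden = self.forward(x)
        dv = value * (1.0 - value)  # derivative of the output sigmoid
        # Traces: e <- lam * e + gradient of the value w.r.t. each weight.
        for j, h in enumerate(hidden):
            dh = dv * self.w2[j] * h * (1.0 - h)
            self.e2[j] = lam * self.e2[j] + dv * h
            for i, xi in enumerate(x):
                self.e1[j][i] = lam * self.e1[j][i] + dh * xi
        # Weights: w <- w + alpha * (target - value) * e.
        delta = target - value
        for j in range(len(self.w2)):
            self.w2[j] += alpha * delta * self.e2[j]
            for i in range(len(self.w1[j])):
                self.w1[j][i] += alpha * delta * self.e1[j][i]
        return delta


# Toy episode: three fixed positions, and the game ends in a win (reward 1).
net = TinyValueNet(n_inputs=3, n_hidden=4)
episode = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
before = net.forward(episode[0])[0]
for _ in range(200):
    net.reset_traces()
    for t, x in enumerate(episode):
        is_last = t == len(episode) - 1
        # The target is the final reward at the end, else the next position's value.
        target = 1.0 if is_last else net.forward(episode[t + 1])[0]
        net.td_step(x, target)
after = net.forward(episode[0])[0]
```

Because every toy episode ends in a win, the estimated value of the opening position should rise over training. In self-play the same update would be driven by positions generated by the agent's own moves.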

Technologies

Project is created with:

  • Python 3
  • PyTorch
  • NumPy

Training Files

Files Required to train agent:

  • env.py
  • Agent.py
  • model.py

Testing Files

Files to test agent:

  • test_agent.py
  • test.py
  • Play.py (Used to play against human)

Weights File:

Weights to load in neural network:

  • model.pth (trained over 30,000 games of self-play)
  • model_weak.pth (trained over 3000 games of self-play)

Setup

Clone this repository, store all the files in the same folder, and run the following commands:

Start the training (enter the number of iterations, i.e. games of self-play RL):

$ python model.py

To check whether the agent is learning, run this command, which pits the agent against a player that always selects the first of the possible actions.

$ python test.py
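
The idea behind this check can be sketched as a win-rate measurement against a first-action baseline. This is an illustration only; test.py in this repository may be structured differently. The names `first_action_policy`, `win_rate`, and `play_game`, and the toy game used to exercise them, are hypothetical.

```python
def first_action_policy(state, legal_actions):
    """Baseline opponent: always take the first action in the list."""
    return legal_actions[0]


def win_rate(play_game, agent_policy, opponent_policy, n_games=100):
    """Fraction of games won by the agent; play_game returns True on an agent win."""
    wins = sum(bool(play_game(agent_policy, opponent_policy)) for _ in range(n_games))
    return wins / n_games


# Toy stand-in for a game: both sides pick a number and the higher pick wins.
def toy_game(agent_policy, opponent_policy):
    actions = [1, 2, 3]
    return agent_policy(None, actions) > opponent_policy(None, actions)


greedy = lambda state, actions: max(actions)
rate = win_rate(toy_game, greedy, first_action_policy, n_games=20)
```

A trained agent should win clearly more than half of its games against such a baseline; a rate near 50% would suggest it has not learned much.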

To test against a slightly stronger opponent, pit the agent against one trained on around 3000 games of self-play.

$ python test_agent.py

  • Note - To run smoothly, move all the files into a single folder, i.e. remove the Train, Test, and Play folders.

Images

  • Play.py outputs a graph of the predicted win rate at each ply.

WinRate

  • Current interface in Play.py

Output

To-Do

  • Implement a GUI; currently the interface is just a linear representation of the board with numbers between 0 and 27.
  • Enable GPU support, as training currently takes a long time.
