This project forked from adepierre/caffe_alphazero

Implementation of DeepMind's AlphaZero algorithm with Caffe and C++

License: GNU Affero General Public License v3.0

Caffe_AlphaZero: A Caffe/C++ implementation of DeepMind's AlphaZero algorithm, the algorithm used to learn Go, Chess and Shogi

This is my implementation of the AlphaZero algorithm with Caffe. I tried to follow the paper as closely as possible; the points where my code departs from it are listed in the section Differences with the original paper. The core algorithm is templated and separated from the specific game rules (AlphaZero folder), so this code could theoretically be trained on any game (like Go or Chess). However, as I am far from having enough computing power, only two simple games are implemented as examples: Tic Tac Toe and Connect Four, both with variable-size boards.

Dependencies

This code relies on the main branch of Caffe.

Boost is also used, but you should already have it installed if you have Caffe.

I have only tested this code on Windows with Visual Studio 2013 and 2015, but it should also run on Linux and macOS, provided that Caffe is correctly installed.

Building

The project uses CMake as its build tool. Once you have configured and compiled the code, you should be able to launch the program for both training and testing.

Testing

There are five testing modes available:

  • Human test: N games between the algorithm and the user
  • Self test: N games in which the algorithm plays both players
  • Compare test: N games played between two models to determine which one is better
  • MCTS test: N games played against an MCTS algorithm that uses random playouts and a uniform policy instead of neural network evaluation
  • Random test: The opponent is a random player
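
To make the MCTS test opponent more concrete, here is a minimal sketch of the two pieces that stand in for the network: a uniform prior over legal moves and a random playout used as the value estimate. The toy game `NimState` and the function names `UniformPrior` and `RandomPlayout` are assumptions made for illustration; they are not taken from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy game, used only to make the sketch runnable:
// players alternately remove 1 or 2 stones; whoever takes the last stone wins.
struct NimState
{
    int stones;
    int player; // 0 or 1, the player to move

    bool IsTerminal() const { return stones == 0; }

    std::vector<int> LegalMoves() const
    {
        std::vector<int> moves{1};
        if (stones >= 2) moves.push_back(2);
        return moves;
    }

    void Play(int move)
    {
        stones -= move;
        player = 1 - player;
    }

    // Valid only on terminal states: the player who just moved took the last stone.
    int Winner() const { return 1 - player; }
};

// Uniform prior over the legal moves, replacing the network's policy output.
std::vector<double> UniformPrior(std::size_t num_moves)
{
    assert(num_moves > 0);
    return std::vector<double>(num_moves, 1.0 / static_cast<double>(num_moves));
}

// Random playout, replacing the network's value output.
// Returns +1 if `root_player` wins the rollout and -1 otherwise.
template <class State, class Rng>
int RandomPlayout(State state, int root_player, Rng& rng)
{
    while (!state.IsTerminal())
    {
        const auto moves = state.LegalMoves();
        std::uniform_int_distribution<std::size_t> pick(0, moves.size() - 1);
        state.Play(moves[pick(rng)]);
    }
    return state.Winner() == root_player ? 1 : -1;
}
```

A tree search built on these two functions needs no trained model, which is what makes it a useful fixed baseline to test a network against.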

An example command line for running tests can be found in launch_files/test_tic_tac.bat, along with a trained model for Tic Tac Toe on a 6x6 board. This model was trained without batch normalization layers because I found that they slow down training and do not improve performance much (at least for these games).

Training

An example of a command line used to train a network can be found in launch_files/train_tic_tac.bat. The training process can be stopped and restarted at any time using .solverstate files. However, the memory buffer is not saved and has to be filled again.
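
To illustrate why a restart needs a warm-up phase, a memory buffer of this kind can be pictured as a fixed-capacity container that overwrites its oldest samples. The sketch below is an assumption about the general technique, not the class used in this repository; the name `ReplayBuffer` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity buffer of self-play samples.
// Once full, each new sample overwrites the oldest one.
template <class Sample>
class ReplayBuffer
{
public:
    explicit ReplayBuffer(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
        samples_.reserve(capacity);
    }

    void Add(const Sample& sample)
    {
        if (samples_.size() < capacity_)
            samples_.push_back(sample);   // still filling up
        else
            samples_[next_] = sample;     // overwrite the oldest slot
        next_ = (next_ + 1) % capacity_;
    }

    bool IsFull() const { return samples_.size() == capacity_; }
    std::size_t Size() const { return samples_.size(); }
    const Sample& operator[](std::size_t i) const { return samples_[i]; }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;        // slot that the next overwrite will use
    std::vector<Sample> samples_;
};
```

Because the contents live only in memory, a restarted process starts with an empty buffer and has to generate new self-play games before it holds a full set of samples again.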

Implementing a new game

To train the algorithm on a new game, you just have to create a YourGameState.h file describing the rules (following the AlphaZero/GenericState.h example) and a main.cpp file to launch the training. You should be able to follow the sections Training and Testing once your game is compiled.
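
As a rough idea of what a game description has to provide, here is a sketch of a state class for Tic Tac Toe on a fixed 3x3 board. The method names (`LegalMoves`, `Play`, `Winner`, `IsTerminal`, `CurrentPlayer`) are hypothetical and are not taken from AlphaZero/GenericState.h; check that file for the interface the templated code actually expects.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical interface: the method names are illustrative and are NOT
// taken from AlphaZero/GenericState.h.
class TicTacToeState
{
public:
    // Indices of the empty cells (0..8, row-major).
    std::vector<int> LegalMoves() const
    {
        std::vector<int> moves;
        for (int i = 0; i < 9; ++i)
            if (board_[i] == 0) moves.push_back(i);
        return moves;
    }

    // Places the current player's mark and passes the turn.
    void Play(int cell)
    {
        assert(cell >= 0 && cell < 9 && board_[cell] == 0);
        board_[cell] = player_;
        player_ = -player_;
    }

    // Returns +1 or -1 if that player has three in a row, 0 otherwise.
    int Winner() const
    {
        static const int lines[8][3] = {
            {0, 1, 2}, {3, 4, 5}, {6, 7, 8},  // rows
            {0, 3, 6}, {1, 4, 7}, {2, 5, 8},  // columns
            {0, 4, 8}, {2, 4, 6}};            // diagonals
        for (const auto& l : lines)
        {
            const int sum = board_[l[0]] + board_[l[1]] + board_[l[2]];
            if (sum == 3) return 1;
            if (sum == -3) return -1;
        }
        return 0;
    }

    bool IsTerminal() const { return Winner() != 0 || LegalMoves().empty(); }
    int CurrentPlayer() const { return player_; }

private:
    std::array<int, 9> board_{};  // 0 = empty, +1 / -1 = the two players
    int player_ = 1;              // +1 moves first
};
```

The point of the sketch is the separation of concerns: the search only needs legal moves, a transition, and a terminal test, so the rules stay in one header per game.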

Differences with the original paper

This implementation differs from DeepMind's description in (at least) these points:

  • As the implemented games are much simpler than Go or Chess, the network's architecture is lighter, having only one residual block instead of 40 in the paper and 64 output channels instead of 256 for the convolution layers
  • ReLU layers are replaced with leaky ReLU
  • MCTS is not parallelized
  • During testing the first action is selected randomly to ensure a bit of diversity in the games
  • During self-play the temperature is always set to 1 (in the paper it is set to 1 only for the first 30 moves and then set to 0)
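
The temperature point can be sketched as follows, assuming the usual rule that a move is drawn with probability proportional to its visit count raised to the power 1/T. The function names `VisitDistribution` and `SampleMove` are hypothetical, and treating T = 0 as a plain argmax is a choice made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Turns MCTS visit counts into a move distribution pi(a) ~ N(a)^(1/T).
// T == 0 is treated as the greedy limit: all mass on the most visited move.
std::vector<double> VisitDistribution(const std::vector<int>& visits, double temperature)
{
    assert(!visits.empty() && temperature >= 0.0);
    std::vector<double> probs(visits.size(), 0.0);
    if (temperature == 0.0)
    {
        const auto best = std::max_element(visits.begin(), visits.end());
        probs[static_cast<std::size_t>(best - visits.begin())] = 1.0;
        return probs;
    }
    double total = 0.0;
    for (std::size_t i = 0; i < visits.size(); ++i)
    {
        probs[i] = std::pow(static_cast<double>(visits[i]), 1.0 / temperature);
        total += probs[i];
    }
    assert(total > 0.0);  // at least one move must have been visited
    for (double& p : probs) p /= total;
    return probs;
}

// Draws a move index according to the given distribution.
template <class Rng>
std::size_t SampleMove(const std::vector<double>& probs, Rng& rng)
{
    std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
    return dist(rng);
}
```

With T = 1 throughout, as in this implementation, moves keep being sampled in proportion to their visit counts for the whole game, which keeps self-play games more varied than the paper's schedule does.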

TODO

  • Add an Elo-like rating system to monitor training improvement
  • Separate self-play and training into two threads running asynchronously (for example two separate networks with periodic updates)
  • Add the possibility to use symmetry
  • Parallelize MCTS and perform evaluation in batch
  • Add a time limit instead of an iteration limit when using MCTS for testing
  • Save the memory buffer when the training process is stopped

License

GNU AGPLv3
