C2TutorialsGo

This is a tutorial written for Caffe2 which mimics Google's AlphaGo Fan and AlphaGo Zero. v0.2.0 is released, with a ResNet-based AlphaGo Zero model.

Installation

This program currently relies on the RocAlphaGo Cython implementation for feature preprocessing and Go rules. Compile the Cython modules by running `python setup.py build_ext --inplace`.

New updates from AlphaGo Zero

Preprocess

Go game datasets are usually stored in SGF file format, which we need to transform into Caffe2 tensors. AlphaGo Zero requires 17 feature planes of size 19x19, with no 'human knowledge' features such as liberties or escapes.

This preprocessing program still relies on RocAlphaGo for Go rules, but no longer depends on it for feature generation. I'm looking for a better (more accurate) Go rule implementation that supports Chinese/Korean/Japanese rules and different komi values; please feel free to recommend one.
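To make the encoding concrete, here is a minimal sketch of building the 17 input planes (illustrative only, not this repo's actual preprocessing code; the `history`/`to_play` interface is my own assumption):

```python
import numpy as np

def zero_features(history, to_play):
    """Build AlphaGo Zero's 17x19x19 input tensor.

    history: up to 8 past boards (19x19 int arrays, 1 = black, -1 = white),
             most recent first.
    to_play: 1 if black moves next, -1 if white.
    """
    planes = np.zeros((17, 19, 19), dtype=np.float32)
    for t, board in enumerate(history[:8]):
        planes[t] = (board == to_play)         # planes 0-7: own stones
        planes[8 + t] = (board == -to_play)    # planes 8-15: opponent stones
    planes[16] = 1.0 if to_play == 1 else 0.0  # plane 16: colour to play
    return planes
```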

Dual Policy and Value network with ResNet

The Supervised Learning program is used to verify that the network architecture is correct. Due to a bug in the Caffe2 spatial_BN op, the program cannot resume from a previous run, and since each epoch requires 200-250 GPU hours, it is not viable to run on a personal computer.
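For reference, the residual tower and the dual heads can be sketched with Caffe2's brew helpers roughly as follows (a hedged sketch with illustrative blob names, not this repo's exact model code; 362 = 361 board points + pass):

```python
from caffe2.python import brew

def res_block(model, blob_in, dim, prefix):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN -> skip connection -> ReLU."""
    c1 = brew.conv(model, blob_in, prefix + '_conv1', dim, dim, kernel=3, pad=1)
    b1 = brew.spatial_bn(model, c1, prefix + '_bn1', dim, is_test=False)
    r1 = brew.relu(model, b1, b1)
    c2 = brew.conv(model, r1, prefix + '_conv2', dim, dim, kernel=3, pad=1)
    b2 = brew.spatial_bn(model, c2, prefix + '_bn2', dim, is_test=False)
    s = model.net.Sum([b2, blob_in], prefix + '_sum')
    return brew.relu(model, s, s)

def dual_heads(model, tower, dim):
    """Policy head: 1x1 conv to 2 planes -> FC to 362 moves.
    Value head: 1x1 conv to 1 plane -> FC 256 -> FC 1 -> Tanh."""
    p = brew.conv(model, tower, 'policy_conv', dim, 2, kernel=1)
    p = brew.relu(model, p, p)
    policy = brew.fc(model, p, 'policy_fc', 2 * 19 * 19, 362)
    v = brew.conv(model, tower, 'value_conv', dim, 1, kernel=1)
    v = brew.relu(model, v, v)
    v = brew.fc(model, v, 'value_fc1', 19 * 19, 256)
    v = brew.relu(model, v, v)
    value = model.net.Tanh(brew.fc(model, v, 'value_fc2', 256, 1), 'value')
    return policy, value
```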

| epochs | LR | loss | train / test accuracy | epochs | LR | loss | train / test accuracy |
|--------|-----|------|------------------------|--------|-----|------|------------------------|
| 0.2 | 0.1 | - | - / 0.1698 | 11 | | | / |
| 0.4 | | | / | 12 | | | / |
| 0.6 | | | / | 13 | | | / |
| 0.8 | | | / | 14 | | | / |
| 1 | | | / | 15 | | | / |
| 6 | | | / | 16 | | | / |
| 7 | | | / | 17 | | | / |
| 8 | | | / | 18 | | | / |
| 9 | | | / | 19 | | | / |
| 10 | | | / | * | | | 0.60 / 0.57 (AlphaGo Zero) |

Reinforcement Learning pipeline

Ongoing. This will differ from AlphaGo Fan in many ways:
1. Always use the best primary player to generate data.
2. Before each move, perform a wide search to obtain a better move distribution than the raw policy prediction.
3. MCTS relies only on the Policy and Value networks; there is no more Rollout (see the PUCT sketch after this list).
4. More details will be added during implementation.
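The selection step of such a rollout-free MCTS is typically the PUCT rule over network priors and backed-up values. A minimal sketch (illustrative names and constants, not this repo's implementation):

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s,a) from the policy network
        self.visits = 0          # N(s,a)
        self.value_sum = 0.0     # W(s,a), backed up from the value network
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT: argmax_a  Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total_visits = sum(child.visits for child in node.children.values())
    return max(node.children.items(),
               key=lambda item: item[1].q() + c_puct * item[1].prior
                   * math.sqrt(total_visits) / (1 + item[1].visits))
```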

About AlphaGo Fan

Preprocess

Go game datasets are usually stored in SGF file format. We need to transform SGF files into Caffe2 tensors of 48 feature planes of size 19x19, following DeepMind's paper.

The preprocessing program relies on the Cython implementation of the RocAlphaGo project for Go rules and feature plane generation. Preprocessing the complete KGS dataset is estimated to take 60 CPU hours.
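For a sense of what the SGF step involves, here is a tiny self-contained sketch that extracts moves from an SGF record (illustrative only; this repo delegates parsing and rules to RocAlphaGo):

```python
import re

def sgf_moves(sgf_text):
    """Yield (colour, row, col) for each move in an SGF record.

    SGF encodes board points as two letters 'a'-'s': in ';B[pd]',
    the first letter is the column and the second the row; 'B[]' is a pass.
    """
    for colour, coord in re.findall(r';([BW])\[([a-s]{0,2})\]', sgf_text):
        if coord:  # empty brackets mean a pass
            col, row = ord(coord[0]) - ord('a'), ord(coord[1]) - ord('a')
            yield colour, row, col

# Example: the first moves of a game
for move in sgf_moves("(;GM[1]SZ[19];B[pd];W[dp];B[])"):
    print(move)
```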

Supervised Learning - Policy Network

According to DeepMind, AlphaGo reached 55.4% test accuracy after 20 epochs of training; the test set was the first 1 million moves, i.e. KGS2004. Each prediction took 4.8 ms (on a Kepler K40 GPU).
This program has achieved 52.83% after 11 epochs so far; the test set is the latest 1M moves, i.e. KGS201705-KGS201709. It also achieves around 4.5 ms per prediction (on a Maxwell GTX980m GPU), so each epoch takes ~40 GPU hours. Running in GPU mode is around 100x faster than CPU mode.
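The learning-rate decay in the table below was applied manually between epochs; in Caffe2 the training objective can be wired up roughly like this (a hedged sketch following the standard ModelHelper/brew/optimizer pattern, not this repo's exact training script; `logits` is assumed to come from the policy network):

```python
from caffe2.python import brew, model_helper, optimizer

model = model_helper.ModelHelper(name='policy_train')
# 'logits' (N, 361) is assumed to be produced by the policy network;
# integer move labels are fed into the workspace as 'label' (N,).
softmax, loss = model.net.SoftmaxWithLoss(['logits', 'label'],
                                          ['softmax', 'loss'])
brew.accuracy(model, [softmax, 'label'], 'accuracy')  # the accuracy column below
model.AddGradientOperators([loss])
optimizer.build_sgd(model, base_learning_rate=0.003)  # decayed by hand per epoch
```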

| epochs | LR | loss | train / test accuracy | epochs | LR | loss | train / test accuracy |
|--------|--------|--------|------------------------|--------|--------|--------|------------------------|
| 1 | 0.003 | 1.895 | 0.4800 / 0.4724 | 11 | 0.0002 | 1.5680 | 0.5416 / 0.5283 |
| 2 | 0.003 | 1.7782 | 0.5024 / 0.4912 | 12 | 0.0001 | 1.5639 | 0.5424 / 0.5291 |
| 3 | 0.002 | 1.7110 | 0.5157 / 0.5029 | 13 | | | / |
| 4 | 0.002 | 1.6803 | 0.5217 / 0.5079 | 14 | | | / |
| 5 | 0.002 | 1.6567 | - / 0.5119 | 15 | | | / |
| 6 | 0.002 | 1.6376 | 0.5302 / 0.5146 | 16 | | | / |
| 7 | 0.001 | 1.6022 | 0.5377 / 0.5202 | 17 | | | / |
| 8 | 0.0005 | 1.5782 | - / 0.5273 | 18 | | | / |
| 9 | 0.0005 | 1.6039 | 0.5450 / 0.5261 | 19 | | | / |
| 10 | 0.0002 | 1.5697 | 0.5447 / 0.5281 | 20 | | | 0.569 / 0.554 (AlphaGo) |

The training accuracy records of epochs 5 and 8 were lost.
An Intel Broadwell CPU provides around 30 GFLOPS of compute per core; Nvidia Kepler K40 and Maxwell GTX980m GPUs provide around 3 TFLOPS.

Reinforcement Learning - Policy Network

The RL program is runnable now but still under evaluation. For now it also relies on the RocAlphaGo project for Go rules. A new program is under construction to implement the first 12 features in GPU mode to replace RocAlphaGo; it is expected to be at least 10x faster than RocAlphaGo's Python implementation.
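The core of the RL step is the REINFORCE update: scale each move's log-likelihood gradient by the game outcome. A minimal numpy sketch over a linear softmax policy (purely illustrative; the real program trains the convolutional policy network):

```python
import numpy as np

def policy_gradient_update(W, features, moves, z, lr=0.001):
    """One REINFORCE update over a finished game.

    W:        (num_moves, num_features) weights of a linear softmax policy.
    features: per-step feature vectors x_t, each of shape (num_features,).
    moves:    per-step indices a_t of the moves actually played.
    z:        game outcome from the mover's view (+1 win, -1 loss).
    """
    for x, a in zip(features, moves):
        logits = W @ x
        p = np.exp(logits - logits.max())
        p /= p.sum()
        grad = -np.outer(p, x)   # d log p(a|x) / dW, softmax term
        grad[a] += x             # plus the chosen-move term
        W += lr * z * grad       # reinforce moves from winning games
    return W
```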

Supervised Learning - Value Network

TBD. Depends on Reinforcement Learning to generate 30 million games, from which one position is sampled per game.

Supervised Learning - Fast Rollout

TBD. AlphaGo achieved 24.2% accuracy at 2 µs per prediction.

MCTS

TBD. Depends on Fast Rollout.
