GithubHelp home page GithubHelp logo

shyamalschandra / lczero-training Goto Github PK

View Code? Open in Web Editor NEW

This project forked from leelachesszero/lczero-training

0.0 2.0 0.0 285 KB

For code etc relating to the network training process.

Shell 5.37% Python 94.63%

lczero-training's Introduction

Training

The training pipeline resides in tf, this requires tensorflow running on linux (Ubuntu 16.04 in this case). (It can be made to work on windows too, but it takes more effort.)

Installation

Install the requirements under tf/requirements.txt. And call ./init.sh to compile the protobuf files.

Data preparation

In order to start a training session you first need to download trainingdata from http://lczero.org/training_data. This data is packed in tar.gz balls each containing 10'000 games or chunks as we call them. Preparing data requires the following steps:

tar -xzf games11160000.tar.gz
ls training.* | parallel gzip {}

This repacks each chunk into a gzipped file ready to be parsed by the training pipeline. Note that the parallel command uses all your cores and can be installed with apt-get install parallel.

Training pipeline

Now that the data is in the right format one can configure a training pipeline. This configuration is achieved through a yaml file, see training/tf/configs/example.yaml:

%YAML 1.2
---
name: 'kb1-64x6'                       # ideally no spaces
gpu: 0                                 # gpu id to process on

dataset: 
  num_chunks: 100000                   # newest nof chunks to parse
  train_ratio: 0.90                    # trainingset ratio
  # For separated test and train data.
  input_train: '/path/to/chunks/*/draw/' # supports glob
  input_test: '/path/to/chunks/*/draw/'  # supports glob
  # For a one-shot run with all data in one directory.
  # input: '/path/to/chunks/*/draw/'

training:
    batch_size: 2048                   # training batch
    total_steps: 140000                # terminate after these steps
    test_steps: 2000                   # eval test set values after this many steps
    # checkpoint_steps: 10000          # optional frequency for checkpointing before finish
    shuffle_size: 524288               # size of the shuffle buffer
    lr_values:                         # list of learning rates
        - 0.02
        - 0.002
        - 0.0005
    lr_boundaries:                     # list of boundaries
        - 100000
        - 130000
    policy_loss_weight: 1.0            # weight of policy loss
    value_loss_weight: 1.0             # weight of value loss
    path: '/path/to/store/networks'    # network storage dir

model:
  filters: 64
  residual_blocks: 6
...

The configuration is pretty self explanatory, if you're new to training I suggest looking at the machine learning glossary by google. Now you can invoke training with the following command:

./train.py --cfg configs/example.yaml --output /tmp/mymodel.txt

This will initialize the pipeline and start training a new neural network. You can view progress by invoking tensorboard:

tensorboard --logdir leelalogs

If you now point your browser at localhost:6006 you'll see the trainingprogress as the trainingsteps pass by. Have fun!

Restoring models

The training pipeline will automatically restore from a previous model if it exists in your training:path as configured by your yaml config. For initializing from a raw weights.txt file you can use training/tf/net_to_model.py, this will create a checkpoint for you.

Supervised training

Generating trainingdata from pgn files is currently broken and has low priority, feel free to create a PR.

lczero-training's People

Contributors

error323 avatar tilps avatar glinscott avatar cyanogenoid avatar killerducky avatar ttl avatar benediamond avatar kiudee avatar scchess avatar borg323 avatar

Watchers

Shyamal Suhana Chandra avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.