
trigrass2 / reversi-alpha-zero


This project forked from mokemokechicken/reversi-alpha-zero


Reversi reinforcement learning by AlphaGo Zero methods.

License: MIT License

Shell 0.58% Jupyter Notebook 15.14% Python 84.27%

reversi-alpha-zero's Introduction

About

Reversi reinforcement learning by AlphaGo Zero methods.

Environment

  • Python 3.6.3
  • tensorflow-gpu: 1.3.0
    • tensorflow==1.3.0 (CPU build) also works, but is very slow. For play_gui, the CPU build is fast enough.
  • Keras: 2.0.8

Modules

Reinforcement Learning

This AlphaGo Zero implementation consists of three workers: self, opt and eval.

  • self is Self-Play, which generates training data by having BestModel play against itself.
  • opt is the Trainer, which trains the model and produces next-generation models.
  • eval is the Evaluator, which checks whether a next-generation model is better than BestModel. If it is, it replaces BestModel.
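
The data flow between the three workers can be sketched as a toy state machine. This is only an illustration of the roles described in this README, not the project's code: `State`, `self_play`, `optimize` and `evaluate` are hypothetical names, models are reduced to generation numbers, and the outcome of an evaluation is passed in as a flag instead of being played out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    best_model: int = 0  # generation number standing in for BestModel
    next_generation: List[int] = field(default_factory=list)  # candidates, oldest first
    play_data: List[str] = field(default_factory=list)  # records written by self-play


def self_play(state: State, games: int) -> None:
    """self: BestModel plays against itself and appends training records."""
    for i in range(games):
        state.play_data.append(f"game {i} by generation {state.best_model}")


def optimize(state: State) -> None:
    """opt: start from the latest candidate (or BestModel) and emit a new candidate."""
    base = state.next_generation[-1] if state.next_generation else state.best_model
    state.next_generation.append(base + 1)


def evaluate(state: State, candidate_is_better: bool) -> None:
    """eval: take the oldest candidate and promote it only if it beat BestModel."""
    candidate = state.next_generation.pop(0)
    if candidate_is_better:
        state.best_model = candidate


if __name__ == "__main__":
    state = State()
    self_play(state, games=3)
    optimize(state)
    evaluate(state, candidate_is_better=True)
    print(state.best_model, len(state.play_data))
```

In the real project the three workers run as separate commands and communicate through the files listed under Data, so they can run at the same time.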

Evaluation

For evaluation, you can play reversi with the BestModel.

  • play_gui lets you play a game against BestModel in a wxPython GUI.

Data

  • data/model/model_best_*: BestModel.
  • data/model/next_generation/*: next-generation models.
  • data/play_data/play_*.json: generated training data.
  • logs/main.log: log file.

If you want to train the model from scratch, delete the files and directories listed above.
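
A minimal sketch of that cleanup, assuming the paths above are relative to the repository root. The helper names `find_training_artifacts` and `reset_training` are hypothetical and not part of the project. It defaults to a dry run so you can inspect the list before anything is deleted.

```python
import shutil
from pathlib import Path

# Glob patterns taken from the Data list in this README.
PATTERNS = [
    "data/model/model_best_*",
    "data/model/next_generation/*",
    "data/play_data/play_*.json",
    "logs/main.log",
]


def find_training_artifacts(root):
    """Return every existing path under root that matches one of the patterns."""
    root = Path(root)
    found = []
    for pattern in PATTERNS:
        found.extend(sorted(root.glob(pattern)))
    return found


def reset_training(root, dry_run=True):
    """List the training artifacts; delete them only when dry_run is False."""
    targets = find_training_artifacts(root)
    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return targets


if __name__ == "__main__":
    for path in reset_training(".", dry_run=True):
        print("would delete:", path)
```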

How to use

Setup

install libraries

pip install -r requirements.txt

If you want to use a GPU:

pip install tensorflow-gpu

set environment variables

Create a .env file with the following content.

KERAS_BACKEND=tensorflow

Download Trained BestModel (if needed)

To download an example trained BestModel:

sh ./download_best_model.sh

Basic Usages

To train a model, run Self-Play, Trainer and Evaluator.

Self-Play

python src/reversi_zero/run.py self

When executed, Self-Play starts using BestModel. If BestModel does not exist, a new random model is created and becomes BestModel.

options

  • --new: create new BestModel
  • --type mini: use the mini config for testing (see src/reversi_zero/configs/mini.py)

Trainer

python src/reversi_zero/run.py opt

When executed, training starts. The base model is loaded from the latest saved next-generation model. If none exists, BestModel is used. The trained model is saved every 2000 steps (mini-batches), after the epoch completes.

options

  • --type mini: use the mini config for testing (see src/reversi_zero/configs/mini.py)
  • --total-step: specify the total number of steps (mini-batches) so far. The total step count affects the learning rate used for training.
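
One common way for a step count to drive the learning rate is a piecewise-constant schedule. The sketch below shows the idea only: the function name `learning_rate`, the thresholds and the rates are hypothetical values chosen for illustration, not the ones this project uses.

```python
# Hypothetical schedule: (first step at which the rate applies, learning rate).
DEFAULT_SCHEDULE = ((0, 1e-2), (100_000, 1e-3), (200_000, 1e-4))


def learning_rate(total_steps, schedule=DEFAULT_SCHEDULE):
    """Return the rate of the last schedule entry whose start step has been reached."""
    lr = schedule[0][1]
    for start, value in schedule:
        if total_steps >= start:
            lr = value
    return lr


if __name__ == "__main__":
    for steps in (0, 150_000, 250_000):
        print(steps, learning_rate(steps))
```

Under a schedule like this, restarting the Trainer with the wrong total step count would put training back on an earlier, larger learning rate, which is presumably why the option exists.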

Evaluator

python src/reversi_zero/run.py eval

When executed, evaluation starts. BestModel plays about 200 games against the oldest next-generation model. If the next-generation model wins, it becomes the new BestModel.
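
The replacement decision can be sketched as a win-rate check over the played games. This is a hedged illustration: `should_replace_best` is a hypothetical helper, and both the 0.5 default threshold and the counting of a draw as half a win are assumptions, since this README only says the next-generation model has to win.

```python
def should_replace_best(wins, losses, draws=0, threshold=0.5):
    """Decide whether the challenger replaces BestModel.

    wins/losses/draws are counted from the challenger's point of view.
    A draw is scored as half a win (an assumption made for this sketch).
    """
    games = wins + losses + draws
    if games == 0:
        return False  # nothing was played, so keep the current BestModel
    win_rate = (wins + 0.5 * draws) / games
    return win_rate > threshold


if __name__ == "__main__":
    print(should_replace_best(wins=110, losses=90))  # challenger won more games
    print(should_replace_best(wins=100, losses=100))  # tie: BestModel is kept
```

A stricter threshold makes promotions rarer but reduces the chance of replacing BestModel because of noise in a 200-game sample.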

options

  • --type mini: use the mini config for testing (see src/reversi_zero/configs/mini.py)

Play Game

python src/reversi_zero/run.py play_gui

When executed, an ordinary reversi board is displayed and you can play against BestModel. After BestModel moves, numbers are displayed on the board.

  • The top-left numbers (1) are the visit counts, N(s,a), of the last search.
  • The bottom-left numbers (2) are the Q values, Q(s,a), from the AI's point of view for the last state and move. The Q values are multiplied by 100.
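
For readers unfamiliar with these quantities: in AlphaGo Zero style search, each (state, move) edge keeps a visit count N and a total value W, and Q is their ratio W/N. The sketch below shows how the two displayed numbers could be derived from such statistics. `EdgeStats` and `board_labels` are hypothetical names, and rounding Q×100 to an integer is an assumption about the display.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    n: int = 0  # N(s,a): how many simulations passed through this move
    w: float = 0.0  # W(s,a): sum of the values backed up through this move

    def update(self, value: float) -> None:
        """Back up one simulation result (value in [-1, 1], AI's point of view)."""
        self.n += 1
        self.w += value

    @property
    def q(self) -> float:
        """Q(s,a) = W(s,a) / N(s,a), or 0 for an unvisited move."""
        return self.w / self.n if self.n else 0.0


def board_labels(edge: EdgeStats):
    """Return (visit count, Q multiplied by 100) as they might be drawn on a square."""
    return edge.n, round(edge.q * 100)


if __name__ == "__main__":
    edge = EdgeStats()
    for value in (1, 1, -1, 1):
        edge.update(value)
    print(board_labels(edge))
```

A square with a high visit count and a Q value near 100 is a move the search both explored heavily and expects to win with.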

Note: Mac pyenv environment

play_gui uses wxPython, which cannot run if your Python was built without Framework support. Try the following pyenv install option.

env PYTHON_CONFIGURE_OPTS="--enable-framework" pyenv install 3.6.3

Tips and Memo

GPU Memory

In my environment (GeForce GTX 1080, about 8GB of memory), the GPU sometimes runs out of memory. Usually this causes warnings rather than errors. If an error does occur, try changing per_process_gpu_memory_fraction in src/worker/{evaluate.py,optimize.py,self_play.py}:

tf_util.set_session_config(per_process_gpu_memory_fraction=0.2)

A smaller batch_size reduces the memory usage of opt. Try changing TrainerConfig#batch_size in NormalConfig.
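
For context on per_process_gpu_memory_fraction: in TensorFlow 1.x it is a field of `tf.GPUOptions` that caps the share of GPU memory a single process may allocate. The sketch below shows one way a helper like the project's tf_util function could apply it. It is an assumption about the approach, not a copy of the project's code, and `limit_gpu_memory` is a hypothetical name. It targets the TensorFlow 1.x and Keras 2.0.x versions listed under Environment.

```python
def limit_gpu_memory(fraction):
    """Cap this process's GPU memory at the given fraction of the card (TF 1.x only)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")

    # Imported lazily so this module can be loaded on machines without TensorFlow.
    import tensorflow as tf
    from keras import backend as K

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    config = tf.ConfigProto(gpu_options=gpu_options)
    # Make Keras use a session that was created with the memory cap.
    K.set_session(tf.Session(config=config))


if __name__ == "__main__":
    # With an 8GB card, 0.2 allows roughly 1.6GB for this process.
    limit_gpu_memory(0.2)
```

Since self, opt and eval can run at the same time on one card, the fractions of all running workers should add up to less than 1.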

Training Speed

  • CPU: 8 core i7-7700K CPU @ 4.20GHz
  • GPU: GeForce GTX 1080
  • 1 game in Self-Play: about 47 sec.
  • 1 game in Evaluation: about 50 sec.
  • 1 step(mini-batch, batch size=512) in Training: about 2.3 sec.
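
These per-unit timings can be turned into rough wall-clock estimates. The sketch below does the arithmetic for one evaluation round (about 200 games) and one save interval of the Trainer (2000 steps). It assumes games and steps run strictly one after another in a single process, and the function names are hypothetical.

```python
# Measurements quoted in the list above.
EVAL_SEC_PER_GAME = 50
TRAIN_SEC_PER_STEP = 2.3
BATCH_SIZE = 512


def to_hours(seconds):
    return seconds / 3600.0


def evaluation_hours(games=200):
    """Time to play one evaluation round, assuming sequential games."""
    return to_hours(games * EVAL_SEC_PER_GAME)


def checkpoint_hours(steps=2000):
    """Training time between two saves of the model."""
    return to_hours(steps * TRAIN_SEC_PER_STEP)


def samples_per_checkpoint(steps=2000):
    """Number of training positions consumed between two saves."""
    return steps * BATCH_SIZE


if __name__ == "__main__":
    print(f"evaluation round: {evaluation_hours():.1f} h")
    print(f"between saves:    {checkpoint_hours():.1f} h")
    print(f"samples per save: {samples_per_checkpoint()}")
```

On the hardware above this works out to roughly 2.8 hours per evaluation round and about 1.3 hours between saved models, so the Evaluator takes about twice as long per candidate as the Trainer needs to produce one.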

reversi-alpha-zero's People

Contributors

mokemokechicken
