poja / cattus
Cattus is a chess engine based on the DeepMind AlphaZero paper, written in Rust. It uses a neural network to evaluate positions and MCTS as the search algorithm.
@barakugav
It's not so important - I'm just (ab)using this "Issues" platform to discuss something in the code:
What's the goal of the model_id function? Can I remove it and just name the directories with the self-play data according to the time they were created? Or, if we want to label them by the model that is used, can we use the date of the model's creation as its "id"? Why use this complicated hash function?
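For illustration only, a minimal sketch of the timestamp-naming idea using only the standard library (the function name and directory naming scheme are made up, not the project's actual layout):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical alternative to model_id: name a self-play data
/// directory by the wall-clock time it was created.
fn self_play_dir_name() -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before UNIX epoch")
        .as_secs();
    format!("self_play_{}", secs)
}

fn main() {
    // Prints something like "self_play_1700000000".
    println!("{}", self_play_dir_name());
}
```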
Like hard-coded paths
CPU/GPU
"cargo run" always run debug by default
We create and train models; we should be able to evaluate them during the training loop.
This is required to understand the rate of learning.
mini-batches of 2,048 positions each, sampled uniformly at random from all positions of the most recent 500,000 games
Maybe GitHub pull requests with the GitHub CLI.
From Wikipedia:
If white loses the simulation, all nodes along the selection incremented their simulation count (the denominator), but among them only the black nodes were credited with wins (the numerator). If instead white wins, all nodes along the selection would still increment their simulation count, but among them only the white nodes would be credited with wins. In games where draws are possible, a draw causes the numerator for both black and white to be incremented by 0.5 and the denominator by 1.
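A minimal sketch of that backpropagation step (the node layout and field names are assumptions, not the engine's actual structures): every node on the selected path gets +1 to its simulation count, its win count gets +1 if the player it represents won, and +0.5 on a draw.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Color { White, Black }

#[derive(Clone, Copy)]
enum GameResult { Win(Color), Draw }

struct Node {
    color: Color,     // the player this node's move belongs to
    simulations: u32, // denominator
    wins: f64,        // numerator
}

/// Backpropagate one simulation result along the selected path (root..leaf).
fn backpropagate(path: &mut [Node], result: GameResult) {
    for node in path.iter_mut() {
        node.simulations += 1;
        match result {
            GameResult::Win(winner) if winner == node.color => node.wins += 1.0,
            GameResult::Draw => node.wins += 0.5,
            _ => {}
        }
    }
}
```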
Currently fixed to '0.5 * (0.0001)'. Add it to the config and propagate it to model creation.
Evaluation should consist of 400 games. If the new player wins >55% of them, it becomes the best player.
Model.fit returns some sort of history containing metrics, which might be important information.
Currently we iterate over all tiles on each call; we can maintain some data structure within the position to iterate over them faster.
Need to measure the performance bottleneck before optimizing this.
The network output range is [-1, 1] while our current convention in the MCTS implementation is [0, 1]; we need to align the two.
I think [-1, 1] is more intuitive, in which case we should change the draw value to 0 and invert scores with "-score" rather than "1 - score".
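For reference, a small sketch of the two conventions (helper names are made up): in the [0, 1] convention a draw is 0.5 and the opponent's view of a score s is 1 - s; in the [-1, 1] convention a draw is 0 and the opponent's view is -s.

```rust
/// [0, 1] convention: 0 = loss, 0.5 = draw, 1 = win; flip with 1 - s.
fn flip_unit_interval(score: f32) -> f32 {
    1.0 - score
}

/// [-1, 1] convention: -1 = loss, 0 = draw, 1 = win; flip with -s.
fn flip_symmetric(score: f32) -> f32 {
    -score
}

/// Map a network output in [-1, 1] to the current MCTS range [0, 1].
fn symmetric_to_unit_interval(score: f32) -> f32 {
    (score + 1.0) / 2.0
}
```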
bitmap of planes
input layer, output layers, no hidden layers
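As an illustration of the "bitmap of planes" input, a rough sketch assuming the chess crate's Board/BitBoard API and a 12-plane layout (6 piece types x 2 colors); the plane order here is an arbitrary choice, not the project's actual encoding:

```rust
use chess::{Board, Color, ALL_PIECES};

/// Encode a position as 12 binary 8x8 planes, one per (color, piece type).
fn encode_planes(board: &Board) -> Vec<[f32; 64]> {
    let mut planes = Vec::with_capacity(12);
    for color in [Color::White, Color::Black] {
        for piece in ALL_PIECES {
            let mut plane = [0.0f32; 64];
            // Squares occupied by this piece type of this color.
            let bb = *board.pieces(piece) & *board.color_combined(color);
            for sq in bb {
                plane[sq.to_index()] = 1.0;
            }
            planes.push(plane);
        }
    }
    planes
}
```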
See https://docs.google.com/document/d/1QFNRRoPtywh1sWIAnhjxpQQmwHu7MICuaK1KHbNEvao for the exact MCTS selection calculation.
Currently we are expanding all children and simulating them all; we need to simulate only one.
In selection we assume that if a node has any child, then all its children exist and have been simulated; after this change we need to consider non-simulated children first.
https://docs.rs/chess/ is a nice option for a chess library, supporting legal-move queries, game-status detection (checkmate, stalemate, etc.), and so on.
We should write a wrapper around the library we choose, enabling our generic MCTS implementation to use it.
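As an illustration only, a sketch of what such a wrapper might look like, using the chess crate's Board, ChessMove, MoveGen, and BoardStatus types; the GamePosition trait itself is hypothetical, standing in for whatever abstraction the generic MCTS ends up needing:

```rust
use chess::{Board, BoardStatus, ChessMove, MoveGen};

/// Hypothetical abstraction our generic MCTS could be written against.
trait GamePosition: Sized {
    type Move;
    fn legal_moves(&self) -> Vec<Self::Move>;
    fn apply(&self, mv: &Self::Move) -> Self;
    fn is_terminal(&self) -> bool;
}

/// Thin wrapper around the `chess` crate's `Board`.
struct ChessPosition {
    board: Board,
}

impl GamePosition for ChessPosition {
    type Move = ChessMove;

    fn legal_moves(&self) -> Vec<ChessMove> {
        MoveGen::new_legal(&self.board).collect()
    }

    fn apply(&self, mv: &ChessMove) -> Self {
        ChessPosition { board: self.board.make_move_new(*mv) }
    }

    fn is_terminal(&self) -> bool {
        self.board.status() != BoardStatus::Ongoing
    }
}
```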
Currently each training iteration learns only from the self-play games of the last model.
We should learn from games produced by all models, and always take X random games from the Y latest ones.
This will allow us to reduce the number of self-play games without damaging the training.
Use the same directory for all self-play games.
Add X and Y to the configuration file.
Set 'epochs=1'.
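A rough sketch of that sampling step, assuming games are stored one directory per model under a common root and assuming the rand crate's 0.8-style API; the paths and function name are illustrative, not the project's actual layout:

```rust
use rand::seq::SliceRandom;
use std::fs;
use std::path::PathBuf;

/// Take `games_per_iter` (X) random game files from the `latest_models` (Y)
/// most recent model directories under `root`.
fn sample_training_games(
    root: &str,
    latest_models: usize,
    games_per_iter: usize,
) -> std::io::Result<Vec<PathBuf>> {
    // Model directories, assumed to sort chronologically by name.
    let mut model_dirs: Vec<PathBuf> = fs::read_dir(root)?
        .filter_map(|e| e.ok().map(|e| e.path()))
        .filter(|p| p.is_dir())
        .collect();
    model_dirs.sort();

    // Collect all game files from the Y latest model directories.
    let mut games = Vec::new();
    for dir in model_dirs.iter().rev().take(latest_models) {
        for entry in fs::read_dir(dir)? {
            games.push(entry?.path());
        }
    }

    // Pick X of them uniformly at random.
    let mut rng = rand::thread_rng();
    Ok(games
        .choose_multiple(&mut rng, games_per_iter)
        .cloned()
        .collect())
}
```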
Currently, if no simulations were performed for a leaf, the function calc_selection_heuristic returns f32::MAX.
Instead, the first time a node is expanded, the two-headed network should be run, producing a scalar value in [-1, 1] for propagation and per-move probabilities, which we should assign to the newly expanded leaves and use in calc_selection_heuristic instead of f32::MAX.
Instead of only a [-1, 1] score, the network should contain two "heads": the first outputting a scalar [-1, 1] score (which is the "simulation" value) and the second outputting per-move probability values.
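For illustration, a sketch of a PUCT-style selection heuristic in the spirit of AlphaZero, where the policy head's per-move prior replaces the f32::MAX placeholder for unvisited children (the struct fields and function signature are assumptions, not the current code):

```rust
struct Child {
    prior: f32,     // P(s, a) from the policy head
    visits: u32,    // N(s, a)
    value_sum: f32, // sum of backed-up values in [-1, 1]
}

/// PUCT score: Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
/// Unvisited children are ranked by their prior instead of f32::MAX.
fn calc_selection_heuristic(child: &Child, parent_visits: u32, c_puct: f32) -> f32 {
    let q = if child.visits == 0 {
        0.0
    } else {
        child.value_sum / child.visits as f32
    };
    let u = c_puct * child.prior * (parent_visits as f32).sqrt()
        / (1.0 + child.visits as f32);
    q + u
}
```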
We will suffer from the overhead of accessing an element in the petgraph; our own tree implementation could use pointers.
Need to check how much pain this requires in Rust and whether it is worth it.
This is a task for the future, we are good for now.
Currently using KL divergence, which doesn't really match the cross-entropy loss.
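For reference, the standard identity relating the two: with fixed MCTS visit-count targets p, the entropy term H(p) is constant with respect to the model parameters, so only the reported loss value differs.

$$ H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) $$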
Train a two-headed network until the engine wins against us consistently.
Understand how long such training requires and with what hyperparameters:
learning rate
games generated in self-play
temperature for the softmax, and for how many moves we should use it (see the sketch after this list)
model structure
should we take a single position from each game, or more
A lot of this can be taken from lc0.
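As an illustration of the temperature knob, a sketch of sampling a move from MCTS visit counts with probability proportional to N(a)^(1/tau), in the style of AlphaZero self-play; the function and parameter names are made up, and the rand crate's 0.8-style API is assumed:

```rust
use rand::Rng;

/// Sample a move index from visit counts with temperature `tau`.
/// tau = 1.0 keeps the visit distribution; tau -> 0 approaches argmax.
/// Assumes `visit_counts` is non-empty and has at least one visit.
fn sample_move(visit_counts: &[u32], tau: f32, rng: &mut impl Rng) -> usize {
    if tau <= f32::EPSILON {
        // Deterministic: pick the most visited move.
        return visit_counts
            .iter()
            .enumerate()
            .max_by_key(|(_, &n)| n)
            .map(|(i, _)| i)
            .unwrap();
    }
    let weights: Vec<f32> = visit_counts
        .iter()
        .map(|&n| (n as f32).powf(1.0 / tau))
        .collect();
    let total: f32 = weights.iter().sum();
    let mut r = rng.gen_range(0.0..total);
    for (i, w) in weights.iter().enumerate() {
        if r < *w {
            return i;
        }
        r -= w;
    }
    weights.len() - 1
}
```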