harbecke / hexhex

AlphaGo Zero adaptation for Hex

License: GNU General Public License v3.0

Python 74.06% Jupyter Notebook 2.63% HTML 1.62% CSS 1.25% JavaScript 20.20% Shell 0.24%

hexhex's People

Contributors

cleeff, harbecke, pascalcremer, simonant

hexhex's Issues

Add coach mode

The coach mode suggests stronger alternative moves based on neural network output.
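
A minimal sketch of what that could look like, assuming a policy model that returns one logit per cell (the model interface and the occupied-cell masking are assumptions, not the project's actual API):

    import torch

    def suggest_moves(model, board_tensor, occupied_mask, k=3):
        """Return the k strongest moves as (flat_index, probability) pairs."""
        with torch.no_grad():
            logits = model(board_tensor.unsqueeze(0)).squeeze(0).flatten()
        logits[occupied_mask.flatten()] = float("-inf")  # never suggest occupied cells
        probs = torch.softmax(logits, dim=0)
        top = torch.topk(probs, k)
        return list(zip(top.indices.tolist(), top.values.tolist()))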

adapt code for pytorch 1.2

PyTorch 1.2 was released. Some of the features relevant for us are (see the sketch after this list):

  • TensorBoard is natively supported: from torch.utils.tensorboard import SummaryWriter
  • the AdamW optimizer is included (a variant of Adam that handles weight decay correctly)
  • there is an nn.Transformer module, in case we want to experiment with transformer models
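
A minimal sketch of the first two points (the stand-in model and the log directory are placeholders):

    import torch
    from torch.utils.tensorboard import SummaryWriter  # native since PyTorch 1.2

    model = torch.nn.Linear(4, 2)  # stand-in for our network
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

    writer = SummaryWriter(log_dir="runs/pytorch12")
    writer.add_scalar("loss", 0.5, global_step=0)
    writer.close()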

validation data is not true validation data

Since the validation data is drawn from the same pool that is generated for the training set each epoch, the model may already have seen it in previous iterations. Furthermore, positions from the same game are highly correlated. We should either select only a subset of positions from each game or generate games exclusively for validation, so that the validation loss actually measures something the training loss does not.
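
A sketch of the first option, subsampling positions per game (the data layout is an assumption about create_data.py, not its actual format):

    import random

    def subsample_positions(games, positions_per_game=2, seed=0):
        """Pick a few positions per game to reduce within-game correlation."""
        rng = random.Random(seed)
        sample = []
        for game in games:  # each game: a list of (position, label) pairs
            k = min(positions_per_game, len(game))
            sample.extend(rng.sample(game, k))
        return sample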

creating elo ratings takes too long for many models

hex/elo/elo/create_ratings takes approximately n²/10000 seconds for n models on the server. That is far too long for extended training runs; an approximation would be preferable to the exact quadratic-time computation.
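
One possible direction (a sketch, not what create_ratings does today): rate each new model incrementally against only a window of recent models using the standard Elo update, which keeps the cost per new model constant instead of quadratic:

    def expected_score(r_a, r_b):
        """Standard Elo expected score of player A against player B."""
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def update_elo(ratings, results, k_factor=32):
        """Apply one Elo update per (model_a, model_b, score_a) result."""
        for a, b, score_a in results:
            exp_a = expected_score(ratings[a], ratings[b])
            ratings[a] += k_factor * (score_a - exp_a)
            ratings[b] += k_factor * ((1.0 - score_a) - (1.0 - exp_a))
        return ratings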

TensorboardX doesn't refresh data

@cleeff I think the refreshing problem with TensorboardX is that we don't flush the data to disk. This can be done with writer.flush() or writer.close(). I prefer the second method, as it creates a new events file after e.g. every epoch, which feels cleaner to me than writing everything into one file.
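
A sketch of the close-per-epoch variant (the training step is a stub standing in for our real loop):

    from tensorboardX import SummaryWriter

    def train_one_epoch():
        """Stub for the real training step; returns a dummy loss."""
        return 0.5

    for epoch in range(10):
        writer = SummaryWriter("runs/experiment")
        writer.add_scalar("loss", train_one_epoch(), global_step=epoch)
        writer.close()  # flushes to disk; the next writer starts a new events file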

simple tests

It would be good to have very basic tests which ensure that all functions can be called as expected and that no partial refactoring broke anything.

These tests could depend on sample_config.ini, which would also make sure the sample config stays up to date.
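
A minimal smoke test along those lines, assuming pytest and that sample_config.ini sits in the repository root (the section names are taken from the config sections listed in the Bayesian optimization issue below):

    import configparser

    def test_sample_config_parses():
        """sample_config.ini must exist, parse, and contain expected sections."""
        config = configparser.ConfigParser()
        assert config.read("sample_config.ini"), "sample_config.ini not found"
        assert "TRAIN" in config and "CREATE_DATA" in config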

add Bayesian Optimization for hyperparameter search

We still have little idea how close to optimal our parameters are. I am working on adding ax. Here is a list of candidate parameters for optimization (a sketch of the ax loop follows the list):

[TRAIN]
batch_size
learning_rate
epochs
weight_decay

[CREATE_DATA]
train_samples_per_model
temperature
temperature_decay
gamma

[REPEATED SELF TRAINING]
num_data_models
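
A minimal sketch of an ax loop over a few of these parameters (the objective is a dummy stand-in for a real train-and-validate run, and the bounds are guesses):

    from ax import optimize  # ax-platform

    def objective(params):
        """Stand-in: would train with `params` and return the validation loss."""
        return (params["learning_rate"] - 0.01) ** 2

    best_parameters, best_values, _, _ = optimize(
        parameters=[
            {"name": "learning_rate", "type": "range",
             "bounds": [1e-5, 1e-1], "log_scale": True},
            {"name": "batch_size", "type": "range",
             "bounds": [16, 512], "value_type": "int"},
            {"name": "weight_decay", "type": "range",
             "bounds": [1e-6, 1e-2], "log_scale": True},
        ],
        evaluation_function=objective,
        minimize=True,
        total_trials=20,
    )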

creation of the not already existing puzzle data does not work

"The creation of the not already existing puzzle data does not work for me. The problem seems to be that config cannot be deepcopied. When adding boardsize to config instead of puzzle_config everything seems to work. I do not know how to properly fix this."

Originally posted by @simonant in #22 (comment)

I created an issue because commit f49fe5c is outside the pull request. So you have to hard-code board_size into the train.py script? What error does the deepcopy throw?
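
A possible workaround, assuming the failure really comes from deepcopying the ConfigParser object itself (the section and key names here are illustrative, not verified against the code):

    import configparser
    import copy

    def config_to_dict(config):
        """Copy a ConfigParser into plain nested dicts, which deepcopy safely."""
        return {section: dict(config[section]) for section in config.sections()}

    config = configparser.ConfigParser()
    config.read("sample_config.ini")
    puzzle_config = copy.deepcopy(config_to_dict(config))  # safe to mutate
    puzzle_config.setdefault("CREATE_DATA", {})["board_size"] = "5"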

data creation slows down dramatically

I added logger.info(f"{len(board_states)}") after https://github.com/harbecke/hex/blob/c82fb86594e90d4d5acb4bf8b6a185bf3dc839e2/hex/creation/create_data.py#L34.

Results for batch_size=32 and samples_per_file=40000 are:

2019-07-18 19:46:47,530 - INFO - === creating data from self play ===
2019-07-18 19:46:49,915 - INFO - 3603
2019-07-18 19:46:52,931 - INFO - 3549
2019-07-18 19:46:58,871 - INFO - 3558
2019-07-18 19:47:06,903 - INFO - 3519
2019-07-18 19:47:16,569 - INFO - 3482
2019-07-18 19:47:28,136 - INFO - 3580
2019-07-18 19:48:16,708 - INFO - 3548
2019-07-18 19:49:23,159 - INFO - 3571
2019-07-18 19:50:48,866 - INFO - 3562
2019-07-18 19:52:26,660 - INFO - 3465
2019-07-18 19:54:13,377 - INFO - 3559
2019-07-18 19:56:16,024 - INFO - 3487

This is probably due to repeated memory allocation when tensors are concatenated in a loop.
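
If the cause is indeed growing a tensor with torch.cat inside the loop (an assumption about create_data.py), each concatenation copies everything collected so far, so total cost is quadratic. Collecting batches in a list and concatenating once is linear:

    import torch

    collected = []
    for _ in range(12):
        batch = torch.zeros(32, 3, 11, 11)  # stand-in for one batch of board states
        # slow alternative: all_states = torch.cat([all_states, batch]) each step
        collected.append(batch)
    all_states = torch.cat(collected)  # single concatenation at the end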

introduce learning rate scheduler

Early Bayesian optimization results suggest that the optimal parameters depend heavily on the training time. For most parameters it should be fine to set them to their long-training optimum and accept a small increase in training time, but this may not hold for learning_rate.
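
A minimal sketch with a built-in PyTorch scheduler (the stand-in model, step size, and decay factor are placeholders):

    import torch

    model = torch.nn.Linear(4, 2)  # stand-in for our network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

    for epoch in range(30):
        optimizer.step()   # the real training step would go here
        scheduler.step()   # halves the learning rate every 10 epochs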

model and optimizer shouldn't be saved together

There seem to be two optimizer parameters for every model parameter (Adam keeps two moving averages per weight), so model files become roughly three times as large with the optimizer attached.
Furthermore, saving the optimizer parameters after training should be optional.
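
A sketch of saving the two state dicts separately, with the optimizer opt-in (model, optimizer, and file names are placeholders):

    import torch

    model = torch.nn.Linear(4, 2)  # stand-ins for our model and optimizer
    optimizer = torch.optim.Adam(model.parameters())

    def save_checkpoint(model, optimizer=None, prefix="checkpoint"):
        """Save the model alone by default; the optimizer only on request."""
        torch.save(model.state_dict(), f"{prefix}_model.pt")
        if optimizer is not None:
            torch.save(optimizer.state_dict(), f"{prefix}_optimizer.pt")

    save_checkpoint(model)             # small file: model weights only
    save_checkpoint(model, optimizer)  # opt-in: roughly three times the data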

Implement Monte Carlo Tree Search

One can implement MCTS on top of the value model to improve playing strength. It is not intended for use during data generation because of the high computational cost.
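
A bare-bones UCT sketch with leaf evaluation by the value model instead of random rollouts; the game interface (legal_moves, play, is_terminal) and the perspective handling of value_fn are assumptions:

    import math

    class Node:
        def __init__(self, state):
            self.state = state    # assumed immutable game state
            self.children = {}    # move -> Node
            self.visits = 0
            self.value_sum = 0.0

    def uct_child(node, c=1.4):
        """Select the child with the highest UCT score."""
        def score(child):
            exploit = child.value_sum / (child.visits + 1e-8)
            explore = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-8))
            return exploit + explore
        return max(node.children.values(), key=score)

    def mcts(root_state, value_fn, legal_moves, play, is_terminal, simulations=100):
        """value_fn is assumed to score positions from the root player's
        perspective; sign flipping for alternating players is omitted here."""
        root = Node(root_state)
        for _ in range(simulations):
            node, path = root, [root]
            while node.children:              # selection: descend to a leaf
                node = uct_child(node)
                path.append(node)
            if not is_terminal(node.state):   # expansion: create all children
                for move in legal_moves(node.state):
                    node.children[move] = Node(play(node.state, move))
            value = value_fn(node.state)      # evaluation by the value model
            for n in path:                    # backpropagation
                n.visits += 1
                n.value_sum += value
        return max(root.children, key=lambda m: root.children[m].visits)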

remove / rewrite switch rule

The rule is difficult to support in interactive mode and makes things more complex in general. I don't think it contributes greatly to the appeal of the AI experiment.

suggestion: simply remove

simple self improvement flow

This is a feature proposal for the following flow (sketched in code after the list).

Starting with an initial random model, repeat:

  • generate N data points using self-play of the current best agent
  • train a new agent on this data
  • if it beats the old agent in at least 55% of k games, accept it as the new champion; otherwise train it on more data
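
A sketch of that loop in code; every helper here is a stub standing in for real self-play, training, and evaluation:

    import random

    def generate_self_play_data(model, n):
        """Stub: would generate n positions via self-play of `model`."""
        return [(model, i) for i in range(n)]

    def train_new_agent(data):
        """Stub: would train a fresh network on `data`."""
        return len(data)

    def win_rate(challenger, champion, k):
        """Stub: would play k games and return the challenger's win rate."""
        return random.random()

    def self_improvement_loop(initial_model, iterations=3, n=1000, k=100,
                              threshold=0.55):
        champion = initial_model
        for _ in range(iterations):
            data = generate_self_play_data(champion, n)
            challenger = train_new_agent(data)
            # keep training on more data until the challenger clearly wins
            while win_rate(challenger, champion, k) < threshold:
                data += generate_self_play_data(champion, n)
                challenger = train_new_agent(data)
            champion = challenger  # accept as the new champion
        return champion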

data is in overcomplicated format

board_tensor has three channels where one or two should suffice. The tensor should be transposed after a player makes a move, so that the model always plays the board in one direction. This removes the need for the VerticalWrapperModel.
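
A sketch of a two-channel board with the transpose trick (the exact channel convention is an assumption):

    import torch

    def apply_move(board, row, col):
        """Two channels: 0 = stones of the player to move, 1 = opponent's.

        After placing the stone, swap the channels and transpose rows and
        columns, so the next player also sees itself as channel 0 connecting
        the same pair of board edges. No VerticalWrapperModel needed.
        """
        board = board.clone()
        board[0, row, col] = 1.0
        return board.flip(0).transpose(1, 2)

    board = torch.zeros(2, 11, 11)
    board = apply_move(board, 5, 5)  # now seen from the other player's perspective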
