harbecke / hexhex

AlphaGo Zero adaptation for Hex

License: GNU General Public License v3.0

Python 74.06% Jupyter Notebook 2.63% HTML 1.62% CSS 1.25% JavaScript 20.20% Shell 0.24%

hexhex's People

Contributors

cleeff, harbecke, pascalcremer, simonant

hexhex's Issues

Add coach mode

The coach mode suggests stronger alternative moves based on neural network output.
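
A minimal sketch of what that could look like, assuming a policy model that returns one logit per cell (the model interface and the occupied-cell masking are assumptions, not the project's actual API):

    import torch

    def suggest_moves(model, board_tensor, occupied_mask, k=3):
        """Return the k strongest moves as (flat_index, probability) pairs."""
        with torch.no_grad():
            logits = model(board_tensor.unsqueeze(0)).squeeze(0).flatten()
        logits[occupied_mask.flatten()] = float("-inf")  # never suggest occupied cells
        probs = torch.softmax(logits, dim=0)
        top = torch.topk(probs, k)
        return list(zip(top.indices.tolist(), top.values.tolist()))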

adapt code for pytorch 1.2

PyTorch 1.2 was released. Some of the features relevant for us are (see the sketch after this list):

  • TensorBoard is natively supported: from torch.utils.tensorboard import SummaryWriter
  • the AdamW optimizer is included (a variant of Adam that handles weight decay correctly)
  • there is an nn.Transformer module, in case we want to experiment with transformer models
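
A minimal sketch of the first two points (the stand-in model and the log directory are placeholders):

    import torch
    from torch.utils.tensorboard import SummaryWriter  # native since PyTorch 1.2

    model = torch.nn.Linear(4, 2)  # stand-in for our network
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

    writer = SummaryWriter(log_dir="runs/pytorch12")
    writer.add_scalar("loss", 0.5, global_step=0)
    writer.close()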

validation data is not true validation data

Since the validation data is drawn from the same pool that is generated for the training set each epoch, the model may already have seen it in previous iterations. Furthermore, positions from the same game are highly correlated. We should either select only a subset of positions from each game or generate games exclusively for validation, so that the validation loss actually measures something the training loss does not.
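
A sketch of the first option, subsampling positions per game (the data layout is an assumption about create_data.py, not its actual format):

    import random

    def subsample_positions(games, positions_per_game=2, seed=0):
        """Pick a few positions per game to reduce within-game correlation."""
        rng = random.Random(seed)
        sample = []
        for game in games:  # each game: a list of (position, label) pairs
            k = min(positions_per_game, len(game))
            sample.extend(rng.sample(game, k))
        return sample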

creating elo ratings takes too long for many models

hex/elo/elo/create_ratings takes approximately n²/10000 seconds for n models on the server. That is far too long for extended training runs; an approximation would be preferable to the exact quadratic-time computation.
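
One possible direction (a sketch, not what create_ratings does today): rate each new model incrementally against only a window of recent models using the standard Elo update, which keeps the cost per new model constant instead of quadratic:

    def expected_score(r_a, r_b):
        """Standard Elo expected score of player A against player B."""
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def update_elo(ratings, results, k_factor=32):
        """Apply one Elo update per (model_a, model_b, score_a) result."""
        for a, b, score_a in results:
            exp_a = expected_score(ratings[a], ratings[b])
            ratings[a] += k_factor * (score_a - exp_a)
            ratings[b] += k_factor * ((1.0 - score_a) - (1.0 - exp_a))
        return ratings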

TensorboardX doesn't refresh data

@cleeff I think the refreshing problem with TensorboardX is that we don't flush the data to disk. This can be done with writer.flush() or writer.close(). I prefer the second method, as it creates a new events file after e.g. every epoch, which feels cleaner to me than writing everything into one file.
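
A sketch of the close-per-epoch variant (the training step is a stub standing in for our real loop):

    from tensorboardX import SummaryWriter

    def train_one_epoch():
        """Stub for the real training step; returns a dummy loss."""
        return 0.5

    for epoch in range(10):
        writer = SummaryWriter("runs/experiment")
        writer.add_scalar("loss", train_one_epoch(), global_step=epoch)
        writer.close()  # flushes to disk; the next writer starts a new events file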

simple tests

It would be good to have very basic tests which ensure that all functions can be called as expected and that no partial refactoring broke anything.

These tests could depend on sample_config.ini, which would also make sure the sample config stays up to date.
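
A minimal smoke test along those lines, assuming pytest and that sample_config.ini sits in the repository root (the section names are taken from the config sections listed in the Bayesian optimization issue below):

    import configparser

    def test_sample_config_parses():
        """sample_config.ini must exist, parse, and contain expected sections."""
        config = configparser.ConfigParser()
        assert config.read("sample_config.ini"), "sample_config.ini not found"
        assert "TRAIN" in config and "CREATE_DATA" in config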

add Bayesian Optimization for hyperparameter search

We still have little idea how close to optimal our parameters are. I am working on adding ax. Here is a list of candidate parameters for optimization (a sketch of the ax loop follows the list):

[TRAIN]
batch_size
learning_rate
epochs
weight_decay

[CREATE_DATA]
train_samples_per_model
temperature
temperature_decay
gamma

[REPEATED SELF TRAINING]
num_data_models
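
A minimal sketch of an ax loop over a few of these parameters (the objective is a dummy stand-in for a real train-and-validate run, and the bounds are guesses):

    from ax import optimize  # ax-platform

    def objective(params):
        """Stand-in: would train with `params` and return the validation loss."""
        return (params["learning_rate"] - 0.01) ** 2

    best_parameters, best_values, _, _ = optimize(
        parameters=[
            {"name": "learning_rate", "type": "range",
             "bounds": [1e-5, 1e-1], "log_scale": True},
            {"name": "batch_size", "type": "range",
             "bounds": [16, 512], "value_type": "int"},
            {"name": "weight_decay", "type": "range",
             "bounds": [1e-6, 1e-2], "log_scale": True},
        ],
        evaluation_function=objective,
        minimize=True,
        total_trials=20,
    )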

creation of the not already existing puzzle data does not work

"The creation of the not already existing puzzle data does not work for me. The problem seems to be that config cannot be deepcopied. When adding boardsize to config instead of puzzle_config everything seems to work. I do not know how to properly fix this."

Originally posted by @simonant in #22 (comment)

I created an issue because commit f49fe5c is outside the pull request. So you have to hard-code board_size into the train.py script? What error does the deepcopy throw?
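
A possible workaround, assuming the failure really comes from deepcopying the ConfigParser object itself (the section and key names here are illustrative, not verified against the code):

    import configparser
    import copy

    def config_to_dict(config):
        """Copy a ConfigParser into plain nested dicts, which deepcopy safely."""
        return {section: dict(config[section]) for section in config.sections()}

    config = configparser.ConfigParser()
    config.read("sample_config.ini")
    puzzle_config = copy.deepcopy(config_to_dict(config))  # safe to mutate
    puzzle_config.setdefault("CREATE_DATA", {})["board_size"] = "5"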

data creation slows down dramatically

I added logger.info(f"{len(board_states)}") after https://github.com/harbecke/hex/blob/c82fb86594e90d4d5acb4bf8b6a185bf3dc839e2/hex/creation/create_data.py#L34.

Results for batch_size=32 and samples_per_file=40000 are:

2019-07-18 19:46:47,530 - INFO - === creating data from self play ===
2019-07-18 19:46:49,915 - INFO - 3603
2019-07-18 19:46:52,931 - INFO - 3549
2019-07-18 19:46:58,871 - INFO - 3558
2019-07-18 19:47:06,903 - INFO - 3519
2019-07-18 19:47:16,569 - INFO - 3482
2019-07-18 19:47:28,136 - INFO - 3580
2019-07-18 19:48:16,708 - INFO - 3548
2019-07-18 19:49:23,159 - INFO - 3571
2019-07-18 19:50:48,866 - INFO - 3562
2019-07-18 19:52:26,660 - INFO - 3465
2019-07-18 19:54:13,377 - INFO - 3559
2019-07-18 19:56:16,024 - INFO - 3487

This is probably due to repeated memory allocation when tensors are concatenated in a loop.
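
If the cause is indeed growing a tensor with torch.cat inside the loop (an assumption about create_data.py), each concatenation copies everything collected so far, so total cost is quadratic. Collecting batches in a list and concatenating once is linear:

    import torch

    collected = []
    for _ in range(12):
        batch = torch.zeros(32, 3, 11, 11)  # stand-in for one batch of board states
        # slow alternative: all_states = torch.cat([all_states, batch]) each step
        collected.append(batch)
    all_states = torch.cat(collected)  # single concatenation at the end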

introduce learning rate scheduler

Early Bayesian optimization results suggest that the optimal parameters depend heavily on the training time. For most parameters it should be fine to set them to their long-training optimum and accept a small increase in training time, but this may not hold for learning_rate.
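
A minimal sketch with a built-in PyTorch scheduler (the stand-in model, step size, and decay factor are placeholders):

    import torch

    model = torch.nn.Linear(4, 2)  # stand-in for our network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

    for epoch in range(30):
        optimizer.step()   # the real training step would go here
        scheduler.step()   # halves the learning rate every 10 epochs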

model and optimizer shouldn't be saved together

There seem to be two optimizer parameters for every model parameter (Adam keeps two moving averages per weight), so model files become roughly three times as large with the optimizer attached.
Furthermore, saving the optimizer parameters after training should be optional.
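
A sketch of saving the two state dicts separately, with the optimizer opt-in (model, optimizer, and file names are placeholders):

    import torch

    model = torch.nn.Linear(4, 2)  # stand-ins for our model and optimizer
    optimizer = torch.optim.Adam(model.parameters())

    def save_checkpoint(model, optimizer=None, prefix="checkpoint"):
        """Save the model alone by default; the optimizer only on request."""
        torch.save(model.state_dict(), f"{prefix}_model.pt")
        if optimizer is not None:
            torch.save(optimizer.state_dict(), f"{prefix}_optimizer.pt")

    save_checkpoint(model)             # small file: model weights only
    save_checkpoint(model, optimizer)  # opt-in: roughly three times the data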

Implement Monte Carlo Tree Search

One can implement MCTS on top of the value model to improve playing strength. It is not intended for use during data generation because of the high computational cost.
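
A bare-bones UCT sketch with leaf evaluation by the value model instead of random rollouts; the game interface (legal_moves, play, is_terminal) and the perspective handling of value_fn are assumptions:

    import math

    class Node:
        def __init__(self, state):
            self.state = state    # assumed immutable game state
            self.children = {}    # move -> Node
            self.visits = 0
            self.value_sum = 0.0

    def uct_child(node, c=1.4):
        """Select the child with the highest UCT score."""
        def score(child):
            exploit = child.value_sum / (child.visits + 1e-8)
            explore = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-8))
            return exploit + explore
        return max(node.children.values(), key=score)

    def mcts(root_state, value_fn, legal_moves, play, is_terminal, simulations=100):
        """value_fn is assumed to score positions from the root player's
        perspective; sign flipping for alternating players is omitted here."""
        root = Node(root_state)
        for _ in range(simulations):
            node, path = root, [root]
            while node.children:              # selection: descend to a leaf
                node = uct_child(node)
                path.append(node)
            if not is_terminal(node.state):   # expansion: create all children
                for move in legal_moves(node.state):
                    node.children[move] = Node(play(node.state, move))
            value = value_fn(node.state)      # evaluation by the value model
            for n in path:                    # backpropagation
                n.visits += 1
                n.value_sum += value
        return max(root.children, key=lambda m: root.children[m].visits)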

remove / rewrite switch rule

The rule is difficult to support in interactive mode and makes things more complex in general. I don't think it contributes greatly to the appeal of the AI experiment.

suggestion: simply remove

simple self improvement flow

This is a feature proposal for the following flow (sketched in code after the list).

Starting with an initial random model, repeat:

  • generate N data points using self-play of the current best agent
  • train a new agent on this data
  • if it beats the old agent in at least 55% of k games, accept it as the new champion; otherwise train it on more data
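
A sketch of that loop in code; every helper here is a stub standing in for real self-play, training, and evaluation:

    import random

    def generate_self_play_data(model, n):
        """Stub: would generate n positions via self-play of `model`."""
        return [(model, i) for i in range(n)]

    def train_new_agent(data):
        """Stub: would train a fresh network on `data`."""
        return len(data)

    def win_rate(challenger, champion, k):
        """Stub: would play k games and return the challenger's win rate."""
        return random.random()

    def self_improvement_loop(initial_model, iterations=3, n=1000, k=100,
                              threshold=0.55):
        champion = initial_model
        for _ in range(iterations):
            data = generate_self_play_data(champion, n)
            challenger = train_new_agent(data)
            # keep training on more data until the challenger clearly wins
            while win_rate(challenger, champion, k) < threshold:
                data += generate_self_play_data(champion, n)
                challenger = train_new_agent(data)
            champion = challenger  # accept as the new champion
        return champion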

data is in overcomplicated format

board_tensor has three channels where one or two should suffice. The tensor should be transposed after a player makes a move, so that the model always plays the board in one direction. This removes the need for the VerticalWrapperModel.
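
A sketch of a two-channel board with the transpose trick (the exact channel convention is an assumption):

    import torch

    def apply_move(board, row, col):
        """Two channels: 0 = stones of the player to move, 1 = opponent's.

        After placing the stone, swap the channels and transpose rows and
        columns, so the next player also sees itself as channel 0 connecting
        the same pair of board edges. No VerticalWrapperModel needed.
        """
        board = board.clone()
        board[0, row, col] = 1.0
        return board.flip(0).transpose(1, 2)

    board = torch.zeros(2, 11, 11)
    board = apply_move(board, 5, 5)  # now seen from the other player's perspective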
