orez-ahpla

The AlphaZero algorithm applied to the game Othello (also known as Reversi).

Original paper: https://www.nature.com/articles/nature24270.epdf?author_access_token=VJXbVjaSHxFoctQQ4p2k4tRgN0jAjWel9jnR3ZoTv0PVW4gB86EEpGqTRDtpIz-2rmo8-KG06gqVobU5NSCFeHILHcVFUeMsbvwS-lxjqQGg98faovwjxeTUgZAUMnRQ

MCTS framework from: https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/

Bitboard algorithm to find possible moves from: http://eprints.qut.edu.au/85005/1/__staffhome.qut.edu.au_staffgroupm$_meaton_Desktop_bits-7.pdf

Bitboard algorithm to reflect over diagonals from: https://www.chessprogramming.org/index.php?title=Flipping_Mirroring_and_Rotating&mobileaction=toggle_view_mobile#FlipabouttheDiagonal

Neural network architecture adapted from: https://web.stanford.edu/~surag/posts/alphazero.html

Changelog

0.1.0 - 2/1/19

Added

  • Basic implementation of MCTS for Othello.
  • Basic implementation of MCTS for tic-tac-toe (in /ttt).

0.1.1 - 2/1/19

Added

  • Links to sources used.
  • Punctuation to this log.

Changed

  • Moved ttt mcts out of /ttt.

Removed

  • /ttt.

0.2.0 - 2/2/19

Added

  • Bitboard implementation of Othello 1 (show_poss.py, tested with AI Grader).
  • 64-char string to bitboard conversion methods.
  • Display methods for bitboards.
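The conversion and display methods listed above could look roughly like the sketch below. The character encoding ('X', 'O', '.'), the bit ordering (string index i maps to bit i), and the function names are assumptions made for illustration; the changelog does not specify them.

```python
# Assumed encoding (not confirmed by the repo): 'X' = black, 'O' = white,
# '.' = empty, and string index i maps to bit i of each integer.
BLACK, WHITE, EMPTY = "X", "O", "."


def str_to_bitboards(board):
    """Convert a 64-char board string into a (black, white) pair of ints."""
    if len(board) != 64:
        raise ValueError("expected a 64-character board string")
    black = white = 0
    for i, ch in enumerate(board):
        if ch == BLACK:
            black |= 1 << i
        elif ch == WHITE:
            white |= 1 << i
    return black, white


def bitboards_to_rows(black, white):
    """Render two bitboards as eight 8-character rows for display."""
    cells = [
        BLACK if (black >> i) & 1 else WHITE if (white >> i) & 1 else EMPTY
        for i in range(64)
    ]
    return ["".join(cells[r * 8:(r + 1) * 8]) for r in range(8)]


# Standard Othello opening under the assumed layout.
START = "." * 27 + "OX" + "." * 6 + "XO" + "." * 27

if __name__ == "__main__":
    print("\n".join(bitboards_to_rows(*str_to_bitboards(START))))
```

Keeping one integer per colour means every later operation (move generation, scoring, reflection) can work on plain Python ints.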

0.3.0 - 2/3/19

Added

  • Bitboard implementation of Othello 3 (make_moves.py, tested with AI Grader).
  • Bitboard to str conversion methods.
  • Bitboard implementation of get_score.

Changed

  • 0.2.0 README section wording.
  • Streamlined get_poss to match new place method (in make_moves.py).
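For readers unfamiliar with bitboard move generation, here is a minimal sketch of the usual shift-and-mask approach. It is not the code in make_moves.py: the bit layout (bit 8*row + col), the mask names, and the helper bits_to_indices are assumptions, and only the name get_poss comes from this log.

```python
FULL = 0xFFFFFFFFFFFFFFFF
NOT_COL_0 = 0xFEFEFEFEFEFEFEFE  # every square except column 0
NOT_COL_7 = 0x7F7F7F7F7F7F7F7F  # every square except column 7

# One-step shifts in the eight directions. The column masks drop bits that
# would otherwise wrap around a board edge and also keep results to 64 bits.
SHIFTS = (
    lambda b: (b << 1) & NOT_COL_0,   # east
    lambda b: (b >> 1) & NOT_COL_7,   # west
    lambda b: (b << 8) & FULL,        # south
    lambda b: b >> 8,                 # north
    lambda b: (b << 9) & NOT_COL_0,   # south-east
    lambda b: (b << 7) & NOT_COL_7,   # south-west
    lambda b: (b >> 7) & NOT_COL_0,   # north-east
    lambda b: (b >> 9) & NOT_COL_7,   # north-west
)


def get_poss(own, opp):
    """Return a bitboard of legal moves for the player owning `own`."""
    empty = ~(own | opp) & FULL
    moves = 0
    for shift in SHIFTS:
        # Opponent discs directly next to one of our discs in this direction.
        run = shift(own) & opp
        # Extend the run; a line holds at most six bracketed opponent discs.
        for _ in range(5):
            run |= shift(run) & opp
        # An empty square just past the run is a legal move.
        moves |= shift(run) & empty
    return moves


def bits_to_indices(bb):
    """Hypothetical helper: set of square indices whose bits are set."""
    return {i for i in range(64) if (bb >> i) & 1}
```

With black on squares 28 and 35 and white on 27 and 36 (the opening under this layout), bits_to_indices(get_poss(black, white)) gives the four opening moves for black.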

0.3.1 - 2/3/19

Changed

  • Turned start() and s_board_to_bitboard() into one-liners.

Removed

  • Debug print/displays in show_poss and make_moves.

0.4.0 - 2/3/19

Added

  • Bitboard implementation of Othello 6 (terminal_alphabeta.py, tested with AI Grader, 93.5).
  • State recursion for get_poss.
  • To-dos.

Changed

  • Made bit_moves_to_s_moves() return a set rather than a list.
  • get_score() now returns score as a single +/- int from perspective of token rather than a pair of ints.
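The signed score described above can be sketched as a difference of popcounts. The (own, opp) argument order and the popcount helper are assumptions for illustration, not the repo's actual signature.

```python
def popcount(bb):
    """Number of set bits, i.e. discs on the board for one colour."""
    return bin(bb).count("1")


def get_score(own, opp):
    """Signed disc difference from the perspective of the player owning `own`.

    Positive means that player is ahead, negative means behind, 0 is level.
    """
    return popcount(own) - popcount(opp)
```

A single signed int is convenient for alpha-beta search, since the opponent's score is just the negation.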

0.4.1 - 2/3/19

Changed

  • bit_moves_to_s_moves renamed as bit_poss_to_moves.
  • bit_poss_to_moves now integrated into get_poss.

Removed

  • To-dos (impractical).

0.4.2 - 2/3/19

Added

  • Speed tested bitboard get_poss vs string get_poss (bitwise is faster by a factor of 5, results in speed_test.py).

0.4.3 - 2/3/19

Added

  • Bitboards to check corners and center 4 squares added.
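As an illustration, corner and center masks can be written as constants and combined with a bitwise AND. The bit layout (bit 8*row + col) and all names below are assumptions, not taken from the repo.

```python
# Assumed layout: bit index = 8 * row + col, so the corners are 0, 7, 56, 63.
CORNERS = (1 << 0) | (1 << 7) | (1 << 56) | (1 << 63)
# The four center squares: (3,3), (3,4), (4,3), (4,4).
CENTER = (1 << 27) | (1 << 28) | (1 << 35) | (1 << 36)


def count_in_mask(bb, mask):
    """How many of the discs in `bb` sit on squares selected by `mask`."""
    return bin(bb & mask).count("1")
```

For example, count_in_mask(own, CORNERS) could feed a positional evaluation term.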

0.5.0 - 2/3/19

Added

  • Bitboard implementation of MCTS (doubled the number of positions searched in 5 s).

Changed

  • Tweaked cutoff val from 0.01 to 0.1 in both MCTS searches.
  • Fixed small bug in make_moves.py.

0.6.0 - 2/6/19

Added

  • Bitboard reflections across diagonals implemented.
  • Reflections added to state recursion in get_poss; unsure whether this improves speed.
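The diagonal reflections can be done with the delta-swap technique from the chessprogramming.org page linked above. The sketch below ports that technique to Python ints; the function names and the convention that bit 8*r + c is row r, column c are assumptions, and the repo's own versions may differ.

```python
def flip_main_diagonal(x):
    """Reflect so that bit 8*r + c moves to bit 8*c + r (a transpose)."""
    k1 = 0x5500550055005500
    k2 = 0x3333000033330000
    k4 = 0x0F0F0F0F00000000
    t = k4 & (x ^ (x << 28))
    x ^= t ^ (t >> 28)
    t = k2 & (x ^ (x << 14))
    x ^= t ^ (t >> 14)
    t = k1 & (x ^ (x << 7))
    x ^= t ^ (t >> 7)
    return x


def flip_anti_diagonal(x):
    """Reflect so that bit 8*r + c moves to bit 8*(7 - c) + (7 - r)."""
    k1 = 0xAA00AA00AA00AA00
    k2 = 0xCCCC0000CCCC0000
    k4 = 0xF0F0F0F00F0F0F0F
    t = x ^ (x << 36)
    x ^= k4 & (t ^ (x >> 36))
    t = k2 & (x ^ (x << 18))
    x ^= t ^ (t >> 18)
    t = k1 & (x ^ (x << 9))
    x ^= t ^ (t >> 9)
    return x
```

The 64-bit masks keep every intermediate result within 64 bits, so no extra truncation is needed even though Python ints are unbounded.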

Removed

  • To-dos.

0.6.1 - 2/6/19

Removed

  • Bitboard reflections removed from state recursion dicts (was slower).

0.7.0 - 2/9/19

Changed

  • Organized files to start implementing neural net.
  • Fixed UCB-selection in bit_mcts.py.
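This log does not record what the UCB fix was. For reference, a generic UCB1 selection step looks roughly like the sketch below; the tuple layout, the function names, and the exploration constant c=1.4 are assumptions and are not taken from bit_mcts.py.

```python
import math


def ucb1(wins, visits, parent_visits, c=1.4):
    """UCB1 value of a child: average reward plus an exploration bonus."""
    if visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=1.4):
    """Pick the move with the highest UCB1 value.

    `children` is a list of (move, wins, visits) tuples (hypothetical layout).
    """
    parent_visits = sum(visits for _, _, visits in children)
    best = max(children, key=lambda ch: ucb1(ch[1], ch[2], parent_visits, c))
    return best[0]
```

The exploration bonus shrinks as a child is visited more often, so rarely tried moves still get sampled.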

0.8.0 - 2/9/19

Added

  • New folder (/nnet) for neural net implementation.
  • Modified MCTS to use policy/eval funcs.
  • Separated game and display functions.

Changed

  • (ONLY IN /NNET)
  • Game methods take a single state arg instead of separate board and token args.
  • get_poss returns a list.

0.8.1 - 2/9/19

Changed

  • (ONLY IN /NNET)
  • Shortened bit_poss_to_moves.
  • Added stochastic play option for early game mcts.
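One common way to make early-game play stochastic is to sample a move in proportion to its MCTS visit count instead of always taking the most visited move. The sketch below shows that idea only; the function name, the dict layout, and the `stochastic` flag are assumptions rather than the repo's actual interface.

```python
import random


def choose_move(visit_counts, stochastic, rng=random):
    """Pick a move from a {move: visit_count} dict (hypothetical layout).

    Deterministic mode returns the most visited move. Stochastic mode samples
    a move with probability proportional to its visit count.
    """
    moves = list(visit_counts)
    if not stochastic:
        return max(moves, key=visit_counts.get)
    weights = [visit_counts[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

Passing a seeded random.Random instance as rng makes self-play games reproducible when debugging.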

0.8.2 - 2/10/19

Changed

  • Changed is_terminal to return -1, 0, or +1.

0.9.0 - 2/11/19

Added

  • A working neural net for policy/eval funcs (neural_net.py). It currently assesses positions in conjunction with MCTS, albeit randomly (the net is not yet trained).
  • To-do.

1.0.0 - 2/13/19

Added

  • Methods for saving/loading weights of nnet.
  • Self_play.py, which handles the self_play for training the nnet.
  • Methods for saving/loading training_examples from self play.
  • Verbose/training flags for play_game (maybe this method is overloaded...).

Changed

  • Made get_probs return a list of length 65 for probs.
  • Renamed mcts to player (more accurate).
  • Passes now indicated by 65 rather than -1 (to match prob vector).
  • Combined get_probs with get_best_move for clarity.
  • Merged display into game (no reason for sep file).
  • Reflect funcs now take a state arg rather than a single bitboard arg.

Removed

  • To-dos.

1.0.1 - 2/14/19

Changed

  • To-dos.

1.0.2 - 2/17/19

Added

  • Reflecting boards/pis for training data.
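Reflecting a training example means applying the same square permutation to the board and to the policy vector pi. A minimal sketch, assuming bit 8*r + c for row r and column c, a length-65 pi whose last entry is the pass probability, and hypothetical function names:

```python
def transpose(i):
    """Square index after reflecting across the main diagonal."""
    r, c = divmod(i, 8)
    return 8 * c + r


def anti_transpose(i):
    """Square index after reflecting across the anti-diagonal."""
    r, c = divmod(i, 8)
    return 8 * (7 - c) + (7 - r)


def remap_bits(bb, index_map):
    """Move every set bit of a bitboard to its reflected square."""
    out = 0
    for i in range(64):
        if (bb >> i) & 1:
            out |= 1 << index_map(i)
    return out


def remap_pi(pi, index_map):
    """Permute the 64 board entries of pi; the pass entry stays last."""
    out = [0.0] * 65
    for i in range(64):
        out[index_map(i)] = pi[i]
    out[64] = pi[64]
    return out


def augment(black, white, pi):
    """Return the original example plus its two diagonal reflections."""
    examples = [(black, white, list(pi))]
    for index_map in (transpose, anti_transpose):
        examples.append((
            remap_bits(black, index_map),
            remap_bits(white, index_map),
            remap_pi(pi, index_map),
        ))
    return examples
```

The Othello opening position is unchanged by both diagonal reflections, which is why these symmetries map legal positions to legal positions with the same side to move.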

1.0.3 - 2/17/19

Changed

  • run_adversarial_episode() now returns win_pct and a win/loss/tie record.

1.0.4 - 2/17/19

Added

  • latest_weights.h5 added with git LFS.

Training Results

  • After 100 games of self_play (~6000 states for training), the nnet player beat a completely random player in 18/25 games and beat a random MCTS-based player in 17/25 games.

1.0.5 - 3/6-8/19

Added

  • Full training/testing loop established in self_play.py (for testing on K80s).
  • Saving of nnet pre-training iteration for testing purposes.
  • Hyperparameter tuning begun.

Changed

  • Rand_MCTS given same parameters as normal self_player.
  • Debug statements.
  • Saves training examples after full eps of self-play.
  • Takes training examples from the last several episodes' worth of games rather than from only the last episode.

Training Results

  • Further testing (around 10 full training blocks) with settings similar to those in 1.0.4 yielded up-and-down performance, both against the random player and against previous selves. A win_pct check needs to be implemented.

1.0.6 - 3/19/19

Added

  • Win percentage check against previous selves (to prevent decrease in performance).
  • More hyperparameters.
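A win-percentage check of this kind usually keeps the newly trained model only if it scores well enough against the previous one. The sketch below illustrates the idea; the function name, the 0.55 threshold, and counting ties as half a win are all assumptions, not the values used in self_play.py.

```python
def should_accept_new_model(wins, losses, ties, threshold=0.55):
    """Decide whether a newly trained model replaces the previous one.

    Ties count as half a win (an assumption). With no games played the new
    model is rejected, since there is no evidence that it is better.
    """
    games = wins + losses + ties
    if games == 0:
        return False
    win_pct = (wins + 0.5 * ties) / games
    return win_pct >= threshold
```

If the check fails, training would continue from the previous weights, which guards against the up-and-down performance noted in 1.0.5.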

Changed

  • Adversarial episodes now take a number of games rather than a number of pairs of games (odd numbers are rounded down).
  • NNet instantiation not verbose anymore.
  • Last_nnet in self_play.py is a NeuralNet object now rather than a Keras model.
  • Hyperparameters.

Removed

  • Old debug comments.

1.0.7 - 4/5/19

Changed

  • NeuralNet class removed because instantiating Keras models in a loop is memory-intensive.
  • Player class and self_play.py changed to reflect this class removal.
  • Most instances of "nnet" replaced by "model" for naming accuracy.

1.0.8 - 5/4/19

Added

  • Games played in self_play loop are now saved in saved_games.
  • Helper script see_old_games.py added to view saved games (not yet working).

To-do

  • Rework state recursion
  • Add folder for saved self_play games (as lists of moves)
  • Figure out systematic way to load/save/clear training examples
  • Since terminal_alphabeta will be used in competitive play, enable it during self-play and remove the training examples that alphabeta already covers

Contributors

kfu02
