nim-MCTS

Monte Carlo tree search (MCTS) with Upper Confidence bounds for Trees (UCT), written in Nim.

UCT searches for good moves by random sampling. Every iteration has three stages:

1. Walk down the tree of tried moves, in upper confidence (UCT) order, until a node with an untried move is found.
2. Play the untried move, then finish the game by playing random moves.
3. When the game is finished, walk back up the tree, updating the win and visit counts that the upper confidence (UCT) values are computed from.

This algorithm will explore untried moves first, then moves that look promising for whichever player is choosing the move.
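
The three stages can be sketched in a few lines. The following is a minimal illustration in Python, not the code from this repository: the toy game `CountdownState` (players alternately take 1 to 3 stones, and whoever takes the last stone wins), the `Node` class and `uct_search` are all hypothetical names chosen for the example. It assumes the logarithm in the UCT formula is taken over the parent's visit count.

```python
import math
import random


class CountdownState:
    """Hypothetical toy game: take 1-3 stones; taking the last stone wins."""

    def __init__(self, stones, player_just_moved=2):
        self.stones = stones
        self.player_just_moved = player_just_moved

    def clone(self):
        return CountdownState(self.stones, self.player_just_moved)

    def moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def play(self, move):
        self.stones -= move
        self.player_just_moved = 3 - self.player_just_moved

    def result(self, player):
        # Only meaningful at game end: the last player to move has won.
        return 1.0 if player == self.player_just_moved else 0.0


class Node:
    def __init__(self, state, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children = []
        self.untried = state.moves()
        self.player_just_moved = state.player_just_moved
        self.wins, self.visits = 0.0, 0

    def best_child(self, uctk):
        # Assumption: log is taken over this (parent) node's visit count.
        return max(self.children, key=lambda c: c.wins / c.visits
                   + uctk * math.sqrt(2 * math.log(self.visits) / c.visits))


def uct_search(root_state, iterations, uctk=1.0, rng=random):
    root = Node(root_state)
    for _ in range(iterations):
        node, state = root, root_state.clone()
        # Stage 1: walk down fully expanded nodes in UCT order.
        while not node.untried and node.children:
            node = node.best_child(uctk)
            state.play(node.move)
        # An untried move was found: play it and add it to the tree.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            state.play(move)
            child = Node(state, move, node)
            node.children.append(child)
            node = child
        # Stage 2: finish the game with random moves.
        while state.moves():
            state.play(rng.choice(state.moves()))
        # Stage 3: walk back up, updating wins and visits.
        while node is not None:
            node.visits += 1
            node.wins += state.result(node.player_just_moved)
            node = node.parent
    # Return the most visited move at the root.
    return max(root.children, key=lambda c: c.visits).move
```

Each node stores wins from the point of view of the player who just moved into it, so a parent picking its best child is picking what looks best for the player choosing the move. A call such as `uct_search(CountdownState(5), 3000)` returns the move the search visited most often from a pile of 5 stones.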

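UCT is evaluated for each child node as wins/visits + UCTK*sqrt(2*log(parent_visits)/visits), where wins and visits are the child's own counts and parent_visits is the number of visits to its parent.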

UCTK is a constant that tunes exploration against exploitation. A higher constant favors exploration (as it approaches infinity, the search picks the least visited node), while a value of zero simply picks the node with the best win rate so far.
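
To make the effect of UCTK concrete, here is a small Python sketch. The function `uct_score`, the helper `pick` and the statistics for the two children are hypothetical, and the sketch assumes the logarithm is taken over the parent's visit count, as in the standard UCT formulation.

```python
import math


def uct_score(wins, visits, parent_visits, uctk):
    """UCT value of a child; parent_visits is its parent's visit count."""
    exploit = wins / visits
    explore = math.sqrt(2 * math.log(parent_visits) / visits)
    return exploit + uctk * explore


# Hypothetical statistics for two children of a node visited 100 times.
CHILDREN = {
    "strong": {"wins": 60, "visits": 90},  # well explored, good win rate
    "rare": {"wins": 3, "visits": 10},     # barely explored, poor win rate
}


def pick(uctk, parent_visits=100):
    """Return the name of the child with the highest UCT score."""
    return max(
        CHILDREN,
        key=lambda name: uct_score(
            CHILDREN[name]["wins"], CHILDREN[name]["visits"],
            parent_visits, uctk),
    )
```

With `uctk=0` only the win rate counts, so `pick(0)` selects `"strong"`. With `uctk=1.0` the exploration term of the rarely visited child outweighs its poor win rate, and `pick(1.0)` selects `"rare"`.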

I also added an option to run a heuristic search. TODO - read the literature, and find a good way to do this.

With a very simple heuristic, a player using 5 search iterations beats one using 1000 iterations in Othello, which is a pretty good speedup.

Another speedup would be Zobrist hashing. This hashes the states that have already been seen, so they can be recognized again (many games have multiple paths to the same state). Zobrist hashing works by XORing together one random key per piece and position, so the entire hash need not be recalculated every move: when you move a piece, you XOR it out of its current position, then XOR it into the next position.
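
The incremental update is easiest to see in code. This is a minimal Python sketch of the general technique, not something implemented in this repository; the table layout, the names `ZOBRIST`, `full_hash` and `move_piece`, and the `{square: piece}` board representation are assumptions made for the example. A real implementation would usually also mix in a key for the side to move, which is left out here.

```python
import random

BOARD_SQUARES = 64           # e.g. an 8x8 board
PIECES = ("black", "white")

_rng = random.Random(2024)   # fixed seed so the key table is reproducible
# One random 64-bit key per (square, piece) pair.
ZOBRIST = {
    (square, piece): _rng.getrandbits(64)
    for square in range(BOARD_SQUARES)
    for piece in PIECES
}


def full_hash(board):
    """Hash a {square: piece} board by XORing the key of every occupied square."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[(square, piece)]
    return h


def move_piece(h, piece, src, dst):
    """Update hash h incrementally when piece moves from src to dst."""
    h ^= ZOBRIST[(src, piece)]  # XOR the piece out of its current square
    h ^= ZOBRIST[(dst, piece)]  # XOR it into its new square
    return h
```

Because XOR is its own inverse, the incremental update gives the same value as rehashing the whole board, and undoing a move restores the previous hash. In a game like Othello, where discs are placed and flipped rather than moved, the same idea applies: flipping a disc XORs out the key for one colour and XORs in the key for the other on the same square.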

To run Othello:

nimrod compile --run -d:release --p:../ Othello.nim
