Temporal difference learning for 2048

License: MIT License

2048 Temporal Difference Learning

A 2048 agent with N-Tuple Network trained using Backward Temporal Coherence Learning.

Benchmark (Intel® Core™ i5-8300H Processor)

Score

1 ply uses the trained model directly, without any tree search.

3 ply is a depth-2 expectimax search that uses the trained model as its evaluation function.

Depth  Games   Score   % 32768  % 16384  % 8192
1 ply  100000  201381  3.62     37.14    73.86
3 ply  1000    419479  35.7     80.2     95.5

You can achieve similar results with:

# Build the program with the 4x6 tuple network
make STRUCTURE=nw4x6

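# Train the network for 1,000,000 episodes (-e counts thousands of games), α = 1.0, λ = 0.5,
# with temporal coherence (TC) learning and the restart strategy, writing the model to a binary file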
./2048 train -e 1000 -a 1.0 -l 0.5 -c -o -r

# Run the agent for 10000 games with no search
./2048 agent -e 10000

# Run the agent for 300 games with 3 ply search
./2048 agent -e 300 -d 2

Tuple networks

Structure  Size    Max tile  Speed (1 thread)  Speed (8 threads)  Speed (3 ply, 8 threads)
nw5x4      1.25MB  8192      5901430 moves/s   20643753 moves/s   40542 moves/s
nw4x5      16MB    16384     5560291 moves/s   18563599 moves/s   50428 moves/s
nw6x5      24MB    16384     3905823 moves/s   12509119 moves/s   37908 moves/s
nw4x6      256MB   32768     2553369 moves/s   8381214 moves/s    40051 moves/s
nw8x6      512MB   32768     1525958 moves/s   4279976 moves/s    19764 moves/s

Features

  • Backward TD(λ) learning.
  • N-Tuple Network with configurable structure.
  • Optional temporal coherence learning and restart strategy.
  • Expectimax search with configurable depth.
  • Webview based GUI application.
  • Multi-threaded training and evaluation.
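As a rough illustration of the N-Tuple Network feature, here is a minimal sketch of a value function built from lookup tables. It is not taken from the 2048-tdl source: the names `Tuple`, `Network`, `index`, `value` and `update`, the board layout (16 cells of 4 bits each) and the plain one-step update are all assumptions made for the example. Under these assumptions, a structure such as nw4x6 would be 4 tuples of 6 cells with 16^6 four-byte weights each, which matches the 256MB size listed in the benchmark table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed layout: 16 cells x 4 bits, each nibble holds log2(tile), 0 = empty.
using Board = std::uint64_t;

// One tuple: a fixed list of cell positions plus a weight table with 16^n entries.
struct Tuple {
    std::vector<int> cells;
    std::vector<float> weights;

    explicit Tuple(std::vector<int> c)
        : cells(std::move(c)),
          weights(std::size_t{1} << (4 * cells.size()), 0.0f) {}

    // Pack the selected cells into a table index, 4 bits per cell.
    std::size_t index(Board b) const {
        std::size_t idx = 0;
        for (int c : cells) idx = (idx << 4) | ((b >> (4 * c)) & 0xF);
        return idx;
    }
};

struct Network {
    std::vector<Tuple> tuples;

    // Value estimate: the sum of one weight per tuple.
    float value(Board b) const {
        float v = 0.0f;
        for (const Tuple& t : tuples) v += t.weights[t.index(b)];
        return v;
    }

    // Move the estimate toward `target` with learning rate `alpha`,
    // splitting the correction evenly across the tuples.
    void update(Board b, float target, float alpha) {
        assert(!tuples.empty());
        const float step =
            alpha * (target - value(b)) / static_cast<float>(tuples.size());
        for (Tuple& t : tuples) t.weights[t.index(b)] += step;
    }
};
```

Because the tuples are fixed when the network is built, evaluating a board costs one table read per tuple. A real agent would typically also share weights across board symmetries and use eligibility traces for TD(λ), both of which this sketch leaves out.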

To achieve high speed and fast learning, both the agent and training code are heavily optimized:

  • 64-bit bitboard representation.
  • Table lookup for movement and reward.
  • Transposition table with Zobrist hashing.
  • Bit optimizations.
  • Efficient N-Tuple Network implementation with static structure.
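The first two optimizations in the list above can be sketched together. The code below is an illustration under stated assumptions, not the project's implementation: it assumes each cell is a 4-bit nibble holding log2 of the tile (0 for empty), that the lowest nibble of a row is its leftmost cell, and that a merge is rewarded with the value of the new tile. The names `slide_left`, `row_table` and `move_left` are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Board = std::uint64_t;  // 4 rows x 16 bits
using Row = std::uint16_t;    // 4 cells x 4 bits, nibble = log2(tile)

struct RowMove {
    Row result;            // row after sliding left
    std::uint32_t reward;  // points gained by merges in this row
};

// Slide one row to the left, merging each pair of equal neighbours once.
RowMove slide_left(Row row) {
    std::array<unsigned, 4> cell{}, out{};
    for (int i = 0; i < 4; ++i) cell[i] = (row >> (4 * i)) & 0xF;
    std::uint32_t reward = 0;
    int n = 0;
    bool can_merge = false;
    for (int i = 0; i < 4; ++i) {
        if (cell[i] == 0) continue;
        if (can_merge && out[n - 1] == cell[i] && cell[i] < 15) {
            ++out[n - 1];                // two 2^k tiles become one 2^(k+1) tile
            reward += 1u << out[n - 1];
            can_merge = false;           // a merged tile cannot merge again
        } else {
            out[n++] = cell[i];
            can_merge = true;
        }
    }
    Row result = 0;
    for (int i = 0; i < 4; ++i)
        result = static_cast<Row>(result | (out[i] << (4 * i)));
    return {result, reward};
}

// Precompute the outcome for all 65536 possible rows on first use.
const std::vector<RowMove>& row_table() {
    static const std::vector<RowMove> table = [] {
        std::vector<RowMove> t(65536);
        for (std::uint32_t r = 0; r < 65536; ++r)
            t[r] = slide_left(static_cast<Row>(r));
        return t;
    }();
    return table;
}

// Move the whole board left with four table lookups, adding merge points to `reward`.
Board move_left(Board board, std::uint32_t& reward) {
    const std::vector<RowMove>& table = row_table();
    assert(table.size() == 65536);
    Board next = 0;
    for (int r = 0; r < 4; ++r) {
        const RowMove& m = table[(board >> (16 * r)) & 0xFFFF];
        next |= static_cast<Board>(m.result) << (16 * r);
        reward += m.reward;
    }
    return next;
}
```

With this layout a full move reduces to four array reads. The other three directions can reuse the same idea with mirrored or transposed rows, which is omitted here for brevity.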

Usage

Download and unzip the trained model (8x6 tuple network) here.

Build

GCC and GNU Make are required to build the program. Currently, the makefile only supports Linux. Building on other platforms should be possible but is untested. A cross-platform build script is planned.

To build the GUI, install the GTK+ development libraries and webkit2gtk-4.0, and enable the submodule.

Building the program on Google Colaboratory is also supported if you disable the GUI.

Build steps

make [Options]

Parameters:

  • STRUCTURE: The tuple network structure. (default: nw4x6)
  • ENABLE_GUI: Enable the GUI. (default: true)
  • EXTRAS: Extra compiler options for profiling, etc.

Available structures: nw5x4, nw4x5, nw6x5, nw4x6, nw8x6 (see the Tuple networks table in the benchmark section).

Train model

./2048 train [Options]

Parameters:

-a <alpha>    -- Set the learning rate
                 default: 0.1
-l <lambda>   -- Set the trace decay
                 default: 0.5
-e <episodes> -- Set the number of training games * 1000
                 default: 1
-t <threads>  -- Set the number of threads
                 default: 1 (0 uses all threads)
-i            -- Enable reading from a binary file
-o            -- Enable writing to a binary file
-c            -- Enable temporal coherence learning
-r            -- Enable restart strategy
-h            -- Show this message
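Two of the options above have semantics that are easy to misread: -e is given in thousands of games, and -t 0 means "use all threads". The sketch below shows one way such values could be resolved. The helper names `total_games` and `resolve_threads` are hypothetical and are not taken from the program's source.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// -e <episodes>: the argument counts thousands of training games.
std::uint64_t total_games(std::uint64_t episodes_arg) {
    assert(episodes_arg <= UINT64_MAX / 1000);  // guard against overflow
    return episodes_arg * 1000;
}

// -t <threads>: 0 selects every hardware thread the system reports.
unsigned resolve_threads(unsigned requested) {
    if (requested != 0) return requested;
    // hardware_concurrency() may return 0 when the count is unknown.
    const unsigned hw = std::thread::hardware_concurrency();
    return hw != 0 ? hw : 1;
}
```

Under this reading, `-e 1000` trains for 1,000,000 games, and the default `-e 1` trains for 1,000.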

Run agent

./2048 agent [Options]

Parameters:

-d <depth>    -- Set the search depth
                 default: 0
-e <episodes> -- Set the number of games to play
                 default: 1
-t <threads>  -- Set the number of threads
                 default: 1 (0 uses all threads)
-g            -- Enable the GUI
-h            -- Show this message

Example:

./2048 agent -d 2 -e 100 -t 0 # 3 ply, 100 games, multi-threaded
./2048 agent -d 4 -g          # 5 ply, enable GUI
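The depth-to-ply mapping used in this README (depth 0 is 1 ply, depth 2 is 3 ply, depth 4 is 5 ply) is consistent with counting the root move as the first ply and searching `depth` further levels below it. The following is a minimal expectimax sketch over a hand-built tree that follows this reading. The `Node` type and the functions `expectimax` and `best_move` are assumptions for illustration. The actual agent works on bitboards, and this sketch ignores the rewards collected along the way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical game-tree node, used only for this illustration.
struct Node {
    bool chance = false;         // true: random tile spawn, false: player move
    double eval = 0.0;           // stand-in for the trained model's estimate
    std::vector<Node> children;  // successor positions
    std::vector<double> prob;    // spawn probabilities (chance nodes only)
};

// Value of `node` when `depth` more levels may be searched below it.
double expectimax(const Node& node, int depth) {
    if (depth <= 0 || node.children.empty()) return node.eval;
    if (node.chance) {
        // Chance node: probability-weighted average over the spawns.
        assert(node.prob.size() == node.children.size());
        double expected = 0.0;
        for (std::size_t i = 0; i < node.children.size(); ++i)
            expected += node.prob[i] * expectimax(node.children[i], depth - 1);
        return expected;
    }
    // Player node: take the best available move.
    double best = expectimax(node.children[0], depth - 1);
    for (std::size_t i = 1; i < node.children.size(); ++i)
        best = std::max(best, expectimax(node.children[i], depth - 1));
    return best;
}

// Choose the root move. The root choice is the first ply,
// so a search depth of d looks d + 1 plies ahead.
std::size_t best_move(const Node& root, int depth) {
    assert(!root.chance && !root.children.empty());
    std::size_t best = 0;
    double best_value = expectimax(root.children[0], depth);
    for (std::size_t i = 1; i < root.children.size(); ++i) {
        const double v = expectimax(root.children[i], depth);
        if (v > best_value) {
            best_value = v;
            best = i;
        }
    }
    return best;
}
```

With depth 0, `best_move` simply picks the child with the highest model estimate, which corresponds to the 1 ply row in the benchmark. Each extra level of depth alternates between averaging over spawns and maximising over moves.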

Example game with the GUI:

Todo

  • Add multi-stage learning
  • Add more settings to the GUI application

License

This app is licensed under the MIT license.
