lukasmyth96 / piggy

Using Value Iteration and Policy Iteration to discover the optimal solution for the strategic dice game PIG. Ultimately interested in whether the optimal solution can be reached through self-play alone.

Python 100.00%
reinforcement-learning dice-game pig value-iteration policy-iteration self-play

piggy's Introduction

Piggy

The Rules

Pig is a strategic, two-player dice game. The rules are simple - each turn, a player repeatedly rolls a die until they either roll a 1 or they decide to 'hold':

  • If a 1 is rolled, the player loses all points accumulated that turn and it becomes the other player's turn
  • If any other number is rolled, that number gets added to their turn total and the player decides whether to roll again or hold
  • If a player chooses to hold, their turn total is added to their overall score and it becomes the other player's turn

The first player to reach 100 points wins!
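
To make the rules concrete, here's a minimal Python sketch of a single turn (the function and policy names are purely illustrative, not from this repo's code):

```python
import random


def play_turn(policy, rng=random):
    """Play one turn of Pig and return the points banked: 0 if a 1 is rolled,
    otherwise the turn total at the moment the player holds. A 'policy' here is
    any function that takes the current turn total and returns True to roll again.
    """
    turn_total = 0
    while True:
        roll = rng.randint(1, 6)
        if roll == 1:
            return 0                 # rolling a 1 wipes out the turn total
        turn_total += roll
        if not policy(turn_total):
            return turn_total        # hold: bank the accumulated points


# Example: the simple 'hold at 20' policy discussed in the next section.
def hold_at_20(turn_total):
    return turn_total < 20
```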

Basic Strategy Will Only Take You So Far

It is straightforward to calculate that, in order to maximise the expected number of points gained in a single turn, you should roll until your turn score reaches 20 and then hold. Whilst this simple strategy will typically serve you well, there are many circumstances in which you'd do best to deviate from it - sometimes significantly. The reason for this is that maximising the number of points you add to your score in a single turn is not the same thing as maximising your chances of winning. Consider, for example, the extreme case in which your opponent sits on 99 points with you far behind at just 50. In this situation the 'hold at 20' policy is clearly of no use, as your opponent is almost certain to win on their next go.
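
The threshold itself falls out of a one-line expected-value argument: another roll gains 4 points on average (the mean of 2-6) with probability 5/6, but forfeits the current turn total with probability 1/6, so rolling is only worthwhile while the turn total is below 20. A quick illustrative check:

```python
def expected_gain_of_rolling(turn_total):
    """Expected change in banked points from one more roll, relative to holding."""
    return (5 / 6) * 4 - (1 / 6) * turn_total


for k in (19, 20, 21):
    print(k, round(expected_gain_of_rolling(k), 3))
# 19  0.167  -> rolling is (marginally) worth it
# 20  0.0    -> indifferent
# 21 -0.167  -> better to hold
```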

Finding the Optimal Policy

With a bit of thought it's easy to figure out, in general terms, the circumstances under which you should play with more or less risk. And yet, determining precisely the optimal strategy in each of the 100^3 possible states is far beyond human comprehension. Thankfully, given that we have a complete understanding of the game's dynamics, we can use the methods of dynamic programming (such as value iteration and policy iteration) to compute the optimal policy. Todd Neller was the first to do this in his 2004 paper.
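
To give a flavour of what that computation involves, below is a hedged value-iteration sketch following Neller's formulation, where P(i, j, k) is the probability that the player about to act wins given their banked score i, the opponent's score j and the current turn total k. The names and the tolerance are my own choices rather than anything from this repo, and the pure-Python loops are slow, but hopefully the update rule is clear:

```python
import numpy as np

GOAL = 100
THETA = 1e-9  # convergence tolerance (an assumption, not from Neller's paper)


def p_win(P, i, j, k):
    """Look up P(i, j, k), treating any state with i + k >= GOAL as already won."""
    return 1.0 if i + k >= GOAL else P[i, j, k]


def value_iteration():
    P = np.zeros((GOAL, GOAL, GOAL))
    while True:
        delta = 0.0
        for i in range(GOAL):
            for j in range(GOAL):
                for k in range(GOAL - i):
                    # Roll: a 1 (prob 1/6) hands the turn to the opponent with
                    # nothing banked; any other face adds to the turn total.
                    p_roll = (1 - p_win(P, j, i, 0)) / 6 + sum(
                        p_win(P, i, j, k + r) for r in range(2, 7)) / 6
                    # Hold: bank the turn total, then the opponent plays.
                    p_hold = 1 - p_win(P, j, i + k, 0)
                    value = max(p_roll, p_hold)
                    delta = max(delta, abs(value - P[i, j, k]))
                    P[i, j, k] = value
        if delta < THETA:
            return P
```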


The Purpose of this Work

I am currently working my way through Reinforcement Learning, the classic text by Sutton & Barto, so this work is primarily to aid my understanding of some of the topics there. Pig also has a small place in my heart, as one of the things that inspired me to begin studying machine learning was a university project centered around discovering good strategy in Pig - although we were primarily focused on genetic algorithms, which, looking back, are quite poorly suited to this problem, but alas. I have included a copy of our report from that project in this repo for no particular reason.

My plan is to first reproduce Neller's optimal policy using both value iteration and policy iteration and try to understand the differences between them in terms of their rate of convergence. Neller also discusses some 'tricks' that can be used to improve the convergence rate by applying value/policy iteration incrementally on disjoint subsets of states. I'd like to analyse what impact these tricks have on convergence - are they a requirement for a reasonable convergence time or just nice to have? It'd also be interesting to investigate the impact that the initial policy has on the convergence to the optimal policy. For example, does starting at a reasonable policy like 'hold at 20' lead to significantly faster convergence?
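
To make that comparison concrete, here's a rough policy-iteration skeleton built on the same win-probability formulation as the value-iteration sketch above, with an initial-policy argument so that different starting points (such as 'hold at 20') can be plugged in. As before, every name here is my own and this is a sketch rather than this repo's implementation:

```python
import numpy as np

GOAL, THETA = 100, 1e-9


def p_win(P, i, j, k):
    return 1.0 if i + k >= GOAL else P[i, j, k]


def action_values(P, i, j, k):
    """Win probabilities for rolling and for holding in state (i, j, k)."""
    p_roll = (1 - p_win(P, j, i, 0)) / 6 + sum(
        p_win(P, i, j, k + r) for r in range(2, 7)) / 6
    p_hold = 1 - p_win(P, j, i + k, 0)
    return p_roll, p_hold


def hold_at_20_policy():
    """One candidate initial policy: roll whenever the turn total is below 20."""
    _, _, k = np.indices((GOAL, GOAL, GOAL))
    return k < 20


def policy_iteration(policy):
    P = np.zeros((GOAL, GOAL, GOAL))
    while True:
        # Policy evaluation: sweep the win probabilities under the fixed policy.
        while True:
            delta = 0.0
            for i in range(GOAL):
                for j in range(GOAL):
                    for k in range(GOAL - i):
                        p_roll, p_hold = action_values(P, i, j, k)
                        value = p_roll if policy[i, j, k] else p_hold
                        delta = max(delta, abs(value - P[i, j, k]))
                        P[i, j, k] = value
            if delta < THETA:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        improved = np.zeros_like(policy)
        for i in range(GOAL):
            for j in range(GOAL):
                for k in range(GOAL - i):
                    p_roll, p_hold = action_values(P, i, j, k)
                    improved[i, j, k] = p_roll > p_hold
        if np.array_equal(improved, policy):
            return policy, P  # the policy is stable
        policy = improved
```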

Moving beyond 'vanilla' Pig, I'd also like to experiment with variants for which dynamic programming cannot be used. This will be the case whenever we no longer have a perfect model of the game's dynamics. One idea I have is to simply simulate a biased die for which the probability of rolling each number is no longer equal and is unknown to the agent. Of course, it'd be quite simple to just have the agent estimate the die's probability distribution from some initial set of experiences and then apply value/policy iteration, but where's the fun in that? I'm more interested in seeing if an agent can learn to approximate the optimal policy through self-play alone.
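
As a first ingredient for that experiment, simulating a biased die only takes a few lines; a hedged sketch (these particular weights are made up purely for illustration):

```python
import random

# Probability of each face 1..6; the agent never gets to see these numbers.
BIASED_WEIGHTS = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]


def roll_biased_die(rng=random):
    """Roll the biased die once and return a face from 1 to 6."""
    return rng.choices(range(1, 7), weights=BIASED_WEIGHTS, k=1)[0]
```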

piggy's People

Contributors

lukasmyth96


Forkers

mhoehle
