pandaant / poker-cfrm

An NLTH Poker Agent using Counterfactual Regret Minimization

C++ 68.56% C 27.05% Objective-C 0.29% Makefile 1.90% HTML 1.99% Shell 0.21%

poker-cfrm's Introduction

NLTH Poker Agent using Counterfactual Regret Minimization

  • Trains strategies for Kuhn, Leduc, and Texas Hold'em poker
  • Supported training methods:
    • External Sampling
    • Chance Sampling
    • Outcome Sampling

Requirements

  • Clang (for C++11 support)
  • Boost Program_options 1.55+

Installation

  • Clone repositories and copy HandRanks.dat into cfrm:
$ git clone https://github.com/pandaant/cfrm.git
$ git clone https://github.com/christophschmalhofer/poker.git
$ cp poker/XPokerEval/XPokerEval.TwoPlusTwo/HandRanks.dat cfrm/handranks.dat
$ #rm -r poker # was only needed for handranks.dat
  • Compile required libraries:
$ cd cfrm/lib
$ cd libpoker && make && cd ..
$ cd libecalc && make && cd ..
  • Compile the EHS tool and generate the expected hand strength table (ehs.dat):
$ cd cfrm/tools/ehs_gen
# tweak the number of threads and number of samples in src/gen_eval_table.cpp if necessary
$ make
$ ./gen_eval_table # may take some time!
$ mv ehs.dat ../../ehs.dat
  • Build the agent binaries:
$ cd cfrm
$ make

Usage

If the build succeeded, four binaries have been created:

  • ./cfrm is the main executable that trains a strategy.

  • ./cluster-abs generates card abstractions based on different metrics (explained below).

  • ./potential-abs generates a potential-based card abstraction from a precalculated cluster abstraction.

  • ./player can be used to play the agent against itself or other agents (the server can be found here).

  • The scripts folder contains example scripts to generate abstractions and strategies for different games.

Action Abstraction

An action abstraction discretizes the range of bet sizes a player may choose from. This can drastically reduce the number of game states.

  • NullActionAbstraction
  • PotRelationAbstraction
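As a rough illustration of what a pot-relative abstraction does, the sketch below turns a list of pot fractions into concrete raise sizes. The function name `pot_relative_raises` and the rule of collapsing oversized raises into a single all-in are assumptions made for this example. They are not taken from the PotRelationAbstraction source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of poker-cfrm.
// Maps pot fractions (e.g. 0.5 = half pot) to raise sizes in chips.
// Any raise larger than the remaining stack becomes a single all-in raise.
std::vector<int> pot_relative_raises(int pot, int stack,
                                     const std::vector<double>& fractions) {
    assert(pot > 0 && stack > 0);
    std::vector<int> sizes;
    for (double f : fractions) {
        // Cap at the stack before truncating to whole chips.
        double capped = std::min(f * pot, static_cast<double>(stack));
        int chips = static_cast<int>(capped);
        if (chips > 0) sizes.push_back(chips);
    }
    // Remove duplicates so each abstract action is distinct.
    std::sort(sizes.begin(), sizes.end());
    sizes.erase(std::unique(sizes.begin(), sizes.end()), sizes.end());
    return sizes;
}
```

With a pot of 10, a stack of 30 and the fractions 0.5, 1, 2, 5 and 9999, the two largest fractions both collapse into the all-in raise, so the player is left with four abstract raise sizes instead of every chip amount between the minimum raise and the stack.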

Card Abstraction

NLTH has a large number of chance events due to private-card dealings preflop and public-card dealings post-flop. To reduce the number of outcomes that a chance event may result in, card combinations that are similar in strength are combined into buckets. These buckets are then used as new chance events in the abstract game.
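The bucketing idea can be sketched in its simplest form as uniform-width buckets over a hand strength value in [0, 1]. This is only an illustration: the name `ehs_bucket` is hypothetical, and the cluster abstractions listed below group hands by clustering rather than by fixed-width intervals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of poker-cfrm.
// Assigns a hand strength in [0, 1] to one of n_buckets equal-width buckets.
std::size_t ehs_bucket(double ehs, std::size_t n_buckets) {
    assert(n_buckets > 0 && ehs >= 0.0 && ehs <= 1.0);
    std::size_t b = static_cast<std::size_t>(ehs * n_buckets);
    // ehs == 1.0 would index one past the end, so clamp to the top bucket.
    return b < n_buckets ? b : n_buckets - 1;
}
```

All hands that land in the same bucket are treated as one chance outcome in the abstract game, which is what shrinks the number of chance events.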

NullCardAbstraction

BlindCardAbstraction

ClusterCardAbstractions

  • SI

  • EHS - Expected Hand Strength

    The E[HS] or E[HS²] metric groups hands into buckets. Although E[HS] is a good first estimator, it cannot distinguish between hands that realize their expectation at different stages of the game.

  • EMD - Earth Mover's Distance

  • OCHS - Opponent Cluster Hand Strength

    OCHS can only be applied to the river. The E[HS] value is the probability of winning against a random opponent holding. OCHS instead splits the opponent's possible holdings into n buckets and describes each hand by a histogram with n entries: the entry at index i is the probability of winning against the hands in bucket i.

    OCHS can be used to cluster river hands with a distribution-aware approach. OCHS-based river abstractions have been shown to outperform expectation-based abstractions.

  • Mixed Abstractions

    N stands for no abstraction, S for E[HS], E for EMD clustering over histograms, O for OCHS and P for the potential-aware abstraction. The order of the letters determines which abstraction is used in each phase of the game (preflop, flop, turn, river). For example, MIXED_NEEO uses no abstraction preflop, EMD clustering on the flop and turn, and OCHS on the river.

    • MIXED_NEEO
    • MIXED_NEES
    • MIXED_NSSS
    • MIXED_NOOO
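To make the difference between expectation-based and distribution-aware metrics concrete, the sketch below computes E[HS], E[HS²] and a one-dimensional earth mover's distance for two toy hands. The function names and the toy distributions are assumptions for illustration only, and the EMD shown is the textbook cumulative-difference form for equally spaced bins, not necessarily the routine used by cluster-abs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of poker-cfrm.

// Mean of hs^power over equally likely outcomes (power 1 gives E[HS], 2 gives E[HS^2]).
double expected_hs(const std::vector<double>& outcomes, int power) {
    assert(!outcomes.empty());
    double sum = 0.0;
    for (double hs : outcomes) sum += std::pow(hs, power);
    return sum / static_cast<double>(outcomes.size());
}

// 1-D earth mover's distance between two histograms of equal total mass,
// in units of bin width: the sum of absolute differences of the running totals.
double emd_1d(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());
    double carried = 0.0;  // mass that still has to move past the current bin
    double total = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        carried += p[i] - q[i];
        total += std::fabs(carried);
    }
    return total;
}
```

Take a made hand whose future strength is always 0.5 and a draw that ends at 0.0 or 1.0 with equal probability. Both have E[HS] = 0.5, so E[HS] alone puts them in the same bucket. E[HS²] separates them (0.25 against 0.5), and the EMD between their three-bin histograms {0, 1, 0} and {0.5, 0, 0.5} is 1 bin width rather than 0.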

Sampling Schemes

  • ChanceSampling

    Chance-sampled CFR randomly selects private and public chance events. The portion of the game tree that is reachable through the sampled chance events is traversed recursively. A vector of probabilities is passed from the root down the tree, containing each player's contribution to the probability of reaching the current node. At every terminal node that is reached, the utility can be calculated in O(1).

  • ExternalSampling

    External sampling only samples factors that are external to the player, namely chance nodes and the opponent's actions. It has been proven that external sampling requires only a constant factor more iterations than vanilla CFR. It still achieves an asymptotic improvement in equilibrium computation time because of an order reduction in the cost per iteration. External sampling performs a post-order depth-first traversal during an iteration.

  • OutcomeSampling

    Outcome sampling only visits one terminal history in each iteration and updates the regrets in information sets visited along the path traversed.
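The external sampling scheme described above can be sketched on Kuhn poker, the smallest of the supported games. This is a minimal illustration written for this README, not the code in cfrm.cpp: every name (`InfoSet`, `traverse`, `train`, `average_bet_probability`), the string encoding of information sets and the fixed seed are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <map>
#include <random>
#include <string>

// Illustrative external-sampling MCCFR for Kuhn poker (cards 0 < 1 < 2).
struct InfoSet {
    std::array<double, 2> regret{{0.0, 0.0}};  // index 0 = pass/fold, 1 = bet/call
    std::array<double, 2> strategy_sum{{0.0, 0.0}};
};

std::map<std::string, InfoSet> infosets;  // key = own card + action history
std::mt19937 rng(42);

// Regret matching: play in proportion to positive regret, else uniformly.
std::array<double, 2> current_strategy(const InfoSet& s) {
    double p0 = std::max(s.regret[0], 0.0), p1 = std::max(s.regret[1], 0.0);
    double norm = p0 + p1;
    if (norm <= 0.0) return {{0.5, 0.5}};
    return {{p0 / norm, p1 / norm}};
}

bool is_terminal(const std::string& h) { return h.size() >= 2 && h != "pb"; }

// Payoff for `player` at a terminal history ('p' = pass/fold, 'b' = bet/call).
double payoff(const std::array<int, 2>& cards, const std::string& h, int player) {
    bool wins = cards[player] > cards[1 - player];
    if (h == "pp") return wins ? 1.0 : -1.0;                 // showdown for the antes
    if (h == "bb" || h == "pbb") return wins ? 2.0 : -2.0;   // showdown after bet and call
    int folder = static_cast<int>((h.size() - 1) % 2);       // "bp" or "pbp": last actor folded
    return folder == player ? -1.0 : 1.0;
}

double traverse(const std::array<int, 2>& cards, const std::string& h, int traverser) {
    if (is_terminal(h)) return payoff(cards, h, traverser);
    int actor = static_cast<int>(h.size() % 2);
    InfoSet& s = infosets[std::to_string(cards[actor]) + h];
    std::array<double, 2> sigma = current_strategy(s);
    const char actions[2] = {'p', 'b'};
    if (actor == traverser) {
        // Traverser's node: walk every action, then update regrets (post-order).
        double util[2], node_util = 0.0;
        for (int a = 0; a < 2; ++a) {
            util[a] = traverse(cards, h + actions[a], traverser);
            node_util += sigma[a] * util[a];
        }
        for (int a = 0; a < 2; ++a) s.regret[a] += util[a] - node_util;
        return node_util;
    }
    // Opponent's node: accumulate the average strategy and sample one action.
    for (int a = 0; a < 2; ++a) s.strategy_sum[a] += sigma[a];
    double u = std::uniform_real_distribution<double>(0.0, 1.0)(rng);
    return traverse(cards, h + actions[u < sigma[0] ? 0 : 1], traverser);
}

void train(int iterations) {
    std::array<int, 3> deck{{0, 1, 2}};
    for (int it = 0; it < iterations; ++it)
        for (int traverser = 0; traverser < 2; ++traverser) {
            std::shuffle(deck.begin(), deck.end(), rng);  // sample the chance event
            std::array<int, 2> cards{{deck[0], deck[1]}};
            traverse(cards, "", traverser);
        }
}

// Average probability of betting/calling at an information set seen in training.
double average_bet_probability(const std::string& key) {
    const InfoSet& s = infosets.at(key);
    double total = s.strategy_sum[0] + s.strategy_sum[1];
    return total > 0.0 ? s.strategy_sum[1] / total : 0.5;
}
```

After something like `train(100000)`, `average_bet_probability("2b")` (holding the highest card and facing a bet) should be close to 1 and `average_bet_probability("0b")` close to 0, since calling is dominant in the first case and dominated in the second. Only the traverser's actions are expanded in full; chance and the opponent are each reduced to one sample per visit, which is where the lower cost per iteration comes from.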

Action Translation

  • PseudoHarmonicMapping
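For reference, the pseudo-harmonic mapping of Ganzfried and Sandholm maps an observed bet x that lies between two abstract bet sizes A and B (all expressed as fractions of the pot) to A with probability f(x) = ((B - x)(1 + A)) / ((B - A)(1 + x)), and to B otherwise. The sketch below only evaluates that formula. The function name is hypothetical, and how PseudoHarmonicMapping in this repository selects A and B or randomizes is not shown here.

```cpp
#include <cassert>

// Hypothetical helper, not part of poker-cfrm.
// Probability of translating an observed bet x to the smaller abstract bet A.
// A, B and x are pot fractions with A <= x <= B and A < B.
double pseudo_harmonic_prob_lower(double A, double B, double x) {
    assert(0.0 <= A && A < B && A <= x && x <= B);
    return ((B - x) * (1.0 + A)) / ((B - A) * (1.0 + x));
}
```

The formula returns 1 at x = A and 0 at x = B. With A = 0 (check) and B = 1 (pot-sized bet), a bet of one third of the pot is mapped to either action with probability 0.5.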


poker-cfrm's Issues

Custom clusters in potential-abs

Hi Mark,

I was wondering whether the potential-aware abstraction you created is based on this paper: http://www.cs.cmu.edu/afs/cs/Web/People/sandholm/potential-aware_imperfect-recall.aaai14.pdf

Secondly, the README says that the clusters in potential-abs are precalculated. Is there a way to regenerate them for a custom target? For example, if I wanted to create a potential-aware abstraction of {169, 200, 200, 200} or {169, 500, 500, 500}, etc., would this be possible somehow?

Also, is cfrm a version of MCCFR with external sampling?

With kind regards

MIXED_XXXX abstraction head to head win averages are in unexpected order

I ran some tests of two-player no-limit hold'em with 1|2 small|big blinds, 200|200 stacks, and maxRaises of 3 4 4 4. During the cluster abstraction runs for all tests I kept nb-samples at "0,2,500,500", the buckets at 169,5,10,500, the error bounds at .01,.01,.01,.01, and nb-hist-samples-per-round at 0,1,200,200. For all tests I held the action abstraction to pot-relative raises of 0.4,0.8,1.2,2,5,9999. For CFR training I used 12 threads and run times of 8 hours, and sometimes 16 and 24 hours.

I ran the head-to-head matches, specifically NSSS against each of NOOO, NEES and NEEO. I expected NSSS to perform the worst, meaning lose money (a negative average win), and NEEO to be the best. Instead, NSSS comes out best! Here is a table of results. As you can see, I ran the CFR training phase for the most sophisticated strategy, NEEO, for longer and longer times (8, then 16, then 24 hours), but that didn't change things. Any ideas on what to experiment with to get the results to align with expectations, meaning NEEO, NEES and NOOO all performing better than NSSS?
Update: thinking harder, I'm wondering whether the clustering abstraction is too coarse, so that I need to make it finer by increasing nb-samples and nb-hist-samples. Any ideas on the combinatorics around this, to see what's appropriate?

Abs  | cfrm runtime (secs / 1000) | Abs  | cfrm runtime (secs / 1000) | Avg win | Var  | Num games | Seed | Median win
NSSS | 28                         | NOOO | 28                         | 2.94    | 7167 | 500000    | 7534 | 0
NSSS | 28                         | NOOO | 28                         | 2.76    | 7119 | 100000    | 3575 | 0
NSSS | 28                         | NOOO | 28                         | 2.95    | 7163 | 100000    | 8379 | 1.5
NSSS | 28                         | NEES | 28                         | 3.4     | 7564 | 100000    | 8379 | 1.5
NSSS | 28                         | NEES | 28                         | 3.03    | 3575 | 100000    | 3575 | 1.5
NSSS | 28                         | NEEO | 28                         | 5.07    | 7475 | 100000    | 8379 | 1.5
NSSS | 28                         | NEEO | 57                         | 4.53    | 7118 | 100000    | 7534 | 1.5
NSSS | 28                         | NEEO | 86                         | 5.05    | 7141 | 100000    | 7534 | 1.5
NSSS | 28                         | NEEO | 86                         | 4.84    | 7138 | 100000    | 8370 | 1.5

Card abstraction - Potential Aware

Hi Pandaant,
Nice work! I would like to know whether you implemented "P for the potential aware abstraction", because I don't see it in the code available on GitHub.
If I use cluster_abs with -m mixed_npeo, it just computes the mixed_nooo abstraction; there is something written like that in the scripts directory.

Are the files on GitHub up to date?
I have a particular interest in this algorithm.

Thanks for your time

get expected value of actions in state

Hi, sorry to bother you,

if I wanted to obtain for each action in each state an expected value of taking that action, would the basic idea be to modify the train function in cfrm.cpp to keep track of what is returned by each action?

"Too many abstract actions" error in training no-limit-holdem using the provided game definitions

Hello Pandaant,

Great work with the project and thanks for sharing. I have run into the following issue, kindly share your thoughts.

I have not made any changes to the code apart from increasing the number of threads. The strategy file was computed, 17 GB in total. But when I run "player" I get the error "too many abstract actions". It looks like the maximum allowed is defined to be 20. The blinds are 1 and 2, and the stack sizes are 200 and 200. Since the maximum allowed raises per round are defined to be 3 4 4 4, my calculation for the maximum number of actions is 4+5+5+5=19. So I am not sure why I am getting this error.

If I do change "MAX_ABSTRACT_ACTIONS" to say 30, do I have to repeat all the steps - gen_eval_table, cluster_abs, potential_abs, and cfrm again?

What is the idea behind code duplication in tools/ehs_gen/src and src?
