official-stockfish / wdl_model

Fit a chess Win-Draw-Loss model from played games

License: GNU General Public License v3.0


wdl_model's Introduction

Generate SF WDL model based on data

Stockfish's "centipawn" evaluation is decoupled from the classical value of a pawn, and is calibrated such that an advantage of "100 centipawns" means the engine has a 50% probability to win the position in self-play at fishtest LTC time control.
If the option UCI_ShowWDL is enabled, the engine will show Win-Draw-Loss probabilities alongside its "centipawn" evaluation. These probabilities depend on the engine's evaluation and the material left on the board, and are computed from a WDL model that can be generated from fishtest data with the help of the scripts in this repository.

Install

Python 3.9 or higher is required.

pip install -r requirements.txt

A C++17-compatible compiler is required, and zlib needs to be present.

sudo apt-get install zlib1g-dev

Usage

To allow for efficient analysis, multiple pgn files are analysed in parallel. Analysis of a single pgn file is not parallelized. Files can be in either .pgn or .pgn.gz format; the script automatically detects the file format and decompresses .pgn.gz files on the fly.

To update Stockfish's internal WDL model, the following steps are needed:

  1. Obtain a large collection of engine-vs-engine games (at fishtest LTC time control) by regularly running python download_fishtest_pgns.py over a period of time. The script will download the necessary pgn files and metadata describing the test conditions from fishtest.

  2. Run the script updateWDL.sh, which will automatically perform these steps:

    • Run make to compile scoreWDLstat.cpp, producing the executable scoreWDLstat.

    • Run scoreWDLstat with some custom parameters to parse the downloaded pgn files. The computed WDL statistics will be stored in a file called updateWDL.json. The file will have entries of the form "('D', 1, 78, 35)": 668132, meaning this tuple for (result, move, material, eval) was seen a total of 668132 times in the processed pgn files (see the parsing sketch after this list).

    • Run python scoreWDL.py with some custom parameters to compute the WDL model parameters from the data stored in updateWDL.json. The script's output will be stored in scoreWDL.log and will contain the new values for as[] and bs[] in Stockfish's uci.cpp. See e.g. official-stockfish/Stockfish#5121. In addition, the script will produce a graphical illustration of the analysed data and the fitted WDL model, as displayed below.
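
As a concrete illustration, the following minimal Python sketch reads such a file and decodes its keys. Only the file name updateWDL.json and the (result, move, material, eval) key layout are taken from the step above; the rest is illustrative, not the repository's code:

import ast
import json

# Load the WDL statistics produced by scoreWDLstat.
with open("updateWDL.json") as f:
    data = json.load(f)

# Each key is the string form of a (result, move, material, eval) tuple,
# and each value is the number of times that tuple was observed.
for key, count in list(data.items())[:5]:
    result, move, material, eval_ = ast.literal_eval(key)
    print(f"result={result} move={move} material={material} eval={eval_} count={count}")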

Results

Help and other options

Running scoreWDLstat --help and python scoreWDL.py --help, respectively, will provide a description of possible command line options for the two programs. For example:

  • scoreWDLstat --matchEngine <regex> : extracts WDL data only from the engine matching the regex
  • python scoreWDL.py --NormalizeToPawnValue 356 --momType move --momTarget 32 --moveMin 8 : fit the model based on full move number, with move 32 as the 100cp anchor (until SF16.1 this was used for Stockfish)

Background

The underlying assumption of the WDL model is that the win rate for a position can be well modeled as a function of the evaluation of that position. The data shows that a logistic function (see also logistic regression) gives a good approximation of the win rate (the probability of a win) as a function of the evaluation x:

win_rate(x) = 1 / (1 + exp(-(x-a)/b))

In this equation, the parameters a and b need to be fitted to the data, which is the purpose of this repository. a is the evaluation for which a 50% win rate is observed, while b indicates how quickly this rate changes with the evaluation. A small b indicates that small changes in the evaluation x quickly turn a game "on the edge" (i.e. at a 50% win rate) into a dead draw or a near-certain win.

The model furthermore assumes symmetry in evaluation, so that the following quantities follow as well:

loss_rate(x) = win_rate(-x)
draw_rate(x) = 1 - win_rate(x) - loss_rate(x)

This information also allows for estimating the game score

score(x) = 1 * win_rate(x) + 0.5 * draw_rate(x) + 0 * loss_rate(x)
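
These relations translate directly into code. A minimal Python sketch, with function names mirroring the formulas above (this is illustrative, not the repository's actual implementation):

import math

def win_rate(x: float, a: float, b: float) -> float:
    # Logistic win probability as a function of the evaluation x.
    return 1.0 / (1.0 + math.exp(-(x - a) / b))

def loss_rate(x: float, a: float, b: float) -> float:
    # By the symmetry assumption, a loss at x is a win at -x.
    return win_rate(-x, a, b)

def draw_rate(x: float, a: float, b: float) -> float:
    # Probabilities sum to one.
    return 1.0 - win_rate(x, a, b) - loss_rate(x, a, b)

def score(x: float, a: float, b: float) -> float:
    # Expected game score: 1 for a win, 0.5 for a draw, 0 for a loss.
    return win_rate(x, a, b) + 0.5 * draw_rate(x, a, b)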

The model is made more accurate by taking into account not only the evaluation, but also the material count or game move counter (mom). (The model currently employed in Stockfish uses the material.) This dependency is modeled by making the parameters a and b functions of mom. The win/draw/loss rates are now 2D functions, while a and b are replaced by 1D functions. For example:

win_rate(x,mom) = 1 / (1 + exp(-(x-p_a(mom))/p_b(mom)))

Here for simplicity the 1D functions p_a and p_b are chosen to be polynomials of degree 3.

The parameters that need to be fitted to represent the model completely are thus the 8 coefficients that determine these two polynomials. For example:

p_a(mom) = ((-185.71 * mom / 58 + 504.85) * mom / 58 - 438.58) * mom / 58 + 474.05
p_b(mom) = ((89.24 * mom / 58 - 137.02) * mom / 58 + 73.29) * mom / 58 + 47.53
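
For illustration, these polynomials can be evaluated in Horner form exactly as written. A minimal Python sketch using the example coefficients above (poly3, p_a, p_b and win_rate_2d are illustrative names, not the repository's code):

import math

def poly3(coeffs, mom: float, anchor: float = 58) -> float:
    # Evaluate a degree-3 polynomial in mom/anchor, in Horner form.
    c3, c2, c1, c0 = coeffs
    m = mom / anchor
    return ((c3 * m + c2) * m + c1) * m + c0

# The example coefficients from above.
def p_a(mom): return poly3((-185.71, 504.85, -438.58, 474.05), mom)
def p_b(mom): return poly3((89.24, -137.02, 73.29, 47.53), mom)

def win_rate_2d(x: float, mom: float) -> float:
    # 2D win rate: logistic in x, with mom-dependent parameters.
    return 1.0 / (1.0 + math.exp(-(x - p_a(mom)) / p_b(mom)))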

In order to fit these 8 parameters, three different approaches are provided: fitDensity, optimizeProbability and optimizeScore.

The simplest one, fitDensity, in a first step estimates, for each value of mom that is of interest, the best values of a and b to fit the logistic win rate function win_rate(x) to the observed win densities. Note that this procedure, for each value of mom, fits a 1D curve to a horizontal slice of the (x,mom) data. Denoting the obtained values by a(mom) and b(mom), a second step then fits the 1D polynomials p_a and p_b to these discrete values.

The options optimizeProbability and optimizeScore are a bit more sophisticated. They first take, for each value of mom, the discrete values a(mom) and b(mom) provided by the simple 1D fitting described above as initial guesses for an iterative optimization that aims either to maximize the probability of predicting the correct game outcome for the available data, or to minimize the squared error in the predicted score. These improved values of a(mom) and b(mom) then yield newly fitted 1D polynomials p_a and p_b, which in turn serve as initial values for a final iterative optimization that seeks the best polynomials p_a and p_b for the objective function of interest, but now evaluated globally, over the whole 2D data (x,mom). A sketch of the first fitting step follows below.
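
As an illustration of the first, density-based step, the sketch below fits the logistic to a single hypothetical mom slice using scipy.optimize.curve_fit. The data points are made up, and this is not the repository's actual fitting code:

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b):
    return 1.0 / (1.0 + np.exp(-(x - a) / b))

# Hypothetical observed win frequencies for one fixed value of mom:
# xs are evaluations, ws are the win frequencies observed at them.
xs = np.array([-300.0, -150.0, 0.0, 150.0, 300.0, 450.0, 600.0])
ws = np.array([0.02, 0.08, 0.25, 0.45, 0.62, 0.78, 0.90])

# Fit a(mom) and b(mom) for this slice.
(a_fit, b_fit), _ = curve_fit(logistic, xs, ws, p0=(300.0, 100.0))
print(f"a(mom) = {a_fit:.1f}, b(mom) = {b_fit:.1f}")

A second step would then fit degree-3 polynomials to the a(mom) and b(mom) values collected over all slices, e.g. with np.polyfit.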

Interplay with Stockfish

Observe that x in the above formulas is the internal engine evaluation of a position, often also called the non-normalized evaluation, which is in general not exposed to the user. By definition, x = p_a(mom) is the internal evaluation with a 50% win rate at material count or game move counter mom. Since SF17 (and in current development versions), this x is scaled to a displayed evaluation of 1.0 for every value of mom, where mom represents the count of the material left on the board. Until SF16.1, for computational simplicity, all values of x, irrespective of the value of mom (with mom being the full move number), were rescaled to x/p_a(32), which thanks to the choice of p_a was just the sum of the four coefficients of the polynomial p_a, and in rounded form was stored within NormalizeToPawnValue.
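
In other words, the displayed evaluation is obtained by dividing the internal one by the 50%-win-rate point. A minimal sketch of both conventions, reusing p_a from the sketch above (the function names are illustrative, not Stockfish's):

def displayed_eval_sf17(x: float, mom: float) -> float:
    # Since SF17: per-position scaling, so that x == p_a(mom)
    # always corresponds to a displayed evaluation of 1.0.
    return x / p_a(mom)

def displayed_eval_sf16(x: float) -> float:
    # Until SF16.1: a single global scale; 356 is the example
    # NormalizeToPawnValue from the usage section above.
    return x / 356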

Interpretation

The six plots in the graphic displayed above can be interpreted in the following way. The middle and right plots in the first row show contour plots in the (x,mom) domain of the observed win and draw frequencies in the data, respectively. Below them are the corresponding contour plots for the fitted model, i.e. for the 2D functions win_rate(x,mom) and draw_rate(x,mom), based on the eight optimal parameters found. The top left plot shows a slice of the data at the chosen anchor mom=58, together with plots of win_rate(x), draw_rate(x) and loss_rate(x) for the fitted a=p_a(58) and b=p_b(58). Finally, the bottom left plot shows the collection of all the values of a(mom) and b(mom), together with plots of the two polynomials p_a and p_b. For comparison it also includes a plot of the polynomial p_a that was used in the WDL model of the input data.


Code style: black

wdl_model's People

Contributors

dede1751, disservin, mhouppin, peregrineshahin, robbai, robertnurnberg, vondele, xu-shawn


wdl_model's Issues

data race

Compiling with -fsanitize=thread shows a data race.

filter out crashes and time losses

Just leaving this here to not forget: we could (should) also filter out WDL data from games that were lost due to crashes and time losses, I think. They can be identified by [Termination "time forfeit"] and [Termination "abandoned"] in their pgn data.

License

It'd be great to explicitly state the license of this repository.

seg fault with scoreWDLstat

As of this morning, I get a segmentation fault when running the update script.

> ./updateWDL.sh 
started at:  Mon 23 Oct 09:08:37 CEST 2023
Look recursively in directory pgns for games from SPRT tests using books matching "UHO_4060_v3.epd" for SF revisions between 70ba9de85cddc5460b1ec53e0a99bee271e26ece (from 2023-09-22 19:26:16 +0200) and HEAD (from 2023-10-22 16:16:02 +0200).
./updateWDL.sh: line 59: 2154418 Segmentation fault      (core dumped) ./scoreWDLstat --dir $pgnpath -r --matchRev $regex_pattern --matchBook "$bookname" --fixFEN --SPRTonly -o updateWDL.json &> scoreWDLstat.log

scoreWDLstat.log

Miscounting the number of games in a pgn collection?

Running on a recent test, I get this output, i.e. 177333 games:

$ ./scoreWDLstat --dir ./pgns/23-10-21/6533f394de6d262d08d3a55e/ -r
Looking (recursively) for pgn files in ./pgns/23-10-21/6533f394de6d262d08d3a55e/
Found 96 .pgn(.gz) files in total.
Found 96 .pgn(.gz) files, creating 96 chunks for processing.
Progress: 96/96
Time taken: 0.1s
Wrote 2788919 scored positions from 177333 games to scoreWDLstat.json for analysis.

Yet, the test meta-data shows it is only 20000 games, which is confirmed by:

$ zcat ./pgns/23-10-21/6533f394de6d262d08d3a55e/*.pgn.gz | grep 'Result' | wc -l
20000

There are also exactly 96 pgn.gz files in that directory.

In the code, the total_games counter is only incremented in the header() function, so I'm a bit at a loss here. Can the header function be called multiple times per game?

Food for Thought - Build System

What do you think about maybe upgrading the build system to something newer like CMake/Meson?

The Makefile works fine, there's no problem with it, but maybe it's time to try out new tools?
I'd personally try out Meson; I can write up a patch in the next few days and then we can still decide? :D

scoreWDL.py breaks with NumPy 2.0 due to usage of np.NaN

Traceback (most recent call last):
  File "/Users/shawn/WDL_model/scoreWDL.py", line 729, in <module>
    wdl_data.load_json_data(args.filename)
  File "/Users/shawn/WDL_model/scoreWDL.py", line 150, in load_json_data
    self.w_density = np.full_like(total, np.NaN, dtype=float)
                                         ^^^^^^
  File "/Users/shawn/WDL_model/venv/lib/python3.12/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(
AttributeError: `np.NaN` was removed in the NumPy 2.0 release. Use `np.nan` instead.. Did you mean: 'nan'?

std::exit while pgns are being processed may lead to crashes

On Linux I get segmentation faults when --fixFEN finds missing keys in the metadata, say. In itself this is not a problem, as the code should exit anyway. But a more graceful way to stop would be nice. I think the segmentation faults come from the parallel execution of many pgn analyses, and maybe the exit(1) leads to some unexpected states there. (Sadly, I do not know how to fix this.)

Sample output for me:

Missing "book_depth" key in metadata for .epd book for test pgns/23-09-23/650f26ffadc82c88993ddd80/650f26ffadc82c88993ddd80
pt pt 00
scoreWDLstat: external/chess.hpp:1872: virtual void chess::Board::placePiece(chess::Piece, chess::Square): Assertion `board_[sq] == Piece::NONE' failed.
file_from -1
Segmentation fault      (core dumped)

coordinating efforts to track the WDL model

Not sure if this is the best place to discuss this, but once the latest PRs are merged, we are basically ready to create some WDL tracker. Here some questions we could try to agree on:

  • Should each of us create their own local copies of fishtest .pgn files? Or best if @vondele does this somehow centrally? (Is there any point of hosting these pgn's on kaggle?)
  • Do we create a new repo for tracking, or make it part of this repo?
  • How do we deal with non-functional SF commits? We can now filter the WDL data by commit, but for the analysis it would make sense to merge data of non-functional commits with the last previous functional commit.
  • For creating a valid WDL_model-in-time data point, do we require a minimum number of positions? (Here I think of two functional commits in quick succession, meaning there won't be enough meaningful data for the first commit.)

downloads fail without error message

On a fresh clone, I get as output for python download_fishtest_pgns.py --path pgns --subdirs --page 2

Found 0 fully downloaded tests in pgns/ already.
Downloading pgns to pgns/23-08-30/64efc1bdb0db1f4c8581ee29/ ...
  Fetching 453 missing pgn files ...
Downloading pgns to pgns/23-08-31/64f0565fb0db1f4c8581ffc5/ ...

Then killing the process and checking the directory pgns/23-08-30/64efc1bdb0db1f4c8581ee29 gives an empty directory, whereas the second directory starts to get filled.

Will try to investigate tomorrow.

monitor material based fitting

This is not an issue per se, but just a convenient place to regularly check how our material-based fitting works.

Below I report on the fits from ./updateWDL.sh --firstrev b59786e750a59d3d7cff2630cf284553f607ed29 (based on move) and from python scoreWDL.py updateWDL.json --plot save --pgnName update_material.png --momType "material" --momTarget 62 --moveMin 8 --moveMax 120 --materialMin 10 --materialMax 78 --modelFitting optimizeProbability applied to the same json data (based on material).

(plots: update_move and update_material)

json data: updateWDL.json.gz

sf refactoring broke the update script

I have pushed a fix to PR #153, which I guess should be merged now. We can look again at the precise data retrieval from SF source code for the dynamic rescaling once this is in SF code.
