
official-stockfish / wdl_model


Fit a chess Win-Draw-Loss model from played games

License: GNU General Public License v3.0

Python 53.49% Makefile 1.21% C++ 36.64% Shell 8.66%

wdl_model's People

Contributors

dede1751, disservin, peregrineshahin, robbai, robertnurnberg, vondele

wdl_model's Issues

License

It'd be great to explicitly state the license of this repository.

std::exit while pgns are being processed may lead to crashes

On Linux I get segmentation faults when, say, --fixFEN finds missing keys in the metadata. In itself this is not a problem, as the code should exit anyway, but a more graceful way to stop would be nice. I think the segmentation faults come from the parallel execution of many PGN analyses, and maybe the exit(1) leaves some of them in an unexpected state. (Sadly, I do not know how to fix this.)

Sample output for me:

Missing "book_depth" key in metadata for .epd book for test pgns/23-09-23/650f26ffadc82c88993ddd80/650f26ffadc82c88993ddd80
pt pt 00
scoreWDLstat: external/chess.hpp:1872: virtual void chess::Board::placePiece(chess::Piece, chess::Square): Assertion `board_[sq] == Piece::NONE' failed.
file_from -1
Segmentation fault      (core dumped)
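To illustrate the graceful-stop idea (the real analyzer is C++; this is just a minimal Python sketch with hypothetical names), a worker can raise an exception instead of calling exit(1), so that the main thread is the only place that shuts the pool down:

```python
import concurrent.futures

class MetadataError(Exception):
    pass

def analyse(pgn_name, metadata):
    # Hypothetical worker: instead of exit(1) while sibling workers are
    # still running, raise and let the future carry the error out.
    if "book_depth" not in metadata:
        raise MetadataError(f'Missing "book_depth" key in metadata for {pgn_name}')
    return f"processed {pgn_name}"

def run_all(jobs):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(analyse, name, meta) for name, meta in jobs]
        try:
            return [f.result() for f in futures]
        except MetadataError as e:
            # cancel pending work, then stop cleanly from one place
            for f in futures:
                f.cancel()
            print(e)
            return None
```

The point is that the error propagates through the futures rather than tearing the process down mid-flight, so all workers are joined before the program exits.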

downloads fail without error message

On a fresh clone, I get as output for python download_fishtest_pgns.py --path pgns --subdirs --page 2

Found 0 fully downloaded tests in pgns/ already.
Downloading pgns to pgns/23-08-30/64efc1bdb0db1f4c8581ee29/ ...
  Fetching 453 missing pgn files ...
Downloading pgns to pgns/23-08-31/64f0565fb0db1f4c8581ffc5/ ...

Then killing the process and checking the directory pgns/23-08-30/64efc1bdb0db1f4c8581ee29 gives an empty directory, whereas the second directory starts to get filled.

Will try to investigate tomorrow.
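One possible cause (an assumption, not confirmed from the script) is that a failed HTTP response is swallowed without checking the status. A minimal sketch of a download helper that fails loudly, with a hypothetical name and signature:

```python
import urllib.error
import urllib.request

def fetch_pgn(url, dest_path):
    """Hypothetical helper: download one pgn file, reporting failures
    instead of silently leaving the directory empty."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            if resp.status != 200:
                print(f"  Failed to fetch {url}: HTTP {resp.status}")
                return False
            data = resp.read()
    except urllib.error.URLError as e:
        print(f"  Failed to fetch {url}: {e}")
        return False
    with open(dest_path, "wb") as f:
        f.write(data)
    return True
```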

seg fault with scoreWDLstat

As of this morning, I get a segmentation fault when running the update script.

> ./updateWDL.sh 
started at:  Mon 23 Oct 09:08:37 CEST 2023
Look recursively in directory pgns for games from SPRT tests using books matching "UHO_4060_v3.epd" for SF revisions between 70ba9de85cddc5460b1ec53e0a99bee271e26ece (from 2023-09-22 19:26:16 +0200) and HEAD (from 2023-10-22 16:16:02 +0200).
./updateWDL.sh: line 59: 2154418 Segmentation fault      (core dumped) ./scoreWDLstat --dir $pgnpath -r --matchRev $regex_pattern --matchBook "$bookname" --fixFEN --SPRTonly -o updateWDL.json &> scoreWDLstat.log

scoreWDLstat.log

Food for Thought - Build System

What do you think about upgrading the build system to something newer like CMake or Meson?

The Makefile works fine, there's no problem with it, but maybe it's time to try out new tools. Personally, I'd try out Meson; I can write up a patch in the next few days, and then we can still decide? :D
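For a sense of scale, a Meson setup could be quite small. A hypothetical sketch (target and source file names are assumed, not taken from the repository):

```meson
project('wdl_model', 'cpp',
  default_options: ['cpp_std=c++17', 'optimization=3'])

zlib = dependency('zlib')
threads = dependency('threads')

executable('scoreWDLstat',
  'scoreWDLstat.cpp',
  dependencies: [zlib, threads])
```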

sf refactoring broke the update script

I have pushed a fix to the PR #153 which I guess should be merged now. We can look again at the precise data retrieval from SF source code for the dynamic rescaling once this is in SF code.

monitor material based fitting

This is not an issue per se, but just a convenient place to regularly check how our material based fitting works.

Below I report on the fits from ./updateWDL.sh --firstrev b59786e750a59d3d7cff2630cf284553f607ed29 (based on move) and from python scoreWDL.py updateWDL.json --plot save --pgnName update_material.png --momType "material" --momTarget 62 --moveMin 8 --moveMax 120 --materialMin 10 --materialMax 78 --modelFitting optimizeProbability applied to the same json data (based on material).

update_move

update_material

json data: updateWDL.json.gz
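For context, the functional form being fitted is a logistic in the engine eval whose shift a and scale b are low-degree polynomials in the "move or material" counter, as in Stockfish's win_rate_model. A minimal sketch (the helper name and coefficient conventions are assumptions):

```python
import math

def win_probability(eval_cp, mom, a_coeffs, b_coeffs):
    """Win probability for an eval in centipawns at counter value mom.
    a_coeffs/b_coeffs are polynomial coefficients, highest degree first."""
    a = sum(c * mom**i for i, c in enumerate(reversed(a_coeffs)))
    b = sum(c * mom**i for i, c in enumerate(reversed(b_coeffs)))
    return 1 / (1 + math.exp((a - eval_cp) / b))
```

At eval_cp == a the model predicts exactly a 50% win rate, which is why the fits above target a particular move (or material) count for the anchor.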

coordinating efforts to track the WDL model

Not sure if this is the best place to discuss this, but once the latest PRs are merged, we are basically ready to create some WDL tracker. Here are some questions we could try to agree on:

  • Should each of us create their own local copies of fishtest .pgn files? Or best if @vondele does this somehow centrally? (Is there any point of hosting these pgn's on kaggle?)
  • Do we create a new repo for tracking, or make it part of this repo?
  • How do we deal with non-functional SF commits? We can now filter the WDL data by commit, but for the analysis it would make sense to merge data of non-functional commits with the last previous functional commit.
  • For creating a valid WDL_model-in-time data point, do we require a minimum number of positions? (Here I think of two functional commits in quick succession, meaning there won't be enough meaningful data for the first commit.)

Miscounting the number of games in a pgn collection?

Running on a recent test, I get this output, i.e. 177333 games:

$ ./scoreWDLstat --dir ./pgns/23-10-21/6533f394de6d262d08d3a55e/ -r
Looking (recursively) for pgn files in ./pgns/23-10-21/6533f394de6d262d08d3a55e/
Found 96 .pgn(.gz) files in total.
Found 96 .pgn(.gz) files, creating 96 chunks for processing.
Progress: 96/96
Time taken: 0.1s
Wrote 2788919 scored positions from 177333 games to scoreWDLstat.json for analysis.

Yet, the test meta-data shows it is only 20000 games, which is confirmed by:

$ zcat ./pgns/23-10-21/6533f394de6d262d08d3a55e/*.pgn.gz | grep 'Result' | wc -l
20000

There are also exactly 96 pgn.gz files in that directory.

In the code, the total_games counter is only incremented in the header() function, so I'm a bit at a loss here. Can the header function be called multiple times per game?
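For what it's worth, 177333 / 20000 ≈ 8.9, which would be consistent with the counter being incremented once per header *tag pair* rather than once per game (a fishtest game has roughly nine tag pairs) — though that is only a guess. The zcat check above can be reproduced in Python like this:

```python
import glob
import gzip

def count_games(pgn_dir):
    """Count games as the zcat | grep check does: one [Result "..."]
    tag pair per game. Counting every header tag instead would
    overcount by roughly the number of tags per game."""
    total = 0
    for path in glob.glob(f"{pgn_dir}/*.pgn.gz"):
        with gzip.open(path, "rt", errors="ignore") as f:
            total += sum(1 for line in f if line.startswith("[Result "))
    return total
```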

filter out crashes and time losses

Just leaving this here so we don't forget: I think we could (and probably should) also filter out WDL data from games that were lost due to crashes or time losses. They can be identified by [Termination "time forfeit"] and [Termination "abandoned"] in their pgn data.
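A minimal sketch of the proposed filter (the tag values come from the issue text; the helper name and the headers-as-dict representation are assumptions):

```python
# Games decided by crashes or time losses should not contribute WDL data.
SKIP_TERMINATIONS = {"time forfeit", "abandoned"}

def keep_game(headers):
    """headers: dict of pgn tag pairs for one game."""
    return headers.get("Termination", "") not in SKIP_TERMINATIONS
```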

data race

Compiling with -fsanitize=thread shows a data race.
