TCEC Season 8, game 22: Eval dropped from +26.13 to 0.00 (official-stockfish/stockfish)

Comments (11)

lucasart commented on May 22, 2024

This kind of issue is not very useful. After almost one year, it still can't be reproduced. If it can't be reproduced, it can't be understood or fixed.

from stockfish.

cuddlestmonkey commented on May 22, 2024

This PGN posted by amhijo does show something not quite right with the handling of the hash entries. I'll reproduce it below together with my description of what is going on.

[Event "?"] 
[Site "?"] 
[Date "????.??.??"] 
[Round "?"] 
[White "New game"] 
[Black "?"] 
[Result "*"] 
[PlyCount "137"] 

1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Ba6 5. Nbd2 c5 6. e4 cxd4 7. e5 Ng4 8. h3 
Nh6 9. Bg2 Nc6 10. O-O Be7 11. Qa4 Bb7 12. Nxd4 Nxd4 13. Bxb7 Rb8 14. Be4 Qc7 
15. Qd1 Nhf5 16. Re1 Qxe5 17. Nb3 Rd8 18. Bf4 Qf6 19. Qd3 Bc5 20. Rad1 Nxb3 21. 
axb3 Nd4 22. Kg2 Nc6 23. h4 a5 24. Qe2 Qe7 25. Qh5 g6 26. Qf3 Nd4 27. Qc3 Qf6 
28. Bd5 Bb4 29. Qd3 O-O 30. Be5 Qf5 31. Qxd4 Bxe1 32. Rxe1 d6 33. Bf6 e5 34. 
Qxb6 Qxf6 35. Qxa5 Kh8 36. b4 g5 37. Rh1 gxh4 38. Rxh4 Qg6 39. Qa3 f5 40. Qf3 
Qg7 41. b5 Rb8 42. b4 Rf6 43. Rh5 Qg6 44. Qe2 f4 45. Be4 Qg7 46. Qf3 Rh6 47. 
Rxh6 Qxh6 48. Qe2 fxg3 49. fxg3 Qg5 50. c5 Rg8 51. Qe1 dxc5 52. bxc5 Rd8 53. b6 
Rd2+ 54. Kg1 Qd8 55. Qe3 Rb2 56. Bf3 Rb1+ 57. Kg2 Rb2+ 58. Kh3 Qf6 59. b7 Qe6+ 
60. g4 h5 61. c6 hxg4+ 62. Bxg4 Qd6 63. Bf5 Qf6 64. Kg4 Rg2+ 65. Kh3 Rc2 66. 
Be4 Rb2 67. Bc2 Rb4 68. Be4 Rb2 69. Bc2 *

Using latest master, single-threaded.

Select move 69 and engage infinite analysis. The position is immediately evaluated as 0.00 to depth 30+, since Black can repeat the position after move 67 with 69... Rb4. I don't think this is a true 3-fold yet, though.

Now, the effect of that is to load the hash with a 0.00 eval for that position at a depth of 30+.

If I now click on the position after 63...Qf6, leaving the analysis running (so no hash clear), Bc2 is now discounted as a candidate move because of this deep 0.00 stored in the hash. Given enough time for Bc2 to be evaluated to a high enough depth (bearing in mind it's now way down the move order, so it will be subject to reductions), it will recover, but that would take a long time.

So the repetition-based 0.00 eval is being stored in the hash and then applied to a position where that move would not be a repetition.

Obviously this particular sequence is not an actual game - it's another case of something that interferes with analysis - but is it possible that something similar can happen wherein one thread analyses a line, marks a position (with a rep) as 0.00, and then another thread incorrectly applies that TT score to the non-repped position elsewhere in the tree?
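
To make the mechanism concrete, here is a minimal toy sketch in Python. It is not Stockfish code. The graph and the names P, Q, R, S, W are hypothetical placeholders: P stands in for the position before Bc2, Q for the position after Bc2, R for the position after ...Rb4, W for a won terminal and S for some quiet alternative. The table is keyed on the position only, so a draw score that depended on the path is reused on a path where no repetition exists.

```python
from typing import Dict, Tuple

# Hypothetical toy game graph (placeholders, not real chess positions).
MOVES = {"P": ["Q", "S"], "Q": ["R"], "R": ["W"]}
# Terminal scores, from the point of view of the side to move there.
TERMINAL = {"W": -100, "S": -1}

TT = Dict[str, Tuple[int, int]]  # position -> (depth, score)


def search(pos: str, depth: int, history: Tuple[str, ...], tt: TT) -> int:
    """Negamax with 2-fold repetition detection and a position-keyed table."""
    if pos in TERMINAL:
        return TERMINAL[pos]
    if pos in history:
        return 0  # repetition along this path is scored as a draw
    if depth == 0:
        return 0  # stand-in for a static evaluation
    hit = tt.get(pos)
    if hit is not None and hit[0] >= depth:
        return hit[1]  # reused regardless of the path that produced it
    best = max(-search(child, depth - 1, history + (pos,), tt)
               for child in MOVES[pos])
    tt[pos] = (depth, best)
    return best


def polluted_search() -> int:
    """Analyse Q with R already in the history, then P with the same table."""
    tt: TT = {}
    search("Q", 5, ("R",), tt)   # stores a deep draw score for Q
    return search("P", 3, (), tt)


def clean_search() -> int:
    """Analyse P with an empty table and an empty history."""
    return search("P", 3, (), {})
```

In this toy, `clean_search()` finds the win through Q, while `polluted_search()` settles for the quiet move S because the deep draw entry for Q satisfies the depth check.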

zamar commented on May 22, 2024

@cuddlestmonkey: This is a very old known problem: a bad interaction between the transposition table and 2-fold repetition. It can't be solved without slowing down the engine massively.

cuddlestmonkey commented on May 22, 2024

@zamar Maybe, but it's worth noting that I don't get the same issue using Komodo (another 2-fold rep engine). Komodo 9 evaluates move 69 as 0.00, but evidently isn't using that 0.00 to short-circuit the eval of Bc2 when I switch to move 64.

syzygy1 commented on May 22, 2024

I strongly suspect the graph-history interaction problem also to be the cause of the game 22 problems.

Although I have reproduced the problem with YBW Stockfish (assuming I am correct in thinking that the SF binary from abrok of Thu Oct 15 21:27:52, timestamp 1444969672 is still YBW!!), it would not surprise me if the problem occurs more often with lazy smp. This is because, as I understand, lazy smp lets some threads search deeper than other threads. This could result in one thread searching a position X deeper in the tree (with some key positions for the position X already flagged in history) with relatively high depth, resulting in that position being stored in hash with relatively high depth and a "too low" score (due to the flagged key positions being scored as draw when encountered below in the search of X). When X is then encountered closer to the root by another thread searching less deeply, that thread will accept the "too low" score even though that is wrong.

To reduce the bad effects of the graph-history interaction problem, it seems important to not let threads search at different depths in the endgame.

A paper from 1985 on this problem:
http://wiki.cs.pdx.edu/wurzburg2009/nfp/campbell-ghi.pdf

From the conclusion: "The key in avoiding most occurrences of GHI appears to be iterative deepening". If a position occurs multiple times in the search tree, the search should try to visit the occurrence closest to the root first.

cuddlestmonkey commented on May 22, 2024

Another paper:
http://webdocs.cs.ualberta.ca/~mmueller/ps/aaai-ghi.pdf

"The Graph History Interaction (GHI) Problem occurs when the same game position behaves differently when reached via different paths. For example, after following one path a move m may be legal in position p, while after following another path the same move is illegal in p.
Our efficient solution to GHI was instrumental in developing the world's strongest tsume Go solver, and in solving checkers."

syzygy1 commented on May 22, 2024

Unfortunately that "general" solution is only general in a very specific sense. Basically it is of no value for a game-playing engine, but only for game solvers. See http://www.open-chess.org/viewtopic.php?p=17480#p17480

Before I place all the blame on lazy smp threads that search at different depths (even though the problem can be reproduced with YBW versions), the main trigger of the problem might be a particular combination of reductions and extensions that may result in, say, a position P being searched at depth N at a node further away from the root (with a larger position history) before that same position P is searched at depth N at a node closer to the root (with a smaller position history).

joergoster commented on May 22, 2024

Based on the linked talkchess thread (link given by Vince in the forum), I just created a branch, no_drawscore_to_tt, in which I don't save a draw score into the transposition table: joergoster@bec4d09

So far, I have not been able to get a draw score for the position at move 64.
Maybe this at least helps to lower the probability of it happening too frequently.
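
The idea can be sketched on a toy example in Python. This is only an illustration of "don't store draw scores", not the contents of the commit above. The `ToyTT` class, the `store_draws` flag and the graph (P, Q, R, S, W) are hypothetical names made up for the sketch.

```python
from typing import Dict, Optional, Tuple

DRAW = 0

# Hypothetical toy game graph; terminal scores are from the side to move.
MOVES = {"P": ["Q", "S"], "Q": ["R"], "R": ["W"]}
TERMINAL = {"W": -100, "S": -1}


class ToyTT:
    """Position-keyed table that can refuse to store draw scores."""

    def __init__(self, store_draws: bool) -> None:
        self.store_draws = store_draws
        self.table: Dict[str, Tuple[int, int]] = {}

    def probe(self, key: str, depth: int) -> Optional[int]:
        entry = self.table.get(key)
        if entry is not None and entry[0] >= depth:
            return entry[1]
        return None

    def store(self, key: str, depth: int, score: int) -> None:
        if score == DRAW and not self.store_draws:
            return  # a draw score may depend on the path, so skip it
        self.table[key] = (depth, score)


def search(pos: str, depth: int, history: Tuple[str, ...], tt: ToyTT) -> int:
    """Negamax with 2-fold repetition detection."""
    if pos in TERMINAL:
        return TERMINAL[pos]
    if pos in history:
        return DRAW
    if depth == 0:
        return DRAW  # stand-in for a static evaluation
    cached = tt.probe(pos, depth)
    if cached is not None:
        return cached
    best = max(-search(child, depth - 1, history + (pos,), tt)
               for child in MOVES[pos])
    tt.store(pos, depth, best)
    return best


def root_score_after_repetition_analysis(store_draws: bool) -> int:
    """Analyse Q with R in the history, then P from scratch, sharing one table."""
    tt = ToyTT(store_draws)
    search("Q", 5, ("R",), tt)
    return search("P", 3, (), tt)
```

With `store_draws=True` the second search inherits the draw entry for Q and misses the win. With `store_draws=False` Q is searched again and the win is found. The price is that genuinely drawn positions are never cached either, so they get re-searched every time.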

cuddlestmonkey commented on May 22, 2024

@syzygy1 Shame. With regard to reductions and extensions, there was a position given by Uli that exhibited a similar strange "reset to zero" behaviour, and lowering the amount of reductions in that case removed the problem, so you may well be right.

joergoster commented on May 22, 2024

Just one example with my patch.

info depth 41 seldepth 122 multipv 1 score cp 3327 upperbound nodes 28564409809 nps 14653036 hashfull 999 tbhits 93687414 time 1949385 pv f5c2 b2b4
info depth 41 currmove f5c2 currmovenumber 1
info depth 41 seldepth 122 multipv 1 score cp 3737 lowerbound nodes 28612741641 nps 14650383 hashfull 999 tbhits 94030620 time 1953037 pv f5c2
info depth 41 currmove f5c2 currmovenumber 1
info depth 41 seldepth 122 multipv 1 score cp 4512 lowerbound nodes 30264961927 nps 14595990 hashfull 999 tbhits 104153884 time 2073512 pv f5c2
info depth 41 currmove f5c2 currmovenumber 1

It looks like, as soon as the fail-low cycle begins, my patch breaks it and SF starts to fail high again. I don't claim my patch 'solves' anything, but it really seems to help.

The other thing to consider is the search instability. I think it would also help to open the aspiration window a bit faster, not allowing so many fail-lows in sequence.
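
As a rough illustration of what "opening the window faster" means, here is a minimal Python sketch of an aspiration loop with a configurable growth factor. It is not Stockfish's actual aspiration code. The function names, the starting `delta`, the growth factors and the stub search are all assumptions chosen for the example.

```python
import math
from typing import Callable, Tuple

SearchFn = Callable[[int, int], int]


def make_stub_search(true_score: int) -> SearchFn:
    """Fail-hard stand-in for a real search: clamps the true score to the window."""
    return lambda alpha, beta: max(alpha, min(beta, true_score))


def aspiration_search(search: SearchFn, guess: int, delta: int,
                      growth: float) -> Tuple[int, int]:
    """Re-search with a widening window; returns (score, number of searches)."""
    alpha, beta = guess - delta, guess + delta
    tries = 1
    while True:
        score = search(alpha, beta)
        if alpha < score < beta:
            return score, tries
        delta = math.ceil(delta * growth)  # widen faster for larger growth
        if score <= alpha:
            alpha -= delta                 # fail low: lower the bottom bound
        else:
            beta += delta                  # fail high: raise the top bound
        tries += 1


# Hypothetical scenario: the previous iteration said +400, the true score is 0.
slow = aspiration_search(make_stub_search(0), guess=400, delta=20, growth=1.25)
fast = aspiration_search(make_stub_search(0), guess=400, delta=20, growth=2.0)
```

In this stub the faster-growing window needs fewer consecutive fail-lows to reach the true score. A real search pays more for each wide-window re-search, which the stub does not model, so the right growth rate is an empirical question.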

mcostalba commented on May 22, 2024

@lucasart I agree. Closing.
