
lichess-org / chess-openings

An aggregated data set of chess opening names

License: Creative Commons Zero v1.0 Universal

Python 96.57% Makefile 3.43%
chess dataset lichess

chess-openings's People

Contributors

alexandru-duca, allanjoseph98, benediktwerner, cmgchess, drainwordlee, dtpc, fynsta, haonrekcef, hitansh1299, jdart1, lescyclopes, louchesprocket, malorra, masynchin, niklasf, ornicar, pierotofy, rested, rizka10, rpdelaney, sandromartens, sebasf1349, sergi-nda, sgen, theforkpower, themadsword, xaverh, zinkelburger

chess-openings's Issues

Italian Game, Paris Defence

The Paris Defence (3... d6) is referenced by WikiBooks and Wikipedia, and is displayed in the left-hand pane of Lichess's analysis board.

It's rare but seems documented well enough to be added to the opening explorer.

What is the publish workflow for?

Every time I sync the commits from the lichess branch into my personal branch, I get an email saying something went wrong, but I don't know what it is about.

Delayed Alapin inconsistencies

We have:

b.tsv:B22	Sicilian Defense: Alapin Variation	1. e4 c5 2. c3
b.tsv:B22	Sicilian Defense: Alapin Variation, Barmen Defense	1. e4 c5 2. c3 d5 3. exd5 Qxd5
b.tsv:B22	Sicilian Defense: Alapin Variation, Barmen Defense, Central Exchange	1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 cxd4 5. cxd4 Nc6 6. Nf3 Bg4
b.tsv:B22	Sicilian Defense: Alapin Variation, Barmen Defense, Endgame Variation	1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 cxd4 5. cxd4 Nc6 6. Nf3 Bg4 7. Nc3 Bxf3 8. gxf3 Qxd4 9. Qxd4 Nxd4
b.tsv:B22	Sicilian Defense: Alapin Variation, Barmen Defense, Milner-Barry Attack	1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nc6 5. Nf3 cxd4 6. cxd4 e5 7. Nc3 Bb4 8. Be2
b.tsv:B22	Sicilian Defense: Alapin Variation, Barmen Defense, Modern Line	1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nf6 5. Nf3 Bg4
b.tsv:B22	Sicilian Defense: Alapin Variation, Smith-Morra Declined	1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. d4 cxd4
b.tsv:B22	Sicilian Defense: Alapin Variation, Stoltz Attack	1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. Nf3 Nc6 5. Bc4 Nb6 6. Bb3
b.tsv:B22	Sicilian Defense: Alapin Variation, Stoltz Attack, Ivanchuk Line	1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. Nf3 Nc6 5. Bc4 Nb6 6. Bb3 c4 7. Bc2 Qc7 8. Qe2 g5
b.tsv:B40	Sicilian Defense: Alapin Variation, Sherzer Variation	1. e4 c5 2. Nf3 e6 3. c3 Nf6 4. e5 Nd5 5. d4 Nc6
b.tsv:B40	Sicilian Defense: Delayed Alapin Variation	1. e4 c5 2. Nf3 e6 3. c3
b.tsv:B50	Sicilian Defense: Delayed Alapin Variation	1. e4 c5 2. Nf3 d6 3. c3
b.tsv:B50	Sicilian Defense: Delayed Alapin Variation, Basman-Palatnik Double Gambit	1. e4 c5 2. Nf3 d6 3. c3 Nf6 4. Be2 Nc6 5. d4 cxd4 6. cxd4 Nxe4 7. d5 Qa5+ 8. Nc3 Nxc3 9. bxc3
b.tsv:B50	Sicilian Defense: Delayed Alapin Variation, Basman-Palatnik Gambit	1. e4 c5 2. Nf3 d6 3. c3 Nf6 4. Be2 Nc6 5. d4 cxd4 6. cxd4 Nxe4

Questions to resolve:

  • Should both 2... e6 3. c3 and 2... d6 3. c3 be called Delayed Alapin?
  • What about other second moves by black?
  • Does the Basman-Palatnik Gambit really belong to the Delayed Alapin, or is there a canonical move order under the regular Alapin?

Source?

Could you specify the source of the opening names? They are different (but better because more detailed and logical!) from what I usually find in ECO databases.

Misc issues and improvements

Forwarding some potential issues with an earlier version of the database:

  • Pirc Defense being 1. e4 d6 makes no sense. The Pirc is 1. e4 d6 2. d4 Nf6 3. Nc3 g6
  • The King's Indian Defense is some exchange variation? It should just be 1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6
  • Queen's Gambit Accepted: Just 1. d4 d5 2. c4 dxc4
  • Horwitz Defense is 1. d4 e6, which is highly transpositional. I don't get why it's some weird Englund gambit (1. d4 e5 2. c4)
  • Nimzo-Indian Defense: It seems like it's a b6 variation. The Nimzo-Indian Defense is just 1. d4 Nf6 2. c4 e6 3. Nc3 Bb4
  • The Blackmar-Diemer Gambit is just 1. d4 d5 2. e4
  • IDK why there's a standalone KG. It's either accepted or declined :P IG some programmatic thing where variations which weren't included get their own category
  • 1. d4 Nf6 2. c4 d6 3. Nc3 e5 4. Nf3 Nbd7 5. e4 is the Old Indian Defense. Maybe could be cut off on move 3 as that's still unique for the OID

cc @Truthdoc

Json validity of the entire file

Each line is valid JSON, but the file as a whole is not. This makes it a bit hard to work with, e.g. using the json module in Python. Would you accept a pull request on this?
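For anyone hitting the same limitation, a file in which each line is its own JSON value can be read one line at a time. The following is a minimal sketch, assuming one JSON object per non-empty line; the helper name `read_json_lines` and the sample records are hypothetical and not taken from the repository.

```python
import io
import json
from typing import Any, Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data; the keys in the real file may differ.
sample = io.StringIO(
    '{"eco": "A00", "name": "Kadas Opening"}\n'
    '{"eco": "B22", "name": "Sicilian Defense: Alapin Variation"}\n'
)
records = list(read_json_lines(sample))

# Serialising the collected list gives one document that json.loads accepts whole.
as_one_document = json.dumps(records)
```

Wrapping the records in a list, as in the last line, is one way a pull request could make the whole file a single valid JSON document.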

Duplicate names with differing moves

One thing I came across while using the data was stuff like this

https://github.com/niklasf/eco/blob/master/c.tsv#L331

eco	name	fen	moves
C30	King's Gambit Declined: Classical Variation	rnbqk1nr/ppp2ppp/3p4/2b1p3/4PP2/2P2N2/PP1P2PP/RNBQKB1R b KQkq -	e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 c2c3
C30	King's Gambit Declined: Classical Variation	rnbqk1nr/pppp1ppp/8/2b1p3/4PP2/8/PPPP2PP/RNBQKBNR w KQkq -	e2e4 e7e5 f2f4 f8c5

Here two or more lines have the same name, yet one has additional continuations/moves. I'm not sure whether this is intended, or whether it is even a problem, but when I see

eco	name	fen	moves
C30	King's Gambit Declined: Classical Variation, Rotlewi Countergambit	rnbqk1nr/ppp2ppp/3p4/2b1p3/1P2PP2/5N2/P1PP2PP/RNBQKB1R b KQkq -	e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 b2b4

I feel like King's Gambit Declined: Classical Variation should be e2e4 e7e5 f2f4 f8c5 since it is a strict subset of e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 b2b4. Whereas e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 c2c3 should perhaps be called e.g. King's Gambit Declined: Classical Variation, Classical Continuation or something.

Would be interested to know your thoughts!
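To find every name affected by this, the rows can be grouped by name and sorted by line length. This is only a sketch under the assumption that rows are available as (name, UCI moves) pairs; `lines_by_name` is a hypothetical helper, and the two rows are the ones quoted above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (name, space-separated UCI moves)


def lines_by_name(rows: List[Row]) -> Dict[str, List[List[str]]]:
    """Group move lists by opening name, shortest line first."""
    grouped: Dict[str, List[List[str]]] = defaultdict(list)
    for name, moves in rows:
        grouped[name].append(moves.split())
    for lines in grouped.values():
        lines.sort(key=len)
    return dict(grouped)


rows = [
    ("King's Gambit Declined: Classical Variation",
     "e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 c2c3"),
    ("King's Gambit Declined: Classical Variation",
     "e2e4 e7e5 f2f4 f8c5"),
]

for name, lines in lines_by_name(rows).items():
    if len(lines) > 1:
        print(name, "appears", len(lines), "times; shortest:", " ".join(lines[0]))
```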

Missing Line

Line Name: Sicilian Defense: Ginsberg Gambit
ECO Code: B55 (probably)

Line:
e2e4 c7c5 g1f3 d7d6 d2d4 c5d4 f3d4 g8f6 f1c4

should we include The Cow

Redundant and contradictory entries

I noticed there are some contradictory entries with the same name, for example the Kadas Opening: Kadas Gambit:

A00	Kadas Opening: Kadas Gambit	1. h4 c5 2. b4
A00	Kadas Opening: Kadas Gambit	1. h4 d5 2. d4 c5 3. Nf3 cxd4 4. c3
A00	Kadas Opening: Kadas Gambit	1. h4 e5 2. d4 exd4 3. c3

Also, some entries seem redundant; for example, of the following, I suspect it would be sufficient to only include the final one:

A01	Nimzo-Larsen Attack: Modern Variation	1. b3 e5
A01	Nimzo-Larsen Attack: Modern Variation	1. b3 e5 2. Bb2
A01	Nimzo-Larsen Attack: Modern Variation	1. b3 e5 2. Bb2 Nc6
A01	Nimzo-Larsen Attack: Modern Variation	1. b3 e5 2. Bb2 Nc6 3. e3

Are these intentional? If so, what is the rationale behind them?
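The two cases can be told apart mechanically: in the redundant case every line extends the previous one, while in the contradictory case the lines diverge. A rough sketch, assuming PGN strings with consistent move numbering; `classify` and the labels "chain" and "divergent" are made up for illustration.

```python
from typing import List


def is_prefix(short: List[str], long: List[str]) -> bool:
    """True if `long` starts with all tokens of `short`."""
    return long[: len(short)] == short


def classify(lines: List[str]) -> str:
    """Return 'chain' if every line extends the next shorter one, else 'divergent'."""
    tokenized = sorted((line.split() for line in lines), key=len)
    for shorter, longer in zip(tokenized, tokenized[1:]):
        if not is_prefix(shorter, longer):
            return "divergent"
    return "chain"


kadas = [
    "1. h4 c5 2. b4",
    "1. h4 d5 2. d4 c5 3. Nf3 cxd4 4. c3",
    "1. h4 e5 2. d4 exd4 3. c3",
]
nimzo_larsen = [
    "1. b3 e5",
    "1. b3 e5 2. Bb2",
    "1. b3 e5 2. Bb2 Nc6",
    "1. b3 e5 2. Bb2 Nc6 3. e3",
]

print(classify(kadas))         # divergent
print(classify(nimzo_larsen))  # chain
```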

Indian Game vs Indian Defense

What is the standpoint on this? Wikipedia says "Indian Defense" and the sublines have "Defense" in their name as well (Queen's Indian Defense, Nimzo-Indian Defense...)

duplicate fen in std_eco

"rnb1kbnr/ppp2p1p/8/3B2p1/4Pp1q/6P1/PPPP3P/RNBQ1KNR b kq -" => Opening { eco: "C33", name: "King's Gambit Accepted, bishop's gambit, Gifford variation" },

"rnb1kbnr/ppp2p1p/8/3B2p1/4Pp1q/6P1/PPPP3P/RNBQ1KNR b kq -" => Opening { eco: "C33", name: "King's Gambit Accepted, bishop's gambit, Chigorin's attack" },

"rnbqkb1r/pppp1ppp/5n2/8/2BpP3/5N2/PPP2PPP/RNBQK2R b KQkq -" => Opening { eco: "C43", name: "Petrov, Urusov gambit" }


"rnb1kb1r/pp3pp1/2p1pq1p/3p4/2PP4/1QN2N2/PP2PPPP/R3KB1R b KQkq -" => Opening { eco: "D43", name: "Queen's Gambit Declined semi-Slav, Hastings variation" },


"rnbqkb1r/pp2pp1p/2p2np1/3p4/2PP4/2N2N2/PP2PPPP/R1BQKB1R w KQkq -" => Opening { eco: "D90", name: "Gruenfeld, Schlechter variation" },

ECO Code Accuracy

Thanks for your great list! Having looked at a lot of ECO databases, I gather that accuracy is a challenge. I have found the Scid ECO database to be mostly accurate, so in case it's helpful, I generated a list of ECO-code differences between your list and Scid's (only for those paths that can be matched to equal ply depth). In the attached file, your ECO code is in the first column and Scid's (simplified) code is in the second column. After that is the move list.
ecocomp.txt

Datasets have multiple duplicate lines

The datasets contain multiple duplicate lines, for example L30 and L31 in b.tsv. I'd assume this pollutes the db. I suggest adding a Python script, run as part of make, that removes duplicate lines, i.e. lines where the ECO code, full name, and PGN are all the same. I'd be happy to create a PR if that makes sense.
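A minimal sketch of what such a script could do, assuming tab-separated rows with eco, name and pgn columns; `dedupe_rows` is a hypothetical name, and the sample rows are illustrative rather than the actual L30 and L31.

```python
from typing import Iterable, List


def dedupe_rows(lines: Iterable[str]) -> List[str]:
    """Drop rows whose (eco, name, pgn) triple was already seen, keeping order."""
    seen = set()
    unique = []
    for line in lines:
        key = tuple(line.rstrip("\n").split("\t"))
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


sample = [
    "eco\tname\tpgn\n",
    "B22\tSicilian Defense: Alapin Variation\t1. e4 c5 2. c3\n",
    "B22\tSicilian Defense: Alapin Variation\t1. e4 c5 2. c3\n",
    "B40\tSicilian Defense: Delayed Alapin Variation\t1. e4 c5 2. Nf3 e6 3. c3\n",
]
cleaned = dedupe_rows(sample)
```

Keeping the first occurrence and preserving order means the ECO and name ordering of the file is left as it was.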

gen.py argparse

import io
import re
import sys
import argparse  # Added argparse for better command-line argument parsing
from typing import Dict, List, TextIO

try:
    import chess
    import chess.pgn
except ImportError:
    print("Need python-chess:", file=sys.stderr)
    print("$ pip3 install chess", file=sys.stderr)
    print(file=sys.stderr)
    raise

ECO_REGEX = re.compile(r"^[A-E]\d\d\Z")
INVALID_SPACE = re.compile(r"\s{2,}|^\s|\s\Z|\s,")
INVALID_WITH = re.compile(r"[^,:]\swith\b")

class Stats:
    def __init__(self) -> None:
        self.errors = 0
        self.warnings = 0

class Reporter:
    def __init__(self, stats: Stats, file_name: str) -> None:
        self.stats = stats
        self.file_name = file_name

    def error(self, lno: int, err_msg: str) -> None:
        print(f"::error file={self.file_name},line={lno}::{err_msg}", file=sys.stderr)
        self.stats.errors += 1

    def warning(self, lno: int, err_msg: str) -> None:
        print(f"::warning file={self.file_name},line={lno}::{err_msg}", file=sys.stderr)
        self.stats.warnings += 1

def parse_args():
    parser = argparse.ArgumentParser(description="Chess Opening Data Validator")
    parser.add_argument("input_files", nargs='+', help="Input TSV files to process")
    parser.add_argument("--disable-warnings", action="store_true", help="Disable warning checks")
    return parser.parse_args()

def main(f: TextIO, reporter: Reporter, by_epd: Dict[str, List[str]], shortest_by_name: Dict[str, int]) -> None:
    prev_eco = ""
    prev_name = ""

    for lno, line in enumerate(f, 1):
        cols = line.rstrip("\n").split("\t")

        if len(cols) != 3:
            reporter.error(lno, f"expected 3 columns, got {len(cols)}")
            continue

        if lno == 1:
            if cols != ["eco", "name", "pgn"]:
                reporter.error(lno, f"expected eco, name, pgn")
            continue

        eco, name, pgn = cols

        if not ECO_REGEX.match(eco):
            reporter.error(lno, f"invalid eco")
            continue

        if INVALID_SPACE.search(name):
            reporter.error(lno, f"invalid whitespace in name")
            continue

        try:
            board = chess.pgn.read_game(io.StringIO(pgn), Visitor=chess.pgn.BoardBuilder)
        except ValueError as err:
            reporter.error(lno, f"{err}")
            continue

        if not board:
            reporter.error(lno, f"Empty pgn")
            continue

        allowed_lowers = ["with", "de", "der", "del", "von", "and"]
        if not all([word[0].isupper() for word in re.split(r"\s|-", name) if word not in allowed_lowers and word.isalpha()]):
            reporter.warning(lno, f"{name!r} word(s) beginning with lowercase letters")

        if INVALID_WITH.search(name):
            reporter.warning(lno, f"'with' not separated with ',' or ':'")

        for blacklisted in ["refused"]:
            if blacklisted in name.lower():
                reporter.warning(lno, f"blacklisted word ({blacklisted!r} in {name!r})")

        if shortest_by_name.get(name, -1) == len(board.move_stack):
            reporter.warning(lno, f"{name!r} does not have a unique shortest line")
        try:
            shortest_by_name[name] = min(shortest_by_name[name], len(board.move_stack))
        except KeyError:
            shortest_by_name[name] = len(board.move_stack)

        clean_pgn = chess.Board().variation_san(board.move_stack)
        if clean_pgn != pgn:
            reporter.error(lno, f"unclean pgn: expected {clean_pgn!r}, got {pgn!r}")

        if name.count(":") > 1:
            reporter.error(lno, f"multiple ':' in name: {name}")

        epd = board.epd()
        if epd in by_epd:
            reporter.error(lno, f"duplicate epd: {by_epd[epd]}")
        else:
            by_epd[epd] = cols

        if eco < prev_eco:
            reporter.error(lno, f"not ordered by eco ({eco} after {prev_eco})")
        elif (eco, name) < (prev_eco, prev_name):
            reporter.error(lno, f"not ordered by name ({name!r} after {prev_name!r})")
        prev_eco = eco
        prev_name = name

        print(eco, name, clean_pgn, " ".join(m.uci() for m in board.move_stack), epd, sep="\t")

if __name__ == "__main__":
    args = parse_args()
    if args.disable_warnings:
        # Honor --disable-warnings: replace Reporter.warning with a no-op.
        Reporter.warning = lambda self, lno, err_msg: None  # type: ignore
    print("eco", "name", "pgn", "uci", "epd", sep="\t")

    stats = Stats()
    by_epd: Dict[str, List[str]] = {}
    shortest_by_name: Dict[str, int] = {}
    for file_name in args.input_files:
        with open(file_name) as f:
            main(f, Reporter(stats, file_name), by_epd, shortest_by_name)
    if stats.errors:
        sys.exit(1)

Blackburne Shilling Kostić?

Re naming of the below:

a) C50 Blackburne Shilling Gambit 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nd4 4. Nxe5 Qg5 5. Nxf7 Qxg2 6. Rf1 Qxe4+ 7. Be2 Nf3#
b) C50 Italian Game: Schilling-Kostic Gambit 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nd4

I believe the legend behind this is that Blackburne played games for a shilling using this line.
Looking around the web, it sounds like in version b), "Schilling" comes from a misunderstanding that "shilling" was a name.
Borislav Kostić was apparently the person who actually played the line in recorded games.

If this is accurate, either "Blackburne Shilling-Kostić", or "Blackburne-Kostić", or "Kostić" would make sense as the name for b), and likely the middle one strikes the right balance between accuracy and common usage.

Also note the name is spelled Kostić, not Kostic.

A separate issue is whether the current a) entry makes sense as an inclusion in the opening database given it's a full game and a "mainline" extension of b)?

If it should be kept, it would sound like a) should be classified along the lines of: "Italian Game: Blackburne-Kostić Gambit, Blackburne Shilling Variation" to make it consistent.

Therefore I would propose to change the names as follows:
a) to "Italian Game: Blackburne-Kostić Gambit, Blackburne Shilling Variation"
b) to "Italian Game: Blackburne-Kostić Gambit"

Relevant Wikipedia entry: https://en.wikipedia.org/wiki/Blackburne_Shilling_Gambit

Bogoljubo{v,w} inconsistency

Currently, there are a few instances where Bogoljubo{v,w} is spelled with either v or w.
It would be reasonable to unify the spelling, but I'm not quite sure in which direction.
Bogoljubow makes sense, as he was a naturalized German citizen and that is how the name is written on his grave, cf. https://en.wikipedia.org/wiki/Efim_Bogoljubow#/media/File:Bogoljubow_Grabstein_Triberg.JPG
On the other hand, the spelling Bogoljubov is more readable to non-German speakers and also more consistent with the handling of Russian names in this library in general (cf. Yusupov vs. Jussupow).

C33: Van Geet Opening: Nowokunski Gambit

I think this wound up in the C list through an error in transpositions that did not take into account the option for en passant.

The two conflicting transpositions:

  1. The King's Gambit Accepted route - 1. e4 e5 2. f4 exf4 3. Nc3 - Wikipedia and Chess.com refer to it as the Mason-Keres Gambit. 365chess refers to it as the Keres (Mason-Steinitz) Gambit.
  2. The Van Geet route - 1. Nc3 e5 2. f4 exf4 3. e4 - seems to transpose, but here there is the option for en passant (holy hell). Reference from chesstempo

These should be treated as distinct positions. Nowokunski belongs back in the A00 uncommon openings section. Some variation of Mason-Keres (Steinitz?) should replace it in C33.
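The difference shows up in the en passant field of the position string. The sketch below compares two hand-written EPDs for the two move orders. They are written out by hand for illustration, not copied from the data set, and `parse_epd` is a hypothetical helper.

```python
from typing import NamedTuple


class Epd(NamedTuple):
    placement: str
    turn: str
    castling: str
    en_passant: str


def parse_epd(epd: str) -> Epd:
    """Split the four position fields of an EPD string."""
    return Epd(*epd.split()[:4])


# After 1. e4 e5 2. f4 exf4 3. Nc3: the last move was not a double pawn push.
via_kings_gambit = parse_epd(
    "rnbqkbnr/pppp1ppp/8/8/4Pp2/2N5/PPPP2PP/R1BQKBNR b KQkq -"
)
# After 1. Nc3 e5 2. f4 exf4 3. e4: the f4 pawn may capture en passant on e3.
via_van_geet = parse_epd(
    "rnbqkbnr/pppp1ppp/8/8/4Pp2/2N5/PPPP2PP/R1BQKBNR b KQkq e3"
)

same_pieces = via_kings_gambit.placement == via_van_geet.placement
same_position = via_kings_gambit == via_van_geet
```

The piece placement is identical, but the full positions differ, so a lookup keyed by EPD treats them as two entries.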

Giuoco Piano and Italian Game

In the database the line "1.e4 e5 2.Nf3 Nc6 3.Bc4 Bc5" is called the "Giuoco Piano", which is correct I think. But then every other line continuing from there is called "Italian Game: ...". There's also "Italian Game: Giuoco Piano, ...". But then the first line above should also be named "Italian Game: Giuoco Piano". This seems to be somewhat inconsistent, or is there an explanation?

Ensure openings follow parent line

The DB already follows the format Opening family: variation1, var2, var3... It follows that French Defense: Advance Variation derives from French Defense. We should also ensure that the corresponding PGNs follow this logic: 1. e4 e6 2. d4 d5 3. e5 should be derived from 1. e4 e6 2. d4 d5 and not 1. d4 e6 2. e4 d5. This is already the case for most openings. However, there are still a lot of bad transpositions.

To fix them we should:

  1. Ensure every Opening family has a unique shortest pgn #55 - WIP
  2. Use the fixed families to fix bad var1's, use fixed var1's to fix var2's etc.
  3. If name ends in "...Gambit", ensure "...Gambit Accepted" and "...Gambit Declined" are children of "Gambit"
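A check along the lines of steps 1 and 2 could look like the sketch below. It assumes a mapping from each name to its unique shortest PGN already exists. `parent_name` and `bad_children` are hypothetical helpers, and the third sample entry is invented to show a bad transposition.

```python
from typing import Dict, List, Optional


def parent_name(name: str) -> Optional[str]:
    """'Family: A, B' -> 'Family: A'; 'Family: A' -> 'Family'; 'Family' -> None."""
    if ", " in name:
        return name.rsplit(", ", 1)[0]
    if ": " in name:
        return name.split(": ", 1)[0]
    return None


def bad_children(shortest_pgn: Dict[str, str]) -> List[str]:
    """Names whose shortest PGN does not start with the parent's shortest PGN."""
    bad = []
    for name, pgn in shortest_pgn.items():
        parent = parent_name(name)
        if parent in shortest_pgn:
            parent_moves = shortest_pgn[parent].split()
            if pgn.split()[: len(parent_moves)] != parent_moves:
                bad.append(name)
    return bad


# Illustrative rows only; the last name is invented for this example.
openings = {
    "French Defense": "1. e4 e6",
    "French Defense: Advance Variation": "1. e4 e6 2. d4 d5 3. e5",
    "French Defense: Advance Variation, Hypothetical Line": "1. d4 e6 2. e4 d5 3. e5 c5",
}
print(bad_children(openings))
```

Names whose parent is missing from the mapping are skipped here; a stricter check might report those as well.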

Wrong opening name or move?

1. e4 c5 2. b3 d5 3. Bb2 is called B20 Caro-Kann Defense: Euwe Attack, Prins Gambit. This seems to be a mistake, as this opening should clearly be a Sicilian. Chess.com calls this opening Sicilian Defense: Snyder, Euwe Attack, Prins Gambit. Either this opening is named incorrectly or the moves should be 1. e4 c6 2. b3 d5 3. Bb2. Interestingly, the latter has been played a lot more often on Lichess and in master games.

What makes a unique opening (variation)?

I'm currently using this dataset to analyze how openings transpose into each other. I realized there are some entries with the same name but different move orders, positions, and ECO codes (e.g. Queen's Pawn Game has five entries in A40, A41 and D00; the Caro-Kann has six entries in B10, B12 and B15). So if neither the name nor the ECO code is unique, what could be used to identify one variation?

Related: Wikipedia and Wikibooks refer to all 1. d4 d5 openings as Closed Game or Double Queen's Pawn Game, so maybe we could differentiate this from other 1. d4 positions?
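One candidate key is the position itself: the gen.py listing quoted in the argparse issue above reports a duplicate EPD as an error, which suggests that each row reaches a distinct position. The sketch below indexes rows by position under that assumption, using the two FEN strings quoted in the "Duplicate names with differing moves" issue.

```python
from typing import Dict, List, Tuple

# (eco, name, position) rows; both share a name and ECO code but not a position.
rows: List[Tuple[str, str, str]] = [
    ("C30", "King's Gambit Declined: Classical Variation",
     "rnbqk1nr/ppp2ppp/3p4/2b1p3/4PP2/2P2N2/PP1P2PP/RNBQKB1R b KQkq -"),
    ("C30", "King's Gambit Declined: Classical Variation",
     "rnbqk1nr/pppp1ppp/8/2b1p3/4PP2/8/PPPP2PP/RNBQKBNR w KQkq -"),
]

by_epd: Dict[str, Tuple[str, str]] = {}
for eco, name, epd in rows:
    if epd in by_epd:
        raise ValueError(f"duplicate position: {epd}")
    by_epd[epd] = (eco, name)
```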
