lichess-org / chess-openings
An aggregated data set of chess opening names
License: Creative Commons Zero v1.0 Universal
I am looking for a dataset in which I would have the FEN for each position...
We have:
b.tsv:B22 Sicilian Defense: Alapin Variation 1. e4 c5 2. c3
b.tsv:B22 Sicilian Defense: Alapin Variation, Barmen Defense 1. e4 c5 2. c3 d5 3. exd5 Qxd5
b.tsv:B22 Sicilian Defense: Alapin Variation, Barmen Defense, Central Exchange 1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 cxd4 5. cxd4 Nc6 6. Nf3 Bg4
b.tsv:B22 Sicilian Defense: Alapin Variation, Barmen Defense, Endgame Variation 1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 cxd4 5. cxd4 Nc6 6. Nf3 Bg4 7. Nc3 Bxf3 8. gxf3 Qxd4 9. Qxd4 Nxd4
b.tsv:B22 Sicilian Defense: Alapin Variation, Barmen Defense, Milner-Barry Attack 1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nc6 5. Nf3 cxd4 6. cxd4 e5 7. Nc3 Bb4 8. Be2
b.tsv:B22 Sicilian Defense: Alapin Variation, Barmen Defense, Modern Line 1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nf6 5. Nf3 Bg4
b.tsv:B22 Sicilian Defense: Alapin Variation, Smith-Morra Declined 1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. d4 cxd4
b.tsv:B22 Sicilian Defense: Alapin Variation, Stoltz Attack 1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. Nf3 Nc6 5. Bc4 Nb6 6. Bb3
b.tsv:B22 Sicilian Defense: Alapin Variation, Stoltz Attack, Ivanchuk Line 1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. Nf3 Nc6 5. Bc4 Nb6 6. Bb3 c4 7. Bc2 Qc7 8. Qe2 g5
b.tsv:B40 Sicilian Defense: Alapin Variation, Sherzer Variation 1. e4 c5 2. Nf3 e6 3. c3 Nf6 4. e5 Nd5 5. d4 Nc6
b.tsv:B40 Sicilian Defense: Delayed Alapin Variation 1. e4 c5 2. Nf3 e6 3. c3
b.tsv:B50 Sicilian Defense: Delayed Alapin Variation 1. e4 c5 2. Nf3 d6 3. c3
b.tsv:B50 Sicilian Defense: Delayed Alapin Variation, Basman-Palatnik Double Gambit 1. e4 c5 2. Nf3 d6 3. c3 Nf6 4. Be2 Nc6 5. d4 cxd4 6. cxd4 Nxe4 7. d5 Qa5+ 8. Nc3 Nxc3 9. bxc3
b.tsv:B50 Sicilian Defense: Delayed Alapin Variation, Basman-Palatnik Gambit 1. e4 c5 2. Nf3 d6 3. c3 Nf6 4. Be2 Nc6 5. d4 cxd4 6. cxd4 Nxe4
Questions to resolve:
- Should 2... e6 3. c3 and 2... d6 3. c3 be called Delayed Alapin?
- Does the Basman-Palatnik Gambit really belong to the Delayed Alapin, or is there a canonical move order under the regular Alapin?

Could you specify the source of the opening names? They are different (but better because more detailed and logical!) from what I usually find in ECO databases.
Forwarding some potential issues with an earlier version of the database:
- Pirc Defense being 1. e4 d6 makes no sense. The Pirc is 1. e4 d6 2. d4 Nf6 3. Nc3 g6
- The King's Indian Defense is some exchange variation? It should just be 1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6
- Queen's Gambit Accepted: Just 1. d4 d5 2. c4 dxc4
- Horwitz Defense is 1. d4 e6, which is highly transpositional. I don't get why it's some weird Englund gambit (1. d4 e5 2. c4)
- Nimzo-Indian Defense: It seems like it's a b6 variation. The Nimzo-Indian Defense is just 1. d4 Nf6 2. c4 e6 3. Nc3 Bb4
- The Blackmar-Diemer Gambit is just 1. d4 d5 2. e4
- IDK why there's a standalone KG. It's either accepted or declined :P IG some programmatic thing where variations which weren't included get their own category
- 1. d4 Nf6 2. c4 d6 3. Nc3 e5 4. Nf3 Nbd7 5. e4 is the Old Indian Defense. Maybe could be cut off on move 3 as that's still unique for the OID
cc @Truthdoc
Each line is valid json, but the file as a whole is not valid. This makes it a bit hard to work with e.g. using the json module in python. Would you accept a pull request on this?
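Until the file format changes, a line-by-line read is one workaround. The following is a minimal sketch, assuming each non-empty line holds one JSON object; the helper name `parse_json_lines` and the key names in the sample rows are illustrative assumptions, not taken from the repository.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_json_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one JSON document per non-empty line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            records.append(json.loads(line))
    return records


# Hypothetical sample rows; the key names are assumed for illustration.
sample = [
    '{"eco": "B22", "name": "Sicilian Defense: Alapin Variation", "moves": "e2e4 c7c5 c2c3"}',
    '{"eco": "C30", "name": "King\'s Gambit Declined: Classical Variation", "moves": "e2e4 e7e5 f2f4 f8c5"}',
]
records = parse_json_lines(sample)

# Serialising the list yields a single document that json.loads accepts as a whole.
as_array = json.dumps(records)
```

The same function accepts an open file object, since iterating over a text file yields its lines.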
One thing I came across while using the data was stuff like this
https://github.com/niklasf/eco/blob/master/c.tsv#L331
eco | name | fen | moves |
---|---|---|---|
C30 | King's Gambit Declined: Classical Variation | rnbqk1nr/ppp2ppp/3p4/2b1p3/4PP2/2P2N2/PP1P2PP/RNBQKB1R b KQkq - | e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 c2c3 |
C30 | King's Gambit Declined: Classical Variation | rnbqk1nr/pppp1ppp/8/2b1p3/4PP2/8/PPPP2PP/RNBQKBNR w KQkq - | e2e4 e7e5 f2f4 f8c5 |
Here two or more lines have the same name, yet one has additional continuations/moves. I'm not sure whether this is intended, or whether it is even a problem, but when I see
eco | name | fen | moves |
---|---|---|---|
C30 | King's Gambit Declined: Classical Variation, Rotlewi Countergambit | rnbqk1nr/ppp2ppp/3p4/2b1p3/1P2PP2/5N2/P1PP2PP/RNBQKB1R b KQkq - | e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 b2b4 |
I feel like King's Gambit Declined: Classical Variation should be e2e4 e7e5 f2f4 f8c5, since it is a strict subset of e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 b2b4. Whereas e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 c2c3 should perhaps be called e.g. King's Gambit Declined: Classical Variation, Classical Continuation or something.
Would be interested to know your thoughts!
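Cases like this can be listed mechanically. The sketch below assumes rows of (eco, name, UCI moves) as in the tables above and reports every name whose shortest line is extended by another line carrying the same name; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (eco, name, space-separated UCI moves)


def is_prefix(short: List[str], long: List[str]) -> bool:
    """True if `long` starts with every move of `short`, in order."""
    return len(short) <= len(long) and long[:len(short)] == short


def names_with_extended_lines(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Map each repeated name to the lines that extend its shortest line."""
    by_name: Dict[str, List[List[str]]] = defaultdict(list)
    for _eco, name, moves in rows:
        by_name[name].append(moves.split())
    result: Dict[str, List[str]] = {}
    for name, lines in by_name.items():
        if len(lines) < 2:
            continue
        shortest = min(lines, key=len)
        extensions = [
            " ".join(line)
            for line in lines
            if line is not shortest and is_prefix(shortest, line)
        ]
        if extensions:
            result[name] = extensions
    return result


# The three rows quoted in the tables above.
rows = [
    ("C30", "King's Gambit Declined: Classical Variation",
     "e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 c2c3"),
    ("C30", "King's Gambit Declined: Classical Variation",
     "e2e4 e7e5 f2f4 f8c5"),
    ("C30", "King's Gambit Declined: Classical Variation, Rotlewi Countergambit",
     "e2e4 e7e5 f2f4 f8c5 g1f3 d7d6 b2b4"),
]
flagged = names_with_extended_lines(rows)
```

On these rows, only the Classical Variation is flagged, with the c2c3 line listed as the extension.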
Line Name: Sicilian Defense: Ginsberg Gambit
ECO Code: B55 (probably)
Line:
e2e4 c7c5 g1f3 d7d6 d2d4 c5d4 f3d4 g8f6 f1c4
Chess.com has decided to list The Cow in its opening list. Should we too?
https://www.chess.com/openings/The-Cow
https://www.chess.com/openings/The-Cow-2...e5
https://www.chess.com/openings/The-Cow-2...e5-3.Ne2
https://www.chess.com/openings/The-Cow-2...e5-3.Ne2-Bd6
https://www.chess.com/openings/The-Cow-2...e5-3.Ne2-Bd6-4.Ng3
https://www.chess.com/openings/The-Cow...3.Ne2-Bd6-4.Ng3-Nf6
https://www.chess.com/openings/The-Cow...3.Ne2-Bd6-4.Ng3-Nf6-5.Nd2
https://www.chess.com/openings/The-Cow...4.Ng3-Nf6-5.Nd2-O-O
https://www.chess.com/openings/The-Cow-Bull-Variation
I noticed there are some contradictory entries with the same name, for example the Kadas Opening: Kadas Gambit:
A00 Kadas Opening: Kadas Gambit 1. h4 c5 2. b4
A00 Kadas Opening: Kadas Gambit 1. h4 d5 2. d4 c5 3. Nf3 cxd4 4. c3
A00 Kadas Opening: Kadas Gambit 1. h4 e5 2. d4 exd4 3. c3
Also, some entries seem redundant; for example, of the following, I suspect it would be sufficient to only include the final one:
A01 Nimzo-Larsen Attack: Modern Variation 1. b3 e5
A01 Nimzo-Larsen Attack: Modern Variation 1. b3 e5 2. Bb2
A01 Nimzo-Larsen Attack: Modern Variation 1. b3 e5 2. Bb2 Nc6
A01 Nimzo-Larsen Attack: Modern Variation 1. b3 e5 2. Bb2 Nc6 3. e3
Are these intentional? If so, what is the rationale behind them?
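The two situations can be told apart automatically. This is a rough sketch, assuming rows of (eco, name, pgn) as listed above: a repeated name is labelled "chain" when each line extends the previous one (the redundant case) and "diverging" when the lines branch (the contradictory case). The labels and function names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def san_moves(pgn: str) -> List[str]:
    """Drop move-number tokens such as '1.' and keep the SAN moves."""
    return [tok for tok in pgn.split() if not tok.endswith(".")]


def classify_repeated_names(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Label every name that occurs more than once as 'chain' or 'diverging'."""
    by_name: Dict[str, List[List[str]]] = defaultdict(list)
    for _eco, name, pgn in rows:
        by_name[name].append(san_moves(pgn))
    verdicts: Dict[str, str] = {}
    for name, lines in by_name.items():
        if len(lines) < 2:
            continue
        lines.sort(key=len)
        # A chain means every line is a prefix of the next longer one.
        is_chain = all(
            longer[:len(shorter)] == shorter
            for shorter, longer in zip(lines, lines[1:])
        )
        verdicts[name] = "chain" if is_chain else "diverging"
    return verdicts


# Rows quoted above.
rows = [
    ("A00", "Kadas Opening: Kadas Gambit", "1. h4 c5 2. b4"),
    ("A00", "Kadas Opening: Kadas Gambit", "1. h4 d5 2. d4 c5 3. Nf3 cxd4 4. c3"),
    ("A00", "Kadas Opening: Kadas Gambit", "1. h4 e5 2. d4 exd4 3. c3"),
    ("A01", "Nimzo-Larsen Attack: Modern Variation", "1. b3 e5"),
    ("A01", "Nimzo-Larsen Attack: Modern Variation", "1. b3 e5 2. Bb2"),
    ("A01", "Nimzo-Larsen Attack: Modern Variation", "1. b3 e5 2. Bb2 Nc6"),
    ("A01", "Nimzo-Larsen Attack: Modern Variation", "1. b3 e5 2. Bb2 Nc6 3. e3"),
]
verdicts = classify_repeated_names(rows)
```

Here the Kadas Gambit comes out as "diverging" and the Nimzo-Larsen Modern Variation as "chain".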
What is the standpoint on this? Wikipedia says "Indian Defense" and the sublines have "Defense" in their name as well (Queen's Indian Defense, Nimzo-Indian Defense...)
"rnb1kbnr/ppp2p1p/8/3B2p1/4Pp1q/6P1/PPPP3P/RNBQ1KNR b kq -" => Opening { eco: "C33", name: "King's Gambit Accepted, bishop's gambit, Gifford variation" },
"rnb1kbnr/ppp2p1p/8/3B2p1/4Pp1q/6P1/PPPP3P/RNBQ1KNR b kq -" => Opening { eco: "C33", name: "King's Gambit Accepted, bishop's gambit, Chigorin's attack" },
"rnbqkb1r/pppp1ppp/5n2/8/2BpP3/5N2/PPP2PPP/RNBQK2R b KQkq -" => Opening { eco: "C43", name: "Petrov, Urusov gambit" }
"rnb1kb1r/pp3pp1/2p1pq1p/3p4/2PP4/1QN2N2/PP2PPPP/R3KB1R b KQkq -" => Opening { eco: "D43", name: "Queen's Gambit Declined semi-Slav, Hastings variation" },
"rnbqkb1r/pp2pp1p/2p2np1/3p4/2PP4/2N2N2/PP2PPPP/R1BQKB1R w KQkq -" => Opening { eco: "D90", name: "Gruenfeld, Schlechter variation" },
Thanks for your great list! Having looked at a lot of ECO databases, I gather that accuracy is a challenge. I have found the Scid ECO database to be mostly accurate, so in case it's helpful, I generated a list of ECO-code differences between your list and Scid's (only for those paths that can be matched to equal ply depth). In the attached file, your ECO code is in the first column and Scid's (simplified) code is in the second column. After that is the move list.
ecocomp.txt
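A comparison of this kind could be produced roughly as follows. This is only a sketch of the idea, not the script that generated the attachment: it assumes both lists are available as a mapping from move list to ECO code, and the second code in the sample is an invented placeholder, not an actual Scid value.

```python
from typing import Dict, List, Tuple


def eco_differences(
    ours: Dict[str, str], theirs: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (our_eco, their_eco, moves) for move lists present in both
    mappings whose ECO codes disagree."""
    shared = ours.keys() & theirs.keys()
    return sorted(
        (ours[moves], theirs[moves], moves)
        for moves in shared
        if ours[moves] != theirs[moves]
    )


ours = {
    "1. e4 c5 2. c3": "B22",
    "1. e4 c5 2. Nf3 e6 3. c3": "B40",
}
# Placeholder data: "B99" is a made-up code used only to trigger a mismatch.
theirs = {
    "1. e4 c5 2. c3": "B22",
    "1. e4 c5 2. Nf3 e6 3. c3": "B99",
    "1. e4 c5 2. Nf3 d6 3. c3": "B50",
}

# One tab-separated row per mismatch: our code, their code, move list.
report = ["\t".join(row) for row in eco_differences(ours, theirs)]
```

Move lists that occur in only one of the two mappings are ignored, which mirrors the "matched to equal ply depth" restriction only loosely.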
The datasets contain multiple duplicate lines; take, e.g., L30 and L31 in b.tsv. I'd assume this pollutes the db. I suggest adding a Python script, run as part of make, that removes duplicate lines, i.e. lines where the ECO code, full name, and PGN are all the same. I'd be happy to create a PR if it makes sense.
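A minimal sketch of such a script, assuming tab-separated rows with the three columns eco, name and pgn; the function name is hypothetical and hooking it into make is left out.

```python
from typing import Iterable, List


def drop_duplicate_rows(lines: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each (eco, name, pgn) row, preserving order."""
    seen = set()
    unique = []
    for line in lines:
        key = tuple(line.rstrip("\n").split("\t"))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        unique.append(line)
    return unique


lines = [
    "eco\tname\tpgn\n",
    "B22\tSicilian Defense: Alapin Variation\t1. e4 c5 2. c3\n",
    "B22\tSicilian Defense: Alapin Variation\t1. e4 c5 2. c3\n",
    "B40\tSicilian Defense: Delayed Alapin Variation\t1. e4 c5 2. Nf3 e6 3. c3\n",
]
deduplicated = drop_duplicate_rows(lines)
```

Rows that share a name but differ in PGN are kept, since only fully identical rows count as duplicates here.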
import io
import re
import sys
import argparse  # Added argparse for better command-line argument parsing

from typing import Dict, List, TextIO

try:
    import chess
    import chess.pgn
except ImportError:
    print("Need python-chess:", file=sys.stderr)
    print("$ pip3 install chess", file=sys.stderr)
    print(file=sys.stderr)
    raise


ECO_REGEX = re.compile(r"^[A-E]\d\d\Z")

INVALID_SPACE = re.compile(r"\s{2,}|^\s|\s\Z|\s,")

INVALID_WITH = re.compile(r"[^,:]\swith\b")


class Stats:
    def __init__(self) -> None:
        self.errors = 0
        self.warnings = 0


class Reporter:
    def __init__(self, stats: Stats, file_name: str) -> None:
        self.stats = stats
        self.file_name = file_name

    def error(self, lno: int, err_msg: str) -> None:
        print(f"::error file={self.file_name},line={lno}::{err_msg}", file=sys.stderr)
        self.stats.errors += 1

    def warning(self, lno: int, err_msg: str) -> None:
        print(f"::warning file={self.file_name},line={lno}::{err_msg}", file=sys.stderr)
        self.stats.warnings += 1


def parse_args():
    parser = argparse.ArgumentParser(description="Chess Opening Data Validator")
    parser.add_argument("input_files", nargs='+', help="Input TSV files to process")
    parser.add_argument("--disable-warnings", action="store_true", help="Disable warning checks")
    return parser.parse_args()


def main(f: TextIO, reporter: Reporter, by_epd: Dict[str, List[str]], shortest_by_name: Dict[str, int]) -> None:
    prev_eco = ""
    prev_name = ""

    for lno, line in enumerate(f, 1):
        cols = line.rstrip("\n").split("\t")

        if len(cols) != 3:
            reporter.error(lno, f"expected 3 columns, got {len(cols)}")
            continue

        if lno == 1:
            if cols != ["eco", "name", "pgn"]:
                reporter.error(lno, f"expected eco, name, pgn")
            continue

        eco, name, pgn = cols

        if not ECO_REGEX.match(eco):
            reporter.error(lno, f"invalid eco")
            continue

        if INVALID_SPACE.search(name):
            reporter.error(lno, f"invalid whitespace in name")
            continue

        try:
            board = chess.pgn.read_game(io.StringIO(pgn), Visitor=chess.pgn.BoardBuilder)
        except ValueError as err:
            reporter.error(lno, f"{err}")
            continue

        if not board:
            reporter.error(lno, f"Empty pgn")
            continue

        allowed_lowers = ["with", "de", "der", "del", "von", "and"]
        if not all([word[0].isupper() for word in re.split(r"\s|-", name) if word not in allowed_lowers and word.isalpha()]):
            reporter.warning(lno, f"{name!r} word(s) beginning with lowercase letters")

        if INVALID_WITH.search(name):
            reporter.warning(lno, f"'with' not separated with ',' or ':'")

        for blacklisted in ["refused"]:
            if blacklisted in name.lower():
                reporter.warning(lno, f"blacklisted word ({blacklisted!r} in {name!r})")

        if shortest_by_name.get(name, -1) == len(board.move_stack):
            reporter.warning(lno, f"{name!r} does not have a unique shortest line")
        try:
            shortest_by_name[name] = min(shortest_by_name[name], len(board.move_stack))
        except KeyError:
            shortest_by_name[name] = len(board.move_stack)

        clean_pgn = chess.Board().variation_san(board.move_stack)
        if clean_pgn != pgn:
            reporter.error(lno, f"unclean pgn: expected {clean_pgn!r}, got {pgn!r}")

        if name.count(":") > 1:
            reporter.error(lno, f"multiple ':' in name: {name}")

        epd = board.epd()
        if epd in by_epd:
            reporter.error(lno, f"duplicate epd: {by_epd[epd]}")
        else:
            by_epd[epd] = cols

        if eco < prev_eco:
            reporter.error(lno, f"not ordered by eco ({eco} after {prev_eco})")
        elif (eco, name) < (prev_eco, prev_name):
            reporter.error(lno, f"not ordered by name ({name!r} after {prev_name!r})")
        prev_eco = eco
        prev_name = name

        print(eco, name, clean_pgn, " ".join(m.uci() for m in board.move_stack), epd, sep="\t")


if __name__ == "__main__":
    args = parse_args()

    print("eco", "name", "pgn", "uci", "epd", sep="\t")

    stats = Stats()
    by_epd: Dict[str, List[str]] = {}
    shortest_by_name: Dict[str, int] = {}

    for file_name in args.input_files:
        with open(file_name) as f:
            main(f, Reporter(stats, file_name), by_epd, shortest_by_name)

    if stats.errors:
        sys.exit(1)
Re naming of the below:
a) C50 Blackburne Shilling Gambit 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nd4 4. Nxe5 Qg5 5. Nxf7 Qxg2 6. Rf1 Qxe4+ 7. Be2 Nf3#
b) C50 Italian Game: Schilling-Kostic Gambit 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nd4
I believe the legend behind this is that Blackburne played games for a shilling using this line.
Looking around the web, it sounds like in version b), "Schilling" comes from a misunderstanding that "shilling" was a name.
Borislav Kostić was apparently the person who actually played the line in recorded games.
If this is accurate, either "Blackburne Shilling-Kostić", or "Blackburne-Kostić", or "Kostić" would make sense as the name for b), and likely the middle one strikes the right balance between accuracy and common usage.
Also note the name is spelled Kostić, not Kostic.
A separate issue is whether the current a) entry makes sense as an inclusion in the opening database given it's a full game and a "mainline" extension of b)?
If it should be kept, it would sound like a) should be classified along the lines of: "Italian Game: Blackburne-Kostić Gambit, Blackburne Shilling Variation" to make it consistent.
Therefore I would propose to change the names as follows:
a) to "Italian Game: Blackburne-Kostić Gambit, Blackburne Shilling Variation"
b) to "Italian Game: Blackburne-Kostić Gambit"
Relevant Wikipedia entry: https://en.wikipedia.org/wiki/Blackburne_Shilling_Gambit
After I suggested the idea on Discord, I was advised to open an issue here.
https://en.wikipedia.org/wiki/Gunderam_Defense
Although it's known as the Gunderam Defense, the first one to play it was Hélder Câmara, 4 years before. It is also known as the Brazilian Defense, which I think would be a nice name to have, since there are no openings with the Brazil name.
We have a problem: incorrect description on the 1. e4 page
https://lichess.org/opening/Kings_Pawn_Game
Currently, there are instances where Bogoljubo{v,w} is spelled with either v or w.
It would be reasonable to unify the spelling, but I'm not quite sure in which direction.
Bogoljubow makes sense, as he was a naturalized German citizen and that is how it is written on his grave, cf. https://en.wikipedia.org/wiki/Efim_Bogoljubow#/media/File:Bogoljubow_Grabstein_Triberg.JPG
On the other hand, the spelling Bogoljubov is more readable to non-German speakers and also more consistent with the handling of Russian names in this library in general (cf. Yusupov v Jussupow).
When I was analyzing the King's gambit in detail, I just realized that in the Opening Explorer, when you play 1.e4 e5 2.f4 exf4 3.Nf3 it says "King's gambit : King's Knight gambit" but when you play g5 it says "King's gambit : King Knight's gambit".
https://lichess.org/forum/lichess-feedback/kings-knight-gambit-vs-king-knights-gambit
I'm unsure which it is supposed to be.
I think this wound up in the C list through an error in transpositions that did not take into account the option for en passant.
The two conflicting transpositions:
These should be treated as distinct positions. Nowakunski belongs back in the A00 uncommon openings section. Some variation of Mason-Keres(Steinitz?) should replace it in C33.
In the database the line "1.e4 e5 2.Nf3 Nc6 3.Bc4 Bc5" is called the "Giuoco Piano", which is correct I think. But then every other line continuing from there is called "Italian Game: ...". There's also "Italian Game: Giuoco Piano, ...". But then the first line above should also be named "Italian Game: Giuoco Piano". This seems to be somewhat inconsistent, or is there an explanation?
The DB already follows the format Opening family: variation1, var2, var3...
It follows that French Defense: Advance Variation derives from French Defense. We should also ensure that the corresponding pgns follow this logic: 1. e4 e6 2. d4 d5 3. e5 should be derived from 1. e4 e6 2. d4 d5 and not 1. d4 e6 2. e4 d5. This is already the case for most openings. However, there are still a lot of bad transpositions.
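The derivation rule above can be checked with a short script. The sketch below assumes one pgn per name (which the data set does not guarantee) and treats the text before the last ", " or, failing that, before ": " as the parent name; the helper names and the sample mappings are illustrative.

```python
from typing import Dict, List, Optional


def san_moves(pgn: str) -> List[str]:
    """Drop move-number tokens such as '1.' and keep the SAN moves."""
    return [tok for tok in pgn.split() if not tok.endswith(".")]


def parent_name(name: str) -> Optional[str]:
    """'Family: Var1, Var2' -> 'Family: Var1'; 'Family: Var1' -> 'Family'."""
    if ", " in name:
        return name.rsplit(", ", 1)[0]
    if ": " in name:
        return name.split(": ", 1)[0]
    return None


def bad_derivations(pgn_by_name: Dict[str, str]) -> List[str]:
    """Names whose pgn does not start with the pgn of their parent name."""
    bad = []
    for name, pgn in pgn_by_name.items():
        parent = parent_name(name)
        if parent is None or parent not in pgn_by_name:
            continue
        parent_moves = san_moves(pgn_by_name[parent])
        if san_moves(pgn)[:len(parent_moves)] != parent_moves:
            bad.append(name)
    return bad


good = {
    "French Defense": "1. e4 e6 2. d4 d5",
    "French Defense: Advance Variation": "1. e4 e6 2. d4 d5 3. e5",
}
transposed = {
    "French Defense": "1. e4 e6 2. d4 d5",
    "French Defense: Advance Variation": "1. d4 e6 2. e4 d5 3. e5",
}
good_result = bad_derivations(good)
transposed_result = bad_derivations(transposed)
```

The first mapping passes, while the transposed move order is reported as a bad derivation.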
To fix them we should:
- If name ends in "...Gambit", ensure "...Gambit Accepted" and "...Gambit Declined" are children of "Gambit"

Encyplopedia of Chess Openings data set
The correct spelling is "Encyclopedia"
When Ke2 is played and Black replies with Ke7, it should be known as the Zelensky Defence, in light of the Ukrainian president's efforts to fight against Russia.
FEN: rnbq1bnr/ppppkppp/8/4p3/4P3/8/PPPPKPPP/RNBQ1BNR w - - 2 3
So I'm currently using this dataset to do some analysis on how openings transpose into each other. I realized there are some entries with the same name but different move orders, positions and ECO codes (e.g. Queens Pawn Game has five entries in A40, A41 and D00; the Caro-Kann has six entries in B10, B12 and B15). So if neither name nor ECO code is unique, what could you use to identify one variation?
Related: Wikipedia and Wikibooks refer to all 1. d4 d5 openings as Closed Game or Double Queens Pawn Game, so maybe we could differentiate this from other 1. d4 positions?
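To see how widespread this is, names that span several ECO codes can be listed with a few lines. The sketch below assumes rows of (eco, name) and uses only the codes mentioned above as sample data; the function name is hypothetical. Since the validator script quoted earlier reports duplicate EPDs as errors, the position's EPD may be a better candidate for a unique key than either name or ECO code, though that is an inference rather than a documented guarantee.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def names_spanning_ecos(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each name that appears under more than one ECO code to those codes."""
    ecos_by_name: Dict[str, set] = defaultdict(set)
    for eco, name in rows:
        ecos_by_name[name].add(eco)
    return {
        name: sorted(ecos)
        for name, ecos in ecos_by_name.items()
        if len(ecos) > 1
    }


# Sample rows built from the codes mentioned above; the move lists are omitted.
rows = [
    ("A40", "Queen's Pawn Game"),
    ("A41", "Queen's Pawn Game"),
    ("D00", "Queen's Pawn Game"),
    ("B10", "Caro-Kann Defense"),
    ("B12", "Caro-Kann Defense"),
    ("B15", "Caro-Kann Defense"),
    ("B22", "Sicilian Defense: Alapin Variation"),
]
ambiguous = names_spanning_ecos(rows)
```

Only the two names with several codes are returned; the Alapin Variation, which appears under a single code here, is not.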