I don't know much about the NES in general, but I had a thought. AFAIK, dumping NES carts is difficult because it requires knowing the PRG/CHR ROM sizes beforehand, hence the need to manually look up and enter these values from places like http://nes.dnsabr.com. Cart dumper automates this by storing a database of hashes of globally known, seekable sections of the PRG ROM. However, this does not always work, because those sections can be identical between different games, which causes conflicts.
My mind immediately jumps to a B-tree-like index strategy: progressively read and hash the PRG and CHR ROMs, using the current set of matches to decide how much further it is safe to read and to narrow down the possibilities.
Let's pretend we have a database of the entire NES library that contains hashes of the first n bytes of each PRG ROM:
| name | PRG ROM size | 16k CRC | 128k CRC | 512k CRC |
|---|---|---|---|---|
| Mario | 16k | 9f115a9e | - | - |
| Zelda | 128k | a3b3e36e | 53b88a7a | - |
| Contra | 128k | a3b3e36e | b0c8c11e | - |
| Kirby | 512k | a3b3e36e | 2864f7cc | 4f3ad289 |
| Tetris | 512k | a3b3e36e | 2864f7cc | d2dce641 |
| Gradius | 512k | dd611655 | f5525447 | 707c2b2f |
| Frogger | 512k | 4bf93c55 | 3b2dc183 | 12c70afd |
| PacMan | 512k | 4bf93c55 | 3b2dc183 | 8c7869e6 |
Now say we are dumping a cart. We know the minimum size is 16k, so it is safe to read and hash the first 16k. Doing so produces a3b3e36e, which matches Zelda, Contra, Kirby, and Tetris. Of those four, the smallest size is 128k, so we continue reading and hashing up to 128k. That produces 2864f7cc, which matches Kirby and Tetris. Both games are 512k, so we continue reading and hashing up to 512k and get d2dce641, which matches Tetris.
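A minimal Python sketch of that walkthrough, assuming a hypothetical flat table (game name → PRG size plus the CRC32 of each checkpoint prefix) and a hypothetical `read_prg(offset, length)` callback standing in for the actual cart reader. It passes the running value to `zlib.crc32` so each step only hashes the newly read bytes instead of starting over.

```python
import zlib
from typing import Callable, Optional

KB = 1024

# Hypothetical flat table: name -> (PRG size, {n: CRC32 of the first n bytes}).
Table = dict[str, tuple[int, dict[int, int]]]


def build_table(roms: dict[str, bytes]) -> Table:
    """Hash every ROM at each distinct ROM size that fits inside it."""
    checkpoints = sorted({len(rom) for rom in roms.values()})
    return {
        name: (len(rom), {n: zlib.crc32(rom[:n]) for n in checkpoints if n <= len(rom)})
        for name, rom in roms.items()
    }


def identify(read_prg: Callable[[int, int], bytes], table: Table,
             min_size: int = 16 * KB) -> Optional[str]:
    """Read and hash progressively larger prefixes until one game remains.

    Returns None if nothing matches; raises ValueError if the remaining
    candidates cannot be told apart (one ROM is a prefix of another).
    """
    candidates = set(table)
    offset, crc, target = 0, 0, min_size
    while True:
        # Extend the running CRC32 with only the bytes we have not read yet.
        crc = zlib.crc32(read_prg(offset, target - offset), crc)
        offset = target
        candidates = {g for g in candidates if table[g][1].get(offset) == crc}
        if len(candidates) <= 1:
            return next(iter(candidates), None)
        smallest = min(table[g][0] for g in candidates)
        if smallest <= offset:
            raise ValueError(f"ambiguous prefix match: {sorted(candidates)}")
        target = smallest  # safe to read up to the smallest remaining size
```

With a test image in `rom`, a call would look like `identify(lambda off, n: rom[off:off + n], table)`.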
In theory, this would resolve all ambiguity except for the (very rare) cases in which both the PRG ROM and the CHR ROM begin with the entirety of another game's PRG ROM and CHR ROM.
The database could be shrunk drastically by storing only the hashes required to resolve conflicts. For example, there is no reason to store the 128k and 512k hashes for Gradius, because its 16k hash is unique. The same goes for the 128k hashes of Frogger and PacMan, which are identical and provide no disambiguation. In fact, rather than storing a flat lookup table, you could store a tree-like index that contains disambiguation instructions:
Read 16k
├─ 9f115a9e: Mario
├─ a3b3e36e: Read 128k
│  ├─ 53b88a7a: Zelda
│  ├─ b0c8c11e: Contra
│  └─ 2864f7cc: Read 512k
│     ├─ 4f3ad289: Kirby
│     └─ d2dce641: Tetris
├─ dd611655: Gradius
└─ 4bf93c55: Read 512k
   ├─ 12c70afd: Frogger
   └─ 8c7869e6: PacMan
I tested this theory on a headerless No-Intro ROM set using the NES 2.0 DB. It was able to index and distinguish all but 21 of the 3,560 ROMs. Excluding the Virtual Console and cassette dumps brings the number down to 9, only 3 of which are "standard" games. Here are all 21, grouped by the partial CRC32 they share:
9FFE2F55 PRG:65536 CHR:131072
├─ 9FFE2F55 Sky Shark (USA) - PRG:65536 CHR:131072
└─ 4AF742FA Sky Shark (USA) (Rev 1) - PRG:131072 CHR:131072
E41220D8 PRG:262144 CHR:0
├─ E41220D8 Assimilate (USA) (RetroUSB) (Aftermarket) (Homebrew) - PRG:262144 CHR:0
└─ 7145F667 Assimilate (USA) (RetroUSB) (Aftermarket) (Homebrew) (Alt) - PRG:524288 CHR:0
CD8233EF PRG:16384 CHR:8192
├─ 2F55BE88 Lunar Ball (Japan) - PRG:16384 CHR:8192
├─ 80CBCACB Golden Game 100-in-1 (Asia) (En) (Pirate) - PRG:1048576 CHR:0
├─ 6175B9A0 Golden Game 150-in-1 (Asia) (En) (Pirate) - PRG:2097152 CHR:0
├─ 46A1AE7B Golden Game 210-in-1 (Asia) (En) (Pirate) - PRG:2097152 CHR:0
└─ 4E5668A9 Golden Game 260-in-1 (Asia) (En) (Pirate) - PRG:3145728 CHR:0
20F98977 PRG:16384 CHR:16384
├─ 20F98977 City Connection (Japan) - PRG:16384 CHR:16384
└─ D20775DA City Connection (Japan) (Virtual Console, Switch Online) - PRG:32768 CHR:16384
0F05FF0A PRG:32768 CHR:8192
├─ 0F05FF0A Seicross (Japan) (Rev 1) - PRG:32768 CHR:8192
└─ 3413E33B Seicross (Japan) (Virtual Console) - PRG:32768 CHR:16384
E37A39AB PRG:131072 CHR:65536
├─ E37A39AB Yoshi's Cookie (Europe) - PRG:131072 CHR:65536
└─ CAA76927 Yoshi's Cookie (Europe) (Virtual Console) - PRG:131072 CHR:131072
A2623BC1 PRG:131072 CHR:131072
├─ A2623BC1 Nantettatte!! Baseball (Japan) - PRG:131072 CHR:131072
├─ 6C039D11 Nantettatte!! Baseball + Nantettatte!! Baseball - Ko-Game Cassette - '91 Kaimaku Hen (Japan) - PRG:147456 CHR:131072
└─ A5275B36 Nantettatte!! Baseball + Nantettatte!! Baseball - Ko-Game Cassette - OB All Star Hen (Japan) - PRG:147456 CHR:131072
ADFAD6B6 PRG:131072 CHR:0
├─ ADFAD6B6 Karaoke Studio (Japan) - PRG:131072 CHR:0
├─ 4B6EF399 Karaoke Studio Senyou Cassette - Top Hit 20 Vol. 1 (Japan) - PRG:262144 CHR:0
└─ 50F3E338 Karaoke Studio Senyou Cassette - Top Hit 20 Vol. 2 (Japan) - PRG:262144 CHR:0
All of these are instances in which the original/parent ROM is included in its entirety at the start of the child ROM.
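These collisions can be found directly in a ROM set with a check like the following Python sketch. It assumes each entry is the headerless ROM file (PRG followed by CHR) as a `bytes` object; the function name and the 16k bucketing probe are hypothetical choices, not taken from the generated index.

```python
import zlib
from collections import defaultdict


def prefix_collisions(roms: dict[str, bytes],
                      probe: int = 16 * 1024) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the child ROM starts with the entire parent ROM.

    ROMs are first bucketed by the CRC32 of their first `probe` bytes, so the
    pairwise comparison only runs on ROMs that could possibly collide. This
    assumes every ROM is at least `probe` bytes long (the 16k minimum).
    """
    buckets: dict[int, list[str]] = defaultdict(list)
    for name, rom in roms.items():
        buckets[zlib.crc32(rom[:probe])].append(name)
    pairs = []
    for names in buckets.values():
        names.sort(key=lambda n: len(roms[n]))  # a parent is never longer than its child
        for i, parent in enumerate(names):
            for child in names[i + 1:]:
                # Exact duplicates (same length) are not reported as collisions.
                longer = len(roms[child]) > len(roms[parent])
                if longer and roms[child].startswith(roms[parent]):
                    pairs.append((parent, child))
    return pairs
```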
I attached the generated index in JSON. Right now it's just a map of partial CRC32 to full CRC32, but it could instead map to game name, PRG ROM size, mapper, etc.
nesIndex2.json.txt
Thoughts?