crate-digger's Introduction

Crate Digger


A Java library for fetching and parsing rekordbox media exports and track analysis files.


This project uses the Kaitai Struct compiler, with the help of a Maven plugin, to create classes that can parse and output binary data structures in a convenient and efficient way. It generates Java classes, because the project was created to support Beat Link, but other projects can use the same structure definitions to generate parsers for other languages.

It also uses the jrpcgen tool (which is part of the Remote Tea project), via another plugin, to generate classes that know how to talk to the non-standard NFSv2 file servers running in Link-capable players, so the rekordbox data can be reliably obtained even during big shows where all four players are in use.

Getting Help


Deep Symmetry’s projects are generously sponsored with hosting by Zulip, an open-source modern team chat app designed to keep both live and asynchronous conversations organized. Thanks to them, you can chat with our community, ask questions, get inspiration, and share your own ideas.

PDB Database

The file rekordbox_pdb.ksy contains the structure definitions needed to parse exported rekordbox databases (export.pdb files).

ℹ️
Huge thanks to Fabian Lesniak for figuring out the details of how to interpret these files in his python-prodj-link project and Mikhail Yakshin for helping me quickly learn the more subtle aspects of Kaitai Struct. And this was all started by a question Evan Purkhiser posted on Stack Exchange.

There is an Export Structure Analysis site describing the details of what we have learned about these file formats. Reading that will help make sense of the exploration tools and the objects returned by this library.

Exploring the Analysis

One of the amazingly cool things about Kaitai Struct is that you can use its Web IDE to see how the structure definitions work and visually explore the contents of files you are analyzing. This also means you can look inside your own .pdb files and check my work, or get a better understanding of how to use the generated parsers. To do that, simply upload the .pdb file you want to examine to the Web IDE (it doesn’t actually go to the web, it just gets put in your local browser storage), then also upload my rekordbox_pdb.ksy file, and the Web IDE will parse the exported database, letting you explore the structures in the tree view, and see the corresponding raw bytes in the hex viewer.

💡
You can find the export.pdb file on a media stick prepared by rekordbox inside the PIONEER folder, which may be invisible in the Finder, but you can open it using Terminal commands if you have to. To download my rekordbox_pdb.ksy, click on its link, then click on the Raw button in the header above the first line of the listing, then tell your browser to save it to disk. Be sure to keep the .ksy extension. Then you can upload it to the Kaitai Struct Web IDE.

ANLZ Data

Each track in a rekordbox database also has ANLZnnnn.DAT and ANLZnnnn.EXT files associated with it, containing the beat grid, an index allowing rapid seeking to any time in variable-bit-rate audio files, the waveforms, memory cues, and loop points. The paths to these files are found inside the corresponding track record.

The structure definitions for these files are in rekordbox_anlz.ksy. You can use it with the Kaitai Struct Web IDE as described above to explore analysis files found in your own exported media.
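
Once the parser classes have been generated (see Building the source below), an analysis file can be loaded directly. The sketch below is only illustrative: the fromFile() helper is what Kaitai Struct generates for its parsers, but the package name and the sections() accessor are assumptions based on the structure definitions, so check the generated sources.

import org.deepsymmetry.cratedigger.pdb.RekordboxAnlz;

public class AnlzSketch {
    public static void main(String[] args) throws Exception {
        // Parse an analysis file copied from exported media (the path is hypothetical).
        RekordboxAnlz anlz = RekordboxAnlz.fromFile("/tmp/ANLZ0000.DAT");

        // Walk the tagged sections (beat grid, cue lists, waveforms, and so on)
        // and print each section's four-character tag.
        for (RekordboxAnlz.TaggedSection section : anlz.sections()) {
            System.out.println(section.fourcc());
        }
    }
}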

Using the Library

Crate Digger is available through Maven Central, so to use it in your Maven project, all you need is to include the appropriate dependency.

Maven Central

Click the maven central badge above to view the repository entry for crate-digger. The proper format for including the latest release as a dependency in a variety of tools, including Leiningen if you are using beat-link from Clojure, can be found in the Dependency Information section.

There are two halves to what Crate Digger offers. The first is an ability to talk to the nonstandard Network File System servers that are running in Pioneer players, and ask them to deliver the rekordbox data export and track analysis files (programs like Beat Link need these files to provide smooth integrations with the music being performed). The second half is the ability to parse the contents of those files, as described above.

Retrieving Files

The class org.deepsymmetry.cratedigger.FileFetcher is a singleton, so to work with it you will start by calling getInstance(), as is customary. Retrieving a file is then as simple as this:

FileFetcher fetcher = FileFetcher.getInstance();
fetcher.fetch(playerAddress, mountPath, sourcePath, destination);

playerAddress is an InetAddress object holding the address of the player from which you want to download a file. mountPath identifies the media slot you want to get information from, as shown in the table below. sourcePath is the path to the specific file you want within the mounted media, and destination is a File object identifying where you want the downloaded data to be stored.

Table 1. Media Slot Mount Paths

Media Slot    Mount Path
SD            /B/
USB           /C/
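
Putting that together, here is a minimal sketch of downloading the database export from a player's SD slot. The player address and file paths are hypothetical, and the export location under /PIONEER/ is an assumption based on the tip above; fetch is assumed to throw an exception if the transfer fails.

import java.io.File;
import java.net.InetAddress;

import org.deepsymmetry.cratedigger.FileFetcher;

public class FetchSketch {
    public static void main(String[] args) throws Exception {
        InetAddress playerAddress = InetAddress.getByName("192.168.1.152");  // a CDJ on the network (hypothetical)
        String mountPath = "/B/";                                            // SD slot, per Table 1
        String sourcePath = "/PIONEER/rekordbox/export.pdb";                 // assumed location of the export
        File destination = new File("/tmp/export.pdb");                      // where to save the download

        FileFetcher fetcher = FileFetcher.getInstance();
        fetcher.fetch(playerAddress, mountPath, sourcePath, destination);
    }
}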

The FileFetcher caches information about players to make requests more efficient, so it is important for you to tell it when a player goes away, or unmounts one of its media slots, by calling:

fetcher.removePlayer(playerAddress);

Parsing Structures

The class org.deepsymmetry.cratedigger.Database provides support for accessing the contents of rekordbox database export files. You can create an instance to wrap a File instance that contains such an export (for example one that you downloaded using the fetch method above). Then you can query it for track and other information:

Database database = new Database(downloadedFile);
RekordboxPdb.TrackRow track = database.findTrack(1);
System.out.println(database.getText(track.title()));

Strings (like titles, artist names, etc.) are represented by a variety of structures with different encodings, so a getText() method is provided to convert them into ordinary Java strings.
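
Here is the same example as a self-contained sketch, with imports and a hypothetical path for the downloaded file. It assumes the generated RekordboxPdb classes live in the org.deepsymmetry.cratedigger.pdb package and that Database implements Closeable, so it can be used with try-with-resources; check the API documentation to confirm both.

import java.io.File;

import org.deepsymmetry.cratedigger.Database;
import org.deepsymmetry.cratedigger.pdb.RekordboxPdb;

public class DatabaseSketch {
    public static void main(String[] args) throws Exception {
        // Wrap a previously downloaded export.pdb (the path is hypothetical); the
        // try-with-resources block assumes Database implements Closeable.
        try (Database database = new Database(new File("/tmp/export.pdb"))) {
            RekordboxPdb.TrackRow track = database.findTrack(1);
            System.out.println(database.getText(track.title()));
        }
    }
}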

See the API documentation for more details about these classes, and the Export Structure Analysis for more details about the file formats.

Logging

Crate Digger uses slf4j to allow you to integrate it with whatever Java logging framework your project is using, so you will need to include the appropriate slf4j binding on your class path.

Unfinished Tasks

  • There are still more tables to be figured out. The Columns table looks like the list of fields that tracks can be searched by, so perhaps it holds some clues for how to find and use the index tables, which must exist because it would be horribly slow for the players to do a linear scan through the main sparse tables whenever they wanted a record.

  • If we could figure out how to use the indices ourselves, we could avoid having to load the whole file and index it ourselves.

Building the source

As noted above, the Maven project uses a plugin to run the jrpcgen tool (which is part of the Remote Tea project) to generate Java classes that implement the ONC RPC specifications found in src/main/rpc. (These are used for communicating with the NFS servers in CDJs.) It also uses the Kaitai Struct Compiler, through another plugin, to generate Java classes that can parse the rekordbox databases it downloads from the players, based on the specifications found in src/main/kaitai.

These things happen for you automatically during the code generation phase of the Maven build. If you want to use something other than Maven, you will need to figure out how to configure and run the tools yourself.

Building the Structure Analysis

I started out using pdfLaTeX to write and format the document, but then, at the recommendation of one of the Kaitai Struct developers, switched to XeLaTeX in order to take advantage of newer features. But over time some of the packages I was using, especially for tables, became unsupported and started having issues. So this and the dysentery project’s protocol analysis document have been ported to more modern Asciidoc source in the form of Antora sites.

To re-create (and even improve on) the byte field diagrams I was able to achieve in LaTeX, I ended up writing my own diagram generator, bytefield-svg, which runs as an Antora plugin with the help of David Jencks' generic-svg-extension.

This documentation site can be built alongside the dysentery project’s protocol analysis, by following the directions in that project.

Contributing

If you have ideas, discoveries, or even code you’d like to share, that’s fantastic! Please take a look at the guidelines and get in touch!

Licenses

Deep Symmetry logo Copyright © 2018–2024 Deep Symmetry, LLC

Distributed under the Eclipse Public License 2.0. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software. A copy of the license can be found in LICENSE within this project.

Secondary Licenses: This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: Mozilla Public License 2.0, or GNU Lesser General Public License v. 3.

Library and Build Tool Licenses

The Kaitai Struct Compiler is licensed under the GNU General Public License, version 3 and the Kaitai Java runtime embedded in crate-digger is licensed under the MIT License.

crate-digger's People

Contributors

brunchboy, ehendrikd, fragmede, generalmimon, holzhaus, mganss, swiftb0y


crate-digger's Issues

Using Crate Digger for the .edb Device SQL database

I've stumbled upon Deep-Symmetry many times over the last couple of years because I keep wanting to make something that makes my life much easier when it comes to managing my collection. I'm kind of just stuck with modifying the exported XML file for now, but I'm looking to see if the process can be somewhat automated if I can hook into the Rekordbox Database somehow.

The work you've done is amazing! You've found a lot of information on the most over-engineered protocols, and you've done excellent documentation too - something that's far too rare. 🙏

I'm willing to bet that you've looked into the .edb files as well, to see whether they share any similarities with the .pdb files, and I'm curious to know whether you've found anything.

Looking at its hex/ascii content, it looks somewhat similar to what I would expect, but it's also too far different for me to make any connections in my mind.

Do you reckon it could be reverse engineered? Unfortunately, Ubiquitous AI (the makers of DeviceSQL, as far as I could tell) run a pretty tight ship with their SDK.

Document `.2EX` analysis files

Similar to .EXT files which added new tag types, the .2EX files seem to contain some new tags:

Tag     len_header   len_tag (demo track)
PWV7    24           77622
PWV6    20           3620
PWVC    14           20

Judging from the tag names and sizes, these seem to be related to waveforms.

LaTeX issues

Here are some suggestions how to improve your LaTeX:

  1. Use XeLaTeX. It works with TTF and OTF fonts and gives a better appearance.
  2. \usepackage{cleveref} - automatic cross-references with links.
  3. \usepackage{hyperxmp} - embeds an XMP metadata block.
  4. \usepackage{smartref} - references with human-readable names.
  5. Set your metadata with \hypersetup.
  6. Use \usepackage[pdfusetitle]{hyperref} to make hyperref use the title set with \title.

Support the History menu

Someone finally has a need for figuring out how the History playlists are represented in database exports, so it is time to figure that out.

Update Remote Tea ONC/RPC library

We are using 1.1.4, and it is up to 1.1.6. The big changes came in 1.1.5, which generates better Java code, including the use of real enums rather than interfaces. This will be a backwards-incompatible change for us so it will take a bit of work to adjust to it, but it is probably worth it.

See https://sourceforge.net/p/remotetea/news/

exportExt.pdb File

Hi guys, hope you are all well.

I have already read in the Rekordbox Export Structure Analysis manual that the database export file is export.pdb. I created a test export in rekordbox using analysed tracks with extra features like colours, and expected the analyze_path field in export.pdb to point to an .EXT file. Instead, all the analyze_path fields point only to .DAT files.

I suspect that exportExt.pdb points to the .EXT files. If not, then how are the .EXT files referenced? Does anyone know how this works? What is the purpose of exportExt.pdb?

Parsing exportExt.pdb in the Web IDE using the .ksy file does not seem to work.

Can anyone help please?

Thanks

Cannot parse specific .DAT files

Hello, hope you are all well.
First of all I want to say that you have done great work.
I tried to parse around 60,000 analysis files (.DAT and .EXT files originally created not by rekordbox but by CDJ-2000 NXS2 players) using C# code generated from the .ksy file. I managed to parse most of the files successfully, but for around 2,500 .DAT files I got the following unhandled exception in the public byte[] ReadBytes(ulong count) method: System.IO.EndOfStreamException: 'requested 400 bytes, but got only 0 bytes'.
The strange thing is that when I use the Web IDE to see the structure and visually explore the contents of these files, I get no error and can view them normally. I am also using the latest version of your code.
Can you please help me with this issue? I can provide more samples of these .DAT files if you want.
Thanks

DAT FILES.zip

Support PQT2

.EXT files have a section called PQT2. It sounds like it has to do with quantization, but from the data I can't tell what it does 🤷‍♂

Creating metadata archive crashes if same file referenced more than once

Describe the bug
TomRex on Zulip discovered that he could not import his large SSD because it kept crashing on a particular track:

2024-Jun-17 20:30:55.054 MacBook-Pro-2.fritz.box ERROR [beat-link-trigger.track-loader:2672] - Problem Creating Metadata Archive
                                     java.lang.Thread.run                        
       java.util.concurrent.ThreadPoolExecutor$Worker.run                        
        java.util.concurrent.ThreadPoolExecutor.runWorker                        
                      java.util.concurrent.FutureTask.run                        
                                                      ...                        
                      clojure.core/binding-conveyor-fn/fn          core.clj: 2047
beat-link-trigger.track-loader/create-metadata-archive/fn  track_loader.clj: 2670
     org.deepsymmetry.cratedigger.Archivist.createArchive    Archivist.java:  113
                                 java.nio.file.Files.copy                        
         java.nio.file.CopyMoveHelper.copyToForeignTarget                        
java.nio.file.FileAlreadyExistsException: /PIONEER/USBANLZ/P061/0001A42B/ANLZ0000.DAT
    file: "/PIONEER/USBANLZ/P061/0001A42B/ANLZ0000.DAT"

It looks like sometimes track entries share the same analysis file somehow, so we need to cope with this.

EDB File format

Is there anything known about the EDB file format? E.g. the datafile.edb that Rekordbox 5 uses. It seems similar to the PDB files but not quite the same. I haven't been able to find anything about the format anywhere.

Unicode DeviceSQL strings in PDB are UTF-16-BE, not UTF-16-LE

I heard from another person who is implementing parsing of PDB files, and he was working with some Russian text, and discovered we were wrong to think these strings are UTF-16LE. Here is what he said, and I validated this by creating a playlist containing the same string in its name:

I could have something off, but here is what I'm seeing about the strings. My PDB includes a Russian song called Покинула чат ("left the chat"). The first letter here is U+041F. All the Cyrillic letters start with 04, but the spacebar between the words is the same U+0020 as in English. Here's how the track name looks in the pdb in hex:

[Screenshot: hex dump of the track title in the PDB file]

If I skip the 0 and read little endian, I get back the desired "Покинула чат"

If I don't skip and read big endian, I get back the incorrect " окинулаРGат" It gets a lot of the letters right because there is usually a 04 every other byte, but the first letter (which turns out as U+001F "Information separator one") and the characters around the space get messed up (because of the momentary switch from leading 04 to leading 00).
English titles come out right either way, because the leading 00s for each ASCII character in UTF-16 make it forgiving.
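
To illustrate the effect he describes (this snippet is editorial, not part of the original report): the bytes really are big-endian, and the shifted little-endian read only appears to work because dropping one byte swaps each pair of code-unit bytes.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EndianDemo {
    public static void main(String[] args) {
        // UTF-16BE encoding of "Пок" (U+041F, U+043E, U+043A).
        byte[] be = {0x04, 0x1F, 0x04, 0x3E, 0x04, 0x3A};

        // Decoding as big-endian recovers the text.
        System.out.println(new String(be, StandardCharsets.UTF_16BE));  // Пок

        // Decoding the same bytes as little-endian produces garbage code points.
        System.out.println(new String(be, StandardCharsets.UTF_16LE));

        // Skipping the first byte and then reading little-endian happens to
        // reproduce the remaining complete characters, which caused the confusion.
        byte[] shifted = Arrays.copyOfRange(be, 1, be.length - 1);
        System.out.println(new String(shifted, StandardCharsets.UTF_16LE));  // По
    }
}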

Document PSSI tag

I want to stop discussing this in #5, it deserves its own issue! 😆

@mganss I have updated the online pdf to include my draft so far, if you would be so kind as to review it. I need to add the table of actual phrase ID interpretations, which I have enough information to do already, but I don’t know for certain things like the value of len_header so I hope you can send me a PSSI file to study as well.

Database Exports Documentation: Color ID really sometimes 8 bit and sometimes 16 bit large?

In track rows in PDB files, the color ID field is just a single byte (cid at offset 0x58). The same goes for the color index in Extended (nxs2) Cue List entries of analysis files (cid at offset 0x1c).

But for PDB color rows, the documentation states that the field is 2 bytes large (id at offset 0x05). Is there a reason to believe that this field is not just also a single byte, and the second byte belongs to u3 or is yet another unknown byte field?

Handle even shorter nxs2 cue entries

Describe the bug
PCPT2 tags (nxs2 extended cue lists) can have entries short enough to omit comments.

Samples
@evanpurkhiser reported this on the Gitter chat with some samples (headers already stripped off, but length as shown):

[
  '00000000: 7c 00 00 00 01 00 01 00  00 00 e8 03 7b 69 04 00 ||.........è.{i..|',
  '00000010: ff ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00 |ÿÿÿÿ............|',
  '00000020: ff ff ff ff b6 54 00 00  00 00 00 00 54 e4 78 00 |ÿÿÿÿ¶T......Täx.|',
  '00000030: 00 00 00 00 42 00 00 00  00 00 00 00 00 00 00 00 |....B...........|',
  '00000040: 00 00 00 00 00 00 00 00  00 00 2c 00 00 00 15 00 |..........,.....|',
  '00000050: ff 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |ÿ...............|',
  '00000060: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|',
  '00000070: 00 00 00 00 00 00 00 00  00 00 00 00             |............    |'
]

[
  '00000000: 8c 00 00 00 02 00 01 00  00 00 e8 03 f5 57 02 00 |..........è.õW..|',
  '00000010: ff ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00 |ÿÿÿÿ............|',
  '00000020: ff ff ff ff ff 2c 00 00  00 00 00 00 b9 0e 40 00 |ÿÿÿÿÿ,......¹.@.|',
  '00000030: 00 00 00 00 54 00 00 00  00 00 00 00 00 00 00 00 |....T...........|',
  '00000040: 00 00 00 00 00 00 00 00  12 00 54 00 6f 00 20 00 |..........T.o. .|',
  '00000050: 48 00 6f 00 74 00 2d 00  41 00 00 00 2c 00 00 00 |H.o.t.-.A...,...|',
  '00000060: 38 b3 00 ff 00 00 00 00  00 00 00 00 00 00 00 00 |8³.ÿ............|',
  '00000070: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|',
  '00000080: 00 00 00 00 00 00 00 00  00 00 00 00             |............    |'
]

[
  '00000000: 38 00 00 00 00 00 01 00  00 00 e8 03 3d 04 00 00 |8.........è.=...|',
  '00000010: ff ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00 |ÿÿÿÿ............|',
  '00000020: ff ff 01 00 51 00 00 00  00 00 00 00 e0 58 00 00 |ÿÿ..Q.......àX..|',
  '00000030: 00 00 00 00 00 00 00 00                          |........        |'
]

Expected behavior
We should handle these and just return a nil comment, rather than failing with a parse error.

Additional context
We will want to update the analysis document, and the dbserver version of the code in Beat Link as well.
