martinellimarco / libzstd-seek
A library that mimics fread, fseek and ftell for reading zstd compressed files.
License: MIT License
I noticed you have a "TODO" about this issue in zstd-seek.c. I tested the solution below on a 98 TB backup using ratarmount, and it eliminated the more than 24 hours previously spent regenerating the JumpTable during the mount. The solution is based on the code in zstd/contrib/seekable_format/zstdseek_decompress.c.
I am new to git and don't know the right way to submit a change, so I am posting the C code here. Feel free to use it or adapt it (it goes right after your "TODO" comment):
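// Adapted from zstd/contrib/seekable_format/zstdseek_decompress.c
#define ZSTD_seekTableFooterSize 9
#define ZSTD_SEEKABLE_MAGICNUMBER 0x8F92EAB1
#define ZSTD_SKIPPABLEHEADERSIZE 8
#define ZSTD_MAGIC_SKIPPABLE_START 0x184D2A50
// Note: the 32-bit reads below assume a little-endian host that tolerates unaligned loads.
if(sctx->size >= ZSTD_seekTableFooterSize) {
    // Use a byte pointer: arithmetic on void* is a GNU extension.
    unsigned char *buff = (unsigned char *)sctx->buff;
    size_t size = sctx->size;
    // Footer layout: 4-byte frame count, 1-byte descriptor, 4-byte magic number.
    unsigned char *footer = buff + (size - ZSTD_seekTableFooterSize);
    unsigned magicnumber = *((unsigned *)(footer + 5));
    if(magicnumber == ZSTD_SEEKABLE_MAGICNUMBER){
        unsigned char sfd = footer[4];
        unsigned checksumFlag = sfd >> 7;
        /* check reserved bits */
        if ((sfd >> 2) & 0x1f) {
            DEBUG("seek table descriptor = %x: reserved bits should be zero\n",(unsigned int)sfd);
            return -1;
        }
        unsigned const numFrames = *((unsigned *)footer);
        unsigned const sizePerEntry = 8 + (checksumFlag ? 4 : 0);
        unsigned const tableSize = sizePerEntry * numFrames;
        unsigned const frameSize = tableSize + ZSTD_seekTableFooterSize + ZSTD_SKIPPABLEHEADERSIZE;
        // Guard against a corrupt frame count that would point before the buffer.
        if(frameSize > size){
            DEBUG("seek table size = %u exceeds the file size\n", frameSize);
            return -1;
        }
        unsigned char *frame = buff + (size - frameSize);
        unsigned skippableHeader = *((unsigned *)frame);
        if(skippableHeader != (ZSTD_MAGIC_SKIPPABLE_START | 0xE)){
            DEBUG("last frame Header = %u does not match magic number %u\n",skippableHeader, (ZSTD_MAGIC_SKIPPABLE_START | 0xE));
            return -1;
        }
        unsigned FrameSize = *((unsigned *)(frame + 4));
        if(FrameSize + ZSTD_SKIPPABLEHEADERSIZE != frameSize){
            DEBUG("last frame size = %u does not match expected size = %u\n", FrameSize + ZSTD_SKIPPABLEHEADERSIZE, frameSize);
            return -1;
        }
        unsigned char *table = frame + ZSTD_SKIPPABLEHEADERSIZE;
        // Each entry stores the sizes of one frame, so accumulate them into offsets.
        // Assumption: cOffsetCum and dOffsetCum were not declared in the posted snippet;
        // here they start at 0 and each record marks the start of a frame.
        // size_t keeps the running totals from wrapping at 4 GiB on 64-bit builds.
        size_t cOffsetCum = 0;
        size_t dOffsetCum = 0;
        for(unsigned i = 0; i < numFrames; i++){
            unsigned cOffset = *((unsigned *)(table + (i * sizePerEntry)));     // compressed size of frame i
            unsigned dOffset = *((unsigned *)(table + (i * sizePerEntry) + 4)); // decompressed size of frame i
            ZSTDSeek_addJumpTableRecord(sctx->jt, cOffsetCum, dOffsetCum);
            cOffsetCum += cOffset;
            dOffsetCum += dOffset;
        }
        sctx->jumpTableFullyInitialized = 1;
        return 0;
    }
}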
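For anyone who wants to try the seek-table parsing outside of zstd-seek.c, here is a minimal standalone sketch of the same idea. It is not the library's code: parse_seek_table and read_le32 are hypothetical names, and the output arrays stand in for the jump table. It reads the fields byte by byte, so it does not depend on host endianness or alignment, and it accumulates offsets in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEEK_FOOTER_SIZE   9u           /* 4-byte frame count, 1-byte descriptor, 4-byte magic */
#define SEEKABLE_MAGIC     0x8F92EAB1u
#define SKIPPABLE_MAGIC    0x184D2A5Eu  /* 0x184D2A50 | 0xE */
#define SKIPPABLE_HDR_SIZE 8u           /* 4-byte magic, 4-byte frame size */

/* Read a little-endian 32-bit value one byte at a time. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper (not part of libzstd-seek): parse the seek table at the
 * end of buf and store the start offset of each frame in c_off[] and d_off[].
 * Returns the number of frames, or -1 if no valid seek table is found or the
 * output arrays (capacity cap) are too small.
 */
static int64_t parse_seek_table(const unsigned char *buf, size_t size,
                                uint64_t *c_off, uint64_t *d_off, size_t cap)
{
    assert(buf != NULL && c_off != NULL && d_off != NULL);
    if (size < SEEK_FOOTER_SIZE) return -1;

    const unsigned char *footer = buf + (size - SEEK_FOOTER_SIZE);
    if (read_le32(footer + 5) != SEEKABLE_MAGIC) return -1;

    unsigned char sfd = footer[4];
    if ((sfd >> 2) & 0x1f) return -1;   /* reserved bits must be zero */

    uint32_t num_frames = read_le32(footer);
    uint64_t entry_size = 8u + ((sfd >> 7) ? 4u : 0u);
    uint64_t frame_size = entry_size * num_frames
                        + SEEK_FOOTER_SIZE + SKIPPABLE_HDR_SIZE;
    if (frame_size > size || num_frames > cap) return -1;

    const unsigned char *frame = buf + (size - (size_t)frame_size);
    if (read_le32(frame) != SKIPPABLE_MAGIC) return -1;
    if ((uint64_t)read_le32(frame + 4) + SKIPPABLE_HDR_SIZE != frame_size) return -1;

    /* Entries hold per-frame sizes; the running sums are the frame offsets. */
    const unsigned char *entry = frame + SKIPPABLE_HDR_SIZE;
    uint64_t c_sum = 0, d_sum = 0;
    for (uint32_t i = 0; i < num_frames; i++, entry += entry_size) {
        c_off[i] = c_sum;
        d_off[i] = d_sum;
        c_sum += read_le32(entry);
        d_sum += read_le32(entry + 4);
    }
    return (int64_t)num_frames;
}
```

Given a table describing two frames with compressed sizes 100 and 50 and decompressed sizes 1000 and 500, this fills c_off with {0, 100} and d_off with {0, 1000}.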
I have a 45 GB file compressed with t2sz to ~10 GB. tar lists the files fine.
I ran the example tar-zst-list against it, and it stops at:
data182478.dat - ftell: 1121145480
Error while seeking
log2(1121145480) = 30.0623263486 bits?
Very rough guess, but maybe it's an integer overflow, and a signed or unsigned 32-bit int value needs to be converted to an unsigned 64-bit int?
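To illustrate the kind of failure being guessed at here, the sketch below sums frame sizes in a 32-bit and in a 64-bit accumulator. It is only an illustration of the failure class, not a diagnosis: the reported ftell value is still below 2^31, and total_size_32 and total_size_64 are hypothetical names, not functions from libzstd-seek.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum frame sizes in a 32-bit accumulator: the result wraps modulo 2^32. */
static uint32_t total_size_32(const uint32_t *sizes, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return total;
}

/* Same sum in a 64-bit accumulator: large files keep their true size. */
static uint64_t total_size_64(const uint32_t *sizes, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return total;
}
```

With three frames of 2 GiB each, the 32-bit sum wraps around to 2 GiB while the 64-bit sum gives the expected 6 GiB, so any seek past the wrap point would land on the wrong position.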
P.S. When you fix this issue, can you also update indexed_zstd to include the fix?