
martinellimarco / libzstd-seek


A library that mimics fread, fseek and ftell for reading zstd compressed files.

License: MIT License

CMake 0.62% C 96.68% Shell 2.70%

Contributors: ap--, martinellimarco

libzstd-seek's Issues

libzstd-seek can't handle a large number of files / a large file size?

I have a 45 GB file compressed with t2sz to ~10 GB. Tar lists the files fine.
I ran the example tar-zst-list against it, and it stops at:
data182478.dat - ftell: 1121145480
Error while seeking
log2(1121145480) = 30.0623263486 bits?
This is a very rough guess, but maybe it's an integer overflow, and a signed or unsigned 32-bit int needs to be converted to an unsigned 64-bit int?
P.S.
When you fix this issue, can you also update indexed_zstd to include the fix?
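
To make the overflow guess above concrete, here is a minimal sketch of that class of bug. It is not taken from libzstd-seek, and both function names are hypothetical. It sums frame sizes into a running offset, once in a 32-bit variable and once in a 64-bit one. Note that the reported ftell value, 1121145480, still fits in 31 bits, so this only illustrates the kind of problem suspected, not a confirmed cause.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulates frame sizes into a 32-bit offset.
 * Unsigned arithmetic wraps modulo 2^32, so totals past 4 GiB are lost. */
uint32_t total_offset_32(const uint32_t *frame_sizes, size_t count) {
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += frame_sizes[i];
    }
    return total;
}

/* Same accumulation with a 64-bit offset: no wrap for realistic file sizes. */
uint64_t total_offset_64(const uint32_t *frame_sizes, size_t count) {
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += frame_sizes[i];
    }
    return total;
}
```

With two 2 GiB frames followed by a 10-byte frame, the 32-bit version would report an offset of 10, while the 64-bit version keeps the full total.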

Support for seekable zstd format with jump table

I noticed a "TODO" about this in zstd-seek.c. I tested the solution below on a 98TB backup using ratarmount, and it eliminated the more than 24 hours previously spent regenerating the jump table during the mount. The solution is based on the code in zstd/contrib/seekable_format/zstdseek_decompress.c

I am new to git and don't know the right way to submit a change, so I am posting the C code here. Feel free to use or adapt it (it goes right after your "TODO" comment):

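// Adapted from zstd/contrib/seekable_format/zstdseek_decompress.c
// uint32_t comes from <stdint.h>.
#define ZSTD_seekTableFooterSize 9
#define ZSTD_SEEKABLE_MAGICNUMBER 0x8F92EAB1
#define ZSTD_SKIPPABLEHEADERSIZE 8
#define ZSTD_MAGIC_SKIPPABLE_START 0x184D2A50
// Read a little-endian 32-bit value byte by byte, so unaligned data and big-endian hosts are handled.
#define READ_LE32(p) ((uint32_t)(p)[0] | ((uint32_t)(p)[1] << 8) | ((uint32_t)(p)[2] << 16) | ((uint32_t)(p)[3] << 24))

if (sctx->size >= ZSTD_seekTableFooterSize) {
    const unsigned char *buff = (const unsigned char *)sctx->buff;
    size_t size = sctx->size;
    // Footer layout: number of frames (4 bytes), descriptor (1 byte), magic number (4 bytes).
    const unsigned char *footer = buff + (size - ZSTD_seekTableFooterSize);

    if (READ_LE32(footer + 5) == ZSTD_SEEKABLE_MAGICNUMBER) {
        unsigned char sfd = footer[4];
        unsigned checksumFlag = sfd >> 7;

        /* check reserved bits (bits 2-6 of the descriptor) */
        if ((sfd >> 2) & 0x1f) {
            DEBUG("seek table descriptor = %x: reserved bits should be zero\n", (unsigned)sfd);
            return -1;
        }

        size_t const numFrames = READ_LE32(footer);
        size_t const sizePerEntry = 8 + (checksumFlag ? 4 : 0);
        size_t const tableSize = sizePerEntry * numFrames;
        size_t const frameSize = tableSize + ZSTD_seekTableFooterSize + ZSTD_SKIPPABLEHEADERSIZE;
        if (frameSize > size) {
            DEBUG("seek table size = %zu exceeds file size = %zu\n", frameSize, size);
            return -1;
        }

        const unsigned char *frame = buff + (size - frameSize);
        unsigned skippableHeader = READ_LE32(frame);
        if (skippableHeader != (ZSTD_MAGIC_SKIPPABLE_START | 0xE)) {
            DEBUG("last frame header = %u does not match magic number %u\n", skippableHeader, (unsigned)(ZSTD_MAGIC_SKIPPABLE_START | 0xE));
            return -1;
        }

        // The size field counts the frame content only, without the 8-byte skippable header.
        size_t const storedSize = (size_t)READ_LE32(frame + 4) + ZSTD_SKIPPABLEHEADERSIZE;
        if (storedSize != frameSize) {
            DEBUG("last frame size = %zu does not match expected size = %zu\n", storedSize, frameSize);
            return -1;
        }

        // Each entry stores one frame's compressed and decompressed size,
        // so the start offsets have to be accumulated.
        const unsigned char *table = frame + ZSTD_SKIPPABLEHEADERSIZE;
        size_t cOffsetCum = 0;
        size_t dOffsetCum = 0;
        for (size_t i = 0; i < numFrames; i++) {
            const unsigned char *entry = table + (i * sizePerEntry);
            ZSTDSeek_addJumpTableRecord(sctx->jt, cOffsetCum, dOffsetCum);
            cOffsetCum += READ_LE32(entry);
            dOffsetCum += READ_LE32(entry + 4);
        }
        // NOTE: zstd's own loader also stores a final entry holding the end offsets.
        // Whether libzstd-seek needs such a record here has not been verified.

        sctx->jumpTableFullyInitialized = 1;
        return 0;
    }
}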
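
For anyone who wants to try the seek table parsing outside of libzstd-seek, the following is a standalone sketch of the same idea. It assumes the layout described in the zstd seekable format: an 8-byte skippable frame header, one entry per frame (compressed size, decompressed size, optional 4-byte checksum) and a 9-byte footer. The function `parse_seek_table`, its parameters and the constant names are hypothetical and not part of either library.

```c
#include <stddef.h>
#include <stdint.h>

#define SEEK_FOOTER_SIZE 9u
#define SEEKABLE_MAGIC 0x8F92EAB1u
#define SKIPPABLE_MAGIC 0x184D2A5Eu
#define SKIPPABLE_HEADER_SIZE 8u

/* Reads a little-endian 32-bit value without alignment requirements. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: parses the seek table stored at the end of buf and
 * writes each frame's cumulative start offsets into c_offsets/d_offsets.
 * Returns the number of frames, or -1 if no valid seek table is found or
 * it holds more than max_frames entries. */
long parse_seek_table(const unsigned char *buf, size_t size,
                      uint64_t *c_offsets, uint64_t *d_offsets,
                      size_t max_frames) {
    if (size < SEEK_FOOTER_SIZE) return -1;
    const unsigned char *footer = buf + size - SEEK_FOOTER_SIZE;
    if (read_le32(footer + 5) != SEEKABLE_MAGIC) return -1;

    unsigned char descriptor = footer[4];
    if ((descriptor >> 2) & 0x1f) return -1;   /* reserved bits must be zero */
    size_t entry_size = (descriptor >> 7) ? 12 : 8;

    uint64_t num_frames = read_le32(footer);
    uint64_t frame_size = num_frames * entry_size
                        + SEEK_FOOTER_SIZE + SKIPPABLE_HEADER_SIZE;
    if (frame_size > size || num_frames > max_frames) return -1;

    const unsigned char *frame = buf + size - (size_t)frame_size;
    if (read_le32(frame) != SKIPPABLE_MAGIC) return -1;
    if ((uint64_t)read_le32(frame + 4) + SKIPPABLE_HEADER_SIZE != frame_size)
        return -1;

    /* Entries hold per-frame sizes, so start offsets are running sums. */
    const unsigned char *entry = frame + SKIPPABLE_HEADER_SIZE;
    uint64_t c = 0, d = 0;
    for (size_t i = 0; i < num_frames; i++, entry += entry_size) {
        c_offsets[i] = c;
        d_offsets[i] = d;
        c += read_le32(entry);
        d += read_le32(entry + 4);
    }
    return (long)num_frames;
}
```

The offsets are kept in 64-bit variables on purpose: each entry is only 32 bits wide, but the running sums for a multi-terabyte archive are not.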
