zraorg / zra

ZStandard Random Access (ZRA) allows random access inside an archive compressed using ZStandard

License: BSD 3-Clause "New" or "Revised" License

Note: This library is no longer actively maintained, so it isn't recommended for new projects. Feel free to submit PRs for any issues; they'll be merged when possible.

Format

How is this done?

ZSTD has the concept of a Frame which can be decompressed independently from the rest of the file. A ZSTD archive is made of multiple concatenated frames which are decompressed one after another.

We exploit that fact to break the file into uniformly sized frames (Frame Size) and create a seek-table containing the offset of each frame within the file. The table can be indexed by simply dividing the offset into the uncompressed data by the frame size.
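
The lookup described above can be sketched roughly as follows. This is only an illustration and not ZRA's actual implementation: `SeekTable`, `FrameIndexFor`, and `FrameOffsetFor` are hypothetical names, and the sketch assumes that the frame size counts uncompressed bytes and that the table holds one archive offset per frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical seek-table: entry i holds the position of frame i inside the archive.
struct SeekTable {
    std::size_t frameSize;               // Uncompressed bytes per frame (assumed meaning of "Frame Size")
    std::vector<std::uint64_t> offsets;  // Offset of each frame within the compressed file
};

// Index of the frame that contains the given uncompressed offset.
inline std::size_t FrameIndexFor(const SeekTable& table, std::size_t uncompressedOffset) {
    assert(table.frameSize != 0);
    return uncompressedOffset / table.frameSize;
}

// Position in the archive where decompression has to start to reach that offset.
inline std::uint64_t FrameOffsetFor(const SeekTable& table, std::size_t uncompressedOffset) {
    std::size_t index = FrameIndexFor(table, uncompressedOffset);
    assert(index < table.offsets.size());
    return table.offsets[index];
}
```

Because every frame covers the same amount of uncompressed data, the lookup is a single division and needs no search through the table.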

Header

We store data that's required for decompression or other functionality inside an archive header that contains the following:

  • ZSTD Skippable Frame - The entire header is inside a ZSTD Skippable Frame so that ZRA is fully compatible with any regular ZSTD decompressor
  • CRC-32 Hash - A CRC-32 hash of the entire header, used to verify that the header has not been corrupted
  • Metadata Section - A section for data that might be used by a ZRA decompressor on the other side but is not part of the archive's contents itself
  • Seek-Table - A table with 40-bit entries containing the offset of individual frames
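
Two of these fields can be illustrated with a small sketch. The skippable-frame check follows the ZSTD format, where a skippable frame begins with a little-endian magic number between 0x184D2A50 and 0x184D2A5F followed by a 4-byte size. The 40-bit helpers are hypothetical: the names `WriteU40` and `ReadU40` and the little-endian byte order are assumptions, since the exact ZRA header layout isn't described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ZSTD skippable frames start with a little-endian magic number in the range
// 0x184D2A50..0x184D2A5F, followed by a 4-byte little-endian payload size.
inline bool IsSkippableFrame(const std::uint8_t* data, std::size_t size) {
    if (size < 8) return false;  // Too short to hold the magic number and size field
    std::uint32_t magic = std::uint32_t(data[0]) | std::uint32_t(data[1]) << 8 |
                          std::uint32_t(data[2]) << 16 | std::uint32_t(data[3]) << 24;
    return (magic & 0xFFFFFFF0u) == 0x184D2A50u;
}

// Stores the low 40 bits of `value` in 5 bytes (little-endian order is an assumption).
inline void WriteU40(std::uint8_t* out, std::uint64_t value) {
    assert(value < (std::uint64_t(1) << 40));
    for (int i = 0; i < 5; ++i)
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Reads back a 5-byte entry written by WriteU40.
inline std::uint64_t ReadU40(const std::uint8_t* in) {
    std::uint64_t value = 0;
    for (int i = 0; i < 5; ++i)
        value |= std::uint64_t(in[i]) << (8 * i);
    return value;
}
```

A 40-bit entry can address offsets up to 1 TiB while using 5 bytes per frame instead of 8.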

Usage

Compression

  • In-Memory
    zra::Buffer input = GetInput(); // A `zra::Buffer` full of data to be compressed
    zra::Buffer output = zra::CompressBuffer(input);
  • Streaming
    auto size = input.size();
    zra::Compressor compressor(size);
    // Skip past the space reserved for the header; it is written last
    output.seek(compressor.GetHeaderSize());
    
    zra::Buffer buffer; // The buffer is reused to prevent constant reallocation
    while (size) {
        // `maxChunkSize` is whatever upper bound the caller picks for a single read
        auto readSize = std::min(maxChunkSize, size);
        compressor.Compress(input.read(readSize), buffer);
        output.write(buffer);
        size -= readSize;
    }
    
    // Go back and fill in the header now that every chunk has been compressed
    output.seek(0);
    output.write(compressor.GetHeader());
    
    // Note: `input` and `output` in the example hold an internal offset that is automatically
    // advanced by the operations performed on them, similar to ifstream/ofstream from the C++
    // Standard Library
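
The "write the data first, then go back and write the header" pattern in the streaming example can be tried out without ZRA at all. The sketch below uses `std::stringstream` as a stand-in for `output` and plain strings as stand-ins for compressed chunks and the header; `WriteWithTrailingHeader` is a hypothetical helper. Unlike the example above, it reserves the header region by writing placeholder bytes, because a `std::stringstream` cannot seek past its current end.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Writes `chunks` after a reserved region of header.size() bytes, then seeks back
// and fills that region with `header` (a stand-in for the real archive header).
inline std::string WriteWithTrailingHeader(const std::vector<std::string>& chunks,
                                           const std::string& header) {
    assert(!header.empty());
    std::stringstream output;
    // Reserve the header region with placeholder bytes so data lands after it
    output << std::string(header.size(), '\0');

    for (const auto& chunk : chunks)
        output.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));

    output.seekp(0);  // Return to the start and overwrite the placeholder
    output.write(header.data(), static_cast<std::streamsize>(header.size()));
    return output.str();
}
```

Writing the header last makes sense when its contents, such as the seek-table, are only known once every frame has been produced.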

Decompression (Entire File)

  • In-Memory
    zra::Buffer input = GetInput(); // A `zra::Buffer` with the entire archive
    zra::Buffer output = zra::DecompressBuffer(input);
  • Streaming
    zra::FullDecompressor decompressor([&input](size_t offset, size_t size, void* output) {
      input.seek(offset);
      input.read(output, size);
    });
    
    auto remaining = decompressor.header.uncompressedSize;
    zra::Buffer buffer(bufferSize); // The buffer is reused to prevent constant reallocation
    while (remaining) {
        auto amount = decompressor.Decompress(buffer);
        output.write(buffer, amount);
        remaining -= amount;
    }
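
The streaming decompressors are given a callable that copies `size` bytes starting at `offset` into `output`. As a rough sketch of what such a read function could look like when the archive already sits in memory, consider the following. `ReadFunction` and `MakeMemoryReader` are hypothetical names, the use of `std::function` is an assumption about how the callable might be stored, and reporting out-of-range reads with an exception is a design choice of this sketch rather than documented ZRA behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <vector>

// Same shape as the lambdas in the examples: copy `size` bytes starting at `offset` into `output`.
using ReadFunction = std::function<void(std::size_t offset, std::size_t size, void* output)>;

// Builds a read function over an in-memory archive; the vector must outlive the returned callable.
inline ReadFunction MakeMemoryReader(const std::vector<std::uint8_t>& archive) {
    return [&archive](std::size_t offset, std::size_t size, void* output) {
        assert(output != nullptr);
        // Reject reads that would run past the end of the archive
        if (offset > archive.size() || size > archive.size() - offset)
            throw std::out_of_range("read past end of archive");
        std::memcpy(output, archive.data() + offset, size);
    };
}
```

A callable with this shape keeps the decompressor independent of where the archive lives, whether that is a file, a memory mapping, or a network source.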

Decompression (Random-Access)

  • In-Memory
    zra::Buffer input = GetInput();
    zra::Buffer output = zra::DecompressRA(input, offset, size);
  • Streaming
    zra::Decompressor decompressor([&input](size_t offset, size_t size, void* output) {
      input.seek(offset);
      input.read(output, size);
    });
    zra::Buffer output = decompressor.Decompress(offset, size);
    // or, to prevent buffer reallocation
    decompressor.Decompress(offset, size, output);
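
A random-access request rarely lines up with frame boundaries, so a decompressor has to work out which frames to decode and which bytes of each to keep. The sketch below shows one way to do that arithmetic; it is not taken from ZRA's source, `FrameSlice` and `SplitRange` are hypothetical names, and it assumes every frame holds `frameSize` uncompressed bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a requested range that lies entirely inside a single frame.
struct FrameSlice {
    std::size_t frameIndex;  // Which frame has to be decompressed
    std::size_t skip;        // Bytes to discard at the start of the decompressed frame
    std::size_t length;      // Bytes to keep after skipping
};

// Splits the uncompressed range [offset, offset + size) into per-frame slices.
inline std::vector<FrameSlice> SplitRange(std::size_t offset, std::size_t size, std::size_t frameSize) {
    assert(frameSize != 0);
    std::vector<FrameSlice> slices;
    while (size != 0) {
        std::size_t skip = offset % frameSize;
        std::size_t length = std::min(frameSize - skip, size);
        slices.push_back({offset / frameSize, skip, length});
        offset += length;
        size -= length;
    }
    return slices;
}
```

Only the frames named in the slices need to be read and decompressed, which is what keeps small reads from a large archive cheap.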

Retrieving Header

  • Using readFunction
    zra::Header header([&input](size_t offset, size_t size, void* output) {
      input.seek(offset);
      input.read(output, size);
    });
  • Using a pointer to the archive
    zra::Header header(input.data());
  • From Decompressor/FullDecompressor
    zra::Decompressor decompressor(...); // or zra::FullDecompressor
    const auto& header = decompressor.header; // The header is exposed as a member

License

We use a simple 3-clause BSD license, located at LICENSE, for easy integration into projects while remaining compatible with the libraries we utilize.
