Cyan4973 commented on April 27, 2024

If you were to concatenate all the same small files into a single blob, and then compress the resulting blob with zstd, what compression ratio would you obtain?

from zstd.

tomerr90 commented on April 27, 2024

I haven't set up a test case of my own; I'm asking based on what I saw in the GitHub README section "The Case For Small Data".

Cyan4973 commented on April 27, 2024

When using different inputs, one logically obtains different compression performance.

If you want to compare small data with large data, use the same data.
Either concatenate the small ones to create large data (preferable), or split the large data to create small data.
Otherwise, it's not comparable.
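
To make the "same data" comparison concrete, here is a minimal Python sketch. It assumes hypothetical synthetic JSON records (`make_records` is illustrative, not part of zstd) and uses the standard library's `zlib` as a stand-in codec simply because it ships with Python. zstd would give different absolute numbers, but the method is the point: compress each record alone, then compress the concatenation of exactly those bytes.

```python
import json
import zlib


def make_records(n):
    # Hypothetical sample data: small JSON records sharing one structure.
    return [
        json.dumps({
            "id": i,
            "type": "PushEvent",
            "actor": {"login": f"user{i % 37}"},
            "repo": {"name": f"org/repo{i % 11}"},
            "public": True,
        }).encode("utf-8")
        for i in range(n)
    ]


def compare_same_data(records, level=9):
    """Return (raw size, sum of per-record compressed sizes, concatenated size)."""
    # "Small data": every record compressed on its own.
    individual = sum(len(zlib.compress(r, level)) for r in records)
    # "Large data": exactly the same bytes, concatenated into one blob.
    blob = b"".join(records)
    concatenated = len(zlib.compress(blob, level))
    return len(blob), individual, concatenated


if __name__ == "__main__":
    raw, individual, concatenated = compare_same_data(make_records(1000))
    print(f"raw:          {raw} bytes")
    print(f"per record:   {individual} bytes (ratio {raw / individual:.2f})")
    print(f"concatenated: {concatenated} bytes (ratio {raw / concatenated:.2f})")
```

The absolute ratios depend entirely on the sample data; what makes the two numbers comparable is that both start from identical bytes.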

Compressing large data should always win, though by how much depends on the data (incompressible data remain incompressible). A dictionary will help close the gap, but typically cannot overtake the large data scenario, especially when the cost of the dictionary (its size) is taken into consideration.
It's likely possible to create a contrarian scenario where the above statement is false, by taking advantage of imperfections in the compression process, since it relies on fast, imperfect heuristics. That would be an exception though, not the general expectation.
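
A rough way to watch a dictionary "close the gap" while still charging for its size is sketched below. Assumptions to flag: zlib's preset-dictionary support (`zdict`) stands in for a zstd dictionary, `naive_dictionary` just glues a few sample records together (it is not zstd's dictionary trainer), and the JSON records are hypothetical synthetic data.

```python
import json
import zlib


def make_records(n):
    # Hypothetical sample: small JSON records with near-identical structure.
    return [
        json.dumps({"id": i, "type": "PushEvent",
                    "actor": {"login": f"user{i % 37}"},
                    "repo": {"name": f"org/repo{i % 11}"},
                    "public": True}).encode("utf-8")
        for i in range(n)
    ]


def naive_dictionary(samples, max_size=32 * 1024):
    # NOT a trained dictionary: a few raw samples concatenated, trimmed to
    # DEFLATE's 32 KiB window (bytes beyond that could not be referenced).
    return b"".join(samples)[-max_size:]


def compress_with_dict(data, zdict, level=9):
    # zlib's preset-dictionary feature, standing in for a zstd dictionary.
    c = zlib.compressobj(level=level, zdict=zdict)
    return c.compress(data) + c.flush()


def sizes(records, zdict, level=9):
    plain = sum(len(zlib.compress(r, level)) for r in records)
    with_dict = sum(len(compress_with_dict(r, zdict, level)) for r in records)
    concatenated = len(zlib.compress(b"".join(records), level))
    return {
        "per_record": plain,
        "per_record_with_dict": with_dict,
        # Charge the dictionary's own bytes, as the comment above suggests.
        "per_record_with_dict_plus_cost": with_dict + len(zdict),
        "concatenated": concatenated,
    }


if __name__ == "__main__":
    records = make_records(1000)
    # Build the dictionary from one slice and measure on the rest.
    zdict = naive_dictionary(records[:20])
    for name, size in sizes(records[20:], zdict).items():
        print(f"{name:32} {size}")
```

On data like this, one would expect the dictionary to shrink each record considerably, while the single concatenated blob still comes out smallest once the dictionary's own bytes are counted; real data will show different margins.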

tomerr90 commented on April 27, 2024

Thank you for your response!
That's not quite what I'm asking, though. You are explaining the general scenario; I'm asking about the results published on Zstd's GitHub page.
It makes sense to me that the ratio for big files should be the upper limit. However, for small data it seems to achieve much more. Is that just because the input used happens to be very compressible?
In other words, if I took all the samples in the scenario described and concatenated them, would I achieve the same ratio (~10) without the dictionary as well?

Cyan4973 commented on April 27, 2024

Yes, dictionary compression works best on structured data featuring a lot of redundancy across messages, but very little within each message.
This is what the github collection sample achieves: it's just a bunch of JSON records with very similar structure.
If they were all concatenated and compressed together, the compression ratio would be greater than 10.

Dictionary compression is for scenarios where one cannot concatenate these similar records together, for example because the records must be sent immediately and can't wait inside a batch queue.
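
The use case above, where each record is compressed and sent on its own without waiting for a batch, can be sketched as a sender and receiver sharing one dictionary. This sketch uses Python's `zlib` preset dictionary as a stand-in for zstd's dictionary support, and `SHARED_DICT`, `send` and `receive` are hypothetical names. In practice the dictionary would be built ahead of time (with zstd, for example, via `zstd --train`) and distributed to both sides.

```python
import json
import zlib

# Hypothetical shared dictionary: both sides must hold the exact same bytes
# (zlib records the dictionary's Adler-32 checksum in each compressed stream).
SHARED_DICT = b"".join(
    json.dumps({"id": i, "type": "PushEvent", "actor": {"login": f"user{i}"},
                "repo": {"name": f"org/repo{i}"}, "public": True}).encode("utf-8")
    for i in range(10)
)


def send(record: dict) -> bytes:
    # Each message is compressed on its own, immediately: no batch queue.
    c = zlib.compressobj(level=9, zdict=SHARED_DICT)
    payload = json.dumps(record).encode("utf-8")
    return c.compress(payload) + c.flush()


def receive(frame: bytes) -> dict:
    # Each frame decodes independently; only the shared dictionary is needed.
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return json.loads(d.decompress(frame) + d.flush())


if __name__ == "__main__":
    msg = {"id": 4242, "type": "PushEvent", "actor": {"login": "user7"},
           "repo": {"name": "org/repo3"}, "public": True}
    frame = send(msg)
    assert receive(frame) == msg
    print(len(json.dumps(msg)), "->", len(frame), "bytes on the wire")
```

The trade-off is operational: every receiver needs the same dictionary bytes, and a frame cannot be decoded without them.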

tomerr90 commented on April 27, 2024

I see, now I understand, thank you!
