
Comments (2)

Cyan4973 commented on May 4, 2024

The second compression pass is expected to be much faster precisely because the data was already compressed during the first pass. As a consequence, the second pass is expected to provide little, if any, compression benefit.

The 35% savings you found in the second pass contradict that second expectation. But this is just one sample; it should not be taken as a general rule.
The general expectation is that the second pass brings almost nothing, but counterexamples are possible. Unfortunately, they are harder to characterize. A general idea is that the source data may contain so much redundancy that the first pass cannot remove all of it, which generally means that the compression ratio is very high. It also depends on the specific set of parameters selected for the first and second passes.
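As a rough illustration of the "too much redundancy for one pass" case, here is a minimal sketch. It assumes Python's standard `zlib` module as a stand-in for zstd, so the actual numbers will differ from what zstd produces; the helper name `two_pass_sizes` and both sample inputs are made up for this example.

```python
import random
import zlib


def two_pass_sizes(data: bytes, first_level: int = 1, second_level: int = 6):
    """Return (raw, after_pass_1, after_pass_2) sizes in bytes.

    zlib is only a stand-in here; it is NOT zstd, and the ratios
    it reports should not be read as zstd results.
    """
    pass1 = zlib.compress(data, first_level)
    pass2 = zlib.compress(pass1, second_level)
    return len(data), len(pass1), len(pass2)


# Hypothetical sample 1: extreme redundancy (a long run of zero bytes).
# The first pass emits a long, repetitive stream of match codes,
# which a second pass can shrink again.
zeros = b"\x00" * 10_000_000

# Hypothetical sample 2: pseudo-text built from a small vocabulary,
# meant to resemble more ordinary input.
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon", b"zeta"]
text = b" ".join(rng.choice(words) for _ in range(200_000))

for name, sample in (("zeros", zeros), ("pseudo-text", text)):
    raw, p1, p2 = two_pass_sizes(sample)
    saving = 1 - p2 / p1  # fraction of pass-1 output removed by pass 2
    print(f"{name}: raw={raw} pass1={p1} pass2={p2} pass2_saving={saving:.1%}")
```

On the zero-filled input the second pass should shrink the first-pass output noticeably, while the pseudo-text sample shows how an ordinary input behaves for comparison. Changing `first_level` and `second_level` is a quick way to see how much the outcome depends on the parameters chosen for each pass.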

Even when the second pass brings benefits, it generally means that, with proper parameters, a single pass would have been able to produce a better compression ratio. However, it's unlikely to be as fast.

If you are really into using 2 passes to compress your data because it seems to fit a pattern that benefits from it, I suggest trying lz4 for the first pass. It messes with data less, meaning that the outcome of the first pass will likely be better compressed by a second zstd pass. And it's also faster.

from zstd.

shuhuajack commented on May 4, 2024

Hello Yann,

Thank you for your detailed explanation. I truly appreciate it.

It appears that my specific sample data doesn't represent the common case. Surprisingly, in my tests, the "zstd -c --fast=5" command runs faster than "lz4 -1" and achieves a better compression ratio. Additionally, for the second compression pass, when using "zstd -c compressed-filed" on a file compressed by zstd with the --fast option in the first pass, it also runs faster and produces a slightly better compression ratio.

To gain more insights, I plan to conduct experiments using a wider range of data sources and compare the performance of the two algorithms in both the first and second passes.

Thank you once again for your help. I hope you have a fantastic weekend.

Best Regards,
Jack
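For the experiments planned above, a small harness along these lines might help keep the measurements comparable. This is only a sketch under stated assumptions: the codecs are plain Python callables, the standard `zlib` module stands in for both lz4 and zstd (bindings for the real libraries could be plugged in as callables instead), and every name, sample, and level below is hypothetical.

```python
import time
import zlib
from typing import Callable, Dict, List, Tuple

Codec = Callable[[bytes], bytes]


def measure(codec: Codec, data: bytes) -> Tuple[bytes, float]:
    """Run one compression call and return (output, elapsed seconds)."""
    start = time.perf_counter()
    out = codec(data)
    return out, time.perf_counter() - start


def two_pass_report(
    samples: Dict[str, bytes],
    first_pass: Dict[str, Codec],
    second_pass: Codec,
) -> List[dict]:
    """Compress every sample with every first-pass codec, then run the
    same second-pass codec on the result, recording sizes and timings."""
    rows = []
    for sample_name, data in samples.items():
        for codec_name, codec in first_pass.items():
            pass1, t1 = measure(codec, data)
            pass2, t2 = measure(second_pass, pass1)
            rows.append({
                "sample": sample_name,
                "first_pass": codec_name,
                "raw": len(data),
                "pass1": len(pass1),
                "pass2": len(pass2),
                "pass1_seconds": t1,
                "pass2_seconds": t2,
                # Fraction of the pass-1 output removed by pass 2.
                "pass2_saving": 1 - len(pass2) / len(pass1),
            })
    return rows


# Stand-in codecs (assumption): zlib levels, NOT lz4 or zstd.
first_pass_codecs = {
    "fast-stand-in": lambda d: zlib.compress(d, 1),
    "default-stand-in": lambda d: zlib.compress(d, 6),
}

# Hypothetical synthetic samples; real files would replace these.
samples = {
    "log-like": b"2024-05-04 12:00:00 INFO request served in 12 ms\n" * 50_000,
    "byte-ramp": bytes(range(256)) * 4_000,
}

for row in two_pass_report(samples, first_pass_codecs, lambda d: zlib.compress(d, 6)):
    print(row)
```

Timings from a single run are noisy, so repeating each measurement and using inputs much larger than these synthetic ones would give more trustworthy comparisons.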
