The second compression pass is expected to be much faster precisely because the data was already compressed during the first pass. As a consequence, the second pass is expected to provide little, if any, compression benefit.
The fact that you found 35% savings in the second pass contradicts the second statement. But this is just one sample, and it should not be construed as a general rule.
The general expectation is that the second pass brings almost nothing, but counterexamples are possible. Unfortunately, these counterexamples are harder to characterize. One general idea is that the source data may contain so much redundancy that the first pass cannot remove all of it, which usually means that the compression ratio is very high. It also depends on the specific set of parameters selected for the first and second passes.
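A quick way to check whether a second pass pays off on a given input is to measure the size after each pass. The minimal sketch below uses Python's standard `zlib` module as a stand-in for zstd (an assumption made so it runs without extra packages; DEFLATE is a different format with different limits), and the helper names `two_pass_sizes` and `report` are hypothetical. It contrasts the two situations described above: extremely redundant input, where DEFLATE's maximum ratio of roughly 1032:1 leaves the first-pass output itself repetitive, and random input, where neither pass helps.

```python
import os
import zlib


def two_pass_sizes(data: bytes, first_level: int = 1, second_level: int = 9) -> tuple[int, int, int]:
    """Compress `data` twice and return (original, first pass, second pass) sizes.

    zlib stands in for zstd here (assumption); the levels are arbitrary examples.
    """
    first = zlib.compress(data, first_level)     # fast first pass
    second = zlib.compress(first, second_level)  # compress the already-compressed output
    return len(data), len(first), len(second)


def report(name: str, data: bytes) -> None:
    original, first, second = two_pass_sizes(data)
    saving = 100.0 * (first - second) / first
    print(f"{name}: {original} -> {first} -> {second} bytes "
          f"(second pass saves {saving:.1f}%)")


if __name__ == "__main__":
    # Extremely redundant input: DEFLATE cannot exceed roughly 1032:1, so the
    # first-pass output is itself repetitive and the second pass shrinks it a lot.
    report("redundant", b"A" * 10_000_000)
    # Incompressible input: neither pass helps; the second pass only adds framing.
    report("random", os.urandom(1_000_000))
```

Exact byte counts depend on the zlib build, so only the contrast between the two inputs matters. zstd is not capped in the same way, so results with the real tool may differ, which is the parameter dependence mentioned above.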
Even when the second pass brings benefits, it generally means that a single pass with proper parameters could have produced a better compression ratio. However, that single pass is unlikely to be as fast.
If you really want to compress your data in two passes because it seems to fit a pattern that benefits from it, I suggest trying lz4 for the first pass. It alters the data less (it does not entropy-code its output), so the output of the first pass will likely compress better in a second zstd pass. It's also faster.
from zstd.
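The reason a byte-aligned first pass leaves more for the second pass can be illustrated with a toy experiment. The sketch below is hypothetical: `toy_lz_compress`, `toy_lz_decompress`, `second_pass_saving` and `make_text` are invented names, the token layout is a simplified byte-aligned LZ77 format in the spirit of LZ4 (it is NOT the real LZ4 block format), and Python's `zlib` stands in for the second zstd pass.

```python
import random
import zlib

# Hypothetical token layout (NOT the real LZ4 block format):
#   [literal count: 1 byte][literals][offset: 2 bytes LE][match length - 4: 1 byte]
# An offset of 0 marks a literal-only token. Every field stays byte-aligned
# and nothing is entropy-coded, loosely mimicking LZ4's design.
MIN_MATCH = 4
MAX_MATCH = MIN_MATCH + 255
MAX_LITERALS = 255
MAX_OFFSET = 65535


def _emit(out: bytearray, literals: bytes, offset: int, length: int) -> None:
    # Split long literal runs into literal-only tokens (offset 0, length byte 0).
    while len(literals) > MAX_LITERALS:
        out += bytes([MAX_LITERALS]) + literals[:MAX_LITERALS] + b"\x00\x00\x00"
        literals = literals[MAX_LITERALS:]
    out.append(len(literals))
    out += literals
    out += offset.to_bytes(2, "little")
    out.append(length - MIN_MATCH if offset else 0)


def toy_lz_compress(data: bytes) -> bytes:
    """Greedy LZ77: reuse the most recent earlier occurrence of each 4-byte sequence."""
    out = bytearray()
    last_seen: dict[bytes, int] = {}
    anchor = 0  # start of the literals not yet emitted
    i = 0
    while i + MIN_MATCH <= len(data):
        key = data[i:i + MIN_MATCH]
        candidate = last_seen.get(key)
        last_seen[key] = i
        if candidate is not None and i - candidate <= MAX_OFFSET:
            length = MIN_MATCH
            while (i + length < len(data) and length < MAX_MATCH
                   and data[candidate + length] == data[i + length]):
                length += 1
            _emit(out, data[anchor:i], i - candidate, length)
            i += length
            anchor = i
        else:
            i += 1
    _emit(out, data[anchor:], 0, 0)  # trailing literals
    return bytes(out)


def toy_lz_decompress(blob: bytes) -> bytes:
    out = bytearray()
    p = 0
    while p < len(blob):
        count = blob[p]
        out += blob[p + 1:p + 1 + count]
        p += 1 + count
        offset = int.from_bytes(blob[p:p + 2], "little")
        length = blob[p + 2] + MIN_MATCH
        p += 3
        if offset:
            start = len(out) - offset
            for k in range(length):  # byte by byte, so overlapping copies work
                out.append(out[start + k])
    return bytes(out)


def second_pass_saving(first_pass: bytes) -> float:
    """Percent saved by a zlib level-9 pass (stand-in for zstd) over a first-pass output."""
    return 100.0 * (len(first_pass) - len(zlib.compress(first_pass, 9))) / len(first_pass)


def make_text(seed: int = 0, n_words: int = 30_000) -> bytes:
    """Synthetic, highly repetitive text (hypothetical test data)."""
    rng = random.Random(seed)
    words = ["stream", "block", "frame", "level", "window", "buffer", "match", "dict"]
    return " ".join(rng.choice(words) for _ in range(n_words)).encode()


if __name__ == "__main__":
    text = make_text()
    lz_first = toy_lz_compress(text)
    assert toy_lz_decompress(lz_first) == text  # sanity check: lossless round trip
    zlib_first = zlib.compress(text, 1)
    for name, first_pass in [("toy LZ", lz_first), ("zlib -1", zlib_first)]:
        print(f"{name:8} first pass: {len(first_pass):7} bytes, "
              f"second pass saves {second_pass_saving(first_pass):5.1f}%")
```

Keeping every field a whole byte, with no entropy stage, is the design choice that leaves redundancy for the second pass; the zlib first pass has already entropy-coded its output. Real lz4 and zstd differ in many details, so the actual tools should be measured on real data.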
Hello Yann,
Thank you for your detailed explanation. I truly appreciate it.
It appears that my sample data doesn't represent the common cases well. Surprisingly, in my tests, "zstd -c --fast=5" runs faster than "lz4 -1" and achieves a better compression ratio. Additionally, in the second pass, running "zstd -c compressed-file" on a file compressed with zstd's --fast option in the first pass also runs faster and produces a slightly better compression ratio than when lz4 is used for the first pass.
To gain more insights, I plan to conduct experiments using a wider range of data sources and compare the performance of the two algorithms in both the first and second passes.
Thank you once again for your help. I hope you have a fantastic weekend.
Best Regards,
Jack
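For the broader experiment described above, a small harness that times each pass separately and records the size after each pass keeps comparisons consistent across data sources. The sketch below is hypothetical: `CODECS`, `run_pipeline` and `compare` are invented names, and the codec table uses Python's standard `zlib` and `lzma` modules only as stand-ins so it runs anywhere. For a real zstd versus lz4 comparison, the table entries would need to be replaced with wrappers around the actual tools or their bindings.

```python
import lzma
import sys
import time
import zlib
from pathlib import Path
from typing import Callable

Codec = Callable[[bytes], bytes]

# Standard-library stand-ins (assumption): to reproduce the actual experiment,
# replace these with zstd / lz4 bindings or wrappers around the CLIs.
CODECS: dict[str, Codec] = {
    "zlib-1": lambda d: zlib.compress(d, 1),
    "zlib-9": lambda d: zlib.compress(d, 9),
    "lzma-0": lambda d: lzma.compress(d, preset=0),
}


def run_pipeline(data: bytes, first: Codec, second: Codec) -> dict[str, float]:
    """Run two passes, timing each one and recording the size after each."""
    t0 = time.perf_counter()
    once = first(data)
    t1 = time.perf_counter()
    twice = second(once)
    t2 = time.perf_counter()
    return {"size1": len(once), "size2": len(twice),
            "sec1": t1 - t0, "sec2": t2 - t1}


def compare(samples: dict[str, bytes]) -> list[tuple[str, str, str, dict[str, float]]]:
    """Try every (first pass, second pass) codec pair on every sample."""
    rows = []
    for sample, data in samples.items():
        for first_name, first in CODECS.items():
            for second_name, second in CODECS.items():
                rows.append((sample, first_name, second_name,
                             run_pipeline(data, first, second)))
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python two_pass_bench.py FILE [FILE ...]
    samples = {p: Path(p).read_bytes() for p in sys.argv[1:] if Path(p).is_file()}
    if not samples:
        samples = {"synthetic": b"example payload " * 50_000}
    for sample, first, second, r in compare(samples):
        print(f"{sample}: {first} -> {second}: "
              f"{r['size1']} B ({r['sec1'] * 1e3:.1f} ms), "
              f"{r['size2']} B ({r['sec2'] * 1e3:.1f} ms)")
```

Single timings of fast codecs are noisy, so repeating each run and keeping the best or median time gives a fairer comparison.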