
A question about compressing a file twice

shuhuajack commented on May 4, 2024

A question about compressing a file twice


Comments (2)

Cyan4973 commented on May 4, 2024

The second compression pass is expected to be much faster precisely because the data was already compressed during the first pass. As a consequence, the second pass is expected to provide little, if any, compression benefit.

The fact that you found 35% savings in the second pass contradicts that second statement. But this is just one sample; it should not be construed as a general rule.
The general expectation is that the second pass brings almost nothing, but counterexamples are possible. Unfortunately, these counterexamples are hard to characterize. A general idea is that there might be so much redundancy in the source data that the first pass cannot get rid of it all, which generally means that the compression ratio is very high. It also depends on the specific set of parameters selected for the first and second passes.
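To make the "too much redundancy for one pass" idea concrete, here is a minimal sketch. It assumes Python's standard `zlib` module as a stand-in for zstd (a different algorithm, so the numbers will not match zstd's); the helper name `two_pass_sizes` and both sample inputs are hypothetical, chosen only for illustration.

```python
import os
import zlib


def two_pass_sizes(data: bytes, level: int = 6) -> tuple[int, int, int]:
    """Return (original, after first pass, after second pass) sizes in bytes."""
    first = zlib.compress(data, level)
    second = zlib.compress(first, level)
    # Sanity check: two decompressions must restore the input exactly.
    assert zlib.decompress(zlib.decompress(second)) == data
    return len(data), len(first), len(second)


# Extremely redundant input: ten million zero bytes.
redundant = bytes(10_000_000)
# Incompressible input: random bytes, where no pass is expected to help.
random_like = os.urandom(1_000_000)

for name, sample in [("redundant", redundant), ("random", random_like)]:
    original, first, second = two_pass_sizes(sample)
    print(f"{name:>9}: {original} -> {first} -> {second} bytes")
```

On the all-zero input, the first pass output is itself a highly repetitive stream, so the second pass shrinks it further. On the random input, neither pass should gain anything.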

Even when the second pass brings benefits, it generally means that, with proper parameters, a single pass would have been able to produce a better compression ratio. However, it's unlikely to be as fast.
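One way to check this trade-off on your own data is to time both strategies side by side. The sketch below again assumes `zlib` as a stand-in for zstd, with levels 1 and 9 standing in for "fast" and "strong" parameters; `measure` and `make_sample` are hypothetical helpers, and the log-like sample is synthetic. Which strategy wins on size or speed depends on the data, so no outcome is claimed here.

```python
import time
import zlib


def measure(label, compress, data):
    """Run one compression strategy and report its output size and wall-clock time."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return {"label": label, "size": len(out), "seconds": elapsed}


def make_sample(lines: int = 200_000) -> bytes:
    """Build synthetic log-like text: repetitive, but not trivially so."""
    return b"".join(
        b"2024-05-04 12:00:%02d INFO request id=%d status=200\n" % (i % 60, i)
        for i in range(lines)
    )


sample = make_sample()
results = [
    measure("two fast passes (level 1, then level 1)",
            lambda d: zlib.compress(zlib.compress(d, 1), 1), sample),
    measure("one strong pass (level 9)",
            lambda d: zlib.compress(d, 9), sample),
]

print(f"input: {len(sample)} bytes")
for r in results:
    print(f"{r['label']}: {r['size']} bytes in {r['seconds']:.3f} s")
```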

If you are really into using 2 passes to compress your data because it seems to fit a pattern that benefits from it, I suggest trying lz4 for the first pass. It messes with data less, meaning that the outcome of the first pass will likely be better compressed by a second zstd pass. And it's also faster.


shuhuajack commented on May 4, 2024

Hello Yann,

Thank you for your detailed explanation. I truly appreciate it.

It appears that my specific sample data doesn't accurately represent the common case. Surprisingly, in my tests, the "zstd -c --fast=5" command runs faster than "lz4 -1" and achieves a better compression ratio. Additionally, for the second compression pass, running "zstd -c compressed-file" on a file that was compressed by zstd with the --fast option in the first pass is also faster and produces a slightly better compression ratio.

To gain more insights, I plan to conduct experiments using a wider range of data sources and compare the performance of the two algorithms in both the first and second passes.
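A small harness along these lines could drive such an experiment. This is only a sketch under stated assumptions: the standard-library codecs `zlib` (level 1) and `lzma` (preset 1) stand in for lz4 and zstd, the three samples are synthetic placeholders for real data sources, and `run_matrix`, `CODECS`, and `SAMPLES` are hypothetical names.

```python
import lzma
import os
import time
import zlib
from itertools import product

# Stand-in codecs (assumption): swap in real lz4/zstd bindings if available.
CODECS = {
    "zlib-1": lambda d: zlib.compress(d, 1),
    "lzma-1": lambda d: lzma.compress(d, preset=1),
}

# Synthetic samples (assumption): replace with real files for meaningful results.
SAMPLES = {
    "text": b"the quick brown fox jumps over the lazy dog\n" * 20_000,
    "zeros": bytes(1_000_000),
    "random": os.urandom(200_000),
}


def run_matrix(samples, codecs):
    """Try every (sample, first-pass codec, second-pass codec) combination."""
    rows = []
    for (sample_name, data), first, second in product(samples.items(), codecs, codecs):
        t0 = time.perf_counter()
        pass1 = codecs[first](data)
        t1 = time.perf_counter()
        pass2 = codecs[second](pass1)
        t2 = time.perf_counter()
        rows.append({
            "sample": sample_name, "first": first, "second": second,
            "original": len(data), "pass1": len(pass1), "pass2": len(pass2),
            "pass1_s": t1 - t0, "pass2_s": t2 - t1,
        })
    return rows


for row in run_matrix(SAMPLES, CODECS):
    print(f"{row['sample']:>6} {row['first']} -> {row['second']}: "
          f"{row['original']} -> {row['pass1']} -> {row['pass2']} bytes "
          f"({row['pass1_s']:.3f} s + {row['pass2_s']:.3f} s)")
```

Keeping the first-pass and second-pass codecs as separate axes makes it easy to see whether a second-pass gain comes from the data or from the parameter choice.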

Thank you once again for your help. I hope you have a fantastic weekend.

Best Regards,
Jack
