
Comments (12)

Cyan4973 commented on April 28, 2024

I have tested about 30 different combinations of compression levels and thread counts

On what content ?

from zstd.

baterflyrity commented on April 28, 2024

I have tested about 30 different combinations of compression levels and thread counts

On what content ?

Several GB of random projects cloned from GitHub.

Also, I've checked 1-16 zstd threads together with 1-60 OS threads - always the same results (except RAM threading overhead).

I guess there could be a size check, e.g. if (target.st_size < MINIMUM_THREADING_SIZE) compress_single_thread(target);.
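That size-based dispatch could be sketched like this (a hypothetical helper; MINIMUM_THREADING_SIZE and pick_thread_count are assumed names, and zstd's real cutoff actually varies with level and window size):

```python
import os

# Hypothetical threshold: below this, there is too little data to split into jobs.
MINIMUM_THREADING_SIZE = 16 * 1024 * 1024  # 16 MiB, an assumed value

def pick_thread_count(path: str, requested_threads: int) -> int:
    """Fall back to a single worker for inputs too small to parallelize."""
    if os.path.getsize(path) < MINIMUM_THREADING_SIZE:
        return 1
    return requested_threads
```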


Cyan4973 commented on April 28, 2024

Several GB of random projects cloned from GitHub.

Do you mean a giant tarball (single file) containing multiple repositories ?


baterflyrity commented on April 28, 2024

Several GB of random projects cloned from GitHub.

Do you mean a giant tarball (single file) containing multiple repositories ?

No, just many separate git clone runs.


Cyan4973 commented on April 28, 2024

I don't understand the scenario being tested here.
And it doesn't look like the scenario for which multi-threaded mode was created.


baterflyrity commented on April 28, 2024

I don't understand the scenario being tested here. And it doesn't look like the scenario for which multi-threaded mode was created.

Let me explain again...

1. Prepare some initial data:

mkdir test
cd test
git clone https://github.com/kivy/buildozer.git
git clone https://github.com/fomantic/Fomantic-UI.git
git clone https://github.com/marktext/marktext.git
git clone https://github.com/bonigarcia/webdrivermanager.git

2. Now we want to compress the test directory.

So, how can we compress the directory with all subdirectories? I have tested several solutions:

  1. Compress them all using zstd 1-16 threads.
  2. Compress each separately in 1-4 OS threads.
  3. Compress each separately in 1-4 OS threads using zstd 1-16 threads.

Regardless of the compression parameters, RAM usage, CPU usage, compression time, and compression ratio are always identical.

3. Question

What do zstd threads do? How do 1, 2, 3, etc. threads affect the results?


p.s.

Here I define only 4 subdirectories, but I have actually tested 61 different projects from GitHub with 1-60 OS threads. My machine has 4 physical and 8 logical CPU cores, and enough RAM to compress them all in parallel. I have also tested compression levels 22, 17, 12, 6, 3, and 1.


Cyan4973 commented on April 28, 2024

It's the "compress the directory" part which is not clearly specified.

This could be done in multiple ways using zstd,
and for example one could do it this way:

tar cvf - test_dir | zstd > test_dir.tar.zst

This is not a random example, this is how I would do it if I were tasked to compress such a directory. It would result in a "relatively large" stream (depending on repositories' content), for which the multithreaded compression within zstd would be effective.

Another way to interpret "compress the directory" could be:

zstd -r test_dir

in which case each and every file within test subdirectories is individually compressed, and its corresponding .zst compressed output is written into the same directory, alongside the original data.

I struggle to see a use case where doing it this way is a useful thing to do, but it is nonetheless technically possible (since this is a capability of the gzip interface and we want to keep parity with that), so one could decide to employ that method, even if only for a test scenario.

The problem here is that, in this case, there are likely a lot of small files, each of which is a separate compression job, and multi-threaded compression will not be useful in this case because there is not enough data to even start parallelizing the processing.

Another idea here could be to start multiple different compression jobs in parallel.
One way to achieve multiple compressions in parallel is to leverage the shell, making it batch-generate the compression commands and run them in parallel, typically with xargs for example (other methods are certainly possible). This is the usual answer to such a scenario, but it requires some sh-fu, which is not always trivial.
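That batching layer can also be sketched outside the shell. A minimal Python sketch of the scheduling part (run_jobs_in_parallel is a hypothetical name; in practice each command would be a tar | zstd pipeline like the one shown above):

```python
import concurrent.futures
import subprocess
import sys

def run_jobs_in_parallel(commands, max_workers=4):
    """Run independent compression commands max_workers at a time,
    mimicking what `xargs -P` does in the shell. Returns the number
    of jobs that completed successfully."""
    done = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(subprocess.run, cmd, check=True) for cmd in commands]
        for fut in concurrent.futures.as_completed(futures):
            fut.result()  # re-raise if a job failed
            done += 1
    return done
```

Hypothetical usage, one pipeline per repository: run_jobs_in_parallel([["sh", "-c", f"tar cf - {d} | zstd -o {d}.tar.zst"] for d in repo_dirs], max_workers=4).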

I guess that gets us to what is likely the request: do the same work as the shell, but within zstd.
However, I hope that it's clear that this is a completely different layer: multithreaded compression of a single source is provided by libzstd, and that works. Multiple compression jobs in parallel would be something that the CLI program zstd would have to manage on its side: scrubbing through the list of targets, creating and maintaining multiple ZSTD_CCtx* compression states in flight, ensuring that the global pool of threads remains constant, etc.
In short, it's not logic that belongs in libzstd; it's fairly specific to the CLI use case, and it's complex to implement.
Unfortunately, such an effort is not part of the current code base. It must be noted that most of our efforts are concentrated on libzstd: it is a vastly more important capability, integrated into many programs, and it also serves the zstd CLI as one of its users.

Of course, since this is open source, someone could step in and do it. Be aware though that this is not a trivial fix.

My recommendation here would be to stick to the tar archiving method, provided it achieves the wanted goal (presumably, the compression of a large directory, for later regeneration).


baterflyrity commented on April 28, 2024

Thanks for detailed description.
So several days have passed and I'm back with adjusted tests.

Source data

I used a directory with 40-50 randomly chosen repositories from GitHub as the compression source.

cd test
git clone https://github.com/kivy/buildozer.git
git clone https://github.com/fomantic/Fomantic-UI.git
git clone https://github.com/marktext/marktext.git
git clone https://github.com/bonigarcia/webdrivermanager.git
... repeat 40 times more with random repositories

Testing methodology

I want to compress each repository into a separate archive using the zstd compression algorithm, as fast as possible.

So I create a separate compression job for each repository. I run these jobs in parallel using OS threads. Additionally, I use zstd threads within each job.

The job consists of two steps:

  1. Pack source repository directory into single .tar file.
  2. Compress .tar file with zstd.
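A dependency-free sketch of that two-step job, using Python's tarfile module plus zlib as a stand-in codec (the actual tests used zstd; the stand-in only illustrates the job structure, and compress_repo is a hypothetical name):

```python
import io
import tarfile
import zlib
from pathlib import Path

def compress_repo(repo_dir: str, level: int = 6) -> bytes:
    """Step 1: pack the repository directory into an in-memory tar stream.
    Step 2: compress that stream (zlib here; zstd in the real tests)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(repo_dir, arcname=Path(repo_dir).name)
    return zlib.compress(buf.getvalue(), level)
```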

To collect the compression stats I use:

  • a process monitor (like Process Explorer)
  • the total size of the test directory before (2340 MB) and after compression

My machine has 4 physical and 8 logical CPU cores.

Difference from original methodology

In the initial comment I posted results of a different job algorithm:

  1. Open zstd compression stream.
  2. Find all files in repository.
  3. Push each file individually to the stream.

Other aspects stay equal.

Benchmark results

I combined 3 parameters: compression level, zstd compression threads, and OS threads, and tested many combinations (the first run is cold, the others are hot).

------------------------------------------------------
Level | OS | ZSTD |   t   | RAM  | CPU % | S    | S  %
------------------------------------------------------
6     | 10 | 4    | 4.62  | 800  | 0   % | 1540 | 66 %
3     | 1  | 1    | 6.17  | 200  | 11  % | 1568 | 67 %
3     | 1  | 4    | 5.97  | 200  | 11  % | 1568 | 67 %
3     | 1  | 8    | 5.21  | 200  | 12  % | 1568 | 67 %
3     | 4  | 1    | 2.56  | 500  | 0   % | 1568 | 67 %
3     | 4  | 4    | 2.59  | 500  | 0   % | 1568 | 67 %
3     | 4  | 8    | 2.54  | 500  | 0   % | 1568 | 67 %
3     | 8  | 1    | 2.51  | 700  | 0   % | 1568 | 67 %
3     | 8  | 4    | 2.56  | 700  | 0   % | 1568 | 67 %
3     | 8  | 8    | 2.37  | 700  | 0   % | 1568 | 67 %
3     | 16 | 1    | 2.59  | 1000 | 0   % | 1568 | 67 %
3     | 16 | 4    | 2.44  | 1000 | 0   % | 1568 | 67 %
3     | 16 | 8    | 2.55  | 1000 | 0   % | 1568 | 67 %
6     | 1  | 1    | 6.0   | 200  | 12  % | 1540 | 66 %
6     | 1  | 4    | 5.97  | 200  | 12  % | 1540 | 66 %
6     | 1  | 8    | 5.96  | 200  | 12  % | 1540 | 66 %
6     | 4  | 1    | 2.61  | 500  | 0   % | 1540 | 66 %
6     | 4  | 4    | 2.65  | 500  | 0   % | 1540 | 66 %
6     | 4  | 8    | 2.61  | 500  | 0   % | 1540 | 66 %
6     | 8  | 1    | 2.54  | 700  | 0   % | 1540 | 66 %
6     | 8  | 4    | 2.52  | 700  | 0   % | 1540 | 66 %
6     | 8  | 8    | 2.62  | 700  | 0   % | 1540 | 66 %
6     | 16 | 1    | 2.7   | 1000 | 0   % | 1540 | 66 %
6     | 16 | 4    | 2.81  | 1000 | 0   % | 1540 | 66 %
6     | 16 | 8    | 2.79  | 1000 | 0   % | 1540 | 66 %
12    | 1  | 1    | 7.95  | 200  | 12  % | 1503 | 64 %
12    | 1  | 4    | 7.66  | 200  | 12  % | 1503 | 64 %
12    | 1  | 8    | 9.13  | 200  | 11  % | 1503 | 64 %
12    | 4  | 1    | 3.39  | 500  | 0   % | 1503 | 64 %
12    | 4  | 4    | 3.13  | 500  | 0   % | 1503 | 64 %
12    | 4  | 8    | 2.98  | 500  | 0   % | 1503 | 64 %
12    | 8  | 1    | 2.99  | 700  | 0   % | 1503 | 64 %
12    | 8  | 4    | 3.1   | 700  | 0   % | 1503 | 64 %
12    | 8  | 8    | 3.02  | 700  | 0   % | 1503 | 64 %
12    | 16 | 1    | 3.22  | 1100 | 0   % | 1503 | 64 %
12    | 16 | 4    | 2.99  | 1100 | 0   % | 1503 | 64 %
12    | 16 | 8    | 3.0   | 1100 | 0   % | 1503 | 64 %

where:

  • Level - zstd compression level
  • OS - OS threads (to run jobs in parallel)
  • ZSTD - zstd compression threads (within a single job)
  • t - total time in minutes
  • RAM - mean total RAM usage in MB
  • CPU % - mean total CPU usage (0%-100%)
  • S - total compressed size of the test directory in MB
  • S % - total compression ratio of the test directory (0%-100%)

These results match those from the original methodology.

Question

From the table above it can be seen that none of RAM usage, compression time, or compression ratio is affected by the zstd thread count parameter. So what does this parameter do?


Cyan4973 commented on April 28, 2024

These results imply that multithreading is not effective in these tests.
The time differences are within noise, and the memory measurements are identical, which should not be the case.
It's difficult to guess remotely why, but here are a few possible investigation directions:

  1. What's the size of the tar archives? I'm assuming "big", in the sense of > 16 MB, in which case multithreading should be effective. If they were small (say < 1 MB), that would not be enough to trigger parallelism.
  2. If the tar output is piped into zstd: what's the speed of the tar operation?
    In some cases, when there are a lot of files, it can be slow - so slow that zstd never needs more than 1 thread to keep up. This is less likely at higher levels though.
  3. How fast is the underlying storage medium? A variant of the scenario above.

Of course, there are tons of other reasons that could produce this outcome, from the binary being compiled without MT support, to the number of threads being redefined to 1 somewhere in the chain of calls. Difficult to guess remotely.

But in a more known environment, multithreading is effective, given a large enough input.
Here is an example, measured locally on a MacBook, compressing the silesia.tar corpus (~200 MB):

-----------------------------------
Level | Threads | Compression speed
-----------------------------------
1     | 1       | 607 MB/s
3     | 1       | 386 MB/s
6     | 1       | 128 MB/s
12    | 1       | 38 MB/s
1     | 2       | 1160 MB/s
3     | 2       | 729 MB/s
6     | 2       | 245 MB/s
12    | 2       | 78 MB/s
1     | 4       | 2221 MB/s
3     | 4       | 1359 MB/s
6     | 4       | 469 MB/s
12    | 4       | 121 MB/s


baterflyrity commented on April 28, 2024

So, there must be a single file > 16 MB and SSD storage to use multiple zstd threads, otherwise they self-deactivate - am I right?


Cyan4973 commented on April 28, 2024

Somehow, yes.
The exact answer is a bit more complex and actually varies with compression level (for example, level 19 will still create only one job for a 16 MB file, while level 1 will create up to 8), but you get the idea: only "large" files benefit from multi-threading.


baterflyrity commented on April 28, 2024

Thanks, got it.
Nevertheless, I'm going to do some more tests. There seems to be no need to post the results here, so I'll just close this question.

