
Comments (7)

Cyan4973 commented on April 28, 2024

The first run used ~10 out of 12 physical cores (close to what I'd expect), the second one barely more than 3.

Is that expected?

Yes,
as the level increases, the window size tends to increase,
and as a consequence, the size of each job tends to increase too.
At level 21, each job is likely ~256 MB, so there are fewer jobs running in parallel.

It's possible to take direct control of the job size, as explained in the man page. Quoting:

ADVANCED COMPRESSION OPTIONS
       -B#: Specify the size of each compression job. This parameter is only available when multi-threading is enabled. Each compression job is run in parallel,
       so this value indirectly impacts the nb of active threads. Default job size varies depending on compression level (generally 4 * windowSize). -B# makes it
       possible to manually select a custom size. Note that job size must respect a minimum value which is enforced transparently. This minimum is either 512 KB, or
       overlapSize, whichever is largest. Different job sizes will lead to non-identical compressed frames.

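The same knobs are also exposed programmatically through libzstd's advanced API: ZSTD_c_nbWorkers corresponds to -T#, and ZSTD_c_jobSize to -B#. Below is a minimal streaming sketch (illustrative only, not code from the zstd sources); the 12-worker count, 32 MB job size, and 128 KB buffers are arbitrary example values, and it assumes a libzstd built with multithreading support:

/* Compress stdin to stdout with 12 worker threads and 32 MB jobs.
   Sketch only: error handling is minimal. */
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    if (cctx == NULL) return 1;
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 12);      /* like -T12 */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_jobSize, 32 << 20);  /* like -B (32 MB jobs) */

    static char inBuf[1 << 17];
    static char outBuf[1 << 17];
    size_t readSz;
    while ((readSz = fread(inBuf, 1, sizeof inBuf, stdin)) != 0) {
        ZSTD_inBuffer in = { inBuf, readSz, 0 };
        while (in.pos < in.size) {   /* push the whole chunk through */
            ZSTD_outBuffer out = { outBuf, sizeof outBuf, 0 };
            size_t const r = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);
            if (ZSTD_isError(r)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(r)); return 1; }
            fwrite(outBuf, 1, out.pos, stdout);
        }
    }
    size_t remaining;   /* flush buffered data and close the frame */
    do {
        ZSTD_outBuffer out = { outBuf, sizeof outBuf, 0 };
        ZSTD_inBuffer in = { NULL, 0, 0 };
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(remaining)); return 1; }
        fwrite(outBuf, 1, out.pos, stdout);
    } while (remaining != 0);
    ZSTD_freeCCtx(cctx);
    return 0;
}

Compile with: cc example.c -lzstd (file name arbitrary). Note that if libzstd was built without ZSTD_MULTITHREAD, setting ZSTD_c_nbWorkers to a nonzero value returns an error code rather than silently running single-threaded.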

Cyan4973 commented on April 28, 2024

If multithreading were working, I would expect user time > real time.
That's not the case here, suggesting something is wrong.

For reference, here is what I'm getting on a local Ubuntu desktop:

time ./zstd -4 linux-6.2.9.tar -c > /dev/null
4.52s user 0.21s system 108% cpu 4.348 total

time ./zstd -4 -T0 linux-6.2.9.tar -c > /dev/null
11.91s user 0.42s system 735% cpu 1.675 total
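
A quick way to read these timings (an aside, using the numbers printed above): the user/real ratio approximates how many cores were kept busy.

/* Effective parallelism = user time / real (wall-clock) time. */
#include <stdio.h>

int main(void)
{
    printf("default: %.2f cores busy\n", 4.52 / 4.348);   /* ~1.04: single-threaded */
    printf("-T0:     %.2f cores busy\n", 11.91 / 1.675);  /* ~7.11: multithreading engaged */
    return 0;
}

This is the check being applied in the comment: with -T0, user time far exceeds real time, so multithreading is clearly engaged.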


Dmitri555 commented on April 28, 2024

The test with your file looks good!

time ./zstd1.5.5 -4 linux-6.2.9.tar -c > /dev/null
real 0m4.046s user 0m4.260s sys 0m0.681s

time ./zstd1.5.5 -4 -T0 linux-6.2.9.tar -c > /dev/null
real 0m0.774s user 0m4.362s sys 0m0.817s

But with another file (just a core file from a crashed process), it does NOT:

time ./zstd1.5.5 -4 core-file -c > /dev/null
real 0m6.837s user 0m5.990s sys 0m6.653s

time ./zstd1.5.5 -4 -T0 core-file -c > /dev/null
real 0m6.886s user 0m6.720s sys 0m7.051s

ls -l linux-6.2.9.tar core-file
-rw-rw-r--. 1 test test 14083944448 Dec 2 14:15 core-file
-rw-rw-r--. 1 test test 1371432960 Mar 30 2023 linux-6.2.9.tar


Cyan4973 commented on April 28, 2024

It implies that this outcome is data dependent, therefore not a generality.

Data compression is indeed data dependent: compression ratio, of course, but compression speed too.
That said, multithreading being less effective on some types of data than on others is a new one; I would not have expected it.

I'm afraid that, without access to a reproduction case, it will be difficult to investigate further.


gyscos commented on April 28, 2024

On my machine, -T0 seems to work on level 19, but not as well on level 21:

% # Prepare test file
% seq 100000000 > seq.txt
% zstd -19 -T0 seq.txt
seq.txt              :  2.83%   (   848 MiB =>   24.0 MiB, seq.txt.zst)        
zstd -19 -T0 seq.txt  410.49s user 0.84s system 989% cpu 41.560 total
% zstd --ultra -21 -T0 seq.txt
zstd: seq.txt.zst already exists; overwrite (y/n) ? y
seq.txt              :  2.79%   (   848 MiB =>   23.6 MiB, seq.txt.zst)        
zstd --ultra -21 -T0 seq.txt  471.95s user 0.62s system 325% cpu 2:25.22 total

The first run used ~10 out of 12 physical cores (close to what I'd expect), the second one barely more than 3.

Is that expected?


Dmitri555 commented on April 28, 2024

Do I understand you correctly?
zstd cannot scale linearly, so when I increase the number of cores 8x (from 4 to 32), I cannot expect an 8x increase in performance, just little or no increase.


Cyan4973 commented on April 28, 2024

It's a combination of factors.

Level 1 is the most likely to scale linearly, because its amount of "hot" memory typically fits inside each core's private cache.
After that, as the level increases, memory requirements grow, and it becomes more and more likely (depending on the exact CPU model) that hot memory will spill over into shared resources, such as the L3 cache or RAM. At that point, increasing the number of cores increases contention on those shared resources. Adding cores will still improve performance, but no longer linearly.

The issue reported by @gyscos is different though: it's a question of the quantity of input.
Given an infinite input stream and infinite input bandwidth, all threads will be kept busy, each compressing its own section.
But if the input is "too small", there will not be enough jobs to distribute. Even if 100 threads are available, if there are only 5 jobs to hand out, it won't be possible to employ the 100 threads.
The problem is especially acute in the --ultra range, because each job becomes huge: at level 21, each job is 256 MB by default. So it becomes probable that only one job, or very few, will be distributed.
This issue is mostly a problem for --ultra levels; lower levels have more reasonable job sizes. Level 19 defines a 32 MB job size by default, and level 1 a 2 MB one. "By default", because it's also possible to take manual control of this value when needed (see the worked numbers below).
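
Applying those default job sizes to the 848 MiB test file from the earlier comment gives a concrete back-of-envelope picture (an illustrative aside; job sizes are the defaults quoted in this thread):

/* Parallel jobs available = ceil(inputSize / jobSize). */
#include <stdio.h>

int main(void)
{
    unsigned long long const input = 848ULL << 20;  /* 848 MiB seq.txt */
    unsigned long long const job19 = 32ULL << 20;   /* level 19 default job size */
    unsigned long long const job21 = 256ULL << 20;  /* level 21 default job size */
    printf("level 19: %llu jobs\n", (input + job19 - 1) / job19);  /* 27 jobs */
    printf("level 21: %llu jobs\n", (input + job21 - 1) / job21);  /* 4 jobs */
    return 0;
}

Four jobs at level 21 lines up with the observed 325% CPU usage, while 27 jobs at level 19 are plenty to feed the ~10 busy cores.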
