Comments (7)
The first run used ~10 out of 12 physical cores (close to what I'd expect), the second one barely more than 3.
Is that expected?
Yes. As the level increases, the window size tends to increase, and as a consequence the size of each job tends to increase too. At level 21, each job is likely ~256 MB, so there are fewer jobs to run in parallel.
It's possible to take direct control of job size, as explained in the man page. Quoting:
ADVANCED COMPRESSION OPTIONS
-B#: Specify the size of each compression job. This parameter is only available when multi-threading is enabled. Each compression job is run in parallel, so this value indirectly impacts the nb of active threads. Default job size varies depending on compression level (generally 4 * windowSize). -B# makes it possible to manually select a custom size. Note that job size must respect a minimum value which is enforced transparently. This minimum is either 512 KB, or overlapSize, whichever is largest. Different job sizes will lead to non-identical compressed frames.
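As a sketch of how one might pick a -B value (this helper is not part of zstd; the 848 MiB input size and 12 threads are the figures from the discussion below): to keep every thread busy, the job size must not exceed input_size / thread_count, subject to the 512 KB minimum the man page mentions.

```python
# Hypothetical helper: largest -B job size that still yields >= `threads` jobs.
# MIN_JOB is the 512 KB floor quoted from the man page (ignoring overlapSize).
MIN_JOB = 512 * 1024

def suggest_job_size(input_size, threads):
    """Largest job size (bytes) producing at least `threads` jobs."""
    return max(MIN_JOB, input_size // threads)

size = 848 * 1024 * 1024  # the seq.txt test file discussed below
print(suggest_job_size(size, 12) // (1024 * 1024), "MiB")  # ~70 MiB
```

With a ~70 MiB job size, an 848 MiB input splits into 12 jobs instead of the 3-4 that a 256 MB default produces.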
If multithreading were working, I would expect user time > real time. That's not the case here, suggesting something is wrong.
For reference, here is what I'm getting on a local Ubuntu desktop:
time ./zstd -4 linux-6.2.9.tar -c > /dev/null
4.52s user 0.21s system 108% cpu 4.348 total
time ./zstd -4 -T0 linux-6.2.9.tar -c > /dev/null
11.91s user 0.42s system 735% cpu 1.675 total
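A quick way to read these `time` figures: the ratio of user CPU time to wall-clock time approximates how many cores were effectively busy. A small calculation (using exactly the numbers above; the function name is ours):

```python
# Effective parallelism ~= total CPU time consumed / wall-clock time elapsed.
def effective_parallelism(user_s, real_s):
    return user_s / real_s

# Single-threaded run: ~1 core busy.
print(round(effective_parallelism(4.52, 4.348), 2))   # ~1.04
# -T0 run: ~7 cores busy on average.
print(round(effective_parallelism(11.91, 1.675), 2))  # ~7.11
```

A ratio stuck near 1.0 under -T0, as in the core-file case below, is the signature of multithreading not engaging.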
Test with your file looks good!
time ./zstd1.5.5 -4 linux-6.2.9.tar -c > /dev/null
real 0m4.046s user 0m4.260s sys 0m0.681s
time ./zstd1.5.5 -4 -T0 linux-6.2.9.tar -c > /dev/null
real 0m0.774s user 0m4.362s sys 0m0.817s
But with another file (just a core file of a crashed process) it does NOT:
time ./zstd1.5.5 -4 core-file -c > /dev/null
real 0m6.837s user 0m5.990s sys 0m6.653s
time ./zstd1.5.5 -4 -T0 core-file -c > /dev/null
real 0m6.886s user 0m6.720s sys 0m7.051s
ls -l linux-6.2.9.tar core-file
-rw-rw-r--. 1 test test 14083944448 Dec 2 14:15 core-file
-rw-rw-r--. 1 test test 1371432960 Mar 30 2023 linux-6.2.9.tar
It implies that this outcome is data-dependent, and therefore not a general rule.
Data compression is indeed data-dependent: compression ratio, of course, but even compression speed.
That being said, multithreading being less effective on some types of data than on others is a new one, and I would not have expected it.
I'm afraid that, without access to a reproduction case, it will be difficult to investigate further.
On my machine, -T0 seems to work on level 19, but not as well on level 21:
% # Prepare test file
% seq 100000000 > seq.txt
% zstd -19 -T0 seq.txt
seq.txt : 2.83% ( 848 MiB => 24.0 MiB, seq.txt.zst)
zstd -19 -T0 seq.txt 410.49s user 0.84s system 989% cpu 41.560 total
% zstd --ultra -21 -T0 seq.txt
zstd: seq.txt.zst already exists; overwrite (y/n) ? y
seq.txt : 2.79% ( 848 MiB => 23.6 MiB, seq.txt.zst)
zstd --ultra -21 -T0 seq.txt 471.95s user 0.62s system 325% cpu 2:25.22 total
The first run used ~10 out of 12 physical cores (close to what I'd expect), the second one barely more than 3.
Is that expected?
Am I understanding you correctly?
Zstd cannot scale linearly, so when I increase the number of cores 8x (from 4 to 32), I cannot expect an 8x increase in performance, just little or no increase.
It's a combination of factors.
Level 1 is the most likely to scale linearly, because its amount of "hot" memory typically fits inside each core.
After that, as the level increases, memory requirements increase, and it becomes more and more likely (depending on the exact cpu model) that hot memory will spill over into shared resources, such as the L3 cache or RAM. At that point, increasing the number of cores increases contention on the shared resources. Adding cores will still increase performance, but no longer linearly.
The issue reported by @gyscos is different though: it's a question of quantity of input.
Given an infinite input stream and an infinite input bandwidth, all threads will be occupied compressing a section each.
But if the input is "too small", there will not be enough jobs to distribute. So even if 100 threads are available, if there are only 5 jobs to distribute, for example, it won't be possible to employ the 100 threads.
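The cap described above can be sketched in one line (the function is illustrative, not zstd code): the number of busy threads is the smaller of the thread count and the job count.

```python
import math

def busy_threads(input_size, job_size, threads):
    """Threads that can actually be employed: capped by the number of jobs."""
    jobs = math.ceil(input_size / job_size)
    return min(threads, jobs)

# The example from the text: 100 threads available, but only 5 jobs to distribute.
print(busy_threads(5 * 256, 256, 100))  # 5
```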
The problem is especially acute in the --ultra range, because each job becomes huge: at level 21, each job is 256 MB large by default. So it becomes probable that only one job, or very few, will be distributed.
This issue is mostly a problem for --ultra levels. Lower levels have more reasonable job sizes: level 19 defines a 32 MB job size by default, and level 1 defines a 2 MB one. "By default" is stated because it's also possible to manually take control of this value when needed.
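Applying those default job sizes to the 848 MiB seq.txt test from earlier in this thread (a sketch, assuming 12 threads as in the original report):

```python
import math

SIZE_MIB = 848                             # seq.txt from the test above
DEFAULT_JOB_MIB = {1: 2, 19: 32, 21: 256}  # defaults quoted in this thread
THREADS = 12

for level, job in DEFAULT_JOB_MIB.items():
    jobs = math.ceil(SIZE_MIB / job)
    print(f"level {level:2}: {jobs} jobs -> at most {min(THREADS, jobs)} of {THREADS} threads busy")
```

Level 21 yields only 4 jobs (the last one partial), which lines up with the "barely more than 3 cores" observed at 325% cpu, while levels 1 and 19 produce more than enough jobs to keep 12 threads busy.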