Comments (8)
Thanks for the update, I'll try this out in the next few weeks!
from zstd.
Hi @iksaif sorry for the delay, but I have several updates:
- PR #3827 drastically speeds up the fast C & assembly Huffman decoders on very small data like in the github corpus.
- PR #3826 may improve c7g Graviton 3 performance. It manually unrolls the inner loops, because I've found that in certain scenarios the compiler wasn't unrolling the inner loops. If that was the case for c7g, then I would expect significant performance improvements.
- PR #3826 also provides the ability to disable the fast C decoding loops at compile time by defining the C macro `HUF_DISABLE_FAST_DECODE`, in case you still see subpar performance after these PRs.
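To make the two ideas in the list above concrete, here is a minimal sketch of a manually unrolled inner loop that can be switched off at compile time. It is not zstd code: `DEMO_DISABLE_FAST_LOOP` and the `demo_*` functions are hypothetical names made up for illustration, and the loop sums bytes rather than decoding Huffman symbols. The macro only plays the same role that `HUF_DISABLE_FAST_DECODE` is described as playing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch for this sketch only (NOT a zstd macro).
 * Build with -DDEMO_DISABLE_FAST_LOOP to fall back to the simple loop. */

/* Reference loop: one byte per iteration, unrolling left to the compiler. */
uint32_t demo_sum_simple(const uint8_t *src, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += src[i];
    return sum;
}

#ifndef DEMO_DISABLE_FAST_LOOP
/* Manually unrolled by 4 with independent accumulators, so the result
 * does not depend on whether the compiler decides to unroll. */
uint32_t demo_sum_unrolled(const uint8_t *src, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += src[i + 0];
        s1 += src[i + 1];
        s2 += src[i + 2];
        s3 += src[i + 3];
    }
    /* Tail: fewer than 4 bytes remain. */
    for (; i < n; ++i)
        s0 += src[i];
    return s0 + s1 + s2 + s3;
}
#endif

/* Entry point: the implementation is chosen at compile time. */
uint32_t demo_sum(const uint8_t *src, size_t n)
{
#ifdef DEMO_DISABLE_FAST_LOOP
    return demo_sum_simple(src, n);
#else
    return demo_sum_unrolled(src, n);
#endif
}
```

Both paths return the same result for any input, so the switch only changes which loop is compiled in. That makes it easy to benchmark the two builds against each other on a given CPU and compiler.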
Please let me know if you see any more issues with performance after these PRs, and we will look into them. I can't guarantee a super fast turnaround time, but we always try to handle outstanding issues before we make releases.
👋 Any findings? Thanks!
I can confirm that the latest version from git is back to the v1.5.2 level of performance on c7g!
1# 9114 files : 7484607 -> 2603666 (x2.875), 183.3 MB/s, 435.3 MB/s
1#silesia.tar : 211957760 -> 132055991 (x1.605), 700.0 MB/s, 1143.6 MB/s
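For readers comparing runs, the sketch below pulls the two sizes out of a benchmark line like the ones above and recomputes the ratio. It assumes the layout seen in these lines (`name : original -> compressed (xRATIO), compression speed, decompression speed`). The `demo_*` helpers are hypothetical and not part of zstd.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract "original -> compressed" from a benchmark
 * line shaped like the ones quoted above. Returns 1 on success, 0 otherwise. */
int demo_parse_sizes(const char *line,
                     unsigned long long *original,
                     unsigned long long *compressed)
{
    /* The sizes follow the first " : " separator. */
    const char *sep = strstr(line, " : ");
    if (sep == NULL)
        return 0;
    return sscanf(sep + 3, "%llu -> %llu", original, compressed) == 2;
}

/* Compression ratio as printed in the "(x...)" column. */
double demo_ratio(unsigned long long original, unsigned long long compressed)
{
    if (compressed == 0)
        return 0.0; /* avoid division by zero on malformed input */
    return (double)original / (double)compressed;
}
```

Applied to the two lines above, `demo_ratio` gives roughly 2.875 and 1.605, which matches the printed values.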
CC: @terrelln, since you authored the original PR you are probably interested in this.
I haven't looked into why things might be slower, but given the results it might be worth offering an option to not build the generic C versions of the fast decoding loops, since it's unclear that they offer a significant performance boost on modern CPUs and compilers.
Also, maybe I did something wrong in those tests; I'm happy to re-run them if necessary.
This also relates to:
- #3278, which shows that the assembly versions are not as performant as we think (for benchmark 1)
- #3155, which looks at custom ASM functions for aarch64 (but it looks like trusting the compiler could be faster)
Thanks for the report @iksaif!
I will spend some time investigating next week.
At first glance, the performance on c7g Graviton 3 looks bad across the board. But everything else has regressed only in the GitHub case, which is a bunch of small files. That may suggest that we need to either improve the Fast-C & ASM loops for small literals sections, or automatically use the old code versions for small literals sections.
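The second option, falling back to the old code for small literals sections, could look roughly like the size-based dispatch below. This is only a sketch of the idea: the threshold value, the enum, and `demo_choose_path` are all hypothetical and do not come from zstd, and a real cutoff would have to be picked by benchmarking.

```c
#include <stddef.h>

/* Hypothetical cutoff in bytes; a real value would come from benchmarks. */
#define DEMO_FAST_LOOP_MIN_SIZE 1024

typedef enum {
    DEMO_PATH_LEGACY, /* older decoding loop, assumed cheaper to start up */
    DEMO_PATH_FAST    /* fast C / ASM loop, assumed better on large inputs */
} demo_path_t;

/* Pick a decoding path from the size of the literals section. */
demo_path_t demo_choose_path(size_t literalsSize)
{
    return (literalsSize < DEMO_FAST_LOOP_MIN_SIZE) ? DEMO_PATH_LEGACY
                                                    : DEMO_PATH_FAST;
}
```

The appeal of this design is that large inputs such as silesia.tar keep the fast path, while workloads made of many small files avoid whatever fixed cost the fast loops carry.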
Great, I'm glad to hear it! Looks like it was running into the loop unrolling problem.