Comments (7)
Yes, I suspect repeating patterns have a devastating effect on the btlazy2 strategy.
I'm not sure there is a simple, non-detrimental solution to it.
Repeating patterns are not that common, except for the trivial case of a single repeating character (which is already taken care of). A complex solution takes a toll on every operation and ends up making btlazy2 slower for the general use case.
That being said, I'm open to any suggestion or patch.
from zstd.
Sometimes JSON, XML/HTML/SVG, & simple array data (like some NoSQL blobs that are templated, maps in games, etc) can be repetitive.
Only suggestion: Have a separate mode that is better at repeated patterns.
Phase 1: Repeated-pattern mode is switched on manually via a CLI flag. Someone who doesn't care about compression time could write a CLI script to compare with vs. without, and throw out the worse result.
Phase 2: The above comparison could be done internally via a second switch.
Phase 3: The comparison could be done on smaller segments.
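For illustration only, here is a minimal Python sketch of the "try both, keep the smaller" idea from the phases above. It does not use zstd at all: `zlib` and `lzma` from the standard library are stand-ins for the normal mode and the hypothetical repeated-pattern mode, and the one-byte codec tag, the function names, and the 64 KiB default segment size are all assumptions made up for this example.

```python
import lzma
import zlib

# Stand-in codecs (assumption): zlib plays the "normal" mode and lzma plays
# the hypothetical "repeated-pattern" mode. Neither is zstd.
CODECS = {
    0: (zlib.compress, zlib.decompress),
    1: (lzma.compress, lzma.decompress),
}


def compress_best(data):
    """Phase 2: run every codec and keep the smallest output.

    The first byte of the result records which codec won, so the
    decompressor knows which one to call.
    """
    candidates = [bytes([cid]) + comp(data) for cid, (comp, _) in CODECS.items()]
    return min(candidates, key=len)


def decompress_best(blob):
    """Inverse of compress_best: dispatch on the leading codec tag."""
    _, decomp = CODECS[blob[0]]
    return decomp(blob[1:])


def compress_segments(data, segment_size=1 << 16):
    """Phase 3: make the same comparison independently on each segment."""
    return [
        compress_best(data[i:i + segment_size])
        for i in range(0, len(data), segment_size)
    ]


def decompress_segments(blobs):
    """Inverse of compress_segments: decode each segment and concatenate."""
    return b"".join(decompress_best(blob) for blob in blobs)


if __name__ == "__main__":
    sample = b"<row id='1'>value</row>\n" * 5000
    blobs = compress_segments(sample)
    assert decompress_segments(blobs) == sample
    print(len(sample), "->", sum(len(b) for b in blobs), "bytes")
```

The obvious cost is that every input is compressed once per candidate mode, which is why this only suits someone who doesn't care about compression time.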
from zstd.
Couldn't repetitions in JSON/XML be addressed by good dictionary support that would take care of that, leaving only the core content to be handled by a more suitable algorithm?
from zstd.
I originally noticed this trying to compress Windows\Logs\CBS\CBS.log, so it does occur with some real-world data. It makes the high compression levels DOSable, which seems like a cause for concern. It's tricky for clients that need to avoid that to do so right now because the maximum non-btlazy2 compression level depends on the input size hint and the library version.
Is the problem that the hash chains grow linearly so the total search time is quadratic? It's not obvious to me how to avoid that, but other libraries (such as LZMA SDK) do somehow avoid it in their maximum compression modes with large windows.
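To make the question concrete, here is a toy Python model of that quadratic behaviour. It is not zstd's match finder: it assumes a naive hash chain with no cap on how far a chain is walked, searches at every position, and uses a made-up `MIN_MATCH` of 4 bytes. It only counts how many earlier candidates such a search would visit.

```python
import random
from collections import defaultdict

MIN_MATCH = 4  # assumed minimum match length for this toy model


def chain_visits(data):
    """Count candidates visited by a naive, unbounded hash-chain search.

    Each position is searched, then inserted. A search walks every earlier
    position that starts with the same MIN_MATCH bytes, so its cost equals
    the current length of that chain.
    """
    chain_length = defaultdict(int)  # prefix -> number of earlier positions
    visits = 0
    for pos in range(len(data) - MIN_MATCH + 1):
        key = data[pos:pos + MIN_MATCH]
        visits += chain_length[key]  # walk the whole chain for this prefix
        chain_length[key] += 1       # insert the current position
    return visits


if __name__ == "__main__":
    pattern = b"abcdefgh"  # period-8 repeating pattern
    rng = random.Random(0)
    for repeats in (1000, 2000, 4000):
        periodic = pattern * repeats
        noise = bytes(rng.randrange(256) for _ in range(len(periodic)))
        print(len(periodic), chain_visits(periodic), chain_visits(noise))
```

In this model, doubling the length of the periodic input roughly quadruples the visit count, while random input of the same length barely collides at all. Capping the number of candidates examined per search bounds the cost in this model.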
from zstd.
The problem is limited to btlazy2, as it uses a binary tree.
The hash chain methodology, used within lazy2, is less affected: it pays a heavy price during search, but the following insertions are fast, and the dangerous section is quickly skipped since it compresses well.
By contrast, the binary tree pays a heavy price during search _and_ insertion.
Let's keep this issue open. A solution will be needed to handle such cases gracefully, without impacting too much the more general situation where repetitive data is either absent or present only in limited proportion.
from zstd.
There's a new update in the "dev" branch (https://github.com/Cyan4973/zstd/tree/dev)
which specifically targets repetitive data in the btlazy2 (strongest) modes.
In my tests, it dramatically improves speed in the presence of repetitive segments of any period.
The cost is pretty small: normal data tends to compress slightly less, but I expect the difference to be negligible in most circumstances. The good news is that speed does not get worse for normal data.
Please test it. This is still experimental.
from zstd.
Merged into master
from zstd.