Comments (4)
If we went that route, we would probably need a combination of reserving one of the magic values as well as a signature header on the hash itself, in case the same magic value was also used by someone else for watermarking, etc.
from zstd.
Currently, Zstandard supports 2 modes for dictionaries:
- Without identifier: it can be any content.
- With an identifier (up to 32-bit): it must be a Zstandard-formatted dictionary (with its specified header format).
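To make the two modes concrete, here is a minimal sketch (not libzstd code) of how an application could tell them apart. It assumes the dictionary header layout from RFC 8878: a 4-byte little-endian magic number `0xEC30A437` followed by a 4-byte little-endian dictionary ID. The function name `dictionary_id` is made up for this illustration.

```python
import struct

# Magic number that starts a Zstandard-formatted dictionary (RFC 8878).
ZSTD_DICT_MAGIC = 0xEC30A437


def dictionary_id(dictionary: bytes):
    """Return the dictionary ID, or None for a raw-content dictionary.

    Hypothetical helper: it only inspects the first 8 bytes and does not
    validate the rest of the dictionary.
    """
    if len(dictionary) < 8:
        return None
    magic, dict_id = struct.unpack_from("<II", dictionary, 0)
    if magic != ZSTD_DICT_MAGIC:
        return None  # "without identifier" mode: any content is allowed
    return dict_id
```

A raw-content dictionary simply yields `None` here, which is why nothing in the compressed frame can currently identify it.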
The current format of Zstandard has been frozen in RFC8878, so if we want to remain within the boundaries of what has been specified, these are pretty much the only options.
Now, introducing format-breaking novelties is not impossible, but it will come at a cost: existing (already deployed) Zstandard decoders will be incompatible with these changes. So this is an option we want to be careful about, and trigger only for a very good reason.
Regarding the described request to transmit a hash of the dictionary to compare against, there is an existing workaround that might help here: the skippable frame.
These frames can be appended or prepended in a flow of regular Zstandard frames, and the decoder will skip them.
This means their content can be anything that an external application defines.
This is frequently used for watermarking, for example, allowing fleet-scale investigations, and could be used here to store the wanted hash.
The advantage is that the application is fully in charge, so it can make the choices it wants, and change them, without having to coordinate with libzstd. For example, what's the format of the hash? Is that SHA256, or something else? Will it evolve tomorrow? I presume it means the hash is verified, hence the reference is scanned with the desired algorithm? Or maybe it was already scanned, and the value is already cached somewhere?
All these decisions could be made, and updated, at application level.
A skippable frame is fairly lightweight: it introduces a cost of only 8 bytes, for the magic header and the content size.
The main cost is actually logic complexity at application level.
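As a rough illustration of that 8-byte overhead, the sketch below builds a skippable frame carrying a SHA-256 digest of the dictionary. It assumes the skippable frame layout from RFC 8878 (4-byte little-endian magic in the range `0x184D2A50` to `0x184D2A5F`, then a 4-byte little-endian size, then user data). The choice of SHA-256, the `variant` parameter, and the function name `make_hash_frame` are assumptions for this example, not anything libzstd defines.

```python
import hashlib
import struct

# First of the 16 skippable-frame magic values (RFC 8878).
SKIPPABLE_MAGIC_BASE = 0x184D2A50


def make_hash_frame(dictionary: bytes, variant: int = 0) -> bytes:
    """Build a skippable frame whose user data is SHA-256(dictionary).

    Hypothetical application-level helper; `variant` selects which of
    the 16 skippable magic values to use.
    """
    if not 0 <= variant <= 15:
        raise ValueError("variant must be in 0..15")
    payload = hashlib.sha256(dictionary).digest()
    # 8-byte header: magic number, then size of the user data.
    header = struct.pack("<II", SKIPPABLE_MAGIC_BASE + variant, len(payload))
    return header + payload
```

The application would prepend the returned bytes to the regular Zstandard frame; a conforming decoder skips them, so the total cost is the 8-byte header plus whatever digest size the application chose.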
On the other hand, if we were willing to push that logic inside libzstd, it would add a few more topics to consider:
- First, since it's incompatible with the existing zstd format, the format would need an evolution, breaking compatibility with existing decoders.
- Second, the question of the "type of hash" is not neutral, and needs to be decided upfront. It may impact the dependency surface of the libzstd library (which is currently very small, which is preferable to support a broad range of applications). Finally, updating this choice later on can become quite tricky.
So, with these trade-offs in mind, a method based on skippable frames to transport the information feels like a reasonable option to consider.
There are probably other ways to send this information too, but I'm not familiar enough with the domain to correctly list the pros and cons.
from zstd.
Thanks. Without a tagging mechanism for the skippable frames (and a registry of IDs of some kind), I don't think we want to be adding them to all of the dictionary-compressed streams served on the web.
A web-specific container (header) in front of the zstd file format might work for transport but the raw resources wouldn't be usable by the cli tools.
Sounds like an out-of-band negotiation is the best we can hope for at the moment. I'd just ask that you keep this in mind for any future revisions to the file format (if there end up being any).
from zstd.
Yeah. Although note that the skippable frame magic has a range of 16 values. If we were going to pursue this, we could probably reserve one of those values for this purpose.
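A sketch of what the reading side could look like if one of the 16 values were reserved and combined with a signature on the payload, as suggested earlier in the thread. The signature bytes `DHSH`, the reserved variant `0`, and both function names are placeholders invented for this example; no such reservation exists today.

```python
import struct

SKIPPABLE_MAGIC_BASE = 0x184D2A50  # magic values 0x184D2A50..0x184D2A5F
APP_SIGNATURE = b"DHSH"            # hypothetical payload signature


def split_skippable_frame(stream: bytes):
    """Return (variant, user_data, remainder), or None when the stream
    does not start with a skippable frame."""
    if len(stream) < 8:
        return None
    magic, size = struct.unpack_from("<II", stream, 0)
    if (magic & 0xFFFFFFF0) != SKIPPABLE_MAGIC_BASE:
        return None
    if len(stream) < 8 + size:
        raise ValueError("truncated skippable frame")
    return magic & 0xF, stream[8:8 + size], stream[8 + size:]


def extract_dictionary_hash(stream: bytes, reserved_variant: int = 0):
    """Return the hash bytes if the leading frame uses the reserved magic
    value and carries the signature; otherwise return None."""
    parsed = split_skippable_frame(stream)
    if parsed is None:
        return None
    variant, user_data, _ = parsed
    if variant != reserved_variant or not user_data.startswith(APP_SIGNATURE):
        return None  # e.g. someone else's watermark frame
    return user_data[len(APP_SIGNATURE):]
```

Checking both the magic value and the signature means an unrelated watermark that happens to use the same magic value is ignored rather than misread as a hash.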
from zstd.