
Comments (4)

pmeenan commented on May 3, 2024

If we went that route, we would probably need both to reserve one of the magic values and to put a signature header on the hash itself, in case someone else was already using the same magic value for watermarking, etc.

from zstd.

Cyan4973 commented on May 3, 2024

Currently, Zstandard supports 2 modes for dictionaries:

  • Without an identifier: it can be any content
  • With an identifier (up to 32-bit): it must be a Zstandard-formatted dictionary (with its specified header format).
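As a rough illustration of the two modes, the sketch below tells a formatted dictionary apart from raw content by checking for the dictionary magic number (`0xEC30A437`, little-endian) followed by the 4-byte dictionary ID. The function name `dictionary_id` is made up for this example and is not a libzstd API; it only mirrors the header check, it does not validate the rest of the dictionary.

```python
import struct

# Magic number that starts a Zstandard-formatted dictionary (stored little-endian).
ZSTD_DICT_MAGIC = 0xEC30A437


def dictionary_id(dict_bytes: bytes):
    """Return the dictionary ID of a formatted dictionary, or None for raw content.

    Hypothetical helper for illustration only: it inspects the first 8 bytes
    (magic + ID) and nothing else.
    """
    if len(dict_bytes) < 8:
        return None  # too short to carry a header, so it can only be raw content
    magic, dict_id = struct.unpack_from("<II", dict_bytes, 0)
    if magic != ZSTD_DICT_MAGIC:
        return None  # no magic number: treated as raw content, no identifier
    return dict_id
```

Note that the identifier is only 32 bits wide, which is why it cannot double as a cryptographic hash of the dictionary.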

The current Zstandard format has been frozen in RFC 8878, so if we want to remain within the boundaries of what has been specified, these are pretty much the only options.

Now, introducing format-breaking novelties is not impossible, but it comes at a cost: existing (already deployed) Zstandard decoders will be incompatible with these changes. So this is an option we want to be careful about, and pursue only for a very good reason.

Regarding the request to transmit a hash of the dictionary to compare against, there is an existing workaround that might help here: the skippable frame.
These frames can be appended or prepended to a flow of regular Zstandard frames, and the decoder will skip them.
This means their content can be anything that an external application defines.
This is frequently used for watermarking, for example, allowing fleet-scale investigations, and could be used here to store the wanted hash.

The advantage is that the application is fully in charge, so it can make the choices it wants, and change them, without having to coordinate with libzstd. For example, what's the format of the hash? Is it SHA256, or something else? Will it evolve tomorrow? I presume the hash is meant to be verified, which means scanning the reference with the desired algorithm? Or maybe it was already scanned, and the value is already cached somewhere?
All these decisions could be made, and updated, at application level.

A skippable frame is fairly lightweight: it introduces a cost of only 8 bytes, for the magic header and the content size.
The main cost is actually logic complexity at the application level.
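To make the 8-byte overhead concrete, here is a minimal sketch of building a skippable frame that carries a dictionary hash. It follows the RFC 8878 layout (4-byte little-endian magic in the range `0x184D2A50` to `0x184D2A5F`, then a 4-byte little-endian payload size, then the payload). The choice of SHA-256 and the helper names are assumptions for illustration; nothing here is part of libzstd.

```python
import hashlib
import struct

# First of the 16 skippable-frame magic numbers defined by the format.
SKIPPABLE_MAGIC_BASE = 0x184D2A50


def make_skippable_frame(payload: bytes, variant: int = 0) -> bytes:
    """Wrap an application-defined payload in a skippable frame.

    `variant` selects one of the 16 magic values (0..15).
    """
    if not 0 <= variant <= 0xF:
        raise ValueError("variant must be in the range 0..15")
    if len(payload) > 0xFFFFFFFF:
        raise ValueError("payload does not fit in a 32-bit frame size")
    header = struct.pack("<II", SKIPPABLE_MAGIC_BASE + variant, len(payload))
    return header + payload


def dictionary_hash_frame(dictionary: bytes) -> bytes:
    """Hypothetical helper: a skippable frame holding SHA-256 of the dictionary."""
    return make_skippable_frame(hashlib.sha256(dictionary).digest())
```

With SHA-256 the frame is 40 bytes in total: 8 bytes of header plus the 32-byte digest. The application would prepend it to the compressed stream, and a standard decoder would skip over it.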

On the other hand, if we were willing to push that logic inside libzstd, it would add a few more topics to consider:

  • First, since it's incompatible with the existing zstd format, the format would need an evolution, breaking compatibility with existing decoders.
  • Second, the question of the "type of hash" is not neutral, and needs to be decided upfront. It may impact the dependency surface of the libzstd library (which is currently very small, which is preferable to support a broad range of applications). Finally, updating this choice later on can become quite tricky.

So, with these trade-offs in mind, a method based on skippable frames to transport the information feels like a reasonable option to consider.
There are probably other ways to send this information too, but I'm not familiar enough with the domain to correctly list the pros and cons.


pmeenan commented on May 3, 2024

Thanks. Without a tagging mechanism for the skippable frames (and a registry of IDs of some kind) I don't think we want to be adding them to all of the dictionary-compressed streams served on the web.

A web-specific container (header) in front of the zstd file format might work for transport but the raw resources wouldn't be usable by the cli tools.

Sounds like an out-of-band negotiation is the best we can hope for at the moment. I'd just ask that you keep it in mind for any future revisions to the file format (if there end up being any).


felixhandte commented on May 3, 2024

Yeah. Although note that the skippable frame magic has a range of 16 values. If we were going to pursue this, we could probably reserve one of those values for this purpose.
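For what reading such a frame could look like on the receiving side, here is a sketch that combines a reserved magic value with an application-level signature in front of the hash. The specific reserved value (`0x184D2A5D`) and the 4-byte signature (`b"DCTH"`) are purely hypothetical placeholders; no such reservation exists in the format today. Only the 16-value range itself comes from the specification.

```python
import struct

# The 16 skippable-frame magic numbers defined by the format.
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F

# Hypothetical: pretend this one value were reserved for dictionary hashes.
HYPOTHETICAL_HASH_MAGIC = 0x184D2A5D
# Hypothetical: signature guarding against other users of the same magic.
HYPOTHETICAL_SIGNATURE = b"DCTH"


def is_skippable_magic(magic: int) -> bool:
    """True if `magic` is any of the 16 skippable-frame magic numbers."""
    return SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX


def read_dictionary_hash(stream: bytes):
    """Return the hash carried by a leading skippable frame, or None.

    None is returned when the stream does not start with a frame using the
    hypothetical reserved magic, when the frame is truncated, or when the
    payload lacks the hypothetical signature.
    """
    if len(stream) < 8:
        return None
    magic, size = struct.unpack_from("<II", stream, 0)
    if magic != HYPOTHETICAL_HASH_MAGIC:
        return None
    payload = stream[8:8 + size]
    if len(payload) != size:
        return None  # truncated frame
    if not payload.startswith(HYPOTHETICAL_SIGNATURE):
        return None  # same magic, but some other application's data
    return payload[len(HYPOTHETICAL_SIGNATURE):]
```

The signature check is what keeps an unrelated watermarking frame that happens to use the same magic value from being misread as a dictionary hash.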
