Comments (9)
Yeap you describe my current plan quite well, some kind of helpers for cut/stitch and encodings. I had some momentum and motivation for a while to work on it but think i got stuck at how to make it feel jq:y and how to not "pollute" the namespace too much with lots of small functions etc. One thing i've thought about is that fq already has some machinery to do query rewrites so it's possible to "extend" the jq language a bit if that would help.
About ultimate visions i think you summarise the problem quite well also. You more or less end up having to write a transmuxer, linker etc and one that should handle and preserve lots of strange things, or should it "normalize"? :) I've thought some about different ways, i'll list them:
- "symmetric"/"bi-directional" approach. Would probably require something declarative. I know kaitai struct is working on serialisation but it seems to be far from trivial and will probably have limitations. Another issue with declarative is that some formats have logic that i think is very complicated to describe purely declaratively. For example how to describe sample ranges in an mp4 file, and then how to describe what metadata from a certain box is needed to decode samples for a specific track, and so on.
- Support format specific encoders. Similar to how to_yaml etc currently work but instead produce binary. Possibly a decoder and encoder could share a common JSON "schema" somehow? This also raises some questions about how it should behave in regard to nested decoding, reassembly, symbolic mappings etc. If lots of details are needed maybe the JSON representation would become less usable. Maybe how "decode values" currently work could be extended to help? A decode value is a jq value plus some metadata (bit range, source buffer, sym mappings etc).
Also with any of the approaches it needs to fit well with how jq works.
from fq.
It's complicated :) fq at the moment has very limited editing support for "decode values". tourl/fromurl work on JSON so it's just normal jq behaviour. But it can do bits and bytes slicing and combine things together again into a binary. When you do .header.entry=0 the "decode value" will first be converted into JSON and then the update is done.
The reason the support is limited is a mix of lack of need for it myself and that it's complex for some of the formats supported by fq. For a lot of formats it's not really clear what should happen on an update. Encode with the same encoding? But what about encodings that are ambiguous, like varints that can encode a value in many ways with different sizes? Should size be preserved/truncated? Update checksums? Fields that control the number of entries in an array? Also fq has support for sub buffers for demuxing/tcp reassembly etc... yeah you see :) But maybe the "clearest" would be to just support updating a specific bit/byte range using some bit-size/endian helpers etc.
And you can kind-of do this already using the slicing support, for example update .header.entry in an ELF:
# this assumes the entry is 64 bit
$ fq '(.header.entry | tobytesrange) as $e | tobytes | [.[:$e.start],0,1,2,3,4,5,6,7,.[$e.stop:]] | tobytes | elf | .header' some_elf
# or to write it out to a file
$ fq '(.header.entry | tobytesrange) as $e | tobytes | [.[:$e.start],0,1,2,3,4,5,6,7,.[$e.stop:]] | tobytes' some_elf > changed
This uses slicing, just normal [start:stop] jq syntax, on bytes (there is also tobits/tobitsrange to use bit indexes) and "binary arrays" in fq (similar to iolist:s in erlang). So any array that includes only these values can be converted to a binary (via tobytes/tobits):
- 0-255 will be one byte
- strings will be UTF-8 bytes
- nested binary array
- bits and bytes values
Also the difference between tobytes and tobytesrange is that the range-version "remembers" its source start/stop range.
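To make the "binary array" rules above concrete, here is a minimal Python sketch of the flattening idea. It is not fq's implementation: `to_bytes` is a hypothetical helper, fq's bits values are not modelled (Python `bytes` stands in for bytes values only), and rejecting booleans is a choice made for this sketch.

```python
def to_bytes(value) -> bytes:
    """Flatten a nested 'binary array' into bytes (hypothetical model, not fq code)."""
    if isinstance(value, (bytes, bytearray)):
        # Already binary: pass through unchanged.
        return bytes(value)
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to avoid surprises.
        raise TypeError("booleans are not valid binary array items")
    if isinstance(value, int):
        # Integers 0-255 become exactly one byte.
        if not 0 <= value <= 255:
            raise ValueError(f"integer {value} does not fit in one byte")
        return bytes([value])
    if isinstance(value, str):
        # Strings become their UTF-8 bytes.
        return value.encode("utf-8")
    if isinstance(value, list):
        # Nested arrays are flattened recursively, in order.
        return b"".join(to_bytes(item) for item in value)
    raise TypeError(f"unsupported binary array item: {type(value).__name__}")


# Same shape as the ELF recipe: prefix slice, eight new bytes, suffix slice.
data = bytes(range(16))
start, stop = 4, 12
patched = to_bytes([data[:start], 0, 1, 2, 3, 4, 5, 6, 7, data[stop:]])
```

Because the replacement here is the same length as the range it replaces, `patched` has the same length as `data`; nothing in the flattening itself enforces that.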
That said all of this can probably be improved in many ways, let me know your ideas.
from fq.
Yeah, that's really nice that you can use tobytesrange in that way -- definitely a missing recipe in the docs in the interim!
A next small step would be to provide an ergonomic way to inject bytes: overwritebytes($e, newvalue | asuint32) or whatever would be appropriate as syntactic sugar for the recipe you suggested above. I guess it gets fun when you have to consider all possible encodings and endiannesses and the like. overwritebytes could at least check that the length of the bytes being inserted matches the range being inserted into.
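A rough Python sketch of what such a helper could look like, purely to illustrate the length check and the endianness question. `overwrite_bytes` and `as_uint` are hypothetical names modelled on the proposal above, not existing fq functions, and the 64-byte zero buffer merely stands in for an ELF64 header, where the entry field occupies bytes 0x18 to 0x20.

```python
def as_uint(value: int, size: int, byteorder: str = "little") -> bytes:
    """Encode an unsigned integer as exactly `size` bytes in the given byte order."""
    # int.to_bytes raises OverflowError if the value does not fit in `size` bytes.
    return value.to_bytes(size, byteorder, signed=False)


def overwrite_bytes(buf: bytes, start: int, stop: int, new: bytes) -> bytes:
    """Replace buf[start:stop] with `new`, refusing to change the total length."""
    if not 0 <= start <= stop <= len(buf):
        raise ValueError("range is outside the buffer")
    if len(new) != stop - start:
        raise ValueError(
            f"replacement is {len(new)} bytes but the range is {stop - start} bytes"
        )
    return buf[:start] + new + buf[stop:]


# Stand-in for an ELF64 header; in fq the range would come from tobytesrange.
header = bytes(64)
patched = overwrite_bytes(header, 0x18, 0x20, as_uint(0x401000, 8, "little"))
```

Keeping the encoding step (`as_uint`) separate from the splice keeps the helper small, but it also means the caller has to know the field's width and byte order, which is exactly the information a decode value already carries.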
The ultimate vision would be to be able to update any value in any format and then propagate that change to anything else in the binary that needs to be updated to make it semantically correct. I'm guessing though that this is difficult-to-impossible, in the most extreme case requiring essentially a recompilation of the binary (imagine for my use case(s) for ELF patching, changing the length of a string, which changes the offsets of everything else in a section: suddenly all absolute addresses referring to points after that string may need changing, and those new addresses might not be representable anymore with the same sequences of instructions in the binary, which would need propagating, and so on and so on).
from fq.
How about adding a "Big thing" TODO about binary modifications that may as well modify length?
from fq.
How about adding a "Big thing" TODO about binary modifications that may as well modify length?
Yeap that is a good idea, maybe it can link to this issue also.
Could you clarify what you mean by "may as well modify length"? Is it about whether the modification changes the length of the thing being modified?
from fq.
I think I don't have a use case right now, but the idea is as follows. Let's take the fMP4 container. The "ftyp" box contains a list of "brands". Assume adding a "brand" to this list of 4-char identifiers. This operation would change the length of the binary representation of the list. The box containing the list would also grow in size. The boxes that follow the "ftyp" box would change their position (start+=4). Basically, the idea is to allow this sort of manipulation: not just replacing a few bytes but also doing inserts and deletes.
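The ripple effect described above can be sketched in a few lines of Python. This is only an illustration with hand-built boxes: `make_box`, `make_ftyp` and `box_offsets` are hypothetical helpers, the brand values are arbitrary examples, and only plain 32-bit box sizes are handled (the special size values 0 and 1 are rejected).

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box: 32-bit big-endian size (header included), 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def make_ftyp(major: bytes, minor_version: int, compatible: list) -> bytes:
    """Build an ftyp box: major brand, minor version, then 4-byte compatible brands."""
    payload = major + struct.pack(">I", minor_version) + b"".join(compatible)
    return make_box(b"ftyp", payload)


def box_offsets(data: bytes) -> list:
    """Return (type, start, size) for each top-level box."""
    result = []
    pos = 0
    while pos < len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < 8:
            raise ValueError("size values 0 and 1 are not handled in this sketch")
        result.append((box_type.decode("ascii"), pos, size))
        pos += size
    return result


moov = make_box(b"moov", b"")  # empty placeholder box
before = make_ftyp(b"iso5", 0, [b"iso5", b"dash"]) + moov
after = make_ftyp(b"iso5", 0, [b"iso5", b"dash", b"cmfc"]) + moov

print(box_offsets(before))  # ftyp is 24 bytes, moov starts at 24
print(box_offsets(after))   # ftyp is 28 bytes, moov starts at 28
```

Rebuilding the box from its fields gets the size field right automatically, but anything elsewhere in the file that stores an absolute offset past the ftyp box is now off by 4 and has to be found and fixed separately.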
from fq.
Ok i see, yeah that would be nice but i'm not sure how one would do it, and i have thought about it quite a lot. For example in the mp4 case, if a change to the brands list affects the size of the ftyp box then all boxes after it will move, which in turn will most likely affect offsets in stco boxes etc and so on if it should still be playable. So to support that kind of thing my guess is that one would have to write an encoder per format that wants to support it (nearly an mp4 muxer in this case). But there are other issues and ambiguities that encoding creates also: should an encoder try to "preserve" number, string etc encodings that can encode the same value in multiple ways, or normalize (varint for example)? Would assigning to a field that has a symbolic mapping do a reverse map back? Lots of questions :)
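The "preserve or normalize" question is easy to show with varints. Below is a minimal Python sketch using unsigned LEB128 as the example encoding (an assumption, since the thread does not name a specific varint flavour); `encode_uvarint` and `decode_uvarint` are hypothetical helpers, not fq functions.

```python
def encode_uvarint(value: int, min_len: int = 1) -> bytes:
    """Encode an unsigned LEB128 varint, padded to at least `min_len` bytes."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = []
    while True:
        groups.append(value & 0x7F)  # low 7 bits first
        value >>= 7
        if value == 0:
            break
    # Extra zero groups make the encoding longer without changing the value.
    while len(groups) < min_len:
        groups.append(0)
    # Every byte except the last carries the continuation bit (0x80).
    return bytes(g | 0x80 for g in groups[:-1]) + bytes([groups[-1]])


def decode_uvarint(data: bytes) -> int:
    """Decode one unsigned LEB128 varint from the start of `data`."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


minimal = encode_uvarint(1)            # one byte
padded = encode_uvarint(1, min_len=3)  # three bytes, same value
```

Both encodings decode to 1, so an encoder that only sees the JSON value cannot tell which one the original file used. Preserving the original length needs the kind of metadata a decode value carries (its bit range); normalizing to the minimal form changes the size and moves everything after it.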
from fq.
Agree. Likely the whole thing would look like a "manual muxer" specific to every format. Specificity is not an issue per se as every format is already custom. I was thinking about e.g. the sidx box. In the above case it would become invalid, so one would need either to manually patch its values, or fq would parse the file, maintain an internal representation including references, and write correct values of sidx during serialization. The latter would mean the sidx references are computable fields that don't fit well into the whole concept. The former approach may be practical in some cases. I guess the right way is to collect real-world use cases and go from that.
from fq.
Yeap collecting use cases sounds good. I've mostly used the technique i mentioned in a comment above to stitch things together, i wonder if one could come up with some helper function(s) to make that easier.
from fq.