Editing a binary (fq issue, open)

wader commented on May 12, 2024
Editing a binary

from fq.

Comments (9)

wader commented on May 12, 2024

Yeap you describe my current plan quite well, some kind of helpers for cut/stitch and encodings. I had some momentum and motivation for a while to work on it but think i got stuck at how to make it feel jq:y and how to not "pollute" the namespace too much with lots of small functions etc. One thing i've thought about is that fq already has some machinery to do query rewrites so it's possible to "extend" the jq language a bit if that would help.

About ultimate visions i think you summarise the problem quite well also. You more or less end up having to write a transmuxer, linker etc and one that should handle and preserve lots of strange things or should it "normalize"? :) I've thought some about different ways, i'll list them:

  • "symmetric"/"bi-directional" approach. Would probably require something declarative. I know kaitai struct is working on doing serialisation but it seems to be far from trivial and will probably have limitations. Also an issue with declarative is that some formats have logic that i think is very complicated to describe purely declaratively. For example how to describe sample ranges in an mp4 file and then also how to describe what metadata from a certain box is needed to decode samples for a specific track and so on.

  • Support format specific encoders. Similar to how to_yaml etc currently work but instead produce binary. Possibly a decoder and encoder could share a common JSON "schema" somehow? This also raises some questions about how it should behave in regard to nested decoding, reassembly, symbolic mappings etc. If lots of details are needed maybe the JSON representation would become less usable. Maybe how "decode value" currently works could be extended to help? A decode value is a jq value plus some metadata (bit range, source buffer, sym mappings etc).

Also with any of the approaches it needs to fit well with how jq works.

wader commented on May 12, 2024

It's complicated :) fq at the moment has very limited editing support for "decode values", tourl/fromurl work on JSON so it's just normal jq behaviour. But it can do bits and bytes slicing and combine things together again into a binary. When you do .header.entry=0 the "decode value" will first be converted into JSON and then the update is done.

The reason the support is limited is a mix of lack of need myself for it and that it's complex for some of the formats supported by fq. For a lot of formats it's not really clear what should happen on an update: encode with the same encoding, but what about encodings that are ambiguous, like varints that can encode a value in many ways with different sizes? Should size be preserved/truncated? Update checksums? Fields that control the number of entries in an array? Also fq has support for sub buffers for demuxing/tcp reassembly etc... yeah you see :) But maybe the "clearest" would be to just support updating a specific bit/byte range using some bit-size/endian helpers etc.

And you can kind-of do this already using the slicing support, for example update .header.entry in an ELF:

# this assumes the entry is 64 bit
$ fq '(.header.entry | tobytesrange) as $e | tobytes | [.[:$e.start],0,1,2,3,4,5,6,7,.[$e.stop:]] | tobytes | elf | .header' some_elf
# or to write it out to a file
$ fq '(.header.entry | tobytesrange) as $e | tobytes | [.[:$e.start],0,1,2,3,4,5,6,7,.[$e.stop:]] | tobytes' some_elf > changed

This uses slicing, just normal [start:stop] jq syntax, on bytes (there is also tobits/tobitsrange to use bit indexes) and "binary arrays" in fq (similar to iolist:s in erlang). So any array that includes only these values can be converted to a binary (via tobytes/tobits).

  • 0-255 will be one byte
  • strings will be UTF-8 bytes
  • nested binary array
  • bits and bytes values

Also the difference between tobytes and tobytesrange is that the range-version "remembers" its source start/stop range.
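
As a rough illustration of the "binary array" idea and the slice-and-stitch recipe above, here is a minimal Python model. It is not fq code: `to_bytes` and `replace_range` are hypothetical names, it only covers byte-aligned values (no bit values), and it assumes the flattening rules are exactly the ones listed above.

```python
def to_bytes(value) -> bytes:
    """Flatten a nested 'binary array' into bytes (byte-aligned values only)."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)              # existing binary values pass through
    if isinstance(value, str):
        return value.encode("utf-8")     # strings become their UTF-8 bytes
    if isinstance(value, bool):
        raise TypeError("booleans are not allowed in a binary array")
    if isinstance(value, int):
        if not 0 <= value <= 255:
            raise ValueError(f"integer {value} does not fit in one byte")
        return bytes([value])            # 0-255 becomes a single byte
    if isinstance(value, (list, tuple)):
        return b"".join(to_bytes(v) for v in value)  # nested arrays recurse
    raise TypeError(f"unsupported value in binary array: {value!r}")


def replace_range(data: bytes, start: int, stop: int, new) -> bytes:
    """Stitch data[:start], the new content and data[stop:] back together."""
    return to_bytes([data[:start], new, data[stop:]])


# Hypothetical 16-byte input: replace the 8 bytes at offsets 4..12.
original = b"HEAD" + b"\xff" * 8 + b"TAIL"
patched = replace_range(original, 4, 12, [0, 1, 2, 3, 4, 5, 6, 7])
```

Here `start` and `stop` play the role of `$e.start` and `$e.stop` in the fq one-liner: the range is remembered separately from the bytes so the surrounding data can be stitched back around the replacement.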

That said all of this can probably be improved in many ways, let me know your ideas.

peterwaller-arm commented on May 12, 2024

Yeah, that's really nice that you can use tobytesrange in that way -- definitely a missing recipe in the docs in the interim!

A next small step would be to provide an ergonomic way to inject bytes. overwritebytes($e, newvalue | asuint32) or whatever would be appropriate as syntactic sugar for the recipe you suggested above. I guess it gets fun when you have to consider all possible encodings and endiannesses and the like. overwritebytes could at least check that the length of the bytes being inserted matches the range being inserted into.
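
To make the proposal concrete, here is a small Python sketch of what such a helper could check. Neither `overwrite_bytes` nor `as_uint32` exists in fq; both names and signatures are assumptions made for illustration, and the endianness flag is just one possible way to expose that choice.

```python
import struct


def as_uint32(value: int, little_endian: bool = True) -> bytes:
    """Encode value as a 4-byte unsigned integer in the chosen byte order."""
    return struct.pack("<I" if little_endian else ">I", value)


def overwrite_bytes(data: bytes, start: int, stop: int, new: bytes) -> bytes:
    """Replace data[start:stop] with new, refusing any change in length."""
    if not 0 <= start <= stop <= len(data):
        raise ValueError(f"range {start}:{stop} is outside the input")
    if len(new) != stop - start:
        raise ValueError(
            f"replacement is {len(new)} bytes, range is {stop - start} bytes"
        )
    return data[:start] + new + data[stop:]


# Hypothetical 8-byte input with a 32-bit field at offsets 2..6.
data = bytes(8)
patched = overwrite_bytes(data, 2, 6, as_uint32(0x11223344))
```

Because the length check rejects anything that would shift the bytes after the range, this only covers in-place patches; inserts and deletes need more than a helper like this.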

The ultimate vision would be to be able to update any value in any format and then propagate that change to anything else in the binary that needs to be updated to make it semantically correct. I'm guessing though that this is difficult-to-impossible, in the most extreme case requiring essentially a recompilation of the binary (imagine for my use case(s) for ELF patching, changing the length of a string, which changes the offsets of everything else in a section, suddenly all absolute addresses referring to points after that string may need changing and those new addresses might not be representable anymore with the same sequences of instructions in the binary, which would need propagating and so on and so on).

ksa-real commented on May 12, 2024

How about adding a "Big thing" TODO about binary modifications that may as well modify length?

wader commented on May 12, 2024

> How about adding a "Big thing" TODO about binary modifications that may as well modify length?

Yeap that is a good idea, maybe we can link this issue also.

Could you clarify what you mean by "may as well modify length"? Is it about whether the modification changes the length of the thing being modified?

ksa-real commented on May 12, 2024

I think I don't have a use case right now, but the idea is as follows. Let's take the fMP4 container. The "ftyp" box contains a list of "brands". Assume adding a "brand" to this list of 4-char identifiers. This operation would change the length of the binary representation of the list. The box containing the list would also grow in size. The boxes that follow the "ftyp" box would change their position (start+=4). Basically, the idea is to allow this sort of manipulation: not just replace a few bytes but also do inserts and deletes.
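
The size and offset shift described above can be sketched in Python. This is a toy model, not fq code: it assumes the usual 32-bit box header (big-endian size followed by a 4-byte type), the helper names are hypothetical, and the empty "moov" box is only a stand-in for whatever follows "ftyp" in a real file.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box: 4-byte big-endian size (header included), type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def make_ftyp(major_brand: bytes, minor_version: int, brands) -> bytes:
    """ftyp payload: major brand, minor version, then the compatible brands."""
    payload = major_brand + struct.pack(">I", minor_version) + b"".join(brands)
    return make_box(b"ftyp", payload)


def box_offsets(data: bytes) -> dict:
    """Map each top-level box type to the offset where that box starts."""
    offsets = {}
    pos = 0
    while pos < len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < 8:
            raise ValueError("box size too small for this sketch")
        offsets[box_type] = pos
        pos += size
    return offsets


tail = make_box(b"moov", b"")  # stand-in for the boxes after ftyp
before = make_ftyp(b"iso5", 0, [b"iso5", b"dash"]) + tail
after = make_ftyp(b"iso5", 0, [b"iso5", b"dash", b"cmfc"]) + tail

shift = box_offsets(after)[b"moov"] - box_offsets(before)[b"moov"]
```

Adding one brand grows the ftyp size field by 4 and moves every later box by the same amount, which is exactly the part that a plain byte-range overwrite cannot express.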

wader commented on May 12, 2024

Ok i see, yeah that would be nice but not sure how one would do it and i have thought about it quite a lot. For example in the mp4 case if the brands list change affects the size of the ftyp box then all boxes after it will move which in turn will most likely affect offsets in stco boxes etc and so on if it should still be playable. So to support that kind of thing my guess is that one would have to write an encoder per format that wants to support it (nearly an mp4 muxer in this case). But there are other issues and ambiguities encoding creates also, should an encoder try to "preserve" number, string etc encodings that can encode the same value in multiple ways? or normalize? (varint for example), would assigning to a field that has a symbolic mapping do a reverse map back? lots of questions :)
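
The varint ambiguity can be shown with a short Python sketch. The thread does not say which varint scheme is meant, so this assumes unsigned LEB128, and the `min_size` padding parameter is a hypothetical way of modelling a "preserve the original size" option.

```python
def decode_uleb128(data: bytes) -> int:
    """Decode one unsigned LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if (byte & 0x80) == 0:             # high bit clear: last byte
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_uleb128(value: int, min_size: int = 1) -> bytes:
    """Encode value, padded with continuation bytes to at least min_size."""
    if value < 0:
        raise ValueError("only unsigned values are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        more = value != 0 or len(out) + 1 < min_size
        out.append(byte | (0x80 if more else 0))
        if not more:
            return bytes(out)


shortest = encode_uleb128(1)             # normalized, shortest form
padded = encode_uleb128(1, min_size=3)   # same value, original width kept
```

Both byte strings decode to the same number, so an encoder has to decide whether an update keeps the original width (and therefore all later offsets) or normalizes to the shortest form.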

ksa-real commented on May 12, 2024

Agree. Likely the whole thing would look like a "manual muxer" specific to every format. Specificity is not an issue per se as every format is already custom. I was thinking about e.g. the sidx box. In the above case it would become invalid, so one would need either to manually patch its values, or fq would parse the file, maintain an internal representation including references, and write correct values of sidx during serialization. The latter would mean the sidx references are computable fields that don't fit well into the whole concept. The former approach may be practical in some cases. I guess the right way is to collect real-world use cases and go from that.

wader commented on May 12, 2024

Yeap collecting use cases sounds good. I've mostly used the technique i mentioned in a comment above to stitch things together, i wonder if one could come up with some helper function(s) to make that easier.
