Comments (8)
Huh, did not know, that's interesting. I wonder if this is the same as or similar to using inflate/deflate flush to encode boundaries. I ran into this for TLS compression, but in that case there is no header for the trailing inflates.
Can come up with three ways to model this:
- Root is always an array. Maybe inconvenient?
- Root can optionally be an array. Currently not possible API-wise.
- Add a trailing field array etc with trailing gzip:s
- Something else?
from fq.
I think "root is always an array" most precisely models the underlying format.
from fq.
Have a look at #794 and i think i agree, always an array is probably best
from fq.
Yep, some of the text tests were wrong, fixed, thanks.
I wonder if it's bad that we won't provide the full concatenated uncompressed stream somehow? Also, the nested decoding should happen on the concatenation and not on each member's uncompressed data. So maybe the root should instead be a struct with a members array and an uncompressed raw bytes field?
from fq.
I didn't realize fq performs nested decoding. I'm not sure what to do. In most cases it might be better to have "a struct with a members array and an uncompressed raw bytes field". But today I was analyzing a corrupted gz file where zcat said the CRC and size are wrong, and fq helped me discover that only the last member is corrupted and find out why. It was useful to see the uncompressed field of each member and check they're fine. But I know this is an unusual situation.
I don't have a strong preference. I feel multi-member gz files are rare in practice, so either way is a decent choice.
Just for fun: This is how I used fq to analyze it. That was before I filed this issue, so I had to use gap0.
rm -f part* after*; cp original_input.gz after0.gz; i=0; while true; do o=$(./fq '.gap0|tobytesrange.start' after$i.gz) || break; [[ -z $o ]] && break; head -c$o after$i.gz > part$((i+1)).gz; tail -c+$((o+1)) after$i.gz > after$((i+1)).gz; ((i++)); done
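For anyone who would rather not loop over gap0, here is a rough Python sketch of the same splitting idea. It is not what was run above and does not use fq; it assumes the standard zlib module, whose decompress object stops at the end of one gzip member and leaves the remaining bytes in unused_data. The function name split_gzip_members is made up for illustration.

```python
import gzip
import zlib


def split_gzip_members(data: bytes):
    """Split concatenated gzip members.

    Returns a list of (compressed_member, uncompressed_payload) pairs.
    zlib raises zlib.error if a member's CRC or size check fails.
    """
    members = []
    while data:
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and footer.
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        payload = d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member are left in unused_data.
        consumed = len(data) - len(d.unused_data)
        members.append((data[:consumed], payload))
        data = d.unused_data
    return members


if __name__ == "__main__":
    blob = gzip.compress(b"hello") + gzip.compress(b"world!")
    for i, (raw, payload) in enumerate(split_gzip_members(blob)):
        print(i, len(raw), payload)
```

Each compressed_member could then be written to its own file and inspected with zcat/fq/hexdump, much like the part*.gz files above. A member with a bad footer makes zlib raise zlib.error, which at least tells you which member is the broken one.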
from fq.
I didn't realize fq performs nested decoding. I'm not sure what to do. In most cases it might be better to have "a struct with a members array and an uncompressed raw bytes field". But today I was analyzing a corrupted gz file where zcat said the CRC and size are wrong, and fq helped me discover that only the last member is corrupted and find out why. It was useful to see the uncompressed field of each member and check they're fine. But I know this is an unusual situation.
Yes, it does nested decoding by default, sometimes with options to disable it. This was added early on, as fq's roots are in debugging media containers and codecs, where it's common to have lots of nested subformats and muxers that slice up packets in various ways.
About each member's uncompressed data: in the PR I have now modelled it so that you have access to both each member's uncompressed data and a concatenation of them all.
I don't have a strong preference. I feel multi-member gz files are rare in practice, so either way is a decent choice.
I think it makes sense, kind of the point of fq is to not hide details :)
Now I actually remember that Alpine packages use concatenated gzips.
Just for fun: This is how I used fq to analyze it. That was before I filed this issue, so I had to use gap0.
rm -f part* after*; cp original_input.gz after0.gz; i=0; while true; do o=$(./fq '.gap0|tobytesrange.start' after$i.gz) || break; [[ -z $o ]] && break; head -c$o after$i.gz > part$((i+1)).gz; tail -c+$((o+1)) after$i.gz > after$((i+1)).gz; ((i++)); done
Nice! you wanted to output each uncompressed to a file? what was the o+1 thing, skip one byte from gap0 start?
fq is not great at outputting multiple files at the moment, and I'm not sure how it could be done without adding messy IO functions, hmm. But I have used a hack with tar. So something like this:
Copy the to_tar snippet from https://github.com/wader/fq/wiki/snippets and put it in tar.jq, then do:
# -L . adds cwd to include path
# use include "tar" to include tar.jq
# iterate .members as {key: ..., value: ...} objects, as it's an array key will be 0,1,2,... and value the member itself
# to_tar(f) takes a function f as arg that outputs {filename: ..., data: ...} objects
$ fq -L . 'include "tar"; to_tar(.members | to_entries[] | {filename: "part\(.key)", data: .value.uncompressed})' format/gzip/testdata/multi_members.gz | tar tv
-rw-r--r-- 0 user group 11 Jan 1 1970 part0
-rw-r--r-- 0 user group 10 Jan 1 1970 part1
from fq.
Nice! you wanted to output each uncompressed to a file? what was the o+1 thing, skip one byte from gap0 start?
Right, I wanted to output each compressed member to a file, so I can look at them with zcat/fq/hexdump. $((o+1)) is just because tail counts from 1, e.g. "tail -c+9" discards first 8 bytes and starts printing from the 9th byte.
Interesting tar snippet. To be honest I don't really like or understand the jq language, but maybe I'll learn one day.
By the way, just for fun, and not related to fq: I solved the mystery of the corrupted gz file I mentioned. The uncompressed data looks OK and the footer is present, but the footer CRC and isize are wrong. What could've caused that?
It is generated by a Python program which opens it as
with gzip.open(filename, "at") as f:
The solution is that it got a KeyboardInterrupt exception just after executing this line. The compressed data was written, but self.crc and self.size weren't updated. The with statement called the close() method and wrote a gzip footer, but not with the correct values.
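A small Python sketch of how such a member can be checked by hand, in case it is useful to someone. It does not reproduce the KeyboardInterrupt itself; it just zeroes the footer of a good member to mimic one whose CRC and isize were never updated. It assumes a plain 10-byte gzip header with no optional fields (which is what gzip.compress produces), and check_gzip_footer is a made-up name.

```python
import gzip
import struct
import zlib


def check_gzip_footer(member: bytes):
    """Return (payload, crc_ok, isize_ok) for a single gzip member.

    Assumption: the header is the plain 10-byte form (FLG == 0),
    so the raw deflate stream starts at offset 10.
    """
    if member[:2] != b"\x1f\x8b" or member[3] != 0:
        raise ValueError("expected a gzip member with a plain 10-byte header")
    # Negative wbits means raw deflate: no header or footer is checked.
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    payload = d.decompress(member[10:])
    if not d.eof or len(d.unused_data) < 8:
        raise ValueError("truncated member")
    # The footer is CRC32 then ISIZE, both little-endian 32-bit.
    stored_crc, stored_isize = struct.unpack("<II", d.unused_data[:8])
    crc_ok = stored_crc == zlib.crc32(payload)
    isize_ok = stored_isize == (len(payload) & 0xFFFFFFFF)
    return payload, crc_ok, isize_ok


if __name__ == "__main__":
    good = gzip.compress(b"hello world")
    bad = good[:-8] + b"\x00" * 8  # same data, footer values wiped
    print(check_gzip_footer(good))
    print(check_gzip_footer(bad))
```

The payload comes back intact in both cases; only the two footer checks differ, which matches the "data looks OK but CRC and isize are wrong" symptom.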
from fq.
Interesting tar snippet. To be honest I don't really like or understand the jq language, but maybe I'll learn one day.
I can relate, and it took quite a while to get my head around it; now I love it. I think it fits very well for what I at least use fq for: lots of ad hoc queries to dig and poke around in half-broken and strange media and binary files. And I hope basic jq is easy enough for people to use... I've also noticed people use fq more or less with just d and -V etc. and then pipe to grep/less or whatnot :) whatever works
By the way just for fun, this is not related to fq, but I solved the mystery of the corrupted gz file I mentioned: The uncompressed data looks OK and the footer is present, but the footer CRC and isize are wrong. What could've caused that? It is generated by a Python program which opens it as
with gzip.open(filename, "at") as f:
The solution is that it got a KeyboardInterrupt exception just after executing this line. The compressed data was written, but self.crc and self.size weren't updated. The with statement called the close() method and wrote a gzip footer, but not the correct values.
👍 aha tricky, glad you solved it! so it was just one odd gzip file or something that happened regularly?
from fq.