sile / libflate

A Rust implementation of DEFLATE algorithm and related formats (ZLIB, GZIP)

Home Page: https://docs.rs/libflate

License: MIT License

Rust 100.00%
rust gzip zlib deflate-algorithm

libflate's Introduction

libflate

A Rust implementation of DEFLATE algorithm and related formats (ZLIB, GZIP).

Documentation

See the RustDoc documentation at https://docs.rs/libflate.

The documentation includes some examples.

Installation

Add the following lines to your Cargo.toml:

[dependencies]
libflate = "2"

An Example

Below is a program that decodes a GZIP stream read from standard input:

extern crate libflate;

use std::io;
use libflate::gzip::Decoder;

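fn main() {
    // Read the compressed GZIP stream from standard input.
    let mut input = io::stdin();
    // Creating the decoder can fail (for example on an invalid header), hence the unwrap.
    let mut decoder = Decoder::new(&mut input).unwrap();
    // Stream the decompressed bytes to standard output.
    io::copy(&mut decoder, &mut io::stdout()).unwrap();
}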

An Informal Benchmark

A brief comparison with flate2 and inflate:

$ cd libflate/flate_bench/
$ curl -O https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
$ gzip -d enwiki-latest-all-titles-in-ns0.gz
$ ls -lh enwiki-latest-all-titles-in-ns0
-rw-rw-r-- 1 foo foo 265M May 18 05:19 enwiki-latest-all-titles-in-ns0

$ cargo run --release -- enwiki-latest-all-titles-in-ns0
# ENCODE (input_size=277303937)
- libflate: elapsed=8.137013s, size=83259010
-   flate2: elapsed=9.814607s, size=74692153

# DECODE (input_size=74217004)
- libflate: elapsed=1.354556s, size=277303937
-   flate2: elapsed=0.960907s, size=277303937
-  inflate: elapsed=1.926142s, size=277303937

libflate's People

Contributors

fauxfaux, hybrideidolon, ignatenkobrain, king6cong, kupiakos, lo48576, lukaslueg, mleonhard, olback, qnighy, rmsyn, shnatsel, sile, srijs, stargateur, tesaguri, torokati44

libflate's Issues

btype 0x11 of DEFLATE is reserved(error) value

When trying to decompress the following data using this Rust code:

use std::io::Read;
use libflate::deflate::{Decoder};

fn main() {
    let data = [31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 149, 147, 61, 75, 195, 80, 20, 134, 79, 62, 20, 84, 170, 1, 17, 28, 68, 28, 132, 110, 183, 247, 38, 185, 249, 154, 58, 10, 130, 116, 116, 170, 54, 109, 18, 11, 173, 169, 109, 90, 112, 210, 209, 81, 112, 112, 22, 127, 131, 56, 232, 232, 224, 143, 112, 112, 238, 159, 168, 55, 31, 234, 77, 91, 82, 26, 184, 57, 36, 79, 222, 115, 222, 251, 38, 9, 170, 27, 218, 209, 62, 128, 113, 76, 234, 181, 83, 82, 55, 31, 175, 223, 101, 0, 216, 249, 120, 217, 219, 102, 181, 244, 244, 186, 203, 10, 8, 108, 109, 18, 221, 70, 132, 234, 136, 152, 20, 81, 13, 222, 132, 148, 43, 11, 184, 144, 241, 178, 138, 49, 113, 176, 171, 90, 142, 175, 106, 45, 199, 79, 46, 217, 145, 63, 53, 126, 117, 241, 92, 49, 215, 215, 48, 17, 197, 185, 185, 179, 156, 252, 113, 49, 227, 91, 60, 39, 148, 240, 190, 196, 127, 95, 134, 217, 116, 176, 238, 89, 177, 47, 181, 200, 151, 180, 156, 206, 229, 247, 35, 229, 252, 176, 156, 8, 198, 252, 126, 138, 184, 144, 241, 57, 57, 106, 139, 114, 148, 167, 115, 178, 73, 46, 199, 34, 46, 100, 124, 206, 126, 245, 162, 185, 98, 166, 227, 242, 55, 16, 81, 49, 159, 227, 18, 125, 93, 222, 207, 202, 76, 14, 126, 172, 163, 69, 126, 148, 76, 87, 178, 9, 139, 213, 66, 148, 185, 177, 48, 0, 159, 211, 52, 167, 118, 198, 27, 189, 145, 134, 6, 145, 215, 65, 205, 176, 11, 240, 201, 158, 171, 150, 36, 104, 177, 90, 211, 37, 184, 99, 63, 11, 30, 2, 124, 63, 200, 73, 253, 98, 141, 206, 199, 233, 119, 18, 63, 11, 207, 34, 76, 38, 147, 155, 120, 193, 248, 48, 185, 23, 207, 186, 117, 214, 146, 26, 247, 57, 56, 1, 184, 63, 19, 18, 189, 82, 102, 51, 47, 162, 168, 55, 112, 42, 149, 8, 117, 189, 10, 123, 247, 65, 219, 95, 247, 195, 97, 127, 112, 53, 108, 244, 61, 144, 221, 246, 101, 192, 116, 171, 65, 24, 6, 29, 47, 13, 83, 73, 203, 15, 58, 186, 13, 141, 216, 3, 0, 0];
    let mut decoder = Decoder::new(&data[..]);
    let mut decoded_data = Vec::new();
    assert!(decoder.read_to_end(&mut decoded_data).is_ok());
}

I'm hitting the following error:

btype 0x11 of DEFLATE is reserved(error) value

Python's gzip can decompress it just fine, though:

import gzip
data = [31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 149, 147, 61, 75, 195, 80, 20, 134, 79, 62, 20, 84, 170, 1, 17, 28, 68, 28, 132, 110, 183, 247, 38, 185, 249, 154, 58, 10, 130, 116, 116, 170, 54, 109, 18, 11, 173, 169, 109, 90, 112, 210, 209, 81, 112, 112, 22, 127, 131, 56, 232, 232, 224, 143, 112, 112, 238, 159, 168, 55, 31, 234, 77, 91, 82, 26, 184, 57, 36, 79, 222, 115, 222, 251, 38, 9, 170, 27, 218, 209, 62, 128, 113, 76, 234, 181, 83, 82, 55, 31, 175, 223, 101, 0, 216, 249, 120, 217, 219, 102, 181, 244, 244, 186, 203, 10, 8, 108, 109, 18, 221, 70, 132, 234, 136, 152, 20, 81, 13, 222, 132, 148, 43, 11, 184, 144, 241, 178, 138, 49, 113, 176, 171, 90, 142, 175, 106, 45, 199, 79, 46, 217, 145, 63, 53, 126, 117, 241, 92, 49, 215, 215, 48, 17, 197, 185, 185, 179, 156, 252, 113, 49, 227, 91, 60, 39, 148, 240, 190, 196, 127, 95, 134, 217, 116, 176, 238, 89, 177, 47, 181, 200, 151, 180, 156, 206, 229, 247, 35, 229, 252, 176, 156, 8, 198, 252, 126, 138, 184, 144, 241, 57, 57, 106, 139, 114, 148, 167, 115, 178, 73, 46, 199, 34, 46, 100, 124, 206, 126, 245, 162, 185, 98, 166, 227, 242, 55, 16, 81, 49, 159, 227, 18, 125, 93, 222, 207, 202, 76, 14, 126, 172, 163, 69, 126, 148, 76, 87, 178, 9, 139, 213, 66, 148, 185, 177, 48, 0, 159, 211, 52, 167, 118, 198, 27, 189, 145, 134, 6, 145, 215, 65, 205, 176, 11, 240, 201, 158, 171, 150, 36, 104, 177, 90, 211, 37, 184, 99, 63, 11, 30, 2, 124, 63, 200, 73, 253, 98, 141, 206, 199, 233, 119, 18, 63, 11, 207, 34, 76, 38, 147, 155, 120, 193, 248, 48, 185, 23, 207, 186, 117, 214, 146, 26, 247, 57, 56, 1, 184, 63, 19, 18, 189, 82, 102, 51, 47, 162, 168, 55, 112, 42, 149, 8, 117, 189, 10, 123, 247, 65, 219, 95, 247, 195, 97, 127, 112, 53, 108, 244, 61, 144, 221, 246, 101, 192, 116, 171, 65, 24, 6, 29, 47, 13, 83, 73, 203, 15, 58, 186, 13, 141, 216, 3, 0, 0]
assert gzip.decompress(bytes(data)) is not None

As someone not experienced with gzip at all, I would like to know why this happens, and if it could be fixed to support this mode (which Python handles just fine). Otherwise, maybe there is a way to decompress this data regardless? Thanks.

I have also tested flate2, and that's able to decompress the data too.
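A possible explanation, offered as a sketch rather than a confirmed diagnosis: the data starts with the bytes 31, 139 (0x1f, 0x8b), which is the GZIP magic number, while the snippet above passes it to libflate::deflate::Decoder, which reads a raw DEFLATE stream. Read as a DEFLATE block header, the first byte 31 has BFINAL = 1 and BTYPE = 0b11, which is the reserved value named in the error. The std-only helpers below use hypothetical names and are not part of libflate; they only illustrate the two checks.

```rust
/// GZIP streams begin with the two magic bytes 0x1f, 0x8b (RFC 1952).
fn looks_like_gzip(data: &[u8]) -> bool {
    data.starts_with(&[0x1f, 0x8b])
}

/// Extracts (BFINAL, BTYPE) from the first byte of a raw DEFLATE block.
/// DEFLATE packs header bits starting at the least significant bit
/// (RFC 1951): bit 0 is BFINAL, bits 1-2 are BTYPE.
fn deflate_block_header(first_byte: u8) -> (bool, u8) {
    let bfinal = (first_byte & 1) == 1;
    let btype = (first_byte >> 1) & 0b11;
    (bfinal, btype)
}
```

If that reading is right, libflate::gzip::Decoder (the type used in the README example) would be the decoder to try for this input.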

panic with binary file

[package]
name = "compression_test"
version = "0.1.0"

[dependencies]
libflate = "0.1"

extern crate libflate;

use std::fs::File;
use std::io::copy;
use libflate::gzip::Encoder;

fn main() {

    let mut out_file = File::create("mingw-w64-install.exe.gz").unwrap();
    let mut encoder = Encoder::new(&mut out_file).unwrap();
    let mut in_file = File::open("mingw-w64-install.exe").unwrap();
    copy(&mut in_file, &mut encoder).unwrap();
    encoder.finish().into_result().unwrap();
}

Running "cargo run":
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target\debug\compression_test.exe`
thread 'main' panicked at 'symbol:17, table:17', C:\Users\root\.cargo\registry\src\github.com-1ecc6299db9ec823\libflate-0.1.0\src\huffman.rs:188
stack backtrace:
   0:           0x479c64 - <unknown>
   1:           0x476529 - <unknown>
...
  17:           0x4014e7 - <unknown>
  18:     0x7ffa91a08363 - <unknown>
error: process didn't exit successfully: `target\debug\compression_test.exe` (exit code: 101)

"cargo run" completed with code 101
It took approximately 0.901 seconds

Please ignore the <unknown> stack frames; this is a known issue with the *-pc-windows-gnu target.
The file I'm testing with is mingw-w64-install.exe, but this happened with basically any binary file I tried.

assertion failed

When using libflate to gunzip .gz files from S3, I hit this failure.
Not sure how to fix it.

thread 'main' panicked at 'assertion failed: `(left == right)` (left: `1541`, right: `16`)', /home/wooya/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.3/src/huffman.rs:97

LZ77 compressor as a separate crate?

I am working on a crate which implements a compressor and decompressor for a particular encoding of LZ77 used in several video games for asset compression. The LZ77 compressor in this crate is perfect for my needs, but I can't import it in isolation from the rest of the DEFLATE and gzip implementation, so the crate has to pull in dependencies it doesn't really need.

Would you consider making the LZ77 encoder a separate crate to make this sort of use case easier?

panic at ..(self.table[i], MAX_BITWIDTH as u16 + 1) on valid file

The releases for the "joe" / "jupp" text editor, from mirbsd, cause the decompressor to panic (in debug mode) or error (in release mode).

This has happened on multiple releases, so I'm guessing it's a BSD gzip implementation quirk.

Neither GNU gzip nor libarchive (bsdtar) gives the slightest hint that something might be wrong.

Download links:
http://deb.debian.org/debian/pool/main/j/jupp/jupp_3.1.30.orig.tar.gz
https://www.mirbsd.org/MirOS/dist/jupp/joe-3.1jupp30.tgz

// cat src/main.rs #  example from the README.md
extern crate libflate;

use std::io;
use libflate::gzip::Decoder;

fn main() {
    let mut input = io::stdin();
    let mut decoder = Decoder::new(&mut input).unwrap();
    io::copy(&mut decoder, &mut io::stdout()).unwrap();
}

% cargo run --release <joe-3.1jupp30.tgz
    Finished release [optimized] target(s) in 0.0 secs
     Running `target/release/flate-test`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Custom(Custom { 
kind: InvalidData, error: StringError("Invalid huffman coded stream")
 }) }', /checkout/src/libcore/result.rs:860
note: Run with `RUST_BACKTRACE=1` for a backtrace.
% RUST_BACKTRACE=1 cargo run <joe-3.1jupp30.tgz
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/flate-test`
thread 'main' panicked at 'assertion failed: `(left == right)` (left: `1`, right: `16`)', 
/home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/huffman.rs:98
stack backtrace:
...
   6: std::panicking::begin_panic_fmt
             at /checkout/src/libstd/panicking.rs:495
   7: <libflate::huffman::DecoderBuilder as libflate::huffman::Builder>::set_mapping
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/huffman.rs:98
   8: libflate::huffman::Builder::restore_canonical_huffman_codes
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/huffman.rs:54
   9: libflate::huffman::DecoderBuilder::from_bitwidthes
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/huffman.rs:80
  10: <libflate::deflate::symbol::DynamicHuffmanCodec as libflate::deflate::symbol::HuffmanCodec>::load
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/deflate/symbol.rs:330
  11: <libflate::deflate::decode::Decoder<R>>::read_compressed_block
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/deflate/decode.rs:92
  12: <libflate::deflate::decode::Decoder<R> as std::io::Read>::read
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/deflate/decode.rs:165
  13: <libflate::gzip::Decoder<R> as std::io::Read>::read
             at /home/faux/.cargo/registry/src/github.com-1ecc6299db9ec823/libflate-0.1.7/src/gzip.rs:859
  14: std::io::util::copy
             at /checkout/src/libstd/io/util.rs:53
  15: flate_test::main
             at src/main.rs:9
...

UnexpectedEof encountered when decompressing data

The following zlib-compressed data fails to decompress in libflate 1.0.2 with an UnexpectedEof error:
data.zip
(The compressed file is data.bin inside the zip file)

rustc 1.47.0 (18bf6b4f0 2020-10-07)

It successfully decompresses using flate2-rs with both the miniz-oxide and zlib backends.

Non-blocking IO

I've been using libflate in reqwest for a little while now and am quite pleased with it, thanks!

I'm now updating reqwest to use non-blocking IO, and I find that the reader decoders are not able to handle it. This is due to several read_exact, read_u32, etc. calls that rely on read_exact. A failed read_exact is not really recoverable, since there's no defined way to determine how many bytes were actually read.

A solution would be to create the Decoder immediately and keep an internal buffer and a state enum. That way, if the decoder needs to 'read exact' but that many bytes aren't available yet, the internal buffer holds the partial bytes, the state keeps the position, and you can try again when the user calls decoder.read() again.
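The buffering idea above could look roughly like the following. This is a minimal std-only sketch under stated assumptions: ExactBuf and fill are hypothetical names, and this is not how libflate implements its decoders. It keeps the bytes read so far across calls, so a WouldBlock error loses nothing.

```rust
use std::io::{self, Read};

/// Resumable stand-in for `read_exact`: partial progress survives errors.
struct ExactBuf {
    buf: Vec<u8>,
    filled: usize,
}

impl ExactBuf {
    fn new(len: usize) -> Self {
        ExactBuf { buf: vec![0; len], filled: 0 }
    }

    /// Tries to finish filling the buffer. Returns the full buffer once
    /// every byte has arrived. `WouldBlock` and other errors are returned
    /// to the caller, but the bytes read so far stay in `self.buf`.
    fn fill<R: Read>(&mut self, reader: &mut R) -> io::Result<&[u8]> {
        while self.filled < self.buf.len() {
            match reader.read(&mut self.buf[self.filled..]) {
                // Zero bytes into a non-empty slice means end of stream.
                Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
                Ok(n) => self.filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(&self.buf[..])
    }
}
```

A decoder built this way would store an ExactBuf in its state enum and simply call fill again on the next read, instead of restarting the header parse.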

zlib::Encoder flush() behavior

I have a use case that requires incremental writes to an encoder. Instead of calling finish(), I need to be able to call flush() (multiple times) and inspect the inner buffer.

See this comparison with the flate2 crate, which demonstrates the desired behavior:

extern crate libflate;
extern crate flate2;

use flate2::{Compression, write::ZlibEncoder};
use libflate::zlib::Encoder;
use std::io::Write;

fn main() {
    let writes = [
        "fooooooooooooooooo",
        "bar",
        "baz",
    ];

    // libflate:
    let mut encoder = Encoder::new(Vec::new()).unwrap();
    for string in writes.iter() {
        encoder.write(string.as_bytes()).expect("Write failed");
    }
    encoder.flush().expect("Flush failed");
    println!("{:?}", encoder.as_inner_ref());

    // flate2:
    let mut encoder = ZlibEncoder::new(Vec::new(), Compression::default());
    for _ in 0..2 {
        for string in writes.iter() {
            encoder.write(string.as_bytes()).expect("Write failed");
        }
        encoder.flush().expect("Flush failed");
    }
    println!("{:?}", encoder.get_ref());
}

Which outputs the following (take note of the 0, 0, 255, 255 sync flush sequences):

[120, 156]
[120, 1, 74, 203, 71, 7, 73, 137, 69, 73, 137, 85, 0, 0, 0, 0, 255, 255, 74, 195, 33, 14, 0, 0, 0, 255, 255]

Does libflate support this? Perhaps I'm not using the API correctly?
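For readers unfamiliar with the marker: the 0, 0, 255, 255 sequence is the tail of an empty stored block (LEN = 0x0000 followed by its one's complement NLEN = 0xFFFF), which is what a zlib-style sync flush appends after padding to a byte boundary. The std-only sketch below uses hypothetical helper names and only inspects bytes; it says nothing about whether libflate can produce such a flush.

```rust
/// Tail of an empty stored block: LEN = 0x0000, NLEN = 0xFFFF.
const SYNC_FLUSH_MARKER: [u8; 4] = [0x00, 0x00, 0xff, 0xff];

/// Returns true if `buf` ends on a sync-flush boundary.
fn ends_with_sync_flush(buf: &[u8]) -> bool {
    buf.ends_with(&SYNC_FLUSH_MARKER)
}

/// Counts occurrences of the marker byte pattern. This is a heuristic:
/// the same four bytes could also appear inside compressed data.
fn count_sync_flushes(buf: &[u8]) -> usize {
    buf.windows(4)
        .filter(|w| w[..] == SYNC_FLUSH_MARKER[..])
        .count()
}
```

Applied to the two outputs above, the flate2 buffer ends with the marker and contains it twice (one per flush), while the libflate buffer holds only the two-byte zlib header.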

Not decompressing entire stream

Something about the gzip encoder used to create the CommonCrawl archives doesn't play well with libflate. It only seems to decompress the first few hundred bytes.

Example file.

If I gunzip it then gzip it again, libflate is able to decompress the entire file correctly... so, it's interesting.

Async gzip decoder panicking at assertion in bit.rs

Hi there!

I'm working on using the new WouldBlock friendly gzip decoder in seanmonstar/reqwest#165 and am running into a problem for some kinds of chunked responses.

I'm sure this is something I'm not doing correctly, so am just wondering what the below assertion that's failing is all about so I get a better idea of what I might be doing that's breaking it:

panicked at 'assertion failed: 32 - self.offset >= bitwidth', /$USER/.cargo/registry/src/libflate-0.1.9/src/bit.rs:130:8

I've found debugging this particular part of libflate a bit tricky because a lot of the functions are #[inline(always)] which confuses my poor debugger.

stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:380
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:390
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:611
   5: std::panicking::begin_panic_new
             at /checkout/src/libstd/panicking.rs:553
   6: libflate::non_blocking::deflate::decode::BlockDecoder::decode_symbol::{{closure}}
             at ./<panic macros>:3
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/bit.rs:109
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/deflate/symbol.rs:242
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/deflate/symbol.rs:211
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:256
   7: <libflate::non_blocking::transaction::TransactionalBitReader<R>>::transaction
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/transaction.rs:23
   8: libflate::non_blocking::deflate::decode::BlockDecoder::decode_symbol
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:255
   9: libflate::non_blocking::deflate::decode::BlockDecoder::decode
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:203
  10: <libflate::non_blocking::deflate::decode::Decoder<R> as std::io::Read>::read
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:149
  11: <libflate::non_blocking::gzip::Decoder<R> as std::io::Read>::read
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/gzip.rs:118

FWIW, the full backtrace:

panicked at 'assertion failed: 32 - self.offset >= bitwidth', /$USER/.cargo/registry/src/libflate-0.1.9/src/bit.rs:130:8
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:380
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:390
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:611
   5: std::panicking::begin_panic_new
             at /checkout/src/libstd/panicking.rs:553
   6: libflate::non_blocking::deflate::decode::BlockDecoder::decode_symbol::{{closure}}
             at ./<panic macros>:3
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/bit.rs:109
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/deflate/symbol.rs:242
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/deflate/symbol.rs:211
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:256
   7: <libflate::non_blocking::transaction::TransactionalBitReader<R>>::transaction
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/transaction.rs:23
   8: libflate::non_blocking::deflate::decode::BlockDecoder::decode_symbol
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:255
   9: libflate::non_blocking::deflate::decode::BlockDecoder::decode
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:203
  10: <libflate::non_blocking::deflate::decode::Decoder<R> as std::io::Read>::read
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/deflate/decode.rs:149
  11: <libflate::non_blocking::gzip::Decoder<R> as std::io::Read>::read
             at /$USER/.cargo/registry/src/libflate-0.1.9/src/non_blocking/gzip.rs:118
  12: <reqwest::async_impl::decoder::Gzip as futures::stream::Stream>::poll
             at src/async_impl/decoder.rs:178
  13: <reqwest::async_impl::decoder::Decoder as futures::stream::Stream>::poll
             at src/async_impl/decoder.rs:114
  14: <reqwest::async_impl::decoder::Decoder as futures::stream::Stream>::poll
             at src/async_impl/decoder.rs:118
  15: <futures::stream::concat::ConcatSafe<S> as futures::future::Future>::poll
             at /$USER/.cargo/registry/src/futures-0.1.14/src/stream/concat.rs:133
  16: <futures::stream::concat::Concat2<S> as futures::future::Future>::poll
             at /$USER/.cargo/registry/src/futures-0.1.14/src/stream/concat.rs:46
  17: <futures::future::chain::Chain<A, B, C>>::poll
             at /$USER/.cargo/registry/src/futures-0.1.14/src/future/chain.rs:32
  18: <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll
             at /$USER/.cargo/registry/src/futures-0.1.14/src/future/and_then.rs:32
  19: <futures::future::chain::Chain<A, B, C>>::poll
             at /$USER/.cargo/registry/src/futures-0.1.14/src/future/chain.rs:26
  20: <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll
             at /$USER/.cargo/registry/src/futures-0.1.14/src/future/and_then.rs:32
  21: <futures::task_impl::Spawn<F>>::poll_future_notify::{{closure}}
             at /$USER/.cargo/registry/src/futures-0.1.14/src/task_impl/mod.rs:291
  22: <futures::task_impl::Spawn<T>>::enter::{{closure}}
             at /$USER/.cargo/registry/src/futures-0.1.14/src/task_impl/mod.rs:352
  23: futures::task_impl::std::set
             at /$USER/.cargo/registry/src/futures-0.1.14/src/task_impl/std/mod.rs:90
  24: <futures::task_impl::Spawn<T>>::enter
             at /$USER/.cargo/registry/src/futures-0.1.14/src/task_impl/mod.rs:352
  25: <futures::task_impl::Spawn<F>>::poll_future_notify
             at /$USER/.cargo/registry/src/futures-0.1.14/src/task_impl/mod.rs:291
  26: tokio_core::reactor::Core::run::{{closure}}
             at /$USER/.cargo/registry/src/tokio-core-0.1.8/src/reactor/mod.rs:235
  27: <scoped_tls::ScopedKey<T>>::set
             at /$USER/.cargo/registry/src/scoped-tls-0.1.0/src/lib.rs:135
  28: tokio_core::reactor::Core::run
             at /$USER/.cargo/registry/src/tokio-core-0.1.8/src/reactor/mod.rs:234
  29: async::async_test_gzip_response
             at tests/async.rs:71
  30: <F as test::FnBox<T>>::call_box
             at /checkout/src/libtest/lib.rs:1477
             at /checkout/src/libcore/ops/function.rs:143
             at /checkout/src/libtest/lib.rs:138
  31: __rust_maybe_catch_panic
             at /checkout/src/libpanic_unwind/lib.rs:98

Compression level

Hi all!
I would appreciate it if someone could point me to an example showing how to set a specific compression level when compressing data with libflate.
Thank you in advance!

GzDecoder seems to decode incorrectly

let tar_file = File::open(&tar_file_path)?;
let input = GzDecoder::new(&tar_file)?;
let mut archive = Archive::new(input);

archive.set_unpack_xattrs(true);
archive.set_overwrite(true);
archive.set_preserve_permissions(true);
archive.set_preserve_mtime(true);

let files = archive.entries()?;

for entry in files {
    let mut file = entry?;

    let file_path = file.path()?;

    if let Some(file_name) = file_path.file_name() {
        if file_name.to_str().unwrap() == extract_file_name {
            binary_found = true;
            file.unpack(&output_file_path)?;
            break;
        }
    }
}

test file: https://github.com/axetroy/prune.rs/releases/download/v0.1.1/prune_darwin_amd64.tar.gz

The original file size is: 985,384
The unpacked file size is: 965,416

I have tested tar on its own, and it works fine.

Streaming data extensions

I'm looking to use libflate for a long-running stream of data with periodic small messages (<1k), up to 30 seconds apart.

I'm still in the early stages of investigation, but I think what I need for my use case is:

  1. a method for flushing the current block on Encoder, and (when needed), appending a zero length block to get bitwriter to the next byte boundary
  2. a new huffman type to dynamically choose between fixed and static table, depending on which would give better block size

Is that reasonable? Am I misreading anything / have I missed existing functionality?

Would there be interest in a PR for this?

LZ77 decoder?

Is there interest in implementing/exposing a raw LZ77 decoder to go along with the encoder? This would be great for a project I'm writing, since it also needs to handle zlib streams, and pulling in libflate alone simplifies things greatly, as opposed to depending on a separate crate for handling LZ77. (Though at the moment the separate crate I'm using is actually an old sample living in a git repo only, which is why it's somewhat of an issue.)

Use-after-free on panic in client code

If the code that uses libflate panics, it may trigger a use-after-free in libflate code. Since use-after-free usually poses an arbitrary code execution vulnerability, I will relay further details privately to the maintainer.

Code compiled with panic=abort is not affected. This can be used as a mitigation in the interim.
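For reference, the mitigation mentioned above is standard Cargo profile configuration. The fragment below is a sketch for the consuming crate's Cargo.toml; which profiles to change depends on how the code is built, so treat the choice of dev and release as an assumption.

```toml
# Abort on panic instead of unwinding.
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```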

libflate::gzip::Decoder::read does not handle zero-length buffer properly

According to the documentation for the Read trait’s read function:

If n is 0, then it can indicate one of two scenarios:

  1. This reader has reached its “end of file”…
  2. The buffer specified was 0 bytes in length.

This suggests that it is legal to pass a zero-length buffer to read. However, if a zero-length buffer is passed to libflate::gzip::Decoder::read, then it will pass the buffer to the underlying self.reader, which will of course return zero, setting read_size to zero; then, because read_size is zero, it will set self.eos to true and look for a trailer, something it definitely shouldn’t do in the middle of an input stream.
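One way to avoid the problem is to return early on an empty buffer, before consulting the inner reader. The wrapper below is a std-only sketch under stated assumptions: EofTracker is a hypothetical type, not libflate's Decoder, and the eos field merely mirrors the flag described above.

```rust
use std::io::{self, Read};

/// Wraps a reader and records end-of-stream only when a read into a
/// non-empty buffer returns zero bytes.
struct EofTracker<R> {
    inner: R,
    eos: bool,
}

impl<R: Read> Read for EofTracker<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // A zero-length buffer can only ever yield Ok(0); that result
        // says nothing about whether the stream has ended.
        if buf.is_empty() {
            return Ok(0);
        }
        let n = self.inner.read(buf)?;
        if n == 0 {
            // Only here is it safe to treat the stream as finished.
            // A real decoder would go on to check the trailer.
            self.eos = true;
        }
        Ok(n)
    }
}
```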

Error while decoding just encoded data

While testing my code, I encountered an issue where encoded data could not be decoded.

The following example panics with the message "called Result::unwrap() on an Err value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }" at the last unwrap.

let mut encoded = Vec::new();
libflate::gzip::Encoder::new(&mut encoded)
    .unwrap()
    .write_all(b"Hello World")
    .unwrap();
assert!(encoded.len() > 0);

let mut decoded = Vec::default();
libflate::gzip::Decoder::new(encoded.as_slice())
    .unwrap()
    .read_to_end(&mut decoded)
    .unwrap();

assert_eq!(decoded.as_slice(), b"Hello World");

libflate version 1.3.0
rust version 1.68.0

UB in the public safe API - where to report?

I was able to trigger undefined behavior (according to Miri) using the public safe API of libflate (v1.1.1). Is there a private channel where this can be communicated and discussed, or is it OK to report it here?

Empty distance code table may cause problems on Windows 10

In my zip-rs crate I use your library to create compressed zip files. Recently an issue (zip-rs/zip-old#99) was submitted which led me to believe that Windows (at least Windows 10) does not like it when the distance code table is empty. By an empty distance code table, I mean that 'hdist' is 0(+1) and that the single distance code has length 0.

This occurs when compressing the following text (without any newlines):

Windows will show an error when trying to extract this file.

The resulting data is

00000000: 0540 d109 8040 085d e54d d03a 7d1f 6529  .@...@.].M.:}.e)
00000010: 8882 3eb0 b63f 4e8b 3ba7 31e6 8ed6 1cac  ..>..?N.;.1.....
00000020: 8054 6561 5402 acdf e205 13f2 b1d6 4550  .TeaT.........EP
00000030: adf1 98cb b101                           ......

I analysed this stream, manually set hdist to 7(+1), and added 8 distance codes, all with length 3. The resulting stream is:

00000000: 0547 d109 8040 085d e54d d03a 7d1f 6529  .G...@.].M.:}.e)
00000010: 8882 3eb0 b6bf f7de 7bef bdd3 e2ce 698c  ..>.....{.....i.
00000020: b9a3 3507 2b20 5559 1895 00eb b778 c184  ..5.+ UY.....x..
00000030: 7cac 7511 546b 3ce6 726c 00              |.u.Tk<.rl.

When putting this stream in a zip file, Windows can extract it just fine.

After reading the RFC, it seems that having no distance codes is allowed. However, I would like to be able to force having at least two distance codes, as many other utilities seem to do this by default.
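The hdist values can be read directly from the first three bytes of the two hex dumps above. The following is a std-only sketch assuming the dumps are raw DEFLATE streams that start with a dynamic Huffman block; DynamicHeader and parse_dynamic_header are hypothetical names, not libflate APIs.

```rust
/// Header fields of a DEFLATE dynamic Huffman block (RFC 1951, 3.2.7).
#[derive(Debug, PartialEq)]
struct DynamicHeader {
    bfinal: bool,
    btype: u8,
    hlit: u8,  // number of literal/length codes minus 257
    hdist: u8, // number of distance codes minus 1
    hclen: u8, // number of code length codes minus 4
}

fn parse_dynamic_header(bytes: [u8; 3]) -> DynamicHeader {
    // DEFLATE packs fields starting at the least significant bit, so the
    // first 17 bits can be sliced out of a little-endian integer.
    let v = u32::from(bytes[0]) | (u32::from(bytes[1]) << 8) | (u32::from(bytes[2]) << 16);
    DynamicHeader {
        bfinal: (v & 1) == 1,
        btype: ((v >> 1) & 0b11) as u8,
        hlit: ((v >> 3) & 0x1f) as u8,
        hdist: ((v >> 8) & 0x1f) as u8,
        hclen: ((v >> 13) & 0x0f) as u8,
    }
}
```

Under that assumption, the bytes 05 40 d1 of the first dump give hdist = 0, and the bytes 05 47 d1 of the modified dump give hdist = 7, matching the description above.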

MultiGzDecoder with non-gzip bytes at end

I have a gzipped JSON file that I did not create, which I am parsing and transforming with flate2 and serde_json. When I run my code over the unzipped file, everything is fine. When I run it on the gzipped file, it throws an unexpected end of file error. I am trying to figure out what is going on.

My working assumption is that the file, which is a multi-member gzip file, has some extra garbage after the end of the last member; and indeed, when I use gzcat to uncompress it, it does say "trailing garbage ignored". The section about multi-member files in the introduction says "If a file contains non-gzip data after the gzip data, MultiGzDecoder will emit an error after decoding the gzip data. This behavior matches the gzip, gunzip, and zcat command line tools."

I would like some way of decoding such a file without an error being returned, and just having any trailing garbage be ignored. What would be the simplest way of doing this?

Decoder::read_non_compressed_block() is unsound

The following code is unsound:

let old_len = self.buffer.len();
self.buffer.reserve(len as usize);
unsafe { self.buffer.set_len(old_len + len as usize) };
self.bit_reader
    .as_inner_mut()
    .read_exact(&mut self.buffer[old_len..])?;

The slice passed to read_exact() is uninitialized. This uses a Read implementation supplied by the API user, and there is no guarantee that it will never read from the provided buffer. If it does, it may cause a memory disclosure vulnerability.

Similar bug in Rust MP4 parser for reference: mozilla/mp4parse-rust#172

The equivalent code in stdlib initializes the vector with zeroes before growing it: https://doc.rust-lang.org/src/std/io/mod.rs.html#355-391

There have been some language proposals to create a contract for never reading from the buffer in this case, but they have not been stabilized: rust-lang/rust#42788

For now replacing unsafe { self.buffer.set_len(old_len + len as usize) }; with self.buffer.resize(old_len + len as usize, 0); should fix it.
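As a standalone illustration of the resize-based replacement, here is a minimal std-only sketch. The free function append_exact is hypothetical and stands in for the method body quoted above; rolling the buffer back on error is a choice made for this sketch, not something the original code does.

```rust
use std::io::{self, Read};

/// Appends exactly `len` bytes from `reader` to `buffer` without ever
/// handing uninitialized memory to the reader.
fn append_exact<R: Read>(buffer: &mut Vec<u8>, reader: &mut R, len: usize) -> io::Result<()> {
    let old_len = buffer.len();
    // Zero-fill the new tail first, so the slice passed to `read_exact`
    // is fully initialized even if the reader inspects it.
    buffer.resize(old_len + len, 0);
    if let Err(e) = reader.read_exact(&mut buffer[old_len..]) {
        // Drop the partially filled tail on failure.
        buffer.truncate(old_len);
        return Err(e);
    }
    Ok(())
}
```

The cost compared with set_len is one extra pass that writes zeroes over the new region.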

I have not read all of the unsafe code in libflate; there may be similar issues in other unsafe blocks, which is why I'm opening an issue instead of a PR right away.

panicked at 'attempt to shift right with overflow', bit.rs:98

I got this panic while just trying to decode previously encoded data. I managed to reproduce it with some randomly generated bytes.

extern crate libflate;

use std::io::{self, Read, Write};

use libflate::zlib::{Encoder, Decoder};

fn main() {
    test(&[163, 181, 167, 40, 62, 239, 41, 125, 189, 217, 61, 122, 20, 136, 160, 178, 119, 217, 41, 125, 189, 97, 195, 101, 47, 170]);
    test(&[162, 58, 99, 211, 7, 64, 96, 36, 57, 155, 53, 166, 76, 14, 238, 66, 148, 154, 124, 162, 58, 99, 188, 138, 131, 171, 189, 54, 229, 192, 38, 29, 240, 122, 28]);
    test(&[239, 238, 212, 42, 5, 46, 186, 67, 122, 247, 30, 61, 219, 62, 228, 202, 164, 205, 139, 109, 99, 181, 99, 181, 99, 122, 30, 12, 62, 46, 27, 145, 241, 183, 137]);
    test(&[88, 202, 64, 12, 125, 108, 153, 49, 164, 250, 71, 19, 4, 108, 111, 108, 237, 205, 208, 77, 217, 100, 118, 49, 10, 64, 12, 125, 51, 202, 69, 67, 181, 146, 86]);
}

fn test(data: &[u8]) {
    // Encoding
    let mut encoder = Encoder::new(Vec::new()).unwrap();
    encoder.write_all(data).unwrap();
    let encoded_data = encoder.finish().into_result().unwrap();

    // Decoding
    let mut decoder = Decoder::new(io::Cursor::new(encoded_data)).unwrap();
    let mut decoded_data = Vec::new();
    decoder.read_to_end(&mut decoded_data).unwrap();

    assert_eq!(decoded_data, data);
}
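For context on the panic message itself: in debug builds, Rust panics with "attempt to shift right with overflow" when the shift count is greater than or equal to the bit width of the integer. What bit.rs:98 actually computes is not shown in this report, so the following is only a generic sketch of a checked shift. The helper name `shr_or_zero` is hypothetical, and returning 0 for an oversized shift is an assumption about the desired behavior.

```rust
/// Shifts `value` right by `count` bits, yielding 0 when `count` is 32 or more
/// instead of panicking (debug) or wrapping the shift count (release).
/// Hypothetical helper for illustration; not taken from libflate.
pub fn shr_or_zero(value: u32, count: u32) -> u32 {
    // `checked_shr` returns `None` when `count >= u32::BITS`.
    value.checked_shr(count).unwrap_or(0)
}
```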

Buffer overflow when encoding (detected by address sanitizer)

Repro steps:

[dependencies]
libflate = "0.1.19"

use libflate::lz77::*;
fn main() {
    let mut enc = DefaultLz77Encoder::new();
    let mut sink = Vec::new();
    enc.encode(b"aaaaa", &mut sink);
    enc.flush(&mut sink);
}

$ cargo +nightly rustc -- -Zsanitizer=address
$ target/debug/a

Expected to finish without any messages. However, got an ASAN error instead.
=================================================================
==45039==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000155 at pc 0x00010334529b bp 0x7ffeec8ea850 sp 0x7ffeec8ea848
READ of size 1 at 0x602000000155 thread T0
    #0 0x10334529a in libflate::lz77::default::prefix::hbc520ed13fc89540 default.rs:114
    #1 0x10333a510 in _$LT$libflate..lz77..default..DefaultLz77Encoder$u20$as$u20$libflate..lz77..Lz77Encode$GT$::flush::hccf4bdcba320d36a default.rs:89
    #2 0x103342b34 in a::main::h655e7f278e236686 main.rs:6
    #3 0x10332498d in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hba64913d4b77ef20 rt.rs:64
    #4 0x103360c47 in std::panicking::try::do_call::h6e8c55b5404ca92b panicking.rs:297
    #5 0x10336322e in __rust_maybe_catch_panic lib.rs:92
    #6 0x10336160d in std::rt::lang_start_internal::h616fb8704166427f rt.rs:48
    #7 0x1033248fe in std::rt::lang_start::h75c0523eec08eff1 rt.rs:64
    #8 0x103342d11 in main (a:x86_64+0x10002ed11)
    #9 0x7fff6363eed8 in start (libdyld.dylib:x86_64+0x16ed8)

0x602000000155 is located 0 bytes to the right of 5-byte region [0x602000000150,0x602000000155)
allocated by thread T0 here:
    #0 0x1034384c3 in wrap_malloc (lib__rustc__clang_rt.asan_osx_dynamic.dylib:x86_64+0x594c3)
    #1 0x103350076 in alloc::alloc::alloc::h53fc6441db79abe3 alloc.rs:72
    #2 0x10334ff60 in _$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Alloc$GT$::alloc::hd3d2d4d4a708d0da alloc.rs:148
    #3 0x10334ec7b in _$LT$alloc..raw_vec..RawVec$LT$T$C$$u20$A$GT$$GT$::reserve_internal::h53e5cbdb4cc68b7c raw_vec.rs:668
    #4 0x10334fc58 in _$LT$alloc..raw_vec..RawVec$LT$T$C$$u20$A$GT$$GT$::reserve::h9bfdae6bf8258c89 raw_vec.rs:491
    #5 0x10334cf8f in _$LT$alloc..vec..Vec$LT$T$GT$$GT$::reserve::h6fc500c442273c7e vec.rs:457
    #6 0x10334c930 in _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$alloc..vec..SpecExtend$LT$$RF$$u27$a$u20$T$C$$u20$core..slice..Iter$LT$$u27$a$C$$u20$T$GT$$GT$$GT$::spec_extend::h022b22ee3d0d61a3 vec.rs:1906
    #7 0x10334cc84 in _$LT$alloc..vec..Vec$LT$T$GT$$GT$::extend_from_slice::h1dfb6096c32a99e1 vec.rs:1351
    #8 0x10333b177 in _$LT$libflate..lz77..default..DefaultLz77Encoder$u20$as$u20$libflate..lz77..Lz77Encode$GT$::encode::hba9253edc2c258af default.rs:65
    #9 0x103342b17 in a::main::h655e7f278e236686 main.rs:5
    #10 0x10332498d in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hba64913d4b77ef20 rt.rs:64
    #11 0x103360c47 in std::panicking::try::do_call::h6e8c55b5404ca92b panicking.rs:297
    #12 0x10336322e in __rust_maybe_catch_panic lib.rs:92
    #13 0x10336160d in std::rt::lang_start_internal::h616fb8704166427f rt.rs:48
    #14 0x1033248fe in std::rt::lang_start::h75c0523eec08eff1 rt.rs:64
    #15 0x103342d11 in main (a:x86_64+0x10002ed11)
    #16 0x7fff6363eed8 in start (libdyld.dylib:x86_64+0x16ed8)

SUMMARY: AddressSanitizer: heap-buffer-overflow default.rs:114 in libflate::lz77::default::prefix::hbc520ed13fc89540
Shadow bytes around the buggy address:
  0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c0400000000: fa fa fd fd fa fa fd fd fa fa 00 00 fa fa 00 07
  0x1c0400000010: fa fa 00 04 fa fa 00 00 fa fa 00 06 fa fa fd fa
=>0x1c0400000020: fa fa 05 fa fa fa 00 00 fa fa[05]fa fa fa fd fd
  0x1c0400000030: fa fa fd fd fa fa fd fa fa fa 00 04 fa fa fa fa
  0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==45039==ABORTING
Abort trap: 6

The relevant function contains an unchecked read, which is unsound: there is no guarantee that buf.len() >= 3, and in this issue buf.len() == 2.

#[inline]
fn prefix(buf: &[u8]) -> [u8; 3] {
    unsafe {
        [
            *buf.get_unchecked(0),
            *buf.get_unchecked(1),
            *buf.get_unchecked(2),
        ]
    }
}


A longer, equivalent version:

use libflate::gzip::Encoder;
use std::error::Error;
use std::io::Write;

fn main() -> Result<(), Box<dyn Error>> {
    let mut encoder = Encoder::new(Vec::new())?;
    encoder.write(b"aaaaa")?;
    encoder.finish();
    Ok(())
}

Encoder panic on specific input data...

Greetings, and thanks for all your hard work on this very useful library!

When I run this code, I see:

Test data size is 16052 bytes
compressed 16031 bytes into 2707
thread 'main' panicked at 'symbol:15, table:12', /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/macros.rs:16:9
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: std::panicking::rust_panic_with_hook
   6: rust_begin_unwind
   7: std::panicking::begin_panic_fmt
   8: <libflate::deflate::symbol::DynamicHuffmanCodec as libflate::deflate::symbol::HuffmanCodec>::save::{{closure}}
   9: core::iter::traits::iterator::Iterator::position::check::{{closure}}
  10: core::iter::traits::double_ended::DoubleEndedIterator::try_rfold
  11: <core::iter::adapters::Rev<I> as core::iter::traits::iterator::Iterator>::try_fold
  12: core::iter::traits::iterator::Iterator::position
  13: <libflate::deflate::symbol::DynamicHuffmanCodec as libflate::deflate::symbol::HuffmanCodec>::save
  14: libflate::deflate::encode::CompressBuf<H,E>::flush
  15: libflate::deflate::encode::BlockBuf<E>::flush
  16: libflate::deflate::encode::Block<E>::finish
  17: libflate::deflate::encode::Encoder<W,E>::finish
  18: bug_repro_case::main
  19: std::rt::lang_start::{{closure}}
  20: std::rt::lang_start_internal
  21: std::rt::lang_start
  22: main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I have reproduced this on macOS 10.15 and Ubuntu Linux. My compiler is rustc 1.44.1 (c7087fe00 2020-06-17).

Here's the logic from the linked repo (with the test data removed for brevity):

use std::io::Write;

use libflate::deflate::Encoder;

fn main() -> std::io::Result<()> {
    let mut data: Vec<u8> = get_test_data();

    eprintln!("Test data size is {} bytes", data.len());

    let limit1 = 16_031usize;
    let limit2 = limit1 + 1;

    // Attempt 1 (should succeed)
    //
    let mut encoder = Encoder::new(Vec::new());
    encoder.write_all(&data[0..limit1])?;
    let compressed: Vec<u8> = encoder.finish().into_result()?;

    eprintln!("compressed {} bytes into {}", limit1, compressed.len());

    // Attempt 2 (will fail)
    //
    let mut encoder = Encoder::new(Vec::new());
    encoder.write_all(&data[0..limit2])?;
    let compressed: Vec<u8> = encoder.finish().into_result()?;

    eprintln!("compressed {} bytes into {}", limit2, compressed.len());

    Ok(())
}

fn get_test_data() -> Vec<u8> {
...
}

Thanks again for taking a look!

zLib decoding fails with failed to fill whole buffer

I'm trying to decode a zlib-compressed Vec<u8>. The same data decodes fine in C# and TypeScript, but in Rust it fails immediately with that error message.
This is the code I use:

let mut buffer = Vec::new();
buffer = binary_reader.read_bytes_at(file_record.size as usize, file_record.offset as usize).unwrap(); // works fine
println!("Buffer: {:?}", buffer); // Works! Prints: Buffer: [120, 218, 251, 255, 207, 144, 193, 138, 193, 151, 161, 146, 33, 143, 33, 149, 161, 156, 161, 24, 72, 38, 51, 148, 48, 100, 50, 228, 3, 69, 120, 25, 184, 24]
if file_record.is_compressed {
    let mut decoder = Decoder::new(&buffer[..]).unwrap();
    match decoder.read_to_end(&mut buffer.clone()) {
        Ok(_) => {
            println!("{:?}", buffer);
        }
        Err(e) => {
            println!("Unexpected end of file\n[{}]",e)
        }
    }
}

Buffer: [120, 218, 251, 255, 207, 144, 193, 138, 193, 151, 161, 146, 33, 143, 33, 149, 161, 156, 161, 24, 72, 38, 51, 148, 48, 100, 50, 228, 3, 69, 120, 25, 184, 24]
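Independent of the reported error, one detail in the snippet above is worth noting: `read_to_end(&mut buffer.clone())` writes the decoded bytes into a temporary copy that is dropped immediately, so the later `println!` would show the compressed input even on success. A minimal sketch of collecting the output into its own vector, using only the standard library, follows. The helper name `read_all` is hypothetical; it accepts any `Read` implementation, which is assumed to include the decoder used above since `read_to_end` is called on it.

```rust
use std::io::{self, Read};

/// Reads everything from `reader` into a fresh vector and returns it.
/// Hypothetical helper for illustration; works with any `Read` implementation.
pub fn read_all<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    // A dedicated output vector keeps the decoded bytes separate from the input.
    let mut decoded = Vec::new();
    reader.read_to_end(&mut decoded)?;
    Ok(decoded)
}
```

This does not explain the "failed to fill whole buffer" error, which indicates that the reader hit end of input before the stream was complete.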

Panics on debug_assert failures

I am able to trigger 3 debug_assert failures by passing inputs to this library. I found them using honggfuzz and through the usvg library.

I opened issues over there but they are caused by this library:

The inputs I gave to that library are given, along with the backtraces. I figure there should be some checks and handling added somewhere.

gzip decode panics on wasm-unknown-unknown

I'm using gzip to minimize the size of files embedded in the WASM. However, to decode the gzip compressed file the std::time module is used here:

https://github.com/sile/libflate/blob/master/src/gzip.rs#L148

This is not implemented on wasm-unknown-unknown and results in the following panic:

panicked at 'Time system call is not implemented by WebAssembly host', src/libstd/sys/wasm/mod.rs:303:13

The time values are overwritten shortly after, so the usage of the std::time module isn't even necessary here.
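As a rough illustration of the point above, a decoder can start from a placeholder modification time of 0 and fill in the real value from the stream, without ever touching the system clock. The type `HeaderSketch` and its methods are hypothetical and greatly simplified; they are not libflate's actual header type. The field layout (MTIME in bytes 4..8, little-endian) follows the gzip format in RFC 1952.

```rust
/// Hypothetical, simplified gzip header used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct HeaderSketch {
    pub modification_time: u32,
}

impl HeaderSketch {
    /// Placeholder used before parsing. It performs no clock access,
    /// so it is usable on targets without a time source.
    pub fn placeholder() -> Self {
        HeaderSketch { modification_time: 0 }
    }

    /// Reads MTIME from the fixed 10-byte gzip header
    /// (bytes 4..8, little-endian, per RFC 1952).
    pub fn from_fixed_header(bytes: &[u8; 10]) -> Self {
        let mtime = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        HeaderSketch { modification_time: mtime }
    }
}
```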

Decoder may expose contents of uninitialized memory in the output

libflate might expose contents of uninitialized memory in the output when given a crafted input. This may be a devastating vulnerability in some contexts, e.g. if used as deflate backend for a PNG decoder. Details and impact analysis for similar bugs in PNG decoders in C can be found here.

I am confident that a private function is vulnerable, but I am not sure whether this vulnerability can be exploited by supplying a malformed input; there could be some non-local checks that prevent it.

I shall relay further details on the issue to the maintainer privately by email.

Convenience functions?

Hi there,

I was thinking about opening up a pull request with convenience functions, and was wondering if the maintainers of this crate would be open to merging such a PR.

Specifically, I'm thinking of creating a bunch of functions (one per format mod) with signatures similar to this:

use std::io::{Read, Result, Write};

/// Decodes a complete in-memory stream.
pub fn decode(input: &[u8]) -> Result<Vec<u8>> {
    let mut output = Vec::with_capacity(input.len());
    let mut decoder = Decoder::new(input)?;
    decoder.read_to_end(&mut output)?;
    Ok(output)
}

/// Encodes a complete in-memory buffer.
pub fn encode(input: &[u8]) -> Result<Vec<u8>> {
    let mut encoder = Encoder::new(Vec::with_capacity(input.len()))?;
    // write_all, not write: a single write may accept only part of the input.
    encoder.write_all(input)?;
    let mut output = encoder.finish().into_result()?;
    output.shrink_to_fit();
    Ok(output)
}

This code is just back-of-the-napkin sketches of what these convenience functions would look like.

I find that these types of convenience functions are very nice when I run into them in other encoding/decoding crates and was thinking they would make a nice addition here too.

If you give me the go-ahead, I'll work up a proper PR for review.

Possible concurrency issue during deflate on libflate 0.1.25

I'm using zip-rs with rayon to zip many files concurrently (I think 16x? Whatever rayon ends up using on this machine). This works fine with compression level store, and with bzip2. (Bzip2 is also way, way faster, 150+MB/s instead of 10MB/s, but that is... another issue.)

All of the actual zipping is happening inside each loop iteration that is parallelized: open the source and destination files, io::copy, close the file when done.

When not using par_iter it completes successfully (but takes 14 hours to do so on a small test dataset of 283GB.)

With deflate (which uses this library), I eventually run into the stack trace below, on win64.

It takes about an hour or so to run into the issue and doesn't seem to be on the same file every time, so reducing this test case seems difficult.

I've also gotten an illegal instruction once or twice which makes me think there is undefined behavior somewhere in this lib.

Thread '<unnamed>' panicked at 'index out of bounds: the len is 15 but the index is 15', /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\src\libcore\slice\mod.rs:2695:10
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
thread '<unnamed>' panicked at 'index out of bounds: the len is 15 but the index is 15', /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\src\libcore\slice\mod.rs:2695:10
stack backtrace:
   0:     0x7ff62b6a7aad - std::sys::windows::backtrace::set_frames
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\sys\windows\backtrace\mod.rs:95
   1:     0x7ff62b6a7aad - std::sys::windows::backtrace::unwind_backtrace
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\sys\windows\backtrace\mod.rs:82
   2:     0x7ff62b6a7aad - std::sys_common::backtrace::_print
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\sys_common\backtrace.rs:71
   3:     0x7ff62b6aacad - std::sys_common::backtrace::print
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\sys_common\backtrace.rs:59
   4:     0x7ff62b6aacad - std::panicking::default_hook::{{closure}}
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:197
   5:     0x7ff62b6aa9aa - std::panicking::default_hook
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:211
   6:     0x7ff62b6ab53f - std::panicking::rust_panic_with_hook
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:474
   7:     0x7ff62b6ab073 - std::panicking::continue_panic_fmt
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:381
   8:     0x7ff62b6aaf58 - std::panicking::rust_begin_panic
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:308
   9:     0x7ff62b6b8b1b - core::panicking::panic_fmt
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libcore\panicking.rs:85
  10:     0x7ff62b6b8ad9 - core::panicking::panic_bounds_check
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libcore\panicking.rs:61
  11:     0x7ff62b54505c - <core::iter::adapters::Rev<I> as core::iter::traits::iterator::Iterator>::try_fold::h4df698d468c8a2ab
  12:     0x7ff62b537e30 - <libflate::deflate::symbol::DynamicHuffmanCodec as libflate::deflate::symbol::HuffmanCodec>::save::h0b55c10f5385fd10
  13:     0x7ff62b54023a - libflate::deflate::encode::Encoder<W>::new::h14c0d67a7d5ae201
  14:     0x7ff62b540664 - libflate::deflate::encode::Encoder<W,E>::finish::h5e76f2387f1275f7
  15:     0x7ff62b53ca1e - zip::write::ZipWriter<W>::finalize::h7a22fb7f667ff968
  16:     0x7ff62b53bbc4 - zip::write::ZipWriter<W>::start_file::hae527b1388e644b5
  17:     0x7ff62b53be81 - zip::write::ZipWriter<W>::finalize::h7a22fb7f667ff968
  18:     0x7ff62b53da4e - <zip::write::ZipWriter<W> as core::ops::drop::Drop>::drop::h457d134e7a1b2002
  19:     0x7ff62b52c73f - core::ptr::real_drop_in_place::h8b27e2444c24fda3
  20:     0x7ff62b52ec4a - mame_coalesce::rom::zip::write_zip::hc9fe388a39a2a3b3
  21:     0x7fff553e102f - <unknown>
  22:     0x7fff553e3595 - is_exception_typeof
  23:     0x7fff553ebb23 - _C_specific_handler
  24:     0x7fff553e2fec - is_exception_typeof
  25:     0x7fff553ebfe0 - _CxxFrameHandler3
  26:     0x7fff6aec47fe - _chkstk
  27:     0x7fff6ae2600b - RtlUnwindEx
  28:     0x7fff553ebe48 - _C_specific_handler
  29:     0x7fff553e2688 - is_exception_typeof
  30:     0x7fff553e29e2 - is_exception_typeof
  31:     0x7fff553e30f2 - is_exception_typeof
  32:     0x7fff553ebfe0 - _CxxFrameHandler3
  33:     0x7fff6aec477e - _chkstk
  34:     0x7fff6ae24bee - RtlWalkFrameChain
  35:     0x7fff6ae289e5 - RtlRaiseException
  36:     0x7fff67d99128 - RaiseException
  37:     0x7fff553e486c - CxxThrowException
  38:     0x7ff62b6b0777 - panic_unwind::imp::panic
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libpanic_unwind\seh.rs:281
  39:     0x7ff62b6b0777 - panic_unwind::__rust_start_panic
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libpanic_unwind\lib.rs:101
  40:     0x7ff62b6ab717 - std::panicking::rust_panic
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:523
  41:     0x7ff62b6ab5ed - std::panicking::rust_panic_with_hook
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:494
  42:     0x7ff62b6ab073 - std::panicking::continue_panic_fmt
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:381
  43:     0x7ff62b6aaf58 - std::panicking::rust_begin_panic
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libstd\panicking.rs:308
  44:     0x7ff62b6b8b1b - core::panicking::panic_fmt
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libcore\panicking.rs:85
  45:     0x7ff62b6b8ad9 - core::panicking::panic_bounds_check
                               at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c\/src\libcore\panicking.rs:61
  46:     0x7ff62b54505c - <core::iter::adapters::Rev<I> as core::iter::traits::iterator::Iterator>::try_fold::h4df698d468c8a2ab
  47:     0x7ff62b537e30 - <libflate::deflate::symbol::DynamicHuffmanCodec as libflate::deflate::symbol::HuffmanCodec>::save::h0b55c10f5385fd10
  48:     0x7ff62b54023a - libflate::deflate::encode::Encoder<W>::new::h14c0d67a7d5ae201
  49:     0x7ff62b53fadd - <libflate::deflate::encode::Encoder<W,E> as std::io::Write>::write::h11ec20597a5b3e80
  50:     0x7ff62b53ab33 - std::io::Write::write_all::h9a21696f30eac3c4
  51:     0x7ff62b538631 - std::io::util::copy::h7fe9620ed861a5de
  52:     0x7ff62b52e3a2 - mame_coalesce::rom::zip::write_zip::hc9fe388a39a2a3b3

Out of bounds read when decoding a malformed zlib file

libflate performs reads from uninitialized memory when decoding a zlib file when given certain malformed inputs. The accessed address is out of bounds for any buffer allocated by the code.

This may pose a security vulnerability; I am still investigating the actual impact of this bug. Examples of similar vulnerabilities in C code and discussion of the potential impact can be found here.

This issue has been discovered using afl.rs and Address Sanitizer. I shall relay further details on the issue to the maintainer privately by email.

Panic in `bug` when parsing malformed file

#[test]
fn test_bug() {
    let input = b"\x04\x04\x04\x05:\x1az*\xfc\x06\x01\x90\x01\x06\x01";
    let mut decoder = Decoder::new(&input[..]);
                                                                        
    let result = io::copy(&mut decoder, &mut io::sink());
}
thread 'deflate::decode::tests::test_bug' panicked at 'bug', src/huffman.rs:124:43

Some cooperation?

Hi.

So I've been thinking about making a post on the Rust forums about coordinating efforts to design a full Rust replacement for the flate2 crate (which uses C libraries), as DEFLATE is a very widely used compression format that's even used in the Rust compiler. I thought I would ask about some cooperation here first.

I've been working on my own pure Rust deflate encoder for a long time now. My original goal was to make a companion to the inflate crate (libflate didn't exist back then). This crate (libflate) seems to be much more complete and generally faster than inflate, though, and the code seems cleaner and more Rust-like. (I have a tool for doing comparisons here: https://github.com/oyvindln/compression-testing.) The encoder part of libflate is, however, quite bare-bones compared to flate2. As my deflate crate has more features, compresses a lot more, and is generally faster at compressing than libflate at comparable compression levels (but not as good as flate2 speed-wise), I was wondering if you were interested in trying to leverage this work to improve the decoding part of libflate (not entirely sure what the best approach would be), rather than having several competing deflate crates.

I hope this doesn't come off too much as self-promotion.

impl Drop

The Encoders should have impl Drop that finishes the compressed stream upon drop(). BufWriter does something similar (flushes its buffer upon drop()).

My use case is Write trait objects. Consider the following trait object that writes compressed output to stdout...

fn make_writer_object() -> Box<dyn Write> {
    let stdout = io::stdout();
    let buf_stdout = io::BufWriter::new(stdout);
    let gzip = gzip::Encoder::new(buf_stdout).unwrap();
    Box::new(gzip)
}

This doesn't work because the last bit of compressed data gets cut off.
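The requested behavior can be sketched with a small standalone wrapper that writes its trailing bytes when dropped, similar in spirit to how BufWriter flushes on drop. Everything here is hypothetical: `FinishOnDrop` and the `TRAILER` constant are stand-ins for an encoder and its final block, not libflate types.

```rust
use std::io::{self, Write};

/// Stand-in for whatever an encoder must emit to complete its stream.
const TRAILER: &[u8] = b"<END>";

/// Hypothetical wrapper that finishes its stream on drop.
pub struct FinishOnDrop<W: Write> {
    inner: W,
    finished: bool,
}

impl<W: Write> FinishOnDrop<W> {
    pub fn new(inner: W) -> Self {
        FinishOnDrop { inner, finished: false }
    }

    /// Explicit finish: preferred when the caller wants to see I/O errors.
    /// Calling it more than once writes the trailer only once.
    pub fn finish(&mut self) -> io::Result<()> {
        if !self.finished {
            self.finished = true;
            self.inner.write_all(TRAILER)?;
            self.inner.flush()?;
        }
        Ok(())
    }
}

impl<W: Write> Write for FinishOnDrop<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

impl<W: Write> Drop for FinishOnDrop<W> {
    fn drop(&mut self) {
        // Errors cannot be returned from drop, so they are ignored here.
        let _ = self.finish();
    }
}
```

The trade-off is the same one BufWriter has: errors raised while finishing in drop are silently lost, so an explicit finish remains the safer path when the caller can use it.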

Private function prefix() is unsound

The following function may perform out-of-bounds reads if used incorrectly:

fn prefix(buf: &[u8]) -> [u8; 3] {
    unsafe {
        [
            *buf.get_unchecked(0),
            *buf.get_unchecked(1),
            *buf.get_unchecked(2),
        ]
    }
}

There have already been two known cases of out-of-bounds reads due to misuse of this function: #16, #21.

In the current implementation it's the caller's responsibility to ensure no out-of-bounds reads occur. If left as-is, this function must be marked unsafe. A better option would be getting rid of unsafety entirely.
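One way to remove the unsafety entirely is to make the length check part of the function's return type. The following is only a sketch of that idea, not the fix that was applied in libflate; changing the return type to `Option` is an assumption about what callers could tolerate.

```rust
/// Returns the first three bytes of `buf`, or `None` if it is shorter than that.
/// Sketch of a safe alternative; the `Option` return type is an assumption.
pub fn prefix(buf: &[u8]) -> Option<[u8; 3]> {
    match buf {
        // The slice pattern performs the bounds check; no indexing is needed.
        [a, b, c, ..] => Some([*a, *b, *c]),
        _ => None,
    }
}
```

Callers then have to decide what a too-short buffer means at each call site, which is exactly the decision the unchecked version left implicit.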

Compression level

Hi,

Is there any way to set the compression level for the gzip encoder?
I'm using the following code for encoding, and I found that, compared to the Node.js implementation, this library produces output about twice as large.

    use libflate::gzip::{Encoder, Decoder};

    // .......

    let mut encoder = Encoder::new(Vec::new())?;
    io::copy(&mut v.as_bytes(), &mut encoder)?;
    let compressed_bytes = encoder.finish().into_result()?;
