image-rs / deflate-rs
An implementation of a DEFLATE encoder in Rust
License: Apache License 2.0
flate2 is listed as a dev-dependency. Unfortunately, there's no way to have optional dev-dependencies. flate2 has a lot of dependencies of its own, and it would be great to avoid having to download and build them during our Docker builds.
I understand that the flate2 dev-dependency is used for comparative benchmarks. It would be great if the benchmarks could be moved to a separate crate (e.g. deflate-bench, similar to crypto-bench) so that projects that don't run the benchmarks don't need to pull in flate2.
Add configuration options for tweaking the compression behaviour, such as window size, compression level, etc.
When using deflate on a mips64 / mips64el target, the deflate crate panics with the following assertion:
---- parse::llanfair stdout ----
thread 'parse::llanfair' panicked at 'The generated length codes were not valid!', /cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.16/src/length_encode.rs:393:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Hi, I am not quite sure whether this project is intended only for deflate. If not, are there any plans to add an inflate method?
error: macro undefined: 'assert_ne!'
--> src/writer.rs:19:5
|
19 | assert_ne!(flush_mode, Flush::None);
| ^^^^^^^^^
|
= help: did you mean assert_eq!?
error: aborting due to previous error
error: Could not compile `deflate`.
Thanks! I am a beginner at Rust.
The AFL fuzz run I started in #37 found a crash (after 37 days and 2 cycles!) where the decompressed data does not match the input (with CompressionOptions::default()). I'll look into it in the next few days and also PR the fuzz binary. In the meantime, here's the crash input.
id:000000,sig:06,src:000831,op:havoc,rep:64.zip
Miniz has a special fast compression function that's used when max_hash_checks is set to one. It seems to provide something in-between normal compression using hash_checks = 1 and rle mode.
I'm opening this issue because updating deflate from 0.8.4 to 0.8.5 in dezoomify-rs makes the tests fail. deflate is an indirect dependency of this project, used to generate and read PNG files.
The following image is a PNG created with png v0.16.6 and deflate v0.8.5 which cannot be opened with the same libraries:
The error returned is CorruptFlateStream.
Running the same code (generating and reading the PNG) with png v0.16.6 and deflate v0.8.4 works without errors.
Using flush on the writer currently ends the compression stream and writes a trailer (without resetting the encoder). The writers in flate2 call miniz/zlib with SYNC_FLUSH, which outputs the current pending data and adds an empty block at the end. This is probably the behaviour we should emulate.
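For reference, a DEFLATE sync flush ends the pending block and emits an empty stored block, leaving the output byte-aligned so a decoder can consume everything written so far. A minimal sketch of the marker bytes, assuming the stream is already byte-aligned when flushing (the helper name is hypothetical, not part of this crate):

```rust
/// Append a DEFLATE sync-flush marker (an empty stored block) to `out`.
/// Assumes the bit stream is currently byte-aligned. Layout: a 3-bit
/// block header (BFINAL = 0, BTYPE = 00 for "stored"), zero padding to
/// the next byte boundary, then LEN = 0x0000 and NLEN = !LEN = 0xFFFF.
fn append_sync_flush_marker(out: &mut Vec<u8>) {
    out.push(0x00); // 000 header bits + 00000 padding
    out.extend_from_slice(&[0x00, 0x00]); // LEN = 0
    out.extend_from_slice(&[0xFF, 0xFF]); // NLEN = 0xFFFF
}

fn main() {
    let mut out = Vec::new();
    append_sync_flush_marker(&mut out);
    // The well-known `00 00 FF FF` tail that decoders look for.
    assert_eq!(&out[1..], &[0x00, 0x00, 0xFF, 0xFF]);
}
```

When the stream is not byte-aligned, the three header bits share a byte with the previous block's last bits, so the marker may be preceded by a partial byte rather than a full 0x00.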
We want to avoid the compressed stream expanding more than needed when encountering incompressible (high-entropy) data. Ideally, stored blocks should be output if compressing a block fails to reduce the size, but we want to do this without having to keep an excessively large input buffer.
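As a rough sketch of that decision (the function name and the flat 5-byte overhead figure are assumptions; a stored block costs a 3-bit header, padding to a byte boundary, and the 4-byte LEN/NLEN fields):

```rust
/// Decide whether to emit a stored block instead of a compressed one.
/// A stored block adds roughly 5 bytes of overhead per block, so it
/// wins whenever the compressed form fails to beat that. Sizes in bytes.
fn use_stored_block(input_len: usize, compressed_len: usize) -> bool {
    compressed_len >= input_len + 5
}

fn main() {
    assert!(use_stored_block(1000, 1200)); // compression expanded the data
    assert!(!use_stored_block(1000, 900)); // compression helped
}
```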
Also happens when I run doc tests on Windows.
Encoding Huffman length values misses a trailing zero value if the last few length values before it result in outputting the code for repeating the previous length value. If this happens, decoding the data will produce garbage. Since this can only happen with data that ends up producing no distance values (any subsequent zeroes are ignored when encoding), it's exceedingly rare for this to actually occur.
A fix is incoming.
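For context, RFC 1951 run-length encodes the code lengths themselves: symbols 0-15 are literal lengths, 16 repeats the previous length 3-6 times, 17 repeats zero 3-10 times, and 18 repeats zero 11-138 times. The sketch below is a simplified version of that scheme (not this crate's actual implementation) with a round-trip decoder, which is the kind of check that catches a dropped trailing value:

```rust
// Each output pair is (symbol, extra-bits value), per RFC 1951 3.2.7.
fn rle_encode(lengths: &[u8]) -> Vec<(u8, u8)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < lengths.len() {
        let cur = lengths[i];
        let mut run = 1;
        while i + run < lengths.len() && lengths[i + run] == cur {
            run += 1;
        }
        if cur == 0 {
            let mut left = run;
            while left >= 11 {
                let n = left.min(138);
                out.push((18, (n - 11) as u8));
                left -= n;
            }
            if left >= 3 {
                out.push((17, (left - 3) as u8));
                left = 0;
            }
            for _ in 0..left {
                out.push((0, 0));
            }
        } else {
            out.push((cur, 0));
            let mut left = run - 1;
            while left >= 3 {
                let n = left.min(6);
                out.push((16, (n - 3) as u8));
                left -= n;
            }
            for _ in 0..left {
                out.push((cur, 0));
            }
        }
        i += run;
    }
    out
}

fn rle_decode(codes: &[(u8, u8)]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    for &(sym, extra) in codes {
        match sym {
            16 => {
                let prev = *out.last().expect("symbol 16 needs a previous length");
                for _ in 0..(extra as usize + 3) {
                    out.push(prev);
                }
            }
            17 => out.extend(std::iter::repeat(0).take(extra as usize + 3)),
            18 => out.extend(std::iter::repeat(0).take(extra as usize + 11)),
            len => out.push(len),
        }
    }
    out
}

fn main() {
    // Runs ending in repeat codes followed by trailing zeroes: the shape
    // of input the bug description is about.
    let a: Vec<u8> = vec![3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2];
    assert_eq!(rle_decode(&rle_encode(&a)), a);
    let b: Vec<u8> = vec![5, 5, 5, 5, 5, 5, 5, 5, 0, 0, 0];
    assert_eq!(rle_decode(&rle_encode(&b)), b);
}
```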
This seems to significantly degrade compression ratio on many larger files.
This code panics when compiled in debug mode:
extern crate deflate;

use std::io::Write;

use deflate::write::GzEncoder;
use deflate::CompressionOptions;

fn main() {
    let fp = Vec::new();
    let mut fp = GzEncoder::new(fp, CompressionOptions::default());
    fp.write(&[0]).unwrap();
    fp.flush().unwrap();
    fp.write(&[0]).unwrap();
    fp.write(&[0, 0]).unwrap();
}
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `0`', /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/chained_hash_table.rs:141:9
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::print
at libstd/sys_common/backtrace.rs:71
at libstd/sys_common/backtrace.rs:59
2: std::panicking::default_hook::{{closure}}
at libstd/panicking.rs:211
3: std::panicking::default_hook
at libstd/panicking.rs:227
4: std::panicking::rust_panic_with_hook
at libstd/panicking.rs:475
5: std::panicking::continue_panic_fmt
at libstd/panicking.rs:390
6: std::panicking::begin_panic_fmt
at libstd/panicking.rs:345
7: deflate::chained_hash_table::ChainedHashTable::add_hash_value
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/chained_hash_table.rs:141
8: deflate::lz77::process_chunk_lazy
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/lz77.rs:346
9: deflate::lz77::process_chunk
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/lz77.rs:217
10: deflate::lz77::lz77_compress_block
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/lz77.rs:655
11: deflate::compress::compress_data_dynamic_n
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/compress.rs:130
12: deflate::writer::compress_until_done
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/writer.rs:25
13: <deflate::writer::DeflateEncoder<W> as std::io::Write>::flush
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/writer.rs:137
14: <deflate::writer::gzip::GzEncoder<W> as std::io::Write>::flush
at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/writer.rs:454
15: repro::main
at src/main.rs:15
16: std::rt::lang_start::{{closure}}
at /checkout/src/libstd/rt.rs:74
17: std::panicking::try::do_call
at libstd/rt.rs:59
at libstd/panicking.rs:310
18: __rust_maybe_catch_panic
at libpanic_unwind/lib.rs:106
19: std::rt::lang_start_internal
at libstd/panicking.rs:289
at libstd/panic.rs:392
at libstd/rt.rs:58
20: std::rt::lang_start
at /checkout/src/libstd/rt.rs:74
21: main
22: __libc_start_main
23: _start
rustc 1.29.0-nightly (6a1c0637c 2018-07-23)
x86_64-unknown-linux-gnu
deflate 0.7.18
I recently made use of this library in an effort to render a Mandelbrot set, but in my quest for ever higher resolutions I ran into a problem: an internal assertion fails if I exceed 4800×4800 pixels.
You can find the relevant code here: https://github.com/ElectricCoffee/mandelbrot
The error that is generated goes like this:
PS D:\Code\rust\mandelbrot> cargo run
Compiling num-complex v0.2.4
Compiling mandelbrot v0.1.0 (D:\Code\rust\mandelbrot)
Finished dev [unoptimized + debuginfo] target(s) in 4.04s
Running `target\debug\mandelbrot.exe`
Generating mandelbrot_8000x8000.png.
Generating image data, hold tight...
Flattening...
Writing image data...
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `192008000`,
right: `192008189`', <::std::macros::panic macros>:5:6
stack backtrace:
0: backtrace::backtrace::trace_unsynchronized
at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace\mod.rs:66
1: std::sys_common::backtrace::_print_fmt
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:77
2: std::sys_common::backtrace::_print::{{impl}}::fmt
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:59
3: core::fmt::write
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libcore\fmt\mod.rs:1052
4: std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\io\mod.rs:1426
5: std::sys_common::backtrace::_print
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:62
6: std::sys_common::backtrace::print
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:49
7: std::panicking::default_hook::{{closure}}
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:204
8: std::panicking::default_hook
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:224
9: std::panicking::rust_panic_with_hook
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:472
10: std::panicking::begin_panic_handler
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:380
11: std::panicking::begin_panic_fmt
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:334
12: deflate::writer::compress_until_done<alloc::vec::Vec<u8>>
at <::std::macros::panic macros>:5
13: deflate::writer::ZlibEncoder<alloc::vec::Vec<u8>>::output_all<alloc::vec::Vec<u8>>
at C:\Users\Electric Coffee\.cargo\registry\src\github.com-1ecc6299db9ec823\deflate-0.8.3\src\writer.rs:205
14: deflate::writer::ZlibEncoder<alloc::vec::Vec<u8>>::finish<alloc::vec::Vec<u8>>
at C:\Users\Electric Coffee\.cargo\registry\src\github.com-1ecc6299db9ec823\deflate-0.8.3\src\writer.rs:212
15: png::encoder::Writer<std::io::buffered::BufWriter<std::fs::File>>::write_image_data<std::io::buffered::BufWriter<std::fs::File>>
at C:\Users\Electric Coffee\.cargo\registry\src\github.com-1ecc6299db9ec823\png-0.16.1\src\encoder.rs:172
16: mandelbrot::main
at .\src\main.rs:73
17: std::rt::lang_start::{{closure}}<()>
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\src\libstd\rt.rs:67
18: std::rt::lang_start_internal::{{closure}}
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\rt.rs:52
19: std::panicking::try::do_call<closure-0,i32>
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:305
20: panic_unwind::__rust_maybe_catch_panic
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libpanic_unwind\lib.rs:86
21: std::panicking::try
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:281
22: std::panic::catch_unwind
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panic.rs:394
23: std::rt::lang_start_internal
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\rt.rs:51
24: std::rt::lang_start<()>
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\src\libstd\rt.rs:67
25: main
26: invoke_main
at d:\agent\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
27: __scrt_common_main_seh
at d:\agent\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
28: BaseThreadInitThunk
29: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: process didn't exit successfully: `target\debug\mandelbrot.exe` (exit code: 101)
PS D:\Code\rust\mandelbrot>
The above error is for an image 8000×8000 px in size.
Need to set up AppVeyor; it's important to test on Windows, as the default stack size there is smaller.
Either use MaybeUninit, or see if the current compiler avoids the excessive stack copies so we can avoid unsafe altogether.
The current usage should be safe, as it's used with a Copy type that has no invalid values, though as stated in the deprecation blog post, mem::uninitialized may never be completely safe.
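A sketch of the MaybeUninit replacement pattern for such a buffer (the size, element type, and function name here are illustrative, not the crate's actual table):

```rust
use std::mem::MaybeUninit;

const TABLE_SIZE: usize = 1 << 15; // illustrative; not the crate's constant

/// Build a large array without `mem::uninitialized`: start from an
/// array of `MaybeUninit`, write every element, then convert.
fn filled_table() -> Box<[u16; TABLE_SIZE]> {
    let mut table = Box::new([MaybeUninit::<u16>::uninit(); TABLE_SIZE]);
    for (i, slot) in table.iter_mut().enumerate() {
        slot.write(i as u16);
    }
    // Safety: every element was initialized in the loop above, and
    // `MaybeUninit<u16>` has the same layout as `u16`.
    unsafe {
        std::mem::transmute::<Box<[MaybeUninit<u16>; TABLE_SIZE]>, Box<[u16; TABLE_SIZE]>>(table)
    }
}

fn main() {
    let t = filled_table();
    assert_eq!(t[0], 0);
    assert_eq!(t[TABLE_SIZE - 1], (TABLE_SIZE - 1) as u16);
}
```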
Add something akin to deflateSetDictionary
Should be relatively simple to implement.
Right now we only check one byte ahead, but we should check more bytes, and discard matches that are short but have a very long distance, to get a compression level similar to zlib and miniz.
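zlib's version of the second heuristic is a TOO_FAR cutoff: a match of the minimum length (3) is rejected when its distance is large, because encoding a long distance can cost more bits than emitting three literals. A sketch (the constant is zlib's; applying it here is an assumption about what "similar to zlib" would mean):

```rust
const MIN_MATCH: usize = 3;
const TOO_FAR: usize = 4096; // zlib's cutoff for minimum-length matches

/// Reject minimum-length matches whose distance is very long.
fn keep_match(length: usize, distance: usize) -> bool {
    length > MIN_MATCH || distance <= TOO_FAR
}

fn main() {
    assert!(!keep_match(3, 5000)); // short match, long distance: discard
    assert!(keep_match(4, 5000)); // longer match: keep
    assert!(keep_match(3, 100)); // short match, short distance: keep
}
```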
When the underlying writer only accepts very small write requests, the flush process of a ZlibEncoder appears to try to write the same data indefinitely.
extern crate deflate;

use std::io::{self, Write};

fn main() {
    let _ = deflate::write::ZlibEncoder::new(SmallWriter::new(vec![], 2), deflate::Compression::Fast).flush();
}

struct SmallWriter<W: Write> {
    writer: W,
    small: usize,
}

impl<W: Write> SmallWriter<W> {
    fn new(writer: W, buf_len: usize) -> SmallWriter<W> {
        SmallWriter {
            writer,
            small: buf_len,
        }
    }
}

impl<W: Write> Write for SmallWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Never write more than `small` bytes at a time.
        let small = buf.len().min(self.small);
        self.writer.write(&buf[..small])
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
Hi!
Using deflate through the image crate generates a panic in debug mode on line 50 of writer.rs.
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `60100`,
 right: `32769`', /home/christer/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.8/src/writer.rs:50
The performance is still not great compared to miniz/zlib on files with long runs of the same byte.
EDIT:
See next post.
Profiling reveals that lz77::longest_match and lz77::get_match_length are where most time is spent.
get_match_length is particularly problematic for data with many repetitions of one literal, which causes a lot of calls to this function (as there will be a large number of entries in the hash chain for the 3-byte sequences of that byte). Currently it uses two zipped iterators to compare the matches, which may not be ideal performance-wise. C implementations of deflate seem to check multiple bytes at once by casting the bytes to larger data types. I've tested this, but it didn't seem to make a difference.
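For reference, the "larger data types" trick usually looks like the following sketch (the function name is hypothetical): compare eight bytes at a time, and on a mismatch use trailing_zeros of the XOR to locate the first differing byte. Reading via from_le_bytes keeps the byte-index arithmetic correct on any target.

```rust
use std::convert::TryInto;

/// Length of the common prefix of `a` and `b`, compared 8 bytes at a
/// time. When an 8-byte chunk differs, the lowest set bit of the XOR
/// identifies the first differing byte.
fn match_length(a: &[u8], b: &[u8]) -> usize {
    let n = a.len().min(b.len());
    let mut i = 0;
    while i + 8 <= n {
        let x = u64::from_le_bytes(a[i..i + 8].try_into().unwrap());
        let y = u64::from_le_bytes(b[i..i + 8].try_into().unwrap());
        let diff = x ^ y;
        if diff != 0 {
            return i + (diff.trailing_zeros() / 8) as usize;
        }
        i += 8;
    }
    // Fewer than 8 bytes remain: fall back to a byte-wise scan.
    while i < n && a[i] == b[i] {
        i += 1;
    }
    i
}

fn main() {
    assert_eq!(match_length(b"abcdefghij", b"abcdefghXX"), 8);
    assert_eq!(match_length(b"abc", b"abd"), 2);
    assert_eq!(match_length(b"aaaa", b"aaaa"), 4);
}
```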
In the longest_match function, array lookups seem to be the main cause of the slowdown (maybe because further instructions depend on the loaded value?). If we can find a way to reduce the number of lookups, or the length of the hash chains, without impacting compression ratio, that would help improve performance.
For lower compression levels, other compressors simply hard-limit the length of the hash chains, and further adaptively reduce the chain length once a decent match has been found.
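A sketch of that adaptive reduction, mirroring zlib's good_match logic (the names, and treating the 4x cut as the right amount here, are assumptions):

```rust
/// Remaining hash-chain search budget: start from a per-level hard
/// limit and shrink it once the current best match is "good enough",
/// since further searching is then unlikely to pay off.
fn chain_budget(max_chain: usize, best_len: usize, good_len: usize) -> usize {
    if best_len >= good_len {
        max_chain >> 2 // zlib shrinks the budget by 4x in this case
    } else {
        max_chain
    }
}

fn main() {
    assert_eq!(chain_budget(128, 10, 8), 32); // good match found: search less
    assert_eq!(chain_budget(128, 4, 8), 128); // no good match yet: full budget
}
```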
Implement support for compression with a gzip header/trailer.
deflate-rs/src/output_writer.rs
Line 101 in 2385f2a
I don't personally think this is a big deal, and the readme is probably just outdated, but imo it should be corrected. Would something like "There is a single instance of unsafe behind a feature flag that proved to measurably affect performance" be ok?
Write is only supposed to be called once on the wrapped writer in each write call. Currently we call write a fair number of times for each call to {deflate/zlib writer}::write. In addition to violating the trait contract, the current implementation assumes that the writer will write all bytes on each write call, which is wrong and can cause compression to fail. We also shouldn't solve this by using write_all internally; see rust-lang/flate2-rs#92.
Fuzzing (fuzzer code here) triggers an assertion error on this line (it looks like an overflow issue):
thread '<unnamed>' panicked at 'assertion failed: `(left == right)` (left: `32767`, right: `0`)', /home/pascal/.cargo/git/checkouts/deflate-rs-44887ade842f84eb/8e1ec1e/src/chained_hash_table.rs:126
You can find the whole log here: https://gist.github.com/killercup/f117fd4a55ba3855b74d04acdfaf46d5 (make sure to look at the crash file in raw mode; it's encoded as a raw Rust byte string).