weezl's Introduction

weezl

LZW en- and decoding that goes weeeee!

Overview

This library, written in purely safe and dependency-less Rust, provides encoding and decoding for LZW compression in the style used by the GIF and TIFF image formats. It has a standalone binary that may be used to handle those data streams, but it is not compatible with Spencer's compress and uncompress binaries (though a drop-in may be developed at a later point).

Use in a no_std environment is also possible, though an allocator is required. This requirement, too, may be relaxed in a later release. A feature flag already exists but currently turns off almost all interfaces.

License

All code is dual licensed MIT OR Apache-2.0.

weezl's People

Contributors

fintelia, fornwall, grigorenkopv, heroickatora, kornelski, nwin, shutton, worldsender

weezl's Issues

Add a `skip` method, discarding some amount of input

In GIF, part of a frame may lie outside the region of interest. In these cases it would be worth investigating whether decoding can be sped up by skipping over and discarding some data. A similar strategy might be useful for seeking in compressed archives.

Support implicit reset

I am reverse-engineering a proprietary image format that uses LZW internally for compressing frames, and I also plan to write a converter for it in Rust as practice. I chose weezl because it looks promising and it's already a dependency of image-rs, which I am also using in the converter for reading and writing images in common formats.

However, one problem I ran into is that weezl doesn't accept bitstreams with no leading clear code. The official converter for that image format apparently always emits this kind of bitstream. Other than that, they seem to be standard LZW LSB bitstreams and should be supported by weezl. I saw that there's a TODO in the decoder source code. Any chance that this will be supported?
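To make the request concrete, here is a minimal sketch of an "implicit reset", working on already-unpacked codes rather than on a bitstream. It does not use weezl's API: `decode_codes` and its parameters are hypothetical, GIF-style clear and end codes are assumed, and the 12-bit code limit is ignored for brevity. The table starts out in the state it would have right after a clear code, so a stream without a leading clear code decodes the same as one with it.

```rust
/// Decode a sequence of already-unpacked LZW codes.
/// Hypothetical helper for illustration only; not part of weezl.
/// Assumes GIF-style codes: `clear = 1 << min_size`, `end = clear + 1`.
fn decode_codes(codes: &[u16], min_size: u8) -> Option<Vec<u8>> {
    if min_size > 8 {
        return None;
    }
    let clear = 1u16 << min_size;
    let end = clear + 1;
    // Derived entries only; entry `i` belongs to code `end + 1 + i`.
    // Starting empty is the "implicit reset": no leading clear code needed.
    let mut table: Vec<Vec<u8>> = Vec::new();
    let mut prev: Option<Vec<u8>> = None;
    let mut out = Vec::new();

    for &code in codes {
        if code == clear {
            // Explicit reset: same state as at the start of the stream.
            table.clear();
            prev = None;
            continue;
        }
        if code == end {
            break;
        }
        let entry: Vec<u8> = if code < clear {
            vec![code as u8] // literal
        } else {
            let idx = usize::from(code - end - 1);
            if idx < table.len() {
                table[idx].clone()
            } else if idx == table.len() {
                // Code refers to the entry about to be created:
                // previous string plus its own first byte.
                let p = prev.as_ref()?;
                let mut e = p.clone();
                e.push(p[0]);
                e
            } else {
                return None; // code not yet defined
            }
        };
        // Every code after the first one defines a new table entry.
        if let Some(mut new_entry) = prev.take() {
            new_entry.push(entry[0]);
            table.push(new_entry);
        }
        out.extend_from_slice(&entry);
        prev = Some(entry);
    }
    Some(out)
}
```

With `min_size = 2`, the code sequences `[0, 6, 0, 5]` and `[4, 0, 6, 0, 5]` (the second with a leading clear code, 4) both decode to four zero bytes.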

Add restore points

Complementing forward seeking (#8), add the ability to restore a particular state of decoding and encoding. For encoding specifically, this may in the future also be used to tune compression ratios by purposefully inserting additional reset codes or continuing with a full dictionary to optimize dictionary usage.

Rename repository to weezl

This crate is published under the name weezl. It might make sense to rename the repository to match, to avoid possible confusion.

New lzw encoder creates invalid streams

Encoding of large-ish images results in GIF images that look broken.

For example re-encoding of this image:

test-input

gives this file:

test-output

(Firefox refuses to render it. Chrome and macOS render only 20-something lines and garbage pixels.)

It's easy to reproduce with the example code. I've verified using another codebase that it's a bug in the GIF encoder, not the reader. The bug is in v0.11. It's not in v0.10.

unit tests can't be run from the crate downloaded from crates.io

This is more of a 'for your information' rather than a bug report about something that is wrong in the project. But I thought it wouldn't hurt to inform upstream about it :)

The unit tests depend on a file named /benches/binary-8-msb.lzw that isn't included in the crate uploaded to crates.io.

test output:

error: couldn't read /tmp/r/weezl-0.1.5/benches/binary-8-msb.lzw: No such file or directory (os error 2)
    --> src/decode.rs:1240:37
     |
1240 |           const FILE: &'static [u8] = include_bytes!(concat!(
     |  _____________________________________^
1241 | |             env!("CARGO_MANIFEST_DIR"),
1242 | |             "/benches/binary-8-msb.lzw"
1243 | |         ));
     | |__________^
     |
     = note: this error originates in the macro `include_bytes` (in Nightly builds, run with -Z macro-backtrace for more info)

error: could not compile `weezl` due to previous error
warning: build failed, waiting for other jobs to finish...
error: build failed

This means that we have to disable the tests when packaging this crate for Debian. Would it be possible to include the /benches/binary-8-msb.lzw file in the next release?
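As an aside, a test could also be made to tolerate the missing file. The sketch below reads the fixture at run time instead of embedding it with `include_bytes!`, so a packaged crate without the file still compiles. `load_fixture` is a hypothetical helper, not weezl code, and shipping the file in the published crate remains the more direct fix.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the fixture bytes, or `None` if the file
/// is not present (for example, because it was not packaged).
/// `relative` must not start with `/`, otherwise `join` would discard
/// `manifest_dir` and treat it as an absolute path.
fn load_fixture(manifest_dir: &Path, relative: &str) -> Option<Vec<u8>> {
    let path: PathBuf = manifest_dir.join(relative);
    fs::read(path).ok()
}
```

A test could call this with the directory from `env!("CARGO_MANIFEST_DIR")` and the relative path `benches/binary-8-msb.lzw`, and return early when the result is `None`.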

Invalid codes being created during decode: `debug_asserts` for invariants

These codes are inserted into the table, but can't be used or referenced in the code text. They are just 'waste'.

I've thrown together a patch that adds debug_asserts for the actual invariants that the code is working under:

Patch file
From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001
From: Andreas Molzer <[email protected]>
Date: Tue, 19 Apr 2022 21:29:13 +0200
Subject: [PATCH] Add debug assertions on internal invariants

---
 src/decode.rs | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/decode.rs b/src/decode.rs
index 283e31f..49f3bfd 100644
--- a/src/decode.rs
+++ b/src/decode.rs
@@ -711,7 +711,7 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
             Some(tup) => {
                 status = Ok(LzwStatus::Ok);
                 code_link = Some(tup)
-            },
+            }
         };
 
         // Track an empty `burst` (see below) means we made no progress.
@@ -827,6 +827,7 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
                 // the case of requiring an allocation (which can't occur in practice).
                 let new_link = self.table.derive(&link, cha, code);
                 self.next_code += 1;
+                debug_assert!(self.next_code as usize <= MAX_ENTRIES);
                 code = burst;
                 link = new_link;
             }
@@ -918,6 +919,8 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
                 }
 
                 self.next_code += 1;
+                debug_assert!(self.next_code as usize <= MAX_ENTRIES);
+
                 new_link = link;
             } else {
                 // It's actually quite likely that the next code will be a reset but just in case.
@@ -1203,6 +1206,13 @@ impl Table {
     }
 
     fn derive(&mut self, from: &Link, byte: u8, prev: Code) -> Link {
+        debug_assert!(
+            self.inner.len() < MAX_ENTRIES,
+            "Invalid code would be created {:?} {} {:?}",
+            from.prev,
+            byte,
+            prev
+        );
         let link = from.derive(byte, prev);
         let depth = self.depths[usize::from(prev)] + 1;
         self.inner.push(link.clone());
-- 
2.35.1

The trace of running the decoder with those assertions suggests that the comparison itself relies on an incorrect assumption. Since it uses ==, it relies on self.next_code <= self.code_buffer.max_code(), but that doesn't hold. When we reach 12 bits, the code buffer does not get larger and max_code() remains at 4095. At the same time, next_code will advance to 4096, and never beyond in the sequential code path. That is a code that will never be created, and thus it works correctly with the rest of the logic.

But when that is the exact moment that we enter a burst, as is the case with the provided file, then it will advance next_code beyond that and not notice that the maximum code has been reached. An easy fix would be to adjust the condition:

if potential_code >= self.code_buffer.max_code() - Code::from(self.is_tiff) {

I'll measure if that leads to too much of a performance loss due to executing less of the simple code reconstruction.
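The difference between the two comparisons can be seen in a toy model. This is not weezl's decoder: `entries_created` and its parameters are made up for illustration, and the `is_tiff` adjustment is left out. It counts how many table entries a burst would create from a given `next_code` before the stop condition fires.

```rust
const MAX_CODE: u16 = 4095; // largest 12-bit code

/// Toy model (hypothetical): counts the entries a burst of `burst_len`
/// codes would create, starting at `next_code`, before `stop` fires.
fn entries_created(
    next_code: u16,
    burst_len: u16,
    stop: impl Fn(u16, u16) -> bool,
) -> u16 {
    let mut created = 0;
    let mut potential_code = next_code;
    for _ in 0..burst_len {
        if stop(potential_code, MAX_CODE) {
            break;
        }
        potential_code += 1;
        created += 1;
    }
    created
}
```

Entering a burst of five codes with `next_code` already at 4096, the `==` condition never fires and all five (invalid) entries are created, while `>=` stops immediately and creates none. Starting from 4093, both conditions stop after two entries.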

Originally posted by @HeroicKatora in #30 (comment)

Add dumb methods for lazy users

Sometimes you just want to get data (de)compressed and don't really have the patience to look at the elegance of a very finely adjustable system.

For those cases I propose adding these functions:

  • fn encode(data: &[u8], order: BitOrder, size) -> Vec<u8>
  • fn encode_tiff(data: &[u8], order: BitOrder, size) -> Vec<u8>
  • fn decode(data: &[u8], order: BitOrder, size) -> Vec<u8>
  • fn decode_tiff(data: &[u8], order: BitOrder, size) -> Vec<u8>

In reality the decode functions would probably return a Result instead.
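How such a one-shot function might be layered on a streaming interface is sketched below. Everything here is a stand-in: the `Stream` trait, `run_to_vec`, and `Passthrough` are hypothetical and do not reflect weezl's actual types. The wrapper drives the stream until it reports completion, collects the output into a `Vec<u8>`, and returns a `Result`, as suggested above for the decode functions.

```rust
/// Stand-in for a streaming coder (hypothetical; weezl's real types differ).
trait Stream {
    /// Processes some of `input` into `output`.
    /// Returns (bytes consumed, bytes written, finished).
    fn step(&mut self, input: &[u8], output: &mut [u8])
        -> Result<(usize, usize, bool), String>;
}

/// One-shot convenience wrapper: run the stream to completion.
fn run_to_vec<S: Stream>(stream: &mut S, mut data: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4]; // deliberately tiny so the loop runs several times
    loop {
        let (consumed, written, done) = stream.step(data, &mut buf)?;
        data = &data[consumed..];
        out.extend_from_slice(&buf[..written]);
        if done {
            return Ok(out);
        }
        if consumed == 0 && written == 0 {
            // Guard against spinning forever on a stalled stream.
            return Err("stream made no progress".to_string());
        }
    }
}

/// Toy stream that copies input to output; only here to exercise the wrapper.
struct Passthrough;

impl Stream for Passthrough {
    fn step(&mut self, input: &[u8], output: &mut [u8])
        -> Result<(usize, usize, bool), String> {
        let n = input.len().min(output.len());
        output[..n].copy_from_slice(&input[..n]);
        Ok((n, n, n == input.len()))
    }
}
```

Calling `run_to_vec(&mut Passthrough, b"hello world")` returns the input unchanged after three steps. A real encoder or decoder would take the place of `Passthrough`.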
