gimli-rs / leb128

Read and write DWARF's "Little Endian Base 128" variable length integer encoding

Home Page: http://gimli-rs.github.io/leb128/leb128/index.html

License: Apache License 2.0

Languages: Rust 96.82%, Shell 3.18%

leb128's People

philipc fitzgen redrield hywan pombredanne jimblandy olivierlemasle icodein pexien atouchet echoptic jonaskruckenberg

leb128's Issues

Return Number of Bytes consumed

With the current implementation, the number of bytes that are consumed is not returned to the user.

About my use case:
It involves a BufReader reading a binary file. I need to seek through parts of the file, and I want to use seek_relative so that the buffer is not refilled unless necessary. That means tracking offsets manually, which would be no problem, except that I cannot tell how many bytes leb128 consumed.

It would be pretty simple to return a tuple of the value and the byte count. I am very new to Rust and not confident enough to open a PR, but I hope this gets added soon!
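
Until something like this exists in the crate, one possible workaround is to wrap the reader in an adapter that counts the bytes passing through it. The sketch below is only an illustration: CountingReader, decode_unsigned, and read_unsigned_counted are hypothetical names, not part of leb128, and decode_unsigned is a simplified stand-in decoder so that the example compiles on its own.

```rust
use std::io::{self, Read};

/// Hypothetical adapter (not part of the leb128 crate): forwards reads to
/// `inner` and counts how many bytes were handed out.
struct CountingReader<R> {
    inner: R,
    count: u64,
}

impl<R: Read> CountingReader<R> {
    fn new(inner: R) -> Self {
        CountingReader { inner, count: 0 }
    }

    fn bytes_read(&self) -> u64 {
        self.count
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.count += n as u64;
        Ok(n)
    }
}

/// Simplified stand-in for a LEB128 decoder, included only to keep the
/// sketch self-contained. It rejects values longer than 10 bytes but does
/// not check the final byte for overflow the way the real crate does.
fn decode_unsigned<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let mut buf = [0u8];
        r.read_exact(&mut buf)?;
        if shift >= 64 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "LEB128 value does not fit in u64",
            ));
        }
        result |= u64::from(buf[0] & 0x7F) << shift;
        if buf[0] & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// Returns the decoded value together with the number of bytes consumed.
fn read_unsigned_counted<R: Read>(r: &mut R) -> io::Result<(u64, u64)> {
    let mut counting = CountingReader::new(r);
    let value = decode_unsigned(&mut counting)?;
    Ok((value, counting.bytes_read()))
}
```

The same wrapper should also work in front of leb128::read::unsigned, since that function accepts any io::Read. The returned count can then be added to a manually tracked offset before calling seek_relative.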

Overflowing while reading can create "phantom numbers" which can possibly cause corrupt/incorrect output

I'm currently using leb128 to read LEB128-encoded numbers from a stream of data (TcpStream) and I encountered a serious bug that caused my application to generate corrupt/incorrect output and could possibly allow for a DoS attack (in my case).

To describe this bug, assume that cursor is my TcpStream and that I am attempting to read TWO (2) LEB128-encoded numbers from it.

Without an overflow, this library works fine:

let mut cursor = std::io::Cursor::new(vec![
  0b1000_0011, 0b0010_1110,              // 5891
  0b1110_0100, 0b1110_0000, 0b0000_0010, // 45156
]);

for call in 1..4 {
  println!("Call #{}: {:?}", call, leb128::read::unsigned(&mut cursor));
}
// Call #1: Ok(5891)  // Number one
// Call #2: Ok(45156) // Number two
// Call #3: Err(IoError(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))

However, when an overflow occurs while reading a very long LEB128 value, a phantom 3rd number appears!

let mut cursor = std::io::Cursor::new(vec![
  0b1111_1111, 0b1111_1111, 0b1111_1111, 0b1111_1111,
  0b1111_1111, 0b1111_1111, 0b1111_1111, 0b1111_1111,
  0b1111_1111, 0b1111_1111, 0b0111_1111, // Overflow!
  0b1110_0100, 0b1110_0000, 0b0000_0010, // 45156
]);

for call in 1..5 {
  println!("Call #{}: {:?}", call, leb128::read::unsigned(&mut cursor));
}

// Call #1: Err(Overflow) // Number one
// Call #2: Ok(127)       // Where did you come from??
// Call #3: Ok(45156)     // Number two
// Call #4: Err(IoError(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))

This happens because both leb128::read::signed and leb128::read::unsigned exit early if an overflow occurs:

pub fn unsigned<R>(r: &mut R) -> Result<u64, Error>
where
    R: io::Read,
{
    let mut result = 0;
    let mut shift = 0;

    loop {
        let mut buf = [0];
        r.read_exact(&mut buf)?;

        if shift == 63 && buf[0] != 0x00 && buf[0] != 0x01 { // <<<<<<<<<<<<<<<<
            return Err(Error::Overflow);                     // <<<<<<<<<<<<<<<<
        }                                                    // <<<<<<<<<<<<<<<<

        let low_bits = low_bits_of_byte(buf[0]) as u64;
        result |= low_bits << shift;

        if buf[0] & CONTINUATION_BIT == 0 {
            return Ok(result);
        }

        shift += 7;
    }
}

The condition that causes return Err(Error::Overflow); to execute can evaluate to true before the entire LEB128 value has been read. The remaining bytes of that value stay in the stream, and the next call decodes them as if they were a separate number, which can cause serious issues.
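
One possible fix, sketched below under my own assumptions, is to remember that an overflow happened but keep consuming bytes until the one without the continuation bit, and only then return the error. ReadError and read_unsigned_draining are hypothetical names used for illustration; this is not the crate's code, only a variation on the function quoted above.

```rust
use std::io::{self, Read};

const CONTINUATION_BIT: u8 = 0x80;

/// Hypothetical error type for this sketch, mirroring the two cases
/// that appear in the output shown above.
#[derive(Debug)]
enum ReadError {
    Io(io::Error),
    Overflow,
}

impl From<io::Error> for ReadError {
    fn from(e: io::Error) -> Self {
        ReadError::Io(e)
    }
}

/// Like the quoted `unsigned`, but on overflow it drains the rest of the
/// over-long value before reporting the error, so the stream is left
/// positioned at the start of the next value.
fn read_unsigned_draining<R: Read>(r: &mut R) -> Result<u64, ReadError> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    let mut overflowed = false;

    loop {
        let mut buf = [0u8];
        r.read_exact(&mut buf)?;
        let byte = buf[0];

        if !overflowed {
            if shift == 63 && byte != 0x00 && byte != 0x01 {
                // Same condition as the original early return, but we
                // only record it and keep reading.
                overflowed = true;
            } else {
                result |= u64::from(byte & !CONTINUATION_BIT) << shift;
                shift += 7;
            }
        }

        if byte & CONTINUATION_BIT == 0 {
            // End of this encoded value: report what we found.
            return if overflowed {
                Err(ReadError::Overflow)
            } else {
                Ok(result)
            };
        }
    }
}
```

With the overflowing input from the example above, the first call would return the overflow error and the second call would return 45156, with no phantom 127 in between. On an untrusted stream it would probably be wise to also cap how many bytes are drained, because an endless run of bytes with the continuation bit set would otherwise never terminate.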

Support maximal-length encoded integers

This came up during rustwasm/walrus#30, specifically rustwasm/walrus#30 (comment).

The use case here is that, when dealing with leb128, an encoded integer often denotes how many of the following bytes belong to a section or unit. When encoding, though, we often don't know the length of the section up front, so a common trick is to do something like:

1. For a u32 encoded as leb128, reserve 5 bytes of space
2. Encode the entire section
3. Encode the length of the section into the previously reserved 5 bytes of space

This encoding uses the maximal width instead of the minimal one: "padding zero bytes" that look like 0x80 (a zero payload with the continuation bit set) ensure that the leb128 value eats up all 5 of the reserved bytes.

It'd be neat if this crate supported such a use case (it's sort of like Seek with writers), although I'd be fine just supporting it with slices for the time being!
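
The reserve-then-patch trick described above could look roughly like the following for slices. This is a minimal sketch, not an API of this crate: write_u32_padded and encode_section are hypothetical names, and the section body is assumed to be already encoded.

```rust
/// Hypothetical helper: encode `value` as exactly 5 LEB128 bytes.
/// The first four bytes always carry the continuation bit, so small
/// values are padded with 0x80 bytes; the fifth byte holds the top
/// 4 bits of the u32 and ends the value.
fn write_u32_padded(mut value: u32, out: &mut [u8; 5]) {
    for byte in out.iter_mut().take(4) {
        *byte = (value & 0x7F) as u8 | 0x80;
        value >>= 7;
    }
    // After shifting out 28 bits, at most 4 bits remain.
    out[4] = value as u8;
}

/// Hypothetical helper: prefix `body` with its length as a
/// maximal-width LEB128 u32, using the reserve-then-patch trick.
fn encode_section(body: &[u8]) -> Vec<u8> {
    assert!(body.len() <= u32::MAX as usize, "section too large for u32");

    // 1. Reserve 5 bytes for the length.
    let mut out = vec![0u8; 5];

    // 2. Encode the section (here: copy the already-encoded body).
    out.extend_from_slice(body);

    // 3. Patch the length into the reserved bytes.
    let mut prefix = [0u8; 5];
    write_u32_padded(body.len() as u32, &mut prefix);
    out[..5].copy_from_slice(&prefix);

    out
}
```

For a 3-byte body the prefix comes out as 0x83 0x80 0x80 0x80 0x00, which a standard LEB128 decoder reads back as 3 while consuming all 5 bytes.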
