gimli-rs / leb128
Read and write DWARF's "Little Endian Base 128" variable length integer encoding
Home Page: http://gimli-rs.github.io/leb128/leb128/index.html
License: Apache License 2.0
With the current implementation, the number of bytes consumed is not returned to the user.
About my use case:
It involves a BufReader reading a typical binary file. I need to seek through some parts of the file, and I want to take advantage of seek_relative so that the buffer is not refilled every time unless necessary. That means I have to keep track of the offsets manually, which is no problem in itself, except that I do not know how many bytes leb128 consumed.
It would be pretty simple to just return a tuple. I am very new to Rust and not confident enough to open a PR, but I hope it gets added soon!
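Until something like that exists, one workaround is to count bytes on the reader side. The following is only a minimal sketch: `CountingReader` is a hypothetical wrapper written for this illustration, not part of the leb128 crate, and it relies only on the fact that the crate's read functions accept any `io::Read`.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper (not part of the `leb128` crate) that counts how many
/// bytes have been read through it.
pub struct CountingReader<R> {
    inner: R,
    count: u64,
}

impl<R: Read> CountingReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, count: 0 }
    }

    /// Total number of bytes read through this wrapper so far.
    pub fn count(&self) -> u64 {
        self.count
    }

    /// Access to the wrapped reader, e.g. to call `seek_relative` on a `BufReader`.
    /// Bytes skipped by seeking are not counted; track those separately.
    pub fn get_mut(&mut self) -> &mut R {
        &mut self.inner
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.count += n as u64;
        Ok(n)
    }
}
```

With this in place, one could record `count()` before and after passing `&mut counting_reader` to `leb128::read::unsigned`; the difference is the number of bytes that call consumed.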
I'm currently using leb128 to read LEB128-encoded numbers from a stream of data (TcpStream), and I encountered a serious bug that caused my application to generate corrupt output and that could allow a DoS attack (in my case).
To describe this bug, assume that cursor is my TcpStream and that I am attempting to read two LEB128-encoded numbers from it.
Without an overflow, this library works fine:
let mut cursor = std::io::Cursor::new(vec![
    0b1000_0011, 0b0010_1110,              // 5891
    0b1110_0100, 0b1110_0000, 0b0000_0010, // 45156
]);
for call in 1..4 {
    println!("Call #{}: {:?}", call, leb128::read::unsigned(&mut cursor));
}
// Call #1: Ok(5891)  // Number one
// Call #2: Ok(45156) // Number two
// Call #3: Err(IoError(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))
However, when an overflow occurs while reading a very long LEB128 value, a phantom 3rd number appears!
let mut cursor = std::io::Cursor::new(vec![
    0b1111_1111, 0b1111_1111, 0b1111_1111, 0b1111_1111,
    0b1111_1111, 0b1111_1111, 0b1111_1111, 0b1111_1111,
    0b1111_1111, 0b1111_1111, 0b0111_1111, // Overflow!
    0b1110_0100, 0b1110_0000, 0b0000_0010, // 45156
]);
for call in 1..5 {
    println!("Call #{}: {:?}", call, leb128::read::unsigned(&mut cursor));
}
// Call #1: Err(Overflow) // Number one
// Call #2: Ok(127)       // Where did you come from??
// Call #3: Ok(45156)     // Number two
// Call #4: Err(IoError(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))
This happens because both leb128::read::signed and leb128::read::unsigned exit early if an overflow occurs:
pub fn unsigned<R>(r: &mut R) -> Result<u64, Error>
where
    R: io::Read,
{
    let mut result = 0;
    let mut shift = 0;
    loop {
        let mut buf = [0];
        r.read_exact(&mut buf)?;
        if shift == 63 && buf[0] != 0x00 && buf[0] != 0x01 { // <<<<<<<<<<<<<<<<
            return Err(Error::Overflow);                     // <<<<<<<<<<<<<<<<
        }                                                    // <<<<<<<<<<<<<<<<
        let low_bits = low_bits_of_byte(buf[0]) as u64;
        result |= low_bits << shift;
        if buf[0] & CONTINUATION_BIT == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}
The condition that causes return Err(Error::Overflow); to execute can evaluate to true before the entire LEB128 value has been read. The remaining bytes of that value are left in the stream, where the next call decodes them as a separate number, which can cause serious issues.
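One possible way to avoid the phantom value is to keep consuming bytes until the over-long value ends, and only then report the overflow. The sketch below is a standalone illustration under that assumption, not the crate's actual code: `unsigned_draining` and `DecodeError` are hypothetical names introduced here, and only the overflow check mirrors the function quoted above.

```rust
use std::io::{self, Read};

const CONTINUATION_BIT: u8 = 0x80;

/// Hypothetical error type for this sketch (the crate has its own `Error`).
#[derive(Debug)]
pub enum DecodeError {
    Io(io::Error),
    Overflow,
}

/// Sketch: decodes one unsigned LEB128 value. On overflow, it first skips the
/// rest of the over-long value so the reader is positioned at the next value.
pub fn unsigned_draining<R: Read>(r: &mut R) -> Result<u64, DecodeError> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let mut buf = [0u8; 1];
        r.read_exact(&mut buf).map_err(DecodeError::Io)?;

        if shift == 63 && buf[0] != 0x00 && buf[0] != 0x01 {
            // Consume bytes until one without the continuation bit is seen.
            while buf[0] & CONTINUATION_BIT != 0 {
                r.read_exact(&mut buf).map_err(DecodeError::Io)?;
            }
            return Err(DecodeError::Overflow);
        }

        result |= u64::from(buf[0] & !CONTINUATION_BIT) << shift;
        if buf[0] & CONTINUATION_BIT == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}
```

With the overflowing input from the example, the first call would return the overflow error and the second would return 45156. On an untrusted stream, draining without a limit has its own cost, so a cap on the number of skipped bytes may be worth adding.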
This came up during rustwasm/walrus#30, specifically rustwasm/walrus#30 (comment).
The use case here is that when dealing with leb128 you'll often have a scenario where a leb128 integer denotes how many of the following bytes in the region belong to a section or unit. When encoding, though, we often don't know the length of the section up front, so a common trick is to reserve a fixed number of bytes for the length (5 here), encode the section, and then go back and write the length into the reserved bytes.
This encoding uses the maximal instead of minimal width, using "padding zero bytes" that look like 0x80 to ensure that leb128 eats up all 5 of the bytes reserved.
It'd be neat if this crate supported such a use case (it's sort of like Seek with writers), although I'd be fine just supporting it with slices for the time being!
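For slices, the padded encoding described above could look roughly like the following. This is a sketch only: `write_unsigned_padded` and `encode_section` are hypothetical helpers invented for this illustration, not functions the crate provides, and the 5-byte slot width is taken from the description above.

```rust
/// Hypothetical helper: writes `value` as unsigned LEB128 using exactly
/// `slot.len()` bytes. Every byte except the last carries the continuation
/// bit, so unused high groups become 0x80 padding and the last byte ends it.
/// On error the slot may have been partially overwritten.
pub fn write_unsigned_padded(slot: &mut [u8], mut value: u64) -> Result<(), &'static str> {
    if slot.is_empty() {
        return Err("slot must be at least one byte wide");
    }
    let width = slot.len();
    for (i, byte) in slot.iter_mut().enumerate() {
        let mut b = (value & 0x7f) as u8;
        value >>= 7;
        if i + 1 < width {
            b |= 0x80; // more bytes follow
        }
        *byte = b;
    }
    if value != 0 {
        return Err("value does not fit in the reserved width");
    }
    Ok(())
}

/// Hypothetical usage: reserve 5 bytes, append the body, then patch the length.
pub fn encode_section(body: &[u8]) -> Result<Vec<u8>, &'static str> {
    let mut out = vec![0u8; 5]; // reserved slot for the length
    out.extend_from_slice(body);
    write_unsigned_padded(&mut out[..5], body.len() as u64)?;
    Ok(out)
}
```

A decoder that tolerates non-minimal encodings reads the padded slot back as the same value; a decoder that insists on the shortest form would reject it, so this depends on the format being parsed.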