
sharksforarms / deku


Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization

License: Apache License 2.0

Rust 99.33% HTML 0.67%
rust rust-crate serialization deserialization parse encoder-decoder bits bytes declarative symmetric

deku's People

Contributors

abungay, agausmann, calebfletcher, constfold, dependabot-preview[bot], dependabot[bot], dullbananas, elast0ny, initerworker, inspier, interruptinuse, korrat, kraktus, myrrlyn, samuelsleight, sharksforarms, soruh, vext01, vhdirk, vidhanio, visse, wcampbell0x2a, wildcryptofox, xlambein


deku's Issues

Improve derive macro error message

Currently, when an error happens in the derive macro it just panics; we should use syn::Error::to_compile_error() instead.
For example, an invalid attribute gives this message:

error: proc-macro derive panicked
 --> src\main.rs:3:10
  |
3 | #[derive(DekuRead)]
  |          ^^^^^^^^
  |
  = help: message: called `Result::unwrap()` on an `Err` value: Error { kind: UnknownField(ErrorUnknownField { name: "id", did_you_mean: None }), locations: ["b"], span: Some(#0 bytes(79..86)) }

A better message could be:

error: unknown deku field attribute `id`
 --> src\main.rs:7:8
  |
7 | #[deku(id = "")]
  |        ^^

Rename `to_bitvec` to `to_bits`

Why is the function that converts a type to bytes (Vec<u8>) called to_bytes, while the function that converts a type to bits (BitVec<Msb0, u8>) is called to_bitvec? Why not to_bits?

deku/src/lib.rs

Lines 251 to 255 in c7e0377

/// Write struct/enum to Vec<u8>
fn to_bytes(&self) -> Result<Vec<u8>, DekuError>;
/// Write struct/enum to BitVec
fn to_bitvec(&self) -> Result<BitVec<Msb0, u8>, DekuError>;

Implement BitsSize, BitsRead and BitsWrite on the struct itself

For composability, it would be nice to do something like the following:

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct FieldB {
    #[deku(bits = "6")]
    data: u8,
}

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct DekuTest {
    #[deku(bits = "2")]
    field_a: u8,
    field_b: FieldB
}

Restrict reader and writer to certain variables, not all internals

readers/writers should have access to:
rest, struct variables, final attribute variables (bit size, input_is_le)

The provided function could be run in a function sandbox where only the needed variables are passed in, with documented names, i.e.:

let variant_read_func = if variant_reader.is_some() {
    // pseudocode: the generated sandbox fn receives only the documented variables
    fn sandbox_reader(rest, input_is_le, field_a, field_b) {
        quote! { #variant_reader; }
    }
    sandbox_reader(rest, input_is_le, field_a, field_b)
}
...

Why does `from_bytes` need a `bit_offset`?

deku/src/lib.rs

Line 235 in c7e0377

fn from_bytes(input: (&[u8], usize)) -> Result<((&[u8], usize), Self), DekuError>

Why not from_bytes(bytes: Bytes) and from_bits(bits: Bits)? Why do I need to care about which bit the bytes start from when I use a function called from_bytes?
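For illustration, a std-only sketch (hypothetical, not deku's actual internals) of why a byte-oriented reader still carries a bit offset: a previous field may have consumed a non-multiple of 8 bits, so the next value starts mid-byte.

```rust
// If an earlier field consumed, say, 3 bits, the next u8 straddles two
// adjacent bytes and must be stitched together (MSB-first).
fn read_u8_at_bit(input: &[u8], bit_offset: usize) -> u8 {
    let byte = bit_offset / 8;
    let bit = bit_offset % 8;
    if bit == 0 {
        input[byte] // byte-aligned: a plain indexed read suffices
    } else {
        (input[byte] << bit) | (input[byte + 1] >> (8 - bit))
    }
}

fn main() {
    let data = [0b1010_1100, 0b1101_0000];
    assert_eq!(read_u8_at_bit(&data, 0), 0b1010_1100);
    assert_eq!(read_u8_at_bit(&data, 3), 0b0110_0110); // starts mid-byte
    println!("ok");
}
```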

Context of enum `id_type` cannot be utilized

For example, passing the top-level endianness down to its child:

#[deku(endian = "big")]
struct Parent {
   child: Child
}

#[deku(id_type = "u16", ctx = "_endian: deku::ctx::Endian")] // will default to system endianness; no way to use the ctx endian
enum Child {
   Variant
}

Vec type read/write

pub struct MyStruct {
  ext_len: usize,
  #[deku(len_field = "ext_len")]
  extensions: Vec<Extension>,
}

Allow something like this, and update ext_len from extensions before dumping bits to the accumulator.
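The update intent here can be sketched in plain Rust (field names follow the issue; the mechanism is hypothetical): the length field is recomputed from the Vec before writing rather than trusted as stored.

```rust
// `Extension` is reduced to a unit struct for brevity.
#[derive(Debug, PartialEq)]
struct Extension;

struct MyStruct {
    ext_len: usize,
    extensions: Vec<Extension>,
}

impl MyStruct {
    // What a generated update step would do before dumping bits:
    fn update(&mut self) {
        self.ext_len = self.extensions.len();
    }
}

fn main() {
    let mut s = MyStruct { ext_len: 0, extensions: vec![Extension, Extension] };
    s.update();
    assert_eq!(s.ext_len, 2);
    println!("ok");
}
```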

Implement DekuRead, DekuWrite for String

Maybe something like this?

use deku::prelude::*;

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct Packet {
    s: String,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test01() {
        //                      [len  | string               ]
        let data: Vec<u8> = vec![5, 104, 101, 108, 108, 111];

        let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();

        assert_eq!(
            Packet {
                s: "hello".to_string(),
            },
            value
        );
    }
}

(len would be a u64 in a real example)
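A std-only sketch of the decoding the test implies (a one-byte length prefix, matching the sample data; as noted, a real format might use a u64):

```rust
// Decode a length-prefixed string: the first byte is the length, the
// following `len` bytes are the UTF-8 contents. Returns the string and
// the unread remainder.
fn read_string(input: &[u8]) -> (String, &[u8]) {
    let len = input[0] as usize;
    let (s, rest) = input[1..].split_at(len);
    (String::from_utf8(s.to_vec()).expect("invalid utf-8"), rest)
}

fn main() {
    let data: Vec<u8> = vec![5, 104, 101, 108, 108, 111];
    let (s, rest) = read_string(&data);
    assert_eq!(s, "hello");
    assert!(rest.is_empty());
    println!("ok");
}
```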

Endian-ness composing

I believe it would be a nice feature for child structs and enums to inherit the parent's endian type.

Currently the following code produces the following error:

 `deku::DekuRead<deku::ctx::Endian>` is not implemented for B
use deku::prelude::*;

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(endian = "big")]
struct Packet {
    len: u16,
    messages: B,
}

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8")]
enum B {
    #[deku(id = "0x00")]
    one,
    #[deku(id = "0x01")]
    two,
    #[deku(id = "0x02")]
    three,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test01() {
        let data: Vec<u8> = vec![0x04, 0x13, 0x01];

        let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();

        assert_eq!(
            Packet {
                len: 0x0413,
                messages: B::two,
            },
            value
        );
    }
}

In fact, the way to create a compiling version of this code seems a bit odd, as I only add endian to the len field:

use deku::prelude::*;

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct Packet {
    #[deku(endian = "big")]
    len: u16,
    messages: B,
}

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8")]
enum B {
    #[deku(id = "0x00")]
    one,
    #[deku(id = "0x01")]
    two,
    #[deku(id = "0x02")]
    three,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test01() {
        let data: Vec<u8> = vec![0x04, 0x13, 0x01];

        let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();

        assert_eq!(
            Packet {
                len: 0x0413,
                messages: B::two,
            },
            value
        );
    }
}

add top-level enum attribute `id`

Currently I don't see a way to use ctx as the value that selects an enum variant (instead of reading bits/bytes for the id). The key part is that the category and length need to be read before the Messages are parsed; the category field decides which variant of the enum is parsed.

Example

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct AsterixPacket {
    #[deku(bytes = "1", endian = "big")]
    category: u8,
    #[deku(bytes = "2", endian = "big")]
    length: u16,
    #[deku(ctx = "*category")]
    messages: Vec<Message>,
}

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_ctx = "category")]
enum Message {
    #[deku(id = "48")]
    Cat48(Cat48),
}

Conditional Skip

Great idea of a library.

I have a protocol that needs conditional parsing of fields in a struct. I see the skip attribute, but would something like a conditional skip be possible?

use deku::prelude::*;
use std::convert::TryFrom;

#[derive(PartialEq, Debug, DekuRead, DekuWrite)]
pub struct DekuTest {
    pub field_a: u8,
    #[deku(skip, if=(field_a, 1))]
    pub field_b: Option<u8>,
    #[deku(skip, if=(field_b, Some(1)))]
    pub field_c: Option<u8>,
}

fn main() {
    let data: Vec<u8> = vec![0x01, 0x02];

    let value = DekuTest::from_bytes((data.as_ref(), 0)).unwrap();
    println!("{:#?}", value)
}

Differing impls for bits and bytes

If the bytes attribute is used and the index is on a byte boundary, it may be quicker to read from a slice of &[u8] instead of reading 8*n bits.

I'd like more benchmarks to be written first so this optimization can be measured.

One option could be to feature flag the bits/bytes attributes
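The proposed fast path can be sketched in plain Rust (hypothetical, not deku's code): fall through to per-bit assembly only when the request is unaligned.

```rust
// Byte-aligned requests take a slice copy; anything else falls back to
// gathering individual bits (MSB-first), which is what the fast path avoids.
fn read_bits(input: &[u8], bit_pos: usize, nbits: usize) -> Vec<u8> {
    if bit_pos % 8 == 0 && nbits % 8 == 0 {
        let start = bit_pos / 8;
        input[start..start + nbits / 8].to_vec() // fast path: slice copy
    } else {
        let mut out = vec![0u8; (nbits + 7) / 8];
        for i in 0..nbits {
            let src = bit_pos + i;
            let bit = (input[src / 8] >> (7 - src % 8)) & 1;
            out[i / 8] |= bit << (7 - i % 8);
        }
        out
    }
}

fn main() {
    let data = [0xAB, 0xCD];
    assert_eq!(read_bits(&data, 0, 8), vec![0xAB]); // aligned
    assert_eq!(read_bits(&data, 4, 8), vec![0xBC]); // unaligned bit gather
    println!("ok");
}
```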

Add `count` attribute

A fixed number of elements to be read

Example:

struct Test {
    #[deku(count = 2)]
    data: Vec<u8>
}

Possibly also rename the len attribute to count_field?

Improve examples

  • Find some good examples for the README/lib.rs landing page
    • Showcase struct, enums, vec, custom reader/writer

Enum attribute improvements

Current enum behavior

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8")]
enum Packet {
    #[deku(id = "0x00")]
    Zero,
    #[deku(id = "0x01")]
    One,
    #[deku(id = "0x02")]
    Two,
    #[deku(id = "0x03")]
    Three,
    #[deku(id = "0x04")]
    Four,
}

An attribute inherit, which would inherit the id from the discriminant value already assigned:

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8", inherit)]
enum Packet {
    Zero = 0x00,
    One =  0x01,
    Two = 0x02,
    Three = 0x03,
    Four = 0x04,
}

An attribute ordered, which would take the first element's id and increment it for each variant after that.

Maybe this would increase for every enum field that didn't have an id defined.

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8", ordered = "0x00")]
enum Packet {
    Zero,
    One,
    Two,
    Three,
    Four,
    #[deku(id = "44")]
    FourtyFour,
}
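The numbering rule both variants of the idea imply can be sketched as a hypothetical helper (not part of deku):

```rust
// Assign ids: an explicit id wins; otherwise use previous + 1, starting
// from the given base (covers the `ordered = "0x00"` example above).
fn assign_ids(base: u8, explicit: &[Option<u8>]) -> Vec<u8> {
    let mut next = base;
    explicit
        .iter()
        .map(|e| {
            let id = e.unwrap_or(next);
            next = id + 1;
            id
        })
        .collect()
}

fn main() {
    // Zero..Four take 0..=4 implicitly; FourtyFour is pinned to 44.
    let ids = assign_ids(0x00, &[None, None, None, None, None, Some(44)]);
    assert_eq!(ids, vec![0, 1, 2, 3, 4, 44]);
    println!("ok");
}
```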

Rename BitsWriter and BitsReader

I feel like another name would be better suited, possibly matching the proc-macro names if that's the convention: DekuRead / DekuWrite.

Attribute to handle length of bytes (or bits) in Vec<T> with another struct field

Consider the following code:

use deku::prelude::*;

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
pub struct Packet {
    #[deku(bytes = "1")]
    length: u8,
    // byte len of all of messages is length - 2
    messages: Vec<Message>,
}

/// In the real packet, this would be of variable length, so we can't use just the `count` attribute on messages
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
pub struct Message {
    #[deku(bytes = "1")]
    msg: u8,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test01() {
        let data: Vec<u8> = vec![0x04, 0x01, 0x02];

        let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();

        assert_eq!(
            Packet {
                length: 0x04,
                messages: vec![Message { msg: 0x01 }, Message { msg: 0x02 }]
            },
            value
        );
    }
}

I can use the count attribute to give the number of Vec<T> elements, but I see no way of telling the max bytes that a Vec can occupy in its container as a whole. Wondering if this could be a feature, or do I need to do some weird custom write/read implementation with a read_bytes field?
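The requested semantics can be sketched std-only (a hypothetical helper, not an existing deku feature): keep reading messages until the byte budget derived from length is exhausted.

```rust
// Read elements until `byte_budget` bytes are consumed. Each Message in the
// example is a single byte, so consumption is 1 per element here.
fn read_messages(mut input: &[u8], mut byte_budget: usize) -> (Vec<u8>, &[u8]) {
    let mut messages = Vec::new();
    while byte_budget > 0 {
        let (msg, rest) = input.split_first().expect("truncated packet");
        messages.push(*msg);
        input = rest;
        byte_budget -= 1;
    }
    (messages, input)
}

fn main() {
    // From the test above: length = 0x04, so the messages occupy 4 - 2 bytes.
    let data = [0x04, 0x01, 0x02];
    let budget = data[0] as usize - 2;
    let (messages, rest) = read_messages(&data[1..], budget);
    assert_eq!(messages, vec![0x01, 0x02]);
    assert!(rest.is_empty());
    println!("ok");
}
```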

Add support for condition and context

Hey, I'm trying to write a simple binary parser with deku, and here are two problems I found.

Condition

I read your source and found I can pass whatever arguments to it. Sorry about that.

Context

Consider this binary structure:

struct Data {
    a: u8,
    // This field depends on `Header.flag`
    b: Option<u8>
}
struct Bin {
    flag: u8,
    data: Vec<Data>
}

Because of the lack of context, I can't find any way to parse it except writing a custom reading function manually. That said, adding context support is a little complicated; I still don't know what the best way is.
Overall, thanks for your great crate.

Print ident name in "Could not match enum variant" Err

In the following line:

deku-derive/src/macros/deku_read.rs:175:                return Err(DekuError::Parse(format!("Could not match enum variant id = {:?}", variant_id)));

It would be nice to print out the ident name (i.e. the name of the enum) for easier troubleshooting.

I would do it, but I can't for the life of me figure out how to print out the ident. New to proc_macros/quote

`update` attribute

Gets called when the struct is .update()'d, kind of like the len attribute, but providing a custom impl.

Example:

pub struct Ipv4 {
    ....
    #[deku(update = "calc_checksum(...)")]
    pub checksum: u16,       // Header checksum
    ....
}
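The calc_checksum(...) above is left abstract in the issue; as one concrete example of what an update expression might call, here is the standard RFC 1071 ones'-complement sum used for the IPv4 header checksum (computed with the checksum field zeroed):

```rust
// Ones'-complement sum of 16-bit big-endian words, carries folded back in,
// result inverted.
fn ipv4_checksum(header: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in header.chunks(2) {
        let hi = u32::from(chunk[0]) << 8;
        let lo = u32::from(*chunk.get(1).unwrap_or(&0));
        sum += hi | lo;
    }
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16); // fold carries
    }
    !(sum as u16)
}

fn main() {
    // Known-good IPv4 header (checksum field bytes 10..12 zeroed for computation).
    let header = [
        0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
        0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7,
    ];
    assert_eq!(ipv4_checksum(&header), 0xb861);
    println!("ok");
}
```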

Conditional reading

Allows conditional field reading dependent on the return value of a lambda:

fn my_condition(input: &[u8], index: usize) -> bool {

    // TODO: somehow get access to field_a ?
    // if (field_a == 0xAB) {
    //    return true;
    // }

    return false;
}

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct DekuTest {
    field_a: u8,
    #[deku(bits = "7", read_if=my_condition)]
    field_b: Option<u32>,
}

Not sure of the best way to give the lambda access to the previously parsed fields.

assert_eq/assert attributes

It's useful when a protocol has a magic header (e.g. zlib, pyc, jpg) or a field has a limit.

struct Foo {
    #[deku(assert_eq("[0xAA, 0xBB, 0xCC]"))]
    magic: [u8; 3],
    #[deku(assert("a >= 128"))]
    a: u8,
}

`option_as_tokenstream` will eat span

Since option_as_tokenstream uses Option<String> as its input, darling discards the span while parsing (because String doesn't have a span). This makes error messages hard to read.
For example:

#[derive(DekuRead)]
struct Foo {
    #[deku(cond = "'a' == 2")]
    a: u8,
}

error:

error[E0308]: mismatched types
  --> src\main.rs:24:10
   |
24 | #[derive(DekuRead)]
   |          ^^^^^^^^ expected `char`, found `u8`
   |
   = note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)

Replacing it with LitStr gives:

error[E0308]: mismatched types
  --> src\main.rs:26:19
   |
26 |     #[deku(cond = "'a' == 2")]
   |                   ^^^^^^^^^^ expected `char`, found `u8`

Add `map` attribute

Allow running a function on the read value.

Examples:

struct SomeStruct {
    #[deku(map = "|f: u8| f.to_string()")]
    field_a: String,
}

or, with a named function:

fn map_string(f: u8) -> String {
    f.to_string()
}

struct SomeStruct {
    #[deku(map = "map_string")]
    field_a: String,
}

Edit:
Can do something with trait calls like so:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9a711662336e1b0059e40947250b05ff

https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#fully-qualified-syntax-for-disambiguation-calling-methods-with-the-same-name

Cleanup enum attribute names

I can't think of a reason why there are both id_bits and bits; for example, there's endian but not id_endian (we use endian for enums).

This would be more consistent with fields/structs.

Also rename id_type to type and id to value

Before:

#[deku(id_type = "u8", id_bits = "5")]
enum Test {
    #[deku(id = "0x01")]
    VarA,
}

After

#[deku(type = "u8", bits = "5")]
enum Test {
    #[deku(value = "0x01")]
    VarA,
}

Add support for unused/padding bits

It would be nice if there were a way to skip over bits or bytes without creating dummy fields, in order to save on space, e.g.:

pub struct SomeStruct {
    field_01: u8,

    // This field is useless but is needed for proper read/write
    unused01: u8,

    field_02: u8,

    // This field is useless but is needed for proper read/write
    unused02: u8,
}

Maybe some skip_[bytes|bits] attribute that could be added before or after fields in structs:

#[deku(skip_bytes="1")]

Add `ctx_default` attribute

Add a ctx_default attribute to allow containers to either take a context or default to a fixed one:

#[derive(PartialEq, Debug, DekuRead, DekuWrite)]
#[deku(ctx = "a: u8, b: u8", ctx_default = "1, 2")]
pub struct TopLevelCtxStructDefault {
    #[deku(cond = "a == 1")]
    pub a: Option<u8>,
    #[deku(cond = "b == 1")]
    pub b: Option<u8>,
}

#[test]
fn test_ctx_default_struct() {
    let expected = samples::TopLevelCtxStructDefault {
        a: Some(0xff),
        b: None,
    };

    let test_data = [0xffu8];

    // Use default
    let ret_read = samples::TopLevelCtxStructDefault::try_from(test_data.as_ref()).unwrap();
    assert_eq!(expected, ret_read);
    let ret_write: Vec<u8> = ret_read.try_into().unwrap();
    assert_eq!(ret_write, test_data);

    // Use context
    let (rest, ret_read) =
        samples::TopLevelCtxStructDefault::read(test_data.bits(), (1, 2)).unwrap();
    assert!(rest.is_empty());
    assert_eq!(expected, ret_read);
    let ret_write = ret_read.write((1, 2)).unwrap();
    assert_eq!(test_data.to_vec(), ret_write.into_vec());
}
