Rust implementation of Binary Object Representation Serializer for Hashing

Home Page: https://borsh.io/

License: Apache License 2.0


borsh-rs's Introduction

Borsh in Rust

Requires rustc 1.67+. Licensed under Apache-2.0 and MIT.

borsh-rs is the Rust implementation of the Borsh binary serialization format.

Borsh stands for Binary Object Representation Serializer for Hashing. It is meant to be used in security-critical projects, as it prioritizes consistency, safety, and speed, and comes with a strict specification.

Example

use borsh::{BorshSerialize, BorshDeserialize, from_slice, to_vec};

#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
struct A {
    x: u64,
    y: String,
}

#[test]
fn test_simple_struct() {
    let a = A {
        x: 3301,
        y: "liber primus".to_string(),
    };
    let encoded_a = to_vec(&a).unwrap();
    let decoded_a = from_slice::<A>(&encoded_a).unwrap();
    assert_eq!(a, decoded_a);
}
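For readers curious about what ends up in encoded_a, here is a std-only sketch of the byte layout that the Borsh specification prescribes for this struct: the u64 as 8 little-endian bytes, then the string as a u32 little-endian byte length followed by its UTF-8 bytes. The helpers encode_a and decode_a are hypothetical names used for illustration only; they are not part of the borsh API.

```rust
use std::convert::TryInto;
use std::io::{Error, ErrorKind, Result};

/// Hand-rolled encoding of `A { x: u64, y: String }` (illustrative helper, not borsh API).
fn encode_a(x: u64, y: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 + y.len());
    // x: fixed-width, little-endian.
    out.extend_from_slice(&x.to_le_bytes());
    // y: u32 little-endian byte length (assumes the string is shorter than 4 GiB)...
    out.extend_from_slice(&(y.len() as u32).to_le_bytes());
    // ...followed by the raw UTF-8 bytes.
    out.extend_from_slice(y.as_bytes());
    out
}

/// Inverse of `encode_a`; rejects truncated input, trailing bytes and invalid UTF-8.
fn decode_a(bytes: &[u8]) -> Result<(u64, String)> {
    if bytes.len() < 12 {
        return Err(Error::new(ErrorKind::InvalidData, "unexpected end of input"));
    }
    let x = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
    let len = u32::from_le_bytes(bytes[8..12].try_into().unwrap()) as usize;
    let rest = &bytes[12..];
    if rest.len() != len {
        return Err(Error::new(ErrorKind::InvalidData, "string length mismatch"));
    }
    let y = String::from_utf8(rest.to_vec())
        .map_err(|e| Error::new(ErrorKind::InvalidData, e))?;
    Ok((x, y))
}
```

For the value in the example, encode_a(3301, "liber primus") yields 24 bytes: 8 for x, 4 for the length prefix, and 12 for the text.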

Features

Opting out from Serde allows borsh to have some features that are currently not available for serde-compatible serializers. We currently support two features: borsh(init=<your initialization method name>) and borsh(skip) (the former is not available in Serde).

borsh(init=...) automatically runs an initialization function right after deserialization. This is convenient for objects that are designed to be used as strictly immutable. Usage example:

#[derive(BorshSerialize, BorshDeserialize)]
#[borsh(init=init)]
struct Message {
    message: String,
    timestamp: u64,
    public_key: CryptoKey,
    signature: CryptoSignature,
    hash: CryptoHash
}

impl Message {
    pub fn init(&mut self) {
        self.hash = CryptoHash::new().write_string(self.message).write_u64(self.timestamp);
        self.signature.verify(self.hash, self.public_key);
    }
}

borsh(skip) skips serializing/deserializing the annotated fields, assuming they implement the Default trait, similarly to #[serde(skip)].

#[derive(BorshSerialize, BorshDeserialize)]
struct A {
    x: u64,
    #[borsh(skip)]
    y: f32,
}

Enum with explicit discriminant

#[borsh(use_discriminant=false|true)] is required if you have an enum with an explicit discriminant. This setting affects BorshSerialize and BorshDeserialize behaviour at the same time.

In the future, borsh will drop the requirement to explicitly use #[borsh(use_discriminant=false|true)] and will default to true. However, to make sure that the transition from older versions of borsh (before the 0.11 release) does not cause silent breaking changes in de-/serialization, borsh 1.0 requires you to specify whether the explicit enum discriminant should be used as the de-/serialization tag value.

If you don't specify the use_discriminant option for an enum with an explicit discriminant, you will get an error:

error: You have to specify `#[borsh(use_discriminant=true)]` or `#[borsh(use_discriminant=false)]` for all enums with explicit discriminant

To preserve the behaviour of borsh versions before 0.11, which did not respect explicit enum discriminants for de-/serialization, use #[borsh(use_discriminant=false)]. Otherwise, use true:

#[derive(BorshDeserialize, BorshSerialize)]
#[borsh(use_discriminant=false)]
enum A {
    X,
    Y = 10,
}
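To see what the two settings mean for the tag byte, here is a std-only sketch that mirrors the enum above. It assumes, in line with the description above, that use_discriminant=false tags a variant by its position in the declaration, while use_discriminant=true tags it by its explicit discriminant value. The functions tag_by_position and tag_by_discriminant are hypothetical and only illustrate the difference.

```rust
#[derive(Clone, Copy)]
enum A {
    X,
    Y = 10,
}

/// Tag when explicit discriminants are ignored: the variant's position (0, 1, ...).
fn tag_by_position(a: A) -> u8 {
    match a {
        A::X => 0,
        A::Y => 1,
    }
}

/// Tag when explicit discriminants are respected: the discriminant value itself.
fn tag_by_discriminant(a: A) -> u8 {
    a as u8
}
```

Under these assumptions A::Y is tagged 1 in the first case and 10 in the second, which is why switching the setting silently changes the encoded bytes.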

Testing

Integration tests should generally be preferred to unit tests. The root module of the borsh crate's integration tests is linked here.

Releasing

The versions of all public crates in this repository are collectively managed by a single version in the workspace manifest.

To publish a new version of all the crates, bump that version to the next "patch" version and submit a PR.

We have CI Infrastructure put in place to automate the process of publishing all crates once a version change has merged into master.

However, before you release, make sure the CHANGELOG is up to date and that the [Unreleased] section is present but empty.

License

This repository is distributed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-MIT and LICENSE-APACHE for details.

borsh-rs's People

Contributors

ailisp, arshia001, austinabell, bowenwang1996, dj8yfo, frol, fuuzetsu, iho, ilblackdragon, itegulov, k06a, lazureykis, lexfro, magicrb, maksymzavershynskyi, marcoieni, matklad, mikedotexe, mikhailok, mina86, miraclx, nhynes, paolobarbolini, pmnoxx, preston-evans98, serprex, tzemanovic, vgrichina, volovyks, westy92

borsh-rs's Issues

Failed to derive borsh ser/de in a recursive data structure

Minimal example:

#[derive(BorshSerialize, BorshDeserialize)]
enum A {
    X,
    Y(Box<A>),
}

gives compilation error:

error[E0275]: overflow evaluating the requirement `Box<A>: BorshSerialize`
 --> src/main.rs:3:10
  |
3 | #[derive(BorshSerialize, BorshDeserialize)]
  |          ^^^^^^^^^^^^^^
  |

Zero Sized Types should not enable DoS

We currently special-case ZSTs when deserializing vectors:

} else if size_of::<T>() == 0 {
let mut result = vec![T::deserialize(buf)?];
let p = result.as_mut_ptr();
unsafe {
forget(result);
let len = len.try_into().map_err(|_| ErrorKind::InvalidInput)?;
let result = Vec::from_raw_parts(p, len, len);
Ok(result)
}
} else {
// TODO(16): return capacity allocation when we can safely do that.

I think the reason to do that is to avoid DoS attacks. If we remove this restriction, it becomes possible to hand-craft a four-byte length prefix for vec![(); !0], which will take a lot of time to deserialize. There are a couple of problems with this approach, though:

  • it is unsound (#19)
  • it doesn't fully address the problem (we have the same issue for hash maps)

noob: Serialize to specific array

Early (rust, borsh, just about everything) learner.

Is it possible to give the serializer a specific location to serialize into?

For example, I have a 1 MB array and I would like to serialize into it after the 1024th byte.

Any help is appreciated.
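One possible approach, sketched with the standard library only: &mut [u8] implements std::io::Write, so a sub-slice starting at the desired offset can act as the writer. The helper write_at below is hypothetical and copies a pre-encoded payload. Since BorshSerialize::serialize takes a &mut W where W: Write, the same sub-slice could presumably be passed to it as &mut window, but that part is an assumption and is not shown here.

```rust
use std::io::{Error, ErrorKind, Result, Write};

/// Writes `payload` into `buf` starting at `offset` and returns how many
/// bytes of `buf` remain free after the written data.
fn write_at(buf: &mut [u8], offset: usize, payload: &[u8]) -> Result<usize> {
    // Take the tail of the buffer; an out-of-range offset is a caller bug.
    let mut window: &mut [u8] = buf
        .get_mut(offset..)
        .ok_or_else(|| Error::new(ErrorKind::InvalidInput, "offset is past the end of the buffer"))?;
    // Writing to a `&mut [u8]` copies the bytes and advances the slice;
    // it fails if the payload does not fit in the remaining space.
    window.write_all(payload)?;
    Ok(window.len())
}
```

With a 2048-byte buffer, write_at(&mut buf, 1024, &[1, 2, 3]) places the bytes at positions 1024 to 1026 and leaves everything before the offset untouched.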

UB in BorshDeserialize for types that have validity invariants

Repro that ends up executing SIGILL due to undefined behavior:

use borsh::BorshDeserialize;
use std::io::Result;

struct B(u8, Void);
enum Void {}

impl BorshDeserialize for B {
    fn deserialize(_buf: &mut &[u8]) -> Result<Self> {
        unreachable!()
    }
    fn is_u8() -> bool {
        true
    }
}

fn main() {
    let x = Vec::<B>::deserialize(&mut &[1, 0, 0, 0, 0][..]).unwrap();
    match x[0].1 {}
}
$ cargo run --release
Illegal instruction (core dumped)

use `InvalidData` rather than `InvalidInput` for errors

Mostly pedantry, but, for cases like trying to deserialize 92 as bool the correct error type is InvalidData. InvalidInput is for cases where arguments to the syscall are invalid -- that is, it's a bug in the code and a programming error, not a bug in the data.
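A minimal sketch of the distinction, using only std::io: a byte that is not a valid bool is a problem with the data, so it maps to ErrorKind::InvalidData. The function decode_bool is a hypothetical stand-in, not the actual borsh implementation.

```rust
use std::io::{Error, ErrorKind, Result};

/// Decodes a bool from its one-byte representation (0 or 1).
fn decode_bool(byte: u8) -> Result<bool> {
    match byte {
        0 => Ok(false),
        1 => Ok(true),
        // The caller did nothing wrong; the input bytes are malformed.
        other => Err(Error::new(
            ErrorKind::InvalidData,
            format!("invalid bool representation: {}", other),
        )),
    }
}
```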

Borsh derive fails when using private types when derived on public type

Roughly, the pattern is:

#[derive(BorshSerialize, BorshDeserialize)]
struct Outer<T> {
    v: private::Inner<T>,
}

mod private {
    use super::*;

    #[derive(BorshSerialize, BorshDeserialize)]
    pub(crate) struct Inner<T> {
        val: T,
    }
}

This fails because the expanded derive code contains an unnecessary where clause (Inner<T>: BorshSerialize), which leaks the private type into the bound on the trait impl.

https://github.com/austinabell/borsh-derive-repro/blob/main/src/lib.rs is a repro of this issue, and I've included Serde derive for this same case working.

BorshDeserialize shouldn't require `Copy`

Is there any reason the Copy bound for array deserialization can't be downgraded to Clone? It's unfortunate that this bound exists at all; it seems like it's just for bytes deserialization.

This should just work:

#[derive(BorshDeserialize, BorshSerialize)]
pub struct MyStruct {
    test: [String; 2],
}

and ideally this too:

#[derive(BorshDeserialize, BorshSerialize)]
struct Inner;

#[derive(BorshDeserialize, BorshSerialize)]
pub struct MyStruct {
    test: [Inner; 2],
}
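For what it's worth, a fixed-size array can be assembled without Copy, Clone or Default bounds on the element type. The sketch below is std-only and purely illustrative: read_array is a hypothetical helper, and the read_one closure stands in for whatever per-element deserialization borsh would perform. It collects into a Vec first, which costs a heap allocation that a real implementation might want to avoid.

```rust
use std::convert::TryInto;
use std::io::{Error, ErrorKind, Result};

/// Builds `[T; N]` by calling `read_one` exactly N times, stopping at the first error.
fn read_array<T, F, const N: usize>(mut read_one: F) -> Result<[T; N]>
where
    F: FnMut() -> Result<T>,
{
    let mut items = Vec::with_capacity(N);
    for _ in 0..N {
        items.push(read_one()?);
    }
    // `Vec<T>` converts into `[T; N]` when the length matches; no `Copy` needed.
    items
        .try_into()
        .map_err(|_| Error::new(ErrorKind::InvalidData, "wrong number of elements"))
}
```

This works for [String; 2] as well as for arrays of unit structs, since the only requirement on T is that a value can be produced.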

Dynamically Deserialize with BorshSchema

@nearmax

I have a yaml file describing a struct. I would like to create a process that can interpret the yaml declaration and deserialize from a data buffer where the actual struct (or whatever valid and supported Borsh type) has been serialized.

Is there a way, using BorshSchema, to accomplish this?

Here is what I thought I needed to do; however, the last line throws an exception:

    fn test_schema() {
        #[derive(BorshSerialize, Debug)]
        struct A {
            foo: u64,
            bar: String,
        }
        #[derive(BorshSchema, BorshDeserialize, Debug)]
        struct B;
        fn setup_faux_struc() -> Definition {
            Definition::Struct {
                fields: Fields::NamedFields(vec![
                    ("foo".to_string(), "u64".to_string()),
                    ("bar".to_string(), "string".to_string()),
                ]),
            }
        }
        let my_type = setup_faux_struc();
        let mut my_def = HashMap::<String, Definition>::new();
        my_def.insert("B".to_string(), setup_faux_struc());
        B::add_definition("B".to_string(), my_type, &mut my_def);
        // SadDerived::add_definitions_recursively(&mut my_def);
        println!("{:?}", B::declaration());

        let a_with_val = A {
            foo: 1000,
            bar: "goofy".to_string(),
        };
        let b = a_with_val.try_to_vec().unwrap();
        println!("{:?}", b);
        let c = B::try_from_slice(&b).unwrap(); // THROWS EXCEPTION
    }

Enforce canonicity for HashMap and BinaryHeap

Binary heap serialization is very wrong:

for item in self {
item.serialize(writer)?;
}

As per documentation,

Returns an iterator visiting all values in the underlying vector, in arbitrary order.

This is exactly the thing that borsh should make sure never happens: it absolutely must guarantee serialization in a specific order. Fixing this would be a breaking change to the serialization format for binary heaps :( It feels like we need to do it though -- I doubt anyone really uses binary heaps with borsh. Actually, I think it's better to remove the impl for BinaryHeap altogether -- it's not the kind of data structure you should be using for serialization anyway, and by elevating this to a compile-time error we make sure that we won't subtly break anyone.
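If the impl were kept instead, one way to get a deterministic order is to emit the elements sorted. This is a std-only sketch, not the borsh implementation; canonical_items is a hypothetical helper, and cloning the heap is just the simplest way to avoid consuming it.

```rust
use std::collections::BinaryHeap;

/// Returns the heap's elements in ascending order, independent of insertion order.
fn canonical_items<T: Ord + Clone>(heap: &BinaryHeap<T>) -> Vec<T> {
    // `into_sorted_vec` consumes the heap, so work on a clone.
    heap.clone().into_sorted_vec()
}
```

Two heaps holding the same elements then produce the same sequence, whatever order the elements were pushed in.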

HashMap deserialization is subtly wrong:

#[inline]
fn deserialize(buf: &mut &[u8]) -> Result<Self> {
let len = u32::deserialize(buf)?;
// TODO(16): return capacity allocation when we can safely do that.
let mut result = HashMap::with_hasher(H::default());
for _ in 0..len {
let key = K::deserialize(buf)?;
let value = V::deserialize(buf)?;
result.insert(key, value);
}
Ok(result)
}

When we serialize a map, we sort the key-value sequence. We should check on deserialization that the sequence is sorted and emit an error if that's not the case. That's technically a breaking bug fix, though it shouldn't affect things in practice, as we do sort entries on serialization.
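The check itself could be as small as the std-only sketch below. It assumes keys are serialized in ascending Ord order, as described above; rejecting duplicate keys as well (strict ordering) is an extra assumption on my part. The function name ensure_strictly_ascending is hypothetical.

```rust
use std::io::{Error, ErrorKind, Result};

/// Fails unless every key is strictly greater than the one before it.
fn ensure_strictly_ascending<K: Ord>(keys: &[K]) -> Result<()> {
    for pair in keys.windows(2) {
        if pair[0] >= pair[1] {
            return Err(Error::new(
                ErrorKind::InvalidData,
                "map keys are not in strictly ascending order",
            ));
        }
    }
    Ok(())
}
```

In a real deserializer the comparison would more likely happen incrementally, against the previously read key, rather than on a collected slice.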

Arrays: serialization does not match schema

I think this line:

borsh/borsh-rs/borsh/src/schema.rs, line 162 in 50c3c5d:

impl_arrays!(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 32 64 65);

should be changed to be the same as this line:

borsh/borsh-rs/borsh/src/ser/mod.rs, line 347 in 50c3c5d:

impl_arrays!(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 64 65 128 256 512 1024 2048);

but with the addition of '0' as the first element, as in the schema line.

Schema generates conflicting names for enum variants

The following code fails to compile because the schema name generation for enum variants can produce name clashes.

#[derive(BorshSchema)]
enum Foo {
    Bar(FooBar),
}

#[derive(BorshSchema)]
struct FooBar {}
error[E0275]: overflow evaluating the requirement `<Foo as borsh::BorshSchema>::add_definitions_recursively::FooBar: borsh::BorshSchema`
 --> code.rs:1:10
  |
2 | #[derive(BorshSchema)]
  |          ^^^^^^^^^^^
  |
  = help: see issue #48214
  = note: this error originates in the derive macro `borsh::BorshSchema` (in Nightly builds, run with -Z macro-backtrace for more info)

I believe the error is a result of using a simple concatenation to generate the name.

let full_variant_name_str = format!("{}{}", name_str, variant_name_str);
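A tiny std-only sketch of why plain concatenation is ambiguous. The function variant_struct_name is a hypothetical stand-in that repeats the format! call quoted above.

```rust
/// Mirrors the naming scheme quoted above: enum name followed by variant name.
fn variant_struct_name(enum_name: &str, variant_name: &str) -> String {
    format!("{}{}", enum_name, variant_name)
}
```

For enum Foo with variant Bar this yields "FooBar", the same identifier as the user's struct FooBar, and different (enum, variant) pairs such as ("FooB", "ar") collapse to the same name too.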

infra: don't create already existing releases

Please implement serialization and deserialization for [u8; 34]

IPFS CIDv0 identifiers are 34 bytes long, and that's a common identifier that is stored on chain for off chain data.

If possible, please add an impl for [u8;34] for automatic derive-based BorshSerialize and BorshDeserialize. I'm not able to do this from an external crate.

0.9.2 fails to compile with no_std

With this in Cargo.toml:

[dependencies]
borsh = { version = "=0.9.2", default-features = false }

I get the error:

   Compiling borsh v0.9.2
error: cannot find macro `vec` in this scope
   --> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/borsh-0.9.2/src/de/mod.rs:308:30
    |
308 |             let mut result = vec![T::deserialize(buf)?];
    |                              ^^^
    |
    = note: consider importing one of these items:
            alloc::vec
            crate::maybestd::vec

error: could not compile `borsh` due to previous error

Schema does not clean unused attrs from cloned enum struct variants

use borsh::{self, BorshSchema};
use serde::Serialize;

#[derive(Serialize, BorshSchema)]
enum A {
    #[serde(rename = "ab")]
    B {
        #[serde(rename = "abc")]
        c: (),
    },
}

Roughly expands into:

enum A {
    #[serde(rename = "ab")]
    B {
        #[serde(rename = "abc")]
        c: (),
    },
}

impl borsh::BorshSchema for A {
    fn declaration() -> borsh::schema::Declaration {
        "A".to_string()
    }

    fn add_definitions_recursively(
        definitions: &mut borsh::maybestd::collections::HashMap<
            borsh::schema::Declaration,
            borsh::schema::Definition,
        >,
    ) {
        #[derive(BorshSchema)]
        struct AB {
            #[serde(rename = "abc")]
            c: (),
        }

        <AB as BorshSchema>::add_definitions_recursively(definitions);

        let variants = <[_]>::into_vec(
            #[rustc_box]
            ::alloc::boxed::Box::new([("B".to_string(), <AB>::declaration())]),
        );

        let definition = borsh::schema::Definition::Enum { variants };

        Self::add_definition(Self::declaration(), definition, definitions);
    }
}

As seen in the function add_definitions_recursively, the enum variant A::B is effectively cloned into a struct AB:

#[derive(BorshSchema)]
struct AB {
    #[serde(rename = "abc")]
    c: (),
}

Oddly, this struct still retains any non-borsh attributes the original variant had.

This causes builds to fail, because rustc can't resolve those attributes:

$ cargo build
error: cannot find attribute `serde` in this scope
 --> src/lib.rs:8:11
  |
8 |         #[serde(rename = "abc")]
  |           ^^^^^
  |
  = note: `serde` is in scope, but it is a crate, not an attribute

serde derivations for `BorshSchemaContainer`

It would be nice to be able to (de)serialize BorshSchemaContainer into something human-readable. For some context, we must somehow represent a contract's borsh-serializable types in the ABI schema file. Since this file is in JSON format, we have to serialize borsh schema as JSON, which is doable in a janky way using serde's remote derivation (see near/near-sdk-rs#872). Ideally, this logic should live in this repo, but I also recognize that borsh would rather not depend on serde, so maybe this functionality can be behind an optional feature schema-serde?

As for the BorshSchemaContainer's format, I propose we use untagged Fields and inline all single-fielded Definition variants for the sake of readability while maintaining all other underlying type names (Struct, Tuple etc). Borsh schema for struct Pair(u32, u32) would look like:

{
  "declaration": "Pair",
  "definitions": {
    "Pair": {
      "Struct": [
        "u32",
        "u32"
      ]
    }
  }
}

See near/near-sdk-rs@979b839 for more examples.

CC @austinabell

Const generics support

Would be beneficial to swap manual/macro impls for const generics.

My assumption would be the best path forward would be to put it under a feature, as this lib is more widely used.

Deserialize arrays in Borsh?

I'm trying to use Borsh to deserialize a large array.

pixels: [(Pubkey, u8); 1_000 * 1_000],

I've added the following crate attribute:

#![feature(trivial_bounds)]

And I'm getting the error:

   |
76 | #[account]
   | ^^^^^^^^^^ the trait `BorshSerialize` is not implemented for `[(anchor_lang::prelude::Pubkey, u8); 1000000]`
   | 
  ::: /home/vedantroy/.cargo/registry/src/github.com-1ecc6299db9ec823/borsh-0.9.1/src/ser/mod.rs:44:18
   |
44 |     fn serialize<W: Write>(&self, writer: &mut W) -> Result<()>;
   |                  - required by this bound in `anchor_lang::AnchorSerialize::serialize`
   |
   = help: the following implementations were found:
             <[T; 0] as BorshSerialize>
             <[T; 1024] as BorshSerialize>
             <[T; 10] as BorshSerialize>
             <[T; 11] as BorshSerialize>
           and 37 others
   = note: required because of the requirements on the impl of `BorshSerialize` for `GameState`
   = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0277]: the trait bound `[(anchor_lang::prelude::Pubkey, u8); 1000000]: BorshDeserialize` is not satisfied
  --> programs/auction/src/lib.rs:76:1
   |
76 | #[account]
   | ^^^^^^^^^^ the trait `BorshDeserialize` is not implemented for `[(anchor_lang::prelude::Pubkey, u8); 1000000]`
   |
   = help: the following implementations were found:
             <[T; 0] as BorshDeserialize>
             <[T; 1024] as BorshDeserialize>
             <[T; 10] as BorshDeserialize>
             <[T; 11] as BorshDeserialize>
           and 36 others
   = note: required because of the requirements on the impl of `BorshDeserialize` for `GameState`
   = note: required by `anchor_lang::AnchorDeserialize::deserialize`
   = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0277`.
error: could not compile `auction`

Is there any way to get around this?

Prepare for stability (1.0 release)

It feels like the ecosystem is big enough that the last breaking release, 1.0, is due. I don't think the current APIs are good enough for a 1.0 release though, so I expect we need at least one more 0.x release with substantial breakage. This issue collects all known future-compatibility problems, but isn't intended to be a full stabilization checklist just yet.

  • Fix soundness (#19 and then audit the code)
  • Hide schema behind a schema cargo feature and future-proof its API
  • rename schema cargo feature flag to unstable__schema
  • #45
  • Hide maybe_std module
  • Introduce #[doc(hidden)] pub __rt #[doc(hidden)] pub __private module with macro runtime for use by derives
  • add top-level io module.
    • I am not sure defining our own no-std IO is the right approach, but that's a big design discussion out of scope for the issue.
  • hide derive behind an optional feature-flag
  • deprecate BorshDeserialize::try_from_slice in favor of borsh::from_slice top-level function ( 😡 a lot of churn with this one)
  • clarify bounds on Rc/Arc impls: we use something significantly funkier than what serde is doing
  • remove const-generic feature (it does nothing these days, and was only left to avoid a breaking change)
  • unsplit borsh-derive-internal
  • cleanup derive syntax to follow serde -- borsh(skip) rather than borsh_skip
  • maybe forbid collections of ZSTs? #52
  • move docs from readme into lib.rs
  • setup merge queue for CI
  • #152
  • #46
  • #148
  • cut 1.0.0-alpha.1 release
  • test alpha release on nearcore and capture migration guide steps, create a draft PR (only merge after 1.0.0 release)
  • test alpha release on near-sdk-rs and capture migration guide steps, create a draft PR (only merge after 1.0.0 release)

bug: could not find `result` in `core`

After updating to near-sdk = 3.0.0-pre-release which uses borsh 0.8.1 the contract build generates the following error:

error[E0433]: failed to resolve: could not find `result` in `core`
   --> src/lib.rs:146:10
    |
146 | #[derive(BorshDeserialize, BorshSerialize, PanicOnDefault)]
    |          ^^^^^^^^^^^^^^^^
    |          |
    |          could not find `result` in `core`
    |          in this macro invocation
    |
   ::: /Users/ekwork/.cargo/registry/src/github.com-1ecc6299db9ec823/borsh-derive-0.8.1/src/lib.rs:34:1
    |
34  | pub fn borsh_deserialize(input: TokenStream) -> TokenStream {
    | ----------------------------------------------------------- in this expansion of `#[derive(BorshDeserialize)]`

error[E0433]: failed to resolve: could not find `result` in `core`
   --> src/lib.rs:146:28
    |
146 | #[derive(BorshDeserialize, BorshSerialize, PanicOnDefault)]
    |                            ^^^^^^^^^^^^^^
    |                            |
    |                            could not find `result` in `core`
    |                            in this macro invocation
    |
   ::: /Users/ekwork/.cargo/registry/src/github.com-1ecc6299db9ec823/borsh-derive-0.8.1/src/lib.rs:11:1
    |
11  | pub fn borsh_serialize(input: TokenStream) -> TokenStream {
    | --------------------------------------------------------- in this expansion of `#[derive(BorshSerialize)]`

error: aborting due to 2 previous errors

Repro:

git clone https://github.com/evgenykuzyakov/oysterpack-near-stake-token
cd oysterpack-near-stake-token
git checkout borsh-bug
cd contract
./build.sh

Possibly add option to specify the size of (or otherwise increase) enum variant representation?

I have plans to use borsh with obake, which allows effectively using borsh with a schema. However, obake's AnyVersion trait works by effectively representing the struct as an enum containing all possible versions. With different configurations, that could become many, many versions, and possibly more than 256 of them.

To test the borsh side of things, I created an enum with 299 variants, and as I expected, I got this message:
image

It makes sense as to why this is, because if enum variant counts were represented as a u16 by default, that would be a waste of space, and keeping it u8 but only setting it to u16 with more than 256 variants would likely be backwards-incompatible.

I was wondering whether it would be possible or worthwhile to add an attribute specifying that enum variants for a particular BorshSerialize/BorshDeserialize type should be represented with 2 bytes instead of 1.

Future proof schema API

Currently this is a public type:

#[derive(PartialEq, Debug, BorshSerialize, BorshDeserialize, BorshSchemaMacro)]
pub struct BorshSchemaContainer {
    /// Declaration of the type.
    pub declaration: Declaration,
    /// All definitions needed to deserialize the given type.
    pub definitions: HashMap<Declaration, Definition>,
}

This is ungreat -- all public fields make it impossible to make any kind of change to this structure. See #45 for example of the change we might want to do here (replacing HashMap with a BTreeMap to avoid hashbrown).

We should make it possible to change the internal representation without breaking public API, but that itself would necessitate a semver break.
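A rough sketch of what such an encapsulated shape could look like, using only std. Everything here is hypothetical: the type is named SchemaContainer to avoid suggesting it is the real one, Definition is reduced to a String stand-in, and the constructor and accessor names are illustrative.

```rust
pub mod schema {
    use std::collections::BTreeMap;

    pub type Declaration = String;
    /// Stand-in for the real `Definition` enum, to keep the sketch self-contained.
    pub type Definition = String;

    pub struct SchemaContainer {
        // Private fields: the map type can change without breaking callers.
        declaration: Declaration,
        definitions: BTreeMap<Declaration, Definition>,
    }

    impl SchemaContainer {
        /// Accepts any iterator of pairs, so the signature does not expose the map type.
        pub fn new<I>(declaration: Declaration, definitions: I) -> Self
        where
            I: IntoIterator<Item = (Declaration, Definition)>,
        {
            SchemaContainer {
                declaration,
                definitions: definitions.into_iter().collect(),
            }
        }

        pub fn declaration(&self) -> &str {
            &self.declaration
        }

        pub fn get_definition(&self, declaration: &str) -> Option<&Definition> {
            self.definitions.get(declaration)
        }
    }
}
```

With this shape, swapping BTreeMap for another map later would only touch the private field and the collect call.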

Discussion: support on 3rd party containers

When implementing BorshSerialize on a wasmer cache object, some fields use cranelift_entity::{PrimaryMap, SecondaryMap} and indexmap::IndexMap. These types do not implement BorshSerialize, which makes the entire struct non-derivable.
We also cannot implement it in the borsh-rs repo, because that would make borsh depend on these third-party crates. An optional dependency is conceivable but still not good, because (1) it ties borsh to specific versions of these libraries and (2) it is hard to cover all third-party libraries.
#8 is also an option, but not good for performance.
The best way I can think of is to let users of these container libraries implement helper functions, for example:

fn borsh_serialize_index_map<K: BorshSerialize, V: BorshSerialize, W: std::io::Write>(index_map: &IndexMap<K, V>, writer: &mut W) -> std::io::Result<()> { /* ... */ }

fn borsh_deserialize_index_map<K: BorshDeserialize, V: BorshDeserialize>(buf: &mut &[u8]) -> std::io::Result<IndexMap<K, V>> { /* ... */ }

and derive struct that contains these containers with a macro marker:

#[derive(BorshSerialize, BorshDeserialize)]
struct SomeStruct {
...
    #[borsh_ser(borsh_serialize_index_map)]
    #[borsh_deser(borsh_deserialize_index_map)]
    s: IndexMap<String, u32>,
...
}

Any better ways?

BorshDeserialize can cause UB by copying zero sized objects with no safe Copy impl

Repro:

mod singleton {
    use std::sync::atomic::{AtomicBool, Ordering};

    // Important: no Copy or Clone impl
    pub struct Handle(());

    impl Handle {
        pub fn get() -> Self {
            static EXISTS: AtomicBool = AtomicBool::new(false);
            if EXISTS.swap(true, Ordering::Relaxed) {
                panic!("one singleton handle has already been created");
            } else {
                Handle(())
            }
        }

        // Proof of exclusive access to a handle grants exclusive access to some
        // corresponding underlying state. It's guaranteed that at most one
        // handle exists in the entire program so this is sound.
        pub fn access(&mut self) -> &mut State {
            static mut STATE: State = None;
            unsafe { &mut STATE }
        }
    }

    type State = Option<String>;
}

use borsh::BorshDeserialize;
use std::io::Result;

struct Wrap(singleton::Handle);

impl BorshDeserialize for Wrap {
    fn deserialize(_buf: &mut &[u8]) -> Result<Self> {
        Ok(Wrap(singleton::Handle::get()))
    }
}

fn main() {
    let mut x = Vec::<Wrap>::deserialize(&mut &[2, 0, 0, 0][..]).unwrap();
    let (first, rest) = x.split_first_mut().unwrap();
    *first.0.access() = Some("...................".to_owned());
    let s = rest[0].0.access().as_mut().unwrap();
    *first.0.access() = None;
    println!("{}", s);
}

The result is segfault or other corruption:

$ cargo run --release
Segmentation fault (core dumped)

$ cargo run
thread 'main' panicked at 'failed printing to stdout: Bad address (os error 14)', library/std/src/io/stdio.rs:940:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Support deserializing as references to buffer

Was asked the question of why this deserialization didn't work and I always assumed it was because of an incompatible interface or to minimize code size, but after playing with it:

diff --git a/borsh-derive-internal/src/struct_de.rs b/borsh-derive-internal/src/struct_de.rs
index d26192d2..b5abb77a 100644
--- a/borsh-derive-internal/src/struct_de.rs
+++ b/borsh-derive-internal/src/struct_de.rs
@@ -34,7 +34,7 @@ pub fn struct_de(input: &ItemStruct, cratename: Ident) -> syn::Result<TokenStrea
                     );
 
                     quote! {
-                        #field_name: #cratename::BorshDeserialize::deserialize(buf)?,
+                        #field_name: #cratename::BorshDeserializeRef::deserialize_ref(buf)?,
                     }
                 };
                 body.extend(delta);
@@ -47,7 +47,7 @@ pub fn struct_de(input: &ItemStruct, cratename: Ident) -> syn::Result<TokenStrea
             let mut body = TokenStream2::new();
             for _ in 0..fields.unnamed.len() {
                 let delta = quote! {
-                    #cratename::BorshDeserialize::deserialize(buf)?,
+                    #cratename::BorshDeserializeRef::deserialize_ref(buf)?,
                 };
                 body.extend(delta);
             }
diff --git a/borsh/src/de/mod.rs b/borsh/src/de/mod.rs
index eedf6c87..00d3440a 100644
--- a/borsh/src/de/mod.rs
+++ b/borsh/src/de/mod.rs
@@ -59,6 +59,56 @@ pub trait BorshDeserialize: Sized {
     }
 }
 
+pub trait BorshDeserializeRef<'de>: Sized {
+    /// Deserializes this instance from a given slice of bytes.
+    /// Updates the buffer to point at the remaining bytes.
+    fn deserialize_ref(buf: &mut &'de [u8]) -> Result<Self>;
+
+    /// Deserialize this instance from a slice of bytes.
+    fn try_from_slice_ref(v: &'de [u8]) -> Result<Self> {
+        let mut v_mut = v;
+        let result = Self::deserialize_ref(&mut v_mut)?;
+        if !v_mut.is_empty() {
+            return Err(Error::new(ErrorKind::InvalidData, ERROR_NOT_ALL_BYTES_READ));
+        }
+        Ok(result)
+    }
+}
+
+impl<'de, T> BorshDeserializeRef<'de> for T
+where
+    T: BorshDeserialize,
+{
+    fn deserialize_ref<'m>(buf: &'m mut &'de [u8]) -> Result<Self> {
+        <T as BorshDeserialize>::deserialize(buf)
+    }
+}
+
+impl<'de> BorshDeserializeRef<'de> for &'de [u8] {
+    fn deserialize_ref<'m>(buf: &'m mut &'de [u8]) -> Result<Self> {
+        let len = u32::deserialize_ref(buf)?;
+        let len = len.try_into().map_err(|_| ErrorKind::InvalidInput)?;
+        if buf.len() < len {
+            return Err(Error::new(
+                ErrorKind::InvalidInput,
+                ERROR_UNEXPECTED_LENGTH_OF_INPUT,
+            ));
+        }
+        let (front, rest) = buf.split_at(len);
+        *buf = rest;
+        Ok(front)
+    }
+}
+
+impl<'de> BorshDeserializeRef<'de> for &'de str {
+    fn deserialize_ref<'m>(buf: &'m mut &'de [u8]) -> Result<Self> {
+        core::str::from_utf8(<&'de [u8]>::deserialize_ref(buf)?).map_err(|err| {
+            let msg = err.to_string();
+            Error::new(ErrorKind::InvalidData, msg)
+        })
+    }
+}
+
 impl BorshDeserialize for u8 {
     #[inline]
     fn deserialize(buf: &mut &[u8]) -> Result<Self> {
@@ -550,9 +600,10 @@ const _: () = {
 };
 
 #[cfg(feature = "const-generics")]
-impl<T, const N: usize> BorshDeserialize for [T; N]
+impl<'_de, T, const N: usize> BorshDeserialize for [T; N]
 where
-    T: BorshDeserialize + Default + Copy,
+// TODO yeah don't look at this yet
+    T: BorshDeserializeRef<'_de> + BorshDeserialize + Default + Copy,
 {
     #[inline]
     fn deserialize(buf: &mut &[u8]) -> Result<Self> {
@@ -579,10 +630,19 @@ macro_rules! impl_tuple {
       {
         #[inline]
         fn deserialize(buf: &mut &[u8]) -> Result<Self> {
-
             Ok(($($name::deserialize(buf)?,)+))
         }
       }
+
+      // TODO too lazy to resolve this with this setup
+    //   impl<'_de, $($name),+> $crate::BorshDeserializeRef<'_de> for ($($name),+)
+    //   where $($name: $crate::BorshDeserializeRef<'_de>,)+
+    //   {
+    //     #[inline]
+    //     fn deserialize_ref(buf: &mut &'_de [u8]) -> Result<Self> {
+    //         Ok(($($name::deserialize_ref(buf)?,)+))
+    //     }
+    //   }
     };
 }
 
diff --git a/borsh/src/lib.rs b/borsh/src/lib.rs
index 6a13270d..b1fe3eb8 100644
--- a/borsh/src/lib.rs
+++ b/borsh/src/lib.rs
@@ -10,7 +10,7 @@ pub mod schema;
 pub mod schema_helpers;
 pub mod ser;
 
-pub use de::BorshDeserialize;
+pub use de::{BorshDeserialize, BorshDeserializeRef};
 pub use schema::BorshSchema;
 pub use schema_helpers::{try_from_slice_with_schema, try_to_vec_with_schema};
 pub use ser::helpers::{to_vec, to_writer};

I don't see why this would be the case. These changes were my attempt to make this work without a breaking change, but it seems infeasible to handle all cases this way. We would probably need to follow the serde pattern (DeserializeOwned: for<'de> Deserialize<'de>), with BorshDeserialize and a BorshDeserializeOwned counterpart.

Although this would be a breaking change, it would allow certain parts of the code to be optimized to avoid unnecessary allocations and copies. Was it ever benchmarked whether adding the lifetime increases code size or reduces performance? This seems like something that should be supported before 1.0.
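
A minimal sketch of the serde-style split described above, using only the standard library. The trait names `DeserializeRef` and `DeserializeOwned` and the helper `decode_owned` are hypothetical and are not borsh's API; the point is only to show how a lifetime-parameterized trait lets `&'de [u8]` borrow from the input, while an "owned" alias covers types that borrow nothing.

```rust
use std::io::{Error, ErrorKind, Result};

/// Hypothetical lifetime-parameterized trait, modeled on serde's `Deserialize<'de>`.
trait DeserializeRef<'de>: Sized {
    fn deserialize_ref(buf: &mut &'de [u8]) -> Result<Self>;
}

/// Hypothetical counterpart of serde's `DeserializeOwned`: types that can be
/// decoded from input of any lifetime, i.e. that borrow nothing from it.
trait DeserializeOwned: for<'de> DeserializeRef<'de> {}
impl<T> DeserializeOwned for T where T: for<'de> DeserializeRef<'de> {}

impl<'de> DeserializeRef<'de> for u32 {
    fn deserialize_ref(buf: &mut &'de [u8]) -> Result<Self> {
        let data: &'de [u8] = *buf;
        if data.len() < 4 {
            return Err(Error::new(ErrorKind::InvalidInput, "unexpected length of input"));
        }
        let (head, rest) = data.split_at(4);
        *buf = rest;
        Ok(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
    }
}

impl<'de> DeserializeRef<'de> for &'de [u8] {
    fn deserialize_ref(buf: &mut &'de [u8]) -> Result<Self> {
        // A little-endian u32 length prefix, then that many payload bytes.
        let len = u32::deserialize_ref(buf)? as usize;
        let data: &'de [u8] = *buf;
        if data.len() < len {
            return Err(Error::new(ErrorKind::InvalidInput, "unexpected length of input"));
        }
        let (front, rest) = data.split_at(len);
        *buf = rest;
        // `front` points into the original input: no allocation, no copy.
        Ok(front)
    }
}

/// Only accepts types that do not borrow from `bytes`.
fn decode_owned<T: DeserializeOwned>(bytes: &[u8]) -> Result<T> {
    let mut buf = bytes;
    T::deserialize_ref(&mut buf)
}

fn main() {
    // Length prefix 3, followed by the payload "cat".
    let input = [3u8, 0, 0, 0, b'c', b'a', b't'];
    let mut buf: &[u8] = &input;
    let borrowed = <&[u8]>::deserialize_ref(&mut buf).unwrap();
    assert_eq!(borrowed, &b"cat"[..]);
    assert!(buf.is_empty());

    let n: u32 = decode_owned(&input).unwrap();
    assert_eq!(n, 3);
}
```

In this sketch `decode_owned::<&[u8]>` would be rejected by the compiler, because `&'a [u8]` only implements `DeserializeRef<'a>` for its own lifetime rather than for every `'de`.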

`BorshSerialize::serialize` method is unergonomic for mutable slices

Hello! First of all thanks for this library ❤️

Problem

The BorshSerialize::serialize method signature is not very ergonomic when using mutable slices. This is because a &mut [u8] already implements Write, but the method expects a &mut W where W itself implements Write. This means that in order to pass a mutable slice, you need to pass a mutable reference to the mutable slice: &mut &mut [u8].

Proposed Solution

Deprecate serialize and create a bincode-inspired serialize_into method that takes the writer by value, like this one: https://docs.rs/bincode/1.3.3/bincode/fn.serialize_into.html

However this is a 0.x library, so up to you 😉
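
A small sketch of the difference, using only `std::io::Write`. Both functions are hypothetical stand-ins, not borsh code: `serialize_by_ref` mirrors the `&mut W` shape described under Problem, and `serialize_into` mirrors the by-value writer shape of bincode's function.

```rust
use std::io::{Result, Write};

// Stand-in for the current shape: the writer is taken as `&mut W`.
fn serialize_by_ref<W: Write>(value: u32, writer: &mut W) -> Result<()> {
    writer.write_all(&value.to_le_bytes())
}

// Hypothetical `serialize_into`-style shape: the writer is taken by value.
// `&mut W` implements `Write` whenever `W` does, so callers can still lend
// a writer instead of giving it away.
fn serialize_into<W: Write>(mut writer: W, value: u32) -> Result<()> {
    writer.write_all(&value.to_le_bytes())
}

fn main() -> Result<()> {
    let mut storage = [0u8; 8];

    // `&mut [u8]` is the `Write` implementor, so `&mut W` is `&mut &mut [u8]`.
    let mut cursor: &mut [u8] = &mut storage;
    serialize_by_ref(1, &mut cursor)?;
    serialize_by_ref(2, &mut cursor)?;
    // Each write advanced the slice; nothing is left to write into.
    assert!(cursor.is_empty());
    assert_eq!(storage, [1u8, 0, 0, 0, 2, 0, 0, 0]);

    // By-value writer: the mutable slice is passed directly.
    let mut small = [0u8; 4];
    serialize_into(&mut small[..], 7)?;
    assert_eq!(small, [7u8, 0, 0, 0]);

    // Lending a growable writer still works with the by-value shape.
    let mut out: Vec<u8> = Vec::new();
    serialize_into(&mut out, 7)?;
    assert_eq!(out, vec![7u8, 0, 0, 0]);
    Ok(())
}
```

One trade-off of the by-value shape: a slice passed directly is consumed, so a caller who wants to see how far the write advanced still has to pass `&mut cursor`.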

BTreeMap serialization missing in 0.8.0 update

The ability to serialize BTreeMaps is suddenly missing in the new update. Is this intentional?

https://docs.rs/borsh/0.7.1/src/borsh/ser/mod.rs.html#235-251 <- present
https://docs.rs/borsh/0.8.0/src/borsh/ser/mod.rs.html#235-251 <- missing

#[derive(BorshSerialize, BorshDeserialize)]
    |          ^^^^^^^^^^^^^^ the trait `BorshSerialize` is not implemented for `BTreeMap<std::string::String, Vec<Vec<u8>>>`
    |
    = help: see issue #48214
    = help: add `#![feature(trivial_bounds)]` to the crate attributes to enable
    = note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

For more information about this error, try `rustc --explain E0277`.
error: could not compile `test-program`

Confusing safety contract of is_u8

BorshSerialize and BorshDeserialize have an is_u8 trait method which is unsafe to call (but safe to implement). What contract is the caller required to uphold when calling is_u8 in order for the call to be sound?

Update CI to run for branches

It would be ideal to just use GitHub Actions to keep things consistent, especially since Travis has had some issues semi-recently.

CI needs to run for PR branches, because even something as simple as #47 can break compatibility.

FromIterator & IntoIterator types

I was just curious, is there any reason that Borsh does not have generic implementations of BorshSerialize and BorshDeserialize for all IntoIterator and FromIterator types respectively? Presumably specialisation would then allow overriding this.

UB in BorshSerialize with padding or uninitialized data

The following safe code exhibits UB by reading and dumping out uninitialized memory.

use borsh::BorshSerialize;
use std::io::Result;
use std::mem::MaybeUninit;

#[derive(Copy, Clone)]
struct B(MaybeUninit<u8>);

impl BorshSerialize for B {
    fn serialize<W>(&self, _writer: &mut W) -> Result<()> {
        unreachable!()
    }
    fn is_u8() -> bool {
        true
    }
}

fn main() {
    let x = [B(MaybeUninit::uninit()); 1024];
    println!("{:?}", String::from_utf8_lossy(&x.try_to_vec().unwrap()));
}
$ cargo run --release
"\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}00 103:02
 75240665                  /usr/lib/x86_64-linux-gnu/libgcc_s.so.1\n7f03f10ad000-7f03f10af
000 rw-p 00000000 00:00 0 \n7f03f10af000-7f03f10b0000 r--p 00000000 103:02 75236957       
           /usr/lib/x86_64-linux-gnu/ld-2.31.so\n7f03f10b0000-7f03f10d3000 r-xp 00001000 1
03:02 75236957                  /usr/lib/x86_64-linux-gnu/ld-2.31.so\n7f03f10d3000-7f03f10
db000 r--p 00024000 103:02 75236957                  /usr/lib/x86_64-linux-gnu/ld-2.31.so\
n7f03f10dc000-7f03f10dd000 r--p 0002c000 103:02 75236957                  /usr/lib/x86_64-
linux-gnu/ld-2.31.so\n7f03f10dd000-7f03f10de000 rw-p 0002d000 103:02 75236957             
     /usr/lib/x86_64-linux-gnu/ld-2.31.so\n7f03f10de000-7f03f10df000 rw-p 00000000 00:00 0
 \n7ffeb4418000-7ffeb443a000 rw-p 00000000 00:00 0                          [stack]\n7ffeb
4450000-7ffeb4453000 r--p 00000000 00:00 0                          [vvar]\n7ffeb4453000-7
ffeb4454000 r-xp 00000000 00:00 0                          [vdso]\nffffffffff600000-ffffff
ffff601000 --xp 0000"

Code size and dynamic dispatch

Not sure this is the best place to have this conversation so feel free to close the issue!

I've been working on reducing (compiled) code size for an API that uses borsh heavily. This API includes a bunch of functions that use BorshSerialize generics. Between monomorphized copypasta, bad inlining and the error handling code for all the Results returned by all the BorshSerialize::serialize() calls, these functions end up compiling down to a whole lot of code.

When I first noticed the problem I thought oh well it's a shame, but I guess I'll just make these functions use dynamic dispatch. Except BorshSerialize isn't object safe, duh. So, armed with sed and a healthy dose of frustration, I hacked up a BorshDynSerialize trait and derive that are exactly like BorshSerialize, but use dynamic dispatch for the writer so are object safe.

Switching my project from BorshSerialize generics to dynamic dispatch with BorshDynSerialize cuts .text size by about 50k (of 390k total). Ugh.

I realize this probably only matters for embedded and fringe programs that care about code size - but has this issue come up before? Has anyone thought about providing an API that is usable with dynamic dispatch?
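
For illustration, a minimal sketch of what an object-safe variant could look like, using only the standard library. The trait `DynSerialize` and the function `write_all_values` are hypothetical and are not the `BorshDynSerialize` code mentioned above; the sketch only shows why taking the writer as `&mut dyn Write` removes the generic parameter that prevents trait objects.

```rust
use std::io::{Result, Write};

/// Hypothetical object-safe variant: the writer is a trait object, so the
/// method has no generic parameter and `dyn DynSerialize` is allowed.
trait DynSerialize {
    fn serialize_dyn(&self, writer: &mut dyn Write) -> Result<()>;
}

impl DynSerialize for u8 {
    fn serialize_dyn(&self, writer: &mut dyn Write) -> Result<()> {
        writer.write_all(&[*self])
    }
}

impl DynSerialize for u32 {
    fn serialize_dyn(&self, writer: &mut dyn Write) -> Result<()> {
        // Fixed-width little-endian encoding.
        writer.write_all(&self.to_le_bytes())
    }
}

/// Not generic: a single compiled copy serves every value type and every
/// writer type, at the cost of virtual calls.
fn write_all_values(values: &[&dyn DynSerialize], writer: &mut dyn Write) -> Result<()> {
    for value in values {
        value.serialize_dyn(writer)?;
    }
    Ok(())
}

fn main() -> Result<()> {
    let mut out: Vec<u8> = Vec::new();
    write_all_values(&[&1u8, &2u32], &mut out)?;
    assert_eq!(out, vec![1u8, 2, 0, 0, 0]);
    Ok(())
}
```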

Remove mandatory dependency on hashbrown

Currently borsh unconditionally pulls in hashbrown:

hashbrown = "0.9.1"

That's not great:

  • hashbrown is a big dependency, so it imposes a compilation-time tax on consumers
  • hashbrown is a public dependency, so it makes it harder to keep our own backwards-compatibility promise
  • the particular version of hashbrown we use is quite old

cc #46

Unable to serialize after adding elements to array in struct

Hi, I'm new to Borsh and I'm trying to use it to serialize and deserialize arrays. I see on the borsh.io website under specifications that dynamically sized arrays are supported, but I'm having trouble getting it to work. Below I modified the example given on the webpage, and you can see the outputs of the program in the following code section.

Main results:
I'm able to modify x, but when I try to add an element, no changes are reflected as seen in the second encoded_a output. I tried to deserialize the encoding again after serializing the changes, but it's giving me an error.

It would be great to know the correct way to serialize arrays after adding elements, or alternative approaches. In any case, any help would be appreciated.

use borsh::{BorshSerialize, BorshDeserialize};
use std::cell::{RefCell};

fn main() {
    #[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
    struct A {
        x: u8,
        y: Vec<String>,
    }

    let a = A {
        x: 1,
        y: vec!["cat".to_string()],
    };

    let encoded_a = RefCell::new(a.try_to_vec().unwrap());
    println!("Before: {:?}", encoded_a);

    let mut decoded_a = A::try_from_slice(&encoded_a.borrow()).unwrap();

    println!("Decoded Before:  {:?}", decoded_a);
    decoded_a.x = 2;
    decoded_a.y.push("dog".to_string());

    decoded_a.serialize(&mut &mut encoded_a.borrow_mut()[..]);

    println!("After:  {:?}", encoded_a);
    let mut decoded2_a = A::try_from_slice(&encoded_a.borrow());
    println!("Decoded After:  {:?}", decoded2_a);
}

Outputs:

Before: RefCell { value: [1, 1, 0, 0, 0, 3, 0, 0, 0, 99, 97, 116] }
Decoded Before:  A { x: 1, y: ["cat"] }
After:  RefCell { value: [2, 2, 0, 0, 0, 3, 0, 0, 0, 99, 97, 116] }
Decoded After:  Err(Custom { kind: InvalidInput, error: "Unexpected length of input" })
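
One plausible reading of the output above, offered as an assumption rather than a confirmed diagnosis: the mutable slice borrowed from `encoded_a` is exactly 12 bytes long and cannot grow, so the second string does not fit, and the `Result` returned by `serialize` (which the snippet discards) would carry that failure. The sketch below reproduces the effect with the standard library only. The `encode` function is a hypothetical hand-rolled stand-in for the derived serializer; it writes the same layout as the `Before` bytes (a `u8`, a little-endian `u32` element count, then each string as a `u32` length plus its bytes).

```rust
use std::io::{ErrorKind, Result, Write};

// Hypothetical stand-in for serializing `A { x: u8, y: Vec<String> }`.
// Sketch only: assumes all lengths fit in a u32.
fn encode<W: Write>(x: u8, y: &[&str], writer: &mut W) -> Result<()> {
    writer.write_all(&[x])?;
    writer.write_all(&(y.len() as u32).to_le_bytes())?;
    for s in y {
        writer.write_all(&(s.len() as u32).to_le_bytes())?;
        writer.write_all(s.as_bytes())?;
    }
    Ok(())
}

fn main() {
    // The 12 bytes printed as "Before" in the report.
    let mut encoded = vec![1u8, 1, 0, 0, 0, 3, 0, 0, 0, 99, 97, 116];

    // Writing through a slice of the old buffer: the slice cannot grow,
    // so the write fails once the 12 bytes are used up.
    let result = encode(2, &["cat", "dog"], &mut &mut encoded[..]);
    assert_eq!(result.unwrap_err().kind(), ErrorKind::WriteZero);

    // The prefix was overwritten before the failure, matching "After":
    // x = 2 and a count of 2, but only one string is actually present.
    assert_eq!(encoded, vec![2u8, 2, 0, 0, 0, 3, 0, 0, 0, 99, 97, 116]);

    // Writing into a fresh, growable Vec keeps both elements:
    // 1 + 4 + (4 + 3) + (4 + 3) = 19 bytes.
    let mut fresh: Vec<u8> = Vec::new();
    encode(2, &["cat", "dog"], &mut fresh).unwrap();
    assert_eq!(fresh.len(), 19);
}
```

Under this reading, re-serializing into a new `Vec<u8>` (for example with `try_to_vec()`, as the snippet already does for the first encoding) and checking the returned `Result` would avoid the truncated buffer.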

Consider implementing BorshSerialize/BorshDeserialize for (serde::)Serialize/Deserialize

This gives a few benefits:

  • issues like #7 will work, because serde supports recursive types for free
  • serde is the de facto standard, so third-party container libraries such as IndexMap usually implement serde's Serialize and Deserialize. We cannot implement BorshSerialize for every third-party container, and users cannot implement BorshSerialize for IndexMap themselves because neither the trait nor the type is defined in their crate. They have to wrap it in struct MyIndexMap(IndexMap) and implement BorshSerialize for MyIndexMap
  • issues like borsh#112 won't exist

The downside is that the extra layer of the serde::Serialize trait may slow borsh down; this still needs to be researched.

Add MSRV check to CI

Currently, changes can bump the MSRV without anyone noticing. It would be good to have this documented and stable for #51.

Noticed when #104 bumped the MSRV without any CI failure.

can't build `borsh` without `std`

Building borsh without std fails due to lack of alloc:

> cargo build --no-default-features

error: cannot find macro `vec` in this scope
   --> borsh/src/de/mod.rs:283:30
    |
283 |             let mut result = vec![T::deserialize(buf)?];
    |                              ^^^
    |
    = note: consider importing one of these items:
            alloc::vec
            crate::maybestd::vec

error: could not compile `borsh` due to previous error
