abomonation's People

Contributors

antiguru, frankmcsherry, milibopp, sdht0

abomonation's Issues

Is inline_always really needed?

Indiscriminate use of inline_always can result in code bloat (which, if taken too far, leads to L1i cache misses) and increased compilation times.

The inlining heuristics of rustc normally aren't too bad, so it might be worthwhile to investigate how many of these annotations can be replaced with plain inline or removed entirely without incurring a significant performance cost.

Implementations for standard library types

By comparison, serde implements its Serialize trait for a lot more types than abomonation does. It might be a nice idea to gradually add more types from std to avoid ergonomic issues caused by missing implementations.

To get started, I added a trivial implementation for PhantomData in #4, because I think it will help make integrating nalgebra and abomonation in dimforge/nalgebra#277 more ergonomic. Other types will likely be more work ;)

Shouldn't entomb and encode take a self rather than a &self?

In Rust terms, abomonation serialization is effectively a sophisticated move. Therefore, there is no technical reason why it shouldn't be possible to abomonate types which are movable but not clonable, such as Box<T> where T: !Clone.

However, the entomb operation and its higher-level cousin encode take their input object by shared reference. This makes it impossible to correctly implement the Abomonation trait for non-copyable types, which needlessly restricts its applicability.

Please consider modifying this interface to take input objects by value instead.

Consider taking writers by value

&mut T where T: Write implements Write, so there is no reason to ask for a borrowed writer in the API. But this is unlikely to be a problem in practice because owned writers are rarely used.
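As a minimal sketch of the point above (write_bytes is a hypothetical helper, not part of abomonation's API), a function that takes its writer by value still accepts a borrowed writer, because the standard library implements Write for &mut W:

```rust
use std::io::{self, Write};

/// Hypothetical helper (not abomonation's API): takes the writer by value.
/// Callers can still pass `&mut writer`, since `&mut W` implements `Write`
/// whenever `W` does.
fn write_bytes<W: Write>(mut writer: W, bytes: &[u8]) -> io::Result<()> {
    writer.write_all(bytes)
}
```

Passing &mut buf twice in a row keeps ownership of buf with the caller, while an owned writer such as std::io::sink() can be handed over directly.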

A possible path forward for padding bytes

So, I've had a quick chat with @RalfJung about our padding bytes problem, and I think I now have a decent grasp of what we need in order to resolve that particular UB in abomonation.

Padding bytes are uninitialized memory, and we now have a safe way to model that in Rust, in the form of MaybeUninit. So we can take a first step towards handling them correctly today by casting &[T] into &[MaybeUninit<u8>] instead of &[u8].

This is enough to memcpy the bytes into another &mut [MaybeUninit<u8>] slice. But it's not yet enough to expose our uninitialized bytes to the outside world, e.g. for the purpose of sending them to Write in encode() and entomb(), because Write wants initialized bytes, not possibly uninitialized ones.
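A rough sketch of that first step, under stated assumptions: as_uninit_bytes, copy_uninit and Padded are hypothetical names for illustration, and the T: Copy bound is a conservative restriction added here to rule out interior mutability, not something abomonation requires.

```rust
use std::mem::{size_of_val, MaybeUninit};

/// Example type with padding: under repr(C), three padding bytes sit
/// between `tag` and `value` on typical targets.
#[repr(C)]
#[derive(Clone, Copy)]
struct Padded {
    tag: u8,
    value: u32,
}

/// View a slice of values as possibly-uninitialized bytes.
/// Hypothetical helper, not abomonation's API.
fn as_uninit_bytes<T: Copy>(values: &[T]) -> &[MaybeUninit<u8>] {
    // SAFETY: every byte of memory, padding included, is a valid
    // MaybeUninit<u8>; the alignment requirement is 1; the length covers
    // exactly the bytes of `values`.
    unsafe {
        std::slice::from_raw_parts(
            values.as_ptr().cast::<MaybeUninit<u8>>(),
            size_of_val(values),
        )
    }
}

/// memcpy between two possibly-uninitialized byte slices of equal length.
/// Panics if the lengths differ, like `copy_from_slice`.
fn copy_uninit(src: &[MaybeUninit<u8>], dst: &mut [MaybeUninit<u8>]) {
    dst.copy_from_slice(src);
}
```

Nothing in this sketch reads a padding byte as a u8, which is why it stops short of what Write needs.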

To resolve this, we need another language functionality, which is not available yet but frequently requested from the UCG: the freeze() operation, a tool which can turn MaybeUninit<u8> into a nondeterministic valid u8 value. You can think of it as a way to opt out of the UB of reading bad data and defer to hardware "whatever was lying around at that memory address" behavior.

IIUC, something like that was proposed a long time ago, but it was initially rejected by security-conscious people on the grounds that it could be used to observe the value of uninitialized memory coming from malloc(), which may leak sensitive information like cryptographic secrets which a process forgot to volatile-erase before calling free().

That precaution is commendable, but on the other hand, given the growing body of evidence that a UB-free way to access specific regions of memory is needed for many use cases (from IPC with untrusted processes to implementation of certain low-overhead thread synchronization algorithms like seqlock), I'm hopeful that we will get something like that in Rust eventually (and I will in fact take steps to move this discussion forward once I'm done with my current UCG effort).

TL;DR: For now, this is blocked on a missing Rust feature, but the issue seems understood and is likely to be eventually resolved.

Can the pointer alignment situation be improved?

As the docs say, abomonation currently doesn't guarantee correct pointer alignment. This is pretty dangerous, even on x86, as rustc might someday be tempted to generate those evil SIMD instructions that assume the data is aligned and raise an exception otherwise.

I wonder if there is an API tweak we could use to improve upon this situation?
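One possible ingredient of such a tweak, sketched under assumptions (is_aligned_for is a hypothetical helper, not an existing abomonation function), is a check that a byte buffer starts at an address suitably aligned for T before it is reinterpreted:

```rust
use std::mem::align_of;

/// Hypothetical helper: reports whether `bytes` starts at an address that
/// is a multiple of T's alignment, i.e. whether reinterpreting the start
/// of the buffer as a `T` would at least satisfy alignment.
fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}
```

A decoder could refuse (or copy into an aligned buffer) when this returns false; that choice is a design question the check itself does not answer.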

Define a framing protocol

(from a chat with @frankmcsherry:)

This should allow writing to files and sockets, while accommodating multiple use cases (possibly with different framing structures).

Perhaps Abomonation should have read and write methods for readers and writers, and it does the framing for you and doesn't give the choice of forgetting.

Things to keep in mind:

  • If each abomonated object is prepended by a length marker: "the length is also handy in that it lets you zip through an array faster (moving from object to object, rather than deserializing each to determine the length)."

A Framed struct:

A struct Framed<T: Abomonation> { len: usize, data: T } which then implements Abomonation, but in a magical special way where maybe (i) len is written as part of abomonation, or maybe (ii) len is computed by fake serialization (relatively cheap, without traversing all the data).
This has other advantages, like allocating enough memory to write T up front rather than repeatedly growing / copying the Vec.

  • More complex headers may be folded into the abomonated T? For example, what should we do with the message headers in timely_communication?
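To make the length-marker idea concrete, here is a minimal sketch under assumptions: an 8-byte little-endian length prefix and the names write_frame and Frames are choices made for illustration, not a format that abomonation defines. The payload stands in for the abomonated bytes of one object.

```rust
use std::io::{self, Write};

/// Write one frame: an 8-byte little-endian length, then the payload.
/// The header layout is an assumption of this sketch.
fn write_frame<W: Write>(mut writer: W, payload: &[u8]) -> io::Result<()> {
    writer.write_all(&(payload.len() as u64).to_le_bytes())?;
    writer.write_all(payload)
}

/// Iterator that hops from frame to frame using only the length markers,
/// without inspecting the payloads.
struct Frames<'a> {
    rest: &'a [u8],
}

impl<'a> Iterator for Frames<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        if self.rest.len() < 8 {
            return None; // no complete header left
        }
        let (header, tail) = self.rest.split_at(8);
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(header);
        let len = u64::from_le_bytes(len_bytes) as usize;
        if tail.len() < len {
            return None; // truncated frame
        }
        let (payload, rest) = tail.split_at(len);
        self.rest = rest;
        Some(payload)
    }
}
```

Because each step only reads the header, skipping over N objects costs N header reads regardless of how large the objects are.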

NonZeroI16 is nightly only

This breaks abomonation, and, because differential depends on abomonation 0.7.*, by transitivity breaks differential as well:

impl Abomonation for NonZeroI16 { }

The problem is that NonZeroI16 and related types are deprecated after rustc 1.26 and also marked nightly only for some reason.

Is it okay to implement Abomonation for both T and &T?

From the point of view of abomonation's core semantics, there is nothing wrong with providing implementations of the Abomonation trait for both a type T and a reference to it &T. Basically, the implementation for &T works exactly like that of Box<T> in abomonation's current master branch.

Such implementations would be useful for high-level users of Abomonation, who stick with derives, encode, decode and measure, because they would allow safely auto-deriving Abomonation for more types. Something which, as a matter of fact, I ended up wanting for my project.

However, and that's the reason why I'm opening this issue before submitting my changes as a PR, there is also a significant ergonomic cost to doing so for any low-level user of Abomonation who calls into the trait directly.

If Abomonation is implemented for both T and &T, then anyone who uses the Abomonation trait directly instead of going through encode, decode and measure must be super-careful, because method call syntax of the x.extent() kind becomes a deadly trap that can very easily trigger the wrong Abomonation impl through auto-Deref magic.

Instead, one must get into the habit of only calling Abomonation trait methods via U::extent(&x) syntax, or, if type U is unknown, go for the somewhat less safe compromise of Abomonation::extent(&x).
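The trap can be reproduced with a stand-in trait. In this sketch, Extent is a hypothetical trait that only mimics the shape of Abomonation::extent, and the returned values are made up so the two impls can be told apart:

```rust
/// Stand-in trait for illustration only; not abomonation's real trait.
trait Extent {
    fn extent(&self) -> usize;
}

/// A plain u32 owns no out-of-line data.
impl Extent for u32 {
    fn extent(&self) -> usize {
        0
    }
}

/// A reference is treated like Box<u32> here: the pointee is extra data.
impl<'a> Extent for &'a u32 {
    fn extent(&self) -> usize {
        std::mem::size_of::<u32>()
    }
}
```

With reference: &u32, the call reference.extent() resolves to the u32 impl, because a method taking &self with Self = u32 already matches the receiver type &u32 before any auto-ref is tried. Reaching the &u32 impl takes <&u32 as Extent>::extent(&reference) or Extent::extent(&reference).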

Is this a trade-off that we can tolerate for the sake of having more Abomonation impls?

Should Abomonated use StableDeref?

Abomonated is not the only Rust abstraction that relies on slices of bytes Deref-ing into the same location even after a move. Pretty much every attempt at building self-referential types in Rust (which we're kinda doing inside of Abomonated) needs this property. As a result, someone has built the nice stable_deref_trait crate, which provides a trait for exactly this purpose.

We could reduce the unsafety of Abomonated<T, S>::new by requiring S: StableDeref. Unfortunately, we couldn't completely remove the unsafety in this way, because there is still the shared mutability issue to take care of. A long time ago, Rust had a nice Freeze trait for this, but that trait is now gone from the stable subset of the language and there is no sign of it coming back anytime soon. Still, I think partially removing the unsafety is worthwhile.

Using StableDeref would add an extra crate to abomonation's dependency list, but that crate is very small so I don't think it's a big issue.

Shouldn't exhume take a NonNull<Self> rather than a &mut Self?

Rust references allow the compiler to assume that the data behind them is valid. One way in which rustc currently uses this is to tag the associated pointers with LLVM's dereferenceable attribute, which allows the latter to prefetch from them to its heart's content. This kind of smart optimization should not be allowed before objects are exhumed, as it can lead to undefined behavior like LLVM following dangling pointers and segfaulting the program.

Therefore, I think exhume should not take its target object as a Rust reference, but as a NonNull pointer, which provides no guarantee of target data validity to rustc and therefore doesn't allow the compiler to muck around with it.
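A sketch of what working through a NonNull could look like, under assumptions: Record and exhume_record are hypothetical and much simpler than abomonation's real exhume. The point is only that the stale pointer field is patched through raw field accesses, without ever forming a &mut Record:

```rust
use std::ptr::{addr_of, addr_of_mut, NonNull};

/// Hypothetical record whose `data` pointer is stale after a memcpy.
struct Record {
    len: usize,
    data: *const u8,
}

/// Patch `data` so it points into `bytes`, going through raw pointers only.
///
/// # Safety
/// `this` must point to memory that is readable and writable as a `Record`.
unsafe fn exhume_record(this: NonNull<Record>, bytes: &[u8]) -> bool {
    // SAFETY: guaranteed by the caller; only raw field accesses are made,
    // so no reference to the not-yet-valid Record is ever created.
    unsafe {
        let raw = this.as_ptr();
        let len = addr_of!((*raw).len).read();
        if bytes.len() < len {
            return false; // not enough bytes to back the record
        }
        addr_of_mut!((*raw).data).write(bytes.as_ptr());
        true
    }
}
```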

potential issue with unsafe_abomonate! on latest version

Code very similar to the following works with Abomonation version 0.4.5 but not with 0.5:

#[macro_use]
extern crate abomonation;
extern crate timely;

use timely::dataflow::InputHandle;
use abomonation::Abomonation;

pub struct Foo {
  x: Vec<u8>,
  y: Vec<u8>,
}

pub struct Bar {
  z: Vec<u8>,
}

pub struct Baz {
  foo: Foo,
  bar: Bar,
}

unsafe_abomonate!(Foo: x, y);
unsafe_abomonate!(Bar: z);
unsafe_abomonate!(Baz: foo, bar);

fn main() {
  let input: InputHandle<u64, Baz> = InputHandle::new();
  // do other stuff
}

In version 0.5 I get the error:
"the trait bound Baz: abomonation::Abomonation is not satisfied
the trait abomonation::Abomonation is not implemented for Baz.

note: required because of the requirements of the impl of Timely::Data for Baz
note: required by timely::dataflow::Handle"

Did anything change in the way unsafe_abomonate works? I noticed you removed a generics parameter, but I'm not sure if that has any effect here.

Is it a good idea to implement Abomonation for non-abomonable PhantomData?

So, while resolving the memory safety issue of #28 that you pointed out in #27, I paused when I reached the implementation of Abomonation for PhantomData.

Currently, abomonation provides an impl of Abomonation for PhantomData<T> even if T is not abomonable. This is by design, as there is a test checking that this impl is available. And it is certainly technically correct to the first order of approximation: since PhantomData contains no data, it is trivially serializable.

Where I get uneasy, though, is when I consider how PhantomData<T> is typically used. By and large, the main use for this marker type in the wild is in container classes like Box and Vec, where you get types which only hold a *mut T, NonNull<T>, or index into some kind of arena of T, but need to tell rustc that they "logically own" one or more Ts, so that Send, Sync, Drop and other stuff that gets automatically implemented works as expected.

From this perspective, if a type contains a PhantomData<T>, it should almost certainly be regarded as containing a T by abomonation too. In which case we should require that this T be abomonable.

What do you think about this train of thought?

Should abomonation start using trybuild tests?

Resolving #27 entailed walking on some razor blades to figure out the right set of lifetime constraints needed to allow deserializing references, without allowing invalid deserializations (like deserializing a fake &'static T from stack-allocated data).

Given that someone (maybe you, maybe I) may need to touch that code again in the future, and that it is easy to get wrong, I would sleep better at night if I could add some compilation failure tests to #28 in order to make sure that some classic invalid reference deserialization examples will continue to refuse to compile in the future.

Unfortunately, Rust does not have a nice built-in mechanism for that sort of test, but someone has suggested using the trybuild crate for this purpose.

The two drawbacks are that 1/ it's one more dependency and 2/ since it's based on parsing rustc output, which is not subject to any stability guarantee, those tests are likely to require occasional maintenance in the future so that they keep working on new rustc versions.

Sanitize addresses in serialized data.

As of 0.5 Abomonation doesn't automatically sanitize addresses in serialized data. This is mainly due to requiring random access to the post-serialized data, which means (roughly) a &mut [u8] interface to the written data, and not all W: Write provide this.

Instead, we could add back something like

pub fn sanitize<T: Abomonation>(bytes: &mut [u8])

which would treat bytes as a &T and erase the associated memory-address holding fields.

I'm not 100% certain what the right way to erase the fields is, as the packing of exciting discriminant information into such fields is recent sport for the Rust folks. It could just be pushing a 0x01 in there (what it used to be), but this could change at a moment's notice, I would guess.
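For illustration only, here is what erasing one address field could look like. Everything in this sketch is an assumption: SliceHeader is a hypothetical repr(C) layout (the real layout of Vec and friends is unspecified), sanitize_slice_header is not an abomonation function, and the 0x01 placeholder is just the value mentioned above.

```rust
use std::mem::size_of;

/// Hypothetical serialized layout: an address word followed by a length.
#[repr(C)]
struct SliceHeader {
    addr: usize,
    len: usize,
}

/// Placeholder written over the address; 0x01 is an assumption.
const PLACEHOLDER: usize = 0x01;

/// Overwrite the address word at the start of `bytes`, leaving the
/// length untouched. Returns false if `bytes` is too short to hold a header.
fn sanitize_slice_header(bytes: &mut [u8]) -> bool {
    if bytes.len() < size_of::<SliceHeader>() {
        return false;
    }
    bytes[..size_of::<usize>()].copy_from_slice(&PLACEHOLDER.to_ne_bytes());
    true
}
```

A generic sanitize<T: Abomonation> would have to know where every such field lives inside T, which is the random-access requirement described above.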
