GithubHelp home page GithubHelp logo

make `ItemSliceSync` safe about gitoxide HOT 15 OPEN

martinvonz avatar martinvonz commented on July 30, 2024
make `ItemSliceSync` safe

from gitoxide.

Comments (15)

Manishearth avatar Manishearth commented on July 30, 2024 1

It's fine I figured it out, I've got a partial PR.

It wasn't that it's confusing, it's that the invariant has multiple caveats which is tricky for invariants of this form.

from gitoxide.

Byron avatar Byron commented on July 30, 2024

Thanks for bringing this up!

I also would love to get rid of all unsafe. In this case, it's used to allow access to a size-optimized vec of items that form a tree, and we need to access it concurrently with each thread taking its own root node, and working its way through all the reachable leaves. The tree was built by decoding a pack whose entries refer to its bases only by OFS_DELTA - they are pointing backwards only which assures a correct tree. REF_DELTA entries aren't supported here, and cause immediate failure - they are expected to have been resolved before as part of the thin-pack handling. This is also the reason for this unsurmountable issue.

If we somehow would allow REF_DELTA entries to point to an in-pack object, then in theory malicious packs could cause all kinds of graphs as they can point anywhere in the pack, but they still can't link an entry to more than one base. And that's what one would really have to do for two threads to encounter the same child.

Thus I believe it's impossible for this data structure to end up in a place where it violates its assumption. Maybe that's wrong somehow, but I don't see it yet.

Maybe ItemSliceSync should be replaced by a list of RefCell to make it safe?

Even if the performance-drop when doing that would be negligible I'd think that the increase in memory consumption will not be that. Last time I eyeballed the memory consumption gitoxide came in with ~20% less than Git when dealing with the linux kernel pack, and that was hard-earned already.

Would you consider closing the issue until there is more evidence something needs to be done?

from gitoxide.

martinvonz avatar martinvonz commented on July 30, 2024

The tree was built by decoding a pack whose entries refer to its bases only by OFS_DELTA - they are pointing backwards only which assures a correct tree.

Ah, I see. I agree that it should be safe then.

Would you consider closing the issue until there is more evidence something needs to be done?

I'll defer that to @Manishearth who did the unsafe review of gix-pack this time. Manish, do you think a comment explaining why it's safe would be sufficient?

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

Would you consider closing the issue until there is more evidence something needs to be done?

As an unsafe reviewer the standard I tend to apply is that it should be very easy to verify that the invariant us upheld (not just a comment claiming it is). So I think what I would need is:

  • A comment and sufficent access control on the fields that have these invariants
  • (Ideally) methods that impact these fields are marked unsafe
  • Code which modifies these fields should justify that the invariants are upheld. This is easier if the methods that modify/construct the fields are all marked unsafe.

Unless the comment is sufficiently self-evident and does not require jumping around a lot of code to verify. Even then I'd like the code it references to also have comments to that effect.

from gitoxide.

Byron avatar Byron commented on July 30, 2024

Thanks for chiming in - I like your standard as it's easy to follow and thus hopefully just as easy to apply.

Trying just that, I created #1233.

  • A comment and sufficent access control on the fields that have these invariants

Access control was improved by restricting it more.

  • (Ideally) methods that impact these fields are marked unsafe
  • Code which modifies these fields should justify that the invariants are upheld. This is easier if the methods that modify/construct the fields are all marked unsafe.

I didn't propagate-up the unsafe just yet, but reviewed and improved the safety comments. The reason it's safe lies in the way the data structure is built and traversed, and that I tried to make clearer.

It's interesting that the ItemSliceMut (and related types in gix-features) are only needed after a change in rustc which prevented the usage of raw pointers - previously this type would just have been a pointer access. I think this was done to enforce the declaration of Send and Sync accordingly, but it's just a guess. I don't know of this information could help to improve the documentation further, or the way it is used.

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

@Byron I think that's a start but still not easily reviewed.

I'm also not convinced on that Send and Sync bound, the safety comments on the Send and Sync bounds are definitely not correct at the very least.

from gitoxide.

Byron avatar Byron commented on July 30, 2024

I think the main culprit is that ItemSliceMut is really a way to access T: Send items mutably from multiple threads, and there are no guardrails at all. There was a time when this was *const T and that was transmuted into *mut T as needed. That, however, stopped working probably because of the Send + Sync bounds, and ItemSliceSync was born.

By itself there is no way this can be considered safe, as it is only 'not unsafe' if its used correctly, and that usage is described in detail, but the code for that is in a couple of places and certainly not too easy to follow.

With that said, I don't know what else to do about this.

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

Yeah the main thing to be careful about is that &T is Send only if T is both Sync and Send, which is not reflected in those bounds.

a RefCell is Send, but not Sync, so an &RefCell is not Send

from gitoxide.

Byron avatar Byron commented on July 30, 2024

I wish this could be reflected somehow, after all the T that's actually stored is Send + Sync, and it was never the intention to allow T that are &Type or Type<'_>. Maybe T can be constrained to not be a reference somehow?

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

No, your type is logically an &T, it's not about whether T is an &Type, it's about whether it is RefCell. If your T is allowed to be RefCell your code is unsound.

from gitoxide.

Byron avatar Byron commented on July 30, 2024

This definitely compiles even though it shouldn't:

diff --git a/gix-pack/src/cache/delta/traverse/util.rs b/gix-pack/src/cache/delta/traverse/util.rs
index 95789b50b..79602ada1 100644
--- a/gix-pack/src/cache/delta/traverse/util.rs
+++ b/gix-pack/src/cache/delta/traverse/util.rs
@@ -59,3 +59,18 @@ unsafe impl<T> Send for ItemSliceSync<'_, T> where T: Send {}
 //         `get_mut()`, we only ever access one T at a time.
 #[allow(unsafe_code)]
 unsafe impl<T> Sync for ItemSliceSync<'_, T> where T: Send {}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::cell::RefCell;
+
+    #[test]
+    fn type_cannot_be_refcell() {
+        type Test<'a> = ItemSliceSync<'a, RefCell<String>>;
+        let mut item = RefCell::new("hi".into());
+        let value = Test::new(std::slice::from_mut(&mut item));
+        fn use_it(_slice: &Test<'_>) {}
+        use_it(&value);
+    }
+}

To me it's a matter of unsafe abstraction - the way ItemSliceSync works is something that was added relatively recently and probably already was an improvement at the time as it assures as &mut [T] is presented, so Rusts lifetime logic will be intact until then, with all the unsafe usage being internal.

Now the question is how this "array with members that are (trust me) owned by a single thread at a time (but accessed by reference)" can be expressed so that ideally Rusts other safeguards aren't removed like what's done here.

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

Now the question is how this "array with members that are (trust me) owned by a single thread at a time (but accessed by reference)" can be expressed so that ideally Rusts other safeguards aren't removed like what's done here.

There are a couple patterns that help with this. If you're doing parallel tree traversal the best thing is tohave a little bit of unsafe in the splitting code (effectively a vector based split_at_mut() type of thing), and be extremely strict about ownership elsewhere.

The core problem here is that the tree is not structured in an ownership-based way, so you're having to stitch this together across different parts of the code.

That said the main invariant is on Item.children. Maybe I can submit a PR that makes that super clear.

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

Okay, the invariant is trickier because future_child_offsets is also weird.

from gitoxide.

Byron avatar Byron commented on July 30, 2024

I dug in and added this chunk of text. In summary, this is for handling a special case where ref-delta entries point to their base by id which is in the future of the current entry, not in its past. Thus these yet-to-be-seen parent objects are hooked up with their children when finishing the tree. Such packs exist only at rest and are untypical.
I hope that helps (and I am glad I added this documentation now).

diff --git a/gix-pack/src/cache/delta/mod.rs b/gix-pack/src/cache/delta/mod.rs
index 435008edf..c9d498ef1 100644
--- a/gix-pack/src/cache/delta/mod.rs
+++ b/gix-pack/src/cache/delta/mod.rs
@@ -53,6 +53,18 @@ pub struct Tree<T> {
     last_seen: Option<NodeKind>,
     /// Future child offsets, associating their offset into the pack with their index in the items array.
     /// (parent_offset, child_index)
+    /// In thin-packs, ref-deltas entries point to an object by its id. If this is a thin pack then this
+    /// would have been handled by [`LookupRefDeltaObjectsIter`](crate::data::input::LookupRefDeltaObjectsIter)
+    /// already.
+    /// Thus `child_index` points to a resolved ref-delta entry which pointed to an object within this pack
+    /// that wasn't seen yet, hence it points to a yet-to-be-seen pack entry.
+    /// And these we store and resolve later once we have seen all entries in the pack.
+    /// Note that such pack are atypical, as packs at rest should only ever use ofs-deltas to refer to parent entries.
+    /// In practice though, these packs with ref-delta entries exist at reset as well.
+    /// They are even produced by [Azure Git servers](https://github.com/Byron/gitoxide/issues/1025),
+    /// which is when we cannot handle them at all as we cannot know where an entry is in its pack by knowing its ID
+    /// without having an index yet, which we don't have as we are streaming the pack right now and thus try to produce
+    /// said index.
     future_child_offsets: Vec<(crate::data::Offset, usize)>,
 }
 

from gitoxide.

Manishearth avatar Manishearth commented on July 30, 2024

#1237

from gitoxide.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.