rudy's Introduction

Rudy

Rudy is a Judy array implementation in Rust. Judy arrays are highly efficient word-to-word or word-to-bool maps that adapt well to different data. The reference Judy array implementation provides a word-to-word map (JudyL), a set of words (Judy1), a string-to-word map (JudySL), and a fixed-length byte array to word map (JudyHS). Judy arrays use a compressed 256-radix trie.

The initial Rudy implementation will implement JudyL as RudyMap and Judy1 as RudySet. Because a RudyMap can store zero-sized types as values, it will be trivial to represent RudySet as a wrapper around a RudyMap<T, ()>. Future iterations may include JudySL and JudyHS support.
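The wrapper idea can be sketched roughly as follows. This is a minimal illustration, not Rudy's actual code: `SetSketch` and `MapStandIn` are hypothetical names, and `BTreeMap` stands in for RudyMap only so the example compiles without the rudy crate.

```rust
use std::collections::BTreeMap;

// Stand-in for RudyMap (an assumption for this sketch); any key-to-value map works.
type MapStandIn<K, V> = BTreeMap<K, V>;

/// Hypothetical set type that stores `()` as the value for every key.
pub struct SetSketch<K: Ord> {
    map: MapStandIn<K, ()>,
}

impl<K: Ord> SetSketch<K> {
    pub fn new() -> Self {
        SetSketch { map: MapStandIn::new() }
    }

    /// Returns true if the key was not already present.
    pub fn insert(&mut self, key: K) -> bool {
        self.map.insert(key, ()).is_none()
    }

    pub fn contains(&self, key: &K) -> bool {
        self.map.contains_key(key)
    }

    /// Returns true if the key was present.
    pub fn remove(&mut self, key: &K) -> bool {
        self.map.remove(key).is_some()
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}

fn main() {
    let mut set = SetSketch::new();
    assert!(set.insert(42u64));
    assert!(!set.insert(42u64));
    assert!(set.contains(&42));
    // `()` is zero-sized, so the value side adds no per-entry storage.
    assert_eq!(std::mem::size_of::<()>(), 0);
}
```

Every set operation forwards to the map, so the set needs no trie logic of its own.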

Differences between Judy and Rudy

Rudy appears to be the first implementation to use generics in the core library. The judy-template bindings for C++ allow for automatic conversion to and from words, but do not allow values that need more storage than a word. Using generics allows lower memory usage for smaller types and makes larger types usable, with a possible impact on performance.

Status

  • General library structure
  • Top-level root nodes
    • Leaf1
    • Leaf2
    • VecLeaf
  • JPM
    • Linear Leaf
    • Bitmap Leaf
    • Bitmap Branch
    • Linear Branch
    • Uncompressed Branch
  • Insertion
  • Get
  • Remove
  • Memory used
  • Shrink
  • Iterators

License

Rudy is dual licensed under the MIT and Apache-2.0 licenses.

rudy's People

Contributors

adevore, vks, wadelma, willglynn

rudy's Issues

Implement iteration

I'm in the middle of implementing iteration on a feature branch, iteration.

Replace nodrop with std::mem::ManuallyDrop

The 1.20 release of the Rust compiler stabilized ManuallyDrop. Switch LockstepArray and LeafBitmap to ManuallyDrop. Iteration will rely on the newly stabilized associated consts feature, so this will not lose additional backwards compatibility.
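For readers unfamiliar with ManuallyDrop, here is a small sketch of the general pattern: a container wraps a value so that its destructor runs only when the container decides it should. `Slot` and `CountsDrops` are hypothetical types made up for this illustration; they do not reflect how LockstepArray or LeafBitmap are written.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical slot: the value's destructor runs only while `occupied` is true.
struct Slot<T> {
    value: ManuallyDrop<T>,
    occupied: bool,
}

impl<T> Slot<T> {
    fn new(value: T) -> Self {
        Slot { value: ManuallyDrop::new(value), occupied: true }
    }

    /// Moves the value out, leaving the slot logically empty.
    fn take(&mut self) -> Option<T> {
        if !self.occupied {
            return None;
        }
        self.occupied = false;
        // SAFETY: `occupied` was true, so the value has not been moved out or dropped.
        // Note: `ManuallyDrop::take` was stabilized later than `ManuallyDrop` itself.
        Some(unsafe { ManuallyDrop::take(&mut self.value) })
    }
}

impl<T> Drop for Slot<T> {
    fn drop(&mut self) {
        if self.occupied {
            // SAFETY: the value is still present and is dropped exactly once here.
            unsafe { ManuallyDrop::drop(&mut self.value) }
        }
    }
}

/// Test helper that counts how many times it has been dropped.
struct CountsDrops<'a>(&'a Cell<u32>);

impl<'a> Drop for CountsDrops<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count after the emptied slot is dropped, then after the
/// taken value is dropped.
fn drops_observed() -> (u32, u32) {
    let counter = Cell::new(0);
    let mut slot = Slot::new(CountsDrops(&counter));
    let taken = slot.take();
    drop(slot); // the slot is empty, so the value must not be dropped here
    let after_slot = counter.get();
    drop(taken);
    (after_slot, counter.get())
}

fn main() {
    assert_eq!(drops_observed(), (0, 1));
}
```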

Tag releases

rudy-0.0.1 exists as a crate, and I think it might be cb50da5, but without a 0.0.1 tag I can't quickly jump to that point in the repo.

rudy should have tags for each release.

  • Identify and tag 0.0.1 retroactively
  • Decide on how to integrate tags into a release process (cargo release?)

Get working on stable

Currently there are three nightly features:

  • conservative_impl_trait: I planned on using this with iteration, but can switch to concrete types for now.
  • associated_consts: Stabilized in 1.20, can be removed entirely. Future experimentation with associated consts on traits should be done on a branch.
  • slice_patterns: This was only used as a shortcut because I was already using nightly. The slice patterns can be replaced with more verbose code.

Add a LICENSE file

It'd be nice to note the license in the README.md and Cargo.toml, too.

Implement FromIterator

Implement FromIterator for both RudyMap and RudySet. For RudyMap, pick a root type that is appropriate for the expected number of items to avoid reallocation.
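One way this could look, as a rough sketch: read the iterator's `size_hint()` before consuming it and size the container up front. `MapSketch` is a hypothetical sorted-Vec map used only to keep the example self-contained; in Rudy the hint would presumably drive the choice of root node rather than a Vec capacity.

```rust
/// Hypothetical stand-in for RudyMap: entries kept sorted by key in a Vec.
pub struct MapSketch<V> {
    entries: Vec<(u64, V)>,
}

impl<V> MapSketch<V> {
    /// Pre-sizes storage for the expected number of items.
    pub fn with_expected_len(expected: usize) -> Self {
        MapSketch { entries: Vec::with_capacity(expected) }
    }

    /// Inserts a key, overwriting the value if the key already exists.
    pub fn insert(&mut self, key: u64, value: V) {
        match self.entries.binary_search_by_key(&key, |entry| entry.0) {
            Ok(i) => self.entries[i].1 = value,
            Err(i) => self.entries.insert(i, (key, value)),
        }
    }

    pub fn get(&self, key: u64) -> Option<&V> {
        self.entries
            .binary_search_by_key(&key, |entry| entry.0)
            .ok()
            .map(|i| &self.entries[i].1)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

impl<V> std::iter::FromIterator<(u64, V)> for MapSketch<V> {
    fn from_iter<I: IntoIterator<Item = (u64, V)>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // The lower bound of size_hint is a safe estimate of the item count.
        let (lower, _) = iter.size_hint();
        let mut map = MapSketch::with_expected_len(lower);
        for (key, value) in iter {
            map.insert(key, value);
        }
        map
    }
}

fn main() {
    let map: MapSketch<&str> = vec![(3u64, "c"), (1, "a")].into_iter().collect();
    assert_eq!(map.len(), 2);
    assert_eq!(map.get(1), Some(&"a"));
}
```

The lower bound can overestimate the final size when the input contains duplicate keys, so it is only a starting point for allocation.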

Benchmarks vs. std

It would be nice to have benchmarks comparing the performance to the data structures in std.

Implement signed keys

Currently, only unsigned integers have the Key trait implemented. The problem with signed integers comes when trying to insert negative integers into a trie. Because the top bit is 1 for negative numbers in two's complement, negative numbers will sort after positive numbers if simply transmuted. Instead, the numbers must be shifted in the Key trait implementation.
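A minimal sketch of one such shift, assuming 64-bit keys: flipping the sign bit (equivalent to adding 2^63 with wraparound) maps i64::MIN to 0 and i64::MAX to u64::MAX, so unsigned order matches signed order. The names `encode` and `decode` are illustrative only and are not taken from the feature/signed-keys branch.

```rust
/// Sign bit of a 64-bit integer.
const SIGN_BIT: u64 = 1u64 << 63;

/// Maps a signed key to an unsigned one so that unsigned order equals signed order.
fn encode(key: i64) -> u64 {
    (key as u64) ^ SIGN_BIT
}

/// Inverse of `encode`.
fn decode(bits: u64) -> i64 {
    (bits ^ SIGN_BIT) as i64
}

fn main() {
    let keys = [i64::MIN, -2, -1, 0, 1, i64::MAX];
    let encoded: Vec<u64> = keys.iter().map(|&k| encode(k)).collect();

    // Encoded keys are strictly increasing, so a trie walk visits them in signed order.
    assert!(encoded.windows(2).all(|pair| pair[0] < pair[1]));

    // A plain cast does not preserve order: -1 lands above every positive value.
    assert!((-1i64) as u64 > 1u64);

    // The mapping round-trips.
    assert!(keys.iter().all(|&k| decode(encode(k)) == k));
}
```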

The feature/signed-keys branch contains an implementation that is verified to work for insert, get, and remove. However, because iteration is not implemented, the order has not been verified. This is blocked on #14.

Concurrent get_mut()s of different keys?

specs has a DistinctStorage marker trait which guarantees that:

Multiple threads may call get_mut() with distinct indices without causing undefined behavior.

Now, yes: get_mut() itself takes a &mut self, and there's various barriers in the way of getting a &mut RudyMap to another thread, like how std::thread::spawn() needs 'static. There's even more barriers for getting a &mut RudyMap to multiple threads, since… well, the whole purpose of &muts is to be exclusive. Having said that, achieving parallelism by splitting the keyspace is useful in practice and can often be accommodated by generic containers even if the compiler doesn't understand, so the DistinctStorage trait here is unsafe and asks implementors to prove its safety.

As I understand it, safe concurrent get_mut() requires two things:

  1. An Option<&mut> returned by get_mut() of one key must not overlap with any Option<&mut> returned for any other key. That is, each key's &mut must refer to an entirely disjoint set of bytes.
  2. get_mut() on one key does not write to any memory location that is read by get_mut() on any other key.

The specs use case defers insertions/removals to outside the parallel section, so it really is just get_mut() that's under examination.

My read of the Rudy code is that get_mut() can never trigger any internal reorganization or anything like that, so any memory writes would be caused by the caller acting on the &mut element -- and that while keys have essentially shared storage, elements do not. RudyMap would therefore seem to satisfy the specs::DistinctStorage guarantee.
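To illustrate the kind of access pattern in question, here is a sketch that uses only safe std APIs, with a slice standing in for the map. It involves neither RudyMap nor specs; `bump_halves` is a made-up name. The point is that two threads may hold &mut references at the same time as long as the referenced bytes are disjoint.

```rust
use std::thread;

/// Splits the slice into two disjoint halves and mutates each on its own thread.
fn bump_halves(values: &mut [u32]) {
    let mid = values.len() / 2;
    // split_at_mut hands out two &mut slices that cannot overlap.
    let (left, right) = values.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| {
            for v in left.iter_mut() {
                *v += 1;
            }
        });
        s.spawn(|| {
            for v in right.iter_mut() {
                *v += 10;
            }
        });
    });
    // Both threads are joined when the scope ends.
}

fn main() {
    let mut data = vec![0u32; 4];
    bump_halves(&mut data);
    assert_eq!(data, [1, 1, 10, 10]);
}
```

Here the compiler can prove disjointness through split_at_mut. For a keyed container it cannot, which is why a marker trait like DistinctStorage has to be unsafe.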

Do you agree with this analysis? If concurrent get_mut()s are in fact safe today, is this a contract Rudy is willing to uphold going forward?
