
xacrimon / dashmap


Blazing fast concurrent HashMap for Rust.

License: MIT License

Rust 100.00%
hashmap hashtable concurrent-programming concurrent concurrent-data-structure concurrent-map data-structures

dashmap's People

Contributors

ahornby, alainx277, aminya, arnaz87, arthurprs, bitwalker, bratsinot, cuviper, cyril-marpaud, dpbriggs, earthcomputer, ethanhs, jerrody, joshtriplett, kamilaborowska, kestrer, kixiron, leoleoasd, mscofield0, pmarks, rustyyato, s-arash, sdragic, stepancheg, threated, tomkarw, typr124, w1th0utnam3, xacrimon, yaa110


dashmap's Issues

Allow user to supply the hasher

std::collections::HashMap defaults to SipHash for DoS resilience but allows passing a custom hasher via the with_hasher and with_capacity_and_hasher methods. It would be very nice to allow the same in DashMap.

I understand it currently defaults to FxHash, so this is actually strictly required for DoS resilience.
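For comparison, the std API mentioned above can be sketched as follows. This uses only the standard library and says nothing about DashMap's own API; `build_maps` and `FixedState` are hypothetical names chosen for illustration.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// A deterministic (unseeded) hasher builder: cheap to construct, but not DoS-resistant.
type FixedState = BuildHasherDefault<DefaultHasher>;

/// Builds one map per hasher choice; the hasher builder is part of the map's type.
fn build_maps() -> (
    HashMap<&'static str, u32, RandomState>,
    HashMap<&'static str, u32, FixedState>,
) {
    // Randomly seeded state: the std default, passed explicitly here.
    let mut seeded = HashMap::with_hasher(RandomState::new());
    // Caller-chosen hasher plus a capacity hint.
    let mut fixed = HashMap::with_capacity_and_hasher(16, FixedState::default());
    seeded.insert("a", 1);
    fixed.insert("a", 1);
    (seeded, fixed)
}
```

The point of the sketch is that the caller, not the map, decides between a seeded and an unseeded hasher.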

Failed to work with async_trait

I'm not sure this is the right place to file this issue, but async_trait works fine with std::collections::HashMap, so I'm posting here first. I tried to build the following code:

use std;
use async_trait::async_trait;
use dashmap::DashMap;
use std::collections::HashMap;
use tokio;

struct Foo {}
struct Gar<T: Bar> {
    map: DashMap<u32, T>,
}

#[async_trait]
trait Bar {
    async fn bar(&self) -> u32;
}

#[async_trait]
impl Bar for Foo {
    async fn bar(&self) -> u32{
        0
    }
}

#[async_trait]
trait GarTrait {
    async fn coo(&self) -> Vec<u32>;
}

#[async_trait]
impl<T> GarTrait for Gar<T> where T: Bar + Send + Sync,{
    async fn coo(&self) -> Vec<u32> {
        let mut a = Vec::new();
        for item in self.map.iter() {
            let bar= item.value();
            let i: u32 = bar.bar().await;
            a.push(i);
        }
        a
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>>{
    let g = Gar::<Foo>{map:DashMap::new()};
    g.coo().await;
    Ok(())
}

cargo build fails with the following error:

$ cargo build
   Compiling t v0.1.0 (-)
warning: unused import: `std::collections::HashMap`
 --> src/main.rs:4:5
  |
4 | use std::collections::HashMap;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error[E0311]: the parameter type `T` may not live long enough
  --> src/main.rs:31:37
   |
31 |       async fn coo(&self) -> Vec<u32> {
   |  _____________________________________^
32 | |         let mut a = Vec::new();
33 | |         for item in self.map.iter() {
34 | |             let bar= item.value();
...  |
38 | |         a
39 | |     }
   | |_____^
   |
   = help: consider adding an explicit lifetime bound for `T`
   = note: the parameter type `T` must be valid for any other region...
note: ...so that the type `T` will meet its required lifetime bounds
  --> src/main.rs:31:37
   |
31 |       async fn coo(&self) -> Vec<u32> {
   |  _____________________________________^
32 | |         let mut a = Vec::new();
33 | |         for item in self.map.iter() {
34 | |             let bar= item.value();
...  |
38 | |         a
39 | |     }
   | |_____^

error: aborting due to previous error

error: could not compile `t`.

To learn more, run the command again with --verbose.

Note: the code compiles fine when self.map.get() is used.

Cargo.toml

[dependencies]
tokio = { version = "*", features = ["full"] }
async-trait = "*"
dashmap = "*"

Experimental: Algorithmic improvements

More efficient probing with control bytes a la hashbrown.

Quadratic probing + Robin Hood

Hopscotch

Fix load factor calculation error due to tombstones.

Eliminate tombstones during any operation if one of the following is met:

  • The following value in the table is empty.
  • Swap keys with tombstones preceding it.

Compare hash codes before keys when probing.

Leapfrog probing.

Using the Default DashMap causes methods to panic

If you obtain the DashMap using DashMap::default, any method used will cause a panic.
DashMap::new works fine.

(I'm using v3.0.7)

Panic message

thread 'main' panicked at 'attempt to shift right with overflow'

How to recreate

use dashmap::DashMap;
 
fn main() {                                           
    let map: DashMap<usize, ()> = DashMap::default();
    map.get(&1);
}

Possible fix

Remove the derived Default implementation and replace it with a call to new. Or just remove it completely.
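One plausible reading of the panic, sketched with toy types: a derived `Default` zeroes every field, so a bit count such as `ncb` becomes 0 and the shard shift becomes the full pointer width. The struct layout below is an assumption for illustration, not the v3.0.7 source; `DerivedDefault`, `ManualDefault` and `shard_index` are hypothetical names.

```rust
/// Toy stand-in for the map; `ncb` is the number of hash bits used to pick a shard.
#[derive(Default)]
struct DerivedDefault {
    ncb: u32,
    shards: Vec<Vec<u64>>,
}

/// Same layout, but `Default` is written by hand and forwards to `new`.
struct ManualDefault {
    ncb: u32,
    shards: Vec<Vec<u64>>,
}

impl ManualDefault {
    fn new() -> Self {
        let shard_count: usize = 4;
        Self {
            ncb: shard_count.trailing_zeros(),
            shards: vec![Vec::new(); shard_count],
        }
    }
}

impl Default for ManualDefault {
    fn default() -> Self {
        Self::new()
    }
}

/// Returns `None` where a plain `hash >> shift` would overflow.
fn shard_index(hash: usize, ncb: u32) -> Option<usize> {
    hash.checked_shr(usize::BITS - ncb)
}
```

With `ncb == 0` the shift equals `usize::BITS`, which is exactly the "shift right with overflow" condition; the hand-written `Default` avoids it by going through `new`.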

Use len within or_insert_with?

In 1.x, the following worked well:

map.get_or_insert(&key, map.len() + 1)

But in later versions, both 2.x and 3.x, the following just hangs:

map.entry(key).or_insert_with(|| map.len() + 1)

Can you give me any ideas to resolve this issue?
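A likely explanation is that the entry holds a write lock on one shard while `len()` tries to read-lock every shard, including that one. The toy sharded map below (std only, hypothetical names, not DashMap's real layout) sketches the usual workaround: read the length before taking the write lock.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Toy sharded map used only to illustrate lock ordering.
struct Sharded {
    shards: Vec<RwLock<HashMap<u64, usize>>>,
}

impl Sharded {
    fn new(shard_count: usize) -> Self {
        Self {
            shards: (0..shard_count).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Takes a read lock on every shard in turn.
    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }

    /// Reads the length first, releases all read locks, then takes one write lock.
    fn get_or_insert_next_id(&self, key: u64) -> usize {
        let next = self.len() + 1; // computed before any write lock is held
        let shard = &self.shards[key as usize % self.shards.len()];
        let mut guard = shard.write().unwrap();
        let id = *guard.entry(key).or_insert(next);
        id
    }
}
```

Note the trade-off: the length is no longer read atomically with the insert, so two threads could compute the same `next` value. If ids must be unique, a separate atomic counter is probably the safer choice.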

Switch to SipHasher by default

The current default of FxHash is not DoS-resistant. It would be nice to follow the std::collections::HashMap convention of having a secure default, especially if you let the user override it as needed (see #9).

Due to contention the performance hit of SipHash is less than it is on std::HashMap. I get somewhere between 36% and 25% hit to throughput on 4 threads, and it is rapidly decreasing as the thread count increases.

Turn dashmap into read-only mode?

I have a common pattern in a library where I first collect data into a hashmap (only writing) and then process it (only reading).

During the reading stage, the RwLock and Ref wrappers prevent me from storing raw references to the values in the DashMap in an auxiliary structure, even though this would be perfectly ok, because I don't want to perform any more writes.

For me, it would therefore be nice to have a functionality like RwLock's .into_inner() which moves data out of the lock. For DashMap it would basically turn it into a map without concurrent write support. Do you think that this would be a useful addition to the library? I could have a look at this when I have some free time.
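The idea can be sketched with std types alone. Assuming the map is a list of `RwLock`-protected shards (an assumption for illustration), consuming the locks yields plain maps whose values can be referenced directly; `into_read_only` and `collect_refs` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Consumes the locked shards and returns plain maps with no locking left.
fn into_read_only(shards: Vec<RwLock<HashMap<String, u32>>>) -> Vec<HashMap<String, u32>> {
    shards
        .into_iter()
        // `into_inner` needs ownership, so no guard can still be alive here.
        .map(|lock| lock.into_inner().unwrap())
        .collect()
}

/// Plain references can now be stored in an auxiliary structure.
fn collect_refs(frozen: &[HashMap<String, u32>]) -> Vec<&u32> {
    frozen.iter().flat_map(|shard| shard.values()).collect()
}
```

Because the conversion takes the shards by value, the borrow checker guarantees that no writer exists once the read-only view has been built.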

DashSet does not implement Debug

After upgrading to DashMap 3.10.0 I tried to replace DashMap<X, ()> with DashSet.

It almost worked, except DashMap implements Debug and DashSet does not, so I could no longer #[derive(Debug)] on my struct containing the DashSet field.

Also, std::collections::HashSet implements Debug, so DashSet should too.

Add an async API

Add an async API that won't block, so dashmap can be used in async contexts.

HashSet abstraction

I've started playing around with DashMap, and I've already seen some impressive performance gains. One annoying thing that I have is that for a set, I've created the type alias:

type DashSet<T> = DashMap<T, ()>

Which is fine, but the iterators behave weirdly, since the Ref dereferences by default to .value(), so now I always have to explicitly call .key() on the iterator item.

It would be nice to have a DashSet that fixes this syntactic quirk.
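A minimal sketch of such a wrapper, written against std's `HashMap` rather than DashMap so that it stays self-contained; `UnitSet` is a hypothetical name and the real DashSet API may differ.

```rust
use std::collections::hash_map::Keys;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical set wrapper over a map with unit values.
struct UnitSet<T> {
    inner: HashMap<T, ()>,
}

impl<T: Eq + Hash> UnitSet<T> {
    fn new() -> Self {
        Self { inner: HashMap::new() }
    }

    /// Returns true if the value was not already present.
    fn insert(&mut self, value: T) -> bool {
        self.inner.insert(value, ()).is_none()
    }

    /// Iterates over the elements directly, so callers never see `()`.
    fn iter(&self) -> Keys<'_, T, ()> {
        self.inner.keys()
    }
}
```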

Open issues for known bugs

I'm using V1 and intend to use it in production. You've mentioned on a few issues that you found several bugs with V2, and your attention seems to have shifted to https://github.com/ixy-db. Would you mind creating issues here for the known bugs so that someone else can pick them up?

#[derive(Default)] for DashMap mandates that keys/values be Default

I'm upgrading DashMap in my project (from 1.2.0 to 2.1.0), and noticed that a new requirement got added that is a bit difficult for me to implement: the Default impl for DashMap now requires that <K: Default, V: Default> also.

I think this might be unintentional: I don't see why the Default impl should require Default on keys and values when ::new() does not.

use of unstable library feature 'map_get_key_value'

How can I use this in a library and in unit tests?

error[E0658]: use of unstable library feature 'map_get_key_value'
--> /Users/snlan/.cargo/registry/src/github.com-1ecc6299db9ec823/dashmap-3.7.0/src/lib.rs:609:37
|
609 | if let Some((k, v)) = shard.get_key_value(key) {
| ^^^^^^^^^^^^^
|
= note: for more information, see rust-lang/rust#49347

rustc 1.39.0 (4560ea788 2019-11-04)
binary: rustc
commit-hash: 4560ea788cb760f0a34127156c78e2552949f734
commit-date: 2019-11-04
host: x86_64-apple-darwin
release: 1.39.0
LLVM version: 9.0

DashMap shard number conflicts with HashBrown tag

std::collections::HashMap's HashBrown algorithm uses the high 7 bits of the hash as its tag for SIMD comparisons: https://github.com/rust-lang/hashbrown/blob/f7bb664f41b1c74d2c5dcbab55727227b9e8d13a/src/raw/mod.rs#L129

DashMap also uses those same high bits of the hash to pick its shard:

hash >> self.shift

This means that, since there are 4 shards per core, if you have 32 or more cores, every SIMD tag in the same shard will be equal, ruining the effectiveness (but not correctness) of HashBrown's parallel tag match algorithm. Even with fewer cores, the effectiveness will be substantially reduced.

The easiest fix is probably just to do:

// Leave the high 7 bits for the HashBrown SIMD tag.
let shard = (hash << 7) >> shift;

Alternatively, multiply is slower but does a good job of mixing low bits into high bits, so something like:

// Multiply by golden ratio (as a usize) to mix into the high bits.
let golden_ratio = (0x9e3779b97f4a7c15u64 >> (64 - util::ptr_size_bits())) as usize;
let shard = hash.wrapping_mul(golden_ratio) >> shift;

This is a few cycles slower than the << 7, but has the advantage of not magically "knowing about" the underlying HashMap's 7 bit tags. It will also choose a good shard even for a poor hash function that leaves the high bits zero, like certain uses of nohash-hasher, although of course those would suffer from poor SIMD tags anyway.
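The three shard computations discussed above can be compared side by side. This sketch fixes the hash width at 64 bits and expects `ncb` (the number of shard bits) to be between 1 and 63; the function names are made up for illustration.

```rust
/// Tag as described above: the high 7 bits of a 64-bit hash.
fn tag7(hash: u64) -> u64 {
    hash >> 57
}

/// Current scheme: shard index taken from the high `ncb` bits.
fn shard_high(hash: u64, ncb: u32) -> u64 {
    hash >> (64 - ncb)
}

/// First proposal: drop the 7 tag bits, then take the next `ncb` bits.
fn shard_skip_tag(hash: u64, ncb: u32) -> u64 {
    (hash << 7) >> (64 - ncb)
}

/// Second proposal: mix with a golden-ratio multiply before taking the high bits.
fn shard_mixed(hash: u64, ncb: u32) -> u64 {
    hash.wrapping_mul(0x9e37_79b9_7f4a_7c15) >> (64 - ncb)
}
```

With 128 shards (`ncb == 7`), `shard_high` and `tag7` return the same number, so every key in a given shard carries the same tag. `shard_skip_tag` instead separates hashes that share a tag.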

Perf improvement: pre-compute shift

Looking here:

let shift = util::ptr_size_bits() - self.ncb;

Every time a get or a put is done, a subtraction is performed. However, this value doesn't usually change between calls. The subtraction could be avoided by storing the result of the operation as a member instead of ncb.
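A minimal sketch of the suggestion, assuming a power-of-two shard count; `ShardPicker` is a hypothetical type, not part of the crate.

```rust
/// Hypothetical shard picker that stores the shift instead of the bit count.
struct ShardPicker {
    shift: u32,
}

impl ShardPicker {
    fn new(shard_count: usize) -> Self {
        // A single shard would need a shift of the full width, which overflows.
        assert!(shard_count > 1 && shard_count.is_power_of_two());
        let ncb = shard_count.trailing_zeros();
        // The subtraction happens once, here.
        Self { shift: usize::BITS - ncb }
    }

    /// Hot path: a single shift, no subtraction.
    fn shard_of(&self, hash: usize) -> usize {
        hash >> self.shift
    }
}
```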

Lost updates with `Entry::or_*` methods

Hi. I was wondering if Entry::or_* methods (or_default, or_insert and or_insert_with) can result in lost updates. For example, if we have let map: DashMap<String, AtomicU64> = DashMap::new();, and then two threads execute the following code:

let value = map.entry(key).or_insert(AtomicU64::new(0));
value.fetch_add(1, Ordering::Relaxed);

Is it possible that the final value of the atomic is 1? I was thinking that it might be possible if both threads see that the key does not exist and both call or_insert, resulting in the loss of one of the fetch_add calls.

After writing this, I think I have a more clear question: is it possible that or_* methods insert in an Entry::Occupied?

Thanks.
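As a point of comparison, here is a std-only sketch in which the lookup and the insert happen under a single lock acquisition, so the second thread always finds the entry occupied. Whether DashMap's Entry holds its shard lock for the same span is an assumption to check against the crate's source; `bump_from_two_threads` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::thread;

/// Two threads bump the same key; returns the final counter value.
fn bump_from_two_threads() -> u64 {
    let map: Mutex<HashMap<String, AtomicU64>> = Mutex::new(HashMap::new());
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                // The lookup and the insert happen while this guard is held.
                let mut guard = map.lock().unwrap();
                let value = guard.entry("key".to_string()).or_insert(AtomicU64::new(0));
                value.fetch_add(1, Ordering::Relaxed);
            });
        }
    });
    let guard = map.lock().unwrap();
    let total = guard
        .get("key")
        .map(|v| v.load(Ordering::Relaxed))
        .unwrap_or(0);
    total
}
```

Under that locking discipline the result is always 2. A lost update would need the vacancy check and the insert to be separated by a lock release.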

Segfault on 32-bit platforms

Currently on 32-bit platforms, DashMap causes a debug_assert failure/segfault in DashMap::_yield_read_shard and DashMap::_yield_write_shard. This is due to it being passed a garbage i, which it gets from determine_map. Currently, the hashing function used in determine_map is fxhash::hash64, which returns a u64. When that hash is later shifted, it's shifted based on the platform's pointer size. On 32bit platforms, this doesn't shift enough and causes the value returned from determine_map to be well out of bounds.

Changing the hash function from hash64 to just hash (which returns a usize instead of a u64) seems to work for me so far.

For reference, I'm using this on a Raspberry Pi 2 Model B which uses the 32bit armv7-unknown-linux-gnueabihf target.
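The arithmetic behind the out-of-bounds index can be reproduced on any platform by passing the pointer width as a parameter. The function names are hypothetical; `ncb` is the number of bits used to select a shard.

```rust
/// Index computed from a full 64-bit hash with a shift sized for `ptr_bits`-wide pointers.
fn index_from_u64(hash: u64, ptr_bits: u32, ncb: u32) -> u64 {
    hash >> (ptr_bits - ncb)
}

/// Same computation after truncating the hash to 32 bits first.
fn index_from_u32(hash: u64, ncb: u32) -> u32 {
    (hash as u32) >> (32 - ncb)
}
```

With 4 shards (`ncb == 2`) and 32-bit pointers, a 64-bit hash is shifted by only 30 bits and can leave up to 34 significant bits, far beyond the valid range 0..4. Truncating the hash to the pointer width first keeps the index in range.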

Implement clone_from on DashMap

The Clone trait has an optional clone_from(&mut self, source: &Self) method. It would be good to manually implement this method so that users can avoid some extra allocations/deallocations when they need to clone_from.
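A sketch of what a manual implementation might look like on a toy container; `Shards` is a hypothetical stand-in, and the real map would apply the same delegation to its own fields.

```rust
/// Hypothetical container with one heap buffer per shard.
#[derive(Debug, PartialEq)]
struct Shards {
    shards: Vec<Vec<u32>>,
}

impl Clone for Shards {
    fn clone(&self) -> Self {
        Self { shards: self.shards.clone() }
    }

    /// Delegates to `Vec::clone_from`, which can reuse existing buffers
    /// instead of dropping them and allocating fresh ones.
    fn clone_from(&mut self, source: &Self) {
        self.shards.clone_from(&source.shards);
    }
}
```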

Avoid double hashing

Right now the code hashes incoming keys twice: once to select which inner map to use, and then again inside that map.

This could be optimized by trading off a minor amount of memory.
In the inner map instead of inserting the K directly, it could be a:
HashMap<Key<K>, V> where Key is defined as

struct Key<K> {
   hashCode: u64,
   userKey: K,
}

The inner maps could be set to use a Hasher which is designed to operate exclusively with this type, and which returns the hashCode as the hash every time. (Eq would be defined via the userKey only.)

Then the outer hasher could first select which map to send the data into (probably not using the lower bits of the hash to avoid collisions).
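The proposal can be sketched with std types. The field names are snake_case versions of the ones above; `PassThrough`, `InnerMap` and `shard_of` are hypothetical, and `DefaultHasher` stands in for whichever outer hasher the map would really use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Key wrapper that carries its precomputed hash.
struct Key<K> {
    hash_code: u64,
    user_key: K,
}

impl<K: Hash> Key<K> {
    /// Hashes the user key exactly once.
    fn new(user_key: K) -> Self {
        let mut hasher = DefaultHasher::new();
        user_key.hash(&mut hasher);
        Self { hash_code: hasher.finish(), user_key }
    }
}

// Equality looks at the user key only.
impl<K: PartialEq> PartialEq for Key<K> {
    fn eq(&self, other: &Self) -> bool {
        self.user_key == other.user_key
    }
}
impl<K: Eq> Eq for Key<K> {}

// Hashing feeds the stored code to the hasher instead of rehashing the user key.
impl<K> Hash for Key<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash_code);
    }
}

/// Hasher that returns whatever `u64` it was given.
#[derive(Default)]
struct PassThrough(u64);

impl Hasher for PassThrough {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // Fallback for non-u64 input; `Key` never takes this path.
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

type InnerMap<K, V> = HashMap<Key<K>, V, BuildHasherDefault<PassThrough>>;

/// Picks one of 4 inner maps from the two highest bits of the stored hash.
fn shard_of<K>(key: &Key<K>) -> usize {
    (key.hash_code >> 62) as usize
}
```

The cost is 8 extra bytes per entry for the stored hash, which is the memory trade-off mentioned above.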

Add or_try_insert_with method in Entry

Currently, the Entry struct has an or_insert_with method that accepts a callback to provide a default value; however, there is no way to pass a fallible closure.
So, I wonder if it is possible to add the following method to the Entry struct:

#[inline]
pub fn or_try_insert_with<E>(self, value: impl FnOnce() -> Result<V, E>) -> Result<RefMut<'a, K, V, S>, E> {
    match self {
        Entry::Occupied(entry) => Ok(entry.into_ref()),
        Entry::Vacant(entry) => Ok(entry.insert(value()?)),
    }
}

This would allow this use case where dashmap is used as a local cache for value loaded, for example, from a remote storage:

let my_value = my_dashmap
     .entry(&"hello")
     // pass a closure so the fetch only runs when the entry is vacant
     .or_try_insert_with(|| fetch_from_db("hello"))
     .unwrap();
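The same shape can be tried out against std's `HashMap` entry API. This is a sketch for illustration, written as a free function because std's `Entry` cannot be extended with inherent methods; it is not the proposed DashMap method itself.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Inserts the closure's value only if the key is vacant and the closure succeeds.
fn or_try_insert_with<K, V, E>(
    map: &mut HashMap<K, V>,
    key: K,
    value: impl FnOnce() -> Result<V, E>,
) -> Result<&mut V, E>
where
    K: Eq + Hash,
{
    match map.entry(key) {
        Entry::Occupied(entry) => Ok(entry.into_mut()),
        // `?` returns the error before anything is inserted.
        Entry::Vacant(entry) => Ok(entry.insert(value()?)),
    }
}
```

A failed load leaves the map untouched, so a later call can retry the fetch.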

compatibility with serde

Hi, I'd like to make a feature request. I am currently replacing a RwLock<HashMap> with DashMap; it works really well so far, and not having to acquire a lock manually is really nice. The biggest inconvenience in my current code base is that I frequently use serde to (de)serialize the std::collections::HashMap. I think serde support as an optional feature would add value to your implementation.

Undefined Behavior in `util::to_mut`

You can never transmute a &T to a &mut T in any way. This is instant undefined behavior.

Currently there is no fast way to get a (&K, &mut V) from the standard hashmap, so you're out of luck it seems, but there is a way to fix this! Instead of storing HashMap<K, V>, store HashMap<K, UnsafeCell<V>>. This way you can soundly get a &mut V from get_key_value (so long as you are also holding a write shard).
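A sketch of the UnsafeCell approach under one explicit assumption: the caller has exclusive access to the shard, modelled here as a `&mut` borrow (for example a dereferenced write guard). `key_and_value_mut` is a hypothetical helper.

```rust
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;

/// Returns a shared key reference and an exclusive value reference.
fn key_and_value_mut<'a, K, V>(
    shard: &'a mut HashMap<K, UnsafeCell<V>>,
    key: &K,
) -> Option<(&'a K, &'a mut V)>
where
    K: Eq + Hash,
{
    let (k, cell) = shard.get_key_value(key)?;
    // SAFETY: `shard` is exclusively borrowed for 'a, so no other reference to
    // this value can exist, and `UnsafeCell` permits mutation through a shared path.
    Some((k, unsafe { &mut *cell.get() }))
}
```

The difference from a transmute is that `UnsafeCell` is the one type through which the language allows a shared reference to lead to mutation. The exclusivity argument still has to hold, which is why the sketch demands a `&mut` borrow of the shard.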

Improve hasher perf with aHash

DashMap puts great effort into performance and aims to be as fast as possible. If you have any suggestions or tips do not hesitate to open an issue or a PR.

Using aHash will both improve performance and resolve the DoS issue #12 at the same time.

Blocking after get_mut when reaccessing key

Hi,

This might not be the place for this kind of question.
I'm curious about this behaviour where my program locks up when I reaccess the map using a key that was borrowed as mutable.
My code is here: https://gist.github.com/rust-play/26c34df4f98d5ce6b34e050763092215

In this case, only the first line is printed. Afterwards, the program keeps a single thread at 100% (and multiple threads when using tokio async).

Is this expected behaviour? Because the compiler does not complain.
If it is, are there ways to achieve what I'm trying to do? Perhaps using shards and awaiting/manually releasing the lock.

Thanks.

Ability to take all key value pairs out of a DashMap

I'm currently receiving a DashMap from some code external to mine and do not need the concurrency. To eliminate the performance overhead and added complexity the concurrent api brings, I would like to be able to take out all the data and store it in a HashMap (I am doing many many lookups into the map).

Previously when they used a CHashMap I was able to do the following:

let map: HashMap<Key, Value> = c_hash_map.into_iter().collect();

It would be nice to have a similar API or perhaps a single .take() method.

Would this change be possible? Right now I am cloning all the pairs out of the DashMap, which is not ideal.

Thanks!

Plans for V4

Hi!

First of all, congratulations on this fantastic piece of work. I'm using it in an actor system, where it is the main cornerstone.

I read that you want to build a V4. What changes do you have in mind? Would you accept help? If so, I would like to see if there is anything I can help with.

Btw, if this is not your preferred channel for this kind of questions, let me know.

Best! 😄

Debug assertion failed

Running my tests that create objects with DashMaps inside them panics:

thread 'tests::test_simplest_struct_actor' panicked at 'assertion failed: i < self.shards.len()', C:\Users\Khionu\.cargo\registry\src\github.com-1ecc6299db9ec823\dashmap-2.1.1\src\lib.rs:440:9

Document single-threaded deadlock behavior

This code:

let map = DashMap::<i32, i32>::new();
map.insert(1, 2);
let a = map.get_mut(&1);
let b = map.get(&1);

will deadlock. The shard containing 1 gets write-locked, nothing else can access it. This is reasonable behavior but I think it should be documented.

Perhaps more surprising is that this code:

let map = DashMap::<i32, i32>::new();

for i in 0..100 {
    map.insert(i, 0);
}

let mut refs = vec![];
for i in 0..100 {
    refs.push(map.get_mut(&i))
}

Will also deadlock. Even though you're not getting the same slot twice, you're still trying to lock certain chunks twice, since some slots are colocated. This isn't very realistic code though. Something less artificial might look like:

let a = map.get_mut(a);
let b = map.get_mut(b);

This code can still deadlock! But it's more sporadic. It only deadlocks if the two entries happen to be colocated in the same shard, which for uniformly distributed hashes has probability 1/shard_count.

Basically, if your thread is holding a RefMut, it's not deadlock-safe to do any other operations to the map. This behavior surprised me a little, so I think it would be worth documenting.
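The blocking can be observed safely with std's `RwLock`, used here as a stand-in for one shard. `try_read` reports the conflict instead of waiting, so the sketch terminates; `second_access_would_block` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::sync::{RwLock, TryLockError};

/// Holds a write guard on one shard, then probes it again without blocking.
fn second_access_would_block() -> bool {
    let shard = RwLock::new(HashMap::from([(1, 2)]));
    // Plays the role of a live `RefMut` into this shard.
    let _held = shard.write().unwrap();
    // A blocking `read()` here would wait on a lock this same thread owns.
    let blocked = matches!(shard.try_read(), Err(TryLockError::WouldBlock));
    blocked
}
```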

Entry API: automatically clone/copy?

I'm trying out the new entry API and it's really quite nice to use - it's a great ergonomics improvement! One thing that I notice though is that now, the only way I can tell of ensuring a key is present on the map is to get an owned key (via copy or clone), even if it should already be in the map (or if it might have gotten added in a parallel routine).

As far as I can tell, the most ergonomic way to adjust a key's value is something like this, which always clones (as opposed to a solution that probes via get first, then uses entries to initialize the entry without a race condition):

fn some_work(&self, key: &K) {
    // key gets cloned for every access:
    let entry = self.entry(key.clone()).or_default();
    return (*entry).do_the_work();
}

Now, when K should be Clone or Copy, I imagine something like this would be cool:

fn some_work<K: Clone>(&self, key: &K) {
    // entry_cloning is the proposed method; it does not exist yet
    let entry = self.entry_cloning(key).or_default();
    return (*entry).do_the_work();
}

...where .entry_cloning returns a thing that's like Entry, except in or_default (and the other methods), it probes with a reference to the key first, then only if the key shouldn't be present, clones in order to insert it. (And same for .entry_copying...)

Does this make sense?

Alternatively, I'd like to understand how .insert behaves if the key being inserted is found - does it return the value that was present, does it overwrite the value? If there was a way for my code to ensure a value for the key is present (without overwriting) in the manual way, I'll take it too! (:
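The "probe by reference, clone only on a miss" idea can be sketched against std's `HashMap`. This is not the proposed `entry_cloning` API and it is not race-free for a concurrent map, where the probe and the insert would have to happen under one shard lock; `get_or_default_cloning` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Probes by reference first and clones the key only on a miss.
fn get_or_default_cloning<'a, K, V>(map: &'a mut HashMap<K, V>, key: &K) -> &'a mut V
where
    K: Eq + Hash + Clone,
    V: Default,
{
    if !map.contains_key(key) {
        // Miss: this is the only place the key is cloned.
        map.insert(key.clone(), V::default());
    }
    map.get_mut(key).expect("key was present or has just been inserted")
}
```

On the question about insert: std's `HashMap::insert` overwrites an existing value and returns the previous one. Whether DashMap's insert matches that behaviour is best confirmed in its own documentation.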

No Debug impl

Like the standard HashMap, DashMap should implement Debug.
