xacrimon / dashmap
Blazing fast concurrent HashMap for Rust.
License: MIT License
std::HashMap defaults to SipHash for DoS resilience but allows passing a custom hasher via the with_hasher and with_capacity_and_hasher methods. It would be very nice to allow the same in DashMap.
I understand it currently defaults to FxHash, so this is actually strictly required for DoS resilience.
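For reference, a minimal sketch of the standard-library constructors mentioned above. It uses only std types; the function names are made up for illustration, and nothing here is DashMap's API.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Builds a map with std's randomly seeded default hasher state.
fn dos_resistant_map() -> HashMap<String, u32, RandomState> {
    HashMap::with_hasher(RandomState::new())
}

/// Builds a map with a fixed, unseeded hasher and a pre-reserved capacity.
/// Without a random seed this variant gives up the DoS resilience.
fn deterministic_map(capacity: usize) -> HashMap<String, u32, BuildHasherDefault<DefaultHasher>> {
    HashMap::with_capacity_and_hasher(capacity, BuildHasherDefault::default())
}
```

The hasher is a type parameter of the map, which is why swapping it does not change any of the lookup or insert calls.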
Crate: spin
Title: spin is no longer actively maintained
Date: 2019-11-21
URL: https://rustsec.org/advisories/RUSTSEC-2019-0031
spin got added as a dependency in d20bc47.
Unfortunately, the commit does not explain why the recommended crate is being replaced here.
I'm not sure this is the right place to file the issue, but async_trait works fine with std::collections::HashMap, so allow me to post here first. I tried to build the following code:
use std;
use async_trait::async_trait;
use dashmap::DashMap;
use std::collections::HashMap;
use tokio;

struct Foo {}

struct Gar<T: Bar> {
    map: DashMap<u32, T>,
}

#[async_trait]
trait Bar {
    async fn bar(&self) -> u32;
}

#[async_trait]
impl Bar for Foo {
    async fn bar(&self) -> u32 {
        0
    }
}

#[async_trait]
trait GarTrait {
    async fn coo(&self) -> Vec<u32>;
}

#[async_trait]
impl<T> GarTrait for Gar<T> where T: Bar + Send + Sync, {
    async fn coo(&self) -> Vec<u32> {
        let mut a = Vec::new();
        for item in self.map.iter() {
            let bar = item.value();
            let i: u32 = bar.bar().await;
            a.push(i);
        }
        a
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let g = Gar::<Foo> { map: DashMap::new() };
    g.coo().await;
    Ok(())
}
cargo build reports the following error:
$ cargo build
Compiling t v0.1.0 (-)
warning: unused import: `std::collections::HashMap`
--> src/main.rs:4:5
|
4 | use std::collections::HashMap;
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
error[E0311]: the parameter type `T` may not live long enough
--> src/main.rs:31:37
|
31 | async fn coo(&self) -> Vec<u32> {
| _____________________________________^
32 | | let mut a = Vec::new();
33 | | for item in self.map.iter() {
34 | | let bar= item.value();
... |
38 | | a
39 | | }
| |_____^
|
= help: consider adding an explicit lifetime bound for `T`
= note: the parameter type `T` must be valid for any other region...
note: ...so that the type `T` will meet its required lifetime bounds
--> src/main.rs:31:37
|
31 | async fn coo(&self) -> Vec<u32> {
| _____________________________________^
32 | | let mut a = Vec::new();
33 | | for item in self.map.iter() {
34 | | let bar= item.value();
... |
38 | | a
39 | | }
| |_____^
error: aborting due to previous error
error: could not compile `t`.
To learn more, run the command again with --verbose.
Note: It compiles fine when using self.map.get().
Cargo.toml
[dependencies]
tokio = { version = "*", features = ["full"] }
async-trait = "*"
dashmap = "*"
More efficient probing with control bytes a la hashbrown.
Quadratic probing + Robin Hood
Hopscotch
Fix load factor calculation error due to tombstones.
Eliminating tombstones in any operation if one of the following is met.
Compare hash codes before keys when probing.
Leapfrog probing.
If you obtain the DashMap using DashMap::default, any method used will cause a panic. DashMap::new works fine. (I'm using v3.0.7)
thread 'main' panicked at 'attempt to shift right with overflow'
use dashmap::DashMap;
fn main() {
    let map: DashMap<usize, ()> = DashMap::default();
    map.get(&1);
}
Remove the derived Default implementation and replace it with a call to new. Or just remove it completely.
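The suggested fix can be sketched on a toy sharded map. The struct, its field names, and the shard count are assumptions for illustration and are not DashMap's actual layout; the point is only that a hand-written Default can delegate to new, whereas a derived one would zero the bookkeeping fields.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Toy stand-in for a sharded map; field names are hypothetical.
struct ShardedMap<K, V> {
    shift: u32,
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K, V> ShardedMap<K, V> {
    fn new() -> Self {
        let shard_count: usize = 16; // assumed; must be a power of two
        ShardedMap {
            // With 16 shards the top 4 bits of a 64-bit hash pick the shard.
            shift: u64::BITS - shard_count.trailing_zeros(),
            shards: (0..shard_count).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    fn shard_index(&self, hash: u64) -> usize {
        (hash >> self.shift) as usize
    }
}

// A derived Default would yield shift = 0 and an empty shard list.
// Delegating to `new` keeps both constructors consistent.
impl<K, V> Default for ShardedMap<K, V> {
    fn default() -> Self {
        Self::new()
    }
}
```

Writing the impl by hand also means no Default bound is placed on K or V.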
In 1.x, the following worked well:
map.get_or_insert(&key, map.len() + 1)
But in later versions, both 2.x and 3.x, the following just hangs:
map.entry(key).or_insert_with(|| map.len() + 1)
Can you give me any ideas for resolving this issue?
The current default of FxHash is not DoS-resistant. It would be nice to follow the std::HashMap convention of having a secure default, especially if you let the user override it as needed (see #9).
Due to contention, the performance hit of SipHash is less than it is on std::HashMap. I get somewhere between a 25% and 36% hit to throughput on 4 threads, and it decreases rapidly as the thread count increases.
I have a common pattern in a library where I first collect data into a hashmap (only writing) and then process it (only reading).
During the reading stage, the RwLock and Ref wrappers prevent me from storing raw references to the values in the DashMap in an auxiliary structure, even though this would be perfectly OK, because I don't want to perform any more writes.
For me, it would therefore be nice to have functionality like RwLock's .into_inner(), which moves data out of the lock. For DashMap it would basically turn it into a map without concurrent write support. Do you think that this would be a useful addition to the library? I could have a look at this when I have some free time.
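A rough sketch of what such a conversion could look like, assuming the map is stored as a list of RwLock-protected shards. That layout and the function name are assumptions; only RwLock::into_inner and HashMap::extend are real std calls.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::RwLock;

/// Consumes a list of locked shards and merges them into one plain HashMap.
fn into_hash_map<K: Hash + Eq, V>(shards: Vec<RwLock<HashMap<K, V>>>) -> HashMap<K, V> {
    let mut merged = HashMap::new();
    for shard in shards {
        // into_inner takes the lock by value, so nothing is locked here.
        // A poisoned lock still hands back its data through the error value.
        let map = shard.into_inner().unwrap_or_else(|poisoned| poisoned.into_inner());
        merged.extend(map);
    }
    merged
}
```

Because the shards are moved rather than borrowed, the resulting map can hand out plain references with no guard types involved.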
After upgrading to DashMap 3.10.0 I tried to replace DashMap<X, ()> with DashSet.
It almost worked, except DashMap implements Debug and DashSet does not, so I could no longer #[derive(Debug)] on my struct containing the DashSet field.
Also, std::collections::HashSet implements Debug, so DashSet should too.
update if the key exists or insert
Add an async API that won't block, so dashmap can be used in async contexts.
I've started playing around with DashMap, and I've already seen some impressive performance gains. One annoying thing that I have is that for a set, I've created the type alias:
type DashSet<T> = DashMap<T, ()>
Which is fine, but the iterators behave weirdly, since the Ref dereferences by default to .value(), so now I always have to explicitly call .key() on the iterator item.
It would be nice to have a DashSet that fixes this syntactic quirk.
I'm using V1 and am intending to use it in production. You've mentioned on a few issues that you found several bugs with V2, and your attention seems to have shifted to https://github.com/ixy-db. Would you mind creating issues here for the known bugs to enable someone else to pick them up?
I'm upgrading DashMap in my project (from 1.2.0 to 2.1.0), and noticed that a new requirement got added that is a bit difficult for me to implement: the Default impl for DashMap now also requires <K: Default, V: Default>.
I think this might be unintentional - I don't see a reason why ::new() shouldn't require Default-ness on keys and the Default impl would.
how to use in lib and unit test
error[E0658]: use of unstable library feature 'map_get_key_value'
--> /Users/snlan/.cargo/registry/src/github.com-1ecc6299db9ec823/dashmap-3.7.0/src/lib.rs:609:37
|
609 | if let Some((k, v)) = shard.get_key_value(key) {
| ^^^^^^^^^^^^^
|
= note: for more information, see rust-lang/rust#49347
rustc 1.39.0 (4560ea788 2019-11-04)
binary: rustc
commit-hash: 4560ea788cb760f0a34127156c78e2552949f734
commit-date: 2019-11-04
host: x86_64-apple-darwin
release: 1.39.0
LLVM version: 9.0
std::collections::HashMap's HashBrown algorithm uses the high 7 bits of the hash as its tag for SIMD comparisons: https://github.com/rust-lang/hashbrown/blob/f7bb664f41b1c74d2c5dcbab55727227b9e8d13a/src/raw/mod.rs#L129
DashMap also uses those same high bits of the hash to pick its shard:
Line 279 in 57b72ae
This means that, since there are 4 shards per core, if you have 32 or more cores, every SIMD tag in the same shard will be equal, ruining the effectiveness (but not correctness) of HashBrown's parallel tag match algorithm. Even with fewer cores, the effectiveness will be substantially reduced.
The easiest fix is probably just to do:
// Leave the high 7 bits for the HashBrown SIMD tag.
let shard = (hash << 7) >> shift;
Alternatively, multiply is slower but does a good job of mixing low bits into high bits, so something like:
// Multiply by golden ratio (as a usize) to mix into the high bits.
let golden_ratio = (0x9e3779b97f4a7c15u64 >> (64 - util::ptr_size_bits())) as usize;
let shard = hash.wrapping_mul(golden_ratio) >> shift;
This is a few cycles slower than the << 7, but has the advantage of not magically "knowing about" the underlying HashMap's 7 bit tags. It will also choose a good shard even for a poor hash function that leaves the high bits zero, like certain uses of nohash-hasher, although of course those would suffer from poor SIMD tags anyway.
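The three shard-selection schemes discussed above can be compared side by side. This is a sketch under assumptions: a 64-bit hash, a shard count of 2^shard_bits with shard_bits between 1 and 63, and function names invented for the comparison.

```rust
/// Number of bits in the hash; a 64-bit hash is assumed here.
const HASH_BITS: u32 = 64;

/// Current scheme as described: the top bits select the shard.
fn shard_high_bits(hash: u64, shard_bits: u32) -> usize {
    (hash >> (HASH_BITS - shard_bits)) as usize
}

/// First proposal: discard the top 7 bits (the SIMD tag) before selecting.
fn shard_skip_tag(hash: u64, shard_bits: u32) -> usize {
    ((hash << 7) >> (HASH_BITS - shard_bits)) as usize
}

/// Second proposal: multiply by the golden ratio constant to mix low bits upward.
fn shard_golden(hash: u64, shard_bits: u32) -> usize {
    (hash.wrapping_mul(0x9e37_79b9_7f4a_7c15) >> (HASH_BITS - shard_bits)) as usize
}
```

With 128 shards (shard_bits = 7), shard_high_bits is a function of the tag bits alone, so every key in a shard shares one tag, while shard_skip_tag ignores the tag bits entirely.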
Iteration is currently weakly consistent due to dropping shards before all entry refs are dropped.
Looking here:
Line 235 in 778f81f
Every time a get or a put is done, a subtraction is performed. However, this value doesn't usually change between calls. It could be avoided by storing the result of the operation as a member instead of ncb.
Currently, Default is only implemented for DashMap<K, V, RandomState>.
Hi. I was wondering if the Entry::or_* methods (or_default, or_insert and or_insert_with) can result in lost updates. For example, if we have let map: DashMap<String, AtomicU64> = DashMap::new();, and then two threads execute the following code:
let value = map.entry(key).or_insert(AtomicU64::new(0));
value.fetch_add(1, Ordering::Relaxed);
Is it possible that the final value of the atomic is 1? I was thinking that it might be possible if both threads see that key does not exist and both call or_insert, resulting in the loss of one of the fetch_add calls.
After writing this, I think I have a clearer question: is it possible for the or_* methods to insert into an Entry::Occupied?
Thanks.
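To make the scenario concrete, here is a model of it using a std Mutex around a std HashMap in place of a shard. It rests on an assumption not confirmed in this thread: that the entry keeps the shard exclusively locked from the lookup through the insert. Under that assumption only one thread can ever see the entry as vacant.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `threads` workers that each do entry(key).or_insert(0) followed by
/// fetch_add(1), and returns the final counter value.
fn count_with_entry(threads: usize) -> u64 {
    let map: Arc<Mutex<HashMap<String, AtomicU64>>> = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                // The lock is held across the lookup and the insert, so the
                // vacant-or-occupied decision cannot race with another thread.
                let mut guard = map.lock().unwrap();
                let value = guard.entry("key".to_string()).or_insert(AtomicU64::new(0));
                value.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let guard = map.lock().unwrap();
    let total = guard.get("key").map(|v| v.load(Ordering::Relaxed)).unwrap_or(0);
    total
}
```

In this model count_with_entry(2) always yields 2; a lost update would require the check and the insert to happen under separate lock acquisitions.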
Currently on 32-bit platforms, DashMap causes a debug_assert failure/segfault in DashMap::_yield_read_shard and DashMap::_yield_write_shard. This is due to it being passed a garbage i, which it gets from determine_map. Currently, the hashing function used in determine_map is fxhash::hash64, which returns a u64. When that hash is later shifted, it's shifted based on the platform's pointer size. On 32-bit platforms, this doesn't shift enough and causes the value returned from determine_map to be well out of bounds.
Changing the hash function from hash64 to just hash (which returns a usize instead of a u64) seems to work for me so far.
For reference, I'm using this on a Raspberry Pi 2 Model B, which uses the 32-bit armv7-unknown-linux-gnueabihf target.
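The arithmetic behind the out-of-bounds index can be checked in isolation. The function below is a hypothetical model of the shift described above, with the pointer width passed in explicitly so both platforms can be simulated on one machine; 16 shards (4 shard bits) is an assumed figure.

```rust
/// Shard index as described above: the hash shifted right by
/// (pointer width in bits - number of shard bits).
fn shard_index(hash: u64, ptr_bits: u32, shard_bits: u32) -> u64 {
    hash >> (ptr_bits - shard_bits)
}
```

With a 64-bit hash of u64::MAX, a 64-bit pointer width gives index 15, which fits 16 shards, while a 32-bit pointer width gives 2^36 - 1. Feeding in a hash that is only pointer-sized, as the switch from hash64 to hash does, brings the 32-bit result back to 15.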
The Clone trait has an optional clone_from(&mut self, source: &Self) method. It would be good to manually implement this method so that users can avoid some extra allocations/deallocations when they need to clone_from.
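As an illustration of the pattern, here is a manual Clone impl on a toy wrapper. The Table type and its shards field are invented for the example and do not reflect DashMap's internals; the clone_from calls on Vec and HashMap are the std implementations.

```rust
use std::collections::HashMap;

/// Toy wrapper standing in for a map type.
struct Table {
    shards: Vec<HashMap<u32, String>>,
}

impl Clone for Table {
    fn clone(&self) -> Self {
        Table { shards: self.shards.clone() }
    }

    /// Overwrites `self` with the contents of `source`.
    fn clone_from(&mut self, source: &Self) {
        // Delegates to the clone_from implementations of Vec and HashMap,
        // which can reuse buffers already owned by `self`.
        self.shards.clone_from(&source.shards);
    }
}
```

A derived Clone only provides clone, so clone_from would fall back to building a fresh value and dropping the old one.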
Right now the code hashes incoming keys twice. Once to select which inner map to use, and then internally by the map.
This could be optimized by trading off a minor amount of memory.
In the inner map, instead of inserting the K directly, it could be a:
HashMap<Key<K>, V>
where Key is defined as
struct Key<K> {
    hash_code: u64,
    user_key: K,
}
The inner maps could be set to use a Hasher which is designed to operate exclusively with this type, and which returns the hash_code as the hash every time. (Eq would be defined via the user_key only.)
Then the outer hasher could first select which map to send the data into (probably not using the lower bits of the hash to avoid collisions).
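A sketch of how this proposal might be wired together with std types. The names PassThrough, Shard, and insert are invented, the fields use snake_case spellings of the ones described above, and the choice of bits 32 and up for shard selection is just one way to avoid the low bits.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Key wrapper that carries its hash so the user key is hashed only once.
struct Key<K> {
    hash_code: u64,
    user_key: K,
}

impl<K: Hash> Key<K> {
    fn new(user_key: K) -> Self {
        let mut hasher = DefaultHasher::new();
        user_key.hash(&mut hasher);
        Key { hash_code: hasher.finish(), user_key }
    }
}

// Equality looks at the user key only.
impl<K: PartialEq> PartialEq for Key<K> {
    fn eq(&self, other: &Self) -> bool {
        self.user_key == other.user_key
    }
}
impl<K: Eq> Eq for Key<K> {}

// Hashing feeds only the cached code to the hasher.
impl<K> Hash for Key<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash_code);
    }
}

/// Hasher that returns the u64 it was given, unchanged.
#[derive(Default)]
struct PassThrough(u64);

impl Hasher for PassThrough {
    fn write(&mut self, bytes: &[u8]) {
        // Fallback for other write calls; Key only ever calls write_u64.
        for &byte in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(byte);
        }
    }
    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

type Shard<K, V> = HashMap<Key<K>, V, BuildHasherDefault<PassThrough>>;

/// Picks a shard from the upper half of the hash, then inserts without
/// hashing the user key a second time.
fn insert<K: Hash + Eq, V>(shards: &mut [Shard<K, V>], key: K, value: V) -> Option<V> {
    let key = Key::new(key);
    let index = (key.hash_code >> 32) as usize % shards.len();
    shards[index].insert(key, value)
}
```

The memory cost mentioned above is the extra u64 stored next to every key.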
Currently, the Entry struct has an or_insert_with method that accepts a callback to provide a default value; however, there is no way to pass a fallible closure.
So, I wonder if it is possible to add the following method to the Entry struct:
#[inline]
pub fn or_try_insert_with<E>(self, value: impl FnOnce() -> Result<V, E>) -> Result<RefMut<'a, K, V, S>, E> {
    match self {
        Entry::Occupied(entry) => Ok(entry.into_ref()),
        Entry::Vacant(entry) => Ok(entry.insert(value()?)),
    }
}
This would allow a use case where dashmap is used as a local cache for values loaded, for example, from remote storage:
let my_value = my_dashmap
    .entry(&"hello")
    .or_try_insert_with(|| fetch_from_db("hello"))
    .unwrap();
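The same idea can be tried out today against the standard library's Entry through an extension trait. EntryExt, fetch_from_db, and cached below are hypothetical names for this sketch; neither std nor, as far as this thread shows, dashmap ships such a method.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Hypothetical extension trait; std's Entry has no such method.
trait EntryExt<'a, V> {
    fn or_try_insert_with<E>(self, value: impl FnOnce() -> Result<V, E>) -> Result<&'a mut V, E>;
}

impl<'a, K, V> EntryExt<'a, V> for Entry<'a, K, V> {
    fn or_try_insert_with<E>(self, value: impl FnOnce() -> Result<V, E>) -> Result<&'a mut V, E> {
        match self {
            Entry::Occupied(entry) => Ok(entry.into_mut()),
            // On Err the vacant entry is dropped and the map stays unchanged.
            Entry::Vacant(entry) => Ok(entry.insert(value()?)),
        }
    }
}

/// Stand-in for a remote lookup; fails for the key "missing".
fn fetch_from_db(key: &str) -> Result<String, String> {
    if key == "missing" {
        Err(format!("no row for {}", key))
    } else {
        Ok(key.to_uppercase())
    }
}

/// Returns the cached value for `key`, loading it on first use.
fn cached(cache: &mut HashMap<String, String>, key: &str) -> Result<String, String> {
    cache
        .entry(key.to_string())
        .or_try_insert_with(|| fetch_from_db(key))
        .map(|v| v.clone())
}
```

A failed load leaves no entry behind, so a later call with the same key simply retries.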
I'm getting a panic from an overflow here: https://github.com/xacrimon/dashmap/blob/master/src/lib.rs#L147
In my code: https://github.com/rsimmonsjr/axiom/blob/master/src/executor.rs#L55
If this is intentional, it should use the overflowing APIs: https://doc.rust-lang.org/std/primitive.usize.html
Hi, I'd like to make a feature request. I am currently replacing a RwLock<HashMap> and it works really well so far, and the fact that I do not need to acquire a lock manually is really nice. The biggest inconvenience I have within my current code base is that I frequently use serde to (de)serialize the std::collections::HashMap. I think having this as an optional feature would add value to your implementation.
I have no idea about what is happening but pinning dashmap version to 3.4.0 fixed the tests.
You can never transmute a &T to a &mut T in any way. This is instant undefined behavior.
Currently there is no fast way to get a (&K, &mut V) from the standard hashmap, so you're out of luck it seems, but there is a way to fix this! Instead of storing HashMap<K, V>, store HashMap<K, UnsafeCell<V>>. This way you can soundly get a &mut V from get_key_value (so long as you are also holding a write shard).
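A minimal sketch of the UnsafeCell approach. Here a plain &mut borrow of the shard stands in for holding the shard's write lock, which is an assumption of the sketch, and get_key_value_mut is an invented name.

```rust
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;

/// Returns a shared key reference and an exclusive value reference.
fn get_key_value_mut<'a, K: Hash + Eq, V>(
    shard: &'a mut HashMap<K, UnsafeCell<V>>,
    key: &K,
) -> Option<(&'a K, &'a mut V)> {
    // Downgrade to a shared borrow for 'a. The exclusive borrow we consumed
    // still guarantees that nobody else can reach the shard during 'a.
    let shard: &'a HashMap<K, UnsafeCell<V>> = shard;
    let (k, cell) = shard.get_key_value(key)?;
    // SAFETY: the caller gave up exclusive access to the whole shard for 'a,
    // and exactly one &mut V is created from this cell.
    let value = unsafe { &mut *cell.get() };
    Some((k, value))
}
```

The mutable reference is derived through UnsafeCell::get rather than by casting a shared reference, which is what separates this from the transmute.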
Please don't use qadapt-spin, which is just an early fork of spin-rs that applies rustfmt; nothing else changed.
If you don't use spin-rs just because it is no longer actively maintained, I think you can consider using lock_api (at least, I think spin-rs is better than qadapt-spin).
so that these are equivalent:
for kv in dashmap.iter() {}
for kv in &dashmap {}
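The equivalence comes from implementing IntoIterator for a shared reference to the collection. A sketch on a toy wrapper around std's HashMap follows; Wrapper and sum_values are invented names.

```rust
use std::collections::hash_map;
use std::collections::HashMap;

/// Toy wrapper with an `iter` method, standing in for a map type.
struct Wrapper<K, V> {
    inner: HashMap<K, V>,
}

impl<K, V> Wrapper<K, V> {
    fn iter(&self) -> hash_map::Iter<'_, K, V> {
        self.inner.iter()
    }
}

// Forwarding to `iter` is what lets `for kv in &wrapper {}` compile.
impl<'a, K, V> IntoIterator for &'a Wrapper<K, V> {
    type Item = (&'a K, &'a V);
    type IntoIter = hash_map::Iter<'a, K, V>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

/// Uses the `for ... in &wrapper` form enabled by the impl above.
fn sum_values(wrapper: &Wrapper<&str, u32>) -> u32 {
    let mut total = 0;
    for (_key, value) in wrapper {
        total += value;
    }
    total
}
```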
Hi,
This might not be the place for this kind of question.
I'm curious about this behaviour where my program locks up when I reaccess the map using a key that was borrowed as mutable.
My code is here: https://gist.github.com/rust-play/26c34df4f98d5ce6b34e050763092215
In this case, only the first line is printed. After, the program still keeps using a single thread at 100% (and multiple threads when using tokio async).
Is this expected behaviour? Because the compiler does not complain.
If it is, are there ways to achieve what I'm trying to do? Perhaps using shards and awaiting/manually releasing the lock.
Thanks.
I want to remove an entry when the entry's value matches the condition.
I'm currently receiving a DashMap from some code external to mine and do not need the concurrency. To eliminate the performance overhead and added complexity the concurrent API brings, I would like to be able to take out all the data and store it in a HashMap (I am doing many, many lookups into the map).
Previously when they used a CHashMap I was able to do the following:
let map: HashMap<Key, Value> = c_hash_map.into_iter().collect();
It would be nice to have a similar API or perhaps a single .take() method.
Would this change be possible? Right now I am cloning all the pairs out of the DashMap, which is not ideal.
Thanks!
Hi!
First of all, congratulations on this fantastic piece of work. I'm using it in an actor system and it is the main cornerstone of it.
I was reading that you want to build a V4. What changes do you have in mind? Would you accept help? If so, I would like to see if there is anything I can help with.
By the way, if this is not your preferred channel for this kind of question, let me know.
Best!
Running my tests that create objects with Dashmaps inside them panics:
thread 'tests::test_simplest_struct_actor' panicked at 'assertion failed: i < self.shards.len()', C:\Users\Khionu\.cargo\registry\src\github.com-1ecc6299db9ec823\dashmap-2.1.1\src\lib.rs:440:9
This code:
let map = DashMap::<i32, i32>::new();
map.insert(1, 2);
let a = map.get_mut(&1);
let b = map.get(&1);
will deadlock. The shard containing 1 gets write-locked, so nothing else can access it. This is reasonable behavior but I think it should be documented.
Perhaps more surprising is that this code:
let map = DashMap::<i32, i32>::new();
for i in 0..100 {
    map.insert(i, 0);
}
let mut refs = vec![];
for i in 0..100 {
    refs.push(map.get_mut(&i))
}
Will also deadlock. Even though you're not getting the same slot twice, you're still trying to lock certain chunks twice, since some slots are colocated. This isn't very realistic code though. Something less artificial might look like:
let a = map.get_mut(a);
let b = map.get_mut(b);
This code can still deadlock! But it's more sporadic. It only deadlocks with probability 1/shard_count, when the two entries happen to be colocated.
Basically, if your thread is holding a RefMut, it's not deadlock-safe to do any other operations to the map. This behavior surprised me a little, so I think it would be worth documenting.
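The colocation effect can be reproduced without hanging by modelling two shards with std RwLocks and probing with try_read. The Shards type and the key % 2 placement rule are assumptions made for the demonstration.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Toy two-shard map; a key's shard is chosen by `key % 2`.
struct Shards {
    shards: [RwLock<HashMap<u32, u32>>; 2],
}

impl Shards {
    fn new() -> Self {
        Shards { shards: [RwLock::new(HashMap::new()), RwLock::new(HashMap::new())] }
    }

    fn shard_for(&self, key: u32) -> &RwLock<HashMap<u32, u32>> {
        &self.shards[(key % 2) as usize]
    }
}

/// Holds a write guard for `held` and reports whether the shard of `other`
/// could be read without blocking.
fn readable_while_writing(map: &Shards, held: u32, other: u32) -> bool {
    let _write_guard = map.shard_for(held).write().unwrap();
    // try_read returns an error instead of blocking. A blocking read() here
    // could deadlock or panic when both keys share a shard.
    let readable = map.shard_for(other).try_read().is_ok();
    readable
}
```

Keys 1 and 3 are different entries but share a shard in this model, so the second access is refused, while keys 1 and 2 do not interfere.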
I'm trying out the new entry API and it's really quite nice to use - it's a great ergonomics improvement! One thing that I notice though is that now, the only way I can tell of ensuring a key is present on the map is to get an owned key (via copy or clone), even if it should already be in the map (or if it might have gotten added in a parallel routine).
As far as I can tell, the most ergonomic way to adjust a key's value is something like this, which always clones (as opposed to a solution that probes via get first, then uses entries to initialize the entry without a race condition):
fn some_work(key: &K) {
    // key gets cloned for every access:
    let entry = self.entry(key.clone()).or_default();
    return (*entry).do_the_work();
}
Now, when K should be Clone or Copy, I imagine something like this would be cool:
fn some_work<K: Clone>(key: &K) {
    let entry = self.entry_cloning(key).or_default();
    return (*entry).do_the_work();
}
...where .entry_cloning returns a thing that's like Entry, except in or_default (and the other methods), it probes with a reference to the key first, then only if the key isn't present, clones in order to insert it. (And the same for .entry_copying...)
Does this make sense?
Alternatively, I'd like to understand how .insert behaves if the key being inserted is found - does it return the value that was present, does it overwrite the value? If there was a way for my code to ensure a value for the key is present (without overwriting) in the manual way, I'll take it too! (:
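The probe-first behaviour asked for above can be sketched on a plain std HashMap. get_or_default_cloning is an invented helper; in a concurrent map both steps would additionally have to happen under one lock acquisition, which this single-threaded sketch does not model.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Returns the value for `key`, inserting a default first if it is absent.
/// The key is cloned only on the insert path.
fn get_or_default_cloning<'a, K, V>(map: &'a mut HashMap<K, V>, key: &K) -> &'a mut V
where
    K: Hash + Eq + Clone,
    V: Default,
{
    if !map.contains_key(key) {
        map.insert(key.clone(), V::default());
    }
    map.get_mut(key).expect("key was just ensured to be present")
}
```

The price is a second lookup; the hit path, which is the common one in the use case described, never clones.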
Like the standard HashMap, DashMap should implement Debug.
Is there any chance that something similar to https://doc.rust-lang.org/std/cell/struct.Ref.html#method.map could be provided for dashmap's Ref and RefMut types?
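For context, this is what the linked std method does with a RefCell. Config and name_of are invented for the example; the request is for an equivalent on dashmap's own guard types.

```rust
use std::cell::{Ref, RefCell};

struct Config {
    name: String,
    retries: u32,
}

/// Narrows a borrow of the whole struct to a borrow of one field.
fn name_of(cell: &RefCell<Config>) -> Ref<'_, str> {
    Ref::map(cell.borrow(), |config| config.name.as_str())
}
```

The returned guard still keeps the RefCell borrowed, so the narrowing does not weaken the borrow rules; it only hides the rest of the struct from the caller.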