tkaitchuck / ahash

aHash is a non-cryptographic hashing algorithm that uses the AES hardware instruction

Home Page: https://crates.io/crates/ahash

License: Apache License 2.0

Rust 99.88% Shell 0.12%
rust hash hashing aes

ahash's Introduction

aHash

AHash is the fastest, DOS resistant hash currently available in Rust. AHash is intended exclusively for use in in-memory hashmaps.

AHash's output is of high quality but aHash is not a cryptographically secure hash.

Design

Because AHash is a keyed hash, each map will produce completely different hashes, which cannot be predicted without knowing the keys. This prevents DOS attacks in which an attacker sends a large number of items whose hashes collide and which are then used as keys in a hashmap.

This also avoids accidentally quadratic behavior when reading from one map and writing to another.
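
The effect of keying can be sketched with the standard library alone. The example below uses std's `RandomState` as a stand-in for aHash's (an assumption made so that the snippet needs no dependencies; both implement `BuildHasher`). The helper names are made up for illustration. One state always maps a key to the same value, while two independently created states almost certainly disagree.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes `value` with a fresh hasher built from `state`.
fn hash_with<S: BuildHasher, T: Hash>(state: &S, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Returns the hash of one input under two independently keyed states.
fn hashes_under_two_states(value: &str) -> (u64, u64) {
    let a = RandomState::new(); // each state carries its own random keys
    let b = RandomState::new();
    (hash_with(&a, &value), hash_with(&b, &value))
}
```

Because an attacker cannot observe the keys, a set of inputs that collide under one state gives no information about which inputs collide under another.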

Goals and Non-Goals

AHash does not have a fixed standard for its output. This allows it to improve over time. For example, if any faster algorithm is found, aHash will be updated to incorporate the technique. Similarly, should any flaw in aHash's DOS resistance be found, aHash will be changed to correct the flaw.

Because it does not have a fixed standard, different computers or computers on different versions of the code will observe different hash values. As such, aHash is not recommended for use other than in-memory maps. Specifically, aHash is not intended for network use or in applications which persist hashed values. (In these cases HighwayHash would be a better choice)

Additionally, aHash is not intended to be cryptographically secure and should not be used as a MAC, or anywhere which requires a cryptographically secure hash. (In these cases SHA-3 would be a better choice)

Usage

AHash is a drop-in replacement for the default implementation of the Hasher trait. To construct a HashMap using aHash as its hasher, do the following:

use ahash::RandomState;
use std::collections::HashMap;

let mut map: HashMap<i32, i32, RandomState> = HashMap::default();
map.insert(12, 34);

For convenience, wrappers called AHashMap and AHashSet are also provided. These do the same thing with slightly less typing.

use ahash::AHashMap;

let mut map: AHashMap<i32, i32> = AHashMap::new();
map.insert(12, 34);
map.insert(56, 78);

Flags

The aHash package has the following flags:

  • std: This enables features which require the standard library. (On by default) This includes providing the utility classes AHashMap and AHashSet.
  • serde: Enables serde support for the utility classes AHashMap and AHashSet.
  • runtime-rng: To obtain a seed, hashers will obtain randomness from the operating system. (On by default) This is done using the getrandom crate.
  • compile-time-rng: For OS targets without access to a random number generator, compile-time-rng provides an alternative. If getrandom is unavailable and compile-time-rng is enabled, aHash will generate random numbers at compile time and embed them in the binary. This allows for DOS resistance even if there is no random number generator available at runtime (assuming the compiled binary is not public). This makes the binary non-deterministic. (If non-determinism is a problem see const_random's documentation)
  • nightly-arm-aes: To use AES instructions on 32-bit ARM, which requires nightly. This is not needed on AArch64.

If both runtime-rng and compile-time-rng are enabled, runtime-rng will take precedence and compile-time-rng will do nothing. If neither flag is set, seeds can be supplied by the application. Multiple APIs are available to do this.

Comparison with other hashers

A full comparison with other hashing algorithms can be found here

Hasher performance

For a more representative performance comparison which includes the overhead of using a HashMap, see HashBrown's benchmarks as HashBrown now uses aHash as its hasher by default.

Hash quality

AHash passes the full SMHasher test suite.

The code to reproduce the result, and the full output are checked into the repo.

Additional FAQ

A separate FAQ document is maintained here. If you have questions not covered there, open an issue here.

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

ahash's People

Contributors

a1phyr, amanieu, atul9, cbeck88, dbdr, eaufavor, emilk, erickt, jefffrey, joshlf, koute, liamwhite, maxtremblay, mutantbob, nabilwadih, nehliin, novedevo, orlp, orzogc, purewhitewu, robbepop, robjtede, rodrimati1992, schungx, stepancheg, stepantubanov, striezel, timotree3, tkaitchuck, virtualritz

ahash's Issues

Seeing different hash values when the hasher is wrapped in a newtype

I'm seeing different hash values from ahash when the AHasher is wrapped in a newtype.

I've written a test showing the difference in ahornby@fa76d4e, which can be run with cargo test.

Would you expect this to work? I was expecting only the address of the factory to matter, and whether the constructed hasher is wrapped or not to make no difference.

I don't see the difference if using the default hasher from HashMap (the commit adds a test case for that as well).

Reduce redundancy in code

The two algorithms, as well as the tests contain a lot of copied and pasted code that is modified between the two impls. Find a way to consolidate this.

Improve performance of 17-128 byte strings

This area is somewhat weak in both the aes code and the fallback. Getting out of the loop and working with the tail appears to be an issue. Further expanding the size search tree might help, but longer term there needs to be a way to transition out of the high-speed loop and deal with the remainder.

Adding a feature for serde

Hello, I used this hasher in a project and noticed that AHashMap and AHashSet do not implement the serde::Serialize or serde::Deserialize traits. I copied chunks of code directly from serde to implement them in a local copy.

Would you be interested in me cleaning that up and adding that as a feature to this crate?

Using aHash 0.5 with wasm

Hi!

I found a small issue when using aHash with --target wasm32-unknown-unknown. Starting in aHash version 0.5 I get this:

error: target is not supported, for more information see: https://docs.rs/getrandom/#unsupported-targets
   --> /Users/emil.ernerfeldt/.cargo/registry/src/github.com-1ecc6299db9ec823/getrandom-0.2.0/src/lib.rs:224:9
    |
224 | /         compile_error!("target is not supported, for more information see: \
225 | |                         https://docs.rs/getrandom/#unsupported-targets");
    | |_________________________________________________________________________^

error[E0433]: failed to resolve: use of undeclared type or module `imp`
   --> /Users/emil.ernerfeldt/.cargo/registry/src/github.com-1ecc6299db9ec823/getrandom-0.2.0/src/lib.rs:246:5
    |
246 |     imp::getrandom_inner(dest)
    |     ^^^ use of undeclared type or module `imp`

There is a simple workaround:

ahash = { version = "0.5", features = ["std"], default-features = false }
getrandom = { version = "0.2", features = ["js"] } # used by ahash

I was wondering if aHash could have some helpful feature flag for this (features = ["std", "wasm"]), or maybe even detect the wasm32 target? I'm not great with how Cargo.toml dependencies work :)

PS: I really love using aHash as a faster drop-in replacement for HashMap and HashSet.

What is hash_test[_aes] for?

Those seem like dead code, but they show up as public exports in my WebAssembly. If they are necessary for some automated testing, maybe they could be moved to some testing feature flag?

 (export "hash_test" (func $hash_test))

Review

Review of aHash is needed both in implementation and conceptually to ensure that it actually satisfies the properties it is attempting to guarantee.

Support for running under miri

This may be a tricky one, I'm not sure.

When running any code that uses ahash through miri to check for unsafe code, miri runs into issues with ahash's use of per-arch accelerated instructions like std::arch::x86_64::_mm_aesdec_si128. Specifically, the user ends up with an error like

error: unsupported operation: can't call foreign function: llvm.x86.aesni.aesdec
    --> /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/../stdarch/crates/core_arch/src/x86/aes.rs:39:5
     |
39   |     aesdec(a, round_key)
     |     ^^^^^^^^^^^^^^^^^^^^ can't call foreign function: llvm.x86.aesni.aesdec
     |
     = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
     = note: inside `std::arch::x86_64::_mm_aesdec_si128` at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/../stdarch/crates/core_arch/src/x86/aes.rs:39:5
     = note: inside `ahash::aes_hash::aeshashx2` at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.3.2/src/aes_hash.rs:189:21
     = note: inside `ahash::aes_hash::AHasher::hash_in` at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.3.2/src/aes_hash.rs:58:23
     = note: inside `<ahash::aes_hash::AHasher as std::hash::Hasher>::write_u128` at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.3.2/src/aes_hash.rs:87:9
     = note: inside `<ahash::aes_hash::AHasher as std::hash::Hasher>::write_u64` at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.3.2/src/aes_hash.rs:97:9
     = note: inside `<ahash::aes_hash::AHasher as std::hash::Hasher>::write_usize` at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.3.2/src/aes_hash.rs:92:9
     = note: inside `std::hash::impls::<impl std::hash::Hash for usize>::hash::<ahash::aes_hash::AHasher>` at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/hash/mod.rs:569:21

While we could leave it to miri to mock out all of these functions, it might be easier (I don't know) for ahash to simply use the fallback hash implementation under cfg(miri). I think this should just be a matter of changing

#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "aes"))]

and other cfgs like it to include not(miri) in the all list.
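
The proposed change can be sketched as a pair of mutually exclusive `cfg` attributes. The function name `selected_backend` is hypothetical and only reports which branch was compiled in; the real crate would gate its hasher implementations the same way.

```rust
/// Compiled only when the target has AES instructions and we are NOT under Miri.
#[cfg(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "aes",
    not(miri)
))]
fn selected_backend() -> &'static str {
    "aes"
}

/// Compiled everywhere else, including every run under Miri.
#[cfg(not(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "aes",
    not(miri)
)))]
fn selected_backend() -> &'static str {
    "fallback"
}
```

Negating the whole `all(...)` list in the second attribute keeps the two definitions exactly complementary, so exactly one of them exists on any target.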

/cc @RalfJung

Allow setting RandomState construction seed using runtime randomness

#[cfg(feature = "compile-time-rng")]
static SEED: AtomicUsize = AtomicUsize::new(const_random!(u64));
#[cfg(not(feature = "compile-time-rng"))]
static SEED: AtomicUsize = AtomicUsize::new(MULTIPLE as usize);

Nondeterministic builds are problematic in some environments, for example a Buck-based codebase that uses distributed caching of build artifacts -- distributed build caches incorporate an assumption that build outputs are a deterministic function of the inputs. When this is not the case, it can cause a mismatch between what we think should be in cache vs what can actually be retrieved from the cache by content hash.

But when using ahash with compile-time-rng disabled, we'd still like to avoid having a predictable sequence of states produced by RandomState::new().

Would you consider exposing a setter for random_state::SEED to allow populating it with runtime randomness at startup?

ahash::RandomState::set_seed(rand());
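
A setter of the kind requested could look roughly like this. It is only a sketch: `set_seed` and `next_seed` are hypothetical names, the static mirrors the `SEED` shown in the snippet above, and nothing here is aHash's actual API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the crate-internal `SEED` static. The initial
// value is an arbitrary fixed constant, so builds stay deterministic.
static SEED: AtomicUsize = AtomicUsize::new(0x9E37_79B9);

/// Hypothetical setter: the application calls this once at startup with
/// randomness it obtained at runtime.
fn set_seed(seed: usize) {
    SEED.store(seed, Ordering::Relaxed);
}

/// Hands out a distinct value per call by incrementing the shared seed.
fn next_seed() -> usize {
    SEED.fetch_add(1, Ordering::Relaxed)
}
```

The binary stays reproducible because the only compiled-in value is a constant; unpredictability comes entirely from what the application passes to the setter.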

Is hashing a tuple (u32, u32, u32) the same as u128?

Hello! My game hashes tile coordinates in a HashMap<(u32, u32, u32), Vec<Entity>>, and I was wondering if that's the same as hashing a u128.

edit: woops, just realized this is 96 bits... still unsure which bucket tuples fall into in this case.
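
One way to see what a hasher actually receives is a toy `Hasher` that records the size of each write. The sketch below uses only the standard library; `RecordingHasher` and `write_sizes` are names made up for illustration. It shows that std's `Hash` impl for tuples feeds each field separately, so a `(u32, u32, u32)` arrives as three 4-byte writes while a `u128` arrives as one 16-byte write. How a particular hasher then mixes those writes is up to its own `write_u32` and `write_u128` implementations and is not shown here.

```rust
use std::hash::{Hash, Hasher};

/// A toy hasher that records the size of every write instead of hashing.
#[derive(Default)]
struct RecordingHasher {
    write_sizes: Vec<usize>,
}

impl Hasher for RecordingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.write_sizes.push(bytes.len());
    }

    fn finish(&self) -> u64 {
        0 // not a real hash; only the recorded sizes matter here
    }
}

/// Returns the byte length of each write that hashing `value` performs.
fn write_sizes<T: Hash>(value: &T) -> Vec<usize> {
    let mut hasher = RecordingHasher::default();
    value.hash(&mut hasher);
    hasher.write_sizes
}
```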

Hash results between release and debug build differs

I'm facing a strange issue. When I calculate the hash value of a binary blob with size 920603 in release and debug build I get different hash results.
I'm calculating the hash value with the following code:

use ahash::AHasher;
use std::hash::Hasher;

let mut hasher = AHasher::default();
hasher.write(&data);
let cksum = hasher.finish();

I'm using rustc 1.48.0-nightly (0e2c1281e 2020-09-07) on Windows. I couldn't test it yet with stable because my project depends on unstable features.

Increase perf by combining algorithms

There are some cases where the fallback is outperforming the aes version.
Figure out how to take advantage of these to increase the performance with AES to be at least as good.

Switch from lazy_static to once_cell

once_cell::sync::Lazy fulfills the same task as lazy_static but is less magic in that it doesn't require a macro. A lot of crates have switched already and std has adopted large parts of once_cell's API surface as a Nightly feature (tracking issue). Would you be open to a PR that switches aHash over too?

Crate fails to build on platforms outside x86 / x86_64

You've set up shuffle in operations.rs to use _mm_shuffle_epi8 or not based on the result of the cfg! macro. This is incorrect because on non-{x86 | x86_64} builds this will attempt to compile in a call to _mm_shuffle_epi8 and then just never take that code path. This is enough to make the build fail because of name resolution failure. That function doesn't exist at all outside x86 / x86_64.

Here is a corrected version of shuffle:

#[inline(always)]
pub(crate) fn shuffle(a: u128) -> u128 {
    #[cfg(all(target_feature = "ssse3", not(miri)))]
    {
        use core::mem::transmute;
        #[cfg(target_arch = "x86")]
        use core::arch::x86::*;
        #[cfg(target_arch = "x86_64")]
        use core::arch::x86_64::*;
        unsafe {
            transmute(_mm_shuffle_epi8(transmute(a), transmute(SHUFFLE_MASK)))
        }
    }
    #[cfg(not(all(target_feature = "ssse3", not(miri))))]
    {
        a.swap_bytes()
    }
}

Find a way to accelerate creating a keyed hasher

Generating two random numbers each time a hasher is created is a high overhead.
Rust's default approach is to use the thread local storage to generate a random value and then for each subsequent hasher increment the number.
This is not ideal because it means that if you learn one seed you learn them all. It also requires a hash map lookup each time you want to instantiate a hash map. (That would not be a big deal if it were fast, but right now it involves SipHash, which is MUCH slower.)
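
The thread-local scheme described above can be sketched as follows. This is a simplification for illustration, not std's actual code: it tracks a single `u64`, and `initial_key` is a placeholder that reads the clock instead of asking the operating system for randomness.

```rust
use std::cell::Cell;
use std::time::{SystemTime, UNIX_EPOCH};

/// Placeholder entropy source for this sketch only; a real implementation
/// would ask the operating system for random bytes.
fn initial_key() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

thread_local! {
    // Seeded once per thread, on first use.
    static NEXT_KEY: Cell<u64> = Cell::new(initial_key());
}

/// Returns this thread's current key and bumps it for the next caller.
fn next_key() -> u64 {
    NEXT_KEY.with(|k| {
        let current = k.get();
        k.set(current.wrapping_add(1));
        current
    })
}
```

Consecutive keys differ by exactly one, which is the weakness noted above: learning any one key reveals the others handed out on that thread.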

ABuildHasher isn't returning consistent results

A BuildHasher needs to create Hashers with the same seed so that they return consistent results. The seed should be randomized when the BuildHasher is created, not when the Hasher is built.
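
A minimal sketch of that contract, using only the standard library, is shown below. `ToyHasher` is a seeded FNV-1a style hasher chosen for brevity; it is not aHash's algorithm, and `SeededState` and `hash_one_with` are hypothetical names.

```rust
use std::hash::{BuildHasher, Hash, Hasher};

/// Toy keyed hasher (FNV-1a style, seeded). Not aHash's algorithm.
struct ToyHasher {
    state: u64,
}

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(0x0000_0100_0000_01B3);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// The seed is fixed when the builder is created...
struct SeededState {
    seed: u64,
}

impl BuildHasher for SeededState {
    type Hasher = ToyHasher;

    /// ...and every hasher it builds starts from that same seed.
    fn build_hasher(&self) -> ToyHasher {
        ToyHasher { state: self.seed }
    }
}

/// Hashes `value` with a fresh hasher from `state`.
fn hash_one_with<S: BuildHasher, T: Hash>(state: &S, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}
```

A `HashMap` built with `HashMap::with_hasher(SeededState { seed })` relies on exactly this property: it builds a new hasher for every lookup and expects the same key to land in the same bucket each time.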

Add default builder that incorporates randomization.

Hashmaps should not have to call a PRNG or do a hashmap lookup in a thread local to create a new hashmap. One solution is to create a builder that XORs the const random key with the address of the generator; with address space randomization this should be sufficiently unpredictable.
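
The idea can be sketched as below. `CONST_KEY` stands in for a compile-time random key (its value here is arbitrary), and `AddressSeeded` is a hypothetical type, not part of aHash.

```rust
/// Stand-in for a key fixed at compile time (the value is arbitrary here).
const CONST_KEY: u64 = 0x243F_6A88_85A3_08D3;

/// A builder that stores nothing random and derives its seed from where it
/// lives in memory. One byte of padding keeps it from being zero-sized, so
/// distinct live instances have distinct addresses.
struct AddressSeeded {
    _pad: u8,
}

impl AddressSeeded {
    fn new() -> Self {
        AddressSeeded { _pad: 0 }
    }

    /// XORs the constant key with this instance's address.
    fn seed(&self) -> u64 {
        CONST_KEY ^ (self as *const Self as usize as u64)
    }
}
```

One design caveat with this approach: the seed changes whenever the builder is moved, so a real implementation would presumably need to read the address once and store the result, or otherwise guarantee a stable location.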

Folded multiply optimizes poorly on WebAssembly

https://rust.godbolt.org/z/dPYf4M

The fallback hash is designed around the assumption that the 128-bit multiply will be optimized into a 64-bit multiply with 128-bit output, but when compiling for WebAssembly it turns into a call to __multi3 which emulates a full 128-bit multiply.

I think this is an inherent limitation of WebAssembly rather than a codegen issue, as far as I can tell there's no way to get the upper part of a 64-bit multiply in wasm. A secondary fallback that avoids 128-bit math might be beneficial for targets like this.
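
For reference, here is a sketch of a folded multiply written two ways, assuming the fold XORs the low and high halves of the 128-bit product. The second version is one possible shape for a secondary fallback: it produces the same result using only 64-bit arithmetic. Both function names are made up, and whether the second form is actually faster on WebAssembly has not been measured here.

```rust
/// Folded multiply via a 128-bit product: XOR the low and high halves.
fn folded_multiply_u128(a: u64, b: u64) -> u64 {
    let product = u128::from(a) * u128::from(b);
    (product as u64) ^ ((product >> 64) as u64)
}

/// Same result using only 64-bit arithmetic (four 32x32 -> 64 multiplies).
fn folded_multiply_u64_only(a: u64, b: u64) -> u64 {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column of the schoolbook product, including the carry out of `ll`.
    let cross = (ll >> 32) + (lh & MASK) + (hl & MASK);
    let low = (ll & MASK) | (cross << 32);
    let high = hh + (lh >> 32) + (hl >> 32) + (cross >> 32);
    low ^ high
}
```

None of the intermediate sums can overflow a `u64`, because each partial product is at most (2^32 - 1)^2 and the full high half always fits in 64 bits.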

Fails to build without std feature enabled

Versions 0.3.6 and 0.3.7 fail to build without the std feature enabled, e.g. if only a direct dependency on hashbrown is used which depends on ahash without default features.

The error message is

error[E0433]: failed to resolve: use of undeclared type or module `std`
 --> /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.3.7/src/aes_hash.rs:4:5
  |
4 | use std::hint::unreachable_unchecked;
  |     ^^^ use of undeclared type or module `std`

error: aborting due to previous error

Generate 32-bit hashes on 32-bit platforms

On 32-bit platforms HashMap only looks at the lowest 32 bits of the hash value. AHash could take advantage of this and only use 32-bit multiplications instead of 64-bit multiplications on such platforms.

const_random should be a build dependency?

Hi, I'm trying to cross-compile code that uses hashbrown and ahash to a small system.
I run into trouble because aHash depends on const_random and then ultimately on rand, which I cannot cross-compile because it relies on libc.

I think const_random is only providing a macro, so it is only needed at build-time, and only needs to be compiled for the host. Is this correct? If so, can we mark const_random as a build dependency instead?

Cannot compile ahash if "compile-time-rng" feature is disabled and the environment is not "no_std"

Hello 👋

I cannot compile the crate if I disable the "compile-time-rng" feature. The issue seems to be that the const_random crate is imported anyway (unless the environment is not no_std) here:

aHash/src/lib.rs

Lines 16 to 17 in a7c0b5a

#![cfg_attr(all(not(test), not(feature = "std")), no_std)]
extern crate const_random;

which is strange because a few lines later, another use of const_random is guarded under a different condition:

aHash/src/lib.rs

Lines 33 to 34 in a7c0b5a

#[cfg(feature = "compile-time-rng")]
use const_random::const_random;

And thank you for the crate, it really helped me speed up my program!

(edit: oops, sorry, I just noticed that there is already a pull request to fix this! #25)

Reproducible builds.

Currently aHash uses const_random to improve performance. However, this has the consequence of making the binaries produced from a build non-identical. This can be a problem in some cases.

Ideally this should not be the case. Need to identify some mechanism of generating a key that doesn't have this requirement.

`const_random` compile error: mismatched types

ahash v0.2.18, a subdependency fairly far deep down in my cargo tree, is failing to compile (via cargo test) with this message:

error[E0308]: mismatched types
  --> /home/jstrong/.cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.2.18/src/lib.rs:55:45
   |
55 | static SEED: AtomicUsize = AtomicUsize::new(const_random!(u64));
   |                                             ^^^^^^^^^^^^^^^^^^ expected `usize`, found `u64`
   |
   = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

For more information about this error, try `rustc --explain E0308`.
error: could not compile `ahash`

I did not see any other reports of this particular error, but may have missed it.

Seems like the fix would be simple, to cast the return value of const_random!(u64) to usize? I'm confused how this previously compiled, though, seems like I am missing some part of the picture.
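
The suggested cast can be sketched without the macro. `COMPILE_TIME_RANDOM` below is a fixed stand-in for whatever `const_random!(u64)` would expand to, and `current_seed` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the value `const_random!(u64)` would produce at compile time.
const COMPILE_TIME_RANDOM: u64 = 0xDEAD_BEEF_CAFE_F00D;

// `as usize` is allowed in a const context. On 32-bit targets it keeps only
// the low 32 bits of the value.
static SEED: AtomicUsize = AtomicUsize::new(COMPILE_TIME_RANDOM as usize);

/// Reads the current seed.
fn current_seed() -> usize {
    SEED.load(Ordering::Relaxed)
}
```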

using rustc 1.49.0-nightly (98edd1fbf 2020-10-06)

ahash still doesn't work on no_std for me

When I try to compile it, it errors with the following errors (repeating for every macro invocation):

Compiling ahash v0.2.12
error: cannot find macro `proc_macro_call!` in this scope
  --> C:\Users\Christopher Serr\.cargo\registry\src\github.com-1ecc6299db9ec823\ahash-0.2.12\src/lib.rs:49:45
   |
49 | static SEED: AtomicUsize = AtomicUsize::new(const_random!(u64));
   |                                             ^^^^^^^^^^^^^^^^^^
   |
   = help: have you added the `#[macro_use]` on the module/import?
   = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

Add Serde feature

Can you add a feature which enables serde support for the AHashMap and AHashSet types?

Fix slowdown in 16-31 byte strings.

For strings between 16 and 31 bytes three rounds of AES are performed:

while data.len() >= 16 {

where the first round is only scrambling the key. This is obviously useless.

The problem is that when a line is added above the while loop above to xor the first block with the key before the start of the loop, the compiler for some reason compiles the loop differently (and worse) resulting in a 20% performance drop for strings >32 bytes.

Ideally it should be possible to avoid this extra round of AES without affecting longer strings.

Document reasons for yanking versions from https://crates.io

Numerous versions of ahash have been yanked from https://crates.io, but it is difficult to determine why any particular version was taken down. When dependent builds fail due to a yank, it could be useful and informative to see why the yanked version of ahash was removed from the registry, especially if build artifacts that include such a version of ahash are still out in the wild.

Considering the number of yanks (46 in total at the time of this writing), perhaps a "yank log" could be introduced so users can see why a version they've depended upon must be avoided. Thoughts?

Implement `Trace` and `Finalize` traits in features

I'm currently having an issue using aHash with rust-gc. An issue was raised there about adding support for more crates, and the response was that this should instead be added as a feature in those other crates. I'm currently using FxHash, which builds on standard library (STL) types and so works with the gc's trait implementations for the STL; this is not the case with aHash.

This also links to #42 as the trait implementation is due to this.

Allow creating an AHashMap without an `Eq` or `Hash` bound on the keys

with_capacity_and_hasher requires K to implement Eq and Hash:

error[E0277]: the trait bound `K: Hash` is not satisfied
  --> compiler/rustc_data_structures/src/sso/map.rs:89:29
   |
89 |             SsoHashMap::Map(FxHashMap::with_capacity_and_hasher(cap, Default::default()))
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Hash` is not implemented for `K`
   |
   = note: required by `AHashMap::<K, V, S>::with_capacity_and_hasher`

Consider relaxing that bound, I think you shouldn't need it for constructing the hashmap, only for operations involving the elements (get/set/entry). Same request for various other functions, new(), capacity(), len(), etc.
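
The usual way to relax such bounds is to split the `impl` blocks, as in this sketch. `WrappedMap` is a hypothetical wrapper in the style of `AHashMap`, built here on std's `HashMap` with std's `RandomState` so that the snippet has no dependencies.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Hypothetical wrapper in the style of `AHashMap`.
struct WrappedMap<K, V, S = RandomState>(HashMap<K, V, S>);

// No bounds here: constructing and inspecting the map never hashes a key.
impl<K, V, S> WrappedMap<K, V, S> {
    fn with_capacity_and_hasher(capacity: usize, hasher: S) -> Self {
        WrappedMap(HashMap::with_capacity_and_hasher(capacity, hasher))
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

// Bounds only where keys are actually hashed or compared.
impl<K: Eq + Hash, V, S: BuildHasher> WrappedMap<K, V, S> {
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        self.0.insert(key, value)
    }
}

/// A key type that implements neither `Eq` nor `Hash`.
struct Opaque;
```

With this split, `WrappedMap<Opaque, i32>` can be constructed and its length queried, while calling `insert` on it is rejected at compile time.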

Note the error is misleading here - FxHashMap is an alias for AHashMap that I renamed because it was simpler than changing a hundred imports.

Originally posted by @jyn514 in rust-lang/rust#77996 (comment)

Make no_panic optional

My build started failing with

 = note: /root/mobilenode/target/debug/deps/libahash-72211b027529aaa9.rlib(ahash-72211b027529aaa9.ahash.4a8lpfqe-cgu.8.rcgu.o): In function `<ahash::hash_test_final::__NoPanic as core::ops::drop::Drop>::drop':
          /root/mobilenode/cargo/registry/src/github.com-1ecc6299db9ec823/ahash-0.2.9/src/lib.rs:195: undefined reference to `
          
          ERROR[no-panic]: detected panic in function `hash_test_final`
          '
          collect2: error: ld returned 1 exit status

I'm not sure why this started happening, I think it's because in this commit I'm using hashbrown + ahash in more targets?

It only fails during cargo test not during cargo build

In documentation of no_panic crate, it is suggested:

If you find that code requires optimization to pass #[no_panic], either make no-panic an optional dependency that you only enable in release builds, or add a section like the following to Cargo.toml to enable very basic optimization in debug builds.

Would you consider making no_panic an optional dependency that's off by default? It's a pretty nice check but I think most users probably don't need to validate 3rd party library code this way as part of their builds, and would appreciate faster builds. I'm happy to make a patch like this.

Bindgen to SMHasher to verify quality

Currently hash quality should be strong based on the properties of aes and splitmix. But verification of this is limited to manual inspection and basic unit tests. SMHasher is a good hasher test suite. Similarly BigCrush is a good prng test suite. These are in C++. So if aHash had a connector made using bindgen, it could be run against these tests to assert its quality.

Build failed

Cargo.toml
[package]
name = "test_dashmap"
version = "0.1.0"
authors = ["vmos [email protected]"]
edition = "2018"

[dependencies]
dashmap = "1.0"

main.rs
extern crate dashmap;

use dashmap::DashMap;

fn main() {
    let map: DashMap<usize, usize> = DashMap::default();
}

cargo build
Compiling ahash v0.2.12
error: cannot find macro `proc_macro_call!` in this scope
  --> C:\Users\vmos.cargo\registry\src\github.com-1ecc6299db9ec823\ahash-0.2.12\src/lib.rs:49:45
   |
49 | static SEED: AtomicUsize = AtomicUsize::new(const_random!(u64));
   |                                            ^^^^^^^^^^^^^^^^^^
   |
   = help: have you added the `#[macro_use]` on the module/import?
   = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

error: cannot find macro `proc_macro_call!` in this scope
  --> C:\Users\vmos.cargo\registry\src\github.com-1ecc6299db9ec823\ahash-0.2.12\src/lib.rs:92:32
   |
92 | AHasher::new_with_keys(const_random!(u64), const_random!(u64))
   |                        ^^^^^^^^^^^^^^^^^^
   |
   = help: have you added the `#[macro_use]` on the module/import?
   = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

error: cannot find macro `proc_macro_call!` in this scope
  --> C:\Users\vmos.cargo\registry\src\github.com-1ecc6299db9ec823\ahash-0.2.12\src/lib.rs:92:52
   |
92 | AHasher::new_with_keys(const_random!(u64), const_random!(u64))
   |                                            ^^^^^^^^^^^^^^^^^^
   |
   = help: have you added the `#[macro_use]` on the module/import?
   = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

error: aborting due to 3 previous errors

error: Could not compile `ahash`.

Create faster low quality hash

Following up on rust-lang/hashbrown#207
It may be beneficial to have a lower-quality, non-DOS-resistant hash for cases where applications don't care about these features.

It would make sense to do this as a separate hasher instance. The trick is to find a way to do this without having to duplicate all of the code.

`AHashMap::default()` gives a type inference error

(again, please ignore FxHashMap - it's really an AHashMap)

error[E0282]: type annotations needed for `FxHashMap<usize, snippet::Style, S>`
    --> compiler/rustc_errors/src/emitter.rs:1356:38
     |
1356 |                 let mut multilines = FxHashMap::default();
     |                     --------------   ^^^^^^^^^^^^^^^^^^ cannot infer type for type parameter `S` declared on the struct `AHashMap`
     |                     |
     |                     consider giving `multilines` the explicit type `FxHashMap<usize, snippet::Style, S>`, where the type parameter `S` is specified

The way FxHashMap deals with this is by making FxHashMap a straight type alias for using BuildHasherDefault: https://docs.rs/rustc-hash/1.1.0/rustc_hash/type.FxHashMap.html. Maybe AHashMap could do something similar?
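
The alias approach can be sketched with std types only. `PinnedMap` is a hypothetical name, and `DefaultHasher` stands in for whichever hasher a real crate would plug in.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Alias that pins the hasher parameter, in the style of `FxHashMap`.
/// `DefaultHasher` stands in for the hasher a real crate would plug in.
type PinnedMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// `S` is fixed by the alias, so `default()` needs no annotation: `K` and
/// `V` are inferred from the `insert` calls.
fn build_example() -> usize {
    let mut map = PinnedMap::default();
    map.insert(1usize, "one");
    map.insert(2usize, "two");
    map.len()
}
```

The trade-off is that `BuildHasherDefault` constructs every hasher through `Default`, so per-map random keys are only possible if the hasher's own `Default` impl supplies them.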

AES hash is significantly slower than fallback for short strings on Broadwell

Tested on a Broadwell Xeon E5-2690 v4 with Rust Nightly (1.51, 2020-01-09). Times are AES hash vs. fallback hash:

  • "1": 3.07 ns vs. 1.90 ns
  • "123": 3.00 ns vs. 2.01 ns
  • "1234": 3.00 ns vs. 2.11 ns
  • "1234567": 2.99 ns vs. 2.10 ns
  • "12345678": 2.05 ns vs. 2.09 ns

This performance difference is very noticeable in some macrobenchmarks that involve aHash-powered hashmaps. If this is an inherent limitation of the AES-powered hash, perhaps it would be nice to have a feature flag or some other argument to force the use of the fallback hash if the hashed values are known to be short.

Raw test results
aeshash/u8              time:   [883.26 ps 885.33 ps 887.31 ps]                        

aeshash/u16             time:   [848.89 ps 852.85 ps 856.76 ps]                         
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

aeshash/u32             time:   [837.57 ps 841.42 ps 845.60 ps]                         
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

aeshash/u64             time:   [844.28 ps 848.31 ps 852.62 ps]                         
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

aeshash/u128            time:   [634.65 ps 637.45 ps 640.59 ps]                          
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

aeshash/string/"1"      time:   [3.0568 ns 3.0707 ns 3.0857 ns]                                
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
aeshash/string/"123"    time:   [2.9733 ns 3.0039 ns 3.0427 ns]                                  
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
aeshash/string/"1234"   time:   [2.9937 ns 3.0096 ns 3.0261 ns]                                   
aeshash/string/"1234567"                                                                             
                        time:   [2.9739 ns 2.9858 ns 2.9995 ns]
aeshash/string/"12345678"                                                                             
                        time:   [2.0422 ns 2.0526 ns 2.0634 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345"                                                                             
                        time:   [2.1141 ns 2.1215 ns 2.1289 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
aeshash/string/"1234567890123456"                                                                             
                        time:   [2.0369 ns 2.0457 ns 2.0556 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
aeshash/string/"123456789012345678901234"                                                                             
                        time:   [2.2794 ns 2.2919 ns 2.3055 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
aeshash/string/"123456789012345678901234567890123"                                                                             
                        time:   [3.6343 ns 3.6497 ns 3.6677 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"                                                                             
                        time:   [8.6159 ns 8.6649 ns 8.7177 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...                                                                             
                        time:   [11.947 ns 12.029 ns 12.107 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2                                                                             
                        time:   [44.972 ns 45.239 ns 45.515 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
fallback/u8             time:   [888.06 ps 889.87 ps 891.68 ps]                         
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

fallback/u16            time:   [881.25 ps 884.31 ps 888.02 ps]                          
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

fallback/u32            time:   [888.11 ps 891.69 ps 895.86 ps]                          
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

fallback/u64            time:   [881.68 ps 883.99 ps 886.34 ps]                          
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

fallback/u128           time:   [681.99 ps 683.29 ps 684.65 ps]                           
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

fallback/string/"1"     time:   [1.9006 ns 1.9042 ns 1.9079 ns]                                 
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
fallback/string/"123"   time:   [2.0054 ns 2.0109 ns 2.0163 ns]                                   
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"1234"  time:   [2.0983 ns 2.1073 ns 2.1166 ns]                                    
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
fallback/string/"1234567"                                                                             
                        time:   [2.0951 ns 2.1031 ns 2.1110 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
fallback/string/"12345678"                                                                             
                        time:   [2.0800 ns 2.0892 ns 2.0982 ns]
fallback/string/"123456789012345"                                                                             
                        time:   [2.3176 ns 2.3222 ns 2.3268 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"1234567890123456"                                                                             
                        time:   [2.3022 ns 2.3065 ns 2.3108 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"123456789012345678901234"                                                                             
                        time:   [3.5435 ns 3.5927 ns 3.6562 ns]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
fallback/string/"123456789012345678901234567890123"                                                                             
                        time:   [4.8958 ns 4.9083 ns 4.9210 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"                                                                             
                        time:   [7.5150 ns 7.5410 ns 7.5667 ns]
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...                                                                             
                        time:   [12.932 ns 12.951 ns 12.972 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2                                                                            
                        time:   [98.567 ns 98.730 ns 98.885 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

Please reconsider the strategy in `RandomState` for getting a hashing key

In order for aHash to achieve DOS resistance, the key used for the hash must be a secret not known to the attacker.
However, a problem (not only for Rust but for all low-level programming that needs randomness) is that there is usually no completely portable API for getting randomness. In the Rust stdlib, the APIs for randomness are in std and not core.

This is a big pain for hashmaps, and AFAIK it is the only reason that the stdlib hashmap is not in the alloc crate like all the other collections. That is very annoying for people trying to port code that uses hashmaps to an environment with no operating system. Portability is super important for Rust, and not enough libraries in the ecosystem pay attention to it. So I appreciate your focus on this and the effort you put into making something that will work.

Here's the strategy that I see based on review of the current code. (https://github.com/tkaitchuck/aHash/blob/master/src/random_state.rs)

(1) The const-random feature gets randomness at compile time and bakes it into the binary as a constant.
(2) The RandomState default implementation also mixes in the addresses of some stack and global variables, with a code comment explaining that ASLR will randomize these addresses, so this is like a source of randomness.

However, there are a few big problems with this:
(1) Turning on the const-random feature basically means that I am baking my secret keys into the binary. It is NOT normally okay to assume that the attacker does not have the binary. People make releases of their binaries on GitHub all the time. Even if their project is not open source, all kinds of engineers and contractors are likely to have access to a build that runs on the servers. If anyone who has the binary can extract the key and then DOS the server, that is terrible and way outside the threat model for most projects. This basically runs up against Kerckhoffs's principle: https://en.wikipedia.org/wiki/Kerckhoffs's_principle
It is not enough for the key to be chosen randomly "at some point in time", and rolling the random dice in the build.rs doesn't really fix anything. The point of choosing the key randomly at all is to make it a secret from the attacker, who is assumed not to have access to the specific machine where the process is running.

(2) Turning on the const-random feature throws repeatable builds out the window. Most of the time when people move security-critical code around, they will do things like take hashes of the binaries to confirm a correct download or a correct build. No one expects that a process is going to intentionally bake random bytes into the binary. In some cases, like SGX, not having a repeatable build destroys the guarantees of SGX -- the point is that someone else could build your software from scratch and get the same hash as the remote SGX hardware is telling them. A year ago, I had to spend several days tracking down why our SGX enclave build was not repeatable, by diffing the intermediate build artifacts repeatedly until I could isolate the problem, and the problem was the aHash const-random feature. I now basically have to screen every third-party lib we add to the project to check whether there is an aHash somewhere in the tree with the const-random feature still enabled. This is a major footgun that will have to be disabled in serious projects, so I would argue that it should just be removed.

(3) If const-random is off, then the only source of entropy in our keys is ASLR. Here's the thing: ASLR is an OS feature. If you don't have an OS, you probably don't have ASLR either. You generally don't have ASLR in embedded devices, and you don't have ASLR in SGX enclaves. In fact, in any case where ASLR would be present, I expect that you can simply ask the OS for randomness instead of using pointer tricks based on ASLR assumption. The advantage of using a standard API for getting randomness from the OS, instead of trying to extract randomness indirectly via ASLR, is that ASLR is not actually an interface for getting randomness, and may not actually give you a secret random value like you would get from a normal interface. ASLR is a defense-in-depth technique to try to defend against ROP. But the address offsets can leak depending on the structure of the rest of your program. If you simply ask the OS for randomness instead of trying to rely on ASLR, then the OS will give you a value that can't be leaked or inferred in this way.

If there is no ASLR present on the system, then the secret key derivation in aHash likely fails in a manner similar to the Debian random number generator bug from some years ago. Assuming that const-random is off, or that the adversary has a copy of the binary, there isn't really any other entropy present, so, game over, no DOS resistance.

So unfortunately, although we did all this work to try to make the library more portable and help the people working on no_std environments, embedded devices and such, we likely just created bigger problems for them, because we didn't use a standard API for obtaining randomness for secret keys.


I want to suggest an alternate approach: In the last year or so, the getrandom crate has matured and offers no_std support. This now appears to me to be the best and most portable way to obtain randomness. It is also the basis of OsRng now in the rand crate.

What I would suggest is:
(1) RandomState should get entropy from the getrandom crate, which becomes an optional dependency of AHash.
(2) When getrandom crate is not available, don't offer a default-initialized RandomState -- force the user to use the with_keys API and provide secret keys on their own, or tell them to patch the getrandom crate so that it will work for their target. (And ideally submit that patch upstream)
(3) All the const-random and ASLR-based key derivation should just go away.

This way, in an environment where ASLR is not actually present, instead of silently building insecure stuff, we fail at build time.

By using standard APIs and investing our maintenance energies in them, we can avoid one-off tricks and strengthen the ecosystem as a whole.
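
Suggestions (1) and (2) above could look roughly like the following sketch. It uses only the standard library so that it stays self-contained. `KeyedState`, `ToyHasher`, `os_seed` and `new_secret_map` are hypothetical names, not aHash's API, and the mixing function is a toy, not aHash's algorithm. `os_seed` borrows OS entropy through std's `RandomState` purely as a stand-in for a call into the getrandom crate.

```rust
use std::collections::hash_map::RandomState as StdRandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical builder. It deliberately has no `Default` impl, so the only
/// way to construct it is to pass keys explicitly.
#[derive(Clone, Copy, Debug)]
pub struct KeyedState {
    k0: u64,
    k1: u64,
}

impl KeyedState {
    pub fn with_keys(k0: u64, k1: u64) -> Self {
        KeyedState { k0, k1 }
    }
}

/// Toy keyed hasher for illustration only; NOT aHash's algorithm.
pub struct ToyHasher {
    state: u64,
    k1: u64,
}

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // xor in the byte, multiply by an odd constant, rotate, xor the key
            self.state = (self.state ^ u64::from(b))
                .wrapping_mul(0x0000_0100_0000_01b3)
                .rotate_left(23)
                ^ self.k1;
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

impl BuildHasher for KeyedState {
    type Hasher = ToyHasher;

    fn build_hasher(&self) -> ToyHasher {
        ToyHasher { state: self.k0, k1: self.k1 }
    }
}

/// Assumption: std's `RandomState` is used here only as a stand-in for an
/// OS randomness call such as the getrandom crate would provide.
pub fn os_seed() -> u64 {
    StdRandomState::new().build_hasher().finish()
}

/// The keys are chosen at run time and never baked into the binary.
pub fn new_secret_map<V>() -> HashMap<String, V, KeyedState> {
    HashMap::with_hasher(KeyedState::with_keys(os_seed(), os_seed()))
}
```

Because `KeyedState` has no `Default` impl, code such as `HashMap::<String, u32, KeyedState>::default()` does not compile, so a target without a randomness source fails at build time instead of silently running with predictable keys.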

Include LICENSE files

It would be very awesome if you could include LICENSE-MIT and LICENSE-APACHE files and make a new release.

Including the full license text is required by the license terms.

Thanks!

use typedef for AHashMap/AHashSet

I was trying to switch some code over to AHashMap/AHashSet, but serde::{Serialize,Deserialize} aren't implemented for them, because of the newtype wrapper.

Wondering if a better strategy for exposing these would be typedefs? This is what fxhash does. That has the advantage of interoperating more cleanly with the rest of the ecosystem, because any external traits implemented for the stdlib type remain valid for the typedefs. I'm not aware of any drawbacks in this case. Thoughts?
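
The difference between the two approaches can be sketched as follows. `Describe` is a hypothetical stand-in for a trait defined in another crate (such as a serialization trait), and std's `DefaultHasher` stands in for aHash's hasher so the example needs no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Hypothetical stand-in for a trait that some other crate implements
/// for the stdlib `HashMap`.
pub trait Describe {
    fn describe(&self) -> String;
}

impl<K, V, S> Describe for HashMap<K, V, S> {
    fn describe(&self) -> String {
        format!("map with {} entries", self.len())
    }
}

/// Alias approach: this IS a `HashMap`, so `Describe` applies automatically.
pub type AliasMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Newtype approach: a distinct type, so `Describe` is not implemented for
/// it; callers have to reach into the inner map (or the wrapper has to
/// re-implement every trait itself).
pub struct NewtypeMap<K, V>(pub HashMap<K, V, BuildHasherDefault<DefaultHasher>>);
```

One cost of an alias is that `HashMap::new` is only defined for std's `RandomState`, so an aliased map is constructed with `default()` or `with_hasher` instead.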

Document security properties

There is a widespread belief that a cryptographically secure hash is needed. See: https://github.com/rurban/smhasher/blob/master/README.md#security

Beyond what is already documented in https://github.com/tkaitchuck/aHash/wiki/Attacking-aHash-or-why-it's-good-enough-for-a-hashmap
a wiki page should be created to explain:

  • How DoS attacks on hashes actually work and why even 'strong' hashes are susceptible.
  • What properties are actually needed.
  • How aHash satisfies these.

Develop a measure of hash quality for comparison

For inputs smaller than 64 bits, hash quality is highly subjective, as any hash function that produces no collisions in the output is in some sense 'ok'. So to make the problem concrete, bit collisions in hash maps should be the focus. Some possible tests include:

  • Number of output bits flipped per bit of input
  • Low/high order collisions by incrementing by 1 or any other fixed constant.
  • Attacker flipping one or more mask bits and/or rotating to 'undo' part of the scrambling.
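
The first test in the list (output bits flipped per input bit) could be sketched like this. `mean_bits_flipped` is a hypothetical helper, the input sequence is an arbitrary deterministic choice, and the splitmix64 finalizer is included only as a reference mixer to measure; it is not aHash.

```rust
/// Average number of output bits that change when a single input bit is
/// flipped. A well-mixed 64-bit hash should score close to 32.
pub fn mean_bits_flipped(hash: impl Fn(u64) -> u64, samples: u64) -> f64 {
    let mut flipped: u64 = 0;
    for i in 0..samples {
        // Deterministic, spread-out inputs (arbitrary odd multiplier).
        let x = i.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        let base = hash(x);
        for bit in 0..64 {
            let changed = base ^ hash(x ^ (1u64 << bit));
            flipped += u64::from(changed.count_ones());
        }
    }
    flipped as f64 / (samples * 64) as f64
}

/// The splitmix64 finalizer, used here only as a reference mixer.
pub fn splitmix64_finalizer(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```

The identity function is the degenerate baseline: it scores exactly 1.0, because each flipped input bit changes exactly one output bit. A single average hides per-bit weaknesses, so a fuller test would also record the result for each input/output bit pair.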

RandomState::build_hasher documentation is misleading

fn build_hasher(&self) -> AHasher

Constructs a new AHasher with keys based on compile time generated constants** and the location of the this object in memory.

When reading this, it scared me initially, as with the docs as written, moving the BuildHasher would change the key used to generate the real Hasher.

The correct description of the behavior is that the keys are "based on compile time generated constants and the location this object was constructed at in memory."
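
A minimal sketch of the distinction, assuming a hypothetical `AddrKeyedState` type (this is not aHash's code): the address is sampled once in the constructor and stored, so moving the value afterwards does not change the key.

```rust
/// Hypothetical builder whose key depends on an address observed at
/// construction time. Not aHash's implementation.
pub struct AddrKeyedState {
    key: u64,
}

impl AddrKeyedState {
    pub fn new() -> Self {
        // Address of a local in the constructor's stack frame.
        let marker = 0u8;
        let addr = &marker as *const u8 as usize as u64;
        // Mixed with an arbitrary constant, then stored in the struct.
        AddrKeyedState { key: addr ^ 0x5851_F42D_4C95_7F2D }
    }

    /// Reads the stored key; does not look at `self`'s current address.
    pub fn key(&self) -> u64 {
        self.key
    }
}
```

Hashers built from the same state therefore stay consistent even after the state has been moved into a `Box`, a `Vec`, or a map.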

Allow enabling of AES-NI support through a feature flag

This crate currently enables AES support if AES-NI support is detected (by target_feature=+aes being set somehow). To the best of my knowledge, the only way of setting this option is by setting RUSTFLAGS or by passing -C ... to rustc somehow. Sadly Cargo does not provide a sane way (that I know of) for passing such settings conditionally. Instead, you either have to pass it manually or pass it always.

But there is a way around this: aHash could include a build.rs that looks like this:

fn main() {
    if cfg!(feature = "aes") {
        println!("cargo:rustc-cfg=target_feature=\"aes\"");
    }
}

This allows setting of target_feature settings for this crate, which can then be driven by a feature flag set through Cargo. aHash could then enable AES support if it either detects AES-NI support on its own, or when forced by setting this feature flag. This in turn allows depending crates to reuse this functionality. All this makes it easier to (conditionally) build depending crates (and aHash) with or without AES-NI support.

For example, if project X depends on aHash, this setup allows users of X to enable/disable AES-NI support easily; instead of having to set RUSTFLAGS. It also allows X to enable it by default, something you can't really do with RUSTFLAGS easily that I know of (at least not without requiring users to go through a Makefile or edit Cargo.toml by hand).

I'd be happy to provide a patch adding support for this. But before I do so: does this sound reasonable, or are there perhaps better ways of doing this that I am not aware of?
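
On the consuming side, the crate's code would then branch on the same cfg regardless of whether it came from RUSTFLAGS or from the build script above. A small sketch, where `selected_backend` is a hypothetical function rather than anything in aHash:

```rust
/// Reports which implementation would be compiled in. The cfg is true if
/// `target_feature="aes"` was set by the compiler flags or emitted by a
/// build script via `cargo:rustc-cfg`.
pub fn selected_backend() -> &'static str {
    if cfg!(target_feature = "aes") {
        "aes"
    } else {
        "fallback"
    }
}
```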

Strengthen fallback algorithm

The fallback algorithm was weakened for speed. It could be improved with a hardware reverse-bits instruction, such as the one that exists on ARM processors.
Figure out how to take advantage of this. This could be done either by introducing more special-case hardware instructions, or by falling back on generic autovectorization with something better than a rotate to randomize the bits.
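
To illustrate why a bit reversal helps, here is a sketch under stated assumptions: `mul_only` and `mul_reverse_mul` are hypothetical mixing steps, not the actual fallback algorithm, and `K` is an arbitrary odd constant. A wrapping multiply only carries information from low bits to high bits, so a change in the top input bit never reaches the low output bits. Reversing the bits between two multiplies lets it reach them.

```rust
/// Arbitrary odd constant used for the multiplies.
const K: u64 = 0x9E37_79B9_7F4A_7C15;

/// Multiply-only step: the top input bit can only affect the top output bit.
pub fn mul_only(x: u64, key: u64) -> u64 {
    (x ^ key).wrapping_mul(K)
}

/// Multiply, reverse the bits, multiply again: the top input bit is moved
/// to bit 0 by `reverse_bits`, and the second multiply spreads it upward.
pub fn mul_reverse_mul(x: u64, key: u64) -> u64 {
    (x ^ key).wrapping_mul(K).reverse_bits().wrapping_mul(K)
}
```

For example, with any key, `mul_only(0, key) ^ mul_only(1 << 63, key)` equals `1 << 63`, while the same difference computed with `mul_reverse_mul` always has bit 0 set.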
