Stretto

Stretto is a pure Rust implementation for https://github.com/dgraph-io/ristretto.

A high performance thread-safe memory-bound Rust cache.

English | 简体中文


Features

  • Internal Mutability - you do not need to wrap the cache in Arc<RwLock<Cache<...>>> for concurrent code; a plain Cache<...> or AsyncCache<...> is enough (see the sketch after this list)
  • Sync and Async - Stretto supports both sync and runtime-agnostic async.
    • In sync mode, Cache starts two extra OS-level threads: a policy thread and a writing thread.
    • In async mode, AsyncCache starts two extra green threads: a policy thread and a writing thread.
  • Store policy - Stretto only stores the value, which means the cache never stores the key itself.
  • High Hit Ratios - with the Dgraph developers' unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - a variety of techniques are used for managing contention, and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many threads as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal CacheBuilder/AsyncCacheBuilder values and you're off and running.
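
Because all synchronization is internal, a cache handle can be moved into other threads directly. A minimal sketch of the internal-mutability point above (assuming Cache implements Clone, which the design implies; not an official example):

use stretto::Cache;

fn main() {
    let c: Cache<&str, &str> = Cache::new(12960, 1e6 as i64).unwrap();

    // No Arc<RwLock<...>> needed: clone the handle and move it into a thread.
    let c2 = c.clone();
    std::thread::spawn(move || {
        c2.insert("from-thread", "hello", 1);
        c2.wait().unwrap();
    })
    .join()
    .unwrap();

    assert_eq!(c.get(&"from-thread").unwrap().value(), &"hello");
}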


Installation

  • Use Cache:

[dependencies]
stretto = "0.8"

or

[dependencies]
stretto = { version = "0.8", features = ["sync"] }

  • Use AsyncCache:

[dependencies]
stretto = { version = "0.8", features = ["async"] }

  • Use both Cache and AsyncCache:

[dependencies]
stretto = { version = "0.8", features = ["full"] }

Related

If you want some basic cache implementations (no_std), please see https://crates.io/crates/caches.

Usage

Example

Sync

use std::time::Duration;
use stretto::Cache;

fn main() {
    let c = Cache::new(12960, 1e6 as i64).unwrap();

    // set a value with a cost of 1
    c.insert("a", "a", 1);
    // set a value with a cost of 1 and ttl
    c.insert_with_ttl("b", "b", 1, Duration::from_secs(3));

    // wait for value to pass through buffers
    c.wait().unwrap();

    // when we get the value, we will get a ValueRef, which holds a RwLockReadGuard,
    // so when we finish using the value, we must release the ValueRef
    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"a");
    v.release();

    // lock will be auto released when out of scope
    {
        // when we get the value, we will get a ValueRefMut, which holds a RwLockWriteGuard,
        // so when we finish using the value, we must release the ValueRefMut
        let mut v = c.get_mut(&"a").unwrap();
        v.write("aa");
        assert_eq!(v.value(), &"aa");
        // release the value
    }

    // if you just want to do one operation
    let v = c.get_mut(&"a").unwrap();
    v.write_once("aaa");

    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"aaa");
    v.release();

    // clear the cache
    c.clear().unwrap();
    // wait until all the operations are finished
    c.wait().unwrap();
    assert!(c.get(&"a").is_none());
}

Async

Stretto supports a runtime-agnostic AsyncCache; the only thing you need to do is pass a spawner when building the AsyncCache.

use std::time::Duration;
use stretto::AsyncCache;

#[tokio::main]
async fn main() {
    // In this example, we use tokio runtime, so we pass tokio::spawn when constructing AsyncCache
    let c: AsyncCache<&str, &str> = AsyncCache::new(12960, 1e6 as i64, tokio::spawn).unwrap();

    // set a value with a cost of 1
    c.insert("a", "a", 1).await;

    // set a value with a cost of 1 and ttl
    c.insert_with_ttl("b", "b", 1, Duration::from_secs(3)).await;

    // wait for value to pass through buffers
    c.wait().await.unwrap();

    // when we get the value, we will get a ValueRef, which holds a RwLockReadGuard,
    // so when we finish using the value, we must release the ValueRef
    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"a");
    // release the value
    v.release(); // or drop(v)

    // lock will be auto released when out of scope
    {
        // when we get the value, we will get a ValueRefMut, which holds a RwLockWriteGuard,
        // so when we finish using the value, we must release the ValueRefMut
        let mut v = c.get_mut(&"a").unwrap();
        v.write("aa");
        assert_eq!(v.value(), &"aa");
        // release the value
    }

    // if you just want to do one operation
    let v = c.get_mut(&"a").unwrap();
    v.write_once("aaa");

    let v = c.get(&"a").unwrap();
    println!("{}", v);
    assert_eq!(v.value(), &"aaa");
    v.release();

    // clear the cache
    c.clear().await.unwrap();
    // wait until all the operations are finished
    c.wait().await.unwrap();

    assert!(c.get(&"a").is_none());
}

Config

The CacheBuilder struct is used when creating Cache instances and lets you customize the Cache settings.

num_counters

num_counters is the number of 4-bit access counters to keep for admission and eviction. The Dgraph developers have seen good performance when setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and max_cost is 100, set num_counters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set num_counters to 100,000. The important thing is the number of unique items in the full cache, not necessarily the max_cost value.

max_cost

max_cost determines how eviction decisions are made. For example, if max_cost is 100 and a new item with a cost of 1 increases the total cache cost to 101, 1 item will be evicted.

max_cost can also be used to denote the max size in bytes. For example, if max_cost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new 5KB item that is accepted would cause five 1KB items to be evicted.

max_cost could be anything as long as it matches how you're using the cost values when calling insert.
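
Putting num_counters and max_cost together: for a cache expected to hold about 1,000 items of cost 1 each, a reasonable starting point is num_counters = 10,000 and max_cost = 1,000. A minimal sketch using only the constructor shown in the examples above:

use stretto::Cache;

fn main() {
    // num_counters = 10 * expected items when full, max_cost = total cost budget
    let c: Cache<&str, &str> = Cache::new(10_000, 1_000).unwrap();

    c.insert("key", "value", 1); // consumes 1 of the 1,000 cost budget
    c.wait().unwrap();
    assert!(c.get(&"key").is_some());
}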

key_builder

pub trait KeyBuilder {
    type Key: Hash + Eq + ?Sized;

    /// hash_index is used to hash the key to a u64
    fn hash_index<Q>(&self, key: &Q) -> u64
    where
        Self::Key: core::borrow::Borrow<Q>,
        Q: Hash + Eq + ?Sized;

    /// if you want 128-bit hashes, implement this method;
    /// otherwise leave it returning 0
    fn hash_conflict<Q>(&self, key: &Q) -> u64
    where
        Self::Key: core::borrow::Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        0
    }

    /// build the key into a 128-bit hash pair
    fn build_key<Q>(&self, k: &Q) -> (u64, u64)
    where
        Self::Key: core::borrow::Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        (self.hash_index(k), self.hash_conflict(k))
    }
}

KeyBuilder is the hashing algorithm used for every key. In Stretto, the Cache never stores the real key; the key is processed by the KeyBuilder. Stretto has two built-in key builders: TransparentKeyBuilder and DefaultKeyBuilder. If your key implements the TransparentKey trait, you can use TransparentKeyBuilder, which is faster than DefaultKeyBuilder; otherwise, you should use DefaultKeyBuilder. You can also write your own key builder for the Cache by implementing the KeyBuilder trait.

Note that if you want 128-bit hashes you should use the full (u64, u64); otherwise, just fill the u64 at position 0, and it will behave like any 64-bit hash.
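
A sketch of a custom key builder against the trait as quoted above (the concrete key type and hasher choice are illustrative; hash_conflict keeps its default of 0, so this behaves as a 64-bit hash):

use core::borrow::Borrow;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct MyKeyBuilder;

impl stretto::KeyBuilder for MyKeyBuilder {
    type Key = String;

    fn hash_index<Q>(&self, key: &Q) -> u64
    where
        Self::Key: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        // hash the borrowed form of the key with std's default hasher
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish()
    }
}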

buffer_size

buffer_size is the size of the insert buffers. The Dgraph developers found that 32 * 1024 gives good performance.

If for some reason you see insert performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 32 * 1024. This is a fine-tuning mechanism and you probably won't have to touch this.

metrics

Set metrics to true when you want real-time logging of a variety of stats. The reason this is a CacheBuilder flag is that there's a roughly 10% throughput performance overhead.

ignore_internal_cost

Setting this to true tells the cache to ignore the internal cost of storing the value. This is useful when the cost passed to insert is not measured in bytes. Keep in mind that setting this to true will increase memory usage.

cleanup_duration

By default, the Cache cleans up expired values every 500ms.
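
A sketch combining several of these builder settings. set_metrics and set_ignore_internal_cost appear in the issue reports later on this page; Cache::builder and set_cleanup_duration are assumed method names here, so double-check them against the crate docs:

use std::time::Duration;
use stretto::Cache;

fn main() {
    let c: Cache<&str, &str> = Cache::builder(10_000, 1_000)
        // enable stats collection (~10% throughput overhead)
        .set_metrics(true)
        // our costs are logical units, not byte sizes
        .set_ignore_internal_cost(true)
        // assumed method name: shrink the expiry sweep interval
        .set_cleanup_duration(Duration::from_millis(200))
        .finalize()
        .unwrap();

    c.insert("a", "b", 1);
    c.wait().unwrap();
}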

update_validator

pub trait UpdateValidator: Send + Sync + 'static {
    type Value: Send + Sync + 'static;

    /// should_update is called when a value already exists in cache and is being updated.
    fn should_update(&self, prev: &Self::Value, curr: &Self::Value) -> bool;
}

By default, the Cache will always update the value if it already exists in the cache; this trait lets you decide whether the value should actually be updated.
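
A sketch of a validator that only accepts updates that grow the stored value (the value type is illustrative; the validator is wired in through the builder):

struct OnlyGrow;

impl stretto::UpdateValidator for OnlyGrow {
    type Value = u64;

    // reject any update that would shrink the stored value
    fn should_update(&self, prev: &Self::Value, curr: &Self::Value) -> bool {
        curr > prev
    }
}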

callback

pub trait CacheCallback: Send + Sync + 'static {
    type Value: Send + Sync + 'static;

    /// on_exit is called whenever a value is removed from cache. This can be
    /// used to do manual memory deallocation. Would also be called on eviction
    /// and rejection of the value.
    fn on_exit(&self, val: Option<Self::Value>);

    /// on_evict is called for every eviction and passes the hashed key, value,
    /// and cost to the function.
    fn on_evict(&self, item: Item<Self::Value>) {
        self.on_exit(item.val)
    }

    /// on_reject is called for every rejection done via the policy.
    fn on_reject(&self, item: Item<Self::Value>) {
        self.on_exit(item.val)
    }
}

CacheCallback lets you customize extra operations on values when the related events happen.
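
For example, one of the issues later on this page asks how to delete a file from the file system when its entry is evicted. A sketch against the trait above (the value type is illustrative):

use std::path::PathBuf;

struct FileCleaner;

impl stretto::CacheCallback for FileCleaner {
    type Value = PathBuf;

    // on_evict and on_reject funnel into on_exit by default, so this
    // cleanup runs for eviction, rejection, and removal alike
    fn on_exit(&self, val: Option<Self::Value>) {
        if let Some(path) = val {
            let _ = std::fs::remove_file(path);
        }
    }
}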

coster

pub trait Coster: Send + Sync + 'static {
    type Value: Send + Sync + 'static;

    /// cost evaluates a value and outputs a corresponding cost. This function
    /// is run after insert is called for a new item or an item update with a cost
    /// param of 0.
    fn cost(&self, val: &Self::Value) -> i64;
}

Coster is a trait you can pass to the CacheBuilder in order to evaluate item cost at runtime, and only for the insert calls that aren't dropped (this is useful if calculating item cost is particularly expensive, and you don't want to waste time on items that will be dropped anyway).

To signal to Stretto that you'd like to use this Coster trait:

  1. Set the Coster field to your own Coster implementation.
  2. When calling insert for new items or item updates, use a cost of 0.
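
A sketch of a byte-length coster for String values (illustrative; once wired in through the builder, inserting with a cost of 0 triggers it):

struct StringCoster;

impl stretto::Coster for StringCoster {
    type Value = String;

    // charge one cost unit per byte of the stored string
    fn cost(&self, val: &Self::Value) -> i64 {
        val.len() as i64
    }
}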

hasher

The hasher for the Cache; the default is SeaHasher.

Acknowledgements

  • Thanks to the Dgraph developers for providing the amazing Go implementation of Ristretto.

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


stretto's Issues

TTL does not really work

If I insert 10,000 items with a 60-second TTL, my on_evict callback only sees ~1,000 evicted items after 60 seconds.
Checking len() on such a cache also shows ~9,000 items remaining.
I suppose the cleaner process is not working correctly, and there is something wrong with the TTL map.

failure: cache::test::sync_test::test_cache_drop_updates, when running with multiple threads

I haven't figured out whether the failure is caused by a problem with the library or with the test. I do seem to have narrowed it down to failing only when the test is run with multiple threads and all the tests are allowed to run. When the tests are limited to a single thread, or when the test is run in isolation (cargo test --release --lib test_cache_drop_updates -- --test-threads=4), it seems to pass each time.

cargo test and cargo test --release often, but not always, fails with

failures:

---- cache::test::sync_test::test_cache_drop_updates stdout ----
thread 'cache::test::sync_test::test_cache_drop_updates' panicked at 'assertion failed: c.insert(1, \"0\".to_string(), 10)', src/cache/test.rs:680:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    cache::test::sync_test::test_cache_drop_updates

but running with cargo test -- --test-threads=1 sees it pass each time. I wonder if other tests that precede this one are having an effect, because when the one test is run in isolation, it always passes too.

If it were determined the test isn't designed for multiple threads, I would suggest creating a .cargo/config.toml with

[env]
RUST_TEST_THREADS = { value = "1" }

I would offer a PR but as I said at the top, I don't know if a real problem is being masked by just running the one test or just running with one thread.

Can key lifetime be shortened to insert/get/destroy functions only?

As the (ri)stretto algorithm does not store/modify the key, it should also be able to only borrow it. The lifetime of this borrow should therefore only need to match or exceed the lifetime of the calls to insert(), try_insert(), get(), get_mut() and remove(), but not the lifetime of the full cache, I assume. To demonstrate the idea, I modified the synced example to the code below, where the key is a &str:

use std::time::Duration;
use stretto::Cache;

// key should live as long as function call while cache takes ownership of value
fn new_cache<'a>() -> Cache<&'a str, String> {
    let c: Cache<&str, String> = Cache::new(12960, 1e6 as i64).unwrap();
    c
}

fn main() {
    let r = rand::random::<u8>();
    let c = new_cache();

    {
        // create storage key from random number, and set a value with a cost of 1
        let store_key = format!("key1-{}", r);
        // set a value with a cost of 1
        c.insert(&store_key, "value1".to_string(), 1);
        // set a value with a cost of 1 and ttl
        c.insert_with_ttl("key2", "value2".to_string(), 1, Duration::from_secs(3));

        // wait for value to pass through buffers
        c.wait().unwrap();
    }

    // Create a search key
    let key1 = "key1";

//... rest is not modified ...

However, that fails with the following error:

24 |     }
   |     - `store_key` dropped here while still borrowed

The idea of this enhancement is that users of the cache would not need to copy their key str into a String, possibly saving CPU time by only handling an immutable borrow. I assume this can be done by inserting a lifetime specifier in the function calls?

The sampling is not random during evictions in fill_sample

The Rust iterators over HashMap do not work the same way as in Go. In Rust, the iteration order is stable between calls if the hashmap does not change; if it changes, the iteration order is still mostly the same. That is not the case for Go, where the iteration order differs between calls to k := range map.

for (k, v) in self.key_costs.iter() {

https://github.com/dgraph-io/ristretto/blob/3177d9c9520c37d36b18113be01fea6393f63860/policy.go#L317

I suspect that cache performance (hit rates) suffers significantly because of this.

Use-After-Free in SharedNonNull?

Hi. Thanks for an interesting library. I'm wondering what you had in mind for SharedNonNull under src/utils.rs. It's not being used anywhere, but it's also somewhat buggy.

As of fb980da, there seems to be a use-after-free coming from SharedNonNull. For example: cargo miri test utils::test::test_shared_non_null returns error: Undefined Behavior: pointer to alloc189467 was dereferenced after this allocation got freed.

In that case, I suspect the 3 is deallocated right after SharedNonNull is created, so the inner NonNull is dangling. I am not too familiar with your library, but it seems you might want some Rc-like behavior to prevent the original location from being deallocated. Otherwise, it seems you will always get a use-after-free when creating one from a value that lives for only one line.

Full error message

error: Undefined Behavior: pointer to alloc189467 was dereferenced after this allocation got freed
   --> /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:327:18
    |
327 |         unsafe { &*self.as_ptr() }
    |                  ^^^^^^^^^^^^^^^ pointer to alloc189467 was dereferenced after this allocation got freed
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::ptr::NonNull::<i32>::as_ref` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:327:18
note: inside `utils::SharedNonNull::<i32>::as_ref` at src/utils.rs:261:9
   --> src/utils.rs:261:9
    |
261 |         self.ptr.as_ref()
    |         ^^^^^^^^^^^^^^^^^
note: inside `utils::test::test_shared_non_null` at src/utils.rs:360:26
   --> src/utils.rs:360:26
    |
360 |         let r = unsafe { snn.as_ref() };
    |                          ^^^^^^^^^^^^
note: inside closure at src/utils.rs:358:5
   --> src/utils.rs:358:5
    |
357 |       #[test]
    |       ------- in this procedural macro expansion
358 | /     fn test_shared_non_null() {
359 | |         let snn = SharedNonNull::new(&mut 3);
360 | |         let r = unsafe { snn.as_ref() };
361 | |         assert_eq!(r, &3);
...   |
365 | |         }
366 | |     }
    | |_____^
    = note: inside `<[closure@src/utils.rs:358:5: 366:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:585:5
    = note: inside closure at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:576:30
    = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1811:9
    = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271:9
    = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:406:40
    = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:370:19
    = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:133:14
    = note: inside `test::run_test_in_process` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:608:18
    = note: inside closure at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:500:39
    = note: inside `test::run_test::run_test_inner` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:538:13
    = note: inside `test::run_test` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:572:28
    = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:313:17
    = note: inside `test::run_tests_console` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:290:5
    = note: inside `test::test_main` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:124:15
    = note: inside `test::test_main_static` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:143:5
    = note: inside `main`
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:123:18
    = note: inside closure at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:145:18
    = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
    = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:406:40
    = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:370:19
    = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:133:14
    = note: inside closure at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:48
    = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:406:40
    = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:370:19
    = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:133:14
    = note: inside `std::rt::lang_start_internal` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:20
    = note: inside `std::rt::lang_start::<()>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:144:17
    = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

error: test failed, to rerun pass '--lib'

Can you use a cache after clear()?

Should you be able to insert into the cache after you clear() it? The test case below fails on the final line since the get() returns None (after insert and wait). Looking at the code, I believe this is a bug and not a design decision; however, I don't see an attempt to use a cache after clear in the examples or tests.

    use claim::*;
    use pretty_assertions::assert_eq;

    #[test]
    fn test_basic_cache_add_after_clear() {
        let ttl = Duration::from_secs(60);
        block_on(async {
            let cache = AsyncCache::builder(1000, 100)
                .set_metrics(true)
                .set_ignore_internal_cost(true)
                .finalize()
                .expect("failed creating cache");

            assert!(cache.insert_with_ttl("foo".to_string(), 17.to_string(), 1, ttl).await);
            assert!(cache.insert_with_ttl("bar".to_string(), "otis".to_string(), 1, ttl).await);
            assert_ok!(cache.wait().await);

            assert_eq!(assert_some!(cache.get(&"foo".to_string())).value(), &17.to_string());
            assert_eq!(assert_some!(cache.get(&"bar".to_string())).value(), &"otis".to_string());

            assert_ok!(cache.clear());
            assert_ok!(cache.wait().await);

            assert_none!(cache.get(&"foo".to_string()));

            assert!(cache.insert_with_ttl("zed".to_string(), 33.to_string(), 1, ttl).await);
            assert_ok!(cache.wait().await);

            assert_none!(cache.get(&"bar".to_string()));
            assert_eq!(assert_some!(cache.get(&"zed".to_string())).value(), &33.to_string());
        });
    }

error message in my project:

---- phases::sense::clearinghouse::cache::tests::test_basic_cache_add_after_clear stdout ----
thread 'phases::sense::clearinghouse::cache::tests::test_basic_cache_add_after_clear' panicked at 
'assertion failed, expected Some(..), got None', src/phases/sense/clearinghouse/cache.rs:272:24

How to print `sync::cache` state

Suppose I have a sync::Cache<_, _> and I'd like to display its contents (for example, for debugging purposes). How can I do that? I didn't find any way.

Pointer is not a multiple of 8 in test

When cloning the most recent commit 262f340 and running cargo test I see it fails with:

thread '<unnamed>' panicked at 'misaligned pointer dereference: address
must be a multiple of 0x8 but is 0x7f5f40010b05', src/bbloom.rs:122:22

Some background info which may be relevant

  • kernel: 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • compiler: rustc 1.71.0-nightly (7908a1d65 2023-04-17)
  • Linux Mint 21.1 Vera

Potential race condition with insert-wait-get

Hi folks, I'm not sure it's a bug, but I can't figure it out otherwise.

We got a cache based on stretto (sync API) in our service with simple semantics:

  • try_insert_with_ttl() then wait()
  • get() to fetch the value

I have a test that does a simple insert/wait/get sequence to check that a given entry exists in the cache, and in our CI/CD (bazel) this test sometimes fails - get() reports that the key is missing. The problem is that I cannot reproduce this locally - it has a 100% success rate even if I run it thousands of times.

I am creating the cache with a large enough max_cost and using a TTL of 3600s to make sure the entry won't be evicted.

Would be grateful for any hint on how to debug this, maybe I'm doing something wrong. But it seems consistent with code in https://github.com/al8n/stretto/blob/main/examples/sync_example.rs

Examples hint at the unhandy usage of static variables, leaving ownership to others

When trying to modify both examples as a novice exercise, I stumbled onto the fact that static string slices are used. This immediately goes wrong when I simply move the creation of the cache to a separate function, e.g. in the async case:

fn new_async_cache() -> AsyncCache<&str, &str> {
	let c : AsyncCache<&str, &str> = AsyncCache::new(12960, 1e6 as i64, tokio::spawn).unwrap();
	c
}

The compiler now rightfully complains:

fn new_async_cache() -> AsyncCache<&str, &str> {
                                   ^     ^ expected named lifetime parameter
                                   |
                                   expected named lifetime parameter

Using references to string slices leaves the ownership responsibility to others: I would expect the cache to take ownership, why create it otherwise? Would you accept a patch where I modify the structures to String types for both key and value?

How to use CacheCallback

How can I use CacheCallback? I need to delete the related file from the file system when the entry is evicted.

[feature] Getting TTL value of entry

Proposing a new method that gets the TTL value of an entry:

let val_ref = cache.get(&key).unwrap();
let ttl: std::time::Duration = val_ref.ttl();
let value = val_ref.value();

How does max_cost work?

Hi,
I have a question about the max_cost argument of AsyncCache. When I set it to 2 and insert 2 items, each with a cost of 1, one of the items goes missing. Maybe I am misinterpreting something here. Code:

use stretto::AsyncCache;

#[tokio::main]
async fn main() {
    let c = AsyncCache::new(20, 2, tokio::spawn).unwrap();
    c.insert(1, 2, 1).await;
    c.insert(2, 3, 1).await;

    c.wait().await.unwrap();

    c.get(&1).unwrap(); // Get a panic
}

Possible deadlock in `wait` in `async_std` runtime

The code below may freeze on the wait() call. It happens most of the time for me (macOS).

use async_std::task;
use stretto::AsyncCache;

fn main() {
    let max_cost = 10_000;
    let lru = AsyncCache::new(max_cost * 10, max_cost as i64, task::spawn)
        .expect("failed to create cache");

    task::block_on(task::spawn(async move {
        for i in 0..10_000 {
            println!("i = {i}, len before insert = {}", lru.len());

            let value = 123;
            let cost = 1;
            lru.insert(i, value, cost).await;
            lru.wait().await.unwrap(); // <-- freezes here
        }

        println!("done");
    }));
}

I also have a question. Is there an easy way to predict how many items a cache will fit (with the parameters above) if we always use cost = 1? It seems to cap out at ~175 items in the example above (max_cost = 10_000, with each item cost = 1).

More idiomatic example in README

Instead of calling release or drop manually, the example could use scopes to achieve the same. For instance,

let v = c.get(&"a").unwrap();
assert_eq!(v.value(), &"a");
v.release();

could be changed to

{
    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"a");
}

Some feedback

Hey there, very cool library. I was looking it over and noticed a few miscellaneous things that might improve it. In no particular order:

There are a couple of instances of sentinel values in the API.

  • Duration::ZERO indicates infinite ttl. This isn't what I would expect. Consider using Duration::MAX instead (it should make the implementation simpler too).
  • When 0 is provided as an external cost, the provided Coster kicks in. This makes 0 a special value despite being a perfectly valid cost setting. Consider taking either an Option cost in setter methods, or using a separate setter method that doesn't take a cost parameter and uses Coster to assign one.

It's advisable to make invalid states unrepresentable. Costs are expressed as i64, not u64. This struck me as odd. Are negative costs valid?

This call to tokio select has only one future, which doesn't make sense considering what select does (it races multiple futures).

Cargo clippy is a really handy tool for improving code. Among other things, it helps catch errors and keeps code idiomatic. I do recommend trying it out on this repo, as it can give really useful suggestions and teach you about Rust idioms. It currently reports 49 suggestions for this codebase.

This last one might just be personal preference. Macros are immensely useful, but they do have costs, like confusing rustfmt or sometimes confusing people trying to understand the code. That said, I see a couple of instances where the use of custom macros could be reduced. This, for example, could be replaced with:

#[cfg(feature = "async")]
mod async_test {

The consistent and idiomatic formatting is much appreciated, by the way. It makes it a lot easier to dive into the code.

Again, I think this library is super cool and nicely written. If you'd like me to address any or all of these points via PR, please let me know.

How to construct a TransparentKeyBuilder?

This looks nice!

I'm trying this code:

        let cache = stretto::Cache::new_with_key_builder(
            max_counters,
            MAX_CACHE_BYTES as i64,
            stretto::TransparentKeyBuilder::default(),
        )
        .expect("Create blockdir cache");

However, it complains that my key type doesn't implement Default, which it doesn't, and I don't really want it to, since there is no sensible default:

error[E0277]: the trait bound `blockhash::BlockHash: std::default::Default` is not satisfied
   --> src/blockdir.rs:109:13
    |
109 |             stretto::TransparentKeyBuilder::default(),
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `std::default::Default` is not implemented for `blockhash::BlockHash`
    |
    = help: the trait `std::default::Default` is implemented for `stretto::TransparentKeyBuilder<K>`
    = note: required for `stretto::TransparentKeyBuilder<blockhash::BlockHash>` to implement `std::default::Default`

It doesn't seem inherently necessary that the key has a default. Could we just add a new method?

Hangs on close() when code is compiled as a cdylib

It's great that with your lib I can do things like this:

lazy_static! {
    static ref CACHE: Cache<String, TcpStream> = {
        let cache = Cache::new(12000, 1e6 as i64).unwrap();
        cache
    };
}

But there are 2 issues when compiling the code as a cdylib:

  1. Without cache.close() in DLL_PROCESS_DETACH (yeah, my lib is for Windows), it just throws some kind of error, which Rust detects while I load this dll - STATUS_ACCESS_VIOLATION. In other programs this issue causes a whole-program crash.
  2. I found out that to prevent that issue I need to call cache.close() in order to free some memory and stop the additional threads, but now it just hangs on closing and does nothing until I kill the process.
    I really need your help, because I literally can't find another in-memory cache which can be defined as a global value with lazy_static.

Deadlock with AsyncCache when using wait and nested caches

We have encountered a deadlock situation when attempting to have nested caches, as in a key-key-value multi-dimensional map, where there are many spawned tasks trying to interact with the cache at once, and the tasks attempt to call wait after insert of the inner value.

All the tokio (non-blocking) threads become blocked trying to obtain a lock on the cache. The await inside wait allows one of the other tasks destined to block to be scheduled, eventually causing deadlock. In the code below, inner_cache.wait() happens while a read lock on lru is held; avoiding the inner_cache.wait() call prevents the deadlock.

We have restructured our code so it no longer uses nested caches and avoids this situation, but I wanted to share this discovery. This may not be a usage pattern that is intended to work or be supported, but to my eyes it is not immediately obvious that this would deadlock. I speculate that if the CacheProcessor were spawned as a dedicated thread instead of a green thread, it might be able to complete the wait group and unstick things (we didn't experience a deadlock with this same design pattern when using the sync Cache).

Here is an example test that can demonstrate the deadlock:

    #[tokio::test(flavor = "multi_thread")]
    async fn test_wait_with_read_lock() {
        let max_cost = 10_000;
        let lru = Arc::new(
            AsyncCacheBuilder::new(max_cost * 10, max_cost as i64)
                .set_ignore_internal_cost(true)
                .finalize(tokio::spawn)
                .expect("failed to create cache"),
        );

        let key = 1;
        let cost = 1;

        let mut tasks = Vec::new();

        for i in 1..=10_000 {
            let lru = Arc::clone(&lru);
            tasks.push(tokio::spawn(async move {
                let inner_cache = match lru.get(&key) {
                    Some(v) => v,
                    None => {
                        let inner_lru = AsyncCacheBuilder::new(max_cost * 10, max_cost as i64)
                            .set_ignore_internal_cost(true)
                            .finalize(tokio::spawn)
                            .expect("failed to create cache");
                        lru.insert(key, inner_lru, cost).await;
                        lru.wait().await.unwrap();
                        lru.get(&key).unwrap()
                    }
                };
                let inner_cache = inner_cache.value();
                inner_cache.insert(i, 123, cost).await;
                eprintln!("i = {i}, len before wait = {}", inner_cache.len());
                // removing this wait avoids deadlock
                inner_cache.wait().await.unwrap();
            }));
        }

        for task in tasks {
            task.await.unwrap();
        }
    }

Async use doesn't appear to work?

error[E0308]: mismatched types
  --> lit-attestation-service/src/handlers/attestation_intent.rs:34:9
   |
34 | /         Box::pin(async move {
35 | |             debug!(req = as_serde!(req); "AttestationIntentHandler");
36 | |
37 | |             // Create initial Attestation object.
...  |
77 | |             Ok(AttestationIntentResp { attestation, session_id })
78 | |         })
   | |__________^ one type is more general than the other
   |
   = note: expected struct `Box<dyn Any + std::marker::Send + Sync>`
              found struct `Box<dyn Any + std::marker::Send + Sync>`

error: higher-ranked lifetime error
  --> lit-attestation-service/src/handlers/attestation_intent.rs:34:9
   |
34 | /         Box::pin(async move {
35 | |             debug!(req = as_serde!(req); "AttestationIntentHandler");
36 | |
37 | |             // Create initial Attestation object.
...  |
77 | |             Ok(AttestationIntentResp { attestation, session_id })
78 | |         })
   | |__________^
   |
   = note: could not prove `Pin<Box<[async block@lit-attestation-service/src/handlers/attestation_intent.rs:34:18: 78:10]>>: CoerceUnsized<Pin<Box<(dyn futures::Future<Output = std::result::Result<AttestationIntentResp, lit_attestation::Error>> + std::marker::Send + 'b)>>>`

This happens when I use async_trait or directly use BoxFuture. It all relates to this line:

CACHE.insert(session_id.clone(), Box::new(attestation.clone()), 1).await;

My cache is constructed as:

pub static CACHE: Lazy<AsyncCache<String, Box<dyn Any + Send + Sync>>> =
    Lazy::new(|| AsyncCache::new(100, 10, tokio::spawn).expect("failed to create cache"));

Get-or-insert-with semantics / Dogpile Effect Mitigation

Async caches benefit from being able to ensure that, if a value is not currently cached, a future imminently fulfilling the entry will be passed to the other asynchronous readers. I haven't found a way to implement this pattern without adding another layer of internal mutability on top of the existing Stretto cache; is there an officially endorsed way to deduplicate cache fulfillment requests? If not, I'd like to request the feature.
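
As the issue notes, Stretto does not expose such an API today. As a stopgap, here is a minimal sketch of request coalescing layered on top of AsyncCache (tokio assumed; CoalescedCache, get_or_insert_with, and the fixed cost of 1 are all illustrative, not part of the crate):

use std::collections::HashMap;
use std::future::Future;
use std::sync::Arc;
use stretto::AsyncCache;
use tokio::sync::Mutex;

// A per-key gate in front of an AsyncCache so that only one task fetches
// a missing value while the other readers wait for it.
struct CoalescedCache {
    cache: AsyncCache<String, String>,
    inflight: Arc<Mutex<HashMap<String, Arc<Mutex<()>>>>>,
}

impl CoalescedCache {
    async fn get_or_insert_with<F, Fut>(&self, key: String, fetch: F) -> String
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = String>,
    {
        // fast path: the value is already cached
        if let Some(v) = self.cache.get(&key) {
            return v.value().clone();
        }
        // take (or create) the gate for this key
        let gate = {
            let mut map = self.inflight.lock().await;
            map.entry(key.clone()).or_default().clone()
        };
        let _guard = gate.lock().await;
        // re-check: another task may have filled the entry while we waited
        if let Some(v) = self.cache.get(&key) {
            return v.value().clone();
        }
        let value = fetch().await;
        self.cache.insert(key.clone(), value.clone(), 1).await;
        self.cache.wait().await.ok();
        self.inflight.lock().await.remove(&key);
        value
    }
}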
