erwanor / gcache2

Goroutine-safe cache library. Supports LFU/LRU/ARC policies, expirable entries, and snapshots; partial support for more exotic cache eviction algorithms (RR/TinyLFU) is a work in progress.

License: MIT License

Go 100.00%
cache lfu arc lru storage expiration-control self-tuning tinylfu golang in-memory-caching

gcache2's Introduction

GCache2


Work-in-Progress: This repository will be the home of gcache2, a caching library for Go. Ideally, the only one you will ever need. To reach that ambitious goal, I am taking some time to scope and re-architect gcache so that it becomes easier to add features, maintain existing ones, and achieve full test coverage of the execution paths that matter. If you are interested in joining this effort, take a look at the issues! Stay tuned, friends. (:

Overview

The only embedded caching library you will ever need, in Golang.

Features

A variety of concurrency-safe caches are available:

  • LFU (Least-Frequently-Used)

  • LRU (Least-Recently-Used)

  • ARC (Adaptive Replacement Cache)

  • TinyLFU ("clever" admission policy)

  • RR (Random Replacement)

Other features:

  • Expirable entries

  • Optional support for event handlers (on eviction, purge, set, loader).

  • Cache snapshots

Install

$ go get github.com/aaronwinter/gcache2

Authors

Erwan Ounn (main contributor/maintainer of gcache2)

Jun Kimura (main contributor of gcache)

gcache2's People

Contributors

aaronwinter, anpryl, bluele, pcman312, sean-


gcache2's Issues

ARC: Evicted entries from T1 are not tracked in the ghost list

gc := gcache2.New(10).ARC().Build()

for i := 0; i <= 10; i++ {
    value := fmt.Sprintf("val%d", i)
    gc.Set(i, value)
}

Upon inspection of the cache internal state, it appears that:

  • As expected, all the key/values are stored in the tier 1 cache t1
  • The key 0 with value val0 is correctly evicted from t1

However, the ghost list b1 does not keep track of the evicted key 0.

See: ARC: A Self-Tuning, Low Overhead Replacement Cache - https://www.usenix.org/legacy/events/fast03/tech/full_papers/megiddo/megiddo.pdf

GCache2: Add a "Refresh" method to the Cache interface

GCache2 eviction policies are "event-driven". This means that we trigger an eviction only when a user operation happens, be it a Get, a Set, or querying the Length of the cache.

This is desirable most of the time but occasionally creates peculiar situations: GCache2 offers "expirable" entries keyed to wall-clock time, so an expired entry can linger in the cache until the next user operation touches it.

There have been suggestions to deal with this. One of the most popular seems to be the following: have GCache2 maintain a pool of workers tasked with cleaning the cache and triggering evictions when need be.

I don't think it is a good idea to have such a feature built in. First, the added complexity is not worth it: it is overkill to engineer a solution like this just to improve the performance of a single feature.

However, I think we should make it easier for users to extend the library to support this without having them dive into the internals. The solution I have in mind is simple: offer a Refresh method in the Cache interface. This method would scan the cache and trigger evictions and clean-ups if need be. It would make it very easy for a user to create a worker routine tasked with refreshing the cache every N seconds or so.

Another benefit of this approach is that it fits well with #13: embracing the principle of least astonishment and stopping cache.Len from having such large potential side effects.

type Cache interface {
// ... 
    Refresh() error
}

LFU: Optimize frequency list

Consider an LFU cache with capacity=10, i.e.:

gc, err := gcache.New(10).LFU().Build()

The internal cache.freqList is a linked list. Each of its entries represents a given frequency, along with a map (bucket) of all the items at that frequency.

The issue is that we never clean up empty entries, so over time the list grows larger and larger. That's not great.

Example:

We fill-up our cache:

for key := 0; key < 10; key++ {
    value := fmt.Sprintf("val%d", key)
    gc.Set(key, value)
}

Following this, our cache.freqList has one entry (freq=0) containing all the items at that frequency. In this case, that's everything.

Let's simulate some read workload:

numReads := 1000
for read := 0; read < numReads; read++ {
     gc.Get(5)
}

Our cache.freqList should only have two entries, because there are only two frequencies: freq=0 and freq=1000. The current behavior instead keeps track of every frequency historically used. In other words, cache.freqList will have 1001 entries, even though only two are needed to represent the whole state!

GCache2: Add a Debug method to test cache internals

Add a Debug method to the Cache interface such that internal metrics (like size of the b2 ghost list for ARC) are exposed.

The reason for this addition to the Cache interface is to make it easier to test the unexported fields of Cache and baseCache.

type Cache interface {
// ...
    Debug() map[string][]int
// ...
}

We will map TYPE_CACHE to a slice of ints representing the lengths of its different subcomponents.

GCache2: Add benchmarks and workload profiles for each cache type

We should establish a methodology for comparing the performance of GCache2 against other cache libraries, and run that benchmark.

GCache2 enables users to work with different types of caches. It would be useful to help users figure out which kind of cache is best suited to their workload.

ARC: The eviction event-handler is not called when manually removing an entry

Good: (simple.go)

func (c *SimpleCache) remove(key interface{}) bool {
	item, ok := c.store[key]
	if ok {
		delete(c.store, key)
		if c.evictedFunc != nil {
			c.evictedFunc(key, item.value)
		}
		return true
	}
	return false
}

This is the internal remove function. Let us walk through it:

  1. We check whether the entry is in the cache
  2. If it is, we remove it (delete(c.store, key))
  3. We check whether an eviction handler is defined
  4. If so, we call it
  5. Otherwise we return false

Bad:

func (c *ARC) Remove(key interface{}) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	elt, exists := c.store[key]
	if !exists {
		return false
	}

	if elt.parent != nil {
		elt.parent.Remove(elt.element)
	}

	delete(c.store, elt.key)
	c.size--
	return true
}

Goal: Make the good behavior consistent across the library.

ARC: Increase testing depth/coverage

The test suite for the ARC cache should cover the basic operations (get/set/len/...) as well as more niche features (autoload, custom expiration, etc.).

GCache2: Better documentation and comments

There are ways to automatically generate API docs. However, documenting the cache library internals will make it easier to onboard new contributors. This should be an ongoing effort.

GCache2: Have a consistent mutex locking policy

Example:

func (c *ARC) Remove(key interface{}) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	return c.remove(key)
}

However, for the Get method we do:

func (c *ARC) Get(key interface{}) (interface{}, error) {
	v, err := c.get(key, false)
	if err == KeyNotFoundError {
		return c.getWithLoader(key, true)
	}
	return v, err
}

Further down the execution path, we see that the mutex is locked in getValue:

func (c *ARC) getValue(key interface{}, onLoad bool) (interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// ...
}

We should be careful about this. Without a consistent policy on where to lock and unlock the mutex along an execution path, it is easy to make a mistake that compromises the concurrency-safety of the library.

GCache2: Create debug/testing tools

GCache has become cumbersome to debug. We should make tools that allow maintainers to easily visualize the state of the cache (internal clocks, eviction list, keys/values etc.).

Maybe extend the Cache interface to add a Debug method doing exactly that?

GCache2: Do not "panic" if the cache parameters are erroneous.

Example:

gc := gcache2.New(-51).ARC().Build() will panic, since -51 is not a valid cache size.

A better behavior would be to return an error and let the user handle it at their discretion.

gc, err := gcache2.New(-51).ARC().Build()
if err != nil {
    // retry, fall back, or panic, at the user's discretion
    log.Fatalf("failed to initialize gcache2 (err=%s)", err)
}

LFU: Deterministic eviction

Consider an LFU cache with capacity=10, i.e.:

gc, _ := gcache.New(10).LFU().Build()

We fill the cache:

for key := 0; key < 10; key++ {
    value := fmt.Sprintf("val%d", key)
    gc.Set(key, value)
}

Now that the cache is filled, the next call to Cache.Set should lead to the eviction of the Least-Frequently-Used entry. So far, so good.

When multiple entries share the same frequency (here, all our entries have freq=0), we "pick" one at random: the internal Cache.evict function iterates over the internal map until enough entries have been evicted, and map iteration happens in randomized order (per Go map semantics). We should explore whether that is desirable, or whether we should evict the least recent entry when everything else is equal.

Potential solution: Use a sorted map for the internal LFU maps.
