
xsync's Introduction


xsync

Concurrent data structures for Go. The library aims to provide more scalable alternatives for some of the data structures from the standard sync package, along with a few structures the standard library does not offer.

Covered with tests following the approach described here.

Benchmarks

Benchmark results may be found here. I'd like to thank @felixge who kindly ran the benchmarks on a beefy multicore machine.

Also, a non-scientific, unfair benchmark comparing Java's j.u.c.ConcurrentHashMap and xsync.MapOf is available here.

Usage

The latest xsync major version is v3, so the /v3 suffix should be used when importing the library:

import (
	"github.com/puzpuzpuz/xsync/v3"
)

Note for v1 and v2 users: v1 and v2 support is discontinued, so please upgrade to v3. While the API has some breaking changes, the migration should be trivial.

Counter

A Counter is a striped int64 counter inspired by the j.u.c.a.LongAdder class from the Java standard library.

c := xsync.NewCounter()
// increment and decrement the counter
c.Inc()
c.Dec()
// read the current value
v := c.Value()

A Counter performs better than a single atomically updated int64 counter in high-contention scenarios.
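
For illustration, a minimal concurrent-usage sketch (assuming the v3 import path shown above):

package main

import (
	"fmt"
	"sync"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	c := xsync.NewCounter()
	var wg sync.WaitGroup
	// Many goroutines increment concurrently; the internal striping keeps
	// them from contending on a single cache line.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Value()) // 8000
}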

Map

A Map is a hash-table-based concurrent map. It follows the interface of sync.Map with a number of valuable extensions, like Compute or Size.

m := xsync.NewMap()
m.Store("foo", "bar")
v, ok := m.Load("foo")
s := m.Size()

Map uses a modified version of Cache-Line Hash Table (CLHT) data structure: https://github.com/LPD-EPFL/CLHT

CLHT is built around the idea of organizing the hash table in cache-line-sized buckets, so that on all modern CPUs update operations complete with minimal cache-line transfer. Also, Get operations are obstruction-free and involve no writes to shared memory, hence no mutexes or any other sort of locks. Due to this design, in all considered scenarios Map outperforms sync.Map.

One important difference with sync.Map is that only string keys are supported. That's because the Go standard library does not expose the built-in hash functions for interface{} values.

MapOf[K, V] is an implementation with parametrized key and value types. While it's still a CLHT-inspired hash map, MapOf's design is quite different from Map. As a result, MapOf puts less pressure on the GC and performs fewer atomic operations on reads.

m := xsync.NewMapOf[string, string]()
m.Store("foo", "bar")
v, ok := m.Load("foo")

One important difference with Map is that MapOf supports arbitrary comparable key types:

type Point struct {
	x int32
	y int32
}
m := xsync.NewMapOf[Point, int]()
m.Store(Point{42, 42}, 42)
v, ok := m.Load(Point{42, 42})

MPMCQueue

A MPMCQueue is a bounded multi-producer multi-consumer concurrent queue.

q := xsync.NewMPMCQueue(1024)
// producer inserts an item into the queue
q.Enqueue("foo")
// optimistic insertion attempt; doesn't block
inserted := q.TryEnqueue("bar")
// consumer obtains an item from the queue
item := q.Dequeue() // interface{} pointing to a string
// optimistic obtain attempt; doesn't block
item, ok := q.TryDequeue()

MPMCQueueOf[I] is an implementation with parametrized item type. It is available for Go 1.19 or later.

q := xsync.NewMPMCQueueOf[string](1024)
q.Enqueue("foo")
item := q.Dequeue() // string

The queue is based on the algorithm from the MPMCQueue C++ library, which in turn references D. Vyukov's MPMC queue. According to the following classification, the queue is array-based, fails on overflow, provides causal FIFO, and has blocking producers and consumers.

The idea of the algorithm is to allow parallelism for concurrent producers and consumers by introducing the notion of tickets, i.e. values of two counters, one for producers and one for consumers. An atomic increment of one of those counters is the only noticeable contention point in queue operations. The rest of the operation avoids contention on writes thanks to the turn-based read/write access for each of the queue items.

In essence, MPMCQueue is a specialized queue for scenarios where there are multiple concurrent producers and consumers of a single queue running on a large multicore machine.

To get optimal performance, you may want to set the queue size to be large enough, say, an order of magnitude greater than the number of producers/consumers, so that producers and consumers can progress with their queue operations in parallel most of the time.
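
As an illustration, a minimal producer/consumer sketch using the generic queue (a sketch only; the queue size and goroutine counts are arbitrary):

package main

import (
	"fmt"
	"sync"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	q := xsync.NewMPMCQueueOf[int](1024)
	var producers, consumers sync.WaitGroup

	// Multiple producers enqueue concurrently.
	for p := 0; p < 4; p++ {
		producers.Add(1)
		go func(p int) {
			defer producers.Done()
			for i := 0; i < 100; i++ {
				q.Enqueue(p*100 + i) // blocks if the queue is full
			}
		}(p)
	}

	// Multiple consumers dequeue concurrently.
	for c := 0; c < 4; c++ {
		consumers.Add(1)
		go func() {
			defer consumers.Done()
			for i := 0; i < 100; i++ {
				_ = q.Dequeue() // blocks if the queue is empty
			}
		}()
	}

	producers.Wait()
	consumers.Wait()
	fmt.Println("done")
}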

RBMutex

A RBMutex is a reader-biased reader/writer mutual exclusion lock. The lock can be held by many readers or a single writer.

mu := xsync.NewRBMutex()
// reader lock calls return a token
t := mu.RLock()
// the token must be later used to unlock the mutex
mu.RUnlock(t)
// writer locks are the same as in sync.RWMutex
mu.Lock()
mu.Unlock()

RBMutex is based on a modified version of BRAVO (Biased Locking for Reader-Writer Locks) algorithm: https://arxiv.org/pdf/1810.01553.pdf

The idea of the algorithm is to build on top of an existing reader-writer mutex and introduce a fast path for readers. On the fast path, reader lock attempts are sharded over an internal array based on the reader identity (a token in the case of Go). This means that readers do not contend over a single atomic counter like they do in, say, sync.RWMutex, allowing for better scalability in terms of cores.

Hence, by design, RBMutex is a specialized mutex for scenarios, such as caches, where the vast majority of locks are acquired by readers and write lock acquisitions are infrequent. In such scenarios, RBMutex should perform better than sync.RWMutex on large multicore machines.

RBMutex extends sync.RWMutex internally and uses it as the "reader bias disabled" fallback, so the same semantics apply. The only noticeable difference is the reader token returned by RLock, which must later be passed to RUnlock.
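
A minimal read-mostly cache sketch built on RBMutex (an illustration of the intended usage, not code from the library):

package main

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

// cache is guarded by an RBMutex: readers take the fast path,
// while the rare writer falls back to exclusive locking.
type cache struct {
	mu   *xsync.RBMutex
	data map[string]string
}

func newCache() *cache {
	return &cache{mu: xsync.NewRBMutex(), data: make(map[string]string)}
}

func (c *cache) get(key string) (string, bool) {
	t := c.mu.RLock() // reader lock returns a token
	v, ok := c.data[key]
	c.mu.RUnlock(t) // the token must be handed back
	return v, ok
}

func (c *cache) set(key, value string) {
	c.mu.Lock() // writer lock, same as sync.RWMutex
	c.data[key] = value
	c.mu.Unlock()
}

func main() {
	c := newCache()
	c.set("foo", "bar")
	v, ok := c.get("foo")
	fmt.Println(v, ok)
}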

License

Licensed under MIT.


xsync's Issues

Map: add way to provide initial size hints

Since the difference in performance caused by map growth is noticeable, it would be great if we could initialize the map with a size hint to try to avoid growth on insertion.

The fact that we spread keys across buckets makes this non-trivial, but maybe it would be enough to assume normally distributed keys and just grow the buckets proportionally to the initial hint?

I've only glanced over the code, but I could try a PR if there's no obvious blocker I missed and nobody else beats me to it!
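
For reference, presized constructors (such as NewMapOfPresized, used elsewhere on this page) take a size hint; a minimal usage sketch, assuming that constructor is available in the imported version (newer releases may expose the hint differently):

package main

import (
	"fmt"
	"strconv"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	// The size hint preallocates a table large enough for ~1000 entries,
	// so the inserts below should not trigger any growth.
	m := xsync.NewMapOfPresized[string, int](1000)
	for i := 0; i < 1000; i++ {
		m.Store("key-"+strconv.Itoa(i), i)
	}
	fmt.Println(m.Size())
}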

Built-in hash function for comparable types

Hey @puzpuzpuz
Thanks for such a cool concurrent data synchronization package

I really like the example of a hash function for a comparable structure, because there is no magic in it, just math.

xsync/example_test.go

Lines 16 to 32 in 051117f

type Person struct {
	GivenName   string
	FamilyName  string
	YearOfBirth int16
}
age := xsync.NewTypedMapOf[Person, int](func(seed maphash.Seed, p Person) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.WriteString(p.GivenName)
	hash := h.Sum64()
	h.Reset()
	h.WriteString(p.FamilyName)
	hash = 31*hash + h.Sum64()
	h.Reset()
	binary.Write(&h, binary.LittleEndian, p.YearOfBirth)
	return 31*hash + h.Sum64()
})

And this comment gave me the idea to get the built-in hashing function from the map, in a slightly tricky and insecure magical way

xsync/mapof.go

Lines 32 to 33 in 051117f

// are supported. That's because Golang standard library does not
// expose the built-in hash functions for interface{} values.

Here's what I got: https://goplay.tools/snippet/w91su4PCNMY

func main() {
	type t struct{ int }

	hash := HashFuncOf[t]()
	if hash == nil {
		log.Fatalln("hash func is nil")
	}

	seed := maphash.MakeSeed()
	a, b := hash.Sum64(seed, t{1}), hash.Sum64(seed, t{1})
	fmt.Println(a == b)

	// Output:
	// true
}

This is not safe because future versions of golang may change the internal arrangement of types.
On the other hand, it might be OK if you put this idea in a separate package like github.com/puzpuzpuz/xsync/unsafehasher and warn the user about the potential problem in the documentation or init function, like this:

// This init function will prevent the application from starting
// if the internals of golang types change in such a way that
// this package causes a panic.
func init() {
	type t struct{ bool; int; string; float64 }
	HashFuncOf[t]().Sum64(maphash.MakeSeed(), t{})
}

Also, the trick has a downside: the built-in hash function requires passing the value by pointer, which can negatively affect performance, but that can probably be mitigated using sync.Pool.

What do you think about supporting this way of creating hash functions?

And yet, it seems that this comment about string keys is superfluous here, apparently accidentally copied from the implementation of Map

xsync/mapof.go

Lines 31 to 34 in 051117f

// One important difference with sync.Map is that only string keys
// are supported. That's because Golang standard library does not
// expose the built-in hash functions for interface{} values.
type MapOf[K comparable, V any] struct {

Is it possible to use zero value of Map/MapOf struct?

hi,

followed over here from the golang issue.

I was testing the library as a drop-in replacement for our existing sync.Map and a generic sync.MapOf[K,V] wrapper that we use, but a big issue is that the zero value isn't valid, which would require a lot of incompatible refactoring in our code. It worked great when I did initialize them.

ideally, something like this would not panic.

func TestMap_ZeroValueValid(t *testing.T) {
	EnableAssertions()
	m := new(Map)
	v := 42
	m.Store("foo", v)
	m.Store("foo", v)
	DisableAssertions()
}

I have a naive solution that uses a sync.Once, happy to open a PR if this is something you would consider changing.

All I did was add a sync.Once:

type Map struct {
   totalGrowths int64
   totalShrinks int64
   resizing     int64          // resize in progress flag; updated atomically
   resizeMu     sync.Mutex     // only used along with resizeCond
   resizeCond   sync.Cond      // used to wake up resize waiters (concurrent modifications)
   table        unsafe.Pointer // *mapTable

   initialized sync.Once
}

adding an init function:

func (m *Map) init(sizeHint int) {
	m.initialized.Do(func() {
		m.resizeCond = *sync.NewCond(&m.resizeMu)
		var table *mapTable
		if sizeHint <= minMapTableCap {
			table = newMapTable(minMapTableLen)
		} else {
			tableLen := nextPowOf2(uint32(sizeHint / entriesPerMapBucket))
			table = newMapTable(int(tableLen))
		}
		atomic.StorePointer(&m.table, unsafe.Pointer(table))
	})
}

changing the NewMapPresized function:

func NewMapPresized(sizeHint int) *Map {
	m := &Map{}
	m.init(sizeHint)
	return m
}

Then I added init to these entry points:

func (m *Map) Load(key string) (value interface{}, ok bool) {
	m.init(minMapTableCap)
...
}
func (m *Map) Clear() {
	m.init(minMapTableCap)
...
}
func (m *Map) Size() int {
	m.init(minMapTableCap)
...
}
func (m *Map) Range(f func(key string, value interface{}) bool) {
	m.init(minMapTableCap)
...
}
func (m *Map) doCompute(
	key string,
	valueFn func(oldValue interface{}, loaded bool) (interface{}, bool),
	loadIfExists, computeOnly bool,
) (interface{}, bool) {
	m.init(minMapTableCap)
}

and the same thing for the generic.

Store based on previous value

Hello,

First of all thanks for this package, I really like it and I'm using it in a few of my side projects.

I just wanted to ask something regarding the MapOf type. Is there a way to store for a key, based on the previous value, atomically?

For example, I'm doing this:

	var new []string
	old, ok := ms.data.Load(id)
	if ok {
		new = old
	}
	new = append(new, values...)
	ms.data.Store(id, new)

However, I'd like to do it atomically, with everything in one function.

Thanks.
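
For reference, MapOf exposes a Compute method (mentioned in the README above); a minimal sketch of the atomic append, assuming the v3 signature where the callback returns the new value and a delete flag:

package main

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	data := xsync.NewMapOf[int, []string]()
	id, values := 42, []string{"a", "b"}

	// Append to the slice stored under id in one atomic step. The callback
	// runs while an internal lock is held, so it should stay short.
	data.Compute(id, func(old []string, loaded bool) ([]string, bool) {
		// Returning false as the second result keeps the entry in the map.
		return append(old, values...), false
	})

	v, _ := data.Load(id)
	fmt.Println(v) // [a b]
}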

Is there any reason to not use xsync.Map?

I have an application that makes use of a lot of different sync.Maps. It mostly fits the recommended use case for sync.Map ("append-only maps" with a lot more reads than writes), but the benchmarks for xsync.Map do look better across the board, so I was thinking of maybe just swapping them all out. I see your blog post mentions "When it comes to reads which are the strongest point of sync.Map, both data structures are on par.", though in the repo README you say "Due to this design, in all considered scenarios Map outperforms sync.Map."

Just wondering if you can foresee any reason to not do this. This blog post mentions potential memory concerns, but I think that probably won't be an issue for me.

nilVal is not unique

The compiler is free to make new(struct{}) == new(struct{}), and indeed does so. This means Map can get confused and return nil if a user tries to store new(struct{}).

I suggest using the address of a global int, for example.

var nilVal int // use &nilVal

github.com/puzpuzpuz/xsync@v2.0.0: invalid version: module contains a go.mod file, so module path must match major version ("github.com/puzpuzpuz/xsync/v2")

Thank you for releasing 2.0.0!!! This library is awesome :-)

Unfortunately, I can't install 2.0.0 right now:

$ go get github.com/puzpuzpuz/xsync
go: added github.com/puzpuzpuz/xsync v1.5.2

$ go get github.com/puzpuzpuz/xsync@v2.0.0
go: github.com/puzpuzpuz/xsync@v2.0.0: invalid version: module contains a go.mod file, so module path must match major version ("github.com/puzpuzpuz/xsync/v2")

Specialized versions of Map: CounterMap, ByteSliceMap, etc.

Current xsync.Map supports string keys and interface{} values. This is sufficient for many use cases, but not all of them, which is why specialized map data structures could be added to the library.

IMO the most interesting one is CounterMap, which would hold int64s as both keys and values and support Inc/Dec operations. Such a map may be useful in certain niche use cases. Also, it should outperform any synchronized built-in map, as well as xsync.Map.

Another option might be a map that would support []byte slices as keys. This is not supported by the built-in maps (both map and sync.Map) since slices are not comparable.

I'll be collecting feedback to understand the demand for specialized map collections, so leave comments on this issue if you need them.

panic on Compute

Hi Andrei,

We were using the v2.5.1 for some time but upgraded to v3.0.1 recently.

After the upgrade, we hit the following panic:

panic: runtime error: index out of range [1678240084308591512] with length 0

goroutine 46 [running]:
github.com/puzpuzpuz/xsync/v3.(*MapOf[...]).doCompute(0x12d3540, 0x2e, 0xc0006e5f80, 0x0, 0x1)
	/home/runner/go/pkg/mod/github.com/puzpuzpuz/xsync/v3@v3.0.1/mapof.go:267 +0xa59
github.com/puzpuzpuz/xsync/v3.(*MapOf[...]).Compute(...)
	/home/runner/go/pkg/mod/github.com/puzpuzpuz/xsync/v3@v3.0.1/mapof.go:215

We haven't seen this panic for quite some time, and it occurred right after the upgrade, so it might be related to the recent changes.

The map has the following type, if it is going to help: xsync.NewMapOfPresized[int64, someStruct](1000), where someStruct has two fields, one of the type any and one of the type int

Consider faster hash function for Map

FNV-1a, which is used in Map, has mediocre performance. A faster non-cryptographic hash function with a good enough hash code distribution might be a good replacement.

LoadOrCompute duplicate compute bug on v1 implementation

Hey, I know v1 is a bit out of date, just thought you might be interested in a bug I found on the latest v1 tag (v1.5.2), but which was fixed somewhere in v2.

https://github.com/puzpuzpuz/xsync/blob/v1.5.2/mapof.go#L237

					value := valueFn()
					var wv interface{} = valueFn()

During insertion of a new value, valueFn is called twice instead of reusing value for wv. That's OK for LoadOrStore, but it causes issues in LoadOrCompute, which may end up creating the new object twice and returning a different object from the one that was just inserted.

Not sure what the position on v1 maintenance is, but since upgrading to v2 is a breaking change, it might be worth releasing the fix as v1.5.3, WDYT?

Generic map

Now that type parameters are coming to Go, it might be an optimization opportunity to reduce usages of interface{} with generics.

[xsync.MapOf] Atomically delete a value based on value's state?

Hi! We are using xsync.MapOf for a couple of projects (great package btw 😉 ) and we recently have a new use case where we would need to delete a value atomically if and only if such value meets certain conditions. Basically something like this:

func main() {
	ms := xsync.NewMapOf[int, []string]()

	id := 42

	// Delete the value atomically if and only if someCondition and someOtherCondition are true
	v, deleted := ms.DeleteIf(id, func(value []string) bool {
		if someCondition(value) && someOtherCondition(value) {
			return true
		}

		return false
	})

	// If the value was deleted, we can use it safely since it's not in the map anymore
	if deleted {
		fmt.Printf("v: %v\n", v)
	}
}

Is there a way to do this with MapOf? If there isn't and the implementation is not very complex, would you accept a contribution for this?
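
One possible workaround with the existing API, sketched under the assumption of the v3 Compute signature (the callback returns the new value and a delete flag, and the whole callback runs atomically for the key):

package main

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

// deleteIf removes the entry for key only when cond(value) is true and
// reports the old value and whether it was deleted. The check and the
// deletion happen inside a single Compute call, hence atomically.
func deleteIf(m *xsync.MapOf[int, []string], key int, cond func([]string) bool) (old []string, deleted bool) {
	m.Compute(key, func(value []string, loaded bool) ([]string, bool) {
		if loaded && cond(value) {
			old, deleted = value, true
			return value, true // true asks Compute to delete the entry
		}
		return value, !loaded // keep existing entries, don't create missing ones
	})
	return old, deleted
}

func main() {
	m := xsync.NewMapOf[int, []string]()
	m.Store(42, []string{"a", "b"})
	v, deleted := deleteIf(m, 42, func(v []string) bool { return len(v) > 1 })
	fmt.Println(v, deleted) // [a b] true
}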

`MapOf` - Document zero value in the context of `Load` and whether or not it applies to deleted items during `Range`

The godoc for MapOf.Load states the following (emphasis mine):

Load returns the value stored in the map for a key, or nil if no value is present. The ok result indicates whether value was found in the map.

This is probably a copy-and-paste from the non-generic map implementation, since it does not make sense in the context of a generic data structure when the return type is not a pointer. To be more concrete, there cannot be a nil return from MapOf[string, string].Load(), so it should probably read something like:

Load returns a value of type V along with a boolean indicating whether the key was present in the map. If the key was not present, value will be the zero value of type V.

This brings up a related question about MapOf.Range. The godoc there says (emphasis mine):

Range does not necessarily correspond to any consistent snapshot of the Map's contents: no key will be visited more than once, but if the value for any key is stored or deleted concurrently, Range may reflect any mapping for that key from any point during the Range call.

Here "any mapping" makes sense in the storage context --if one or more Ps stores a value while we are iterating, we might see any one of them or none at all-- but it is unclear what "any mapping" means in the case of deletion. What happens if an entry is deleted before iteration reaches it? Are we guaranteed to never see it, or could we get the zero value instead? The docs here should be specific, since range does not give us any indication if the value we're seeing might have been deleted. This means that, if it is important to us not to use a deleted value in Range, we need to synthesize a way to do it ourselves using the zero value of the type. In the case of primitive types, this would mean never storing the zero value directly, while in the case of structs, it would mean ensuring that at least one field is never stored with a zero value, so that we can can use it to check if the entry was deleted.

I'll also note that the Range docs say that it's safe to modify the map during range; does this apply to deletion?
Edit: I can answer my own last question here; this test seems to spell out pretty clearly that delete while ranging is fine.

Range Causes extra memory allocations

Hi,
Your project and hard-work is very much appreciated.
My use case needs the map to be ranged as fast as possible (maybe millions of times every second, and this behavior will continue for the entire service duration). However, the following line

xsync/mapof.go

Line 319 in f620963

bentries := make([]rangeEntry, 0, entriesPerMapBucket)

always causes a memory allocation (producing unnecessary garbage and thus GC cycles).
This memory allocation can go away if a fixed array is used instead.

Thank you.

Add copy-on-write list

A COWList data structure might be helpful in certain use cases, like lists of listeners.

LoadOrCompute should document intended per-key behavior

I was wondering if xsync.MapOf holds fine-grained locks (per key) on LoadOrCompute(). The documentation doesn't say much about how the function is expected to behave, e.g. would LoadOrCompute('keyA', slowInitializer(a)) block LoadOrCompute('keyB', slowInitializer(b))?

Wrote a quick test for it:

package main

import (
	"github.com/puzpuzpuz/xsync/v3"
	"sync"
	"time"
)

func main() {
	testM := xsync.NewMapOf[string, int]()
	wg := sync.WaitGroup{}
	wg.Add(3)
	go func() {
		value, loaded := testM.LoadOrCompute("key-one", func() int {
			println("INNER 1 ran")
			time.Sleep(1 * time.Second)
			return 1
		})
		println("FIRST LOADED", loaded, value)
		wg.Done()
	}()
	go func() {
		time.Sleep(500 * time.Millisecond)
		value, loaded := testM.LoadOrCompute("key-one", func() int {
			println("INNER 2 ran")
			time.Sleep(1 * time.Second)
			return 2
		})
		println("SECOND LOADED", loaded, value)
		wg.Done()
	}()

	go func() {
		time.Sleep(100 * time.Millisecond)
		value, loaded := testM.LoadOrCompute("key-two", func() int {
			println("INNER 3 ran")
			time.Sleep(1 * time.Second)
			return 3
		})
		println("THIRD LOADED", loaded, value)
		wg.Done()
	}()

	wg.Wait()

}

Output is:

INNER 1 ran
INNER 3 ran
FIRST LOADED false 1
SECOND LOADED true 1
THIRD LOADED false 3

I expected the third goroutine (operating on key-two) not to get blocked by the initializers for the other keys. Reading the code, I guess it makes sense that it does get blocked.

  1. Is there a better way to achieve what I want? I was hoping I'd avoid making my own map of sync.Once's (one possible workaround is sketched below).

  2. It would be helpful to clarify this behavior in the docs.
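
One possible workaround for per-key lazy initialization, sketched purely as an illustration: store a small holder with a sync.Once via LoadOrStore (cheap, no slow work under map locks), then run the slow initializer inside the holder so that only callers of the same key wait on it.

package main

import (
	"fmt"
	"sync"

	"github.com/puzpuzpuz/xsync/v3"
)

// lazy holds a value that is initialized at most once per key.
type lazy[V any] struct {
	once  sync.Once
	value V
}

func loadOrInit[K comparable, V any](m *xsync.MapOf[K, *lazy[V]], key K, init func() V) V {
	l, _ := m.LoadOrStore(key, &lazy[V]{}) // cheap; no slow work inside the map
	l.once.Do(func() { l.value = init() }) // slow work blocks only this key's callers
	return l.value
}

func main() {
	m := xsync.NewMapOf[string, *lazy[int]]()
	v := loadOrInit(m, "key-one", func() int { return 1 })
	fmt.Println(v) // 1
}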

[Question] CPU cache friendly byte slice

Thank you for your library, articles and talks. Could you please help me.

I have a logger that writes a lot of messages in parallel to stdout. The problem is that messages were written simultaneously and shuffled. So I had to add a mutex and lock before printing:

l.mu.Lock()
fmt.Fprintf(os.Stdout, format, v...)
l.mu.Unlock()

I wish to avoid the locking because I need as low latency as possible, but I'm fine with some pauses and I don't care much about the order of messages.
On my server I have 24 CPUs and each has its own cache. I have an idea to make a per-CPU list of byte slices and then periodically gather all of them and dump them to a log.
Can this work in practice? I feel like I'm reinventing some existing structure. Could you please recommend an optimal way to do that?

I see that the state striping is something that probably can help me.
I see that the library has a Counter that uses the same principle.

I also asked the question on SO https://stackoverflow.com/questions/74954360/cpu-cache-friendly-byte-slice
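
For illustration only, a minimal sketch of the striping idea described above: several independently locked buffers that a background flusher drains periodically (how to pick a stripe per writer, e.g. by goroutine or CPU identity, is left out):

package main

import (
	"bytes"
	"os"
	"runtime"
	"sync"
)

// stripe is an independently locked buffer; spreading writers over
// several stripes keeps them from contending on a single mutex.
type stripe struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

type stripedLog struct {
	stripes []stripe
}

func newStripedLog() *stripedLog {
	return &stripedLog{stripes: make([]stripe, runtime.GOMAXPROCS(0))}
}

// write appends msg to one of the stripes.
func (l *stripedLog) write(idx int, msg string) {
	s := &l.stripes[idx%len(l.stripes)]
	s.mu.Lock()
	s.buf.WriteString(msg)
	s.mu.Unlock()
}

// flush drains all stripes to stdout; message order across stripes is
// not preserved, which matches the "order doesn't matter" requirement.
func (l *stripedLog) flush() {
	for i := range l.stripes {
		s := &l.stripes[i]
		s.mu.Lock()
		os.Stdout.Write(s.buf.Bytes())
		s.buf.Reset()
		s.mu.Unlock()
	}
}

func main() {
	l := newStripedLog()
	l.write(0, "hello\n")
	l.write(1, "world\n")
	l.flush()
}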

Map/MapOf.Range iteration order

Map in the standard Go library (afaik even sync.Map) has a nondeterministic iteration order: on each iteration a random starting point is chosen. They have various reasons for that, such as performance (golang/go#8412) or preventing bad code that relies on a fixed iteration order.

Currently Map/MapOf.Range methods have a fixed iteration order for a stable map. Would you consider applying some randomness similar to standard library? For example starting from a random bucket instead of the first bucket.

Nesting multiple xsync.MapOf maps not supported, or...?

Hello! I've been trying out xsync.MapOf in one of my projects and I ran into a strange issue when nesting the maps. Am I doing something wrong, or did I just discover a bug? Here is a runnable example (using the latest xsync version):

package main

import (
	"github.com/puzpuzpuz/xsync"
)

type tmp struct {
	Hello string
}

func main() {
	const stringKey = "test"
	const intKey = 123

	outerStringMap := xsync.NewMapOf[*xsync.MapOf[uint32, *tmp]]()

	innerIntMap, loaded := outerStringMap.LoadOrCompute(stringKey, func() *xsync.MapOf[uint32, *tmp] {
		return xsync.NewIntegerMapOf[uint32, *tmp]()
	})
	if loaded {
		panic("expected nothing")
	}

	innerIntMap.Store(intKey, &tmp{Hello: "world"})

	innerIntMap, loaded = outerStringMap.Load(stringKey)
	if !loaded {
		panic("expected existing map")
	}

	hello, loaded := innerIntMap.Load(intKey)
	if !loaded {
		panic("expected existing value")
	}

	if hello.Hello != "world" {
		panic("unexpected value")
	}
}
grongor@grongor-nb:~/xsync-test$ go run .
panic: expected existing value

goroutine 1 [running]:
main.main()
        /home/grongor/xsync-test/main.go:33 +0x112
exit status 2


Avoid locking in Map.Range

ATM Map.Range iterates over the map by acquiring bucket locks sequentially and copying their contents into an intermediate slice. Locking could be avoided by reading each bucket's contents with atomic snapshots, as in the Get operation.

Alternative epoch-based design for Maps

Just like in sync.Map, the Store operation in Map allocates an intermediate interface{} struct for each value (see #1 for more details). Hence, each value pointer stored in map buckets references the following chain: *interface{} -> interface{} -> value. If we could specialize the Map to a concrete value type (say, with the upcoming generics language feature), we could get rid of the intermediate interface{} struct, and the chain could be simplified to the following: *value -> value. There are multiple ways to achieve this, and this issue describes one of them.

The idea is to replace atomic snapshots (for which we need *interface{} pointers) with epoch-based reader-writer communication.

Currently the bucket layout looks like the following:

| bucket mutex	| keys array		| values array		| pointer to next bucket  |
| 8 bytes	| 24 bytes (3 pointers)	| 24 bytes (3 pointers)	| 8 bytes		  |
|<-					one cache line (64 bytes)			->|

The epoch-based design would need to change it to this:

| bucket mutex	| keys array		| values array		| epoch			  |
| 8 bytes	| 24 bytes (3 pointers)	| 24 bytes (3 pointers)	| 8 bytes		  |
|<-					one cache line (64 bytes)			->|

Notice that the linked list (chain of buckets) is now gone and replaced with the epoch counter (uint64). Each epoch counter is per bucket and has two phases:

  1. Even value - update finished phase
  2. Odd value - in progress update phase

Note: 64B assumes only 3 entries per bucket, so that the table will be able to hold only 3*number_of_buckets entries. That could be improved by using 128B buckets, which is enough to fit 7 entries, although that would have a slight impact on write performance.

Writers should do the following when updating a bucket:

  1. Lock the bucket
  2. Increment the epoch atomically, so that it's odd (update in progress)
  3. Execute the usual write logic
  4. Decrement the epoch atomically, so that it's even (update finished)

Readers should execute the following seqlock-style logic when reading from a bucket:

  1. Read the epoch atomically. If it's odd, goto 1
  2. Scan the bucket. If the entry is not there, return nil, false
  3. If the entry is found, read the epoch atomically. If the epoch doesn't match the value obtained at step 1, spin over the epoch along with re-reading the entry (we could do a goto 1 here, but that's not necessary)

Just like with atomic snapshots, the above design assumes that readers do not block each other and writers allow readers to scan the table concurrently (although readers might need to do a few spins until the epoch check succeeds).
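
A minimal sketch of the described seqlock-style epoch protocol for a single bucket, as an illustration of the steps above rather than the actual Map implementation (the plain read of the value is simplified; a real implementation would read entries atomically):

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// epochBucket illustrates the protocol: writers make the epoch odd while
// updating, readers retry until they observe the same even epoch before
// and after reading.
type epochBucket struct {
	mu    sync.Mutex
	epoch atomic.Uint64
	value int64 // stands in for the bucket's keys/values
}

func (b *epochBucket) write(v int64) {
	b.mu.Lock()
	b.epoch.Add(1) // now odd: update in progress
	b.value = v
	b.epoch.Add(1) // now even: update finished
	b.mu.Unlock()
}

func (b *epochBucket) read() int64 {
	for {
		before := b.epoch.Load()
		if before%2 != 0 {
			continue // writer in progress, retry
		}
		v := b.value
		if b.epoch.Load() == before {
			return v // consistent snapshot
		}
	}
}

func main() {
	var b epochBucket
	b.write(42)
	fmt.Println(b.read())
}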

Readme is not clear (at least not for me :))

In the Readme it's said under the Map header:
MapOf[K, V] is an implementation with parametrized key and value types. While it's still a CLHT-inspired hash map, MapOf's design is quite different from Map. As a result, less GC pressure and fewer atomic operations on reads.

It is not clear to me if the last part is referring to MapOf (probably as it was just mentioned) or Map (as it is under that header and was also just mentioned), so which is it?

My intended usage of this library is a map with a string key and maybe a boolean value. It will need to hold millions (possibly over 10M) of key/value pairs, and every record will be inserted once (without ever being updated or deleted) and read multiple times.
Which type do you recommend I use?
And how can I calculate its RAM allocation?

Thanks!

Copy on write map

(First of all, cool project @puzpuzpuz! A faster and typed concurrent map like MapOf is something that should be part of go's stdlib)

Copy-on-write maps are useful for read-mostly performance-critical areas, as LoadOrCompute() calls only cost an atomic load for reads. Additionally, these semantics provide a snapshot view that allow fast and predictable iteration.

Typical use cases for these structures are config settings and topology management.

I can contribute it, if you think it belongs in this project.

In v2 a default hasher (of strings) is missing

Hi,

The xsync.StrHash64 function of v1 was very helpful and it is a shame it is missing from v2. Any reason why you decided not to make hashString public?
I'll be happy to issue a pull request with such a change.

Cheers,
Shmul

Consider storing 6 bits of key hash codes in tagged pointers

Since 64-bit pointers in Go are 8-byte aligned, we could use the free 2x3 bits to store a (MSB?) part of the key hash code as tags in the key and value pointers. This should improve search performance, since we wouldn't need to check all scanned keys for equality, only those whose tag matches.

MPMC queue CPU usage through the roof

I have a cache where I have a queue for synchronized writes (due to eviction policies).
When I have a simple channel:

func (c *cache) run() {
  for {
    select {
        case task := <-c.queue:
           ...
        case <-c.ctx.Done():
          return
    }
 }
}

the CPU usage is 0% (the app is not under any load, it's just idling). When I switch to the MPMC queue:

func (c *cache) run() {
  for {
    task := c.queue.Dequeue()
    switch {
       case task.foo == "bar":
        ....
    }
 }
}

the CPU jumps to over 8%. When I have 5 apps running on localhost with this cache, this takes half of my CPUs while none of the apps are actually doing anything. When I looked at the pprof data, it looks to me like Go has an issue with scheduling.


I switched to the queue mostly because in benchmarks it showed a 30% saving in memory usage compared to a channel and a 3x increase in performance. But when I actually run the app, this happens, and when I switch back to a channel the CPU is at 0% again, so this is the culprit. I do not understand why, because I have seen this queue used in other caches and I doubt they had the same issues. Maybe I should not use Dequeue()? Or not in this manner in a for loop? But then how should I wait for work?

Doesn't compile on 32bit

This function created a compile error when trying to build with arm32 (CGO_ENABLED=0 GOOS=linux GOARCH=arm go build test.go)

xsync/util.go

Lines 18 to 24 in d5f0f9d

// murmurhash3 64-bit finalizer
func hash64(x uintptr) uint64 {
	x = ((x >> 33) ^ x) * 0xff51afd7ed558ccd
	x = ((x >> 33) ^ x) * 0xc4ceb9fe1a85ec53
	x = (x >> 33) ^ x
	return uint64(x)
}

Following error is shown:

#0 7.331 /go/pkg/mod/github.com/puzpuzpuz/[email protected]/util.go:20:24: 0xff51afd7ed558ccd (untyped int constant 18397679294719823053) overflows uintptr
#0 7.331 /go/pkg/mod/github.com/puzpuzpuz/[email protected]/util.go:21:24: 0xc4ceb9fe1a85ec53 (untyped int constant 14181476777654086739) overflows uintptr

This can likely be fixed by casting the argument to uint64 before performing the operations.
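
A sketch of the suggested fix, performing the arithmetic on uint64 so the constants fit on 32-bit platforms (not necessarily the exact change that shipped):

// murmurhash3 64-bit finalizer, operating on uint64 so that the
// constants no longer overflow uintptr on 32-bit targets
func hash64(x uintptr) uint64 {
	h := uint64(x)
	h = ((h >> 33) ^ h) * 0xff51afd7ed558ccd
	h = ((h >> 33) ^ h) * 0xc4ceb9fe1a85ec53
	h = (h >> 33) ^ h
	return h
}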

Map Leaking entries...

Sorry about the non-specific report, but I just spent 2 days chasing down a bug, which turned out to be a problem where entries are "leaked" from the maps.

I have tried making a standalone reproducer, but I have been unable to trace it down. Finally I just replaced it with a "dumb map[uint64]*value" with a mutex for operations. This doesn't leak entries.

It only leaks very rarely, and I've only seen it under high loads.

Init: xsync.NewIntegerMapOfPresized[uint64, *muxClient](1000) - value is never copied or anything similar.

Functions used inside the test:

Load(id uint64) (*muxClient, bool)
LoadAndDelete(id uint64) (*muxClient, bool)
LoadOrStore(id uint64, v *muxClient) (*muxClient, bool)
Delete(id uint64)

Naturally all operations are used on the map concurrently.

Functions not used while reproducing:

Range(fn func(key uint64, value *muxClient) bool)
Clear()
Size() int 

Observations:

  • All values are stored via LoadOrStore.
  • Loading the value with Load after the LoadOrStore completed shows the value exists.
  • Some time elapses while RPC processes.
  • When the response returns the entry is gone.
  • All delete operations are logged. None match the entry.

Sorry for the vagueness. I tried setting up a reproducer, but I was not able to do a standalone reproducer.

Using github.com/puzpuzpuz/xsync/v2 v2.5.0 - go version go1.21.0 windows/amd64.

`xsync.MapOf.Load` may lead to key allocation, unlike `sync.Map` and `xsync.Map`

Hi,

First, the blog posts were very helpful, and the library was sound. Thanks for writing all this!

We recently had a case where sync.RWMutex wasn't performing, so we migrated to sync.Map. This made me look around for (non-allocating) alternatives, and I found your library—it looks great!

However, it seems like we hit a performance issue that prevents us from migrating - (I'm guessing) the Load function is too large, so it's not inlined, and it triggers:

  • an extra alloc on our hot path
  • more painfully, it triggers an alloc per key when we Range over the map

I have two reproductions:

To explain the use case, we dynamically build a key using [256]byte and look it up on the map. Every few seconds, we iterate over the map and collect statistics. We have around 100_000 keys in the map, so the range can take quite a bit of time.

Interestingly, even with the extra alloc your map is still faster in our benchmarks, so a pretty good result :)

GraphNewCallNewElementParallelWithExport/-100-12                18.00µ ± 9%   13.01µ ±  7%  -27.75% (p=0.000 n=10)
GraphNewCallNewElementParallelWithExport/-75-12                 13.51µ ± 4%   10.18µ ±  6%  -24.65% (p=0.000 n=10)
GraphNewCallNewElementParallelWithExport/-50-12                 9.228µ ± 6%   7.272µ ±  3%  -21.20% (p=0.000 n=10)
GraphNewCallNewElementParallelWithExport/-25-12                 5.184µ ± 6%   4.375µ ±  7%  -15.59% (p=0.000 n=10)
GraphNewCallNewElementParallelWithExport/-10-12                 3.189µ ± 3%   2.772µ ±  6%  -13.08% (p=0.000 n=10)
GraphNewCallNewElementParallelWithExport/-5-12                  2.520µ ± 7%   2.342µ ±  3%   -7.08% (p=0.007 n=10)
GraphNewCallNewElementParallelWithExport/-2-12                  2.115µ ± 5%   2.171µ ±  4%   +2.67% (p=0.050 n=10)
GraphNewCallNewElementParallelWithExport/-1-12                  2.080µ ± 3%   2.111µ ±  6%        ~ (p=0.089 n=10)
GraphNewCallNewElementParallelWithExport/-0-12                  2.023µ ± 3%   2.131µ ±  2%   +5.36% (p=0.000 n=10)

(100->0 is the % of new elements).

I tried to hack through the code a bit, but it doesn't seem immediately trivial.

  • unrolling the inner loop didn't change anything
  • I tried to simplify the hasher a bit (fewer closures?) but that didn't change much.

I don't know if solving this is achievable, but I certainly found this interesting.

Panic while storing

I was trying to benchmark the map implementation against sync.Map under concurrent load, but could not because of a panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x56b27a]

goroutine 73 [running]:
github.com/puzpuzpuz/xsync.(*Map).doStore(0xc00009e6c0, {0xc0000b4028, 0x7}, {0x59bb40, 0xc0000d6040}, 0x0)
	/home/runner/go/pkg/mod/github.com/puzpuzpuz/[email protected]/map.go:190 +0x21a
github.com/puzpuzpuz/xsync.(*Map).Store(...)
	/home/runner/go/pkg/mod/github.com/puzpuzpuz/[email protected]/map.go:163
github.com/veartutop/cachex_test.xSyncMap.make({0x0, 0x0, 0x4800}, 0x0, 0xf4240)
	/home/runner/work/cache/cache/benchmark_test.go:330 +0x12a
github.com/veartutop/cachex_test.Benchmark_concurrent.func1(0xc0000cb200)

Reproducer: https://github.com/vearutop/cache/pull/6/checks?check_run_id=3389323336
https://github.com/vearutop/cache/blob/xsync/benchmark_test.go#L31
https://github.com/vearutop/cache/blob/xsync/benchmark_test.go#L312-L364

The benchmark stores 1e6 entries into the map and then reads "random" keys from multiple goroutines with occasional 10% writes.

The same benchmark passes for sync.Map; please help me check whether I'm misusing this lib.
