alphadose / haxmap Goto Github PK
View Code? Open in Web Editor NEWFastest and most memory efficient golang concurrent hashmap
License: MIT License
Fastest and most memory efficient golang concurrent hashmap
License: MIT License
Go 1.19, ubuntu 22.04
Run with -test.cpu 4
func TestInfiniteLoop(t *testing.T) {
t.Run("infinite loop", func(b *testing.T) {
m := haxmap.New[int, int](512)
for i := 0; i < 112050; i++ {
if i > 112024 {
fmt.Printf("%d\n", i)
m.Set(i, i) // set debug point here and step into until .inject
} else {
m.Set(i, i)
}
}
})
}
Hi, I saw that the sorted linked list is based on Harris's linked list, but according to my understanding, it's not correctly written. Harris's linked list is based on 2 assumptions: 1. Atomic operation on node.next can see changes to node.delete. 2. node.next on a deleted node is still valid and can be used to track to the original next. In this implementation, I see that: 1. node.delete and node.next can't be dealt with in a single atomic operation. This is problematic, consider: node.delete can change immediately before(after the if checks) or during a CAS operation on node.next, and during this process, a successful physical deletion can happen before the CAS operation completes/starts, therefore, the new node is linked onto a deleted node. This is my understanding, correct me if I'm wrong.
I encountered the above problem in my initial attempts to implement such a hashmap using Harris's linked list.
With this in mind, I designed a few cases that can reflect the above problem. However, I'm not sure whether the failures in the below cases are solely caused by the above reason or is/are caused by other problems. It appears to me that at least on my end Case1 has some other problem because a given key is guaranteed to fail. Anyway, let's see these cases.
Case 1:
func BenchmarkHaxMap_Case1(b *testing.B) {
b.StopTimer()
wg := sync.WaitGroup{}
for i := 0; i < b.N; i++ {
M := haxmap.New[int, int]()
b.StartTimer()
for k := 0; k < iter0; k++ {
wg.Add(1)
go func(l, h int) {
for j := l; j < h; j++ {
M.Set(j, j)
}
for j := l; j < h; j++ {
_, a := M.Get(j)
if !a {
b.Error("key doesn't exist", j)
}
}
for j := l; j < h; j++ {
x, _ := M.Get(j)
if x != j {
b.Error("incorrect value", j, x)
}
}
wg.Done()
}(k*elementNum0, (k+1)*elementNum0)
}
wg.Wait()
b.StopTimer()
}
}
Case 2:
func BenchmarkHaxMap_Case3(b *testing.B) {
b.StopTimer()
wg := &sync.WaitGroup{}
for a := 0; a < b.N; a++ {
M := haxmap.New[int, int]()
b.StartTimer()
for j := 0; j < iter0; j++ {
wg.Add(1)
go func(l, h int) {
defer wg.Done()
for i := l; i < h; i++ {
M.Set(i, i)
}
for i := l; i < h; i++ {
_, x := M.Get(i)
if !x {
b.Errorf("not put: %v\n", i)
}
}
for i := l; i < h; i++ {
M.Del(i)
}
for i := l; i < h; i++ {
_, x := M.Get(i)
if x {
b.Errorf("not removed: %v\n", i)
}
}
}(j*elementNum0, (j+1)*elementNum0)
}
wg.Wait()
b.StopTimer()
}
}
const (
iter0 = 1 << 3
elementNum0 = 1 << 10
)
If you increase the data size, this problem becomes more severe. You can delete all the benchmark and timing things.
Modifying these tests to sync.Map or an ordinary map with Mutex will show that no failures happen. In addition, cornelk's hashmap also fails at these tests.
This repository uses code from other libraries without respecting their copyright.
For example, the file hash64.go
contains code that is copied from https://github.com/cespare/xxhash/
The license clearly states:
Copyright (c) 2016 Caleb Spare
MIT License
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
As I understand it golang's preemption model is that a goroutine may be interrupted either at "wait" boundaries (such as locks, sleeps) & when making system calls, or at function call boundaries where the stack may increase. In other words, the preemption model does allow for preemption between golang statements involving a function call.
Therefore, this code in GetOrSet() contains a race, no?
if elem.key == key && !elem.isDeleted() {
actual, loaded = *elem.value.Load(), true
Is your thinking that the race is of no consequence in a concurrent map because it would be no worse, in that case, than in the alternative, having scheduled the Set prior to the delete implicated in the race?
In atomic.go:
type noCopy struct{}
func (c *noCopy) Lock() {}
func (c *noCopy) Unlock() {}
type atomicUint32 struct {
_ noCopy
v uint32
}
is this filler (_ noCopy) for semantics or does it actually prevent copying?
LoadAndDelete func deletes the value for a key, returning the previous value if any. The loaded result reports whether the key was present
// your custom hash function
func customStringHasher(s string) uintptr {
return uintptr(xxh3.HashString(s))
}
test code:
hmap := haxmap.New[int64, interface{}](32)
go func() {
var idx int64
for i := 1; i <= 300; i++ {
time.Sleep(time.Millisecond * 250)
idx++
hmap.Set(idx, idx)
idx++ // Accelerated progress
hmap.Set(idx, idx)
fmt.Println("new..........", idx-1, idx)
}
}()
go func() {
var idx int64 = 1
for {
if _, ok := hmap.Get(idx); ok {
fmt.Println("get_del...........", idx)
hmap.Del(idx)
idx++
}
time.Sleep(time.Millisecond * 10)
}
}()
time.Sleep(time.Hour)
After looping for a while, no more data is obtained
Development Environment: Windows10(x64), go1.19.1
Fillrate is not calculated correctly.
m := New[int, any]()
for i := 0; i < 1000; i++ {
m.Set(i, nil)
}
for i := 0; i < 1000; i++ {
m.Del(i)
}
fmt.Println(m.Fillrate())
// output: 38
It is caused by the index which is set when an element is removed from its index.
func (m *Map[K, V]) removeItemFromIndex(item *element[K, V]) {
for {
data := m.metadata.Load()
index := item.keyHash >> data.keyshifts
ptr := (*unsafe.Pointer)(unsafe.Pointer(uintptr(data.data) + index*intSizeBytes))
next := item.next()
if next != nil && item.keyHash>>data.keyshifts != index {
next = nil // do not set index to next item if it's not the same slice index
}
atomic.CompareAndSwapPointer(ptr, unsafe.Pointer(item), unsafe.Pointer(next))
...
}
}
the index should be set to the next element only if the next element has the same index value
...
if next != nil && next.keyHash>>data.keyshifts != index {
next = nil // do not set index to next item if it's not the same slice index
}
...
and also it would avoid the scenario to be emerged I mentioned in #33 (comment)
Thank you for excellent map library.
It looks that several fixes and new APIs are added after v1.2.0.
Do you have any plan to release them ?
For concurrent programs a compare and swap value would be really neat, as you might have a race if multiple threads are doing:
value, _ := map.Load(key)
value.Modify()
map.Store(key, value)
The pattern could be changed to
for {
oldvalue, _ := map.Load(key)
newvalue := oldvalue
newvalue.Modify()
success := map.CompareAndStore(key, oldvalue, newvalue)
if success {
break
}
}
Hi it's a great project but miss LoadOrStore func.
Can just Get first then set?
In the case of a lot of data, I need to stop this iteration after I get a piece of data.
But there is no way to top in the current iteration.
syncmaps.Range(func(key, value any) bool {
if ok {
return false
}
return true
})
Like in sync.Map I can return false to stop the iteration
mep := haxmap.New[int, string]()
mep.Set(1, "one")
println(mep.Len()) // 1
mep.Del(1) // delegate key 1
println(mep.Len()) // 0
// Still can traverse the key/value pair ?
mep.ForEach(func(key int, value string) bool {
fmt.Printf("Key -> %d | Value -> %s\n", key, value)
return true
})
// Print: Key -> 1 | Value -> one
I mean, I have deleted key 1, mep.len() is already 0, why ForEach
still iterates over the deleted key-value pair? How to actually remove them from mep
?
Code:
package main
import (
"context"
"fmt"
"math/rand"
"time"
"github.com/alphadose/haxmap"
)
type data struct {
id int
exp time.Time
}
func main() {
c := haxmap.New[int, *data](256)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go func() {
t := time.NewTicker(time.Second * 2)
defer t.Stop()
var count int
for {
select {
case <-t.C:
count = 0
c.ForEach(func(s int, b *data) bool {
if time.Now().After(b.exp) {
c.Del(s)
count++
}
return true
})
fmt.Println("Del", count)
case <-ctx.Done():
return
}
}
}()
for i := 0; i < 20000; i++ {
c.Set(i, &data{id: i, exp: time.Now().Add(time.Millisecond * time.Duration((1000 + rand.Intn(800))))})
time.Sleep(time.Microsecond * time.Duration(rand.Intn(200)+10))
if i%100 == 0 {
fmt.Println(i)
}
}
time.Sleep(time.Second * 3)
fmt.Println("LEN", c.Len())
}
Running the above code, setting the new Key will stop.
I upgraded from v0.1.0 to v0.3.1 and it seems to hang in the set command. The CPU stays stuck at 100% and the application does not run but haxmap internals are the only things running. I did a profiler in this condition and here is the image. When I downgraded v0.1.0, all was ok. Problem appears to exist for anything above v.0.1.0
Hi, thank you for your excellent map. How can I use google/UUID for the key? or can you add this feature to the map?
In which cases using this package really needed vs standard go map with mutex?
For example if I have 10 entries?
fantastic library , can you add a drop all/clear method.
Say after n period of time, I want to clear out everything but not resize down .
Hi, i was trying to benchmark haxmap vs map,
https://github.com/kokizzu/kokizzu-benchmark/blob/master/assoc/go-haxmap/haxmap.go
vs
https://github.com/kokizzu/kokizzu-benchmark/blob/master/assoc/map.go
the diff
https://pastebin.com/diff/V3Y04Uha
but haxmap took like 51s vs 14s using map
time go run go-haxmap/haxmap.go
6009354 6009348 611297
36186112 159701682 23370001
CPU: 51.86s Real: 26.87s RAM: 2 386 608KB
time go run map.go
6009354 6009348 611297
36186112 159701682 23370001
CPU: 14.29s Real: 12.43s RAM: 2 271 672KB
Map type does not implement json.Marshaler and json.Unmarshaler.
go 1.19
func TestMakeHaxmap(t *testing.T) {
for f := 1; f < 1000000; f *= 5 {
m := haxmap.New[int, string]()
t.Logf("creating %d", f)
for i := 0; i < f; i++ {
m.Set(i, fmt.Sprintf("a%d", i))
}
t.Logf("size: %d", m.Size())
}
}
Randomly hangs forever...
The map.Grow
method's comment states:
Grow resizes the hashmap to a new size, gets rounded up to next power of 2
To double the size of the hashmap use newSize 0
This function returns immediately, the resize operation is done in a goroutine
No resizing is done in case of another resize operation already being in progress
Growth and map bucket policy is inspired from https://github.com/cornelk/hashmap
But commit d071dd5f749f86017a32bc126ea40eaade5f3dfc changed map.Grow
to be sync, making this part of the comment inaccurate:
This function returns immediately, the resize operation is done in a goroutine
Pls, help
Very nice library for concurrent maps! For scenarios where you need to delete keys one at a time (not batching), the current performance makes it unusable.
I have an analysis application (https://github.com/lkarlslund/adalanche) and I've tried replacing some of the maps with yours (adding unsafe.Pointer in my fork).
With a nasty workaround where I don't delete keys, but set the value to a deleted flag, it works fairly good. But this is not the way.
Also looking at your code, I'm curious how you distinguish from a hash which is ^0 and the "marked" value which is the same?
google "golang concurrent map" or "golang lockfree map" and this project does not come up. I only found it after following various links in issues reported on golang's sync.Map. You may want to do whatever's necessary to bring more attention to this rep (titles? README.md content? etc)
... I'm bothering to say this because some of the other projects that come up are riddled with bugs ...
The following test fails with h.Get(1)
returning that the key:value entry does not exist:
func TestHaxmap(t *testing.T) {
h := haxmap.New[int, string]()
for i := 1; i <= 10; i++ {
h.Set(i, strconv.Itoa(i))
}
for i := 1; i <= 10; i++ {
h.Del(i)
}
for i := 1; i <= 10; i++ {
h.Set(i, strconv.Itoa(i))
}
for i := 1; i <= 10; i++ {
id, ok := h.Get(i)
assert.Equal(t, strconv.Itoa(i), id)
assert.True(t, ok)
}
}
I'm assuming it has to do with the lazy delete, where the h.Del(i)
only flags it and h.Set(i)
deletes the entry rather than setting it, but I haven't looked too deeply into it. My local environment is an M1 Macbook with Go version 1.19.
I update the vendor with the latest main branch and wrote this test code to simulate the situation.
func TestDebug(t *testing.T) {
var wg sync.WaitGroup
m := haxmap.New[string, struct{}]()
acquire := func(key string) (free func(), acquired bool) {
if _, loaded := m.GetOrSet(key, struct{}{}); loaded {
return nil, false
}
free = func() {
m.Del(key)
}
return free, true
}
n := 1000
key := "key"
var sum int32
wg.Add(n)
for i := 0; i < n; i++ {
go func(idx int) {
defer wg.Done()
_, acq := acquire(fmt.Sprintf("%d", idx))
require.True(t, acq)
free, acquired := acquire(key)
t.Log(acquired)
if !acquired {
return
}
// makes sure that there're only one thread has been acquired.
require.True(t, atomic.CompareAndSwapInt32(&sum, 0, 1), atomic.LoadInt32(&sum))
// marks there's no thread is acquired in advance.
require.True(t, atomic.CompareAndSwapInt32(&sum, 1, 0))
free()
}(i)
}
wg.Wait()
}
When run go test without -race it shows fine, no errors. But if enable race detection, it will fail like
sync_test.go:94: false
sync_test.go:100:
Error Trace: /Users/cmgs/.go/src/github/projecteru2/agent/utils/sync_test.go:100
/Users/cmgs/.go/src/github/projecteru2/agent/utils/asm_arm64.s:1172
Error: Should be true
Test: TestDebug
Messages: 1
sync_test.go:100:
Error Trace: /Users/cmgs/.go/src/github/projecteru2/agent/utils/sync_test.go:100
/Users/cmgs/.go/src/github/projecteru2/agent/utils/asm_arm64.s:1172
Error: Should be true
Test: TestDebug
Messages: 1
FAIL
FAIL github.com/projecteru2/agent/utils 0.208s
FAIL
The test command is (I put this code under the utils pkg)
GOOS=darwin GOARCH=arm64 go test -race -count=1 -timeout 120s -run TestDebug ./utils/...
Not sure what happen, because the this error only shows under GOOS=darwin and GOARCH=arm64 (I use M1 macbook). can pass in linux env(https://github.com/projecteru2/agent/actions/runs/3359505108/jobs/5567595644).
Any ideas?
Hello! Thank you for your awesome package and for the recent (important) GetOrSet functionality.
I want to ask if you can allow (and implement) the usage of function (as the value constructor) which will be called just once (to avoid unnecessary value construction logic everytime using GetOrSet).
The second thing is to ask to allow the usage of types which underlying type is among of those which your hashable supports. For example, type SeqId uint64.
Map is resizing unnecessarily after same operations although the map length is always same.
m := New[int, any]()
fmt.Printf("len: %d, indexCount: %d\n", m.Len(), len(m.metadata.Load().index))
for i := 0; i < 10000; i++ {
m.Set(i, nil)
m.Del(i)
}
fmt.Printf("len: %d, indexCount: %d\n", m.Len(), len(m.metadata.Load().index))
the output:
len: 0, indexCount: 8
len: 0, indexCount: 16384
I wanted to initialize a haxmap B with the same size as haxmap A with the following code:
B := haxmap.New[string, struct{}](A.Len())
which worked for most cases except when A.Len() is 0. When A has 0 key-values, the initializer would panic.
Would be awesome if this is handled 🙏
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.