alphadose / haxmap

Fastest and most memory efficient golang concurrent hashmap

License: MIT License

Go 100.00%

concurrent fast go golang hashmap lock-free map memory-efficient thread-safe

haxmap's Issues

Infinite loop test case

Go 1.19, ubuntu 22.04

Run with -test.cpu 4

func TestInfiniteLoop(t *testing.T) {
	t.Run("infinite loop", func(b *testing.T) {
		m := haxmap.New[int, int](512)
		for i := 0; i < 112050; i++ {
			if i > 112024 {
				fmt.Printf("%d\n", i)
				m.Set(i, i) // set debug point here and step into until .inject
			} else {
				m.Set(i, i)
			}
		}
	})
}

Is there any plan to support LRU?

Hi, I saw that the sorted linked list is based on Harris's linked list, but as far as I understand, it is not implemented correctly. Harris's linked list relies on two assumptions:

1. An atomic operation on node.next can see changes to node.delete.
2. node.next on a deleted node is still valid and can be used to reach the original successor.

In this implementation, node.delete and node.next can't be handled in a single atomic operation. This is problematic: node.delete can change immediately before a CAS on node.next (after the if checks have passed) or during it, and a physical deletion can succeed before that CAS starts or completes. As a result, the new node gets linked onto a deleted node. This is my understanding; correct me if I'm wrong.

I encountered the above problem in my initial attempts to implement such a hashmap using Harris's linked list.
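To make the concern concrete, here is a minimal sketch of one way to get "mark and next change together" in Go: keep both in an immutable snapshot and swap the whole snapshot with a single CAS. This is not haxmap's code; `node`, `link`, `insertAfter`, and `markDeleted` are hypothetical names, and the sketch assumes Go 1.19+ for `atomic.Pointer`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// link is an immutable snapshot of a node's successor and its deletion mark.
// Replacing the whole snapshot with one CAS makes both fields change together.
type link struct {
	next    *node
	deleted bool
}

type node struct {
	key  int
	link atomic.Pointer[link]
}

func newNode(key int, next *node) *node {
	n := &node{key: key}
	n.link.Store(&link{next: next})
	return n
}

// insertAfter links n after prev. It fails if prev was marked deleted or its
// successor changed since the snapshot was taken, so the caller must retry.
func insertAfter(prev, n *node) bool {
	old := prev.link.Load()
	if old.deleted {
		return false
	}
	n.link.Store(&link{next: old.next})
	return prev.link.CompareAndSwap(old, &link{next: n})
}

// markDeleted logically deletes n while keeping n's next pointer intact.
func markDeleted(n *node) bool {
	for {
		old := n.link.Load()
		if old.deleted {
			return false
		}
		if n.link.CompareAndSwap(old, &link{next: old.next, deleted: true}) {
			return true
		}
	}
}

func main() {
	tail := newNode(30, nil)
	head := newNode(10, tail)

	fmt.Println(insertAfter(head, newNode(20, nil))) // true
	markDeleted(head)
	fmt.Println(insertAfter(head, newNode(15, nil))) // false: head is marked
	fmt.Println(head.link.Load().next.key)           // 20: successor still reachable
}
```

If a mark lands between the `Load` and the `CompareAndSwap` in `insertAfter`, the stored snapshot pointer has changed and the CAS fails, which is exactly the window described above. The cost is one small allocation per link update.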

With this in mind, I designed a few cases that can expose the above problem. However, I'm not sure whether the failures in the cases below are caused solely by the above reason or by other problems as well. It appears that, at least on my end, Case 1 has some other problem, because a given key is guaranteed to fail. Anyway, let's look at the cases.
Case 1:

func BenchmarkHaxMap_Case1(b *testing.B) {
	b.StopTimer()
	wg := sync.WaitGroup{}
	for i := 0; i < b.N; i++ {
		M := haxmap.New[int, int]()
		b.StartTimer()
		for k := 0; k < iter0; k++ {
			wg.Add(1)
			go func(l, h int) {
				for j := l; j < h; j++ {
					M.Set(j, j)
				}
				for j := l; j < h; j++ {
					_, a := M.Get(j)
					if !a {
						b.Error("key doesn't exist", j)
					}
				}
				for j := l; j < h; j++ {
					x, _ := M.Get(j)
					if x != j {
						b.Error("incorrect value", j, x)
					}
				}
				wg.Done()
			}(k*elementNum0, (k+1)*elementNum0)
		}
		wg.Wait()
		b.StopTimer()
	}
}

Case 2:

func BenchmarkHaxMap_Case3(b *testing.B) {
	b.StopTimer()
	wg := &sync.WaitGroup{}
	for a := 0; a < b.N; a++ {
		M := haxmap.New[int, int]()
		b.StartTimer()
		for j := 0; j < iter0; j++ {
			wg.Add(1)
			go func(l, h int) {
				defer wg.Done()
				for i := l; i < h; i++ {
					M.Set(i, i)
				}

				for i := l; i < h; i++ {
					_, x := M.Get(i)
					if !x {
						b.Errorf("not put: %v\n", i)
					}
				}
				for i := l; i < h; i++ {
					M.Del(i)

				}
				for i := l; i < h; i++ {
					_, x := M.Get(i)
					if x {
						b.Errorf("not removed: %v\n", i)
					}
				}

			}(j*elementNum0, (j+1)*elementNum0)
		}
		wg.Wait()
		b.StopTimer()
	}

}

const (
	iter0       = 1 << 3
	elementNum0 = 1 << 10
)

If you increase the data size, the problem becomes more severe. You can remove all the benchmark and timing code.

Changing these tests to use sync.Map or an ordinary map with a Mutex shows no failures. In addition, cornelk's hashmap also fails these tests.

Copyright violation

This repository uses code from other libraries without respecting their copyright.

For example, the file hash64.go contains code that is copied from https://github.com/cespare/xxhash/

The license clearly states:

Copyright (c) 2016 Caleb Spare

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

GetOrSet race?

As I understand it, Go's preemption model is that a goroutine may be interrupted at "wait" boundaries (such as locks and sleeps), when making system calls, or at function call boundaries where the stack may grow. In other words, preemption can happen between Go statements that involve a function call.

Therefore, this code in GetOrSet() contains a race, no?

		if elem.key == key && !elem.isDeleted() {
			actual, loaded = *elem.value.Load(), true

Or is your thinking that the race is harmless in a concurrent map, because the outcome is no worse than if the Set had simply been scheduled before the delete involved in the race?

[Question]: why are there empty structs in atomic?

In atomic.go:

type noCopy struct{}

func (c *noCopy) Lock()   {}
func (c *noCopy) Unlock() {}

type atomicUint32 struct {
  _ noCopy
  v uint32
}

Is this filler (_ noCopy) there for semantics, or does it actually prevent copying?
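For what it's worth, a field like this does not stop the compiler from copying anything. It follows the convention used by the standard library's sync package: `go vet`'s copylocks check reports copies of values whose type has pointer-receiver `Lock` and `Unlock` methods. A small sketch (the `counter` type and `copyByValue` are hypothetical names) showing that the copy still compiles and runs:

```go
package main

import "fmt"

// noCopy mirrors the convention used in the standard library's sync package:
// a zero-size type whose pointer receiver has Lock and Unlock methods.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

type counter struct {
	_ noCopy
	v uint32
}

// copyByValue compiles and runs; only `go vet` (copylocks) would complain
// that a value containing noCopy is passed by value.
func copyByValue(c counter) uint32 {
	return c.v
}

func main() {
	c := counter{v: 7}
	fmt.Println(copyByValue(c)) // 7
}
```

So the marker is a lint-time guard rather than a runtime or compile-time one, and since it is a zero-size field it should not add any meaningful cost to the struct.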

Missing func LoadAndDelete

LoadAndDelete deletes the value for a key, returning the previous value if any. The loaded result reports whether the key was present.

We can now use xxh3 to speed up hashing

https://github.com/zeebo/xxh3

// your custom hash function
func customStringHasher(s string) uintptr {
	return uintptr(xxh3.HashString(s))
}
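If pulling in a third-party dependency is not an option, a hasher with the same `func(string) uintptr` shape can also be sketched with the standard library's hash/maphash (Go 1.19+ for `maphash.String`). This swaps in a different hash than xxh3 and makes no claim about relative speed; `maphashStringHasher` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// seed is fixed for the life of the process, so equal strings hash equally
// within one run (but not across runs).
var seed = maphash.MakeSeed()

// maphashStringHasher has the same shape as the custom hasher above.
func maphashStringHasher(s string) uintptr {
	return uintptr(maphash.String(seed, s))
}

func main() {
	fmt.Println(maphashStringHasher("key") == maphashStringHasher("key")) // true
}
```

Because the seed is random per process, these hashes must not be persisted or compared between runs.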

Can't get data after multiple sets

test code:

hmap := haxmap.New[int64, interface{}](32)
go func() {
	var idx int64
	for i := 1; i <= 300; i++ {
		time.Sleep(time.Millisecond * 250)
		idx++
		hmap.Set(idx, idx)
		idx++    // Accelerated progress
		hmap.Set(idx, idx)
		fmt.Println("new..........", idx-1, idx)
	}
}()
go func() {
	var idx int64 = 1
	for {
		if _, ok := hmap.Get(idx); ok {
			fmt.Println("get_del...........", idx)
			hmap.Del(idx)
			idx++
		}
		time.Sleep(time.Millisecond * 10)
	}
}()
time.Sleep(time.Hour)

After looping for a while, Get no longer returns any data.

Development Environment: Windows10(x64), go1.19.1

Incorrect fillrate value

Fillrate is not calculated correctly.

m := New[int, any]()
for i := 0; i < 1000; i++ {
	m.Set(i, nil)
}
for i := 0; i < 1000; i++ {
	m.Del(i)
}
fmt.Println(m.Fillrate())
// output: 38

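The cause is the value written to the index slot when an element is removed from the index. In the code below, the condition compares item.keyHash>>data.keyshifts with index, which was computed from that same expression, so it is never true: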

func (m *Map[K, V]) removeItemFromIndex(item *element[K, V]) {
	for {
		data := m.metadata.Load()
		index := item.keyHash >> data.keyshifts
		ptr := (*unsafe.Pointer)(unsafe.Pointer(uintptr(data.data) + index*intSizeBytes))

		next := item.next()
		if next != nil && item.keyHash>>data.keyshifts != index {
			next = nil // do not set index to next item if it's not the same slice index
		}
		atomic.CompareAndSwapPointer(ptr, unsafe.Pointer(item), unsafe.Pointer(next))
		...
	}
}

The index should be set to the next element only if the next element has the same index value, so the check should use next.keyHash:

	...
	if next != nil && next.keyHash>>data.keyshifts != index {
		next = nil // do not set index to next item if it's not the same slice index
	}
	...

This would also avoid the scenario I mentioned in #33 (comment).
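A stripped-down sketch of the corrected check, outside the real map and without atomics, may help show the intent. The `element` type here is a simplified stand-in and `nextForIndex` is a hypothetical helper; only the comparison on `next.keyHash` is taken from the fix above.

```go
package main

import "fmt"

// element is a simplified stand-in for a list node sorted by keyHash.
type element struct {
	keyHash uintptr
	next    *element
}

// nextForIndex returns what an index slot should point to after item is
// removed: item's successor if it hashes to the same slot, otherwise nil.
func nextForIndex(item *element, keyshifts uintptr) *element {
	index := item.keyHash >> keyshifts
	next := item.next
	if next != nil && next.keyHash>>keyshifts != index {
		return nil // successor belongs to a different slot
	}
	return next
}

func main() {
	const keyshifts = 4 // hypothetical: slot = keyHash >> 4
	c := &element{keyHash: 0x20}          // slot 2
	b := &element{keyHash: 0x1f, next: c} // slot 1
	a := &element{keyHash: 0x10, next: b} // slot 1

	fmt.Println(nextForIndex(a, keyshifts) == b)   // true: b shares slot 1
	fmt.Println(nextForIndex(b, keyshifts) == nil) // true: c belongs to slot 2
}
```

With the original comparison on `item.keyHash`, removing `b` would leave slot 1 pointing at `c`, an element of slot 2, which is consistent with a non-zero fill rate on an empty map.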

Plan for new release

Thank you for the excellent map library.

It looks like several fixes and new APIs have been added since v1.2.0.
Do you have any plan to release them?

CompareAndStore would be nice

For concurrent programs, a compare-and-swap on values would be really neat, since you can get a race if multiple threads are doing:

value, _ := m.Load(key)
value.Modify()
m.Store(key, value)

The pattern could be changed to the following (CompareAndStore is the proposed method):

for {
	oldvalue, _ := m.Load(key)
	newvalue := oldvalue
	newvalue.Modify()
	success := m.CompareAndStore(key, oldvalue, newvalue)
	if success {
		break
	}
}
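The same retry pattern can be tried out today with the standard library, which may help pin down the semantics being requested. This sketch uses `sync.Map.CompareAndSwap` (Go 1.20+), not a haxmap method; `addOne` and `run` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sync"
)

// addOne retries until its CompareAndSwap wins, so no increment is lost.
func addOne(m *sync.Map, key string) {
	for {
		old, _ := m.Load(key)
		if m.CompareAndSwap(key, old, old.(int)+1) {
			return
		}
	}
}

// run starts the given number of goroutines, each incrementing once, and
// returns the final counter value.
func run(workers int) int {
	var m sync.Map
	m.Store("hits", 0)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addOne(&m, "hits")
		}()
	}
	wg.Wait()
	v, _ := m.Load("hits")
	return v.(int)
}

func main() {
	fmt.Println(run(100)) // 100
}
```

Note that the old value has to be comparable for this to work, which is one design question any CompareAndStore on a generic map would need to answer.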

Missing func LoadOrStore

Hi, it's a great project, but it is missing a LoadOrStore func.

Can I just Get first and then Set?
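On "Get first, then Set": as two separate calls they leave a window in which another goroutine can also see the key as missing, so both end up writing. The sketch below uses `sync.Map.LoadOrStore` from the standard library to show the behaviour being asked for, where the check and the store are a single step; it is not haxmap code, and `countWinners` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWinners starts n goroutines that all try to claim the same key and
// returns how many of them saw loaded == false.
func countWinners(n int) int {
	var m sync.Map
	var won int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// The check and the store happen as one atomic step.
			if _, loaded := m.LoadOrStore("lock", id); !loaded {
				atomic.AddInt32(&won, 1)
			}
		}(i)
	}
	wg.Wait()
	return int(won)
}

func main() {
	fmt.Println(countWinners(100)) // 1
}
```

With a separate Get followed by Set, more than one goroutine could count itself as the winner.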

Iterate(ForEach) has no way to break

When there is a lot of data, I need to stop the iteration once I have found the entry I want.
But there is currently no way to stop in the middle of an iteration.

syncmaps.Range(func(key, value any) bool {
	if ok {
		return false
	}
	return true
})

With sync.Map, I can return false to stop the iteration.

How Do I actually delete a key?

mep := haxmap.New[int, string]()
    
mep.Set(1, "one")
println(mep.Len()) // 1

mep.Del(1) // delete key 1
println(mep.Len()) // 0

// The key/value pair can still be traversed?
mep.ForEach(func(key int, value string) bool {
    fmt.Printf("Key -> %d | Value -> %s\n", key, value)
    return true
})

// Print: Key -> 1 | Value -> one

I mean, I have deleted key 1 and mep.Len() is already 0, so why does ForEach still iterate over the deleted key-value pair? How do I actually remove it from mep?

Does deleting keys during a ForEach iteration prevent new keys from being set?

Code:

package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"

	"github.com/alphadose/haxmap"
)

type data struct {
	id  int
	exp time.Time
}

func main() {
	c := haxmap.New[int, *data](256)
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	go func() {
		t := time.NewTicker(time.Second * 2)
		defer t.Stop()
		var count int
		for {
			select {
			case <-t.C:
				count = 0
				c.ForEach(func(s int, b *data) bool {
					if time.Now().After(b.exp) {
						c.Del(s)
						count++
					}
					return true
				})
				fmt.Println("Del", count)
			case <-ctx.Done():
				return
			}
		}
	}()

	for i := 0; i < 20000; i++ {
		c.Set(i, &data{id: i, exp: time.Now().Add(time.Millisecond * time.Duration((1000 + rand.Intn(800))))})
		time.Sleep(time.Microsecond * time.Duration(rand.Intn(200)+10))
		if i%100 == 0 {
			fmt.Println(i)
		}
	}

	time.Sleep(time.Second * 3)
	fmt.Println("LEN", c.Len())
}

When running the above code, setting new keys stops.

Major Bug

I upgraded from v0.1.0 to v0.3.1 and it seems to hang in the Set call. The CPU stays stuck at 100% and the application makes no progress; haxmap internals are the only things running. I ran a profiler in this state and here is the image. When I downgraded to v0.1.0, everything was fine. The problem appears to exist in every version above v0.1.0.

UUID for Key?

Hi, thank you for your excellent map. How can I use google/UUID for the key? or can you add this feature to the map?

When to use this package

In which cases is this package really needed, compared with a standard Go map protected by a mutex?
For example, what if I only have 10 entries?
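For comparing the two on a real workload, the baseline being asked about can be sketched in a few lines. `lockedMap` and its methods are hypothetical names, and this makes no claim about which option is faster for a given number of entries; that needs a benchmark.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedMap is a plain Go map guarded by a sync.RWMutex.
type lockedMap[K comparable, V any] struct {
	mu sync.RWMutex
	m  map[K]V
}

func newLockedMap[K comparable, V any]() *lockedMap[K, V] {
	return &lockedMap[K, V]{m: make(map[K]V)}
}

func (l *lockedMap[K, V]) Get(key K) (V, bool) {
	l.mu.RLock()
	defer l.mu.RUnlock()
	v, ok := l.m[key]
	return v, ok
}

func (l *lockedMap[K, V]) Set(key K, value V) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.m[key] = value
}

func (l *lockedMap[K, V]) Del(key K) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.m, key)
}

func (l *lockedMap[K, V]) Len() int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return len(l.m)
}

func main() {
	m := newLockedMap[int, string]()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Set(i, fmt.Sprint(i))
		}(i)
	}
	wg.Wait()
	v, ok := m.Get(3)
	fmt.Println(m.Len(), v, ok) // 10 3 true
}
```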

Add clear all method

Fantastic library. Can you add a drop-all/clear method?

Say, after some period of time, I want to clear out everything without shrinking the map.

Slow compared to map?

Hi, I was trying to benchmark haxmap against the built-in map:

https://github.com/kokizzu/kokizzu-benchmark/blob/master/assoc/go-haxmap/haxmap.go
vs
https://github.com/kokizzu/kokizzu-benchmark/blob/master/assoc/map.go
The diff:
https://pastebin.com/diff/V3Y04Uha

But haxmap took about 51s of CPU time vs 14s for map:

time go run go-haxmap/haxmap.go                                                                                                                    
6009354 6009348 611297
36186112 159701682 23370001

CPU: 51.86s     Real: 26.87s    RAM: 2 386 608KB

time go run map.go 
6009354 6009348 611297
36186112 159701682 23370001

CPU: 14.29s     Real: 12.43s    RAM: 2 271 672KB

Json encoding support

Map type does not implement json.Marshaler and json.Unmarshaler.
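Until something like this exists, one possible workaround is to copy the entries into a plain map and marshal that snapshot. The sketch below only assumes a `ForEach(func(K, V) bool)` method shaped like the one used elsewhere on this page; `forEacher`, `toJSON`, and `plainMap` are hypothetical names, and `plainMap` exists only to make the example runnable without the library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forEacher is the only thing this sketch needs from a map: a ForEach method
// whose callback returns false to stop.
type forEacher[K comparable, V any] interface {
	ForEach(func(K, V) bool)
}

// toJSON copies the entries into a plain map and marshals that snapshot.
func toJSON[K comparable, V any](m forEacher[K, V]) ([]byte, error) {
	snapshot := make(map[K]V)
	m.ForEach(func(k K, v V) bool {
		snapshot[k] = v
		return true
	})
	return json.Marshal(snapshot)
}

// plainMap is a stand-in used only to make the example runnable.
type plainMap map[string]int

func (p plainMap) ForEach(fn func(string, int) bool) {
	for k, v := range p {
		if !fn(k, v) {
			return
		}
	}
}

func main() {
	out, err := toJSON[string, int](plainMap{"a": 1, "b": 2})
	fmt.Println(string(out), err) // {"a":1,"b":2} <nil>
}
```

The snapshot is not atomic with respect to concurrent writers, and encoding/json only accepts string, integer, or TextMarshaler key types, so other key types would return an error.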

Cannot set maps above a certain size

go 1.19

func TestMakeHaxmap(t *testing.T) {
	for f := 1; f < 1000000; f *= 5 {
		m := haxmap.New[int, string]()
		t.Logf("creating %d", f)

		for i := 0; i < f; i++ {
			m.Set(i, fmt.Sprintf("a%d", i))
		}

		t.Logf("size: %d", m.Size())
	}
}

Randomly hangs forever...

Outdated Documentation for Map.Grow

The comment on the Map.Grow method states:

Grow resizes the hashmap to a new size, gets rounded up to next power of 2
To double the size of the hashmap use newSize 0
This function returns immediately, the resize operation is done in a goroutine
No resizing is done in case of another resize operation already being in progress
Growth and map bucket policy is inspired from https://github.com/cornelk/hashmap

But commit d071dd5f749f86017a32bc126ea40eaade5f3dfc changed Map.Grow to be synchronous, making this part of the comment inaccurate:

This function returns immediately, the resize operation is done in a goroutine

Save to file, load from file

Please help.

Delete performance would benefit from improvement

Very nice library for concurrent maps! For scenarios where you need to delete keys one at a time (not in batches), the current delete performance makes it unusable.

I have an analysis application (https://github.com/lkarlslund/adalanche) and I've tried replacing some of the maps with yours (adding unsafe.Pointer in my fork).

With a nasty workaround where I don't delete keys but set the value to a "deleted" flag, it works fairly well. But this is not the way.

Also, looking at your code, I'm curious how you distinguish between a hash that is ^0 and the "marked" value, which is the same.

Finding this project...

Search Google for "golang concurrent map" or "golang lockfree map" and this project does not come up. I only found it after following various links in issues reported on Go's sync.Map. You may want to do whatever is necessary to bring more attention to this repo (titles? README.md content? etc.)

... I'm bothering to say this because some of the other projects that come up are riddled with bugs ...

Set after Delete seems to delete key

The following test fails with h.Get(1) returning that the key:value entry does not exist:

func TestHaxmap(t *testing.T) {
	h := haxmap.New[int, string]()
	for i := 1; i <= 10; i++ {
		h.Set(i, strconv.Itoa(i))
	}
	for i := 1; i <= 10; i++ {
		h.Del(i)
	}
	for i := 1; i <= 10; i++ {
		h.Set(i, strconv.Itoa(i))
	}
	for i := 1; i <= 10; i++ {
		id, ok := h.Get(i)
		assert.Equal(t, strconv.Itoa(i), id)
		assert.True(t, ok)
	}
}

I'm assuming it has to do with the lazy delete, where the h.Del(i) only flags it and h.Set(i) deletes the entry rather than setting it, but I haven't looked too deeply into it. My local environment is an M1 Macbook with Go version 1.19.

Seems not thread safe

I updated my vendored copy to the latest main branch and wrote this test to simulate the situation.

func TestDebug(t *testing.T) {
	var wg sync.WaitGroup
	m := haxmap.New[string, struct{}]()

	acquire := func(key string) (free func(), acquired bool) {
		if _, loaded := m.GetOrSet(key, struct{}{}); loaded {
			return nil, false
		}

		free = func() {
			m.Del(key)
		}

		return free, true
	}

	n := 1000
	key := "key"
	var sum int32
	wg.Add(n)

	for i := 0; i < n; i++ {
		go func(idx int) {
			defer wg.Done()

			_, acq := acquire(fmt.Sprintf("%d", idx))
			require.True(t, acq)

			free, acquired := acquire(key)
			t.Log(acquired)
			if !acquired {
				return
			}

			// make sure that only one goroutine has acquired the key.
			require.True(t, atomic.CompareAndSwapInt32(&sum, 0, 1), atomic.LoadInt32(&sum))
			// mark that no goroutine holds the key any more.
			require.True(t, atomic.CompareAndSwapInt32(&sum, 1, 0))

			free()
		}(i)
	}

	wg.Wait()
}

Running go test without -race passes with no errors. With race detection enabled, it fails like this:

    sync_test.go:94: false
    sync_test.go:100:
        	Error Trace:	/Users/cmgs/.go/src/github/projecteru2/agent/utils/sync_test.go:100
        	            				/Users/cmgs/.go/src/github/projecteru2/agent/utils/asm_arm64.s:1172
        	Error:      	Should be true
        	Test:       	TestDebug
        	Messages:   	1
    sync_test.go:100:
        	Error Trace:	/Users/cmgs/.go/src/github/projecteru2/agent/utils/sync_test.go:100
        	            				/Users/cmgs/.go/src/github/projecteru2/agent/utils/asm_arm64.s:1172
        	Error:      	Should be true
        	Test:       	TestDebug
        	Messages:   	1
FAIL
FAIL	github.com/projecteru2/agent/utils	0.208s
FAIL

The test command is as follows (I put this code under the utils package):

GOOS=darwin GOARCH=arm64 go test -race -count=1 -timeout 120s -run TestDebug ./utils/...

I'm not sure what is happening, because this error only shows up under GOOS=darwin and GOARCH=arm64 (I use an M1 MacBook). It passes in a Linux environment (https://github.com/projecteru2/agent/actions/runs/3359505108/jobs/5567595644).

Any ideas?

GetOrSet and map key types

Hello! Thank you for your awesome package and for the recent (and important) GetOrSet functionality.

Could you allow (and implement) passing a function as the value constructor, which would be called only once? That would avoid running the value construction logic on every GetOrSet call.

The second request is to allow key types whose underlying type is one of those your hashable constraint supports. For example, type SeqId uint64.
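Both requests can be illustrated in a small, self-contained sketch. It is mutex-based and not haxmap's implementation; `lazyMap`, `getOrCompute`, and this particular `hashable` constraint are hypothetical. The `~` in the constraint is what lets named types such as `SeqId` through, and the constructor runs only when the key is absent.

```go
package main

import (
	"fmt"
	"sync"
)

// hashable uses ~ so that named types such as SeqId satisfy the constraint.
type hashable interface {
	~int | ~uint64 | ~string
}

type SeqId uint64

// lazyMap is a mutex-guarded map used only to illustrate the two requests.
type lazyMap[K hashable, V any] struct {
	mu sync.Mutex
	m  map[K]V
}

// getOrCompute calls build only when key is absent. loaded reports whether
// the value was already present.
func (l *lazyMap[K, V]) getOrCompute(key K, build func() V) (value V, loaded bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if v, ok := l.m[key]; ok {
		return v, true
	}
	if l.m == nil {
		l.m = make(map[K]V)
	}
	v := build()
	l.m[key] = v
	return v, false
}

func main() {
	var m lazyMap[SeqId, string]
	calls := 0
	build := func() string { calls++; return "built" }

	m.getOrCompute(SeqId(1), build)
	v, loaded := m.getOrCompute(SeqId(1), build)
	fmt.Println(v, loaded, calls) // built true 1
}
```

Holding the lock while `build` runs is what guarantees a single call here; a lock-free map would need a different strategy to give the same guarantee.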

Unnecessary resizing

The map resizes unnecessarily when the same operations are repeated, even though the map length stays the same.

m := New[int, any]()
fmt.Printf("len: %d, indexCount: %d\n", m.Len(), len(m.metadata.Load().index))
for i := 0; i < 10000; i++ {
	m.Set(i, nil)
	m.Del(i)
}
fmt.Printf("len: %d, indexCount: %d\n", m.Len(), len(m.metadata.Load().index))

the output:

len: 0, indexCount: 8
len: 0, indexCount: 16384

New() initializer panics if the given size is uintptr(0)

I wanted to initialize a haxmap B with the same size as haxmap A with the following code:

B := haxmap.New[string, struct{}](A.Len())

which worked for most cases except when A.Len() is 0. When A has 0 key-values, the initializer would panic.

Would be awesome if this is handled 🙏
