GithubHelp home page GithubHelp logo

workiva / go-datastructures Goto Github PK

View Code? Open in Web Editor NEW
7.3K 320.0 832.0 8.4 MB

A collection of useful, performant, and threadsafe Go datastructures.

License: Apache License 2.0

Go 99.52% Python 0.42% Dockerfile 0.06%

go-datastructures's Introduction

go-datastructures

Go-datastructures is a collection of useful, performant, and threadsafe Go datastructures.

NOTE: only tested with Go 1.3+.

Augmented Tree

Interval tree for collision in n-dimensional ranges. Implemented via a red-black augmented tree. Extra dimensions are handled in simultaneous inserts/queries to save space although this may result in suboptimal time complexity. Intersection determined using bit arrays. In a single dimension, inserts, deletes, and queries should be in O(log n) time.

Bitarray

Bitarray used to detect existence without having to resort to hashing with hashmaps. Requires entities have a uint64 unique identifier. Two implementations exist, regular and sparse. Sparse saves a great deal of space but insertions are O(log n). There are some useful functions on the BitArray interface to detect intersection between two bitarrays. This package also includes bitmaps of length 32 and 64 that provide increased speed and O(1) for all operations by storing the bitmaps in unsigned integers rather than arrays.

Futures

A helpful tool to send a "broadcast" message to listeners. Channels have the issue that once one listener takes a message from a channel the other listeners aren't notified. There were many cases when I wanted to notify many listeners of a single event and this package helps.

Queue

Package contains both a normal and priority queue. Both implementations never block on send and grow as much as necessary. Both also only return errors if you attempt to push to a disposed queue and will not panic like sending a message on a closed channel. The priority queue also allows you to place items in priority order inside the queue. If you give a useful hint to the regular queue, it is actually faster than a channel. The priority queue is somewhat slow currently and targeted for an update to a Fibonacci heap.

Also included in the queue package is a MPMC threadsafe ring buffer. This is a block full/empty queue, but will return a blocked thread if the queue is disposed while a thread is blocked. This can be used to synchronize goroutines and ensure goroutines quit so objects can be GC'd. Threadsafety is achieved using only CAS operations making this queue quite fast. Benchmarks can be found in that package.

Fibonacci Heap

A standard Fibonacci heap providing the usual operations. Can be useful in executing Dijkstra or Prim's algorithms in the theoretically minimal time. Also useful as a general-purpose priority queue. The special thing about Fibonacci heaps versus other heap variants is the cheap decrease-key operation. This heap has a constant complexity for find minimum, insert and merge of two heaps, an amortized constant complexity for decrease key and O(log(n)) complexity for a deletion or dequeue minimum. In practice the constant factors are large, so Fibonacci heaps could be slower than Pairing heaps, depending on usage. Benchmarks - in the project subfolder. The heap has not been designed for thread-safety.

Range Tree

Useful to determine if n-dimensional points fall within an n-dimensional range. Not a typical range tree however, as we are actually using an n-dimensional sorted list of points as this proved to be simpler and faster than attempting a traditional range tree while saving space on any dimension greater than one. Inserts are typical BBST times at O(log n^d) where d is the number of dimensions.

Set

Our Set implementation is very simple, accepts items of type interface{} and includes only a few methods. If your application requires a richer Set implementation over lists of type sort.Interface, see xtgo/set and goware/set.

Threadsafe

A package that is meant to contain some commonly used items but in a threadsafe way. Example: there's a threadsafe error in there as I commonly found myself wanting to set an error in many threads at the same time (yes, I know, but channels are slow).

AVL Tree

This is an example of a branch copy immutable AVL BBST. Any operation on a node makes a copy of that node's branch. Because of this, this tree is inherently threadsafe although the writes will likely still need to be serialized. This structure is good if your use case is a large number of reads and infrequent writes as reads will be highly available but writes somewhat slow due to the copying. This structure serves as a basis for a large number of functional data structures.

X-Fast Trie

An interesting design that treats integers as words and uses a trie structure to reduce time complexities by matching prefixes. This structure is really fast for finding values or making predecessor/successor types of queries, but also results in greater than linear space consumption. The exact time complexities can be found in that package.

Y-Fast Trie

An extension of the X-Fast trie in which an X-Fast trie is combined with some other ordered data structure to reduce space consumption and improve CRUD types of operations. These secondary structures are often BSTs, but our implementation uses a simple ordered list as I believe this improves cache locality. We also use fixed size buckets to aid in parallelization of operations. Exact time complexities are in that package.

Fast integer hashmap

A datastructure used for checking existence but without knowing the bounds of your data. If you have a limited small bounds, the bitarray package might be a better choice. This implementation uses a fairly simple hashing algorithm combined with linear probing and a flat datastructure to provide optimal performance up to a few million integers (faster than the native Golang implementation). Beyond that, the native implementation is faster (I believe they are using a large -ary B-tree). In the future, this will be implemented with a B-tree for scale.

Skiplist

An ordered structure that provides amortized logarithmic operations but without the complication of rotations that are required by BSTs. In testing, however, the performance of the skip list is often far worse than the guaranteed log n time of a BBST. Tall nodes tend to "cast shadows", especially when large bitsizes are required as the optimum maximum height for a node is often based on this. More detailed performance characteristics are provided in that package.

Sort

The sort package implements a multithreaded bucket sort that can be up to 3x faster than the native Golang sort package. These buckets are then merged using a symmetrical merge, similar to the stable sort in the Golang package. However, our algorithm is modified so that two sorted lists can be merged by using symmetrical decomposition.

Numerics

Early work on some nonlinear optimization problems. The initial implementation allows a simple use case with either linear or nonlinear constraints. You can find min/max or target an optimal value. The package currently employs a probabilistic global restart system in an attempt to avoid local critical points. More details can be found in that package.

B+ Tree

Initial implementation of a B+ tree. Delete method still needs added as well as some performance optimization. Specific performance characteristics can be found in that package. Despite the theoretical superiority of BSTs, the B-tree often has better all around performance due to cache locality. The current implementation is mutable, but the immutable AVL tree can be used to build an immutable version. Unfortunately, to make the B-tree generic we require an interface and the most expensive operation in CPU profiling is the interface method which in turn calls into runtime.assertI2T. We need generics.

Immutable B Tree

A btree based on two principals, immutablability and concurrency. Somewhat slow for single value lookups and puts, it is very fast for bulk operations. A persister can be injected to make this index persistent.

Ctrie

A concurrent, lock-free hash array mapped trie with efficient non-blocking snapshots. For lookups, Ctries have comparable performance to concurrent skip lists and concurrent hashmaps. One key advantage of Ctries is they are dynamically allocated. Memory consumption is always proportional to the number of keys in the Ctrie, while hashmaps typically have to grow and shrink. Lookups, inserts, and removes are O(logn).

One interesting advantage Ctries have over traditional concurrent data structures is support for lock-free, linearizable, constant-time snapshots. Most concurrent data structures do not support snapshots, instead opting for locks or requiring a quiescent state. This allows Ctries to have O(1) iterator creation and clear operations and O(logn) size retrieval.

Dtrie

A persistent hash trie that dynamically expands or shrinks to provide efficient memory allocation. Being persistent, the Dtrie is immutable and any modification yields a new version of the Dtrie rather than changing the original. Bitmapped nodes allow for O(log32(n)) get, remove, and update operations. Insertions are O(n) and iteration is O(1).

Persistent List

A persistent, immutable linked list. All write operations yield a new, updated structure which preserve and reuse previous versions. This uses a very functional, cons-style of list manipulation. Insert, get, remove, and size operations are O(n) as you would expect.

Simple Graph

A mutable, non-persistent undirected graph where parallel edges and self-loops are not permitted. Operations to add an edge as well as retrieve the total number of vertices/edges are O(1) while the operation to retrieve the vertices adjacent to a target is O(n). For more details see wikipedia

Installation

  1. Install Go 1.3 or higher.
  2. Run go get github.com/Workiva/go-datastructures/...

Updating

When new code is merged to master, you can use

go get -u github.com/Workiva/go-datastructures/...

To retrieve the latest version of go-datastructures.

Testing

To run all the unit tests use these commands:

cd $GOPATH/src/github.com/Workiva/go-datastructures
go get -t -u ./...
go test ./...

Once you've done this once, you can simply use this command to run all unit tests:

go test ./...

Contributing

Requirements to commit here:

Maintainers

go-datastructures's People

Contributors

alexandercampbell-wk avatar andreis avatar andygrunwald avatar azr avatar bcicen avatar brianshannan-wf avatar danielway-wk avatar dustinhiatt-wf avatar dzyp avatar evanh avatar fasaxc avatar funny-falcon avatar ichinaski avatar jacobmoss-wf avatar juliangriggs avatar matthinrichsen-wf avatar noxiouz avatar rmconsole3-wf avatar rosshendrickson-wf avatar seanstrickland-wf avatar simeonsteward-wk avatar slatteryjim avatar stephane-martin avatar stevenosborne-wf avatar tannermiller-wf avatar theodus avatar tomdeering-wf avatar tylertreat avatar tylertreat-wf avatar williamsigouin-wk avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

go-datastructures's Issues

State of rtree different after insert/delete (0,0,0,0) and insert/delete (0,0,10,10)

The state of rtree is different after insert/delete (0,0,0,0) and insert/delete (0,0,10,10).

The difference is that maxHilbert changed from 0 to 34 in the (0,0,10,10) case. Is this intentional or a bug? I expect that the rtree behaves the same, independently of what kind of rectangles I have.

Maybe it's not relevant but I have one case, where an other rTree gives a correct search result whereas this one here is wrong. I search for P1 = (1,1,1,1) and P2 = (1001,1,1001,1) The sequence is:

Node-1: (0,0, 2000, 1000)
Node-12: (0,0,2000,1000)

P1 and P2 are correctly found.

Adding Node-13 and changing Node-12 by remove and insert:

Node-1: (0,0, 2000, 1000)
Node-12: (0,0,1000,1000)
Node-13: (1000,0,2000,1000)

P1 correctly found, P2 reports Node-1 and Node-12 as result, which is wrong.

yfast trie Successor doesn't work well with big gaps between numbers

this test will fail

func TestTrieSuccessorBigGaps(t *testing.T) {
        yfast := New(uint64(0))

        e3 := newMockEntry(13 << 32)
        yfast.Insert(e3)

        successor := yfast.Successor(0)
        assert.Equal(t, e3, successor)

        e1 := newMockEntry(3 << 32)
        e2 := newMockEntry(7 << 32)

        yfast.Insert(e1, e2)

        successor = yfast.Successor(0)
        assert.Equal(t, e1, successor)

        successor = yfast.Successor(1)
        assert.Equal(t, e1, successor)

        successor = yfast.Successor(3<<32 + 1)
        assert.Equal(t, e2, successor)

        successor = yfast.Successor(8 << 32)
        assert.Equal(t, e3, successor)

        successor = yfast.Successor(14 << 32)
        assert.Nil(t, successor)

        successor = yfast.Successor(100 << 32)
        assert.Nil(t, successor)
}

Race Condition in Ctrie Tests

Example failures caused by a race condition in the ctrie tests:

https://travis-ci.org/Workiva/go-datastructures/jobs/69646914
https://travis-ci.org/Workiva/go-datastructures/jobs/69537489
https://travis-ci.org/Workiva/go-datastructures/jobs/69537490

I can't reproduce these locally. The first two builds (on Go 1.3) never terminated and were killed by Travis after ten minutes. The last build was on Go 1.4 and shows a ctrie unit test failure.

Any ideas @dustinhiatt-wf?

DTrie race condition

There's a race condition in the dtrie logic which causes an occasional error in CI:

--- FAIL: TestIterate-8 (0.40s)
    Error Trace:    dtrie_test.go:119
    Error:      Should be true

@Theodus

queue.TakeUntil - does not pop final matching element

This isn't quite the behavior I expected the queue.TakeUntil(...) function to behave. Here's a quick non-test-harness way I wrote a snippit to check my understanding of how this feature works:

    q := queue.New(10)
    q.Put(`a`, `b`, `c`)
    takeItems, _ := q.TakeUntil(func(item interface{}) bool {
        return item != `c`
    })

    restItems, _ := q.Get(3)

    fmt.Sprintf("TakeUntil Test. takeItems: %v, restItems: %v",
        takeItems, restItems)

This prints:
TakeUntil Test. takeItems: [a b], restItems: [b c]

I would have assumed that restItems would be either [c] (mutable call) or [a b c] (immutable call) but the result of [b c] is unexpected because the function both returned [a b] (looks like an immutable return value) but when I see [b c] it looks as if the queue was indeed mutated, but I would have expected in that case for the queue to have been [c]. Am I not understanding something, or is this a bug?

Future

I think it is better to make future to look like

type Future struct {
    m sync.Mutex
    val interface{}
    err error
    wait chan struct{}
    filled bool
    timeout *time.Timer
}

func NewFuture(d time.Duration) (f *Future) {
    ch := make(chan struct{})
    f = &Future { wait: ch }
    if d > 0 {
        f.timeout = time.AfterFunc(d, f.timedout)
    }
    return
}

func (f *Future) timedout() {
    f.m.Lock()
    if !f.filled {
        f.err = errors.New("timeout")
        close(f.wait)
        f.filled = true
        f.timeout = nil
    }
    f.m.Unlock()
}

func (f *Future) Fill(val interface{}) (err error) {
    f.m.Lock()
    if !f.filled {
        f.val = val
        if f.timeout != nil {
            f.timeout.Stop()
            f.timeout = nil
        }
        close(f.wait)
        f.filled = true
    } else if (f.err != nil) {
        err = f.err
    } else {
        err = errors.New("already filled")
    }
    f.m.Unlock()
    return
}

func (f *Future) Wait() <-chan struct{} {
    return f.wait
}

func (f *Future) Value() (interface{}, error) {
    /* normally one will call f.Value() only after f.Wait() already closed,
       but lets check it once again */
    <- f.wait
    return f.val, f.err
}

Not tested.

Queue Get func

When the queue is empty and you use Get I get a strange error instead of ErrEmptyQueue error and please add any error when using Get will output a shorter slice.

RingBuffer: 100% cpu usage

$uname -a
Linux Lee-Ubuntu 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$cat /etc/issue
Linux Lee-Ubuntu 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$go env
Linux Lee-Ubuntu 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Test code:


import (
    "fmt"
    "sync"
    "time"

"github.com/Workiva/go-datastructures/queue"


)

var wg sync.WaitGroup

func main() {
    wg.Add(2)
    q := queue.NewRingBuffer(4096)

go func() {
defer wg.Done()
for {
time.Sleep(3 * time.Second)
q.Offer("hello world")
}
}()

go func() {
defer wg.Done()
for {
str, _ := q.Get()
fmt.Println(str)
}
}()

wg.Wait()


}

ctrie: iterator performance

I've been comparing an implementation of a database cache using ctrie & the immutable avl tree. The ctrie is more efficient except when iterating. I think the ctrie iteration scheme of using a goroutine and channel gives pretty bad performance compared to the straight traversal of the avl tree nodes, plus the channel is prone to leakage (as noted in the code).

Get from PriorityQueue

This snippet of code gives me:

func main() {
	myqueue := queue.NewPriorityQueue(20, true)
	myqueue.Put(Myfloat(4))
	myqueue.Put(Myfloat(-1))
	myqueue.Put(Myfloat(-99))
	myqueue.Put(Myfloat(104))

	result, _ := myqueue.Get(3)
	fmt.Println(result)
}

[-99 -1]

While I expected [-99 -1 4]

Why did it happen?

P.S Compare method for Item interface.

type Myfloat float32

func (mi Myfloat) Compare(other queue.Item) int {
	omi := other.(Myfloat)
	if mi > omi {
		return 1
	} else if mi == omi {
		return 0
	}
	return -1
}

Create a DecreaseKey operation for the priority queue

Many priority queue implementations provide for a DecreaseKey operation, where a specific key's priority may be decreased. This should be implemented for the PriorityQueue, especially if there is a plan to eventually implement it using a Fib Heap.

Merge with emirpasic/gods ?

A few days ago i found https://github.com/emirpasic/gods.

Go Data Structures. Tags: Containers, Sets, Lists, Stacks, Maps, Trees, HashSet, TreeSet, ArrayList, SinglyLinkedList, DoublyLinkedList, LinkedListStack, ArrayStack, HashMap, TreeMap, RedBlackTree, BinaryHeap, Comparator, Sort

Maybe it make sense to merge both together? Or to benefit from each other?

using augmentedtree with non-integer intervals

In our use case, each interval's start and end are of type []byte.

Is it possible to use augmentedtree for this use case? Seems like we'd quickly overflow the int64 return required by Interval.{Low,High}AtDimension.

Flatten() returns empty slice while reusing Set.

find a bug while using set, I'll make a PR late.

set := set.New()
set.Add(1)
set.Add(2)
fmt.Println(set.Flatten()) // console: [1 2]
set.Dispose()
newSet := set.New(1)
fmt.Println(set.Flatten()) // console:[] expect:[1]

Reason:

While dispose an set, the flattened was just set to set.flattened[:0] in order to save the alloc for slice.

Every time calling updating function such like Add(...) or Remove(...) will reset flattened to nil.

And while calling Flatten(), if the flattened equals nil, will not recreate slice.

But while calling New(...) with init items, the flattened won't set to nil and next time to call Flatten() will got flattened != nil and return empty slice.

Sparse array can retain zeroed blocks

Sparse array AND'ed with dense array resulting in all zeros can leave zeroed blocks, causing the IsEmpty() check to incorrectly return false because it checks to see if the indices array has length 0.

trimmed-screenshot

fastinteger.FastIntegerHashMap delete does not rehash current cluster of packets

When a "packet" is deleted from FastIntegerHashMap subsequent packets are not rehashed. If three packets are inserted that hash to the same position, then the second one is deleted, then a get of the third one is performed, the get will fail because a nil value for the second packet will be located before the third item is encountered during linear probing. Here's a quick test case:

func TestDeleteCollision(t *testing.T) {
    // 1, 27, 42 all hash to the same value using our hash function % 32
    if hash(1)%32 != 12 || hash(27)%32 != 12 || hash(42)%32 != 12 {
        t.Error("test values don't hash to the same value")
    }

    m := New(32)
    m.Set(1, 1)
    m.Set(27, 27)
    m.Set(42, 42)

    m.Delete(27)
    value, ok := m.Get(42)
    assert.True(t, ok)
    assert.Equal(t, uint64(42), value)
}

And here's a modified delete method that fixes the issue:

func (packets packets) delete(key uint64) bool {
    i := packets.find(key)
    if packets[i] == nil {
        return false
    }
    packets[i] = nil
    i = (i + 1) & (uint64(len(packets)) - 1)
    for packets[i] != nil {
        p := packets[i]
        packets[i] = nil
        packets.set(p)
        i = (i + 1) & (uint64(len(packets)) - 1)
    }
    return true
}

Alternatively, you could use a sentry/tombstone element to mark things that are "deleted" instead of niling them out, which would probably be faster for most typical use cases but cost more in memory.

dtrie appears to be mutable (and not threadsafe)

Tested on v1.0.50:

From the documentation, it sounds like the dtrie is intended to be an immutable, hash array mapped trie implementation:

Dtrie
A persistent hash trie that dynamically expands or shrinks to provide efficient memory allocation. Being persistent, the Dtrie is immutable and any modification yields a new version of the Dtrie rather than changing the original. Bitmapped nodes allow for O(log32(n)) get, remove, and update operations. Insertions are O(n) and iteration is O(1).

However, this doesn't appear true:

t0 := dtrie.New(nil)
t1 := t0.Insert("a", "b")
t2 := t1.Insert("c", "d")

fmt.Println("trie sizes", t0.Size(), t1.Size(), t2.Size()) // yields 2, 2, 2; should be 0, 1, 2
fmt.Println(t0.Get("c")) // "d"; should be nil
fmt.Println(t1.Get("c")) // "d"; should be nil
fmt.Println(t2.Get("c")) // "d"; which is correct

It's also pretty easy to trigger a panic with concurrent access. This will reliably result in a panic on my machine -

insertLots := func() {
	for i := 0; i < 1000000; i++ {
		m0.Insert(stringEntry("v"), "value")
	}
}
	
go func() {
	insertLots()
}()
go func() {
	insertLots()
}()

Now, I know that this project hasn't seen a ton of updates in the past couple of years, and I'd understand if no-one wants to dig in to fix it properly. Nonetheless, I feel like a limitation like this would at least be worth documenting if it's not going to be fixed. I'd be happy to make a PR to that effect.

install err

go get github.com/Workiva/go-datastructures/...

# github.com/Workiva/go-datastructures/btree/immutable
D:\go_project\src\github.com\Workiva\go-datastructures\btree\immutable\node.go:123:26: multiple-value uuid.NewV4() in single-value context
D:\go_project\src\github.com\Workiva\go-datastructures\btree\immutable\node.go:300:22: multiple-value uuid.NewV4() in single-value context
D:\go_project\src\github.com\Workiva\go-datastructures\btree\immutable\node.go:323:22: multiple-value uuid.NewV4() in single-value context
D:\go_project\src\github.com\Workiva\go-datastructures\btree\immutable\node.go:424:17: multiple-value uuid.NewV4() in single-value context
D:\go_project\src\github.com\Workiva\go-datastructures\btree\immutable\rt.go:152:24: multiple-value uuid.NewV4() in single-value context
D:\go_project\src\github.com\Workiva\go-datastructures\btree\immutable\rt.go:200:21: multiple-value uuid.NewV4() in single-value context

RingBuffer's Offer returns false (queue full) even when there is space in queue

As per documentation, rb.Offer should return false only when the queue is full:

// Offer adds the provided item to the queue if there is space.  If the queue
// is full, this call will return false.  An error will be returned if the
// queue is disposed.
func (rb *RingBuffer) Offer(item interface{}) (bool, error) {
	return rb.put(item, true)
}

But, when adding elements to queue by concurrently calling rb.Offer, It returns false even when there is enough space.
The following test file can reproduce the issue.

package test

import (
	"sync"
	"testing"

	"github.com/Workiva/go-datastructures/queue"
)

func TestRingQueueOffer_parallel(t *testing.T) {
	size := 256
	parallelGoroutines := 8

	rb := queue.NewRingBuffer(uint64(size * parallelGoroutines))

	wg := new(sync.WaitGroup)
	wg.Add(parallelGoroutines)

	for i := 0; i < parallelGoroutines; i++ {
		go func(id int) {
			defer wg.Done()

			for el := 1; el <= size; el++ {
				ok, err := rb.Offer(el)
				if err != nil {
					t.Errorf("error in goroutine-%d: %v", id, err)
					return
				}

				if !ok {
					t.Errorf("queue full before expected on adding %d element, len: %d, cap: %d ", el, rb.Len(), rb.Cap())
				}
			}
		}(i)
	}

	wg.Wait()
}

On running the above test file, I got:

--- FAIL: TestRingQueueOffer_parallel (0.00s)
    ring_buffer_test.go:31: queue full before expected on adding 1 element, len: 49, cap: 2048 
    ring_buffer_test.go:31: queue full before expected on adding 256 element, len: 1391, cap: 2048 
    ring_buffer_test.go:31: queue full before expected on adding 1 element, len: 1730, cap: 2048 
FAIL
FAIL	command-line-arguments	0.028s
FAIL

memory leak in queue.Poll

There is a memory leak while handling waiters in the queue package. If I periodically queue.Poll from an empty queue and never calls queue.Put for whatever reason, eventually the program will be OOM from keeping too many waiters around in the memory.

q := queue.New(123)
m := &runtime.MemStats{}

var currHeapAlloc, prevHeapAlloc uint64
for i := 0; i < 100; i++ {
    q.Poll(1, time.Millisecond)

    prevHeapAlloc = currHeapAlloc
    runtime.ReadMemStats(m)
    currHeapAlloc = m.HeapAlloc

    fmt.Println(prevHeapAlloc, currHeapAlloc)
}

Dtrie.Get() panics on unexisting key

// Get returns the value for the associated key or returns nil if the
// key does not exist.
func (d *Dtrie) Get(key interface{}) interface{} {
	return get(d.root, d.hasher(key), key).Value()
}

get() returns nil if the key is not found and thus Value() raises a panic. Is that by design?

ctrie data race

I just got a report of data race in the ctrie impl on committed.

func (c *Ctrie) rdcssRoot(old *iNode, expected *mainNode, nv *iNode) bool {
    desc := &iNode{
        rdcss: &rdcssDescriptor{
            old:      old,
            expected: expected,
            nv:       nv,
        },
    }
    if c.casRoot(old, desc) {
        c.rdcssComplete(false)
        return desc.rdcss.committed
    }
    return false
}
func (c *Ctrie) rdcssComplete(abort bool) *iNode {
...
            if oldeMain == exp {
            // Commit the RDCSS.
            if c.casRoot(r, nv) {
                desc.committed = true
                return nv
Read by goroutine 9:
  customerio/chgbuf/ctrie.(*Ctrie).rdcssRoot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:881 +0x1e7
  customerio/chgbuf/ctrie.(*Ctrie).Snapshot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:318 +0xb7
...
Previous write by goroutine 6:
  customerio/chgbuf/ctrie.(*Ctrie).rdcssComplete()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:912 +0x1a1
  customerio/chgbuf/ctrie.(*Ctrie).rdcssReadRoot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:863 +0x78
  customerio/chgbuf/ctrie.(*Ctrie).readRoot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:855 +0x33
  customerio/chgbuf/ctrie.(*Ctrie).ReadOnlySnapshot()
...

Incorrect handling of hash collisions in ctrie

Hello,

suppose you have two strings X and Y that have the same hash. We know that such strings exist with every hash function.

In the ctrie, when we insert X and Y, they will be stored in a common lNode. lNode as currently implemented is a simple persistent Linked List. That implementation is not correct for the ctrie, as duplicates will not be detected in the lNode.

package main

import (
	"fmt"
	"hash"

	"github.com/Workiva/go-datastructures/trie/ctrie"
)

type fakehash struct{}

func (h *fakehash) Sum32() uint32 {
	return 42
}

func (h *fakehash) Sum(b []byte) []byte {
	return nil
}

func (h *fakehash) Size() int {
	return 0
}

func (h *fakehash) BlockSize() int {
	return 0
}

func (h *fakehash) Reset() {

}

func (h *fakehash) Write(b []byte) (int, error) {
	return 0, nil
}

func factory() hash.Hash32 {
	return &fakehash{}
}

func main() {
	trie := ctrie.New(factory)
	trie.Insert([]byte("foobar"), 1)
	fmt.Println(trie.Lookup([]byte("foobar")))
	trie.Insert([]byte("zogzog"), 2)
	fmt.Println(trie.Lookup([]byte("zogzog")))
	trie.Insert([]byte("foobar"), 3)
	fmt.Println(trie.Lookup([]byte("foobar")))
	trie.Remove([]byte("foobar"))
	// foobar is supposed to be removed, but the next lookup returns 1
	fmt.Println(trie.Lookup([]byte("foobar")))
}

avl bug

The following test fails.

func TestAVLFails(t *testing.T) {
        keys := []mockEntry{
                mockEntry(0),
                mockEntry(1),
                mockEntry(3),
                mockEntry(4),
                mockEntry(5),
                mockEntry(6),
                mockEntry(7),
                mockEntry(2),
        }
        i1 := NewImmutable()
        for _, k := range keys {
                i1, _ = i1.Insert(k)
        }

        for _, k := range keys {
                var deleted Entries
                i1, deleted = i1.Delete(k)
                assert.Equal(t, Entries{k}, deleted)
        }
}

Will rtree recognize coordinate/dimension changes of inserted rects?

If I have a rtree where I insert 50 rectangles and change the coordinates/dimensions of some rectangles afterwards, will rtree recognize this and adapt the internal data structure to work correctly? Or do I have to somehow let the tree know, that some rectangles changed?

Do you need generics, or not

This I've noticed that you use interface{} (EFACE-not ideal due to run time type checking) and also regular interfaces (IFACE-slow dispatch).

I've hacked the gccgo to add basic generic (1-parametric polymorphic functions) to go.
If my proposal is completed, your data structures will be fully performant, work on any type and there will be NO code bloat (NO hidden code generation for different types like the go generate etc).

The go people said they will NOT work on adding generics to go.

Problem is due to ideological issues I'm afraid it probably won't be merged to the go language. This means if I finish this, and will be rejected, will need to fork go language (3 compilers) and compete with go language. So i will probably throw away the whole project. I will soon quit go

Side note: please look how generic algorithm e.g. arbitrary slice sort can be implemented by breaking the type system "escaping the google walled garden" using unsafe: https://github.com/gomacro/sort

If this is just a toy project, excuse me and, have fun.

Nil pointer error in yfast trie

Here's a minimal reproducing example using Go 1.9 on macOS Sierra. I couldn't break it down any further. It will fail pretty much every time, but at a different loop iteration. I tried to simplify the example but this was the simplest that would reliably reproduce the problem.

type Entry uint64

func (s Entry) Key() uint64 { return uint64(s) }

func main() {
	y := yfast.New(uint64(1))

	i := 0
	defer func() {
		err := recover()
		fmt.Println(i)
		panic(err)
	}()
	for ; i < 10000; i++ {
		e1 := Entry(time.Now().UnixNano())
		y.Insert(e1)

		e2 := Entry(time.Now().Add(20 * time.Millisecond).UnixNano())
		y.Insert(e2)

		e3 := Entry(time.Now().Add(10 * time.Millisecond).UnixNano())
		y.Insert(e3)

		y.Delete(e3.Key())
	}
}

Panic during BenchmarkRBPut

Windows 7 x86_64
Go version go1.4.2 windows/386 (NB: Using 386 build for stdlib change testing, can switch later to test)

Panic stacktrace below:

BenchmarkRBPut  fatal error: runtime: cannot map pages in arena address space

runtime stack:
runtime.SysMap(0x0, 0x100000, 0xcfd01, 0x7b7db8)
        C:/apps/go/src/runtime/mem_windows.c:131 +0x7f
runtime.MHeap_SysAlloc(0x7baba0, 0x100000, 0x642d0168)
        C:/apps/go/src/runtime/malloc.c:284 +0xf1
runtime.MHeap_Alloc(0x7baba0, 0x1, 0x13, 0x700100, 0x40a9a8)
        C:/apps/go/src/runtime/mheap.c:240 +0x66
runtime.MCentral_CacheSpan(0x7bf694, 0x642d0168)
        C:/apps/go/src/runtime/mcentral.c:85 +0x122
runtime.MCache_Refill(0xb70000, 0x13, 0x140)
        C:/apps/go/src/runtime/mcache.c:90 +0x83

goroutine 30400 [running]:
runtime.switchtoM()
        C:/apps/go/src/runtime/asm_386.s:208 fp=0x12531eb0 sp=0x12531eac
runtime.mallocgc(0x140, 0x650b40, 0x0, 0x2)
        C:/apps/go/src/runtime/malloc.go:178 +0x68e fp=0x12531f08 sp=0x12531eb0
runtime.newobject(0x650b40, 0x2)
        C:/apps/go/src/runtime/malloc.go:353 +0x48 fp=0x12531f1c sp=0x12531f08
github.com/Workiva/go-datastructures/queue.NewRingBuffer(0x2, 0x0, 0x63adfe00)
        c:/working/go/external/src/github.com/Workiva/go-datastructures/queue/ring.go:192 +0x28 fp=0x12531f30 sp=0x12531f1c
github.com/Workiva/go-datastructures/queue.BenchmarkRBPut(0x12564000)
        c:/working/go/external/src/github.com/Workiva/go-datastructures/queue/ring_test.go:286 +0x89 fp=0x12531f80 sp=0x12531f30
testing.(*B).runN(0x12564000, 0x989680)
        C:/apps/go/src/testing/benchmark.go:124 +0x81 fp=0x12531f88 sp=0x12531f80
testing.(*B).launch(0x12564000)
        C:/apps/go/src/testing/benchmark.go:216 +0x13a fp=0x12531fe8 sp=0x12531f88
runtime.goexit()
        C:/apps/go/src/runtime/asm_386.s:2287 +0x1 fp=0x12531fec sp=0x12531fe8
created by testing.(*B).run
        C:/apps/go/src/testing/benchmark.go:179 +0x3a

goroutine 1 [chan receive]:
testing.(*B).run(0x12564000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        C:/apps/go/src/testing/benchmark.go:180 +0x5b
testing.RunBenchmarks(0x6e2f5c, 0x7a8ec0, 0xa, 0xa)
        C:/apps/go/src/testing/benchmark.go:312 +0x53b
testing.(*M).Run(0x12514600, 0x7b2c20)
        C:/apps/go/src/testing/testing.go:494 +0x158
main.main()
        github.com/Workiva/go-datastructures/queue/_test/_testmain.go:140 +0x177
exit status 2
FAIL    github.com/Workiva/go-datastructures/queue      33.889s

Out of curiosity, is there a benefit to benchmarking a single insert on b.N RingBuffers? Switching the benchmark to something like below gives what I believe would be more useful metrics, plus runs without issue:

func BenchmarkRBPut(b *testing.B) {
    rb := NewRingBuffer(uint64(b.N))

    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        ok, err := rb.Offer(i)
        if !ok {
            b.Fail()
        }
        if err != nil {
            b.Log(err)
            b.Fail()
        }
    }
}

func BenchmarkRBGet(b *testing.B) {
    rb := NewRingBuffer(uint64(b.N))

    for i := 0; i < b.N; i++ {
        rb.Offer(i)
    }

    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        rb.Get()
    }
}

ctrie: hasherPool poor performance

I found getting rid of the hasher pool made a pretty big difference in the performance of the ctrie imp (I can't remember exactly how much but I think it was ~5% performance improvement).

How to use rtree?

Maybe I'm just to new to Go, but can't get it to work:

package main

import (
  "github.com/Workiva/go-datastructures/rtree/hilbert"
)

func main() {
  rtree:= New(100,2) // 2D rtree with buffer size = 100
}

I get:

% go build
# go-datastructures_rtree.go
./go-datastructures_rtree.go:4:3: imported and not used: "github.com/Workiva/go-datastructures/rtree/hilbert"
./go-datastructures_rtree.go:8:11: undefined: New

Any idea?

Using augmented tree to find which interval a point is in is failing

Hi,

My goal is to find which interval a point is in. I am under the impression that this is possible with augmentedtree package based on the documentation here stating

You can also check intersections against a point by constructing a range of encompassed solely if a single point.

Hence, I have constructed an augmented tree containing a single 2D interval having lows at [0,0] and highs at [1,1]. However, when I tried to query which interval that the point (0,0) is in, I am getting an empty result. Any help is much appreciated.

Code snippet:

type IntervalContainerTest struct {
	Id           int
	Low          []int64
	High         []int64
}

func (itv IntervalContainerTest) LowAtDimension(dim uint64) int64{
	return itv.Low[dim - 1]
}

func (itv IntervalContainerTest) HighAtDimension(dim uint64) int64{
	return itv.High[dim - 1]
}

// OverlapsAtDimension should return a bool indicating if the provided
// interval overlaps this interval at the dimension requested.
func (itv IntervalContainerTest) OverlapsAtDimension(interval augmentedtree.Interval, dim uint64) bool{
	check := false
	if interval.LowAtDimension(dim) <= itv.HighAtDimension(dim) &&
		interval.HighAtDimension(dim) >= itv.LowAtDimension(dim){
		check = true
	} else {
		check = false
	}
	return check
}

func (itv IntervalContainerTest) ID() uint64{
	return uint64(itv.Id)
}

func TestIntervalQueryInt(t *testing.T){
	tree := NewIntervalTree(2)
	intervals_container := IntervalContainerTest{Id: 1, Low: []int64{0, 0}, High: []int64{1, 1}}
	intervals := augmentedtree.Interval(intervals_container)
	tree.Add(intervals)

	interval_test_query := tree.Query(IntervalContainerTest{Id: 1, Low: []int64{0, 0}, High: []int64{0, 0}})
	assert.Len(t, interval_test_query, 1, "Interval not found: %v", interval_test_query)
}

Result:

--- FAIL: TestIntervalQueryInt (0.00s)
        ......
        Error:          "[]" should have 1 item(s), but has 0

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.