grandecola / bigqueue

442.0 10.0 32.0 118 KB

Embedded, Fast and Persistent bigqueue implementation

License: MIT License

Go 100.00%

queue golang persistence bigqueue embedded

bigqueue's Issues

Add support for go mod on Travis CI

EnsureArena should avoid iterating from tail to head because this slows down `Dequeue`

Furthermore, EnsureArena should avoid removing HEAD from memory

Why mmap?

Just curious, why mmap the files? It seems like simply writing serially and reading serially also should work, no? Would also give higher control over flushing and buffering.

Add PeekAndDequeue function and update external sort example

@darkcoderrises

Optimize dequeue if peek has already been performed

Implement bitcask equivalent using bigqueue

https://github.com/basho/bitcask

Release 0.5.0

@mangalaman93 please issue a new release and update benchmarks!

We should expose the flush function as part of the bigqueue interface. Additionally, we should not rely on the OS's periodic syncing. Instead, we should flush periodically, either on a timer or after a given amount of data has changed, with configuration parameters to choose the period.
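
As a rough illustration of the two triggers mentioned here (a timer and an amount of change), the sketch below flushes a target after a number of recorded operations or on every tick, whichever comes first. Every name in it (`flusher`, `countingFlusher`, `periodicFlusher`, `RecordOp`) is hypothetical and not taken from bigqueue; it is a minimal sketch under those assumptions, not the library's implementation.

```go
package main

import (
	"sync"
	"time"
)

// flusher is a hypothetical stand-in for anything that can sync data to disk.
type flusher interface {
	Flush() error
}

// countingFlusher is a toy target that only counts how often it is flushed.
type countingFlusher struct{ calls int }

func (c *countingFlusher) Flush() error {
	c.calls++
	return nil
}

// periodicFlusher flushes its target after maxOps recorded mutations or on
// every tick of the interval, whichever comes first.
type periodicFlusher struct {
	mu      sync.Mutex
	target  flusher
	maxOps  int // 0 disables the operation-count trigger
	pending int // mutations recorded since the last flush
	stop    chan struct{}
	done    chan struct{}
}

func newPeriodicFlusher(target flusher, maxOps int, interval time.Duration) *periodicFlusher {
	p := &periodicFlusher{
		target: target,
		maxOps: maxOps,
		stop:   make(chan struct{}),
		done:   make(chan struct{}),
	}
	go p.loop(interval)
	return p
}

// RecordOp notes one mutation and flushes once maxOps mutations are pending.
func (p *periodicFlusher) RecordOp() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pending++
	if p.maxOps > 0 && p.pending >= p.maxOps {
		return p.flushLocked()
	}
	return nil
}

// flushLocked flushes only if something changed; the caller must hold p.mu.
func (p *periodicFlusher) flushLocked() error {
	if p.pending == 0 {
		return nil
	}
	if err := p.target.Flush(); err != nil {
		return err
	}
	p.pending = 0
	return nil
}

// loop is the timer-driven trigger; it runs until Close is called.
func (p *periodicFlusher) loop(interval time.Duration) {
	defer close(p.done)
	ticker := time.NewTicker(interval) // interval must be positive
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			p.mu.Lock()
			_ = p.flushLocked() // a real implementation would report this error
			p.mu.Unlock()
		case <-p.stop:
			return
		}
	}
}

// Close stops the timer goroutine and performs a final flush.
func (p *periodicFlusher) Close() error {
	close(p.stop)
	<-p.done
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.flushLocked()
}
```

A caller would invoke `RecordOp` after each enqueue or dequeue and `Close` on shutdown. Skipping the flush when nothing is pending keeps an idle queue from issuing needless syncs.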

Ensure that it is okay to not have Arena Size in multiple of page size

Implement BigArray

Add support for multithreading

We can try two different approaches -

First approach: keep the queue single threaded and acquire a lock from outside.
Second approach: acquire locks on all the subcomponents, such as the index and the arena.
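
The option of keeping the queue single threaded and locking from outside could be sketched as follows. `sliceQueue` is a toy in-memory queue invented for this example, and `lockedQueue` and `enqueueConcurrently` are likewise hypothetical names; none of this is bigqueue code.

```go
package main

import (
	"errors"
	"sync"
)

var errEmpty = errors.New("queue is empty")

// sliceQueue is a toy single-threaded queue standing in for the real one.
type sliceQueue struct{ items [][]byte }

func (q *sliceQueue) Enqueue(b []byte) {
	q.items = append(q.items, append([]byte(nil), b...)) // store a private copy
}

func (q *sliceQueue) Dequeue() ([]byte, error) {
	if len(q.items) == 0 {
		return nil, errEmpty
	}
	b := q.items[0]
	q.items = q.items[1:]
	return b, nil
}

// lockedQueue wraps the single-threaded queue; one mutex serializes every call.
type lockedQueue struct {
	mu sync.Mutex
	q  sliceQueue
}

func (l *lockedQueue) Enqueue(b []byte) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.q.Enqueue(b)
}

func (l *lockedQueue) Dequeue() ([]byte, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.q.Dequeue()
}

func (l *lockedQueue) Len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.q.items)
}

// enqueueConcurrently starts the given number of goroutines, each enqueueing
// perWorker one-byte messages, and waits for all of them to finish.
func enqueueConcurrently(l *lockedQueue, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id byte) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				l.Enqueue([]byte{id})
			}
		}(byte(w))
	}
	wg.Wait()
}
```

The single mutex is simple to reason about but serializes readers and writers. Locking the index and arena separately would allow more overlap at the cost of a more delicate lock order.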

Support for periodic GC

GC will delete files from disk that are not in use anymore. This will be done periodically or on demand.

Allow compression of data stored in bigqueue

Write multicore benchmark and Improve performance

Benchmarks are currently single threaded. Now that bigqueue is thread safe, we should be able to use multiple cores for higher throughput. We can also profile the system and improve performance.

Add package level documentation with examples

Support for limited memory use

Given that it is a queue, I think we need at least the following pages in memory -

the page that is currently written
the page that is currently read

Further, we can keep more pages after the currently read page in memory as well for more performance.

Whenever we mmap a new page, before mmap, we need to ensure that we do not cross the provided threshold limit for memory usage. If so, we remove some pages from memory before proceeding.
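
The eviction rule described here can be sketched without any real mmap calls by tracking only which arena IDs are resident. The sketch assumes arena IDs increase from head (read side) to tail (write side), pins both, and prefers to evict arenas that have already been read. `arenaCache` and its methods are hypothetical names for illustration, not bigqueue's actual arena manager.

```go
package main

import "errors"

// arenaCache tracks which arenas are resident in memory. Arena IDs grow from
// head (currently read) to tail (currently written).
type arenaCache struct {
	maxInMem   int // threshold on the number of resident arenas
	head, tail int
	resident   map[int]bool
}

func newArenaCache(maxInMem int) (*arenaCache, error) {
	if maxInMem < 2 {
		return nil, errors.New("need room for at least the head and tail arenas")
	}
	return &arenaCache{maxInMem: maxInMem, resident: map[int]bool{0: true}}, nil
}

// setHead moves the read position to the given arena and loads it if needed.
func (c *arenaCache) setHead(id int) {
	c.head = id
	c.ensure(id)
}

// setTail moves the write position to the given arena and loads it if needed.
func (c *arenaCache) setTail(id int) {
	c.tail = id
	c.ensure(id)
}

// ensure makes an arena resident, evicting others first so that the
// threshold is not crossed.
func (c *arenaCache) ensure(id int) {
	if c.resident[id] {
		return
	}
	for len(c.resident) >= c.maxInMem {
		victim, ok := c.pickVictim()
		if !ok {
			break // only head and tail are resident; nothing may be evicted
		}
		delete(c.resident, victim) // real code would unmap the arena here
	}
	c.resident[id] = true // real code would mmap the arena file here
}

// pickVictim never returns head or tail. It prefers the oldest arena behind
// head (already read); otherwise it picks the arena farthest ahead of head.
func (c *arenaCache) pickVictim() (int, bool) {
	victim, found := 0, false
	for id := range c.resident {
		if id == c.head || id == c.tail {
			continue
		}
		switch {
		case !found:
			victim, found = id, true
		case id < c.head && (victim > c.head || id < victim):
			victim = id
		case id > c.head && victim > c.head && id > victim:
			victim = id
		}
	}
	return victim, found
}
```

Evicting the arena farthest ahead of head keeps the pages just after the read position in memory, which matches the suggestion to retain pages that will be read next.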

Add support for Go modules

bigqueue panics when benchmarks are run

goos: linux
goarch: amd64
pkg: github.com/grandecola/bigqueue
BenchmarkNewMmapQueue/ArenaSize-4KB-8         	     288	   4376092 ns/op	    2853 B/op	      46 allocs/op
BenchmarkNewMmapQueue/ArenaSize-128KB-8       	     278	   4286689 ns/op	    2837 B/op	      46 allocs/op
BenchmarkNewMmapQueue/ArenaSize-4MB-8         	     282	   4316120 ns/op	    2835 B/op	      46 allocs/op
BenchmarkNewMmapQueue/ArenaSize-128MB-8       	     282	   4317199 ns/op	    2817 B/op	      46 allocs/op
BenchmarkEnqueue/ArenaSize-4KB/MessageSize-128B/MaxMem-12KB-8         	 1195894	      1006 ns/op	      50 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-4KB/MessageSize-128B/MaxMem-40KB-8         	  992317	      1037 ns/op	      50 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-4KB/MessageSize-128B/MaxMem-NoLimit-8      	 1240567	       967 ns/op	      53 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-128KB/MessageSize-4KB/MaxMem-384KB-8       	  321355	      3634 ns/op	      49 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-128KB/MessageSize-4KB/MaxMem-1.25MB-8      	  296962	      3627 ns/op	      49 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-128KB/MessageSize-4KB/MaxMem-NoLimit-8     	  337028	      3608 ns/op	      51 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-4MB/MessageSize-128KB/MaxMem-12MB-8        	   14071	     85243 ns/op	      49 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-4MB/MessageSize-128KB/MaxMem-40MB-8        	   14617	     82313 ns/op	      49 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-4MB/MessageSize-128KB/MaxMem-NoLimit-8     	   14323	     89502 ns/op	      52 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-128MB/MessageSize-4MB/MaxMem-256MB-8       	     469	   2677177 ns/op	      49 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-128MB/MessageSize-4MB/MaxMem-1.25GB-8      	     450	   3082011 ns/op	      50 B/op	       1 allocs/op
BenchmarkEnqueue/ArenaSize-128MB/MessageSize-4MB/MaxMem-NoLimit-8     	     444	   3013916 ns/op	      50 B/op	       1 allocs/op
BenchmarkEnqueueString/ArenaSize-4KB/MessageSize-128B/MaxMem-12KB-8   	 1000000	      1072 ns/op	      34 B/op	       1 allocs/op
BenchmarkEnqueueString/ArenaSize-4KB/MessageSize-128B/MaxMem-40KB-8   	 1152850	      1096 ns/op	      34 B/op	       1 allocs/op
BenchmarkEnqueueString/ArenaSize-4KB/MessageSize-128B/MaxMem-NoLimit-8         	panic: runtime error: index out of range [0] with length 0

goroutine 1864 [running]:
github.com/grandecola/mmap.(*File).Flush(0xc0009e4d50, 0x4, 0xc0002f8628, 0x685bc0)
	/home/aman/gocode/pkg/mod/github.com/grandecola/[email protected]/mmap_data.go:93 +0xd6
github.com/grandecola/bigqueue.(*arenaManager).flush(0xc000ed0180, 0xc000b9e120, 0xc0004dce08)
	/home/aman/gocode/src/github.com/grandecola/bigqueue/arenamanager.go:135 +0xaf
github.com/grandecola/bigqueue.(*MmapQueue).Flush(0xc00021c000, 0x0, 0x0)
	/home/aman/gocode/src/github.com/grandecola/bigqueue/bigqueue.go:163 +0x8a
github.com/grandecola/bigqueue.(*MmapQueue).periodicFlush(0xc00021c000)
	/home/aman/gocode/src/github.com/grandecola/bigqueue/bigqueue.go:208 +0x1d3
created by github.com/grandecola/bigqueue.NewMmapQueue
	/home/aman/gocode/src/github.com/grandecola/bigqueue/bigqueue.go:96 +0x422
exit status 2
FAIL	github.com/grandecola/bigqueue	64.609s

Refactoring and new functionality

Refactored and added DequeueAppend([]byte) ([]byte, error)
Please review #84
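
Only the signature `DequeueAppend([]byte) ([]byte, error)` is given here. One plausible reading, in the style of the standard library's append functions, is that the dequeued bytes are appended to a caller-supplied buffer so the buffer can be reused across calls. The sketch below assumes that semantics and uses a toy `memQueue` type that is not part of bigqueue.

```go
package main

import "errors"

var errEmptyQueue = errors.New("queue is empty")

// memQueue is a toy in-memory queue used only to illustrate the call shape.
type memQueue struct{ items [][]byte }

func (q *memQueue) Enqueue(b []byte) {
	q.items = append(q.items, append([]byte(nil), b...)) // store a private copy
}

// DequeueAppend removes the head element and appends its bytes to dst,
// returning the extended slice. On an empty queue dst is returned unchanged.
func (q *memQueue) DequeueAppend(dst []byte) ([]byte, error) {
	if len(q.items) == 0 {
		return dst, errEmptyQueue
	}
	head := q.items[0]
	q.items = q.items[1:]
	return append(dst, head...), nil
}
```

A consumer loop can then call `buf, err = q.DequeueAppend(buf[:0])` repeatedly and allocate only when a message exceeds the buffer's capacity.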

Allow single writer and multiple reader processes

Integrate with codefactor, codeclimate, codacy

Add support for configuring bigqueue

We should use a Config object to allow configuring BigQueue. We should allow a useful default value for each configuration as well as ensure that configurations are set correctly using possible checks around each parameter. We should allow creating bigqueue without the config object using the default values.

Here is the list of configuration parameters -

Maximum size of Arena (check for a value at least as large as the size of an OS page)
Maximum memory used by BigQueue, with an option for using the minimum possible memory or all available memory (check for a value of at least 2 * Size of Arena)
GC frequency (check for a positive value)

We will have to persist the configuration parameters so that we can read them back across different invocations of the same application.

I still think that we should keep the directory as an argument to NewBigQueue so that creating a queue from a path is always explicit. Given that the path is what defines a BigQueue, this sets the right expectation: if you lose the directory, you lose the queue.
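
A minimal sketch of such a Config object, with defaults and the three checks listed above, might look like this. The field names, `DefaultConfig`, `Validate`, and `resolveConfig` are all hypothetical. The 128 MB arena default mirrors the value quoted elsewhere on this page, the one-minute GC interval is an arbitrary placeholder, and persistence of the parameters is left out.

```go
package main

import (
	"errors"
	"os"
	"time"
)

// Config collects the parameters listed above. Field names are hypothetical.
type Config struct {
	ArenaSize  int           // bytes per arena
	MaxMemory  int           // bytes kept in memory; 0 means no limit
	GCInterval time.Duration // how often unused files are collected
}

// DefaultConfig returns usable defaults. The GC interval is a placeholder.
func DefaultConfig() Config {
	return Config{ArenaSize: 128 << 20, MaxMemory: 0, GCInterval: time.Minute}
}

// Validate applies the checks from the list: page-sized arenas, room for at
// least two arenas in memory, and a positive GC frequency.
func (c Config) Validate() error {
	if c.ArenaSize < os.Getpagesize() {
		return errors.New("arena size must be at least one OS page")
	}
	if c.MaxMemory != 0 && c.MaxMemory < 2*c.ArenaSize {
		return errors.New("max memory must be 0 (no limit) or at least 2 * arena size")
	}
	if c.GCInterval <= 0 {
		return errors.New("GC interval must be positive")
	}
	return nil
}

// resolveConfig lets callers pass nil to get the defaults, so a queue can be
// created without a config object.
func resolveConfig(c *Config) (Config, error) {
	if c == nil {
		return DefaultConfig(), nil
	}
	if err := c.Validate(); err != nil {
		return Config{}, err
	}
	return *c, nil
}
```

A constructor could still take the directory as its own argument and call `resolveConfig` on an optional second one, which keeps the path explicit as argued above.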

Why no Len() uint64 method for bigqueue?

Add policy to decide when to delete old data

Implement Kafka using BigQueue

Allow calls to be blocked on empty queue

This is similar to a producer-consumer model, except that the queue can never be full. We should potentially add an API such that, instead of returning an error when the queue is empty, we simply block the caller until an element is added and can be returned.
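
One common way to get this behaviour in Go is a condition variable that dequeuers wait on and enqueuers signal. The sketch below shows the idea on a toy in-memory queue; `blockingQueue` and `DequeueBlocking` are hypothetical names, not an existing bigqueue API.

```go
package main

import "sync"

// blockingQueue is an unbounded in-memory queue whose dequeue call waits
// instead of returning an error when the queue is empty.
type blockingQueue struct {
	mu       sync.Mutex
	notEmpty *sync.Cond
	items    [][]byte
}

func newBlockingQueue() *blockingQueue {
	q := &blockingQueue{}
	q.notEmpty = sync.NewCond(&q.mu)
	return q
}

// Enqueue never blocks on capacity because the queue can never be full.
func (q *blockingQueue) Enqueue(b []byte) {
	q.mu.Lock()
	q.items = append(q.items, b)
	q.mu.Unlock()
	q.notEmpty.Signal() // wake one waiting consumer, if any
}

// DequeueBlocking waits until an element is available and returns it.
func (q *blockingQueue) DequeueBlocking() []byte {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 { // loop guards against another consumer winning the race
		q.notEmpty.Wait()
	}
	b := q.items[0]
	q.items = q.items[1:]
	return b
}
```

A production API would probably also want a way to cancel the wait, for example when the queue is closed, which `sync.Cond` does not provide on its own.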

Document thread-safety

There's no documentation on the thread-safety of the API. Would be great to have some explicit statements there, describing the current state of things.

Microbenchmark

Add benchmarks to measure the performance of each function in BigQueue. This will answer the following questions -

Performance of read and write when size of the buffer is increased (with and without flush)
Performance for creating queues and closing them

Concurrent requests/reply IPC

I'm trying to improve the throughput of an app I'm building. Essentially, it currently uses Unix Domain Sockets to transfer messages between two processes. This is because they are well supported in many programming languages, and easy to use for request/reply.

But when I combine the two processes into a single process, I get 10x the TPS, so I know there is up to a 10x potential improvement.

My question is: can mmap do this, given these constraints:

Request/reply (like HTTP).
Concurrent, multiple request/replies in flight.

Limit memory size not available

bigqueue.SetArenaSize() defaults to 128 MB.

bigqueue.SetMaxInMemArenas() defaults to 3.

In theory, the maximum amount of enqueued data should be 128 MB * 3 = 384 MB, but in practice I can enqueue an unlimited amount of data.

How can I limit the enqueued data to 1024 MB?

Update APIs to use offsets

This is along the lines of how Kafka works. Given that bigqueue is persistent, it makes a lot of sense to not delete the data after it is read once. Certainly, it could be configured to do so, but that shouldn't be the default choice. Instead, we should allow using offsets per client so that the data could be read from anywhere in the queue as needed.
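
To make the offset idea concrete, the sketch below keeps every message and tracks a separate read position per client, so reading never deletes data and any client can seek to an arbitrary offset. `offsetLog`, `Next`, and `Seek` are hypothetical names for illustration; the real API shape is not specified in this issue.

```go
package main

import "errors"

var errOutOfRange = errors.New("offset out of range")

// offsetLog keeps all messages and one read position per client.
type offsetLog struct {
	msgs    [][]byte
	offsets map[string]int // next offset to read, keyed by client name
}

func newOffsetLog() *offsetLog {
	return &offsetLog{offsets: make(map[string]int)}
}

// Append stores a copy of the message and returns its offset.
func (l *offsetLog) Append(b []byte) int {
	l.msgs = append(l.msgs, append([]byte(nil), b...))
	return len(l.msgs) - 1
}

// Next returns the message at the client's offset and advances only that
// client's position. Nothing is deleted.
func (l *offsetLog) Next(client string) ([]byte, error) {
	off := l.offsets[client] // unknown clients start at offset 0
	if off >= len(l.msgs) {
		return nil, errOutOfRange
	}
	l.offsets[client] = off + 1
	return l.msgs[off], nil
}

// Seek moves a client to any offset between 0 and the end of the log.
func (l *offsetLog) Seek(client string, offset int) error {
	if offset < 0 || offset > len(l.msgs) {
		return errOutOfRange
	}
	l.offsets[client] = offset
	return nil
}
```

With this shape, the delete-after-read behaviour becomes a retention policy applied separately rather than a side effect of reading.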

Update benchmarks

Provide functions to read/write string directly to Arena

Currently, we need to first copy the string (say) into an array of bytes. Then, we will copy this array into the Arena. The double copy can be avoided.
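
Go's built-in `copy` accepts a string as its source when the destination is a byte slice, which is one way to avoid the intermediate `[]byte(s)` conversion. The sketch below treats the arena as a plain `[]byte`; `writeString` and `readString` are hypothetical helpers, and reading back as a string still needs one copy because Go strings are immutable.

```go
package main

import "errors"

var errNoSpace = errors.New("not enough space in arena")

// writeString copies s straight into the arena at offset, without first
// converting it to a temporary byte slice. It returns the bytes written.
func writeString(arena []byte, offset int, s string) (int, error) {
	if offset < 0 || len(s) > len(arena)-offset {
		return 0, errNoSpace
	}
	return copy(arena[offset:], s), nil // copy accepts a string source
}

// readString builds a string from length bytes of the arena at offset.
func readString(arena []byte, offset, length int) (string, error) {
	if offset < 0 || length < 0 || length > len(arena)-offset {
		return "", errNoSpace
	}
	return string(arena[offset : offset+length]), nil
}
```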

Build a queue lock

A lock that provides write, read, and delete locks.

Write Lock

Only one producer can write to BigQueue. Write is acceptable as long as it is at the tail of the queue.

Read Lock

Multiple consumers can simultaneously read from BigQueue. Read is acceptable as long as it is within the boundary of head to tail.

Delete Lock

This is fine; reads may fail if deletes are done before reads in the same region.
