akrylysov / pogreb

Embedded key-value store for read-heavy workloads written in Go

License: Apache License 2.0

Language: Go (100.00%)
Topics: go, hash-table, key-value, key-value-store

pogreb's People

Contributors

akrylysov, betawaffle, cristaloleg, jpillora, kajjagtenberg, leviathan1995, mattn, rfyiamcool, testwill


pogreb's Issues

Documentation Clarification: Rebuilding Daily

In the documentation, you say:

I needed to rebuild the mapping once a day and then access it in read-only mode.

This makes me wonder whether pogreb is intended to be used that way, or whether it was intended to solve the problem of having to do that.

Need information on a few points about pogreb

Hi team,

Can pogreb support terabytes of data in a single store?
And can it retrieve more than 500k values for a specific hash key using prefix iteration?

Thanks,
Vishal
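
For context on the second question: pogreb is a hash table, so its iterator returns items in no particular order, and a prefix scan means filtering every key. A sketch assuming the Items/Next iterator API (the prefix handling and key names are my own illustration):

package main

import (
	"bytes"
	"log"

	"github.com/akrylysov/pogreb"
)

// scanPrefix collects values whose keys start with prefix by filtering a
// full iteration; pogreb yields items in hash-table order, not key order.
func scanPrefix(db *pogreb.DB, prefix []byte) ([][]byte, error) {
	var out [][]byte
	it := db.Items()
	for {
		key, val, err := it.Next()
		if err == pogreb.ErrIterationDone {
			break
		}
		if err != nil {
			return nil, err
		}
		if bytes.HasPrefix(key, prefix) {
			out = append(out, val)
		}
	}
	return out, nil
}

func main() {
	db, err := pogreb.Open("test.db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	vals, err := scanPrefix(db, []byte("vishal:")) // hypothetical key prefix
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("matched %d values", len(vals))
}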

Panic, not error, when opening a locked or improperly closed DB

I ran this (on Linux) twice:

package main

import (
	"fmt"

	DB "github.com/akrylysov/pogreb"
)

func main() {
	db, err := DB.Open("test.db", &DB.Options{0, nil})
	fmt.Println(err)
	_ = db
	// db.Close() // intentionally omitted: leaves the lock file behind
}

The first run creates a lock file. The second panics with:

pogreb: Performing recovery...
pogreb: Index file size=1024; data file size=512
pogreb: Header dbInfo {level:0 count:0 nBuckets:1 splitBucketIdx:0 freelistOff:-1 hashSeed:2342761507}
panic: runtime error: slice bounds out of range

goroutine 1 [running]:
github.com/akrylysov/pogreb/fs.(*osfile).Slice(0xc4200ee320, 0x20000000000, 0x20000000200, 0x200, 0x200, 0xc420055cd0)
	/home/xenox/go/src/github.com/akrylysov/pogreb/fs/os.go:60 +0x60
github.com/akrylysov/pogreb.(*bucketHandle).read(0xc4200559c8, 0x0, 0x0)
	/home/xenox/go/src/github.com/akrylysov/pogreb/bucket.go:75 +0x56
github.com/akrylysov/pogreb.(*DB).forEachBucket(0xc420108000, 0xfff00000ffffffff, 0xc420055ce0, 0xc420055ce8, 0x0)
	/home/xenox/go/src/github.com/akrylysov/pogreb/db.go:149 +0xc4
github.com/akrylysov/pogreb.recoverSplitCrash(0xc420108000, 0x0, 0x0)
	/home/xenox/go/src/github.com/akrylysov/pogreb/recovery.go:60 +0xbf
github.com/akrylysov/pogreb.(*DB).recover(0xc420108000, 0x400, 0x0)
	/home/xenox/go/src/github.com/akrylysov/pogreb/recovery.go:140 +0x2e1
github.com/akrylysov/pogreb.Open(0x694821, 0x7, 0xc420055f60, 0x0, 0xc42000e020, 0x694b29)
	/home/xenox/go/src/github.com/akrylysov/pogreb/db.go:106 +0x66b
main.main()
	/home/xenox/y.go:8 +0x5e
exit status 2

Everything is fine (no error either) if either:

  • the first run includes the db.Close(), or
  • I manually remove the lock file between the first and second runs
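
For completeness, the defensive pattern on the caller's side looks like this (the panic above should arguably still be an error rather than a crash):

package main

import (
	"log"

	"github.com/akrylysov/pogreb"
)

// Always pair a successful Open with Close and check the error before using
// the handle; Close releases the lock file on a clean shutdown.
func main() {
	db, err := pogreb.Open("test.db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}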

4 billion records max?

I just realized that index.numKeys is a 32-bit uint, and there's MaxKeys = math.MaxUint32 😲

I think it would make sense to change it to 64 bits; is there any reason not to support a 64-bit record count? I assume the change would break existing DBs, but it still seems necessary.

At the very least, I would suggest clearly stating this limitation in the README.

Our use case is to store billions of records. We've already reached 2 billion records with pogreb, which means we'll hit the current upper limit in a matter of weeks 😢
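
For reference, a minimal application-side guard against the limit (guardPut is a hypothetical helper, not a pogreb API; it assumes the Count method):

package limits

import (
	"errors"
	"math"

	"github.com/akrylysov/pogreb"
)

// MaxKeys mirrors the 32-bit counter discussed above.
const MaxKeys = math.MaxUint32

// guardPut refuses writes once the record counter would overflow.
func guardPut(db *pogreb.DB, key, value []byte) error {
	if db.Count() >= MaxKeys {
		return errors.New("pogreb: MaxKeys reached")
	}
	return db.Put(key, value)
}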

Extremely slow read speed while put speed is fine on a Debian machine

Hi there, I'm currently testing whether pogreb fits my needs, and I'm very impressed by its speed. However, I recently ran some benchmarks (pogreb-benchmark) on a Debian server:
./pogreb-bench -n 10_000_000 -p ./pogreb_test/
and am seeing extremely slow reads:

put: 503.882s 19845 ops/s
I don't have a total duration for reads, since the run would take too long to finish, but it read about 630,000 items in 1,500 s.

I ran the same test on a MacBook, where everything works great.
Any idea how this is possible? What can I do to pinpoint the issue?

Edit:
I tried it without mmap:
put: 65.852s 151855 ops/s
get: 25.389s 393876 ops/s

However, the issue persists at n=100_000_000.

Any idea why this is faster?

Thanks a lot!
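
One way to bisect this kind of slowdown is to swap the file backend. A sketch assuming Options.FileSystem and the fs.Mem backend that pogreb's own tests use (whether a non-mmap on-disk backend is exposed depends on the version):

package main

import (
	"log"

	"github.com/akrylysov/pogreb"
	"github.com/akrylysov/pogreb/fs"
)

// Running the same workload against the in-memory backend rules out disk and
// mmap behavior as the bottleneck on a given machine.
func main() {
	db, err := pogreb.Open("bench.test", &pogreb.Options{FileSystem: fs.Mem})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}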

Large database truncate problem

We recently ran into a problem that prevents the database from growing. Every time we call db.Put, we get this error message:

truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation

The whole database folder is 255 GB; the file main.pix is 37.3 GB. We're running on Windows Server 2019 as admin, and the disk has plenty of free space (4 TB total).

Any idea of the root cause and how to fix it?

I suppose the error message originates here?

pogreb/file.go

Lines 79 to 86 in e182fb0

func (f *file) extend(size uint32) (int64, error) {
	off := f.size
	if err := f.Truncate(off + int64(size)); err != nil {
		return 0, err
	}
	f.size += int64(size)
	return off, f.Mmap(f.size)
}

Edit: unrelated to this problem, but truncate, as used by recoveryIterator.next, takes a uint32. Could that lead to problems down the road for large segment files?

pogreb/file.go

Lines 97 to 107 in e182fb0

func (f *file) truncate(size uint32) error {
	// Truncating memory-mapped file will fail on Windows. Unmap it first.
	if err := f.Mmap(0); err != nil {
		return err
	}
	if err := f.Truncate(int64(size)); err != nil {
		return err
	}
	f.size = int64(size)
	return f.Mmap(f.size)
}
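
For what it's worth, the concern in the edit checks out arithmetically; a quick sketch showing that a full 4 GiB segment no longer fits in a uint32:

package main

import (
	"fmt"
	"math"
)

func main() {
	var segment int64 = 4 << 30                   // 4 GiB = 4294967296 bytes
	fmt.Println(uint64(segment) > math.MaxUint32) // true: max uint32 is 4294967295
}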

Question about your loops

Hello, nice to meet you. I was experimenting with your library, and one thing jumped out during my review that I felt was worth bringing up.

You decided to do your loops like this:

for i := 0; i < slotsPerBucket; i++ {
		_ = data[18] // bounds check hint to compiler; see golang.org/issue/14808
		b.slots[i].hash = binary.LittleEndian.Uint32(data[:4])
		b.slots[i].keySize = binary.LittleEndian.Uint16(data[4:6])
		b.slots[i].valueSize = binary.LittleEndian.Uint32(data[6:10])
		b.slots[i].kvOffset = int64(binary.LittleEndian.Uint64(data[10:18]))
		data = data[18:]
}

This is a bit more of a Java or C style. You could have done it like this instead:

for i := range b.slots {
		slot := &b.slots[i] // take a pointer; ranging by value would modify a copy
		_ = data[18] // bounds check hint to compiler; see golang.org/issue/14808
		slot.hash = binary.LittleEndian.Uint32(data[:4])
		slot.keySize = binary.LittleEndian.Uint16(data[4:6])
		slot.valueSize = binary.LittleEndian.Uint32(data[6:10])
		slot.kvOffset = int64(binary.LittleEndian.Uint64(data[10:18]))
		data = data[18:]
}

I would put forth that this is more than an aesthetic choice: it is more readable and more maintainable, and applying it throughout the code would shrink the overall footprint a fair amount.

I didn't want to create a pull request before starting a discussion on the topic, in case I was overlooking something and you had a specific reason for this choice.

panic after restart

After a restart:

panic: runtime error: slice bounds out of range [:8511984455920089209] with capacity 1073741824

goroutine 1 [running]:
github.com/akrylysov/pogreb/fs.(*osfile).Slice(0xc0002ea3f0, 0x7620a4c3a4c37679, 0x7620a4c3a4c37879, 0xc0000b7b58, 0xc0000b7af8, 0xc0000b7b48, 0xc0009a9340, 0xc0000b7b50)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/fs/os.go:68 +0xa8
github.com/akrylysov/pogreb.(*bucketHandle).read(0xc0000b77d8, 0x20616c6c, 0x20616c6c61766174)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/bucket.go:76 +0x56
github.com/akrylysov/pogreb.(*DB).forEachBucket(0xc0002f01a0, 0xc000000009, 0xc0000b7b58, 0x8928a1, 0x419b36)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:178 +0xc4
github.com/akrylysov/pogreb.(*DB).put(0xc0002f01a0, 0x9d3cc9e9, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:384 +0x161
github.com/akrylysov/pogreb.(*DB).Put(0xc0002f01a0, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:366 +0x16a
gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler.InsertAllQue(0xc0001481c0, 0xc000586000, 0x63, 0x80, 0xc000aae000, 0x9c4)
/exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler/pogrebhandler.go:25 +0x14e
main.main()
/exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/cmd/rpcfeeder/main.go:274 +0x456
exit status 2

Add expiration

It would be great if key/value pairs could carry an expiration timestamp in milliseconds, so that compaction could drop expired values.

It is not quite easy, though: for optimization, segment metadata could carry some histograms, and compaction would also have to delete the expired entries from the hash index.
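
A rough sketch of the entry shape this proposal implies (hypothetical; pogreb has no expiration field today):

package store

// entry is a hypothetical record layout carrying the proposed deadline.
type entry struct {
	key, value []byte
	expiresAt  int64 // Unix milliseconds; 0 means no expiration
}

// expired reports whether compaction may drop the entry.
func (e entry) expired(nowMs int64) bool {
	return e.expiresAt != 0 && e.expiresAt <= nowMs
}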

bad test freelist_test.go

$ go test -v -bench=. -count 2

The tests panic in freelist_test.go.

To debug it, I modified some code:

func TestFreelistSerialization(t *testing.T) {
	l := freelist{[]block{{1, 1}, {2, 2}, {3, 3}, {10, 10}}}
	f, _ := openFile(fs.Mem, "test", 0, 0)

changed to:

func TestFreelistSerialization(t *testing.T) {
	l := freelist{[]block{{1, 1}, {2, 2}, {3, 3}, {10, 10}}}
	f, err := openFile(fs.Mem, "test", 0, 0)
	if err != nil {
		t.Fatal(err)
	}

openFile() fails with: freelist_test.go:130: file already exists

Is it safe for multiple Go instances to write?

Hi, from the documentation it's clear that the store can work with multiple goroutines inside a single application.
But can it work in scaled applications?

For example, I have N instances of a Go application, each with X goroutines.
N * X functions will write data to the DB file in parallel; is that safe?
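
For what it's worth, pogreb uses a lock file, so the expected behavior (an assumption on my part, worth verifying) is that goroutines within one process are fine while a second process fails to open the same path:

package main

import (
	"fmt"
	"log"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open("shared.db", nil) // first handle: succeeds
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	_, err = pogreb.Open("shared.db", nil) // second handle on the same path
	fmt.Println("second open:", err)       // expected: a lock error, not a usable handle
}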

How to omit pogreb output before the result of get?

How do I omit this pogreb output before getting my result?

❯ go run main.go getkv prm2
pogreb: moving non-segment files...
pogreb: moved 00000-1.psg.pmt to 00000-1.psg.pmt.bac
pogreb: moved db.pmt to db.pmt.bac
pogreb: moved index.pmt to index.pmt.bac
pogreb: moved main.pix to main.pix.bac
pogreb: moved overflow.pix to overflow.pix.bac
pogreb: error reading segment meta 0: EOF
pogreb: started recovery
pogreb: rebuilding index...
pogreb: removing recovery backup files...
pogreb: removed 00000-1.psg.pmt.bac
pogreb: removed db.pmt.bac
pogreb: removed index.pmt.bac
pogreb: removed main.pix.bac
pogreb: removed overflow.pix.bac
pogreb: successfully recovered database
conten123Test
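
A possible workaround, assuming the pogreb: lines come from a logger writing to stderr while the value itself goes to stdout (which the output above suggests): discard stderr in the shell.

go run main.go getkv prm2 2>/dev/null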

Low write performance on Windows

More details at #22.

Key size=16, value size=1.

Writing 1M items on Linux - 16 sec.
Writing 1M items on Mac - 14 sec.
Writing 1M items on Windows - 165 sec.

Document MaxKeys and workaround?

1 << 30 is 1,073,741,824, a fourth of the range of a uint32. One can't necessarily just create more DB files (assuming the hash function is available to route keys to the right database), because different operating systems have different file/inode limits and open file descriptor limits.

A related question is why this limit exists. I don't see how it could be related to density, because keys are byte arrays of any size. Is a file bigger than a terabyte outside the intended use cases?
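
For anyone sharding as a workaround, a minimal sketch (the FNV-1a hash here is illustrative and unrelated to pogreb's internal hash function):

package sharding

import (
	"hash/fnv"

	"github.com/akrylysov/pogreb"
)

// shardFor routes a key to one of several pogreb stores so that no single
// store approaches MaxKeys.
func shardFor(key []byte, shards []*pogreb.DB) *pogreb.DB {
	h := fnv.New32a()
	h.Write(key)
	return shards[h.Sum32()%uint32(len(shards))]
}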

High disk space occupation

For the same data, I need only about 300 MB with bbolt, but it takes up over 26 GB of disk space with pogreb. I don't understand why this is.
I am storing Roaring Bitmap data and using it for an inverted index: we first check whether the token is in the store, then check whether the document ID is in its bitmap; if it is not, we add it and write the bitmap back to the store.
Here is some example code.

func IndexDocuments(doc Document) error {
	var wg = sync.WaitGroup{}
	var err error
	for _, tk := range doc.TokenSlice {
		wg.Add(1)
		go func(tk string, err2 *error) {
			defer wg.Done()
			value, err := indexDB.Get([]byte(tk))
			if err != nil {
				*err2 = err
				return
			}
			if value == nil {
				rb := roaring.BitmapOf(doc.Id)
				data, err := rb.ToBytes()
				if err != nil {
					*err2 = err
					return
				}
				if err := indexDB.Put(utils.String2Bytes(tk), data); err != nil {
					*err2 = err
					return
				}
				return
			}
			rb, err := read(value)
			if err != nil {
				*err2 = err
				return
			}
			rb.Add(doc.Id)
			data, err := rb.ToBytes()
			if err != nil {
				*err2 = err
				return
			}
			if err := indexDB.Put(utils.String2Bytes(tk), data); err != nil {
				*err2 = err
				return
			}
		}(tk, &err)
	}
	wg.Wait()
	if err != nil {
		return err
	}
	return docDB.Put(utils.Uint2Bytes(doc.Id), utils.String2Bytes(doc.Word))
}
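
A hedged observation: pogreb's segment files are append-only, so every rewritten bitmap leaves its previous copy behind as garbage until compaction reclaims it, and a read-modify-write workload on hot tokens (note also that the goroutines above race on the shared err and on individual keys) can inflate the on-disk size far beyond the live data. A sketch of triggering compaction manually, assuming the Compact API and result fields of recent versions:

package indexstore

import (
	"log"

	"github.com/akrylysov/pogreb"
)

// compactNow reclaims space held by overwritten and deleted records.
func compactNow(db *pogreb.DB) {
	result, err := db.Compact()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("reclaimed %d records (%d bytes)", result.ReclaimedRecords, result.ReclaimedBytes)
}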

Data corruption due to slice internals exposed

Hi,
I tested pogreb with a very simple fuzzer I initially wrote for BigCache, with very small adaptations (which explains why the test is a bit wonky, calling the DB "cache", for example). Here's the program:

package main

import (
	"bytes"
	"context"
	"fmt"
	"github.com/akrylysov/pogreb"
	"math"
	"math/rand"
	"os"
	"os/signal"
	"sync"
	"syscall"
)
const (
	slotsPerBucket = 28
	loadFactor     = 0.7
	indexPostfix   = ".index"
	lockPostfix    = ".lock"
	version        = 1 // file format version

	// MaxKeyLength is the maximum size of a key in bytes.
	MaxKeyLength = 1 << 16

	// MaxValueLength is the maximum size of a value in bytes.
	MaxValueLength = 1 << 30

	// MaxKeys is the maximum numbers of keys in the DB.
	MaxKeys = math.MaxUint32
)


func removeAndOpen(path string, opts *pogreb.Options) ( *pogreb.DB, error) {
	os.Remove(path)
	os.Remove(path + indexPostfix)
	os.Remove(path + lockPostfix)
	return pogreb.Open(path, opts)
}


func fuzzDeletePutGet(ctx context.Context) {

	cache, err := removeAndOpen("test.db", nil)
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup

	// Deleter
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			default:
				r := uint8(rand.Int())
				key := fmt.Sprintf("thekey%d", r)
				cache.Delete([]byte(key))
			}
		}
	}()

	// Setter
	wg.Add(1)
	go func() {
		defer wg.Done()
		val := make([]byte, 1024)
		for {
			select {
			case <-ctx.Done():
				return
			default:
				r := byte(rand.Int())
				key := fmt.Sprintf("thekey%d", r)

				for j := 0; j < len(val); j++ {
					val[j] = r
				}
				cache.Put([]byte(key), []byte(val))
			}
		}
	}()

	// Getter
	wg.Add(1)
	go func() {
		defer wg.Done()
		var (
			val    = make([]byte, 1024)
			hits   = uint64(0)
			misses = uint64(0)
		)
		for {
			select {
			case <-ctx.Done():
				return
			default:
				r := byte(rand.Int())
				key := fmt.Sprintf("thekey%d", r)

				for j := 0; j < len(val); j++ {
					val[j] = r
				}
				if got, err := cache.Get([]byte(key)); got != nil && !bytes.Equal(got, val) {
					errStr := fmt.Sprintf("got %s ->\n %x\n expected:\n %x\n ", key, got, val)
					panic(errStr)
				} else {
					if err == nil {
						hits++
					} else {
						misses++
					}
				}
				if total := hits + misses; total%1000000 == 0 {
					percentage := float64(100) * float64(hits) / float64(total)
					fmt.Printf("Hits %d (%.2f%%) misses %d \n", hits, percentage, misses)
				}
			}
		}
	}()
	wg.Wait()

}
func main() {

	sigs := make(chan os.Signal, 1)
	ctx, cancel := context.WithCancel(context.Background())
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	fmt.Println("Press ctrl-c to exit")
	go fuzzDeletePutGet(ctx)

	<-sigs
	fmt.Println("Exiting...")
	cancel()

}

The program has three workers:

  • One that randomly deletes a key.
  • One that randomly writes a key, where there's a well-defined correlation between key and value.
  • One that randomly checks whether a key/value mapping is consistent.

When I ran it, it errored out after about 4M or 5M tests:

GOROOT=/rw/usrlocal/go #gosetup
GOPATH=/home/user/go #gosetup
/rw/usrlocal/go/bin/go build -o /tmp/___go_build_fuzzer_go /home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go #gosetup
/tmp/___go_build_fuzzer_go #gosetup
Press ctrl-c to exit
Hits 1000000 (100.00%) misses 0 
Hits 2000000 (100.00%) misses 0 
Hits 3000000 (100.00%) misses 0 
Hits 4000000 (100.00%) misses 0 
Hits 5000000 (100.00%) misses 0 
panic: got thekey112 ->
 b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6
 expected:
 70707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070
 

goroutine 10 [running]:
main.fuzzDeletePutGet.func3(0xc00001a650, 0x6ee480, 0xc0000601c0, 0xc00008b110)
	/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:108 +0x656
created by main.fuzzDeletePutGet
	/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:88 +0x17a

Looking into it a bit, I found that although the Get method is properly protected by a mutex, the returned value is in fact a slice into pogreb's internal storage, not a copy into a new buffer.

I hacked up a little fix:

diff --git a/db.go b/db.go
index 967bbf0..961add9 100644
--- a/db.go
+++ b/db.go
@@ -288,7 +288,12 @@ func (db *DB) Get(key []byte) ([]byte, error) {
        if err != nil {
                return nil, err
        }
-       return retValue, nil
+       var safeRetValue []byte
+       if retValue != nil{
+               safeRetValue = make([]byte, len(retValue))
+               copy(safeRetValue, retValue)
+       }
+       return safeRetValue, nil
 }
 
 // Has returns true if the DB contains the given key.

And with the attached fix, I couldn't reproduce it any longer (at least not for 10M+ tests).

The benchmarks without and with the hacky fix are:

BenchmarkGet-6   	10000000	       166 ns/op
BenchmarkGet-6   	10000000	       182 ns/op

Now, I'm not totally sure the test case is fair, as I'm not 100% sure what concurrency guarantees pogreb makes. My test has both a setter and a deleter, so basically two writers and one reader, which might not be a supported setup? (On the other hand, I'm guessing this flaw should be reproducible even with only one writer.)
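
Until a fix like the diff above lands, a caller-side workaround is to detach the returned slice immediately (sketch):

package kv

import "github.com/akrylysov/pogreb"

// getCopy returns a private copy of the value so later writes cannot mutate it.
func getCopy(db *pogreb.DB, key []byte) ([]byte, error) {
	got, err := db.Get(key)
	if err != nil || got == nil {
		return got, err
	}
	return append([]byte(nil), got...), nil
}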

Open/read does not fail on invalid file

Recently I realized I was opening the wrong database, and it took me an hour to figure out: (*DB).FileSize() was returning non-zero, (*DB).Count() was returning zero, and pogreb.Open() reported no errors. Is there a standard way to figure out whether a DB is invalid?

As a bonus, opening it like this also modifies the target file, even if it wasn't a correct/working database file to begin with.

Slice out of bounds

I wanted to test this DB, but I got this error:

panic: runtime error: slice bounds out of range [:1073742336] with length 1073741824

goroutine 1 [running]:
github.com/akrylysov/pogreb/fs.mmap(0xc00008c038, 0x40000200, 0x80000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        .../github.com/akrylysov/pogreb/fs/os_windows.go:32 +0x259
github.com/akrylysov/pogreb/fs.(*osfile).Mmap(0xc000068c90, 0x40000200, 0x200, 0x200)
        .../github.com/akrylysov/pogreb/fs/os.go:100 +0x6e
github.com/akrylysov/pogreb.(*file).append(0xc00004f140, 0xc0001b6800, 0x200, 0x200, 0x0, 0x0, 0x0)
        .../github.com/akrylysov/pogreb/file.go:45 +0xc7
github.com/akrylysov/pogreb.(*dataFile).writeKeyValue(0xc00004f140, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x3ffffe00, 0x0, 0x0)
        .../github.com/akrylysov/pogreb/datafile.go:44 +0x1a7
github.com/akrylysov/pogreb.(*DB).put(0xc00004f110, 0xc95a802f, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
        .../github.com/akrylysov/pogreb/db.go:432 +0x260
github.com/akrylysov/pogreb.(*DB).Put(0xc00004f110, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
        .../github.com/akrylysov/pogreb/db.go:366 +0x171
main.main()
        .../main.go:27 +0x1b3
exit status 2

Code:

package main

import (
	"encoding/binary"
	"log"
	"time"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open("pogreb.test", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	start := time.Now()
	var pk [8]byte
	for i := uint64(1); i <= 10000000; i++ {
		binary.BigEndian.PutUint64(pk[:], i)
		if err := db.Put(pk[:], pk[:]); err != nil {
			panic(err)
		}
	}

	log.Println("put 10M: ", time.Since(start).String())
}

I think the DB needs to do an automatic fsync when a file reaches 1 GB?
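
Pending an answer, a variant of the loop that forces periodic syncs during the bulk load (assuming the Sync method and Options.BackgroundSyncInterval available in current versions):

package main

import (
	"encoding/binary"
	"log"
	"time"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open("pogreb.test", &pogreb.Options{
		BackgroundSyncInterval: time.Second, // flush in the background as well
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var pk [8]byte
	for i := uint64(1); i <= 10000000; i++ {
		binary.BigEndian.PutUint64(pk[:], i)
		if err := db.Put(pk[:], pk[:]); err != nil {
			log.Fatal(err)
		}
		if i%1000000 == 0 {
			if err := db.Sync(); err != nil { // explicit flush every 1M puts
				log.Fatal(err)
			}
		}
	}
}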

Memory mapping all segment files causes memory exhaustion

We are storing billions of records using pogreb. It creates many 4 GB segment files (.psg).
It is my understanding that those files represent the write-ahead log (WAL), which is only used in case of recovery?

If that is indeed the case, then only the last WAL file needs to be kept open (for writing)?
Currently those files are literally exhausting our memory, using about 80 GB of RAM.


Using RamMap we found the culprit: memory-mapped .psg files.

Are the metrics optimized out if unused?

I don't suppose you or someone else would know the answer to this for the latest Go compilers? (Version 1.10 at the time of this comment, and assuming a GCC 8 release soon.)

Idea: make such things configurable...

error reading segment meta 0

Hi!

Every time I restart, I get this error:

pogreb: error reading segment meta 0: EOF

What does it mean, and how critical is it?

Replication

Hi, it's really fast.
Do you plan to develop any type of replication?

Need Guidance on Backing Up Running Database

Hey there,

I wanted to start by saying a big thank you for your library—it's been a real game-changer for us! The speed it provides is just incredible.

I'd love to know the best way to back up the database while it's running. Can you share some guidance or tips on how we can ensure a proper backup process without disrupting ongoing operations? We're also wondering whether it's possible to copy the database folder directly and expect everything to work seamlessly if we restore that folder onto another machine.

Thank you

Improve crash handling

I read the code and documentation and wanted to ask whether there is a specific reason why you discard the old index files and always recreate them. It sounds like a dangerous default, and an expensive one, especially in production environments.

In the event of a crash caused by a power loss or an operating system failure, Pogreb discards the index and replays the WAL building a new index from scratch. Segments are iterated from the oldest to the newest and items are inserted into the index.

My use case is to store billions of key-value pairs, and if I read the code correctly, any time it crashes for any reason, the lock file will be detected and pogreb will discard the index files (*.pix). Our current estimated indexing time is 8 days, over likely hundreds of GB. Having any reboot or crash cost a reindex of hundreds of GB and days of work doesn't make sense. Possible solutions:

  1. A new Options.ReindexOnCrash letting the user specify whether pogreb should try to re-open (on false) or immediately reindex everything (on true); or instead:
  2. Introduce Options.AutoReindexCorruptDatabase, which triggers a reindex only when openIndex returns an error. The lock file would be disregarded for crash detection, and pogreb would always try to open the existing database.

I believe the second option makes the most sense. In the event of a crash, most if not all users assume the database will just pick up where it left off, especially in production environments.

Some explanation of the internals?

Hello, I am trying to understand the internals of pogreb, but I cannot quite work out the semantics of certain aspects of the database: namely, the data storage layer, how it provides ACID semantics (if and to the extent the database supports them), and of course how it achieves the very impressive performance :) Could you please write a few words on the internals of pogreb? I am sure such information would be well received. Thank you.

REQUEST: Changes to allow testing & mocking

Many of the top-level structures in pogreb are concrete structs rather than interfaces, which makes them difficult to mock without extra effort in the consuming package. Changing some of these to interfaces would make mocking, and therefore unit testing, easier.

crc checksum for header (key + value sizes)

If the data gets corrupted, the key+value sizes could decode to huge values, leading to unnecessary allocations and unnecessary disk reads.

It would be better to add a checksum for the key+value size header to detect such corruption early.

The crc32 checksum for the data could then reside in the header as well, so there would be no need to allocate a buffer for both header and data.

I suppose the header could have the following structure:

keySize = 2 bytes
typeAndValueSize = 4 bytes
dataCRC = 4 bytes
headerCRC = 4 bytes

Since headerCRC is computed over the preceding fields, it protects dataCRC as well.
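
A sketch of encoding that proposal (illustrative only, not pogreb's current on-disk format):

package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// encodeHeader lays out the proposed 14-byte header; headerCRC is computed
// over the first ten bytes, so it covers dataCRC as well.
func encodeHeader(keySize uint16, typeAndValueSize, dataCRC uint32) []byte {
	buf := make([]byte, 14)
	binary.LittleEndian.PutUint16(buf[0:2], keySize)
	binary.LittleEndian.PutUint32(buf[2:6], typeAndValueSize)
	binary.LittleEndian.PutUint32(buf[6:10], dataCRC)
	binary.LittleEndian.PutUint32(buf[10:14], crc32.ChecksumIEEE(buf[:10]))
	return buf
}

func main() {
	fmt.Printf("%x\n", encodeHeader(3, 12, 0xDEADBEEF))
}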

DB is in an incorrect state after failed recovery

Hi!

We found that our database cannot start, failing with the error opening index: opening index meta: EOF. This happened after our nodes restarted several times in a row due to panics in our own code; these panics presumably happened while the DB was trying to recover.

To reproduce this problem on a clean DB you can:

  • Open a DB but do not close it; this leaves the lock file in the folder without any process holding a lock on it.
  • On the next open, this sets acquiredExistingLock to true, and all the index files get renamed with a .bac suffix.
  • The DB will run recovery, but if you crash the program before clean = nil in pogreb.Open, main.pix stays in the folder while the lock file is deleted by the defer.
  • Therefore, on the next start the condition if err := idx.readMeta(); err != nil { is hit: there is no index.pmt file, so readGobFile creates an empty one containing only a header (via f.writeHeader()), and dec.Decode(v) then fails.

Probably the solution is to rebuild the indexes from scratch when there is a problem opening them, but you probably know better :-)

Btw the library is great and it is really fast for our purposes, so kudos to you for making it!

Thanks a lot!

Is UTF-8 character encoding supported?

The API takes []byte, so this should be perfectly fine. However, when using UTF-8 data, the DB was corrupting the responses. Sorry, I don't have a concrete reproducible example; if I can isolate one, I will provide it.
