akrylysov / pogreb
Embedded key-value store for read-heavy workloads written in Go
License: Apache License 2.0
In the documentation, you say:
I needed to rebuild the mapping once a day and then access it in read-only mode.
This makes me wonder whether pogreb is intended to be used that way, or whether it was intended to solve the problem of having to do that.
Hi Team
I need to know whether pogreb can support terabytes of data in the data store,
and whether it can retrieve more than 500k values for a specific hash key using prefix iteration.
Thanks
Vishal
I ran this (on Linux) twice:
package main

import (
	"fmt"

	DB "github.com/akrylysov/pogreb"
)

func main() {
	db, err := DB.Open("test.db", &DB.Options{0, nil})
	_ = db
	fmt.Println(err)
	// db.Close()
}
The first run creates a lock file. The second panics with:
pogreb: Performing recovery...
pogreb: Index file size=1024; data file size=512
pogreb: Header dbInfo {level:0 count:0 nBuckets:1 splitBucketIdx:0 freelistOff:-1 hashSeed:2342761507}
panic: runtime error: slice bounds out of range
goroutine 1 [running]:
github.com/akrylysov/pogreb/fs.(*osfile).Slice(0xc4200ee320, 0x20000000000, 0x20000000200, 0x200, 0x200, 0xc420055cd0)
/home/xenox/go/src/github.com/akrylysov/pogreb/fs/os.go:60 +0x60
github.com/akrylysov/pogreb.(*bucketHandle).read(0xc4200559c8, 0x0, 0x0)
/home/xenox/go/src/github.com/akrylysov/pogreb/bucket.go:75 +0x56
github.com/akrylysov/pogreb.(*DB).forEachBucket(0xc420108000, 0xfff00000ffffffff, 0xc420055ce0, 0xc420055ce8, 0x0)
/home/xenox/go/src/github.com/akrylysov/pogreb/db.go:149 +0xc4
github.com/akrylysov/pogreb.recoverSplitCrash(0xc420108000, 0x0, 0x0)
/home/xenox/go/src/github.com/akrylysov/pogreb/recovery.go:60 +0xbf
github.com/akrylysov/pogreb.(*DB).recover(0xc420108000, 0x400, 0x0)
/home/xenox/go/src/github.com/akrylysov/pogreb/recovery.go:140 +0x2e1
github.com/akrylysov/pogreb.Open(0x694821, 0x7, 0xc420055f60, 0x0, 0xc42000e020, 0x694b29)
/home/xenox/go/src/github.com/akrylysov/pogreb/db.go:106 +0x66b
main.main()
/home/xenox/y.go:8 +0x5e
exit status 2
Everything is fine (no error either) if either db.Close() is called, or …
I just realized that index.numKeys is a 32-bit uint, and there's MaxKeys = math.MaxUint32
😲
I think it would make sense to change it to 64-bit (is there any reason not to support a 64-bit maximum number of records?). I assume it would break existing DBs (but it still seems necessary).
At the very least, I'd suggest clearly stating this limitation in the README.
Our use case is to store billions of records. We've reached already 2 billion records with Pogreb - which means in a matter of weeks we'll hit the current upper limit 😢
Hi there. I am currently testing whether pogreb fits my needs and am very impressed by its speed. However, I recently ran some benchmarks (pogreb-benchmark) on a Debian server:
./pogreb-bench -n 10_000_000 -p ./pogreb_test/
and am experiencing extremely slow read speed
put: 503.882s 19845 ops/s
I don't have a full duration for the read phase since it would take too long to finish, but it read about 630,000 entries in 1,500 s.
I also ran the same test on a MacBook, where everything works great.
Any idea how this is possible? What can I do to pinpoint the issue?
Edit:
I tried it without mmap:
put: 65.852s 151855 ops/s
get: 25.389s 393876 ops/s
However, the issue persists at n=100_000_000.
Any idea why this is faster?
Thanks a lot!
We recently ran into this problem, which prevents the database from growing. Every time we call db.Put
we get this error message:
truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
The whole database folder is 255 GB. The file main.pix
is 37.3 GB in size. Running on Windows Server 2019 as admin, and the disk has plenty of storage (4 TB total).
Any idea of the root cause and how to fix it?
I suppose the error message originates from here?
Lines 79 to 86 in e182fb0
Edit: Unrelated to this problem, but truncate
as used by recoveryIterator.next
uses uint32. Could that lead to problems down the road for large segment files?
Lines 97 to 107 in e182fb0
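For illustration, the hazard of narrowing a 64-bit file offset to uint32 can be shown with a stdlib-only sketch (the function name is made up for this example; it is not pogreb's actual code):

```go
package main

import "fmt"

// offsetAsUint32 models what happens when a 64-bit file offset is
// narrowed to uint32: only the low 32 bits survive.
func offsetAsUint32(off int64) uint32 {
	return uint32(off) // silent truncation for offsets >= 4 GiB
}

func main() {
	big := int64(5) << 30 // a 5 GiB offset, larger than max uint32
	fmt.Println(offsetAsUint32(big)) // prints 1073741824, not 5368709120
}
```

Any segment offset past 4 GiB would wrap around like this, which is why a uint32 in the recovery path looks suspicious for large segment files.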
The Sum32WithSeed
function in /hash/murmur32.go fails with "checkptr: unsafe pointer arithmetic" from Go 1.14 onwards, because checkptr instrumentation is now enabled automatically when building with -race.
This prevents pogreb from working under -race on any non-Windows platform with Go 1.14.
An example of more correct code can be found here
Hello, nice to meet you. I was experimenting with your library and noticed a few things in my review, but one thing jumped out that I felt was worth bringing up.
You decided to write your loops like this:
for i := 0; i < slotsPerBucket; i++ {
	_ = data[18] // bounds check hint to compiler; see golang.org/issue/14808
	b.slots[i].hash = binary.LittleEndian.Uint32(data[:4])
	b.slots[i].keySize = binary.LittleEndian.Uint16(data[4:6])
	b.slots[i].valueSize = binary.LittleEndian.Uint32(data[6:10])
	b.slots[i].kvOffset = int64(binary.LittleEndian.Uint64(data[10:18]))
	data = data[18:]
}
That is more of a Java or C style. You could instead write it with range (note that ranging by index and taking a pointer is required, since the range value would be a copy and assignments to it would be lost):
for i := range b.slots {
	slot := &b.slots[i]
	_ = data[18] // bounds check hint to compiler; see golang.org/issue/14808
	slot.hash = binary.LittleEndian.Uint32(data[:4])
	slot.keySize = binary.LittleEndian.Uint16(data[4:6])
	slot.valueSize = binary.LittleEndian.Uint32(data[6:10])
	slot.kvOffset = int64(binary.LittleEndian.Uint64(data[10:18]))
	data = data[18:]
}
I would put forth that it is a bit more than an aesthetic choice: it becomes both more readable and more maintainable, and this could be done throughout the code to reduce the overall footprint a fair bit.
I didn't want to create a pull request before starting a discussion on the topic, in case there was something I was overlooking and you had a specific reason for this choice.
Pogreb,
Is there any way to reclaim disk storage space after deletion of records?
After restart
`panic: runtime error: slice bounds out of range [:8511984455920089209] with capacity 1073741824
goroutine 1 [running]:
github.com/akrylysov/pogreb/fs.(*osfile).Slice(0xc0002ea3f0, 0x7620a4c3a4c37679, 0x7620a4c3a4c37879, 0xc0000b7b58, 0xc0000b7af8, 0xc0000b7b48, 0xc0009a9340, 0xc0000b7b50)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/fs/os.go:68 +0xa8
github.com/akrylysov/pogreb.(*bucketHandle).read(0xc0000b77d8, 0x20616c6c, 0x20616c6c61766174)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/bucket.go:76 +0x56
github.com/akrylysov/pogreb.(*DB).forEachBucket(0xc0002f01a0, 0xc000000009, 0xc0000b7b58, 0x8928a1, 0x419b36)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:178 +0xc4
github.com/akrylysov/pogreb.(*DB).put(0xc0002f01a0, 0x9d3cc9e9, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:384 +0x161
github.com/akrylysov/pogreb.(*DB).Put(0xc0002f01a0, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:366 +0x16a
gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler.InsertAllQue(0xc0001481c0, 0xc000586000, 0x63, 0x80, 0xc000aae000, 0x9c4)
/exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler/pogrebhandler.go:25 +0x14e
main.main()
/exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/cmd/rpcfeeder/main.go:274 +0x456
exit status 2`
It would be great if keys/values could carry an expiration timestamp in milliseconds, so that compaction could drop expired values.
It is not quite easy though: for optimization, segment metadata could include histograms, and compaction would also need to delete expired entries from the hash index.
$ go test -v -bench=. -count 2
The test panics in freelist_test.go. To debug it, I modified some code:
func TestFreelistSerialization(t *testing.T) {
	l := freelist{[]block{{1, 1}, {2, 2}, {3, 3}, {10, 10}}}
	f, _ := openFile(fs.Mem, "test", 0, 0)
changed to:
func TestFreelistSerialization(t *testing.T) {
	l := freelist{[]block{{1, 1}, {2, 2}, {3, 3}, {10, 10}}}
	f, err := openFile(fs.Mem, "test", 0, 0)
	if err != nil {
		t.Fatal(err)
	}
openFile() fails: freelist_test.go:130: file already exists
Hi, from the documentation it's clear that the storage can work with multiple goroutines inside one application.
But can it work in scaled applications?
For example, I have N instances of a Go application, each with X goroutines.
N * X workers will write data to the DB file in parallel; is that safe?
Basically I want to store ~10 TB. The current DB size is 340 GB and growing by 150-170 GB per day. The thing is, I don't want it to die halfway.
So are there any DB size limitations, or does storing data as 4 GB chunks allow it to scale indefinitely?
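As a rough sanity check on what fixed-size segments imply for file counts (the helper below is illustrative, assuming 4 GiB segments; it is not pogreb code), 10 TB means thousands of segment files, which is where OS file-descriptor and inode limits start to matter:

```go
package main

import "fmt"

// segmentsNeeded computes how many fixed-size segment files a dataset
// of totalBytes would occupy, rounding up.
func segmentsNeeded(totalBytes, segmentSize int64) int64 {
	return (totalBytes + segmentSize - 1) / segmentSize
}

func main() {
	const tb = int64(1) << 40  // 1 TiB
	const seg = int64(4) << 30 // 4 GiB per segment
	fmt.Println(segmentsNeeded(10*tb, seg)) // 2560
}
```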
How do I suppress this pogreb output before getting my result?
❯ go run main.go getkv prm2
pogreb: moving non-segment files...
pogreb: moved 00000-1.psg.pmt to 00000-1.psg.pmt.bac
pogreb: moved db.pmt to db.pmt.bac
pogreb: moved index.pmt to index.pmt.bac
pogreb: moved main.pix to main.pix.bac
pogreb: moved overflow.pix to overflow.pix.bac
pogreb: error reading segment meta 0: EOF
pogreb: started recovery
pogreb: rebuilding index...
pogreb: removing recovery backup files...
pogreb: removed 00000-1.psg.pmt.bac
pogreb: removed db.pmt.bac
pogreb: removed index.pmt.bac
pogreb: removed main.pix.bac
pogreb: removed overflow.pix.bac
pogreb: successfully recovered database
conten123Test
More details at #22.
Key size=16, value size=1.
Writing 1M items on Linux - 16 sec.
Writing 1M items on Mac - 14 sec.
Writing 1M items on Windows - 165 sec.
1 << 30 is 1,073,741,824, a fourth of the range of a uint32. One can't necessarily just create more DB files (assuming the hash function is available to direct lookups to the right database), because different operating systems have different file/inode limits and open file descriptor limits.
A related question is why this limit exists at all. I don't see how it'd be related to density, because keys are arrays of any size. Is a file bigger than a terabyte outside the intended use cases?
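The arithmetic in the quote can be sanity-checked directly (a trivial illustration, nothing pogreb-specific):

```go
package main

import (
	"fmt"
	"math"
)

// quarters reports how many steps of `step` bytes fit in the full
// uint32 range (2^32 values).
func quarters(step int64) int64 {
	return (int64(math.MaxUint32) + 1) / step
}

func main() {
	const step = int64(1) << 30 // the 1 GiB constant under discussion
	fmt.Println(step, quarters(step)) // 1073741824 4
}
```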
For the same data I only need 300 MB with bbolt, but it takes up over 26 GB of disk space with pogreb. I don't understand why this is.
I am storing Roaring Bitmap data and using it for an inverted index. We first check whether the term is in the index, then check whether the document ID is in the bitmap; if it is not, we add it and store the bitmap back.
Here is some example code.
func IndexDocuments(doc Document) error {
	var wg = sync.WaitGroup{}
	var err error
	for _, tk := range doc.TokenSlice {
		wg.Add(1)
		go func(tk string, err2 *error) {
			defer wg.Done()
			value, err := indexDB.Get([]byte(tk))
			if err != nil {
				*err2 = err
				return
			}
			if value == nil {
				rb := roaring.BitmapOf(doc.Id)
				data, err := rb.ToBytes()
				if err != nil {
					*err2 = err
					return
				}
				if err := indexDB.Put(utils.String2Bytes(tk), data); err != nil {
					*err2 = err
					return
				}
				return
			}
			rb, err := read(value)
			if err != nil {
				*err2 = err
				return
			}
			rb.Add(doc.Id)
			data, err := rb.ToBytes()
			if err != nil {
				*err2 = err
				return
			}
			if err := indexDB.Put(utils.String2Bytes(tk), data); err != nil {
				*err2 = err
				return
			}
		}(tk, &err)
	}
	wg.Wait()
	if err != nil {
		return err
	}
	return docDB.Put(utils.Uint2Bytes(doc.Id), utils.String2Bytes(doc.Word))
}
Very slow on Windows. What's the reason? Only 5 MB/s write speed.
Hi.
db.Get() returns no error if the key does not exist 🤨
Hi,
I tested pogreb with a very simple fuzzer that I initially wrote for bigCache,
with very small adaptations (which explains why the test is a bit wonky, e.g. calling it "cache"). Here's the program:
package main

import (
	"bytes"
	"context"
	"fmt"
	"math"
	"math/rand"
	"os"
	"os/signal"
	"sync"
	"syscall"

	"github.com/akrylysov/pogreb"
)

const (
	slotsPerBucket = 28
	loadFactor     = 0.7
	indexPostfix   = ".index"
	lockPostfix    = ".lock"
	version        = 1 // file format version
	// MaxKeyLength is the maximum size of a key in bytes.
	MaxKeyLength = 1 << 16
	// MaxValueLength is the maximum size of a value in bytes.
	MaxValueLength = 1 << 30
	// MaxKeys is the maximum numbers of keys in the DB.
	MaxKeys = math.MaxUint32
)

func removeAndOpen(path string, opts *pogreb.Options) (*pogreb.DB, error) {
	os.Remove(path)
	os.Remove(path + indexPostfix)
	os.Remove(path + lockPostfix)
	return pogreb.Open(path, opts)
}

func fuzzDeletePutGet(ctx context.Context) {
	cache, err := removeAndOpen("test.db", nil)
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	// Deleter
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			default:
				r := uint8(rand.Int())
				key := fmt.Sprintf("thekey%d", r)
				cache.Delete([]byte(key))
			}
		}
	}()
	// Setter
	wg.Add(1)
	go func() {
		defer wg.Done()
		val := make([]byte, 1024)
		for {
			select {
			case <-ctx.Done():
				return
			default:
				r := byte(rand.Int())
				key := fmt.Sprintf("thekey%d", r)
				for j := 0; j < len(val); j++ {
					val[j] = r
				}
				cache.Put([]byte(key), []byte(val))
			}
		}
	}()
	// Getter
	wg.Add(1)
	go func() {
		defer wg.Done()
		var (
			val    = make([]byte, 1024)
			hits   = uint64(0)
			misses = uint64(0)
		)
		for {
			select {
			case <-ctx.Done():
				return
			default:
				r := byte(rand.Int())
				key := fmt.Sprintf("thekey%d", r)
				for j := 0; j < len(val); j++ {
					val[j] = r
				}
				if got, err := cache.Get([]byte(key)); got != nil && !bytes.Equal(got, val) {
					errStr := fmt.Sprintf("got %s ->\n %x\n expected:\n %x\n ", key, got, val)
					panic(errStr)
				} else {
					if err == nil {
						hits++
					} else {
						misses++
					}
				}
				if total := hits + misses; total%1000000 == 0 {
					percentage := float64(100) * float64(hits) / float64(total)
					fmt.Printf("Hits %d (%.2f%%) misses %d \n", hits, percentage, misses)
				}
			}
		}
	}()
	wg.Wait()
}

func main() {
	sigs := make(chan os.Signal, 1)
	ctx, cancel := context.WithCancel(context.Background())
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	fmt.Println("Press ctrl-c to exit")
	go fuzzDeletePutGet(ctx)
	<-sigs
	fmt.Println("Exiting...")
	cancel()
}
The program has three workers: a deleter, a setter, and a getter.
When I ran it, it errored out after about 4M or 5M tests:
GOROOT=/rw/usrlocal/go #gosetup
GOPATH=/home/user/go #gosetup
/rw/usrlocal/go/bin/go build -o /tmp/___go_build_fuzzer_go /home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go #gosetup
/tmp/___go_build_fuzzer_go #gosetup
Press ctrl-c to exit
Hits 1000000 (100.00%) misses 0
Hits 2000000 (100.00%) misses 0
Hits 3000000 (100.00%) misses 0
Hits 4000000 (100.00%) misses 0
Hits 5000000 (100.00%) misses 0
panic: got thekey112 ->
b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6
b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6b6
expected:
70707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070707070
707070707070707070707070707070707070707070707070
goroutine 10 [running]:
main.fuzzDeletePutGet.func3(0xc00001a650, 0x6ee480, 0xc0000601c0, 0xc00008b110)
/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:108 +0x656
created by main.fuzzDeletePutGet
/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:88 +0x17a
Looking into it a bit, I found that although the Get
method is properly mutexed, the value
is in fact a pointer into a shared slice, and not copied out into a new buffer.
I hacked on a little fix:
diff --git a/db.go b/db.go
index 967bbf0..961add9 100644
--- a/db.go
+++ b/db.go
@@ -288,7 +288,12 @@ func (db *DB) Get(key []byte) ([]byte, error) {
if err != nil {
return nil, err
}
- return retValue, nil
+ var safeRetValue []byte
+ if retValue != nil{
+ safeRetValue = make([]byte, len(retValue))
+ copy(safeRetValue, retValue)
+ }
+ return safeRetValue, nil
}
// Has returns true if the DB contains the given key.
And with the attached fix, I couldn't reproduce it any longer (at least not for 10M+ tests).
The benchmarks without and with the hacky fix are:
BenchmarkGet-6 10000000 166 ns/op
BenchmarkGet-6 10000000 182 ns/op
Now, I'm not totally sure the test case is fair, as I'm not 100% sure what concurrency guarantees pogreb
makes. My test has both a setter and a deleter,
so basically two writers and one reader, which might not be a supported setup. (On the other hand, I'm guessing this flaw should be reproducible even with only one writer.)
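Independent of pogreb, the aliasing mechanism behind this bug can be reproduced with plain slices: returning a sub-slice hands the caller a window into the backing array, so a later write shows through, while copying (as the patch does) detaches it. Function names here are illustrative only:

```go
package main

import (
	"bytes"
	"fmt"
)

// getAliased returns a window into the backing array: the caller sees
// any later writes to that region, which is the bug described above.
func getAliased(backing []byte, off, n int) []byte {
	return backing[off : off+n]
}

// getCopied copies the bytes out into a fresh buffer, like the patch.
func getCopied(backing []byte, off, n int) []byte {
	out := make([]byte, n)
	copy(out, backing[off:off+n])
	return out
}

func main() {
	backing := []byte("hello world")
	aliased := getAliased(backing, 0, 5)
	copied := getCopied(backing, 0, 5)
	copy(backing, "HELLO") // a concurrent writer mutates the region
	fmt.Println(string(aliased), string(copied)) // HELLO hello
	fmt.Println(bytes.Equal(aliased, copied))    // false
}
```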
I am going to try to use your DB with https://github.com/vilterp/treesql
Recently I realized I was opening the wrong database, and it took me an hour to figure out because (*DB).FileSize()
was returning non-zero, (*DB).Count()
was returning zero, and no errors were reported by (*DB).Open().
Is there no standard way to figure out whether the DB is invalid?
As a bonus, opening will also modify the target file even if it wasn't a correct/working database file to begin with.
I wanted to test this db but I got this error:
panic: runtime error: slice bounds out of range [:1073742336] with length 1073741824
goroutine 1 [running]:
github.com/akrylysov/pogreb/fs.mmap(0xc00008c038, 0x40000200, 0x80000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
.../github.com/akrylysov/pogreb/fs/os_windows.go:32 +0x259
github.com/akrylysov/pogreb/fs.(*osfile).Mmap(0xc000068c90, 0x40000200, 0x200, 0x200)
.../github.com/akrylysov/pogreb/fs/os.go:100 +0x6e
github.com/akrylysov/pogreb.(*file).append(0xc00004f140, 0xc0001b6800, 0x200, 0x200, 0x0, 0x0, 0x0)
.../github.com/akrylysov/pogreb/file.go:45 +0xc7
github.com/akrylysov/pogreb.(*dataFile).writeKeyValue(0xc00004f140, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x3ffffe00, 0x0, 0x0)
.../github.com/akrylysov/pogreb/datafile.go:44 +0x1a7
github.com/akrylysov/pogreb.(*DB).put(0xc00004f110, 0xc95a802f, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
.../github.com/akrylysov/pogreb/db.go:432 +0x260
github.com/akrylysov/pogreb.(*DB).Put(0xc00004f110, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
.../github.com/akrylysov/pogreb/db.go:366 +0x171
main.main()
.../main.go:27 +0x1b3
exit status 2
Code:
package main

import (
	"encoding/binary"
	"log"
	"time"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open("pogreb.test", nil)
	if err != nil {
		log.Fatal(err)
		return
	}
	defer db.Close()
	start := time.Now()
	var pk [8]byte
	for i := uint64(1); i <= 10000000; i++ {
		binary.BigEndian.PutUint64(pk[:], i)
		if err := db.Put(pk[:], pk[:]); err != nil {
			panic(err)
		}
	}
	log.Println("put 10M: ", time.Since(start).String())
}
I think the DB needs to do an automatic fsync when a file reaches 1 GB?
We are storing billions of records using Pogreb. It creates many 4GB segment files (.PSG).
It is my understanding that those files represent the write-ahead log (WAL), which is only used in case of recovery?
If that is indeed the case, then only the last WAL file needs to be open (for writing)?
Currently those files are literally exhausting our memory, using about 80 GB of RAM.
Using RamMap we found the culprit: memory-mapped PSG files:
I don't suppose you or someone would know the answer to this for the latest Go compilers? (version 1.10 at time of comment and assuming a GCC 8 release soon)
Idea: make such things configurable...
If I Get by key and receive an empty value, how can I tell that the key exists rather than just getting nil?
hi!
Every time I restart, I get this error:
pogreb: error reading segment meta 0: EOF
What does it mean, and how critical is it?
Pogreb is really a nice database solution, but as far as I know whitedb is the fastest in the world, though it is written in C.
https://github.com/priitj/whitedb
Thx and
best regards
Hi, it's really fast.
Do you plan to develop any type of replication?
Hey there,
I wanted to start by saying a big thank you for your library—it's been a real game-changer for us! The speed it provides is just incredible.
I'd love to know the best way to back up the database while it's running. Can you share some guidance or tips on how we can ensure a proper backup process without disrupting ongoing operations? We're wondering if it's possible to copy the database folder directly and expect everything to work seamlessly if we restore that folder onto another machine.
Thank you
I read the code and documentation and wanted to ask whether there is a specific reason why you are discarding the old index files and always recreating them. It seems like a dangerous and expensive default, especially in production environments.
In the event of a crash caused by a power loss or an operating system failure, Pogreb discards the index and replays the WAL building a new index from scratch. Segments are iterated from the oldest to the newest and items are inserted into the index.
My use case is to store billions of key-values, and if I read the code correctly, anytime it crashes for any reason, the lock
file is detected and causes Pogreb to discard the index files (*.pix).
The current estimated indexing time is 8 days and likely hundreds of GB. A reboot/crash causing a reindex of hundreds of GB and days of work doesn't make sense. Possible solutions:
1. Options.ReindexOnCrash to let the user specify whether (on false) it should try to re-open, or (on true) immediately reindex everything; or instead:
2. Options.AutoReindexCorruptDatabase, which triggers a reindex only in case openIndex
returns an error. The lock file would be disregarded for crash detection, and Pogreb would always try to open the existing database.
I believe the second option makes the most sense. In case of crashes, most if not all users assume the database will just pick up where it left off, especially in production environments.
Hello, I am trying to understand the internals of pogreb, but unfortunately I cannot seem to understand the semantics of certain aspects of the database: namely, the data storage aspects, how they provide ACID semantics (if and to the extent supported by the database), and of course the very impressive performance :) Could you please write a few words on the internals of pogreb? I am sure such information would be well received. Thank you.
I compile the project for Windows 64-bit or macOS 64-bit and it works fine.
When I set GOOS=windows and GOARCH=386, the project fails to compile with the following error.
[email protected]\fs\os_mmap_windows.go:23:12: constant 2147483648 overflows int
[email protected]\fs\os_mmap_windows.go:23:12: array bound is too large
Many of the top structures in pogreb are concrete structs rather than interfaces, making them difficult to mock without extra effort in the consuming package. Changing some of these to interfaces would make mocking, and therefore unit testing, easier.
Hi, can you also share write performance benchmarks against badger, bolt and others, as provided for reads?
If data is corrupted, the key+value size could be decoded into a large value: unnecessary allocations and unnecessary disk reads will follow.
It would be better to add a checksum for the key+value size header to detect such corruption early.
The crc32 checksum for the data could then reside in the header as well, so there would be no need to allocate a buffer for both header and data.
I suppose the header could have the following structure:
keySize = 2 bytes
typeAndValueSize = 4 bytes
dataCRC = 4 bytes
headerCRC = 4 bytes
Therefore headerCRC will check dataCRC as well.
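A minimal sketch of the proposed 14-byte layout, assuming little-endian encoding and IEEE CRC-32 (both assumptions of mine; only the field names and sizes come from the proposal). Because headerCRC covers the first 10 bytes, it protects dataCRC as well:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// encodeHeader builds the proposed header:
// keySize (2) + typeAndValueSize (4) + dataCRC (4) + headerCRC (4).
// headerCRC is computed over the first 10 bytes, so it also covers dataCRC.
func encodeHeader(keySize uint16, typeAndValueSize uint32, data []byte) []byte {
	h := make([]byte, 14)
	binary.LittleEndian.PutUint16(h[0:2], keySize)
	binary.LittleEndian.PutUint32(h[2:6], typeAndValueSize)
	binary.LittleEndian.PutUint32(h[6:10], crc32.ChecksumIEEE(data))
	binary.LittleEndian.PutUint32(h[10:14], crc32.ChecksumIEEE(h[:10]))
	return h
}

// verifyHeader reports whether the header itself is intact, allowing a
// reader to reject a corrupted size field before any allocation.
func verifyHeader(h []byte) bool {
	if len(h) != 14 {
		return false
	}
	return binary.LittleEndian.Uint32(h[10:14]) == crc32.ChecksumIEEE(h[:10])
}

func main() {
	h := encodeHeader(5, 11, []byte("hello world"))
	fmt.Println(verifyHeader(h)) // true
	h[0] ^= 0xFF                 // simulate corruption of keySize
	fmt.Println(verifyHeader(h)) // false
}
```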
Hi!
We encountered that our database cannot start, with the following error: opening index: opening index meta: EOF.
This happened after our nodes continuously restarted several times due to some panics in our code. These panics presumably happened while the DB tried to recover.
To reproduce this problem on a clean DB you can:
1. Make the lock file stay in the folder without a process holding a lock on it. This sets acquiredExistingLock to true, and all the index files are renamed with .bac at the end.
2. Let the DB recover, but crash the program before clean = nil in pogreb.Open. This makes main.pix stay in the folder, but with the lock file deleted due to the defer being called.
3. The next open reaches if err := idx.readMeta(); err != nil {, where there is no index.pmt file; thus an empty one is created in readGobFile with only a header due to f.writeHeader(), but decoding then fails at dec.Decode(v).
Probably the solution would be to rebuild the indexes from scratch if there is a problem opening them, but you probably know better :-)
Btw the library is great and it is really fast for our purposes, so kudos to you for making it!
Thanks a lot!
It has a []byte interface, so this should be perfectly fine. However, when using UTF-8 data the DB was corrupting the responses. Sorry, I don't have a concrete reproducible example; if I can isolate it, I will provide one.
Details at ethereum/go-ethereum#20029.
When storing small keys/values Pogreb wastes too much space by making all writes 512-byte aligned.
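To quantify that overhead: a small record rounded up to a 512-byte boundary is mostly padding. A quick illustration (the 17-byte record below, a 16-byte key plus a 1-byte value ignoring any headers, is my own example):

```go
package main

import "fmt"

// alignedSize rounds n up to the next multiple of align.
func alignedSize(n, align int) int {
	return (n + align - 1) / align * align
}

func main() {
	record := 16 + 1 // 16-byte key + 1-byte value, headers ignored
	padded := alignedSize(record, 512)
	fmt.Println(padded) // 512
	fmt.Printf("%.1f%% padding\n", 100*float64(padded-record)/float64(padded))
}
```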