holiman / billy

Very simple datastore

License: BSD 3-Clause "New" or "Revised" License

Go 99.17% Shell 0.83%

billy's Issues

Make OnDataFn return error, delete item if so?

When doing the initial indexing, in theory it can happen that an item is non-parseable (e.g. because we changed the data format we store on disk). In that case, we won't be able to parse the item in OnDataFn, and it would be nice to be able to delete it instead.

Since startup and deletion might conflict, it would perhaps be simpler to allow returning an error, which would implicitly remove the item from disk. Currently the best I can do is "ignore" the item on startup, collect the "faulty" indexes, and then iterate over them and delete them one by one, which feels a bit wonky.

Then again, having a returned error trigger a deletion is also a bit wonky if there's just a parsing bug or similar weirdness. Open for discussion.
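
For discussion, here is a minimal sketch of the "collect, then delete" workaround described above. Everything in it is a hypothetical stand-in rather than billy's actual API: the in-memory `store`, its `open` and `remove` functions, the `onDataFn` signature, and the `parseItem` format check are all assumptions made to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// onDataFn stands in for a startup callback invoked once per stored item.
// The signature is an assumption for this sketch.
type onDataFn func(key uint64, data []byte)

// store is a hypothetical in-memory stand-in for the on-disk database.
type store struct {
	items map[uint64][]byte
}

// open replays every stored item through onData, in ascending key order.
func open(items map[uint64][]byte, onData onDataFn) *store {
	keys := make([]uint64, 0, len(items))
	for k := range items {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		onData(k, items[k])
	}
	return &store{items: items}
}

// remove deletes a single item by key.
func (s *store) remove(key uint64) { delete(s.items, key) }

// parseItem is a hypothetical parser: it accepts only items that start with 'v'.
func parseItem(data []byte) error {
	if len(data) == 0 || data[0] != 'v' {
		return errors.New("unknown item format")
	}
	return nil
}

// openAndPrune opens the store, remembers the keys that fail to parse during
// the initial indexing, and deletes them only after indexing has finished.
func openAndPrune(items map[uint64][]byte) (*store, []uint64) {
	var faulty []uint64
	s := open(items, func(key uint64, data []byte) {
		if err := parseItem(data); err != nil {
			faulty = append(faulty, key) // remember only; no deletion during startup
		}
	})
	for _, key := range faulty {
		s.remove(key)
	}
	return s, faulty
}

func main() {
	s, faulty := openAndPrune(map[uint64][]byte{
		1: []byte("v1"),
		2: []byte("??"),
		3: []byte("v2"),
	})
	fmt.Println("items left:", len(s.items), "deleted keys:", faulty)
}
```

If the callback returned an error instead, the `faulty` slice and the second loop would move into the database itself, which is the trade-off this issue is asking about.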

Flaw found by fuzzer

[user@work billy]$ go run ./cmd/billyfuzz/ 
Opened ./
1 ops, 1 keys active
Reopening db, ops 1728, keys 844
Opened ./
Reopening db, ops 2612, keys 1001
Opened ./
2613 ops, 1002 keys active
panic: bad index: shelf 32768, slot 10, tail 0

goroutine 1 [running]:
main.doFuzz(0xc0000028c0?)
        /home/user/go/src/github.com/holiman/billy/cmd/billyfuzz/main.go:143 +0xe1b
github.com/urfave/cli/v2.(*Command).Run(0xc0000028c0, 0xc00002e380, {0xc000014240, 0x1, 0x1})
        /home/user/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:271 +0xa42
github.com/urfave/cli/v2.(*App).RunContext(0xc00017c000, {0x5ed3c0?, 0xc000018110}, {0xc000014240, 0x1, 0x1})
        /home/user/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:333 +0x665
github.com/urfave/cli/v2.(*App).Run(...)
        /home/user/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:310
main.main()
        /home/user/go/src/github.com/holiman/billy/cmd/billyfuzz/main.go:50 +0x1c5
exit status 2

and

[user@work billy]$ rm *.bag; go run ./cmd/billyfuzz/ 
Opened ./
1 ops, 1 keys active
Reopening db, ops 1399, keys 684
Opened ./
Reopening db, ops 2377, keys 1005
Opened ./
2378 ops, 1005 keys active
panic: bad index: EOF

goroutine 1 [running]:
main.doFuzz(0xc0000d0780?)
        /home/user/go/src/github.com/holiman/billy/cmd/billyfuzz/main.go:128 +0xf9b
github.com/urfave/cli/v2.(*Command).Run(0xc0000d0780, 0xc0000b4340, {0xc00009e200, 0x1, 0x1})
        /home/user/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:271 +0xa42
github.com/urfave/cli/v2.(*App).RunContext(0xc00012e000, {0x5ed3c0?, 0xc0000ba000}, {0xc00009e200, 0x1, 0x1})
        /home/user/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:333 +0x665
github.com/urfave/cli/v2.(*App).Run(...)
        /home/user/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:310
main.main()
        /home/user/go/src/github.com/holiman/billy/cmd/billyfuzz/main.go:50 +0x1c5

Most likely the flaw is somewhere in the compaction algorithm. It should be possible to reproduce it on a single shelf and narrow it down iteratively.
It might be a good idea to use the billy.Database interface to add an (optional) logging interface, to make a repro.
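
One way the logging idea could look is a wrapper that records every operation so a failing fuzz run can be replayed. This is a sketch only: the `Store` interface below is a hypothetical three-method subset and is not claimed to match billy.Database, and `memStore` exists purely to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// Store is a hypothetical subset of a database interface (an assumption).
type Store interface {
	Put(data []byte) (uint64, error)
	Get(key uint64) ([]byte, error)
	Delete(key uint64) error
}

// memStore is a trivial in-memory Store used only to make the sketch runnable.
type memStore struct {
	next  uint64
	items map[uint64][]byte
}

func newMemStore() *memStore { return &memStore{items: make(map[uint64][]byte)} }

func (m *memStore) Put(data []byte) (uint64, error) {
	key := m.next
	m.next++
	m.items[key] = append([]byte(nil), data...)
	return key, nil
}

func (m *memStore) Get(key uint64) ([]byte, error) {
	data, ok := m.items[key]
	if !ok {
		return nil, errors.New("not found")
	}
	return data, nil
}

func (m *memStore) Delete(key uint64) error {
	if _, ok := m.items[key]; !ok {
		return errors.New("not found")
	}
	delete(m.items, key)
	return nil
}

// loggingStore wraps a Store and writes one line per operation to w.
type loggingStore struct {
	inner Store
	w     io.Writer
}

func (l *loggingStore) Put(data []byte) (uint64, error) {
	key, err := l.inner.Put(data)
	fmt.Fprintf(l.w, "put size=%d key=%d err=%v\n", len(data), key, err)
	return key, err
}

func (l *loggingStore) Get(key uint64) ([]byte, error) {
	data, err := l.inner.Get(key)
	fmt.Fprintf(l.w, "get key=%d size=%d err=%v\n", key, len(data), err)
	return data, err
}

func (l *loggingStore) Delete(key uint64) error {
	err := l.inner.Delete(key)
	fmt.Fprintf(l.w, "del key=%d err=%v\n", key, err)
	return err
}

// demo runs a short fixed sequence against a logged store and returns the log.
func demo() string {
	var log strings.Builder
	var db Store = &loggingStore{inner: newMemStore(), w: &log}
	key, _ := db.Put([]byte("hello"))
	db.Get(key)
	db.Delete(key)
	db.Get(key) // the item is gone, so this is logged with an error
	return log.String()
}

func main() {
	fmt.Print(demo())
}
```

Because the wrapper satisfies the same interface as the store it wraps, the fuzzer could switch it on with a flag, and the resulting log would be a candidate input for a single-shelf repro.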

Bucket / item index space not the best split?

Currently there can be 64K buckets (16 bits) and 16M items per bucket (24 bits). If we'd like to use this as a storage backend for txs, 16M might be on the low end (:scream:) but 64K power-of-2 buckets is insane. Perhaps we could move 4 more bits over and have 4K buckets and 256M items?
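
To make the trade-off concrete, the sketch below prints the bucket and item capacity for a few candidate splits. It assumes the bucket and item index share a fixed 40 bits (which is what 64K × 16M implies) and that the bucket index sits in the upper bits; the `split` type and its methods are illustrative names, not billy code.

```go
package main

import "fmt"

// split describes how many bits of a key go to the bucket index and how many
// to the item index within that bucket.
type split struct {
	bucketBits uint
	itemBits   uint
}

// maxBuckets is the number of addressable buckets.
func (s split) maxBuckets() uint64 { return 1 << s.bucketBits }

// maxItems is the number of addressable items per bucket.
func (s split) maxItems() uint64 { return 1 << s.itemBits }

// pack places the bucket index above the item index (layout is an assumption).
func (s split) pack(bucket, item uint64) uint64 {
	return bucket<<s.itemBits | item
}

// unpack reverses pack.
func (s split) unpack(key uint64) (bucket, item uint64) {
	return key >> s.itemBits, key & (1<<s.itemBits - 1)
}

func main() {
	for _, s := range []split{{16, 24}, {13, 27}, {12, 28}} {
		fmt.Printf("%2d/%2d bits: %6d buckets, %10d items per bucket\n",
			s.bucketBits, s.itemBits, s.maxBuckets(), s.maxItems())
	}
}
```

With a fixed total width, every bit moved halves the bucket count and doubles the per-bucket item count, so any proposal can be checked by adding its two bit widths.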

Version header in shelf file

The shelf files should have a magic, e.g. billy-<version byte>, in case we want to modify e.g. the item headers. Right now, the shelf size is part of the filename, but it might be cleaner to (also) put the shelf size into the metadata, so the header becomes:

billy <version byte> <shelf-size 4 bytes>
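
A sketch of how such a header could be written and validated follows. The issue does not specify separators or byte order, so this assumes the ASCII magic "billy" is followed directly by the version byte and then a big-endian 4-byte shelf size; `encodeHeader` and `decodeHeader` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	magic      = "billy"
	headerSize = len(magic) + 1 + 4 // magic + version byte + shelf size
)

// encodeHeader builds the 10-byte header: magic, version, big-endian shelf size.
func encodeHeader(version byte, shelfSize uint32) []byte {
	buf := make([]byte, headerSize)
	copy(buf, magic)
	buf[len(magic)] = version
	binary.BigEndian.PutUint32(buf[len(magic)+1:], shelfSize)
	return buf
}

// decodeHeader validates the magic and returns the version and shelf size.
func decodeHeader(buf []byte) (version byte, shelfSize uint32, err error) {
	if len(buf) < headerSize {
		return 0, 0, errors.New("header too short")
	}
	if !bytes.HasPrefix(buf, []byte(magic)) {
		return 0, 0, errors.New("bad magic")
	}
	return buf[len(magic)], binary.BigEndian.Uint32(buf[len(magic)+1:]), nil
}

func main() {
	hdr := encodeHeader(1, 1024)
	version, size, err := decodeHeader(hdr)
	fmt.Printf("header=%q version=%d shelf-size=%d err=%v\n", hdr, version, size, err)
}
```

Storing the shelf size in the header would also allow a consistency check against the size encoded in the filename when a shelf is opened.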

Support linearly increasing bucket sizes

First!

For blob transactions, the sizes will be multiples of 128KB, not powers of 2. Might be significantly less wasteful if we could have an option to initialize the buckets with a linear increase?
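
To get a feel for the potential savings, the sketch below compares the slot an item would land in under power-of-2 sizing with the slot under linear sizing. It assumes items are exact multiples of 128KB and ignores any per-item header overhead; `pow2Slot`, `linearSlot`, and `blobStep` are illustrative names, not billy's API.

```go
package main

import "fmt"

// blobStep is the size increment mentioned in the issue (128KB).
const blobStep = 128 * 1024

// pow2Slot returns the smallest slot size >= size, doubling from min.
func pow2Slot(size, min uint32) uint32 {
	slot := min
	for slot < size {
		slot *= 2
	}
	return slot
}

// linearSlot returns the smallest multiple of step that is >= size.
func linearSlot(size, step uint32) uint32 {
	return (size + step - 1) / step * step
}

func main() {
	for n := uint32(1); n <= 6; n++ {
		size := n * blobStep
		p := pow2Slot(size, blobStep)
		l := linearSlot(size, blobStep)
		fmt.Printf("%d x 128KB: pow2 slot %4dKB (waste %3dKB), linear slot %4dKB (waste %dKB)\n",
			n, p/1024, (p-size)/1024, l/1024, (l-size)/1024)
	}
}
```

Under these assumptions, any item whose size is not itself a power of 2 (3, 5, or 6 steps, for example) wastes space in a power-of-2 slot and none in a linear one.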

Support Snappy compression

I'm unsure about this one; it's mostly a question. We could in theory add under-the-hood Snappy compression, but maybe that would break the bucket-sorting a bit, at least for blob txs. If the originally stable sizes become slightly randomized, we might end up with weird bucket sizes. So it might be a moot point, just a thought, maybe for other use cases (i.e. optional).
