Comments (15)
@0joshuaolson1 Both index and data files are stored entirely on disk, so nothing special happens when you have a larger-than-memory data set. I tested a ~16 GB database on a DigitalOcean droplet with 512 MB of memory and didn't notice any performance issues.
I'll publish a post that explains how pogreb works in a week.
from pogreb.
Thank you very much for taking the time to write your blog post. It really is helpful for understanding the internals of pogreb.
from pogreb.
@akrylysov Or maybe how the load factor and stuff was found? Was it copied from Go's built-in map implementation?
I'm more curious, though, what happens when the data is larger than memory? This library could be what I'm looking for if I overlook the mutex, but I don't think kernel paging removes the performance penalties of larger-than-RAM random access?
Thanks for any help you can give!
from pogreb.
@akrylysov I look forward to it! Where should I watch for the post?
Possibly good reading:
- https://danluu.com/file-consistency/
- http://oldblog.antirez.com/post/what-is-wrong-with-2006-programming.html
from pogreb.
@suprafun @0joshuaolson1 check the first draft https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/. Please let me know if you have any questions.
from pogreb.
The post answered most of my questions! (great English, btw)
A few things that weren't addressed:
- The benchmarks are based on the size of the index file (proportional to the number of keys?). What other limits/dimensions are there? What about the ratio of reads and writes? Size of the db file? What use case(s) did you actually make this for?
- I saw a lot of slowdown with my own tests at much higher loads than the blog post or README discuss. Why linear hashing in particular? At what point are hash collisions a bottleneck? Is MurmurHash best for short strings?
I hope you had fun making pogreb. It's your choice whether to consider this issue closed.
from pogreb.
@0joshuaolson1 Thank you!
The benchmarks are based on the size of the index file (proportional to the number of keys?). What other limits/dimensions are there? What about the ratio of reads and writes? Size of the db file?
You mean the load factor benchmark? I measured only the number of get operations per second. Write performance wasn't important for my use case: I needed fast random lookups.
What use case(s) did you actually make this for?
Pogreb was designed for systems with infrequent bulk inserts and frequent random lookups. For example, you have a large number of keys like user_id:keyword, and you want to access them as fast as possible without keeping them in memory.
I saw a lot of slowdown with my own tests at much higher loads than the blog post or README discuss.
How many items did you have? What was the average size of keys and values?
Why linear hashing in particular?
Linear hashing allows growing the index one bucket at a time. Extendible hashing is another dynamic hash table algorithm, but it requires an additional "directory" which means one more I/O operation for lookups. Take a look at http://www.cs.sfu.ca/CourseCentral/354/lxwu/notes/chapter11.pdf.
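To make "one bucket at a time" concrete, here is a minimal in-memory sketch of linear hashing. It is an illustration only, not pogreb's on-disk index: the LinearHash type, its field names, and the load threshold of 2.0 are all assumptions made up for this example.

```go
package main

import "fmt"

// LinearHash is a toy in-memory linear hash table that stores uint32 hashes.
// Hypothetical type for illustration; it is not pogreb's index format.
type LinearHash struct {
	buckets [][]uint32 // each bucket holds the hashes placed in it
	level   uint       // the round started with 2^level buckets
	split   uint32     // next bucket to split in this round
	count   int        // number of stored items
	maxLoad float64    // average items per bucket that triggers a split
}

func NewLinearHash() *LinearHash {
	return &LinearHash{buckets: make([][]uint32, 1), maxLoad: 2.0}
}

// bucketIndex picks the bucket for a hash. Buckets before the split pointer
// were already split this round, so they are addressed with one more bit.
func (t *LinearHash) bucketIndex(h uint32) uint32 {
	idx := h & (uint32(1)<<t.level - 1)
	if idx < t.split {
		idx = h & (uint32(1)<<(t.level+1) - 1)
	}
	return idx
}

// Insert stores h and grows the table by one bucket if the load is too high.
func (t *LinearHash) Insert(h uint32) {
	i := t.bucketIndex(h)
	t.buckets[i] = append(t.buckets[i], h)
	t.count++
	if float64(t.count)/float64(len(t.buckets)) > t.maxLoad {
		t.splitOne()
	}
}

// splitOne appends exactly one bucket and redistributes only the items of
// the bucket at the split pointer. No other bucket is touched.
func (t *LinearHash) splitOne() {
	old := t.buckets[t.split]
	t.buckets = append(t.buckets, nil)
	newIdx := uint32(len(t.buckets) - 1)
	mask := uint32(1)<<(t.level+1) - 1
	var stay []uint32
	for _, h := range old {
		if h&mask == t.split {
			stay = append(stay, h)
		} else {
			t.buckets[newIdx] = append(t.buckets[newIdx], h)
		}
	}
	t.buckets[t.split] = stay
	t.split++
	if t.split == uint32(1)<<t.level {
		// Every bucket of this round has been split: start the next round.
		t.level++
		t.split = 0
	}
}

// Contains reports whether h was inserted.
func (t *LinearHash) Contains(h uint32) bool {
	for _, v := range t.buckets[t.bucketIndex(h)] {
		if v == h {
			return true
		}
	}
	return false
}

func main() {
	t := NewLinearHash()
	for i := uint32(0); i < 1000; i++ {
		t.Insert(i * 2654435761) // multiply by a large odd constant to spread the bits
	}
	fmt.Println("items:", t.count, "buckets:", len(t.buckets))
}
```

A lookup needs only the level and the split pointer to compute a bucket address, which is why there is no separate directory to read first.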
At what point are hash collisions a bottleneck?
Hash collisions will become a problem if you have more than 2^32 items because pogreb uses 32-bit hashes.
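For a rough sense of scale, 2^32 is the size of the 32-bit hash space, but pairs of keys with equal hashes can appear well before that many items. The sketch below estimates the expected number of such pairs, assuming a perfectly uniform hash. The function name is hypothetical, and how much these pairs cost in practice depends on how the index resolves them, which this thread does not spell out.

```go
package main

import (
	"fmt"
	"math"
)

// expectedCollidingPairs estimates how many pairs among n distinct keys share
// a hash value, assuming the hash is uniform over 2^bits possible values:
// (number of pairs) * (probability that one pair collides).
func expectedCollidingPairs(n float64, bits uint) float64 {
	return n * (n - 1) / 2 / math.Pow(2, float64(bits))
}

func main() {
	for _, exp := range []uint{16, 20, 29, 32} {
		n := math.Pow(2, float64(exp))
		fmt.Printf("2^%d keys: about %.3g pairs with equal 32-bit hashes\n",
			exp, expectedCollidingPairs(n, 32))
	}
}
```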
Is MurmurHash best for short strings?
According to https://softwareengineering.stackexchange.com/a/145633/126609 MurmurHash is pretty good for different kinds of data.
I appreciate your feedback!
from pogreb.
How many items did you have? What was the average size of keys and values?
My program did nothing but insert [4]byte keys and empty ([0]byte) values and periodically measure time intervals. By 200,000,000 keys I seem to remember seeing 256*256 (less than 100,000) inserts taking about a minute but with huge variation.
Hash collisions will become a problem if you have more than 2^32 items
The library doesn't allow more than that right now. I meant, e.g. at 2^29 items is an insert significantly worse than O(1, or size of a bucket)?
from pogreb.
Pogreb was designed for systems
I mean, like what specifically, I'm curious? You can't talk about it? /:)
from pogreb.
My program did nothing but insert [4]byte keys and empty ([0]byte) values and periodically measure time intervals. By 200,000,000 keys I seem to remember seeing 256*256 (less than 100,000) inserts taking about a minute but with huge variation.
Thanks, I'll try to reproduce the issue. What OS/file system did you use?
I meant, e.g. at 2^29 items is an insert significantly worse than O(1, or size of a bucket)?
The amortized insertion complexity is O(1); it doesn't increase with the number of items.
from pogreb.
The amortized insertion complexity is O(1)
On average even with hash collisions? Neat.
Linux (64-bit) with ext4. Now I wish I still had the script around. It went something like
package main

import (
	"fmt"
	"time"

	DB "github.com/akrylysov/pogreb"
)

func main() {
	db, err := DB.Open(...) // I think I tried with no sync and every few minutes?
	...
	var key [4]byte
	var value [0]byte
	keySlice := key[:]
	valueSlice := value[:]
	t1 := time.Now()
	for {
		db.Put(keySlice, valueSlice)
		// The key is a 4-byte counter: bump byte 0 and carry into the next byte on overflow.
		if key[0] < 255 {
			key[0]++
			continue
		}
		key[0] = 0
		if key[1] < 255 {
			key[1]++
			continue
		}
		key[1] = 0
		// these 3 lines could go after `key[2] = 0` instead for less frequent output
		t2 := time.Now()
		fmt.Println(db.Count(), db.FileSize(), t2.Sub(t1))
		t1 = t2
		if key[2] < 255 {
			key[2]++
			continue
		}
		key[2] = 0
		key[3]++ // run out of keys before overflow
	}
}
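As a side note, the manual carry in the loop above walks the keys in the same order as a uint32 counter encoded little-endian. The sketch below checks that equivalence using only the standard library. The helper names nextKeyManual and keyFromCounter are made up for this example and are not part of the original script.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nextKeyManual advances a 4-byte key the way the benchmark loop does:
// increment byte 0 and carry into the higher bytes when a byte wraps to 0.
func nextKeyManual(key *[4]byte) {
	for i := range key {
		key[i]++
		if key[i] != 0 {
			return
		}
	}
}

// keyFromCounter encodes n as a 4-byte little-endian key.
func keyFromCounter(n uint32) [4]byte {
	var key [4]byte
	binary.LittleEndian.PutUint32(key[:], n)
	return key
}

func main() {
	var manual [4]byte
	for n := uint32(0); n < 70000; n++ {
		if manual != keyFromCounter(n) {
			fmt.Println("mismatch at", n)
			return
		}
		nextKeyManual(&manual)
	}
	fmt.Println("manual carry and little-endian counter agree")
}
```

Counting with an integer also makes it easy to print progress every N inserts with a modulo check instead of nesting the output inside the carry logic.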
from pogreb.
So yeah, it's not really a fair test, being single-threaded and write-only.
from pogreb.
Hello, I can't open this link:
https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/
from pogreb.
@YuriyIlyin Are you trying to open the link from Russia? If so, my website (and most of DigitalOcean, I believe) was blocked by Roskomnadzor last year. You can try a VPN or the Web Archive copy: https://web.archive.org/web/20190207005554/https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/. Sorry for the inconvenience.
from pogreb.
@akrylysov Yep, that's right. Many thanks!
from pogreb.