Hello, I am trying to understand the internals of pogreb, but unfortunately I cannot s


Some explanation of the internals ? about pogreb HOT 15 CLOSED

akrylysov commented on May 10, 2024

Some explanation of the internals ?

from pogreb.

Comments (15)

akrylysov commented on May 10, 2024 2

@0joshuaolson1 Both index and data files are stored entirely on disk, so nothing special happens when you have a larger-than-memory data set. I tested a ~16 GB database on a DigitalOcean droplet with 512 MB of memory and didn't notice any performance issues.

I'll publish a post that explains how pogreb works in a week.


suprafun commented on May 10, 2024 1

Thank you very much for taking the time to write your blog post. It really is helpful for understanding the internals of pogreb.


0joshuaolson1 commented on May 10, 2024

@akrylysov Or maybe how the load factor and stuff was found? Was it copied from Go's built-in map implementation?

I'm more curious, though, what happens when the data is larger than memory? This library could be what I'm looking for if I overlook the mutex, but I don't think kernel paging removes the performance penalties of larger-than-RAM random access?

Thanks for any help you can give!


0joshuaolson1 commented on May 10, 2024

@akrylysov I look forward to it! ~~Where should I watch for the post?~~

Possibly good reading:


akrylysov commented on May 10, 2024

@suprafun @0joshuaolson1 check the first draft https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/. Please let me know if you have any questions.


0joshuaolson1 commented on May 10, 2024

The post answered most of my questions! (great English, btw)

A few things that weren't addressed:

The benchmarks are based on the size of the index file (proportional to the number of keys?). What other limits/dimensions are there? What about the ratio of reads and writes? Size of the db file? What use case(s) did you actually make this for?
I saw a lot of slowdown with my own tests at much higher loads than the blog post or readme discuss. Why linear hashing in particular? At what point are hash collisions a bottleneck? Is MurmurHash best for short strings?

I hope you had fun making pogreb. It's your choice whether to consider this issue closed.


akrylysov commented on May 10, 2024

@0joshuaolson1 Thank you!

> The benchmarks are based on the size of the index file (proportional to the number of keys?). What other limits/dimensions are there? What about the ratio of reads and writes? Size of the db file?

You mean the load factor benchmark? I measured only the number of get operations per second. Write performance wasn't important for my use case; I needed fast random lookups.

> What use case(s) did you actually make this for?

Pogreb was designed for systems with infrequent bulk inserts and frequent random lookups. For example, you have a large number of keys like user_id:keyword that you want to access as fast as possible, but you don't want to keep them in memory.

> I saw a lot of slowdown with my own tests at much higher loads than the blog post or readme discuss.

How many items did you have? What was the average size of keys and values?

> Why linear hashing in particular?

Linear hashing allows growing the index one bucket at a time. Extendible hashing is another dynamic hash table algorithm, but it requires an additional "directory" which means one more I/O operation for lookups. Take a look at http://www.cs.sfu.ca/CourseCentral/354/lxwu/notes/chapter11.pdf.
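
To make "one bucket at a time" concrete, here is a minimal in-memory sketch of linear hashing. It is not pogreb's code or file layout: the `linearHash` type, its methods, and the `avgPerBucket` threshold are hypothetical names chosen for illustration, and the table stores bare 32-bit hashes instead of keys and values.

```go
package main

import "fmt"

// avgPerBucket is a hypothetical load threshold: the table grows once it
// holds more than this many entries per bucket on average.
const avgPerBucket = 4

// linearHash is a toy in-memory linear hash table that stores bare hashes.
type linearHash struct {
	level   uint   // the current round started with 2^level buckets
	split   uint32 // next bucket to split in this round
	count   int    // number of stored hashes
	buckets [][]uint32
}

func newLinearHash() *linearHash {
	return &linearHash{buckets: make([][]uint32, 1)}
}

// bucketIndex maps a hash to a bucket without any directory. Buckets before
// the split pointer were already split this round, so they use one more bit.
func (t *linearHash) bucketIndex(hash uint32) uint32 {
	idx := hash & (uint32(1)<<t.level - 1)
	if idx < t.split {
		idx = hash & (uint32(1)<<(t.level+1) - 1)
	}
	return idx
}

// grow adds exactly one bucket by splitting the bucket at the split pointer.
func (t *linearHash) grow() {
	old := t.buckets[t.split]
	t.buckets[t.split] = nil
	t.buckets = append(t.buckets, nil)
	mask := uint32(1)<<(t.level+1) - 1
	for _, h := range old {
		i := h & mask // either t.split or t.split + 2^level (the new bucket)
		t.buckets[i] = append(t.buckets[i], h)
	}
	t.split++
	if t.split == uint32(1)<<t.level { // round finished: the table has doubled
		t.level++
		t.split = 0
	}
}

func (t *linearHash) insert(hash uint32) {
	i := t.bucketIndex(hash)
	t.buckets[i] = append(t.buckets[i], hash)
	t.count++
	if t.count > avgPerBucket*len(t.buckets) {
		t.grow()
	}
}

func (t *linearHash) contains(hash uint32) bool {
	for _, h := range t.buckets[t.bucketIndex(hash)] {
		if h == hash {
			return true
		}
	}
	return false
}

func main() {
	tbl := newLinearHash()
	for h := uint32(0); h < 1000; h++ {
		tbl.insert(h * 2654435761) // multiplicative scrambling stands in for a real hash
	}
	fmt.Println("buckets:", len(tbl.buckets), "level:", tbl.level, "split:", tbl.split)
}
```

Note that `bucketIndex` needs only the hash, the level, and the split pointer, which is why a lookup does not need the extra directory read that extendible hashing requires.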

> At what point are hash collisions a bottleneck?

Hash collisions will become a problem if you have more than 2³² items because pogreb uses 32-bit hashes.
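
For a rough sense of scale, the birthday approximation gives the expected number of key pairs that share a full 32-bit hash. This is a back-of-the-envelope sketch that assumes a uniformly distributed hash; the function names are hypothetical, and it says nothing about how pogreb resolves such collisions or what they cost.

```go
package main

import (
	"fmt"
	"math"
)

// expectedCollidingPairs approximates the expected number of key pairs that
// share a hash when n keys are hashed uniformly into 2^bits values.
func expectedCollidingPairs(n float64, bits int) float64 {
	return n * (n - 1) / 2 / math.Pow(2, float64(bits))
}

// collisionProbability approximates the chance that at least one pair of
// keys collides (the birthday bound).
func collisionProbability(n float64, bits int) float64 {
	return 1 - math.Exp(-expectedCollidingPairs(n, bits))
}

func main() {
	for _, n := range []float64{1 << 16, 1 << 20, 1 << 29} {
		fmt.Printf("n=%.0f pairs=%.3g p=%.3f\n",
			n, expectedCollidingPairs(n, 32), collisionProbability(n, 32))
	}
}
```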

> Is MurmurHash best for short strings?

According to https://softwareengineering.stackexchange.com/a/145633/126609 MurmurHash is pretty good for different kinds of data.

I appreciate your feedback!


0joshuaolson1 commented on May 10, 2024

> How many items did you have? What was the average size of keys and values?

My program did nothing but insert [4]byte keys and empty ([0]byte) values and periodically measure time intervals. By 200,000,000 keys I seem to remember seeing 256*256 (less than 100,000) inserts taking about a minute but with huge variation.

> Hash collisions will become a problem if you have more than 2^32 items

The library doesn't allow more than that right now. I meant, e.g. at 2^29 items is an insert significantly worse than O(1, or size of a bucket)?


0joshuaolson1 commented on May 10, 2024

> Pogreb was designed for systems

I mean, like what specifically, I'm curious? You can't talk about it? /:)


akrylysov commented on May 10, 2024

> My program did nothing but insert [4]byte keys and empty ([0]byte) values and periodically measure time intervals. By 200,000,000 keys I seem to remember seeing 256*256 (less than 100,000) inserts taking about a minute but with huge variation.

Thanks, I'll try to reproduce the issue. What OS/file system did you use?

> I meant, e.g. at 2^29 items is an insert significantly worse than O(1, or size of a bucket)?

The amortized insertion complexity is O(1); it doesn't increase with the number of items.


0joshuaolson1 commented on May 10, 2024

> The amortized insertion complexity is O(1)

On average even with hash collisions? Neat.

Linux (64-bit) with ext4. Now I wish I still had the script around. It went something like

```go
package main

import (
	"fmt"
	"time"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open(...) // I think I tried with no sync and every few minutes?
	...
	var key [4]byte
	var value [0]byte
	keySlice := key[:]
	valueSlice := value[:]
	t1 := time.Now()
	for {
		db.Put(keySlice, valueSlice)
		// Advance the key like a little-endian counter, one byte at a time.
		if key[0] < 255 {
			key[0]++
			continue
		}
		key[0] = 0
		if key[1] < 255 {
			key[1]++
			continue
		}
		key[1] = 0

		// Report once per 256*256 inserts; this block could go after
		// `key[2] = 0` instead for less frequent output.
		t2 := time.Now()
		size, _ := db.FileSize() // FileSize also returns an error, ignored here
		fmt.Println(db.Count(), size, t2.Sub(t1))
		t1 = t2

		if key[2] < 255 {
			key[2]++
			continue
		}
		key[2] = 0
		key[3]++ // run out of keys before overflow
	}
}
```


0joshuaolson1 commented on May 10, 2024

So yeah, it's not really a fair test, being single-threaded and write-only.


YuriyIlyin commented on May 10, 2024

Hello, I can't open this link:
https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/


akrylysov commented on May 10, 2024

@YuriyIlyin Are you trying to open the link from Russia? If so, my website (and most of DigitalOcean, I believe) was blocked by Roskomnadzor last year. You can try a VPN or the Web Archive copy: https://web.archive.org/web/20190207005554/https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/. Sorry for the inconvenience.


YuriyIlyin commented on May 10, 2024

@akrylysov Yep, that's right. Many thanks!

