Comments (15)

akrylysov commented on May 10, 2024

@0joshuaolson1 Both index and data files are stored entirely on disk, so nothing special happens when you have a larger-than-memory data set. I tested a ~16 GB database on a DigitalOcean droplet with 512 MB of memory and didn't notice any performance issues.

I'll publish a post that explains how pogreb works in a week.

from pogreb.

suprafun commented on May 10, 2024

Thank you very much for taking the time to write your blog post. It really is helpful for understanding the internals of pogreb.

0joshuaolson1 commented on May 10, 2024

@akrylysov Or maybe how the load factor and stuff was found? Was it copied from Go's built-in map implementation?

I'm more curious, though, what happens when the data is larger than memory? This library could be what I'm looking for if I overlook the mutex, but I don't think kernel paging removes the performance penalties of larger-than-RAM random access?

Thanks for any help you can give!

0joshuaolson1 commented on May 10, 2024

@akrylysov I look forward to it! Where should I watch for the post?

Possibly good reading:

akrylysov commented on May 10, 2024

@suprafun @0joshuaolson1 check the first draft https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/. Please let me know if you have any questions.

0joshuaolson1 commented on May 10, 2024

The post answered most of my questions! (great English, btw)

A few things that weren't addressed:

  • The benchmarks are based on the size of the index file (proportional to the number of keys?). What other limits/dimensions are there? What about the ratio of reads and writes? Size of the db file? What use case(s) did you actually make this for?
  • I saw a lot of slowdown with my own tests at much higher loads than the blog post or readme discuss. Why linear hashing in particular? At what point are hash collisions a bottleneck? Is MurmurHash best for short strings?

I hope you had fun making pogreb. It's your choice whether to consider this issue closed.

akrylysov commented on May 10, 2024

@0joshuaolson1 Thank you!

The benchmarks are based on the size of the index file (proportional to the number of keys?). What other limits/dimensions are there? What about the ratio of reads and writes? Size of the db file?

You mean the load factor benchmark? I measured only the number of get operations per second. Write performance wasn't important for my use case; I needed fast random lookups.

What use case(s) did you actually make this for?

Pogreb was designed for systems with infrequent bulk inserts and frequent random lookups. For example, you have a large number of keys like user_id:keyword, you want to access them as fast as possible, but at the same time you don't want to keep them in memory.

I saw a lot of slowdown with my own tests at much higher loads than the blog post or readme discuss.

How many items did you have? What was the average size of keys and values?

Why linear hashing in particular?

Linear hashing allows growing the index one bucket at a time. Extendible hashing is another dynamic hash table algorithm, but it requires an additional "directory" which means one more I/O operation for lookups. Take a look at http://www.cs.sfu.ca/CourseCentral/354/lxwu/notes/chapter11.pdf.
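To make "growing the index one bucket at a time" concrete, here is a minimal sketch of textbook linear-hashing addressing. It is not pogreb's actual code: the type `index`, its fields, and the constant `initialBuckets` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// initialBuckets is a hypothetical starting size for the index.
const initialBuckets = 4

// index models only the addressing state of a linear-hash table.
type index struct {
	level uint   // number of completed doubling rounds
	split uint32 // next bucket to be split in the current round
}

// roundSize is the number of buckets the table had at the start of this round.
func (ix *index) roundSize() uint32 {
	return uint32(initialBuckets) << ix.level
}

// numBuckets is the current number of buckets.
func (ix *index) numBuckets() uint32 {
	return ix.roundSize() + ix.split
}

// bucketFor maps a hash to a bucket using arithmetic only, with no directory.
func (ix *index) bucketFor(hash uint32) uint32 {
	b := hash % ix.roundSize()
	if b < ix.split {
		// This bucket was already split in the current round,
		// so use the next round's (twice as large) modulus.
		b = hash % (ix.roundSize() * 2)
	}
	return b
}

// grow adds exactly one bucket by advancing the split pointer.
// A real table would also redistribute the entries of the split bucket.
func (ix *index) grow() {
	ix.split++
	if ix.split == ix.roundSize() {
		ix.level++
		ix.split = 0
	}
}

func main() {
	ix := &index{}
	for i := 0; i < 6; i++ {
		fmt.Println("buckets:", ix.numBuckets(), "hash 12 ->", ix.bucketFor(12))
		ix.grow()
	}
}
```

The lookup needs only the two counters and a modulo, which is why no extra directory read is required, in contrast to extendible hashing.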

At what point are hash collisions a bottleneck?

Hash collisions will become a problem if you have more than 2^32 items because pogreb uses 32-bit hashes.

Is MurmurHash best for short strings?

According to https://softwareengineering.stackexchange.com/a/145633/126609 MurmurHash is pretty good for different kinds of data.

I appreciate your feedback!

0joshuaolson1 commented on May 10, 2024

How many items did you have? What was the average size of keys and values?

My program did nothing but insert [4]byte keys and empty ([0]byte) values and periodically measure time intervals. By 200,000,000 keys I seem to remember seeing 256*256 (less than 100,000) inserts taking about a minute but with huge variation.

Hash collisions will become a problem if you have more than 2^32 items

The library doesn't allow more than that right now. I meant, e.g. at 2^29 items is an insert significantly worse than O(1, or size of a bucket)?

0joshuaolson1 commented on May 10, 2024

Pogreb was designed for systems

I mean, like what specifically, I'm curious? You can't talk about it? /:)

akrylysov commented on May 10, 2024

My program did nothing but insert [4]byte keys and empty ([0]byte) values and periodically measure time intervals. By 200,000,000 keys I seem to remember seeing 256*256 (less than 100,000) inserts taking about a minute but with huge variation.

Thanks, I'll try to reproduce the issue. What OS/file system did you use?

I meant, e.g. at 2^29 items is an insert significantly worse than O(1, or size of a bucket)?

The amortized insertion complexity is O(1); it doesn't increase with the number of items.
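One way to see the amortized bound is to count how many entries get rehashed by bucket splits in a toy in-memory linear-hash table. The sketch below is only an illustration under stated assumptions, not pogreb's implementation: `slotsPerBucket`, `maxLoad`, the `table` type, and the multiplicative test hash are all hypothetical choices.

```go
package main

import "fmt"

const (
	slotsPerBucket = 4   // hypothetical bucket capacity
	maxLoad        = 0.7 // hypothetical load factor that triggers a split
)

// table is a toy linear-hash table that stores only hashes.
type table struct {
	buckets  [][]uint32
	level    uint
	split    uint32
	count    int
	rehashed int // entries re-inserted by splits so far
}

func (t *table) bucketFor(h uint32) uint32 {
	b := h % (uint32(1) << t.level)
	if b < t.split {
		b = h % (uint32(1) << (t.level + 1))
	}
	return b
}

// insert adds one hash and performs at most one bucket split.
func (t *table) insert(h uint32) {
	b := t.bucketFor(h)
	t.buckets[b] = append(t.buckets[b], h)
	t.count++
	if float64(t.count) > maxLoad*float64(len(t.buckets)*slotsPerBucket) {
		t.splitOne()
	}
}

// splitOne appends one bucket and redistributes a single old bucket.
func (t *table) splitOne() {
	idx := t.split
	old := t.buckets[idx]
	t.buckets[idx] = nil
	t.buckets = append(t.buckets, nil)
	t.split++
	if t.split == uint32(1)<<t.level {
		t.level++
		t.split = 0
	}
	for _, h := range old {
		b := t.bucketFor(h)
		t.buckets[b] = append(t.buckets[b], h)
		t.rehashed++
	}
}

// rehashPerInsert returns the average number of rehashed entries per insert.
func rehashPerInsert(n int) float64 {
	t := &table{buckets: make([][]uint32, 1)}
	for i := 0; i < n; i++ {
		t.insert(uint32(i) * 2654435761) // simple multiplicative hash for the demo
	}
	return float64(t.rehashed) / float64(n)
}

func main() {
	for _, n := range []int{1 << 12, 1 << 16, 1 << 20} {
		fmt.Printf("%d inserts: %.2f rehashed entries per insert\n", n, rehashPerInsert(n))
	}
}
```

Each insert triggers at most one split, and a split touches only one bucket, so the extra work per insert stays around a small constant as the table grows. This says nothing about disk I/O cost, which is a separate question from the operation count.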

0joshuaolson1 commented on May 10, 2024

The amortized insertion complexity is O(1)

On average even with hash collisions? Neat.

Linux (64-bit) with ext4. Now I wish I still had the script around. It went something like

package main

import (
	"fmt"
	"time"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open(...) // I think I tried with no sync and every few minutes?
	...
	var key [4]byte   // incremented like a little-endian counter, lowest byte first
	var value [0]byte // empty value
	keySlice := key[:]
	valueSlice := value[:]
	t1 := time.Now()
	for {
		db.Put(keySlice, valueSlice)
		if key[0] < 255 {
			key[0]++
			continue
		}
		key[0] = 0
		if key[1] < 255 {
			key[1]++
			continue
		}
		key[1] = 0

		// this block runs once per 256*256 inserts; it could go after
		// `key[2] = 0` instead for less frequent output
		t2 := time.Now()
		size, _ := db.FileSize() // FileSize returns (int64, error)
		fmt.Println(db.Count(), size, t2.Sub(t1))
		t1 = t2

		if key[2] < 255 {
			key[2]++
			continue
		}
		key[2] = 0
		key[3]++ // run out of keys before overflow
	}
}

0joshuaolson1 commented on May 10, 2024

So yeah, it's not really a fair test, being single-threaded and write-only.

YuriyIlyin commented on May 10, 2024

Hello, I can't open this link:
https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/

akrylysov commented on May 10, 2024

@YuriyIlyin Are you trying to open the link from Russia? If so, my website (and most of DigitalOcean, I believe) was blocked by Roskomnadzor last year. You can try a VPN or the Web Archive copy: https://web.archive.org/web/20190207005554/https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/. Sorry for the inconvenience.

YuriyIlyin commented on May 10, 2024

@akrylysov Yep, that's right. Many thanks!
