
geohot / minikeyvalue


A distributed key value store in under 1000 lines. Used in production at comma.ai

License: MIT License

Shell 5.88% Dockerfile 1.44% Go 70.29% Python 22.38%

minikeyvalue's Introduction

minikeyvalue


Fed up with the complexity of distributed filesystems?

minikeyvalue is a ~1000 line distributed key value store, with support for replication, multiple machines, and multiple drives per machine. Optimized for values between 1MB and 1GB. Inspired by SeaweedFS, but simple. Should scale to billions of files and petabytes of data. Used in production at comma.ai.

A key part of minikeyvalue's simplicity is using stock nginx as the volume server.

Even if this code is crap, the on disk format is super simple! We rely on a filesystem for blob storage and a LevelDB for indexing. The index can be reconstructed with rebuild. Volumes can be added or removed with rebalance.

API

  • GET /key
    • 302 redirect to nginx volume server.
  • PUT /key
    • Blocks. 201 = written, anything else = probably not written.
  • DELETE /key
    • Blocks. 204 = deleted, anything else = probably not deleted.

It also now supports a subset of S3 requests, so some S3 libraries will be somewhat compatible.
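The status-code contract above can be sketched as a small Go client. This is only an illustration: the helper names (do, put, get, del) and the in-process fakeMaster are hypothetical and not part of minikeyvalue. The fake server exists so the example runs without a real master; it mimics the documented codes (201, 204, 302, and 403 on overwrite) and nothing else.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// do sends one request and returns an error unless the final status equals want.
func do(method, url string, body []byte, want int) ([]byte, error) {
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req) // follows the 302 on GET
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != want {
		return nil, fmt.Errorf("%s %s: got status %d, want %d", method, url, resp.StatusCode, want)
	}
	return data, nil
}

// put treats only 201 as "written".
func put(base, key string, value []byte) error {
	_, err := do(http.MethodPut, base+"/"+key, value, http.StatusCreated)
	return err
}

// get expects 200 after the redirect to the volume server has been followed.
func get(base, key string) ([]byte, error) {
	return do(http.MethodGet, base+"/"+key, nil, http.StatusOK)
}

// del treats only 204 as "deleted".
func del(base, key string) error {
	_, err := do(http.MethodDelete, base+"/"+key, nil, http.StatusNoContent)
	return err
}

// fakeMaster is a hypothetical stand-in for a master plus one volume server.
// GET /key redirects to /vol/key, which serves the stored bytes.
func fakeMaster() *httptest.Server {
	var mu sync.Mutex
	store := map[string][]byte{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		key := strings.TrimPrefix(r.URL.Path, "/vol")
		val, ok := store[key]
		switch {
		case r.Method == http.MethodPut && !ok:
			store[key], _ = io.ReadAll(r.Body)
			w.WriteHeader(http.StatusCreated)
		case r.Method == http.MethodPut:
			w.WriteHeader(http.StatusForbidden) // key already exists
		case r.Method == http.MethodDelete && ok:
			delete(store, key)
			w.WriteHeader(http.StatusNoContent)
		case r.Method == http.MethodGet && ok && key == r.URL.Path:
			http.Redirect(w, r, "/vol"+key, http.StatusFound)
		case r.Method == http.MethodGet && ok:
			w.Write(val)
		default:
			w.WriteHeader(http.StatusNotFound)
		}
	}))
}

func main() {
	srv := fakeMaster()
	defer srv.Close()
	fmt.Println("put:", put(srv.URL, "wehave", []byte("bigswag")))
	v, err := get(srv.URL, "wehave")
	fmt.Printf("get: %q %v\n", v, err)
	fmt.Println("delete:", del(srv.URL, "wehave"))
}
```

Against a real deployment, srv.URL would be replaced by the master address, for example http://localhost:3000.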

Start Volume Servers (default port 3001)

# this is just nginx under the hood
PORT=3001 ./volume /tmp/volume1/ &
PORT=3002 ./volume /tmp/volume2/ &
PORT=3003 ./volume /tmp/volume3/ &

Start Master Server (default port 3000)

./mkv -volumes localhost:3001,localhost:3002,localhost:3003 -db /tmp/indexdb/ server

Usage

# put "bigswag" in key "wehave" (will 403 if it already exists)
curl -v -L -X PUT -d bigswag localhost:3000/wehave

# get key "wehave" (should be "bigswag")
curl -v -L localhost:3000/wehave

# delete key "wehave"
curl -v -L -X DELETE localhost:3000/wehave

# unlink key "wehave", this is a virtual delete
curl -v -L -X UNLINK localhost:3000/wehave

# list keys starting with "we"
curl -v -L 'localhost:3000/we?list'

# list unlinked keys ripe for DELETE
curl -v -L 'localhost:3000/?unlinked'

# put file in key "file.txt"
curl -v -L -X PUT -T /path/to/local/file.txt localhost:3000/file.txt

# get file in key "file.txt"
curl -v -L -o /path/to/local/file.txt localhost:3000/file.txt

./mkv Usage

Usage: ./mkv <server, rebuild, rebalance>

  -db string
        Path to leveldb
  -fallback string
        Fallback server for missing keys
  -port int
        Port for the server to listen on (default 3000)
  -protect
        Force UNLINK before DELETE
  -replicas int
        Amount of replicas to make of the data (default 3)
  -subvolumes int
        Amount of subvolumes, disks per machine (default 10)
  -volumes string
        Volumes to use for storage, comma separated

Rebalancing (to change the amount of volume servers)

# must shut down master first, since LevelDB can only be accessed by one process
./mkv -volumes localhost:3001,localhost:3002,localhost:3003 -db /tmp/indexdb/ rebalance

Rebuilding (to regenerate the LevelDB)

./mkv -volumes localhost:3001,localhost:3002,localhost:3003 -db /tmp/indexdbalt/ rebuild

Performance

# Fetching non-existent key: 116338 req/sec
wrk -t2 -c100 -d10s http://localhost:3000/key

# go run thrasher.go
starting thrasher
10000 write/read/delete in 2.620922675s
thats 3815.40/sec

minikeyvalue's People

Contributors

abriosi, abserari, cedricbojoly, elimisteve, geohot, gregjhogan, n0nz


minikeyvalue's Issues

How does the distributed part work?

When a project mentions 'distributed', I think of things like wesher, Consul, and GlusterFS, which are distributed across multiple hosts.

I'm somewhat confused as to how the distributed nature of this project works. Could you explain this a little better, please? How does it handle storing the data? Does each host have to keep a copy of the data, or is it mounted via e.g. NFS? Does the system support fault tolerance, e.g. if 1 out of 3 hosts is down?

Thank you. A quick question on adding a security layer

Thank you very much for putting this repository together. Reading these lines of code has taught me a lot. Simple, scalable and structured.

How do you add a security layer to this filesystem if you need to access it from other services that are not on the same network?

  1. Do you create a VPN? (Wouldn't this bottleneck the distributed nature of PUTs and GETs, since all traffic would have to be routed through the VPN server?)
  2. A reverse proxy? (Same problem as 1.)
  3. Do you add authentication, such as, Basic Authentication together with https?
  4. Is there a simpler solution I'm missing?

Missing go.mod/go.sum

Hi

When I run ./mkv, I find that there are no go.mod or go.sum files in the project.
Do you want to add these Go module files?

Also, have you considered using SHA-256 instead of MD5 (to avoid hash collisions)?

Thanks.

Single Point of Failure

If I'm reading this correctly, the whole thing could break if the server goes down. Can we do anything about that?

Just say Hi from SeaweedFS

Hi, George,

You are one of the guys that I respect. I was watching the youtube video https://www.youtube.com/watch?v=iwcYp-XT7UI where Lex Fridman interviewed you, and you mentioned SeaweedFS for 0.5 seconds. :)

I work on SeaweedFS. And I wanted to learn your approach to file storage. In another coding session, you mentioned SeaweedFS has some bugs. If you still remember the exact bugs, please let me know.

Thanks and keep up the nice work!

Chris

Question about stored file name

I'm not clear why the stored file name is not set to the requested file name
like this:

fmt.Sprintf("/%02x/%02x/%s", mkey[0], mkey[1], key)

That way we could simply download the files under their original file names.
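For context, here is a minimal sketch of the path scheme being discussed. It is an assumption pieced together from this page, not a confirmed copy of the project's code: the write-failure report elsewhere on this page shows a stored path ending in L3dlaGF2ZQ==, which is the base64 encoding of "/wehave", and the snippet above takes two directory bytes from mkey, assumed here to be the MD5 of the key. The function name keyPath is hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
)

// keyPath maps a key to a relative path: two hex directory levels taken
// from the MD5 of the key (assumed), then the base64-encoded key as the
// file name instead of the raw key.
func keyPath(key []byte) string {
	mkey := md5.Sum(key)
	b64key := base64.StdEncoding.EncodeToString(key)
	return fmt.Sprintf("/%02x/%02x/%s", mkey[0], mkey[1], b64key)
}

func main() {
	// Keys arrive as URL paths, so they start with "/" and may contain more.
	fmt.Println(keyPath([]byte("/wehave")))
	fmt.Println(keyPath([]byte("/dir/with/slashes.txt")))
}
```

One plausible motivation, not stated in the source, is that a raw key such as /dir/with/slashes.txt would otherwise be interpreted as nested directories on disk. Note that the standard base64 alphabet itself contains "/", so this sketch does not guarantee slash-free names; base64.URLEncoding would.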

replica 0 write failed: http://localhost:3001/sv07/60/08/L3dlaGF2ZQ==

Hi,
I ran curl -v -L -X PUT -d bigswag lo:3000/wehave, and the server printed: replica 0 write failed: http://localhost:3001/sv07/60/08/L3dlaGF2ZQ==

Then I ran curl -v -L localhost:3000/bigswag or curl -v -L localhost:3000/wehave.
The result is:

* About to connect() to localhost port 3000 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET /wehave HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:3000
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Length: 0
< Date: Sun, 31 Jan 2021 08:24:09 GMT
<
* Connection #0 to host localhost left intact

I can't get the right value.

Could you tell me what is wrong, please?

do not write multi-part upload data to disk

Writing to /tmp will wear out your OS drive pretty fast when pumping hundreds of terabytes into minikeyvalue. Since RAM is good enough for non-multipart uploads, it should be fine for multi-part uploads, too. Maybe we want to suggest using a RAM disk, and add expiry for partial uploads whose final PUT never happens within some time period?

Data integrity feature?

Just a suggestion for implementing (tell me if it doesn't make sense for the project):

  • File integrity: Append the hash (SHA-256) of the previous index value of data to the newest index value. This prevents tampering with file contents, because every machine can check that the block hash appended at some index equals the block hash of the contents at (index - 1).

Further clarification with this image:
image

Keep in mind that h_0 represents hash of the newest data || h_1, and h_1 is the hash of data || h_2, and so on. This nested check ensures file integrity.

Mutability requires a re-computation of the hashes, but can only be done with a key.

Feedback? Will this fit in the 1000-line requirement?
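The chaining described in this suggestion can be sketched in a few lines of Go. This is an illustration of the proposal only, not something minikeyvalue implements: the entry type and the appendEntry and verify functions are hypothetical, and the optional keyed re-computation is left out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// entry is a hypothetical index record: a value plus its chained hash,
// where hash = SHA-256(data || hash of the previous, older entry).
type entry struct {
	data []byte
	hash [sha256.Size]byte
}

// chainHash computes SHA-256(data || prev) without modifying either input.
func chainHash(data, prev []byte) [sha256.Size]byte {
	buf := make([]byte, 0, len(data)+len(prev))
	buf = append(buf, data...)
	buf = append(buf, prev...)
	return sha256.Sum256(buf)
}

// appendEntry chains a new value onto the log (oldest first, newest last).
func appendEntry(log []entry, data []byte) []entry {
	var prev []byte
	if n := len(log); n > 0 {
		prev = log[n-1].hash[:]
	}
	return append(log, entry{data: data, hash: chainHash(data, prev)})
}

// verify recomputes every hash from oldest to newest and reports whether
// all stored hashes still match their data.
func verify(log []entry) bool {
	var prev []byte
	for i := range log {
		if chainHash(log[i].data, prev) != log[i].hash {
			return false
		}
		prev = log[i].hash[:]
	}
	return true
}

func main() {
	var log []entry
	for _, v := range []string{"first", "second", "third"} {
		log = appendEntry(log, []byte(v))
	}
	fmt.Println(verify(log)) // true

	log[1].data = []byte("tampered") // change a value without fixing hashes
	fmt.Println(verify(log))         // false
}
```

The core of it is roughly 30 lines, so size alone would not rule it out, though anyone able to rewrite the hashes can also rewrite the chain unless a key is involved.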

Please run gofmt on the *.go files in the project

Please run gofmt on the project's *.go files. Some contributors use GoLand (with File Watchers and auto gofmt) or VS Code (whose Go plugins run gofmt on save), so unformatted code can cause merge conflicts when their editors reformat it.

MD5

There is no requirement to use a cryptographic hash, right?
Let us use a non-cryptographic hash then, like xxhash or murmur3, as they are much faster.

I am happy to send a PR in case we agree :-)
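To make the proposal concrete, here is a sketch of deriving the two directory bytes from a non-cryptographic hash. Two assumptions are baked in: that the hash is only used for directory fan-out (as the "/%02x/%02x/" format elsewhere on this page suggests), and FNV-1a from Go's standard library stands in for xxhash or murmur3, which would need third-party packages. The function names dirMD5 and dirFNV are hypothetical.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"hash/fnv"
)

// dirMD5 takes the fan-out directories from the first two MD5 bytes.
func dirMD5(key []byte) string {
	h := md5.Sum(key)
	return fmt.Sprintf("/%02x/%02x", h[0], h[1])
}

// dirFNV does the same with 32-bit FNV-1a, a non-cryptographic hash.
func dirFNV(key []byte) string {
	h := fnv.New32a()
	h.Write(key) // Write on a hash.Hash never returns an error
	sum := h.Sum32()
	return fmt.Sprintf("/%02x/%02x", byte(sum>>24), byte(sum>>16))
}

func main() {
	key := []byte("/wehave")
	fmt.Println("md5:", dirMD5(key))
	fmt.Println("fnv:", dirFNV(key))
}
```

The two functions generally produce different directories for the same key, so under these assumptions a hash swap would change where existing files are expected to live and would need a migration of data already on disk.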
