isabella232 / flockdb

This project forked from twitter-archive/flockdb

A distributed, fault-tolerant graph database

License: Other

STATUS

Twitter is no longer maintaining this project or responding to issues or PRs.

FlockDB

FlockDB is a distributed graph database for storing adjacency lists, with goals of supporting:

  • a high rate of add/update/remove operations
  • potentially complex set arithmetic queries
  • paging through query result sets containing millions of entries
  • ability to "archive" and later restore archived edges
  • horizontal scaling including replication
  • online data migration

Non-goals include:

  • multi-hop queries (or graph-walking queries)
  • automatic shard migrations
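To make the line between the goals and non-goals concrete, here is a minimal sketch of the kind of single-hop set arithmetic that is in scope. It is plain Python over in-memory sets, not FlockDB's query API; the function names and the sample edges are made up for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def build_indexes(
    edges: Iterable[Tuple[int, int]],
) -> Tuple[DefaultDict[int, Set[int]], DefaultDict[int, Set[int]]]:
    """Build both adjacency lists from (source, destination) pairs."""
    following: DefaultDict[int, Set[int]] = defaultdict(set)
    followers: DefaultDict[int, Set[int]] = defaultdict(set)
    for source, destination in edges:
        following[source].add(destination)
        followers[destination].add(source)
    return following, followers


# Hypothetical sample graph: (source, destination) means "source follows destination".
EDGES = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (5, 1)]
following, followers = build_indexes(EDGES)

# Intersection: whom do both 1 and 2 follow?
followed_by_both = following[1] & following[2]

# Intersection across directions: who follows 1 and is followed back by 1?
mutuals_of_1 = following[1] & followers[1]

# Difference: whom does 1 follow without being followed back?
not_followed_back = following[1] - followers[1]

print(followed_by_both, mutuals_of_1, not_followed_back)
```

Every query here combines adjacency lists that are one hop from a known node. A query such as "followers of followers of 1" would need a graph walk, which is the multi-hop case listed under non-goals.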

FlockDB is much simpler than other graph databases such as Neo4j because it tries to solve fewer problems. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites.

Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster stores 13+ billion edges and sustains peak traffic of 20k writes/second and 100k reads/second.

It does what?

If, for example, you're storing a social graph (user A follows user B), and it's not necessarily symmetrical (A can follow B without B following A), then FlockDB can store that relationship as an edge: node A points to node B. It stores this edge with a sort position, and in both directions, so that it can answer the question "Who follows A?" as well as "Whom is A following?"

This is called a directed graph. (Technically, FlockDB stores the adjacency lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit destination ID, a state (normal, removed, archived), and a 32-bit position used for sorting. The edges are stored in both a forward and backward direction, meaning that an edge can be queried based on either the source or destination ID.

For example, if node 134 points to node 90, and its sort position is 5, then there are two rows written into the backing store:

forward: 134 -> 90 at position 5
backward: 90 <- 134 at position 5
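As a rough illustration of the two rows above, the sketch below models a row as a small Python dataclass and derives both rows from one edge. The `Row` type, its field names, and `rows_for_edge` are hypothetical; they do not reflect FlockDB's actual schema or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    direction: str  # "forward" or "backward"
    key_id: int     # the ID this row is looked up by
    other_id: int   # the ID at the other end of the edge
    position: int   # sort position, shared by both rows


def rows_for_edge(source_id: int, destination_id: int, position: int) -> List[Row]:
    """Return the two rows written for one edge, one per direction."""
    return [
        Row("forward", source_id, destination_id, position),
        Row("backward", destination_id, source_id, position),
    ]


for row in rows_for_edge(134, 90, 5):
    print(row)
```

Writing the edge twice is what lets a lookup by either ID find it without scanning the other direction.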

If you're storing a social graph, the graph might be called "following", and you might use the current time as the position, so that a listing of followers is in recency order. In that case, if user 134 is Nick, and user 90 is Robey, then FlockDB can store:

forward: Nick follows Robey at 9:54 today
backward: Robey is followed by Nick at 9:54 today
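Here is a minimal sketch of how a time-based position gives a recency-ordered, pageable follower list. It assumes an in-memory backward index and distinct positions per destination; the class, the `before` cursor, and the integer timestamps are illustrative assumptions, not FlockDB's paging API.

```python
from typing import Dict, List, Optional, Tuple


class BackwardIndex:
    """Hypothetical in-memory backward index: destination ID -> (position, source ID) rows."""

    def __init__(self) -> None:
        self._rows: Dict[int, List[Tuple[int, int]]] = {}

    def add(self, source_id: int, destination_id: int, position: int) -> None:
        self._rows.setdefault(destination_id, []).append((position, source_id))

    def followers_page(
        self, destination_id: int, count: int, before: Optional[int] = None
    ) -> Tuple[List[int], Optional[int]]:
        """Return up to `count` follower IDs, newest first, and a cursor for the next page.

        The cursor is the position of the last row returned, or None when no rows remain.
        Assumes positions are distinct for a given destination.
        """
        rows = sorted(self._rows.get(destination_id, []), reverse=True)
        if before is not None:
            rows = [row for row in rows if row[0] < before]
        page = rows[:count]
        next_cursor = page[-1][0] if page and len(rows) > count else None
        return [source for _, source in page], next_cursor


index = BackwardIndex()
index.add(134, 90, 954)   # Nick follows Robey; 954 stands in for "9:54 today"
index.add(200, 90, 1010)  # made-up later followers
index.add(300, 90, 1100)

first_page, cursor = index.followers_page(90, count=2)
second_page, end = index.followers_page(90, count=2, before=cursor)
print(first_page, cursor, second_page, end)
```

Paging by position rather than by offset means a page boundary stays stable when newer edges arrive between requests.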

The (source, destination) pair must be unique: only one edge can point from node A to node B, but its position and state may be modified at any time. Position is used only for sorting the results of queries, and state is used to mark edges that have been removed or archived (placed into cold sleep).
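The uniqueness rule can be pictured as a table keyed by the (source, destination) pair, as in the sketch below. The `EdgeTable` class and its methods are hypothetical, and returning only normal edges from a listing is an assumption made for the example, not a statement about FlockDB's query semantics.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    NORMAL = "normal"
    REMOVED = "removed"
    ARCHIVED = "archived"


class EdgeTable:
    """Hypothetical edge table: at most one entry per (source, destination) pair."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[int, int], Tuple[int, State]] = {}

    def __len__(self) -> int:
        return len(self._edges)

    def put(self, source_id: int, destination_id: int, position: int,
            state: State = State.NORMAL) -> None:
        # Writing the same pair again replaces its position and state.
        self._edges[(source_id, destination_id)] = (position, state)

    def set_state(self, source_id: int, destination_id: int, state: State) -> None:
        # Change the state only; the position is kept so a restore sorts as before.
        position, _ = self._edges[(source_id, destination_id)]
        self._edges[(source_id, destination_id)] = (position, state)

    def destinations(self, source_id: int) -> List[int]:
        """Destination IDs of NORMAL edges from `source_id`, sorted by position."""
        hits = [
            (position, destination)
            for (source, destination), (position, state) in self._edges.items()
            if source == source_id and state is State.NORMAL
        ]
        return [destination for _, destination in sorted(hits)]


table = EdgeTable()
table.put(134, 90, 5)
table.put(134, 90, 7)  # same pair: still one edge, now at position 7
table.put(134, 91, 3)
table.set_state(134, 90, State.ARCHIVED)
print(len(table), table.destinations(134))
```

Because archiving only flips the state, setting it back to `State.NORMAL` restores the edge at its old position.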

Building

In theory, building is as simple as

$ sbt clean update package-dist

but there are some prerequisites. You need:

  • Java 1.6
  • sbt 0.7.4
  • Thrift 0.5.0

If you haven't used sbt before, this page has a quick setup: http://code.google.com/p/simple-build-tool/wiki/Setup. My ~/bin/sbt looks like this:

#!/bin/bash
java -server -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -Xmx1024m -jar "$(dirname "$0")/sbt-launch-0.7.4.jar" "$@"

Apache Thrift 0.5.0 is a prerequisite for building the Java stubs from the Thrift IDL. It can't be installed via jar, so you'll need to install it separately before you build. It can be found on the Apache Thrift site: http://thrift.apache.org/. You can find the download for 0.5.0 here: http://archive.apache.org/dist/incubator/thrift/0.5.0-incubating/.

In addition, the tests require a local MySQL instance to be running, and the DB_USERNAME and DB_PASSWORD environment variables to contain login info for it. You can skip the tests if you want (but you should feel a pang of guilt):

$ NO_TESTS=1 sbt package-dist

Running

Check out the demo for instructions on how to start up a local development instance of FlockDB. It also shows how to add edges, query them, etc.

Community

Contributors

  • Nick Kallen @nk
  • Robey Pointer @robey
  • John Kalucki @jkalucki
  • Ed Ceaser @asdf
