timak

timak is a Python library for storing timelines (activity streams) in Riak. It is very alpha and rough around the edges.

It is loosely based on my understanding of Yammer's Streamie.

Example

Timelines are unique sets of objects (unique by the ID you provide) ordered by a datetime (that you also provide). They are bounded, so items fall off the end when a (user defined) capacity is reached.

>>> from datetime import datetime
>>> import riak
>>> from timak.timelines import Timeline

>>> conn = riak.RiakClient()

>>> tl = Timeline(connection=conn, max_items=3)

>>> # tl.add("key", "unique_id", "score")
>>> tl.add("brett:tweets", 1, datetime(2011, 1, 1))
[1]
>>> tl.add("brett:tweets", 2, datetime(2011, 1, 2))
[2, 1]
>>> tl.add("brett:tweets", 3, datetime(2011, 1, 3))
[3, 2, 1]
>>> tl.add("brett:tweets", 4, datetime(2011, 1, 4))
[4, 3, 2]
>>> tl.delete("brett:tweets", 2, datetime(2011, 1, 2))
[4, 3]

If you pass a datetime.datetime value as the score, timak automatically converts it to a sortable score value.
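How timak performs this conversion is not shown here. One plausible approach, sketched below purely as an assumption, is to map the datetime to whole seconds since the Unix epoch so that numeric order matches chronological order. The helper name to_score is hypothetical and is not part of timak's API.

```python
import calendar
from datetime import datetime


def to_score(value):
    """Return a value that sorts in chronological order.

    Hypothetical helper, not timak's actual implementation.
    Naive datetimes are treated as UTC; other values pass through unchanged.
    """
    if isinstance(value, datetime):
        # timegm reads the time tuple as UTC and returns whole seconds.
        return calendar.timegm(value.utctimetuple())
    return value


scores = [to_score(datetime(2011, 1, day)) for day in (1, 2, 3)]
print(scores == sorted(scores))
```

Whole seconds drop sub-second precision, so two items added within the same second would tie under this scheme.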

As you can see, the default order is descending by the date you provide, and the object IDs are returned by default. You can also pass an obj_data argument (which must be JSON serializable); it will be returned in place of the ID.

>>> tl.add("brett:tweets", 5, datetime(2011, 1, 5), obj_data={'body': 'Hello world, this is my first tweet'})
[{'body': 'Hello world, this is my first tweet'}, 4, 3]
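To make the semantics above concrete without a Riak cluster, here is a toy in-memory model that reproduces the same outputs. It is only a sketch: the class name InMemoryTimeline and its internals are assumptions for illustration, and it says nothing about how timak stores data in Riak.

```python
from datetime import datetime


class InMemoryTimeline:
    """Toy model of a bounded, unique, score-ordered timeline.

    Hypothetical class for illustration; not timak's implementation.
    """

    def __init__(self, max_items=1000):
        self.max_items = max_items
        self._rows = {}  # key -> list of (score, unique_id, obj_data)

    def add(self, key, unique_id, score, obj_data=None):
        # Drop any existing entry with the same ID so entries stay unique.
        rows = [r for r in self._rows.get(key, []) if r[1] != unique_id]
        rows.append((score, unique_id, obj_data))
        # Newest first, then trim so old items fall off the end.
        rows.sort(key=lambda r: r[0], reverse=True)
        self._rows[key] = rows[: self.max_items]
        return self.get(key)

    def delete(self, key, unique_id, score):
        # score mirrors the call shape shown above; this toy ignores it.
        self._rows[key] = [r for r in self._rows.get(key, []) if r[1] != unique_id]
        return self.get(key)

    def get(self, key):
        # Return obj_data when it was supplied, otherwise the ID.
        return [r[2] if r[2] is not None else r[1] for r in self._rows.get(key, [])]


toy = InMemoryTimeline(max_items=3)
for day in (1, 2, 3, 4):
    result = toy.add("brett:tweets", day, datetime(2011, 1, day))
print(result)
```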

Why?

I needed highly available, linearly scalable timelines where readers and writers don't block one another. Because Riak is a Dynamo-based system, multiple writers can update a single value concurrently and I can merge the conflicting versions on a later read. I can also add a machine to the cluster for more throughput, and since a read simply fetches a denormalized timeline by key, it should be very fast.
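The merge-on-read idea can be sketched in plain Python. This is an assumption about the general technique, not timak's actual merge code: each conflicting copy is modeled as a list of (score, unique_id) pairs, and the hypothetical merge_siblings helper unions them by ID, re-sorts, and re-applies the capacity bound. It deliberately ignores deletes, which would need some form of tombstone to survive a merge.

```python
def merge_siblings(siblings, max_items):
    """Merge conflicting copies of one timeline into a single bounded list.

    Hypothetical helper. Each sibling is a list of (score, unique_id) pairs.
    """
    newest = {}
    for sibling in siblings:
        for score, unique_id in sibling:
            # Keep one entry per ID, preferring the highest score seen.
            if unique_id not in newest or score > newest[unique_id]:
                newest[unique_id] = score
    # Descending by score, then trim to the timeline's capacity.
    merged = sorted(((s, i) for i, s in newest.items()), reverse=True)
    return merged[:max_items]


writer_a = [(3, "c"), (1, "a")]
writer_b = [(2, "b"), (1, "a")]
print(merge_siblings([writer_a, writer_b], max_items=3))
# prints [(3, 'c'), (2, 'b'), (1, 'a')]
```

Because the merge is a union followed by a deterministic sort and trim, applying it in any order to the same set of copies gives the same result, which is what lets writers proceed without coordinating.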

So what? I could write this in...

PostgreSQL or MySQL

This would be a very simple table in an RDBMS. It could even be unbounded (though without some PL/SQL hackery, large OFFSETs are very expensive). You'd be hitting large indexes instead of fetching values directly by key. The biggest problem is that it all has to fit on a single system unless you manually shard the data (and re-shard if you ever outgrow that size). Plus you'd have to handle availability using read slaves and failover.

MongoDB

The only possible difference I see from the RDBMSs above is that you could use Mongo's "auto-sharding." If that's your thing, and you trust it, then I wish you the best of luck. You may want to read this.

Redis

You can fake timelines in Redis using a list or sorted set. As with an RDBMS, you have to handle all of the sharding yourself, re-shard on growth, and use slaves and failover for availability. In addition, and even more critical for my use case, all of your timelines would have to fit in RAM. If you have this problem and that kind of money, please send me some.

Cassandra

Probably another great fit. You could even store much longer timelines, though I'm not sure what it costs to do the equivalent of a SELECT with OFFSET on the columns in a Cassandra row.

TODO

  1. Add a better API with cursors (last seen obj_date?) for pagination.
  2. Built-in Django support for update on post_save and post_delete.
  3. Compress values.
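As a rough illustration of the first item, cursor-based pagination over an already fetched timeline might look like the sketch below. Nothing here exists in timak: the page function, its parameters, and the use of the last seen score as the cursor are all assumptions.

```python
def page(entries, limit, before=None):
    """Return up to `limit` entries older than the cursor, plus the next cursor.

    Hypothetical helper. `entries` is a list of (score, unique_id) pairs
    already sorted by descending score; `before` is the last score seen.
    """
    if before is not None:
        entries = [e for e in entries if e[0] < before]
    chunk = entries[:limit]
    # A short page means the timeline is exhausted, so there is no next cursor.
    next_cursor = chunk[-1][0] if len(chunk) == limit else None
    return chunk, next_cursor


timeline = [(5, "e"), (4, "d"), (3, "c"), (2, "b"), (1, "a")]
first, cursor = page(timeline, limit=2)
second, cursor = page(timeline, limit=2, before=cursor)
print(first, second, cursor)
```

A cursor keyed only on the score skips items that share a score with the last one seen, so a real API would probably need to include the ID in the cursor as a tiebreaker.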

Contributors

bretthoerner, dcramer, benweatherman
