Forgetsy

Forgetsy is a scalable trending library designed to track temporal trends in non-stationary categorical distributions. It uses forget-table style data structures which decay observations over time. Using a ratio of two such sets decaying over different lifetimes, it picks up on changes to recent dynamics in your observations, whilst forgetting historical data responsibly. The technique is closely related to exponential moving average (EMA) ratios used for detecting trends in financial data.

Trends are encapsulated by a construct named Delta. A Delta consists of two sets of counters, each of which implements exponential time decay of the form:

$$ X_{t_1} = X_{t_0} \, e^{-\lambda (t_1 - t_0)} $$

Where the inverse of the decay rate (lambda) is the mean lifetime of an observation in the set. By normalising such a set by a set with half the decay rate, we obtain a trending score for each category in a distribution. This score expresses the change in the rate of observations of a category over the lifetime of the set, as a proportion in the range 0..1.
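As a rough illustration of the decay law above, the following sketch applies it to a single count. The decay helper and the variable names are hypothetical and are not part of Forgetsy's API; times are expressed in days purely for convenience.

```ruby
# Hypothetical helper, not part of Forgetsy: decays one count by
# e^(-lambda * elapsed), where elapsed is the time since the last decay.
def decay(count, lambda_rate, elapsed)
  count * Math.exp(-lambda_rate * elapsed)
end

mean_lifetime = 7.0                 # days; the inverse of the decay rate
lambda_rate = 1.0 / mean_lifetime

# After exactly one mean lifetime, a count falls to 1/e of its value.
after_one_lifetime = decay(100.0, lambda_rate, 7.0)

# A set with half the decay rate forgets the same count more slowly.
slow_after_one_lifetime = decay(100.0, lambda_rate / 2, 7.0)
```

The gap between the fast and slow results is what the normalisation step exploits: a category observed recently keeps a similar count in both sets, while an older one shrinks faster in the fast set.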

Forgetsy removes the need for manually sliding time windows or explicitly maintaining rolling counts, as observations naturally decay away over time. It's designed for heavy writes and sparse reads, as it implements decay at read time.

Each set is implemented as a Redis sorted set. Keys are scrubbed once their count has decayed to near zero, which keeps storage compact.
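The read-time decay and scrubbing described above can be sketched with a plain in-memory hash. This is only a stand-in under stated assumptions: Forgetsy itself keeps the counts in Redis, and the DecayingCounts class, its method signatures, and the scrub threshold are all hypothetical. As a simplification, everything written since the previous read is decayed as if it had arrived at that read.

```ruby
# Hypothetical in-memory stand-in for one decaying set.
# Not Forgetsy's implementation; names and threshold are assumptions.
class DecayingCounts
  SCRUB_THRESHOLD = 0.0001 # assumed cut-off for "near zero"

  def initialize(mean_lifetime, now)
    @rate = 1.0 / mean_lifetime
    @counts = Hash.new(0.0)
    @last_decayed_at = now
  end

  # Writes are cheap: no decay work happens here.
  def incr(bin, by = 1.0)
    @counts[bin] += by
  end

  # Reads apply the decay accumulated since the last read, then scrub
  # any category whose count has fallen below the threshold.
  def fetch(now)
    factor = Math.exp(-@rate * (now - @last_decayed_at))
    @counts.transform_values! { |count| count * factor }
    @counts.delete_if { |_bin, count| count < SCRUB_THRESHOLD }
    @last_decayed_at = now
    @counts.dup
  end
end

set = DecayingCounts.new(7.0, 0.0)  # mean lifetime and clock in days
set.incr('UserFoo', 10.0)
set.incr('UserBar', 0.0005)
snapshot = set.fetch(14.0)          # two mean lifetimes later
```

In this run UserBar decays below the assumed threshold and is dropped, while UserFoo survives with a reduced count. Deferring the work to fetch is what makes the write path a single increment.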

Forgetsy handles distributions with up to around 10^6 active categories, receiving hundreds of writes per second, without much fuss. Its scalability depends on your Redis deployment.

By default, it expects Redis to be running on localhost at the default port (6379).

Installation

Add this to your Gemfile:

gem 'forgetsy', github: 'cavvia/forgetsy', branch: 'v0.2.7'

Configuration

You may want to change the Redis host and port Forgetsy connects to, or set various other options at startup.

Forgetsy has a redis setter which can be given a string or a Redis object. This means if you're already using Redis in your app, Forgetsy can re-use the existing connection.

String: Forgetsy.redis = 'localhost:6379'

Redis: Forgetsy.redis = Redis.current

Usage

Take, for example, a social network in which users can follow each other. You want to track trending users. You construct a one week delta, to capture trends in your follows data over one week periods:

follows_delta = Forgetsy::Delta.create('user_follows', t: 1.week, replay: true)

The delta consists of two sets of counters indexed by category identifiers. In this example, the identifiers will be user ids. One set decays over the mean lifetime specified by t, and another set decays over double the lifetime.

You can now add observations to the delta, in the form of follow events. Each time a user follows another, you increment the followed user id. We can also do this retrospectively, since we have passed the replay option to the factory method above:

follows_delta = Forgetsy::Delta.fetch('user_follows')
follows_delta.incr('UserFoo', date: 2.weeks.ago)
follows_delta.incr('UserBar', date: 10.days.ago)
follows_delta.incr('UserBar', date: 1.week.ago)
follows_delta.incr('UserFoo', date: 1.day.ago)
follows_delta.incr('UserFoo')

Providing an explicit date is useful if you are processing data asynchronously. You can also use incr_by to increment a counter in batches.

You can now consult your follows delta to find your top trending users:

puts follows_delta.fetch

This will print:

{ 'UserFoo' => 0.667, 'UserBar' => 0.500 }

Each user is given a dimensionless score in the range 0..1 corresponding to the normalised follows delta over the time period. This expresses the proportion of follows gained by the user over the last week compared to double that lifetime.
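To make the normalisation concrete, here is a minimal sketch of a score built from two decayed counts, one over lifetime t and one over double that lifetime. The helpers decayed_count and trending_score are hypothetical and model only the idea in the text. The numbers they produce are illustrative and are not expected to match the library's output shown above.

```ruby
# Hypothetical helper: sums unit observations, each decayed by its age.
def decayed_count(ages_in_days, mean_lifetime)
  ages_in_days.sum { |age| Math.exp(-age / mean_lifetime) }
end

# Hypothetical score: the fast-decaying count normalised by the
# slow-decaying count, which decays over double the lifetime.
def trending_score(ages_in_days, t)
  fast = decayed_count(ages_in_days, t)
  slow = decayed_count(ages_in_days, 2 * t)
  slow.zero? ? 0.0 : fast / slow
end

# Two follows in the last day versus two follows from weeks ago.
recent = trending_score([0.0, 1.0], 7.0)
stale = trending_score([14.0, 21.0], 7.0)
```

Because the fast set never retains more than the slow set for the same observations, the ratio stays within 0..1. The category with recent follows scores close to 1, and the one whose follows are weeks old scores much lower.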

Optionally fetch the top n users, or an individual user's trending score:

follows_delta.fetch(n: 20)
follows_delta.fetch(bin: 'UserFoo')

Contributing

Just fork the repo and submit a pull request.

Copyright & License

MIT license. See LICENSE for details.

(c) 2013 Art.sy Inc.
