mozilla / kpiggybank (forked from jedp/kpiggybank)

INACTIVE - KPI interaction data store

kpiggybank's Introduction

KPIggyBank

A backend store for Key Performance Indicator (KPI) data.

Node.JS + CouchDB.

Piggybanks have a somewhat peculiar protocol: they are write-many, read-once. You can put as many blobs as you want into the piggybank (coins, bills, gift cards; it doesn't care, because it's document-centric), within allocated storage of course, but extracting resources from the piggybank is a one-time destructive operation (what old-timers referred to as "smashing it on the floor").

Requirements

  • An accessible CouchDB server for persistence.
  • Node.JS (0.6.17 or greater)

Installation

  • git clone https://github.com/mozilla/kpiggybank
  • npm install

Testing

  • npm test

The test suite simulates throwing a thousand login sequences at the KPI store.

It is anticipated that, with 1 million users, BrowserID will generate some 100 sign-in activities per second. The test suite requires that kpiggybank can store and retrieve records at a rate at least twice as fast as this.

If you want to experiment with the server without having couch installed, use the in-memory data store:

DB_BACKEND=memory node lib/server.js

Note that the in-memory data is not saved anywhere. It's just for testing.

Running

For configuration, the file env.sh.dist can be copied to env.sh and edited. kpiggybank will look for the following environment variables:

  • DB_BACKEND: One of "couchdb", "memory", "dummy". Default "couchdb".
  • DB_HOST: IP addr of couch server. Default "127.0.0.1".
  • DB_PORT: Port number of couch server. Default "5984".
  • DB_NAME: Name of the database. Default "bid_kpi".
  • DB_USER: Username for database if required. Default "kpiggybank".
  • DB_PASS: Password for database if required. Default "kpiggybank".
  • HOST: IP address the kpiggybank server binds to. Default "127.0.0.1".
  • PORT: Port for the kpiggybank server. Default "3000".
  • MODE: Governs how verbose logging should be. Set to "prod" for quieter logging. Default "dev".
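A resulting env.sh might look like the following sketch; these values simply restate the defaults listed above, not the contents of the shipped env.sh.dist:

```shell
# env.sh - sourced before starting kpiggybank (illustrative values only)
export DB_BACKEND=couchdb   # one of "couchdb", "memory", "dummy"
export DB_HOST=127.0.0.1
export DB_PORT=5984
export DB_NAME=bid_kpi
export DB_USER=kpiggybank
export DB_PASS=kpiggybank
export HOST=127.0.0.1
export PORT=3000
export MODE=dev             # set to "prod" for quieter logging
```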

Start the server like so:

  • npm start

Or like so:

  • node lib/server.js

Or change your env configuration with something like:

  • DB_NAME=bid_kpi_test npm start

When running kpiggybank for the first time on a given database, it will ensure that the db exists, creating it if it doesn't.

Please note that the database named bid_kpi_test is deleted as part of the test suite.

Running on AWS

You can use in-tree awsbox scripts to deploy kpiggybank on Amazon's cloud infrastructure.

This process is now just like the process of deploying browserid on AWS, see: https://github.com/mozilla/browserid/blob/dev/docs/AWS_DEPLOYMENT.md

The one modification is that kpiggybank's deploy script ignores mail setup.

JS API

Methods

  • api.saveData(blob [, callback]) - save a hopefully valid event blob
  • api.fetchRange([ options, ] callback) - fetch some or all events
  • api.count(callback) - get number of records in DB
  • api.followChanges() - connect to event stream

Events

  • change - a newly-arrived json blob of delicious KPI data
  • error - oh noes

Examples

The HTTP API calls are wrapped for convenience in a JS module. You can of course call the HTTP methods directly if you want. Example of using the JS API:

    var API = require("./lib/api");
    var api = new API(server_host, server_port);
    api.saveData(yourblob, yourcallback);

The callback is optional.

To query a range:

    var options = {start: 1, end: 42}; // optional 
    api.fetchRange(options, callback);

options are ... optional, so you can get all records like so:

    api.fetchRange(callback);

Subscribe to changes stream. The changes stream is an event emitter. Use like so:

    api.followChanges()  // now subscribed

    api.on('change', function(change) {
        // do something visually stunning
    });

HTTP API

Post Data

Post a blob of data to /wsapi/interaction_data.
The post data should contain a JSON object following the example here: https://wiki.mozilla.org/Privacy/Reviews/KPI_Backend#Example_data

In particular, the timestamp field is required, and should be a unix timestamp (seconds since the epoch); not an ISO date string.

  • url: /wsapi/interaction_data
  • method: POST
  • required param: {data: <your data blob>}

Get Data

Retrieve a range of records; returns a JSON string.

  • url: /wsapi/interaction_data?start=<date-start>&end=<date-end>
  • method: GET

Count Records

Retrieve a count of the number of records; returns a JSON encoded number.

  • url: /wsapi/interaction_data/count
  • method: GET

License

All source code here is available under the MPL 2.0 license, unless otherwise indicated.

kpiggybank's People

Contributors

jaredhirsch, jedp, kparlante, ozten


Forkers

kparlante

kpiggybank's Issues

readableDate is hosed.

somehow we've regressed here, readable date is:

  "readableDate": "undefined undefined-PM-11:30:00"

This is looking at recent kpi data while ssh'd into our prod KPI instances.

Document couchdb versions

In README or elsewhere, document version or versions of CouchDB.

Probably version we're targeting in production is enough.

Bonus points for Ubuntu and RHEL package names.

There appears to be a memory leak with the collector

From jedp#8, logged by @jrgm

In the process of driving some high load for interaction_data against the collector, it was noticed that RSS was steadily growing. I suspect there is a leak in that code path, but haven't looked further. And it wasn't a careful experiment as I was playing with kill and iptables tricks before it was noticed, so I may have set the collector on a bad code path (but still...).

Anyway, this could use some investigation in a clone of the kpiggybank-stage instance.

kpiggybank-stage slowed in throughput during a long loadtest on aws browserid stage environment.

At about 2013-04-22T12:00, the throughput of POSTs recorded in /home/app/code/kpiggybank.log dropped from ~15 req/sec to ~2 req/sec. This caused the browserid processes to begin buffering the KPI blobs in memory, so RSS on that process grew from the normal ~60MB to ~200MB, and would have kept growing until the v8 max heap size was exceeded and the browserid process crashed.

Not sure how to debug this further, as the logs say very little.

I restarted kpiggybank-stage, and the backlog of requests on the browserid process eventually cleared.

Wiki changes

FYI: The following changes were made to this repository's wiki:

  • defacing spam has been removed

  • the wiki has been disabled, as it was not used

These were made as the result of a recent automated defacement of publicly writeable wikis.

Revisit architecture for sending data from browserid to kpiggybank

The original plan was to use http (when everything lived behind mozilla firewall).

Moving to AWS, the current plan is to use https.

It has been suggested we move to use UDP. Another thought was to write to the filesystem and have a separate process send the data to kpiggybank.

Don't block on this, but take another look at it after we get up and running.
