
trials

Tiny Bayesian A/B testing library

Installation

pip install git+git://github.com/bogdan-kulynych/trials.git@master

pip might not install all the system packages needed for scipy. To install them on Debian:

sudo apt-get install libatlas-dev libatlas-base-dev liblapack-dev gfortran

Run the tests:

nosetests trials/tests

Usage

Import package

from trials import Trials

Start a split test with Bernoulli (binary) observations

test = Trials(['A', 'B', 'C'])

Observe successes and failures

test.update({
    'A': (50, 10), # 50 successes, 10 failures, total 60
    'B': (75, 15), # 75 successes, 15 failures, total 90
    'C': (20, 15)  # 20 successes, 15 failures, total 35
})

Evaluate some statistics

dominances = test.evaluate('dominance', control='A')         # Dominance probabilities P(X > A)
lifts = test.evaluate('expected lift', control='A')          # Expected lifts E[(X-A)/A]
intervals = test.evaluate('lift CI', control='A', level=95)  # Lifts' 95%-credible intervals

Available statistics for variations with Bernoulli observations: expected posterior, posterior CI, expected lift, lift CI, empirical lift, dominance, z-test dominance.
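The remaining statistics can presumably be queried through the same evaluate call shown above; the snippet below is only a sketch under that assumption (the statistic names come from the list above, but the exact keyword arguments they accept are not confirmed by this README)

posteriors = test.evaluate('expected posterior')               # assumed: posterior mean success rate per variation
posterior_cis = test.evaluate('posterior CI', level=95)        # assumed: 95%-credible intervals for the success rates
z_dominances = test.evaluate('z-test dominance', control='A')  # assumed: frequentist (z-test) analogue of dominance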

Print or visualize results

for variation in ['B', 'C']:
    print('Variation {name}:'.format(name=variation))
    print('* E[lift] = {value:.2%}'.format(value=lifts[variation]))
    print('* P({lower:.2%} < lift < {upper:.2%}) = 95%' \
        .format(lower=intervals[variation][0], upper=intervals[variation][2]))
    print('* P({name} > {control}) = {value:.2%}' \
        .format(name=variation, control='A', value=dominances[variation]))

Examine the output:

Variation B:
* E[lift] = 0.22%                       # expected lift
* P(-13.47% < lift < 17.31%) = 95%      # lift CI
* P(B > A) = 49.27%                     # dominance
Variation C:
* E[lift] = -31.22%
* P(-51.33% < lift < -9.21%) = 95%
* P(C > A) = 0.25%

Interpreting and analyzing results

According to the output above, there is roughly a 50% chance that variation B is better than A (dominance). Most likely it is better by about 0.2% (expected lift), but there is a 95% chance that the real lift lies anywhere between -13% and 17% (lift CI). You need more data to tell for sure whether B is better or worse.

There is a 100% - 0.25% = 99.75% chance that variation C is worse than A. Most likely it is worse by about 31%, and there is a 95% chance that the real lift falls between -51% and -9%. The data was sufficient to tell that this variation is almost certainly inferior to both A and B. However, if a 99.75% chance still doesn't convince you, you need more data.
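If you do decide to collect more data, the test object can be updated again and the statistics re-evaluated. The sketch below assumes that update() accumulates the new counts on top of those already recorded (suggested by the name but not stated in this README); the numbers themselves are hypothetical.

# Hypothetical additional successes/failures observed since the first update
test.update({
    'A': (120, 30),
    'B': (160, 35),
    'C': (40, 28)
})
dominances = test.evaluate('dominance', control='A')  # re-check P(X > A) with the larger sample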

Theory

An explanation of the mathematics behind the library and a usage guide are coming soon as a blog post.

Meanwhile, see the notebook for a comparison of Bayesian lift (blue) and empirical lift (green) errors in a theoretical benchmark with equal sample sizes. The Bayesian approach is a little better at predicting the lift, but there are no miracles here. Bayesian p-values and frequentist (z-test) p-values yield almost identical results.
