better / convoys

Implementation of statistical models to analyze time lagged conversions

Home Page: https://better.engineering/convoys/

License: MIT License

Python 100.00%

convoys's People

Contributors

adsglass, dsto, erikbern, kning, stphnma, victorquinn

convoys's Issues

Error in fit() method

Hey!
Awesome library - saved my day!

I've experienced an issue when trying to fit a model:

single.KaplanMeier.fit(B,T)

throws TypeError: fit() missing 1 required positional argument: 'T'.

I cannot figure out what the issue is here.
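
The message is consistent with fit() being called on the class rather than on an instance: Python then binds B to self and T to B, which leaves T unfilled. A minimal sketch with a stand-in class (KaplanMeierLike is a hypothetical name, not the real convoys code) reproduces the message and shows the instance-based call:

```python
class KaplanMeierLike:
    """Stand-in with the same fit(self, B, T) shape; not the convoys class."""

    def fit(self, B, T):
        self.n = len(B)
        return self


B = [True, False, True]
T = [1.0, 5.0, 2.5]

# Calling fit on the class: B is taken as `self`, T as `B`, and `T` is missing.
try:
    KaplanMeierLike.fit(B, T)
    message = None
except TypeError as exc:
    message = str(exc)

# Calling fit on an instance: `self` is supplied automatically.
model = KaplanMeierLike()
model.fit(B, T)
```

With convoys itself this would presumably be model = convoys.single.KaplanMeier() followed by model.fit(B, T).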

Example dob_violations.py does not work

Steps to reproduce:

  • clone the repo
  • python -m venv venv and source venv/bin/activate
  • pip install convoys==0.2.1
  • python examples/dob_violations.py

Stacktrace:

  File "examples/dob_violations.py", line 50, in <module>
    run()
  File "examples/dob_violations.py", line 25, in run
    convoys.plotting.plot_cohorts(G, B, T, model=model, ci=0.95,
  File "/Users/jacopotagliabue/Documents/repos/convoys/venv/lib/python3.8/site-packages/convoys/plotting.py", line 62, in plot_cohorts
    m.fit(G, B, T)
  File "/Users/jacopotagliabue/Documents/repos/convoys/venv/lib/python3.8/site-packages/convoys/multi.py", line 31, in fit
    self.base_model.fit(X, B, T)
  File "/Users/jacopotagliabue/Documents/repos/convoys/venv/lib/python3.8/site-packages/convoys/regression.py", line 269, in fit
    for i, _ in enumerate(sampler.sample(p0, iterations=n_iterations)):
  File "/Users/jacopotagliabue/Documents/repos/convoys/venv/lib/python3.8/site-packages/emcee/ensemble.py", line 379, in sample
    self.backend.grow(iterations, state.blobs)
  File "/Users/jacopotagliabue/Documents/repos/convoys/venv/lib/python3.8/site-packages/emcee/backends/backend.py", line 175, in grow
    a = np.empty((i, self.nwalkers, self.ndim), dtype=self.dtype)
TypeError: 'numpy.float64' object cannot be interpreted as an integer

Numpy type error occurs when setting mcmc==True

I am running into a NumPy type error (see below) whenever I set mcmc=True or ci=0.95 in any convoys model. This happens across data sources, including the example data sets. If I remove these parameters, the code runs as expected with no errors. Has anyone else come across this? Any help is much appreciated!

TypeError: 'numpy.float64' object cannot be interpreted as an integer
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-160-acdf74229cef> in <module>
----> 1 model_test.fit(X,B,T)

~/opt/anaconda3/lib/python3.8/site-packages/convoys/multi.py in fit(self, G, B, T)
     29         for i, group in enumerate(G):
     30             X[i,group] = 1
---> 31         self.base_model.fit(X, B, T)
     32 
     33     def _get_x(self, group):

~/opt/anaconda3/lib/python3.8/site-packages/convoys/regression.py in fit(self, X, B, T, W)
    267                     ' %d walkers [' % n_walkers,
    268                     progressbar.AdaptiveETA(), ']'])
--> 269             for i, _ in enumerate(sampler.sample(p0, iterations=n_iterations)):
    270                 bar.update(i+1)
    271             result['samples'] = sampler.chain[:, n_burnin:, :] \

~/opt/anaconda3/lib/python3.8/site-packages/emcee/ensemble.py in sample(self, initial_state, log_prob0, rstate0, blobs0, iterations, tune, skip_initial_state_check, thin_by, thin, store, progress, progress_kwargs)
    377             checkpoint_step = thin_by
    378             if store:
--> 379                 self.backend.grow(iterations, state.blobs)
    380 
    381         # Set up a wrapper around the relevant model functions

~/opt/anaconda3/lib/python3.8/site-packages/emcee/backends/backend.py in grow(self, ngrow, blobs)
    173         self._check_blobs(blobs)
    174         i = ngrow - (len(self.chain) - self.iteration)
--> 175         a = np.empty((i, self.nwalkers, self.ndim), dtype=self.dtype)
    176         self.chain = np.concatenate((self.chain, a), axis=0)
    177         a = np.empty((i, self.nwalkers), dtype=self.dtype)

TypeError: 'numpy.float64' object cannot be interpreted as an integer
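
The traceback ends in np.empty receiving a shape whose first entry is a numpy.float64, which suggests the iterations value handed to emcee is a float rather than an int. A minimal sketch of that failure and of an integer cast, assuming (this is a guess, not confirmed from the convoys source) that the count was produced by a float-returning operation such as numpy.ceil:

```python
import numpy as np

# Hypothetical iteration count; np.ceil returns a numpy.float64, not an int.
n_iterations = 100 + np.ceil(2000 / 32)

# A float in the shape tuple is rejected by np.empty.
try:
    np.empty((n_iterations, 4, 3))
    raised = False
except TypeError:
    raised = True

# Casting to a plain int before allocating (or before calling the sampler) avoids it.
chain = np.empty((int(n_iterations), 4, 3))
```

Under that assumption, a possible workaround would be to wrap the count in int(...) before it reaches sampler.sample.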

TypeError when run w/ emcee 3.0.0

convoys throws TypeError: __init__() got an unexpected keyword argument 'dim' when run with emcee 3.0.0.

The code in question is in regression.py (~line 248):

sampler = emcee.EnsembleSampler(
    nwalkers=n_walkers,
    dim=dim,
    lnpostfn=generalized_gamma_loss,
    args=args,
)

I was able to fix it by changing dim to ndim and lnpostfn to log_prob_fn - no other changes were required.
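
The rename described above can be sketched as a small helper that picks keyword names by emcee major version. This is only an illustration: ensemble_sampler_kwargs is a hypothetical name, and the version string is passed in explicitly so the snippet runs without emcee installed.

```python
def ensemble_sampler_kwargs(emcee_version, n_walkers, dim, log_prob, args):
    """Build EnsembleSampler keyword arguments for emcee 2.x or 3.x (sketch)."""
    major = int(emcee_version.split(".")[0])
    if major >= 3:
        # emcee 3 names: ndim, log_prob_fn
        return {"nwalkers": n_walkers, "ndim": dim,
                "log_prob_fn": log_prob, "args": args}
    # emcee 2 names: dim, lnpostfn
    return {"nwalkers": n_walkers, "dim": dim,
            "lnpostfn": log_prob, "args": args}


def toy_log_prob(x):
    # Placeholder log-probability, used only to fill the argument slot.
    return -0.5 * sum(v * v for v in x)


new_style = ensemble_sampler_kwargs("3.0.0", 8, 2, toy_log_prob, ())
old_style = ensemble_sampler_kwargs("2.2.1", 8, 2, toy_log_prob, ())
```

In real code the result would be unpacked as emcee.EnsembleSampler(**kwargs). Passing the first three arguments positionally should also work on both versions, since only their names changed.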

Use for Real-Time Scoring

I'm modeling conversions that have a large time lag, and I'd like to get an updated likelihood of conversion for a single observation at any point during its lifetime (at no fixed interval, just whenever someone wants to look). Intuitively, I'd expect the likelihood of conversion to be highest in the first few days or weeks; past a certain point the observation is too old and essentially isn't going to convert.

I was looking at Cox Proportional Hazards models when I came across Convoys and it seemed to address my problem more directly, though many of the examples involve groups and aggregate conversion rates. I know there are regression classes and I was playing with those:

from convoys import regression, utils

unit, groups, (G, B, T) = utils.get_arrays(
    survival_df, 
    created='date_input', 
    converted='conversion_date', 
    unit='days', 
    features=[i for i in features if i not in ['date_input', 'conversion_date']]
)
gamma_model = regression.GeneralizedGamma(flavor='linear', ci=True)
gamma_model.fit(G,B,T)
gamma_model.predict([1, 2, 3], 30, ci=True)

But I'm not sure I'm interpreting the output correctly for real-time scoring: an observation is scored at time t, and the result should be the likelihood of conversion given that it has not converted by t. Similarly, if my features are time-dependent (e.g., null at creation, with more learned over time), can that be factored in? After reading the docs more thoroughly, I see this listed under future directions using an RNN; are there any papers you can point me to?

Thank you in advance.
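
One way to frame the scoring question is the standard survival identity P(convert by t + h | not converted by t) = (F(t + h) - F(t)) / (1 - F(t)), where F is the predicted cumulative conversion curve. A minimal sketch, assuming a toy curve that plateaus below 1 (toy_cdf and its parameters are hypothetical and stand in for a fitted model's predictions):

```python
import math


def conditional_conversion(cdf, t, horizon):
    """P(convert in (t, t + horizon] | not converted by t), given a cdf callable."""
    f_now = cdf(t)
    f_later = cdf(t + horizon)
    remaining = 1.0 - f_now
    if remaining <= 0.0:
        return 0.0
    return (f_later - f_now) / remaining


def toy_cdf(t, c=0.3, lam=0.1):
    # Hypothetical curve: a fraction c eventually converts, at exponential rate lam.
    return c * (1.0 - math.exp(-lam * t))


fresh = conditional_conversion(toy_cdf, t=0, horizon=10)    # new observation
stale = conditional_conversion(toy_cdf, t=100, horizon=10)  # old observation
```

Here fresh equals toy_cdf(10) and stale is far smaller, which matches the intuition that an old, unconverted observation is unlikely to convert.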

Return posterior distribution from cdf function

For multi models, the user can pass ci=True and cdf() returns a tuple with the mean, lower bound, and upper bound of the confidence interval.

In use cases where one wants to compare across groups, returning the posterior distribution itself is more desirable than summary metrics.

I'd like to add a parameter for the user to specify that they want the posterior distributions for each group. I'll link to the PR below.
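
To illustrate why the raw posterior can be more useful than a (mean, lower, upper) tuple, here is a minimal sketch. The samples are synthetic Beta draws standing in for posterior draws of each group's conversion rate at a fixed time; nothing here is the convoys API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of the conversion rate for two groups.
samples_a = rng.beta(30, 70, size=2000)
samples_b = rng.beta(25, 75, size=2000)


def summarize(samples, ci=0.95):
    """Collapse draws into (mean, lower, upper), as a summary tuple would."""
    lower, upper = np.percentile(samples, [50 * (1 - ci), 100 - 50 * (1 - ci)])
    return samples.mean(), lower, upper


mean_a, lower_a, upper_a = summarize(samples_a)

# With the draws themselves, a direct comparison across groups is possible.
prob_a_beats_b = float(np.mean(samples_a > samples_b))
```

The summary tuple alone cannot answer "how likely is it that group A converts better than group B", while the draws can.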

Decoupling visualization from models

Hey there! We, the Buffer data team, recently discovered this awesome package, and we're starting to use it in different analyses.

We're used to doing most of our plotting with R. I've started to work on getting the data back from the Matplotlib figure, but it seems like a hack, and I was wondering if you've thought about decoupling the plotting from the modelling.

Prophet, from Facebook, does a great job at that and it'll return a DataFrame with the required data to plot. The same prophet library will also have a default .plot function that uses Matplotlib. That helps users use other plotting frameworks.

I'm happy to help with the coding if I can figure out how to better do the decoupling. Let me know if you have any questions too. 😄

Thanks for open sourcing such a helpful library!

PS: We've also found that using a large group size will result in a confusing legend in the final plot. This one can be probably fixed using the proper Matplotlib arguments though. This example shows weeks in one of our plots:

[Image: 2020-03-24_16:28:35_183x348]
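
The decoupling idea could look roughly like the sketch below: a function that returns plain rows of curve data, leaving plotting to the caller. The function name cohort_curve_rows, the row fields, and the toy curves are all hypothetical and not part of convoys.

```python
import math


def cohort_curve_rows(cdf_by_group, t_max, n_points=5):
    """Evaluate each group's curve on an even grid and return one row per point."""
    rows = []
    for group, cdf in cdf_by_group.items():
        for i in range(n_points):
            t = t_max * i / (n_points - 1)
            rows.append({"group": group, "t": t, "conversion": cdf(t)})
    return rows


# Toy curves standing in for fitted per-group models.
curves = {
    "week 1": lambda t: 0.30 * (1 - math.exp(-0.1 * t)),
    "week 2": lambda t: 0.25 * (1 - math.exp(-0.1 * t)),
}

rows = cohort_curve_rows(curves, t_max=30)
```

Rows in this shape can be written out with csv.DictWriter or loaded with pandas.DataFrame(rows) and then plotted from R or any other tool.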