
pymc-devs / pymc-examples


Examples of PyMC models, including a library of Jupyter notebooks.

Home Page: https://www.pymc.io/projects/examples/en/latest/

License: MIT License

Python 53.89% HTML 29.32% Jupyter Notebook 16.79%
closember

pymc-examples's People

Contributors

aloctavodia, berylkanali, canyon289, chiral-carbon, ckrapu, cloudchaoszero, cluhmann, danhphan, daniel-saunders-phil, drbenvincent, erik-werner, fonnesbeck, harshvirsandhu, johngoertz, juanitorduz, kavyajaiswal, kuvychko, larryshamalama, ltoniazzi, marcogorelli, maresb, michaelosthege, mjhajharia, nathanielf, olgadk7, oriolabril, percevalve, reshamas, ricardov94, twiecki


pymc-examples's Issues

GLM logistic

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-logistic.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

PyMC3 Examples Known Errors & Notes (March 2021)

Summary

The following are existing issues & suggestions in the pymc3-examples repo after going through an iteration of renaming plot dependencies from pm. to arviz.

Note: This is similar to the pre-existing #34 issue.

Installs

  • PyMC3: v3.11
  • theano-pymc (aesara): v1.1.2

Issues

General Issues

  • Set the return_inferencedata=True param when sampling

File-specific Issues

  • examples/pymc3_howto/lasso_block_update.ipynb
Issue

"return_inferencedata should be manually set, either to True or to False, but one of the two to avoid the warning.

If False, an inferencedata should be created within the model context, and passed to arviz. This will

  1. avoid the warning of conversion without the model context and
  2. push forward arviz best practices, it is probably not too relevant here but conversion may not be cheap for some models because it requires computing all the pointwise log likelihood values. az.plot_xyz(trace) works because ArviZ internally converts the data to inferencedata, then plots."
  • examples/pymc3_howto/data_container.ipynb
Issue We should have "keep_size=True to avoid the warning in the cell below, also because in a future ArviZ release the behaviour of hdi will change for 2d arrays (not for 1d or 3d+ arrays), so using 3d arrays with chain, draw, *shape should be used."
  • examples/pymc3_howto/sampling_conjugate_step.ipynb
Code
traces = []
models = []
names = ["Partial conjugate sampling", "Full NUTS"]

for use_conjugate in [True, False]:
    with pm.Model() as model:
        tau = pm.Exponential("tau", lam=1, testval=1.0)
        alpha = pm.Deterministic("alpha", tau * np.ones([N, J]))
        p = pm.Dirichlet("p", a=alpha)

        if use_conjugate:
            # If we use the conjugate sampling, we don't need to define the likelihood
            # as it's already taken into account in our custom step method
            step = [ConjugateStep(p.transformed, counts, tau)]

        else:
            x = pm.Multinomial("x", n=ncounts, p=p, observed=counts)
            step = []

        trace = pm.sample(step=step, chains=2, cores=1, return_inferencedata=True)
        traces.append(trace)

    assert all(az.summary(trace)["r_hat"] < 1.1)
    models.append(model)
Issue

"Since we are not storing the summary dataframe anywhere and we only want the rhat, we should use rhat instead. The assertion can be done with:

assert (az.rhat(trace).to_array() < 1.1).all()"

  • examples/ode_models/ODE_with_manual_gradients.ipynb
Code

Similar to PR 43, for line 33 at variable trace =

    Y_obs = pm.Lognormal("Y_obs", mu=pm.math.log(forward), sigma=sigma, observed=Y)

    trace = pm.sample(1500, init="jitter+adapt_diag", cores=1)
trace["diverging"].sum()

I changed init from adapt_diag to jitter+adapt_diag & added param cores=1.

Issue: I get a sampling error when using adapt_diag or other adapter types; I'm unsure why.

The error:

SamplingError: Bad initial energy

Seen here
Screen Shot 2021-03-14 at 6 56 40 PM

  • examples/generalized_linear_models/GLM-model-selection.ipynb

Issue 1

Code
ax = (
    dfll["log_likelihood"]
    .unstack()
    .plot.bar(subplots=True, layout=(1, 2), figsize=(12, 6), sharex=True)
)

ax[0, 0].set_xticks(range(5))
ax[0, 0].set_xticklabels(["k1", "k2", "k3", "k4", "k5"])
ax[0, 0].set_xlim(-0.25, 4.25);
Issue

One upstream cell errors out, which keeps the remainder of the notebook from running.
It errors on a missing sd_log__ variable, so the cells that depend on it cannot run either.

Particularly

GLM-model-selection KeyError: 'var names: "['sd_log__'] are not present" in dataset'

More details here

Issue 2

Code
dfll = pd.DataFrame(index=["k1", "k2", "k3", "k4", "k5"], columns=["lin", "quad"])
dfll.index.name = "model"

for nm in dfll.index:
    dfll.loc[nm, "lin"] = -models_lin[nm].logp(
        az.summary(traces_lin[nm], traces_lin[nm].varnames)["mean"].to_dict()
    )

    dfll.loc[nm, "quad"] = -models_quad[nm].logp(
        az.summary(traces_quad[nm], traces_quad[nm].varnames)["mean"].to_dict()
    )

dfll = pd.melt(dfll.reset_index(), id_vars=["model"], var_name="poly", value_name="log_likelihood")
dfll.index = pd.MultiIndex.from_frame(dfll[["model", "poly"]])
Issue I'm not familiar with this notebook and find the pandas stuff happening above quite confusing. The model selection should be simplified with newer ArviZ features. Having to work with straces directly is not something we should need to teach 😬

GLM in PyMC3: Out-Of-Sample Predictions Example

Hi 👋 ! I am very interested in the glm module of PyMC3, especially in using it for prediction problems. There is an active thread about this topic: “Out of sample” predictions with the GLM sub-module. I wrote a small post summarising the potential solutions: https://juanitorduz.github.io/glm_pymc3/ Is this something you would be interested in having in the examples? If so, is it OK as it is now, or do you want me to reduce it to focus on just the out-of-sample section? I am happy to contribute to the documentation 🚀 .

log gaussian cox process

File: https://nbviewer.jupyter.org/github/pymc-devs/pymc-examples/blob/main/examples/case_studies/log-gaussian-cox-process.ipynb
Reviewers: @ckrapu

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

ArviZ related

  • Use xarray and from_pymc3_predictions to filter nans and slice/reduce intensity_samples

Notes

Exotic dependencies

None

Computing requirements

Model takes roughly 5 mins to sample.

GP sparse

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-SparseApprox.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

bayes factor

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/Bayes_factor.ipynb
Reviewers: @aloctavodia

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP-MeansAndCovs.ipynb throws LinAlgError

Cell 6 contains:

lengthscale = 0.2
eta = 2.0
cov = eta ** 2 * pm.gp.cov.ExpQuad(1, lengthscale)

X = np.linspace(0, 2, 200)[:, None]
K = cov(X).eval()

plt.figure(figsize=(14, 4))

plt.plot(X, pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K).random(size=3).T)
plt.title("Samples from the GP prior")
plt.ylabel("y")
plt.xlabel("X");

which throws

ValueError: input operand has more dimensions than allowed by the axis remapping

As per #11 , I can fix this by specifying the shape of the MvNormal:

lengthscale = 0.2
eta = 2.0
cov = eta ** 2 * pm.gp.cov.ExpQuad(1, lengthscale)

X = np.linspace(0, 2, 200)[:, None]
K = cov(X).eval()

plt.figure(figsize=(14, 4))

plt.plot(X, pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K, shape=K.shape[0]).random(size=3).T)
plt.title("Samples from the GP prior")
plt.ylabel("y")
plt.xlabel("X");

but then I get

~/pymc3-dev/pymc3/distributions/multivariate.py in random(self, point, size)
    277 
    278         if self._cov_type == "cov":
--> 279             chol = np.linalg.cholesky(param)
    280         elif self._cov_type == "chol":
    281             chol = param

<__array_function__ internals> in cholesky(*args, **kwargs)

~/miniconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/numpy/linalg/linalg.py in cholesky(a)
    762     t, result_t = _commonType(a)
    763     signature = 'D->D' if isComplexType(t) else 'd->d'
--> 764     r = gufunc(a, signature=signature, extobj=extobj)
    765     return wrap(r.astype(result_t, copy=False))
    766 

~/miniconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_nonposdef(err, flag)
     89 
     90 def _raise_linalgerror_nonposdef(err, flag):
---> 91     raise LinAlgError("Matrix is not positive definite")
     92 
     93 def _raise_linalgerror_eigenvalues_nonconvergence(err, flag):

LinAlgError: Matrix is not positive definite

cc @Sayam753 any suggestions?
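
A common workaround for "Matrix is not positive definite" with a dense squared-exponential kernel is to add a small jitter to the diagonal before the Cholesky factorisation. The sketch below is plain NumPy, not the notebook's code: exp_quad is a hypothetical helper standing in for eta ** 2 * pm.gp.cov.ExpQuad(1, lengthscale), and the jitter value 1e-6 is an assumption that may need tuning.

```python
import numpy as np


def exp_quad(X, lengthscale, eta):
    """Squared-exponential kernel matrix for a column vector X of shape (n, 1).

    Hypothetical stand-in for eta ** 2 * pm.gp.cov.ExpQuad(1, lengthscale).
    """
    sq_dist = (X - X.T) ** 2  # pairwise squared distances, shape (n, n)
    return eta**2 * np.exp(-0.5 * sq_dist / lengthscale**2)


X = np.linspace(0, 2, 200)[:, None]
K = exp_quad(X, lengthscale=0.2, eta=2.0)

# K is symmetric but numerically close to singular, so its smallest
# eigenvalues can come out slightly negative in floating point.
# Adding a small multiple of the identity (assumed value) stabilises it.
jitter = 1e-6
L = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))

# Draw three samples from N(0, K + jitter * I) through the Cholesky factor.
rng = np.random.default_rng(0)
samples = (L @ rng.standard_normal((K.shape[0], 3))).T  # shape (3, 200)
```

The same idea would apply to the notebook cell by passing cov=K + jitter * np.eye(K.shape[0]) to the distribution; whether that is the preferred fix for this notebook is left open here.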

model comparison

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/model_comparison.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP Mauna Loa (continuation)

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MaunaLoa2.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

hierarchical partial pooling

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/hierarchical_partial_pooling.ipynb
Reviewers: @OriolAbril

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

ArviZ related

  • Consider using new ArviZ labeling capabilities in the plot_forest to avoid having to manually set the yticklabels. We'd instead use coords for that.
    • Note: the labeller feature in ArviZ has not been released yet, so this change has to wait for that release.

Notes

Exotic dependencies

None

Computing power

Model runs in roughly a minute

Conditional Autoregressive (CAR) model

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/conditional-autoregressive-model.ipynb
Reviewers: @junpenglao

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

  • There seems to be a markdown formatting issue in section "PyMC3 implementation using Sparse Matrix"

Notes

Exotic dependencies

None

Computing requirements

Most models run in less than 2 minutes, one seems to take ~5mins

Multilevel modeling

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/multilevel_modeling.ipynb

This notebook underwent a significant rewrite not too long ago, and is one of the go-to examples on ArviZ+xarray usage. This is both its main pro and its own downfall. There are some nits to address to keep it up to its current gold standard.

Required changes

The notebook should be re-executed with the latest pymc3 and have the warning filter removed (there should be no warnings; if there are, they should be fixed). Moreover, rerunning with the latest ArviZ will generate much more aesthetically pleasing forest plots. legend=True should be used for forest plots with multiple models.

Changes for discussion

The plot forest on cell 42 should use the approach described in arviz-devs/arviz#1627 to avoid showing the 1 dots that correspond to the elements in the diagonal.

⚠️ These changes need to wait for the next ArviZ release: they require ArviZ>0.11.2; 0.11.2 is not enough.

GP smoothing

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-smoothing.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM model selection

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-model-selection.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP Student-t process

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-TProcess.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM poisson

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-poisson-regression.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

  • Use numpy generator
  • ⚠️ code cells 15 and 19 are plain wrong, we are doing np.exp(np.mean()) instead of np.mean(np.exp()).
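
To illustrate why the order of np.exp and np.mean matters, here is a minimal sketch on synthetic draws. The array log_mu_draws is a made-up stand-in for posterior samples on the log scale, not data from the notebook. Because exp is convex, the mean of the exponentiated draws is always at least as large as the exponential of the mean draw (Jensen's inequality), so swapping the two underestimates the expected value on the original scale.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws of a log-scale quantity (e.g. a log rate).
log_mu_draws = rng.normal(loc=1.0, scale=0.8, size=10_000)

# Posterior mean on the original scale: transform each draw, then average.
mean_of_exp = np.mean(np.exp(log_mu_draws))

# What the issue flags as wrong: average first, then transform.
# This gives the geometric mean of exp(draws), which is smaller.
exp_of_mean = np.exp(np.mean(log_mu_draws))

print(f"np.mean(np.exp(x)) = {mean_of_exp:.3f}")
print(f"np.exp(np.mean(x)) = {exp_of_mean:.3f}")
```

The gap between the two numbers grows with the spread of the draws, so the error is largest exactly where the posterior is most uncertain.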

ArviZ related

Notes

Exotic dependencies

None

Computing requirements

Models sample in less than a minute

Standardize and Update Notebook Gallery

[BEGINNER-FRIENDLY]
Our notebooks gallery is quite big, so:

  • Many of them use an old style and could use updating to the ArviZ color style instead (not listed).
  • Many notebooks show FutureWarnings that should be addressed (not listed).
  • Some notebooks fail to run because they use outdated third-party APIs or exotic packages (listed below).

So this issue is here to signal it would be nice if people want to take some time updating and re-running the notebooks below with PyMC 3.9, according to this style page 🎉
Do it in small batches though, to not get bored and enjoy it 😉 Thanks a lot in advance for your help and don't hesitate to ask your questions below!
PyMCheers 🖖

Here is an up-to-date list of the most outdated and problematic NBs (those not listed here should be checked for style and updating accordingly):

Exotic

  • blackbox_external_likelihood needs Cython
  • convolutional_vae_keras_advi needs Keras

Other Issues

  • GLM theano.gof.fg.MissingInputError
  • GLM-poisson-regression KeyError: "['hpd_2.5', 'hpd_97.5'] not in index"
  • GLM-negative-binomial-regression KeyError: "['hpd_97.5', 'hpd_2.5'] not in index"
  • GLM-model-selection KeyError: 'var names: "['sd_log__'] are not present" in dataset'
  • GP-MaunaLoa2 ValueError: Units 'M' and 'Y' are no longer supported
  • GP-MaunaLoa ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.
  • GP-TProcess runs but has way too many divergences; timed out after 14_000 seconds
  • PyMC3_tips_and_heuristic KeyError: Rhat
  • dependent_density_regression AttributeError: 'DataFrame' object has no attribute 'range'
  • hierarchical_partial_pooling not enough values to unpack (expected 2, got 1)
  • lda-advi-aevb TypeError: init() got an unexpected keyword argument 'n_topics'
  • marginalized_gaussian_mixture_model AttributeError: 'Rectangle' object has no property 'normed'
  • GLM-logistic AttributeError: 'Rectangle' object has no property 'normed'
  • model_averaging FileNotFoundError: File ../data/milk.csv does not exist
  • model_comparison AttributeError: 'ELPDData' object has no attribute 'WAIC'
  • multilevel_modeling More chains (4000) than draws (2) and some plots may be wrong
  • profiling has a shape error
  • rugby_analytics ValueError: not enough values to unpack (expected 2, got 1)
  • sampling_callback has a shape error (looks like a threading problem)
  • survival_analysis cell 11 raises a NotImplementedError in numpy/pandas
  • weibull_aft AttributeError: module 'statsmodels' has no attribute 'datasets'
  • ODE_with_manual_gradients ValueError: array must not contain infs or NaNs
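
The "Units 'M' and 'Y' are no longer supported" message in the two Mauna Loa entries appears to come from a timedelta conversion that uses month or year units, which have no fixed length. One possible workaround, sketched here in plain NumPy, is to measure elapsed time in days and convert to fractional years with an explicit year length. The dates below are made up, and how the notebooks actually build their time axis is not shown in this issue, so treat this as an assumption-laden illustration only.

```python
import numpy as np

# Hypothetical observation dates; the notebooks load theirs from data.
dates = np.array(["1958-03-01", "1959-03-01", "1960-03-01"], dtype="datetime64[D]")
reference = np.datetime64("1958-03-01", "D")

# Days are an unambiguous unit, so this division is well defined.
elapsed_days = (dates - reference) / np.timedelta64(1, "D")

# Convert to fractional years with an explicit (assumed) mean year length.
DAYS_PER_YEAR = 365.25
elapsed_years = elapsed_days / DAYS_PER_YEAR
```

Making the year length an explicit constant keeps the approximation visible instead of hiding it inside a unit string.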

putting workflow

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/putting_workflow.ipynb

Required changes

See tracker and its description

notes: consider creating a logit_idata and a logit_trace = logit_idata.posterior (and the same for other models). I think this will minimize the need to modify the code. That being said, I would not expect updating it to be a walk in the park. I'd recommend working on this only if you are already familiar and more or less comfortable with xarray.

GP means and covs

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MeansAndCovs.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP Mauna Loa

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MaunaLoa.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP circular

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Circular.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

model averaging

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/model_averaging.ipynb
Reviewers: @aloctavodia

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

prior and posterior predictive checks

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/posterior_predictive.ipynb
Reviewers: @AlexAndorra @lucianopaz

Note: Please refer to notebook updates overview for more details on some of the bullet points below

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

  • Use new numpy random generator (see updates overview)
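
As a rough illustration of the generator-based style this bullet asks for, the sketch below replaces global seeding (np.random.seed plus np.random.normal) with an explicit Generator object. The seed value and variable names are arbitrary choices for the example, not taken from the notebook.

```python
import numpy as np

RANDOM_SEED = 8927  # arbitrary example seed

# Create one generator and pass it around instead of seeding global state.
rng = np.random.default_rng(RANDOM_SEED)

# Draw synthetic data from the generator's own methods.
x = rng.normal(loc=0.0, scale=1.0, size=100)
noise = rng.normal(scale=0.5, size=100)
y = 1.0 + 2.0 * x + noise

# A second generator with the same seed reproduces the same stream.
rng_check = np.random.default_rng(RANDOM_SEED)
x_check = rng_check.normal(loc=0.0, scale=1.0, size=100)
```

Keeping the generator as an explicit object means a cell's randomness does not depend on which other cells have already run.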

ArviZ related

  • Use InferenceData
  • Try to take advantage of matplotlib and xarray plotting to avoid unnecessary plotting loops
  • update hpd to hdi

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

  • Show advanced uses of sample_posterior_predictive? Or should that be another more specific notebook not focused on model criticism but purely on pymc3 usage? (i.e. a howto instead of a diagnostics_and_criticism notebook).

Notes

Exotic dependencies

None

Computing requirements

All models seem to sample in under a minute

Blackbox likelihood

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/blackbox_external_likelihood.ipynb

Required changes

The notebook needs to be modified to use ArviZ+InferenceData at the very least. It's hard to know what extra work will be needed, as it has not been executed for a while and cython usage is tricky to get right. Note that it uses DensityDist in strange ways, passing dicts that contain freeRVs as observed, and will need density_dist_obs=False in idata_kwargs (ref)

Changes to discuss

I don't think this notebook can be converted to v4 unless it undergoes a significant rewrite. Not sure if we should try to get it working with a custom distribution or with pm.Potential for v4.

⚠️ It requires cython to run.

GP Kron

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Kron.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't
have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

factor analysis

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/factor_analysis.ipynb
Reviewers: ?

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

  • Use return_inferencedata=True
  • Do not use Deterministic to "choose" which values to plot, use coords when calling the plotting functions
  • Consider using dims and coords instead of shape
  • use xarray and ArviZ for the plot in code cell 7

Notes

Exotic dependencies

None

Computing requirements

Models seem to take between 1-5 minutes to sample

probabilistic matrix factorization

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/probabilistic_matrix_factorization.ipynb
Reviewers: @ColCarroll

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Changes for discussion

Changes listed in this section are up for discussion; these are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

ArviZ related

  • Use ArviZ and xarray for postprocessing. This will probably be challenging. I'd recommend familiarizing yourself with xarray before working on that. Some ideas:
    • _norms in code cell 23 looks like it could be replaced by xr.apply_ufunc (using input_core_dims)

Notes

Exotic dependencies

None

Computing requirements

Model samples in roughly 1 hour

sampler stats

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/sampler-stats.ipynb

Required changes

The notebook should use InferenceData and ArviZ for plotting. Note that the names of the sampler stats in ArviZ are different. The naming convention for ArviZ can be found at https://arviz-devs.github.io/arviz/schema/schema.html#sample-stats, a doc which should be linked too.

Changes for discussion

I think it would be good to add a plot_parallel call as a quick way to visualize divergences, or to link to the notebook on divergences.

GP marginal

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Marginal.ipynb
Reviewers: @bwengals

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

  • ?

Notes

Exotic dependencies

None

Computing requirements

A couple of models seem to take ~20 mins to run.

BEST

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/BEST.ipynb
Reviewers: @twiecki @fonnesbeck

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Notes

Exotic dependencies

None

Computing requirements

Models sample in less than a minute

stochastic volatility

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/stochastic_volatility.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

ArviZ related

  • Use arviz-darkgrid style
  • Use return_inferencedata=True

Notes

Exotic dependencies

None

Computing requirements

Model takes roughly 15 mins to sample

GLM hierarchical

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-hierarchical.ipynb
Reviewers: @twiecki

The sections below may still be pending. If so, the issue is still available; it simply doesn't have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

  • use return_inferencedata=True
  • update slope plotting code
    • use idata and xarray. ⚠️ first operate, then average: for nonlinear operations, the function of the averages is generally not the average of the function.
    • be careful with zorder, alpha and linestyles, markers... to ensure the plot is easy to read for everyone and conveys the relevant information.
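
The "first operate, then average" advice can be checked numerically. The sketch below uses made-up draws and an exponential transform purely for illustration. For linear operations such as a plain sum the two orders agree, so the difference only shows up once a nonlinear step is involved.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws of a quantity on the log scale.
log_slope = rng.normal(loc=0.0, scale=1.0, size=4000)

# Operate first (transform every draw), then average.
mean_of_transformed = np.exp(log_slope).mean()

# Average first, then operate: a different, here smaller, number.
transformed_mean = np.exp(log_slope.mean())
```

With xarray the same pattern is to apply the transformation to the full `(chain, draw, ...)` array and only then call `.mean(dim=("chain", "draw"))`.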

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

  • This references the multilevel modeling notebook both as a general document and by specific cells! ⚠️ It does so with hardcoded links and cell numbers. We should define anchors in the multilevel modeling notebook and use those anchors here. This may be possible in nbsphinx, but I'd say this is a reason to move to myst-nb, where it's natural to use rst directives from markdown.

Notes

Exotic dependencies

None

Computing requirements

GP latent

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Latent.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available; it simply doesn't
have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

LKJ

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/LKJ.ipynb
Reviewers: @AlexAndorra

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

ArviZ related

  • Use labeled dims in code cells 12 and 13. Also use .values instead of .data.

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

ArviZ related

Notes

Exotic dependencies

None

Computing power

All models run in less than a minute

GLM negative binomial

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-negative-binomial-regression.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available; it simply doesn't
have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

  • Add a cross-reference where it mentions another notebook: "First, let's look at some Poisson distributed data from the Poisson regression example."

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

gaussian processes

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/gaussian_process.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available; it simply doesn't
have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Add a tutorial NB for BART

Currently we only have the docstring, but a good example NB on what it is, how to use it, etc. would go a long way.

divergences notebook

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/Diagnosing_biased_Inference_with_Divergences.ipynb
Reviewers: ?

The sections below may still be pending. If so, the issue is still available; it simply doesn't
have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM predictions

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-out-of-sample-predictions.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available; it simply doesn't
have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Getting started notebook is outdated

For example here are the Installation instructions:

Installation

Running PyMC3 requires a working Python interpreter, either version 2.7 (or more recent) or 3.5 (or more recent); we recommend that new users install version 3.5. A complete Python installation for Mac OSX, Linux and Windows can most easily be obtained by downloading and installing the free Anaconda Python Distribution by ContinuumIO.

https://github.com/pymc-devs/pymc-examples/blob/main/examples/getting_started.ipynb

GLM linear

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-linear.ipynb
Reviewers:

Please refer to the notebook updates overview for more detailed guidance on the points below.

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

  • Use new numpy random generator
  • Check why the manual model runs slower. Is the default prior in glm.from_formula a HalfCauchy?
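
For the random generator bullet, a minimal sketch of what the change could look like, assuming the notebook simulates a noisy straight line. The seed, sizes and parameter values here are made up for illustration.

```python
import numpy as np

RANDOM_SEED = 8927  # arbitrary value chosen for this sketch
rng = np.random.default_rng(RANDOM_SEED)

# Simulated regression data; the true parameters are hypothetical.
size = 200
true_intercept, true_slope = 1.0, 2.0
x = np.linspace(0, 1, size)
y = true_intercept + true_slope * x + rng.normal(scale=0.5, size=size)
```

A local `Generator` instance replaces calls such as `np.random.seed` and `np.random.normal`, which rely on hidden global state.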

ArviZ related

  • In code cell 7, pass figsize as kwarg to plot_trace, not as plt.figure argument.

Notes

Should that notebook mention bambi at some point? cc @aloctavodia

Exotic dependencies

I think it needs patsy. I'm not completely sure whether it's installed with pymc3 or is an optional dependency.

Computing requirements

Models run in less than 30 seconds.
