GLM logistic

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-logistic.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

PyMC3 Examples Known Errors & Notes (March 2021)

Summary

The following are existing issues & suggestions in the pymc3-examples repo after going through an iteration of renaming plot dependencies from pm. to arviz.

Installs

PyMC3: v3.11
theano-pymc (aesara): v1.1.2

Issues

General Issues

Set the parameter return_inferencedata=True

File-specific Issues

examples/pymc3_howto/lasso_block_update.ipynb

Issue

"return_inferencedata should be manually set, either to True or to False, but one of the two to avoid the warning.

If False, an InferenceData should be created within the model context and passed to ArviZ. This will avoid the warning of conversion without the model context and push forward ArviZ best practices. It is probably not too relevant here, but conversion may not be cheap for some models because it requires computing all the pointwise log likelihood values. az.plot_xyz(trace) works because ArviZ internally converts the data to InferenceData, then plots."

examples/pymc3_howto/data_container.ipynb

Issue

We should set "keep_size=True to avoid the warning in the cell below, also because in a future ArviZ release the behaviour of hdi will change for 2d arrays (not for 1d or 3d+ arrays), so 3d arrays with shape (chain, draw, *shape) should be used."

examples/pymc3_howto/sampling_conjugate_step.ipynb

Code

traces = []
models = []
names = ["Partial conjugate sampling", "Full NUTS"]

for use_conjugate in [True, False]:
    with pm.Model() as model:
        tau = pm.Exponential("tau", lam=1, testval=1.0)
        alpha = pm.Deterministic("alpha", tau * np.ones([N, J]))
        p = pm.Dirichlet("p", a=alpha)

        if use_conjugate:
            # If we use the conjugate sampling, we don't need to define the likelihood
            # as it's already taken into account in our custom step method
            step = [ConjugateStep(p.transformed, counts, tau)]

        else:
            x = pm.Multinomial("x", n=ncounts, p=p, observed=counts)
            step = []

        trace = pm.sample(step=step, chains=2, cores=1, return_inferencedata=True)
        traces.append(trace)

    assert all(az.summary(trace)["r_hat"] < 1.1)
    models.append(model)

Issue

"Since we are not storing the summary dataframe anywhere and we only want the rhat, we should use rhat instead. The assertion can be done with:

assert (az.rhat(trace).to_array() < 1.1).all()"

examples/ode_models/ODE_with_manual_gradients.ipynb

Code

Similar to PR 43, for line 33 at variable trace =

    Y_obs = pm.Lognormal("Y_obs", mu=pm.math.log(forward), sigma=sigma, observed=Y)

    trace = pm.sample(1500, init="jitter+adapt_diag", cores=1)
trace["diverging"].sum()

I changed init from adapt_diag to jitter+adapt_diag & added param cores=1.

Issue

I get a sampling error when using adapt_diag or other adapter types... unsure why.

The error:

SamplingError: Bad initial energy

Seen here

examples/generalized_linear_models/GLM-model-selection.ipynb

Issue 1

Code

ax = (
    dfll["log_likelihood"]
    .unstack()
    .plot.bar(subplots=True, layout=(1, 2), figsize=(12, 6), sharex=True)
)

ax[0, 0].set_xticks(range(5))
ax[0, 0].set_xticklabels(["k1", "k2", "k3", "k4", "k5"])
ax[0, 0].set_xlim(-0.25, 4.25);

Issue

One dependency errors out, which prevents the remainder of the notebook from running.
It errors on the missing sd_log__ variable, so the entire notebook cannot run.

Particularly

GLM-model-selection KeyError: 'var names: "['sd_log__'] are not present" in dataset'

More details here

Issue 2

Code

dfll = pd.DataFrame(index=["k1", "k2", "k3", "k4", "k5"], columns=["lin", "quad"])
dfll.index.name = "model"

for nm in dfll.index:
    dfll.loc[nm, "lin"] = -models_lin[nm].logp(
        az.summary(traces_lin[nm], traces_lin[nm].varnames)["mean"].to_dict()
    )

    dfll.loc[nm, "quad"] = -models_quad[nm].logp(
        az.summary(traces_quad[nm], traces_quad[nm].varnames)["mean"].to_dict()
    )

dfll = pd.melt(dfll.reset_index(), id_vars=["model"], var_name="poly", value_name="log_likelihood")
dfll.index = pd.MultiIndex.from_frame(dfll[["model", "poly"]])

Issue

I'm not familiar with this notebook and find the pandas stuff happening above quite confusing. The model selection should be simplified with newer ArviZ features. Having to work with traces directly is not something we should need to teach 😬

GLM in PyMC3: Out-Of-Sample Predictions Example

Hi 👋 ! I am very interested in the glm module of PyMC3, especially using it for prediction problems. There is an active thread about this topic: “Out of sample” predictions with the GLM sub-module. I wrote a small post summarising the potential solutions: https://juanitorduz.github.io/glm_pymc3/ Is this something you would be interested in having in the examples? If so, is it OK as it is now, or do you want me to reduce it and just focus on the out-of-sample section? I am happy to contribute to the documentation 🚀.

log gaussian cox process

File: https://nbviewer.jupyter.org/github/pymc-devs/pymc-examples/blob/main/examples/case_studies/log-gaussian-cox-process.ipynb
Reviewers: @ckrapu

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

use numpy Generator. See also https://numpy.org/doc/stable/reference/random/index.html?highlight=random%20sampling%20numpy%20random#quick-start
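
A minimal sketch of the Generator-based pattern, assuming a fixed seed is wanted for reproducibility. The seed value and variable names are placeholders, not taken from the notebook.

```python
import numpy as np

# Hypothetical seed; any fixed integer works for reproducibility.
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)

# Old style (global state): np.random.seed(RANDOM_SEED); np.random.normal(size=5)
# New style: draw from an explicit Generator instance instead.
normal_draws = rng.normal(loc=0.0, scale=1.0, size=5)
count_draws = rng.poisson(lam=3.0, size=5)

# Re-creating the Generator with the same seed reproduces the same stream.
rng_again = np.random.default_rng(RANDOM_SEED)
assert np.allclose(normal_draws, rng_again.normal(loc=0.0, scale=1.0, size=5))
```

Passing rng around explicitly (rather than reseeding the global state) keeps each cell's randomness independent of the order in which other cells were run.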

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

ArviZ related

Use xarray and from_pymc3_predictions to filter nans and slice/reduce intensity_samples

Notes

Exotic dependencies

None

Computing requirements

Model takes roughly 5 mins to sample.

use main instead of master in .github/workflows/pre-commit.yml

GP sparse

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-SparseApprox.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Specify shape in MvNormal

As per pymc-devs/pymc#4207 (comment)

bayes factor

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/Bayes_factor.ipynb
Reviewers: @aloctavodia

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP-MeansAndCovs.ipynb throws LinAlgError

Cell 6 contains:

lengthscale = 0.2
eta = 2.0
cov = eta ** 2 * pm.gp.cov.ExpQuad(1, lengthscale)

X = np.linspace(0, 2, 200)[:, None]
K = cov(X).eval()

plt.figure(figsize=(14, 4))

plt.plot(X, pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K).random(size=3).T)
plt.title("Samples from the GP prior")
plt.ylabel("y")
plt.xlabel("X");

which throws

ValueError: input operand has more dimensions than allowed by the axis remapping

As per #11, I can fix this by specifying the shape of the MvNormal:

lengthscale = 0.2
eta = 2.0
cov = eta ** 2 * pm.gp.cov.ExpQuad(1, lengthscale)

X = np.linspace(0, 2, 200)[:, None]
K = cov(X).eval()

plt.figure(figsize=(14, 4))

plt.plot(X, pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K, shape=K.shape[0]).random(size=3).T)
plt.title("Samples from the GP prior")
plt.ylabel("y")
plt.xlabel("X");

but then I get

~/pymc3-dev/pymc3/distributions/multivariate.py in random(self, point, size)
    277 
    278         if self._cov_type == "cov":
--> 279             chol = np.linalg.cholesky(param)
    280         elif self._cov_type == "chol":
    281             chol = param

<__array_function__ internals> in cholesky(*args, **kwargs)

~/miniconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/numpy/linalg/linalg.py in cholesky(a)
    762     t, result_t = _commonType(a)
    763     signature = 'D->D' if isComplexType(t) else 'd->d'
--> 764     r = gufunc(a, signature=signature, extobj=extobj)
    765     return wrap(r.astype(result_t, copy=False))
    766 

~/miniconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_nonposdef(err, flag)
     89 
     90 def _raise_linalgerror_nonposdef(err, flag):
---> 91     raise LinAlgError("Matrix is not positive definite")
     92 
     93 def _raise_linalgerror_eigenvalues_nonconvergence(err, flag):

LinAlgError: Matrix is not positive definite

cc @Sayam753 any suggestions?
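
One commonly used workaround for this kind of Cholesky failure, offered here as an assumption rather than a confirmed fix for this notebook, is to add a small jitter term to the diagonal of the covariance matrix before sampling. The sketch below rebuilds the same kernel in plain NumPy so that it runs without PyMC3: the exponentiated quadratic formula is written out by hand, and the jitter value 1e-6 is an arbitrary choice.

```python
import numpy as np

lengthscale = 0.2
eta = 2.0

X = np.linspace(0, 2, 200)[:, None]

# Hand-written exponentiated quadratic kernel:
# k(x, x') = eta**2 * exp(-(x - x')**2 / (2 * lengthscale**2))
sq_dist = (X - X.T) ** 2
K = eta**2 * np.exp(-0.5 * sq_dist / lengthscale**2)

# K is positive semi-definite in exact arithmetic, but rounding error can push
# its smallest eigenvalues slightly below zero. A small diagonal jitter
# (assumed value, tune as needed) keeps the Cholesky factorization stable.
jitter = 1e-6
K_stable = K + jitter * np.eye(K.shape[0])

chol = np.linalg.cholesky(K_stable)

# Draw 3 samples from N(0, K_stable) using the Cholesky factor.
rng = np.random.default_rng(0)
samples = (chol @ rng.standard_normal((K.shape[0], 3))).T
```

In the notebook itself, the equivalent change would be to pass the jittered matrix as cov to the MvNormal; whether that is the preferred fix is open for discussion.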

model comparison

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/model_comparison.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP Mauna Loa (continuation)

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MaunaLoa2.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

hierarchical partial pooling

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/hierarchical_partial_pooling.ipynb
Reviewers: @OriolAbril

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

ArviZ related

Consider using new ArviZ labeling capabilities in the plot_forest to avoid having to manually set the yticklabels. We'd instead use coords for that.
- Note: the labeller feature in ArviZ has not been released yet, so this change has to wait for that release.

Notes

Exotic dependencies

None

Computing power

Model runs in roughly a minute

Conditional Autoregressive (CAR) model

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/conditional-autoregressive-model.ipynb
Reviewers: @junpenglao

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

There seems to be a markdown formatting issue in section "PyMC3 implementation using Sparse Matrix"

Notes

Exotic dependencies

None

Computing requirements

Most models run in less than 2 minutes, one seems to take ~5mins

Multilevel modeling

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/multilevel_modeling.ipynb

This notebook underwent a significant rewrite not too long ago, and is one of the go-to examples on ArviZ+xarray usage. This is both its main pro and its own downfall. There are some nits to keep it up to its current gold standard.

Required changes

The notebook should be reexecuted with latest pymc3 and have the warning filter removed (there should be no warnings, if there are they should be fixed). Moreover, rerunning with latest ArviZ will generate much more aesthetically pleasing forestplots. legend=True should be used for forestplots with multiple models.

Changes for discussion

The plot forest on cell 42 should use the approach described in arviz-devs/arviz#1627 to avoid showing the dots at 1 that correspond to the elements in the diagonal.

⚠️ These changes need to wait for the next ArviZ release, they require ArviZ>0.11.2, 0.11.2 is not enough.

GP smoothing

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-smoothing.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM model selection

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-model-selection.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP Student-t process

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-TProcess.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM poisson

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-poisson-regression.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Use numpy generator
⚠️ code cells 15 and 19 are plain wrong, we are doing np.exp(np.mean()) instead of np.mean(np.exp()).
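
The order of operations matters because exp is convex: by Jensen's inequality, the mean of the exponentiated draws is at least the exponential of the mean. A small sketch with simulated draws (the log_mu_draws array is a stand-in, not the notebook's posterior) shows the gap:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for posterior draws of a coefficient on the log scale.
log_mu_draws = rng.normal(loc=1.0, scale=0.5, size=10_000)

# What the cells currently compute: exponentiate the posterior mean.
exp_of_mean = np.exp(np.mean(log_mu_draws))

# What they should compute: posterior mean of the exponentiated draws.
mean_of_exp = np.mean(np.exp(log_mu_draws))

# Jensen's inequality guarantees mean_of_exp >= exp_of_mean.
print(f"exp(mean) = {exp_of_mean:.3f}, mean(exp) = {mean_of_exp:.3f}")
```

The wider the posterior on the log scale, the larger the difference between the two quantities.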

ArviZ related

code cell 15 (again) is computing the whole summary dataframe, when only a subset of the columns are needed. We should either use kind="stats" or customize summary, examples of both at: https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html

Notes

Exotic dependencies

None

Computing requirements

Models sample in less than a minute

Standardize and Update Notebook Gallery

[BEGINNER-FRIENDLY]
Our notebooks gallery is quite big, so:

Many of them use an old style and could use an update to the ArviZ color style instead (not listed).
Many notebooks show FutureWarnings that should be addressed (not listed).
Some notebooks fail to run because they use outdated third-party APIs or exotic packages (listed below).

So this issue is here to signal it would be nice if people want to take some time updating and re-running the notebooks below with PyMC 3.9, according to this style page 🎉
Do it in small batches though, to not get bored and enjoy it 😉 Thanks a lot in advance for your help and don't hesitate to ask your questions below!
PyMCheers 🖖

Here is an up-to-date list of the most outdated and problematic NBs (those not listed here should be checked for style and updating accordingly):

Exotic

blackbox_external_likelihood needs Cython
convolutional_vae_keras_advi needs Keras

Other Issues

putting workflow

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/putting_workflow.ipynb

Required changes

See tracker and its description

notes: consider creating a logit_idata and a logit_trace = logit_idata.posterior (and the same for other models). I think this will minimize the need to modify the code. That being said, I would not expect updating it to be a walk in the park. I'd recommend working on this only if you are already familiar and more or less comfortable with xarray.

GP means and covs

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MeansAndCovs.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Add contributing information and link to style guide wiki

GP Mauna Loa

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MaunaLoa.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GP circular

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Circular.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

model averaging

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/model_averaging.ipynb
Reviewers: @aloctavodia

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Use sigma instead of sd

e.g. Normal('x', mu=mu, sigma=sigma) rather than Normal('x', mu=mu, sd=sigma)

sd will (silently) continue working, see pymc-devs/pymc#4344

prior and posterior predictive checks

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/posterior_predictive.ipynb
Reviewers: @AlexAndorra @lucianopaz

Note: Please refer to notebook updates overview for more details on some of the bullet points below

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Use new numpy random generator (see updates overview)

ArviZ related

Use InferenceData
Try to take advantage of matplotlib and xarray plotting to avoid unnecessary plotting loops
update hpd to hdi

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

Show advanced uses of sample_posterior_predictive? Or should that be another more specific notebook not focused on model criticism but purely on pymc3 usage? (i.e. a howto instead of a diagnostics_and_criticism notebook).

Notes

Exotic dependencies

None

Computing requirements

All models seem to sample in under a minute

Combine Marginalized and Latent Gaussian Mixture Notebooks?

The two notebooks are covering exactly the same issue.

They seem short enough that we could use the same dataset and show one after the other. This way we also get a chance to nudge users to try the marginalized mixture, which usually works better.

https://docs.pymc.io/notebooks/gaussian_mixture_model.html
https://docs.pymc.io/notebooks/marginalized_gaussian_mixture_model.html

Blackbox likelihood

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/blackbox_external_likelihood.ipynb

Required changes

The notebook needs to be modified to use ArviZ+InferenceData at the very least. It's hard to know what extra work will be needed, as it has not been executed for a while and cython usage is tricky to get right. Note that it uses DensityDist in strange ways, using dicts as observed that contain freeRVs, and will need to use density_dist_obs=False as idata_kwargs (ref)

Changes to discuss

I don't think this notebook can be converted to v4 unless it undergoes a significant rewrite. Not sure if we should try to get it working with a custom distribution or with pm.Potential for v4.

⚠️ It requires cython to run.

GP Kron

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Kron.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

factor analysis

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/factor_analysis.ipynb
Reviewers: ?

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Use numpy random Generator. See also https://numpy.org/doc/stable/reference/random/index.html?highlight=random%20sampling%20numpy%20random#quick-start

ArviZ related

Use return_inferencedata=True
Do not use Deterministic to "choose" which values to plot, use coords when calling the plotting functions
Consider using dims and coords instead of shape
use xarray and ArviZ for the plot in code cell 7

Notes

Exotic dependencies

None

Computing requirements

Models seem to take between 1-5 minutes to sample

probabilistic matrix factorization

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/probabilistic_matrix_factorization.ipynb
Reviewers: @ColCarroll

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Use numpy Generator. See also https://numpy.org/doc/stable/reference/random/index.html?highlight=random%20sampling%20numpy%20random#quick-start

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

ArviZ related

Use ArviZ and xarray for postprocessing. This will probably be challenging. I'd recommend familiarizing yourself with xarray before working on that. Some ideas:
- _norms in code cell 23 looks like it could be replaced by xr.apply_ufunc (using input_core_dims)

Notes

Exotic dependencies

None

Computing requirements

Model samples in roughly 1 hour

sampler stats

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/sampler-stats.ipynb

Required changes

The notebook should use InferenceData and ArviZ for plotting. Note that the names of the sampler stats in ArviZ are different; the naming convention for ArviZ can be found at https://arviz-devs.github.io/arviz/schema/schema.html#sample-stats, a doc which should be linked too.

Changes for discussion

I think it would be a good addition to add a plot_parallel as a quick way to visualize divergences or to link to the notebook on divergences.

rugby analytics

file: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/rugby_analytics.ipynb

Required changes

The initial exploratory analysis looks OK, and seaborn is an OK dependency for this. The model needs to be updated (no flat priors, for example), and we should also use coords+dims and ArviZ for posterior analysis and exploration.

Notes: updates on the model and priors can be taken from https://github.com/arviz-devs/arviz_example_data/blob/main/rugby.ipynb.

Change examples to use coords where appropriate

coords is a very useful feature that not enough people know about. Our docs should establish best practices so using it in our NBs is an important step.

GP marginal

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Marginal.ipynb
Reviewers: @bwengals

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Typo reported in pymc-devs/pymc#4587
use numpy Generator, see also https://numpy.org/doc/stable/reference/random/#quick-start

ArviZ related

?

Notes

Exotic dependencies

None

Computing requirements

A couple models seem to take ~20 mins to run.

BEST

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/BEST.ipynb
Reviewers: @twiecki @fonnesbeck

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Use numpy Generator instead of using the global random state. Extra reference: https://numpy.org/doc/stable/reference/random/index.html?highlight=random%20sampling%20numpy%20random#quick-start

Notes

Exotic dependencies

None

Computing requirements

Models sample in less than a minute

stochastic volatility

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/stochastic_volatility.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

ArviZ related

Use arviz-darkgrid style
Use return_inferencedata=True

Notes

Exotic dependencies

None

Computing requirements

Model takes roughly 15 mins to sample

GLM hierarchical

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-hierarchical.ipynb
Reviewers: @twiecki

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

use try except in data reading
Add code for rmse mentioned in the notebook, see https://discourse.pymc.io/t/glm-hierarchical-linear-regression-posterior-predictive-check-rmsd/5668
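
A possible starting point for the RMSE code, sketched in plain NumPy with simulated arrays. The names y_obs and y_pred_draws and their shapes are assumptions; in the notebook they would come from the observed data and the posterior predictive samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins: 50 observations, 1000 posterior predictive draws.
y_obs = rng.normal(loc=1.5, scale=0.7, size=50)
y_pred_draws = y_obs + rng.normal(scale=0.5, size=(1000, 50))

# RMSE per draw: operate first (square, mean over observations, sqrt) ...
rmse_per_draw = np.sqrt(np.mean((y_pred_draws - y_obs) ** 2, axis=1))
# ... then summarize across draws.
rmse_mean = rmse_per_draw.mean()

# RMSE of the point prediction obtained by averaging the draws first.
# This is a different quantity and is never larger than rmse_mean.
rmse_of_mean_pred = np.sqrt(np.mean((y_pred_draws.mean(axis=0) - y_obs) ** 2))
```

Keeping rmse_per_draw around also gives an uncertainty estimate for the RMSE, which a single point-prediction value would hide.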

ArviZ related

use return_inferencedata=True
update slope plotting code
- use idata and xarray. ⚠️ first operate, then average, the sum of averages is not necessarily the average of the sum.
- be careful with zorder, alpha and linestyles, markers... to ensure the plot is easy to read for everyone and conveys the relevant information.

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve
the notebook but may not have a clear implementation, or fix some known issue only partially.

General updates

This references the multilevel modeling notebook both as general document and specific cells! ⚠️ with hardcoded links and cell numbers. We should define anchors back in multilevel modeling and use those anchors here. This may be possible in nbsphinx, but I'd say this is a motive to move to myst-nb where it's natural to use rst directives from markdown.

Notes

Exotic dependencies

None

Computing requirements

GP latent

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-Latent.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

LKJ

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/LKJ.ipynb
Reviewers: @AlexAndorra

Context

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

ArviZ related

Use labeled dims in code cells 12 and 13. Also use .values instead of .data.

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

ArviZ related

Show how to index correlation matrices with xarray. See arviz-devs/arviz#1627 and https://arviz-devs.github.io/arviz/user_guide/label_guide.html#custom-labellers for details and examples. Note that the ArviZ docs are for the development version; at the time of writing, labellers are an unreleased feature.

Notes

Exotic dependencies

None

Computing power

All models run in less than a minute

GLM negative binomial

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-negative-binomial-regression.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Add a cross reference where the notebook mentions another notebook: "First, let's look at some Poisson distributed data from the Poisson regression example."

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM hierarchical binomial model

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-hierarchical-binominal-model.ipynb
Reviewers: @OriolAbril

Context

Notebook had a PR to update it to "Best Practices" before the start of the tracking project.

gaussian processes

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/gaussian_process.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Add a tutorial NB for BART

Currently we only have the docstring, but a good example notebook covering what BART is, how to use it, etc. would go a long way.

divergences notebook

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/Diagnosing_biased_Inference_with_Divergences.ipynb
Reviewers: ?

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

GLM predictions

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-out-of-sample-predictions.ipynb
Reviewers:

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

ArviZ related

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook, but they may not have a clear implementation or may fix a known issue only partially.

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements

Wrong Plot Parallel Results in file Diagnosing_biased_Inference_with_Divergences.ipynb

Description

The updated PR #25 changed the results of a pre-existing notebook, as seen in the review of examples/diagnostics_and_criticism/Diagnosing_biased_Inference_with_Divergences.ipynb

There was an upgrade from PyMC3 v3.9 to v3.11.

I suspect a change in how pm.sample behaves affected the outcome.

Example of changed results

Getting started notebook is outdated

For example, see the installation instructions:

https://github.com/pymc-devs/pymc-examples/blob/main/examples/getting_started.ipynb

GLM linear

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-linear.ipynb
Reviewers:

Please refer to the notebook updates overview for more detailed guidance on the points below

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

Use new numpy random generator
Check why the manual model runs slower. Is the default prior of glm.from_formula a HalfCauchy?
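For the random generator item above, a minimal sketch of generating regression data with `np.random.default_rng` instead of the legacy `np.random.seed` plus `np.random.normal` pattern. The seed, sample size and true parameter values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

RANDOM_SEED = 8927  # arbitrary seed, assumed for illustration
rng = np.random.default_rng(RANDOM_SEED)

size = 200
true_intercept = 1.0
true_slope = 2.0

x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x

# Draw the noise from the Generator instance rather than the global NumPy state.
y = true_regression_line + rng.normal(scale=0.5, size=size)
```

Passing `rng` around explicitly keeps the notebook reproducible without relying on global state.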

ArviZ related

In code cell 7, pass figsize as kwarg to plot_trace, not as plt.figure argument.

Notes

Should this notebook mention Bambi at some point? cc @aloctavodia

Exotic dependencies

I think it needs patsy, but I'm not completely sure whether it's installed with pymc3 or is an optional dependency.

Computing requirements

Models run in less than 30 seconds.
