jwarmenhoven / dbda-python

Doing Bayesian Data Analysis, 2nd Edition (Kruschke, 2015): Python/PyMC3 code

License: MIT License

bayesian-data-analysis bayesian-inference pymc3 mcmc hierarchical-models kruschke probabilistic-programming

dbda-python's Introduction

Doing Bayesian Data Analysis - Python/PyMC3

This repository contains Python/PyMC3 code for a selection of models and figures from the book 'Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan', Second Edition, by John Kruschke (2015). The datasets used in this repository were retrieved from the book's website. In its current form, this repository is not a standalone tutorial, so you should probably have a copy of the book to follow along. Suggestions for improvement and help with unsolved issues are welcome!

Note that the code is in Jupyter Notebook format and requires modification to use with other datasets.

Some of the general concepts from the book are discussed in papers by Kruschke & Liddell. See references below.

2018-08-16:
Updating the notebooks with PyMC3 v3.5 and general code clean-up. Inserting plots of the PyMC models in plate notation (v3.5 feature). Fixing some deprecation warnings.

Chapter 9 - Hierarchical Models
Chapter 10 - Model Comparison and Hierarchical Modelling
Chapter 12 - Bayesian Approaches to Testing a Point ("Null") Hypothesis
Chapter 16 - Metric-Predicted Variable on One or Two Groups
Chapter 17 - Metric-Predicted Variable with One Metric Predictor
Chapter 18 - Metric Predicted Variable with Multiple Metric Predictors
Chapter 19 - Metric Predicted Variable with One Nominal Predictor
Chapter 20 - Metric Predicted Variable with Multiple Nominal Predictors
Chapter 21 - Dichotomous Predicted Variable
Chapter 22 - Nominal Predicted Variable
Chapter 23 - Ordinal Predicted Variable
Chapter 24 - Count Predicted Variable

Extra:
Bayesian Linear Regression example (Bishop, 2006)
Example on modelling Ordinal Data (Liddell & Kruschke, 2018)

Libraries used:

pymc3
theano
pandas
numpy
scipy
matplotlib
seaborn

References:

Bishop, C.M. (2006), Pattern Recognition and Machine Learning, Springer Science+Business Media, New York. https://www.microsoft.com/en-us/research/people/cmbishop/

Kruschke, J.K. (2015), Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition, Academic Press / Elsevier, https://sites.google.com/site/doingbayesiandataanalysis/

Kruschke, J.K. & Liddell, T.M. (2017), The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective, Psychonomic Bulletin & Review, http://dx.doi.org/10.3758/s13423-016-1221-4

Kruschke, J.K. & Liddell, T.M. (2017), Bayesian data analysis for newcomers, Psychonomic Bulletin & Review, http://dx.doi.org/10.3758/s13423-017-1272-1

Liddell, T., & Kruschke, J. K. (2018, April 5). Analyzing ordinal data with metric models: What could possibly go wrong? Retrieved from http://osf.io/3tkz4

Salvatier J, Wiecki TV, Fonnesbeck C. (2016), Probabilistic programming in Python using PyMC3, PeerJ Computer Science 2:e55, https://doi.org/10.7717/peerj-cs.55
PyMC3 - http://pymc-devs.github.io/pymc3/

Note:

The repository below contains Python code for the first edition of the book. The code in that repository is a much more direct implementation of the R/JAGS code from the book than you will find here.
https://github.com/aloctavodia/Doing_bayesian_data_analysis

dbda-python's Issues

Refactor code

Chapter 18

Update repository for PyMC3 syntax changes and Arviz plotting

There are no versions of the packages you are using

It would help to list in README.md the versions of numpy, pymc3, and matplotlib that you used to create the notebooks.

You can include the version of every package in a file called `requirements.txt`, for example:

https://github.com/jvns/pandas-cookbook

or something like this at the end:

https://github.com/aloctavodia/BAP/blob/master/code/Chp7/07_Mixture_Models.ipynb
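
One possible way to collect the versions is sketched below using the standard library's `importlib.metadata` (Python 3.8 or later). The helper names `installed_versions` and `requirements_lines` are hypothetical, and the package list simply mirrors the README's "Libraries used" section.

```python
from importlib import metadata

# Package names taken from the README's "Libraries used" list.
PACKAGES = ["pymc3", "theano", "pandas", "numpy", "scipy", "matplotlib", "seaborn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def requirements_lines(versions):
    """Format pinned 'name==version' lines, skipping packages that are missing."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    print("\n".join(requirements_lines(installed_versions(PACKAGES))))
```

The printed lines can be pasted into `requirements.txt` or into the last cell of a notebook.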

Warnings after sampling

Chapter 20
Having some issues with the models in 20.4 and 20.5

Hierarchical model3 does not have a nu_minus1 variable.

model3 should have a nu_minus1 variable, which can then be used to define nu:
nu = pm.Deterministic('nu', nu_minus1 + 1)
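
The effect of the +1 shift can be checked without PyMC3. The sketch below uses only the standard library and assumes an exponential prior with mean 29 on nu_minus1; that rate is an assumption made for illustration and is not taken from the notebook. Because an exponential draw is never negative, nu = nu_minus1 + 1 never drops below 1.

```python
import random

random.seed(0)

RATE = 1 / 29  # assumed prior rate for nu_minus1 (mean 29), not from the notebook

# Draw nu_minus1 from the exponential prior, then shift by one to get nu.
nu_minus1 = [random.expovariate(RATE) for _ in range(10_000)]
nu = [x + 1 for x in nu_minus1]

print(min(nu) >= 1)        # the shift keeps every draw of nu at or above 1
print(sum(nu) / len(nu))   # sample mean, expected to be near 1 / RATE + 1
```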

Chapter 10: Wrong number of coin flips for Model 2?

Hey, I was just quickly skimming over the Chapter 10 code and after it says

Model 2 - Two theta variables without pseudo priors

Coin is flipped nine times, resulting in six heads.

y2 = pm.Bernoulli('y2', theta, observed=[1,1,1,1,1,0,0,0])
I can only count 8 flips resulting in 5 heads. It also says so on the plate of the graphical model.
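
A quick count of the observed list confirms the mismatch. The second list is a hypothetical replacement that would match the text (nine flips, six heads); it is not taken from the notebook.

```python
observed = [1, 1, 1, 1, 1, 0, 0, 0]  # data passed to y2 in the notebook

n_flips = len(observed)
n_heads = sum(observed)
print(n_flips, n_heads)  # 8 5

# Hypothetical data matching the text instead: nine flips, six heads.
nine_flips = [1] * 6 + [0] * 3
print(len(nine_flips), sum(nine_flips))  # 9 6
```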

Chapter 9 - batting average Pitcher too high?

The posterior distribution of category 'pitcher' is not comparable with the one from the book. It seems too close to the other field position categories.

PyMCon 2020

Hi!

As you may have already seen on Twitter or on PyMC Discourse, we are planning a virtual conference for the PyMC community. All the information is available in the Discourse post.

We are currently looking for conference chairs and volunteers and would be very grateful if you could spread the word! If you are interested and available, we also encourage you to apply to be a conference chair.

Chapter 18: Model definition for multiple regression

There is something wrong with the model definition or the standardizing/handling of the data: the results (parameter values) are not comparable with the ones from the book.
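
For reference when checking the data handling, here is a sketch in plain Python of the usual standardize-then-back-transform step for a regression fitted on z-scores. The function names and all numbers are hypothetical, and the notebook's own code may differ. If predictors and outcome are standardized, coefficients fitted on the z-scores have to be converted back before they can be compared with values reported on the original scale.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores plus the mean and sample standard deviation used."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values], m, s


def unstandardize(zbeta0, zbeta, x_means, x_sds, y_mean, y_sd):
    """Convert coefficients fitted on z-scores back to the original scale."""
    beta = [zb * y_sd / sx for zb, sx in zip(zbeta, x_sds)]
    beta0 = zbeta0 * y_sd + y_mean - sum(b * mx for b, mx in zip(beta, x_means))
    return beta0, beta


# Made-up data with two predictors.
x1 = [1.0, 2.0, 4.0, 7.0]
x2 = [3.0, 1.0, 5.0, 2.0]
y = [2.0, 3.5, 8.0, 11.0]
zx1, mx1, sx1 = standardize(x1)
zx2, mx2, sx2 = standardize(x2)
zy, my, sy = standardize(y)

zbeta0, zbeta = 0.1, [0.8, -0.3]  # made-up values standing in for posterior means
beta0, beta = unstandardize(zbeta0, zbeta, [mx1, mx2], [sx1, sx2], my, sy)

# Both routes should give the same prediction for the first data point.
z_pred = zbeta0 + zbeta[0] * zx1[0] + zbeta[1] * zx2[0]
print(z_pred * sy + my, beta0 + beta[0] * x1[0] + beta[1] * x2[0])
```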

Chapter 12: sampling of model index does not seem to work

The model index is always 0.

Wrong array dimensions in Chapter 24?

Getting the following error when trying to run code for Chapter 24:

trace1['a1a2'][:,j1,j2])
ValueError: could not broadcast input array from shape (20000) into shape (5000)

Which happens in this part of the code:

# Transforming the trace data to sum-to-zero values
m = np.zeros((Nx1Lvl, Nx2Lvl, n_samples))
b1b2 = m.copy()

for (j1, j2) in np.ndindex(Nx1Lvl, Nx2Lvl):
    m[j1, j2, :] = (trace1['a0'] +
                    trace1['a1'][:, j1] +
                    trace1['a2'][:, j2] +
                    trace1['a1a2'][:, j1, j2])

I think the issue boils down to the size of the variable m, which is set to have dimensions (Nx1Lvl, Nx2Lvl, 5000). However, the trace1 object ends up producing arrays of length 20,000 because it concatenates the results of all 4 MCMC chains:

In [78]: len(trace1['a0'])
Out[78]: 20000

If I change the dimensions of m to be

m = np.zeros((Nx1Lvl,Nx2Lvl, n_samples*4))

the code seems to run just fine.

I'm new to PyMC3, so perhaps this is due to how trace objects work in different versions of PyMC3. I'm using PyMC3 version 3.4.1.
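
A way to avoid hard-coding the chain count is to size m from the trace itself. The sketch below uses a plain dictionary of NumPy arrays as a stand-in for the PyMC3 trace, so it runs without PyMC3. The level counts and the random values are hypothetical, and only the shapes matter.

```python
import numpy as np

n_chains, n_samples = 4, 5000   # values reported in this issue
Nx1Lvl, Nx2Lvl = 3, 2           # hypothetical level counts

# Stand-in for the trace: looking up a name returns all chains concatenated.
rng = np.random.default_rng(0)
n_total = n_chains * n_samples
trace1 = {
    "a0": rng.normal(size=n_total),
    "a1": rng.normal(size=(n_total, Nx1Lvl)),
    "a2": rng.normal(size=(n_total, Nx2Lvl)),
    "a1a2": rng.normal(size=(n_total, Nx1Lvl, Nx2Lvl)),
}

# Size the buffer from the trace itself rather than from n_samples.
m = np.zeros((Nx1Lvl, Nx2Lvl, len(trace1["a0"])))
for j1, j2 in np.ndindex(Nx1Lvl, Nx2Lvl):
    m[j1, j2, :] = (trace1["a0"]
                    + trace1["a1"][:, j1]
                    + trace1["a2"][:, j2]
                    + trace1["a1a2"][:, j1, j2])

print(m.shape)  # (3, 2, 20000)
```

With this sizing the loop keeps working if the number of chains or draws changes.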