
mcleonard / sampyl

Stars: 324 · Watchers: 324 · Forks: 53 · Size: 5.33 MB

MCMC samplers for Bayesian estimation in Python, including Metropolis-Hastings, NUTS, and Slice

Home Page: http://mcleonard.github.io/sampyl/

License: MIT License

Languages: Python 91.40%, Makefile 8.60%

sampyl's People

Contributors

andymiller, collignon, gbarta, mcleonard


sampyl's Issues

Add ability of logp to return both log posterior and its gradient

There are situations where we have an analytic form of both the log posterior and its gradient. It would be beneficial to use it: autograd may not work (e.g. on compiled code), and it is wasteful to compute logp and its partial derivatives separately. Methods such as scipy.optimize.fmin_l_bfgs_b let the user supply either a function that returns both the cost and its gradient, or one that returns only the cost (with the gradient then estimated by finite differences).
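
A sketch of what this interface could look like (the signature is hypothetical, mirroring the func-returns-(f, g) convention of scipy.optimize.fmin_l_bfgs_b):

import numpy as np

# Hypothetical interface: logp returns both the log posterior and its
# analytic gradient, so no autograd call is needed.
def logp_and_grad(x):
    logp = -0.5 * np.dot(x, x)   # standard normal, up to a constant
    grad = -x                    # its analytic gradient
    return logp, grad

A sampler could then check whether logp returns a tuple and skip the autograd machinery when it does.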

What does importing numpy from sampyl really do?

I've tried the following example in Python 3.7, importing numpy directly and then from sampyl, and both seem to give the same results. What am I missing here?

import sampyl as smp
# from sampyl import np
import numpy as np
import seaborn
import matplotlib.pyplot as plt

icov = np.linalg.inv(np.array([[1., .8], [.8, 1.]]))
def logp(x, y):
    d = np.array([x, y])
    return -.5 * np.dot(np.dot(d, icov), d)

start = {'x': 1., 'y': 1.}
nuts = smp.NUTS(logp, start)
chain = nuts.sample(1000)

seaborn.jointplot(chain.x, chain.y, stat_func=None)
plt.show()

[screenshot: numpy_np]
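
For context, a minimal sketch of the likely distinction, assuming sampyl's np re-exports autograd's wrapped numpy (the NUTS sampler needs gradients, and autograd appears elsewhere in these issues):

import autograd.numpy as anp   # presumably what `from sampyl import np` aliases
from autograd import grad

# autograd can trace functions written against its wrapped numpy and
# return exact gradients, which NUTS relies on.
def logp(x):
    return -0.5 * anp.sum(x ** 2)

print(grad(logp)(anp.array([1.0, 2.0])))   # [-1. -2.]

Simple element-wise expressions can still trace correctly through autograd even when written with plain numpy, which would explain both imports appearing to give the same results here.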

Add convergence tests

Add a standard test for each sampler to verify that the MCMC sample means/variances approach the correct values. A mixture of Gaussians would be a good fit here.
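
A minimal sketch of such a test (sample_fn, the tolerances, and the mixture parameters are all placeholders):

import numpy as np

# Compare empirical moments of the sampler's draws against the analytic
# mean and variance of a two-component 1-D Gaussian mixture.
def check_mixture_moments(sample_fn, n_samples=50000):
    w = np.array([0.3, 0.7])     # component weights
    mu = np.array([-1.0, 2.0])   # component means
    sd = np.array([0.5, 1.0])    # component std devs
    true_mean = np.sum(w * mu)
    true_var = np.sum(w * (sd ** 2 + mu ** 2)) - true_mean ** 2
    draws = sample_fn(n_samples)               # sampler under test
    assert abs(np.mean(draws) - true_mean) < 0.05   # loose tolerances
    assert abs(np.var(draws) - true_var) < 0.1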

Slice sampler & n_chains>1: AttributeError: Can't pickle local object 'main.<locals>.logp'

I'm using the master branch of sampyl. My conda environment:

$ conda list
# packages in environment at /home/gabriel/anaconda3/envs/asteca3:
#
# Name                    Version                   Build  Channel
astropy                   3.0.4            py37h14c3975_0  
atomicwrites              1.1.5                    py37_0  
attrs                     18.1.0                   py37_0  
autograd                  1.2                       <pip>
blas                      1.0                         mkl  
ca-certificates           2018.03.07                    0  
certifi                   2018.8.13                py37_0  
cycler                    0.10.0                   py37_0  
cython                    0.28.5           py37hf484d3e_0  
dbus                      1.13.2               h714fa37_1  
emcee                     2.2.1              pyh24bf2e0_4    astropy
expat                     2.2.5                he0dffb1_0  
fontconfig                2.13.0               h9420a91_0  
freetype                  2.9.1                h8a8886c_0  
future                    0.16.0                    <pip>
glib                      2.56.1               h000015b_0  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb453b48_1  
icu                       58.2                 h9c2bf20_1  
intel-openmp              2018.0.3                      0  
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.0.1            py37hf484d3e_0  
libedit                   3.1.20170329         h6b74fdf_2  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 7.2.0                hdf63c60_3  
libgfortran-ng            7.2.0                hdf63c60_3  
libpng                    1.6.34               hb9fc6fc_0  
libstdcxx-ng              7.2.0                hdf63c60_3  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.8                h26e45fe_1  
matplotlib                2.2.2            py37hb69df0a_2  
mkl                       2018.0.3                      1  
mkl_fft                   1.0.4            py37h4414c95_1  
mkl_random                1.0.1            py37h4414c95_1  
more-itertools            4.2.0                    py37_0  
ncurses                   6.1                  hf484d3e_0  
numpy                     1.15.0           py37h1b885b7_0  
numpy-base                1.15.0           py37h3dfced4_0  
openssl                   1.0.2p               h14c3975_0  
pandas                    0.23.3           py37h04863e7_0  
patsy                     0.5.0                    py37_0  
pcre                      8.42                 h439df22_0  
pip                       10.0.1                   py37_0  
pip                       18.0                      <pip>
pluggy                    0.6.0                    py37_0  
psutil                    5.4.6            py37h14c3975_0  
py                        1.5.4                    py37_0  
pyparsing                 2.2.0                    py37_1  
pyqt                      5.9.2            py37h22d08a2_0  
pytest                    3.6.3                    py37_0  
pytest-arraydiff          0.2              py37h39e3cac_0  
pytest-astropy            0.4.0                    py37_0  
pytest-doctestplus        0.1.3                    py37_0  
pytest-openfiles          0.3.0                    py37_0  
pytest-remotedata         0.3.0                    py37_0  
python                    3.7.0                hc3d631a_0  
python-dateutil           2.7.3                    py37_0  
pytz                      2018.5                   py37_0  
qt                        5.9.6                h52aff34_0  
readline                  7.0                  ha6073c6_4  
scipy                     1.1.0            py37hc49cb51_0  
seaborn                   0.9.0                    py37_0  
setuptools                39.2.0                   py37_0  
sip                       4.19.8           py37hf484d3e_0  
six                       1.11.0                   py37_1  
sqlite                    3.24.0               h84994c4_0  
statsmodels               0.9.0            py37h035aef0_0  
tk                        8.6.7                hc745277_3  
tornado                   5.0.2            py37h14c3975_0  
wheel                     0.31.1                   py37_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               ha838bed_2  

I get an AttributeError: Can't pickle local object 'main.<locals>.logp' on the line

new_chains = pool.map(func, samplers)

whenever I use more than one chain with the Slice sampler.
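
A likely cause (a sketch, not a confirmed fix): multiprocessing pickles the target function by its qualified name, so a logp defined inside main() cannot be sent to worker processes. Moving logp to module scope avoids this:

import sampyl as smp

# Module-level functions are picklable by name; a closure defined inside
# main() is not, which is what the error message points at.
def logp(x):
    return -0.5 * x ** 2

def main():
    sampler = smp.Slice(logp, {'x': 1.})
    chain = sampler.sample(1000, n_chains=2)   # n_chains per this report

if __name__ == '__main__':
    main()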

Tutorial example not working in Python 2.7

I'm testing sampyl v0.3 in a conda environment running Python 2.7.15. I tried running the simple example shown at http://matatat.org/sampyl/, and I get:

Traceback (most recent call last):
  File "/home/gabriel/Dropbox/python-test/sampyl_test.py", line 13, in <module>
    nuts = smp.NUTS(logp, start)
  File "/home/gabriel/anaconda3/envs/asteca27/lib/python2.7/site-packages/sampyl/samplers/NUTS.py", line 77, in __init__
    super(NUTS, self).__init__(logp, start, **kwargs)
  File "/home/gabriel/anaconda3/envs/asteca27/lib/python2.7/site-packages/sampyl/samplers/base.py", line 32, in __init__
    start = {unicodedata.normalize('NFKC', key): val for key, val in start.items()}
  File "/home/gabriel/anaconda3/envs/asteca27/lib/python2.7/site-packages/sampyl/samplers/base.py", line 32, in <dictcomp>
    start = {unicodedata.normalize('NFKC', key): val for key, val in start.items()}
TypeError: normalize() argument 2 must be unicode, not str

Could this be related to the oldish version of Python I'm running?
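
Quite possibly: in Python 2, str is a byte string, and unicodedata.normalize() accepts only unicode. An untested sketch of a workaround is to pass unicode keys in the start dict:

# Unicode keys satisfy the unicodedata.normalize() call in the traceback.
start = {u'x': 1., u'y': 1.}
nuts = smp.NUTS(logp, start)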

Issue with example on German tanks

I have an issue with the example of the German Tank Problem.

As it is now (9/1/2019), the prior is not a real prior because it peeks at the data to determine the highest serial number and uses it as a lower bound on the number of tanks. The docs say: "We also know that N must be an integer and any value above 256 is equally likely, a priori, that is, before we saw the serial numbers." This is incorrect: "a priori", N can be any number >=0. It is only after observing the data (i.e. "a posteriori") that we can say that N must be greater than or equal to 256. I think the example should be corrected.

The code below implements what is IMHO the correct model. The posterior looks similar to that produced by the current code.

import sampyl as smp
from sampyl import np
import matplotlib.pyplot as plt

# Data
serials = np.array([10, 256, 202, 97])
m = np.max(serials)

# log P(N | D)
def logp(N):
    # Samplers pass in floats; convert them to integers
    N = np.floor(N).astype(int)

    # Log-likelihood
    llh = smp.discrete_uniform(serials, lower=1, upper=N)

    prior = smp.discrete_uniform(N, lower=0, upper=10000)

    return llh + prior

# Slice sampler for drawing from the posterior
sampler = smp.Slice(logp, {'N':300})
chain = sampler.sample(20000, burn=4000, thin=4)

posterior = np.floor(chain.N)
plt.hist(posterior, range=(0, 1000), bins=100,
        histtype='stepfilled', normed=True)
plt.xlabel("Total number of tanks")
plt.ylabel("Posterior probability mass")

plt.show()

[screenshot1: posterior histogram of the total number of tanks]

Track log-likelihood trace

Should we track the log-likelihoods computed for each sample? If we let the sampler object own a trace of samples, it can also own a trace of log-likelihoods.
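
A sketch of the idea (all names hypothetical, not sampyl's actual internals):

# The sampler records the log-likelihood alongside each stored sample,
# giving a trace of logp values parallel to the trace of states.
class TracedSampler(object):
    def __init__(self, logp, start):
        self.logp = logp
        self.state = start
        self.samples = []
        self.logps = []

    def record(self, state):
        self.samples.append(state)
        self.logps.append(self.logp(**state))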

Parallel tempering sampler

This is an ensemble of samplers at different temperatures; each one can just be an existing Sampyl sampler. The ensemble object would just keep track of the cooling schedule.
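
A sketch of the swap step such an ensemble would need (names hypothetical; chain i targets logp(x) / temps[i], and rng is a np.random.Generator):

import numpy as np

# Propose swapping the states of two adjacent temperatures and accept
# with the standard parallel-tempering Metropolis criterion.
def swap_step(states, logp, temps, rng):
    i = rng.integers(len(temps) - 1)
    a, b = states[i], states[i + 1]
    log_ratio = (1. / temps[i] - 1. / temps[i + 1]) * (logp(b) - logp(a))
    if np.log(rng.random()) < log_ratio:
        states[i], states[i + 1] = b, a
    return states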

Is Sampyl still being developed/maintained?

Hi, I'm looking for a simple to use Bayesian MCMC sampler fully Python based. I've been trying emcee and now I've found Sampyl, which looks really good.

Before attempting to implement it though, I'd like to know if it is still being developed or at least maintained. For what I can see the latest commit is 1.5 years old with no active branches, and the opened issues are ~2 years old with no apparent progress in either of them.

Thanks.

Progress bar

Invaluable for some feedback while sampling.
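
A minimal sketch of what this could look like inside the sampling loop (the loop body is a stand-in):

import sys, time

n_samples = 1000
for i in range(1, n_samples + 1):
    time.sleep(0.001)   # stand-in for drawing one sample
    if i % 100 == 0 or i == n_samples:
        sys.stdout.write('\rsample {}/{}'.format(i, n_samples))
        sys.stdout.flush()
print()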

Sampling in parallel

There needs to be functionality to run multiple chains in parallel. This can be used for convergence testing and for increasing the number of samples per second.
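
One concrete use of parallel chains for convergence testing is the Gelman-Rubin statistic; a minimal sketch:

import numpy as np

# R-hat from m chains of length n (chains: array of shape (m, n));
# values near 1 indicate the chains have mixed.
def rhat(chains):
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    var_hat = (n - 1.) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)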

find_MAP does nothing?

I'm testing the find_MAP() method, but it does not seem to do anything, i.e. it just returns the same input I give it:

> smp.find_MAP(logp, {'met': .015, 'age': 9., 'ext': 0., 'dist': 13., 'mass': 5000.,'binar': .3}, verbose=True, bounds=bounds)
      fun: 5950.001300620037
 hess_inv: <6x6 LbfgsInvHessProduct with dtype=float64>
      jac: array([0., 0., 0., 0., 0., 0.])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 1
      nit: 0
   status: 0
  success: True
        x: array([1.5e-02, 9.0e+00, 0.0e+00, 1.3e+01, 5.0e+03, 3.0e-01])
State([('met', array(0.015)), ('age', array(9.)), ('ext', array(0.)), ('dist', array(13.)), ('mass', array(5000.)), ('binar', array(0.3))])

and I can easily find a better solution (only changed the second parameter):

> logp(1.5e-02, 9.5e+00, 0.0e+00, 1.3e+01, 5.0e+03, 3.0e-01)
-5890.1386503849135
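
One way to probe this (a sketch, reusing the logp from this report): evaluate the gradient autograd computes at the start point. jac: all zeros with nit: 0 suggests L-BFGS-B saw a zero gradient on its very first evaluation and stopped immediately.

from autograd import grad
import autograd.numpy as np

def logp_vec(theta):
    return logp(*theta)   # logp as defined in this report

# If this prints all zeros, the optimizer is behaving as expected given
# the gradient it was handed, and the problem is upstream in logp.
print(grad(logp_vec)(np.array([.015, 9., 0., 13., 5000., .3])))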

discrete_uniform broken by autograd 1.1.3

Hi, thanks for the interesting library!

I was playing with the German Tank Problem example, and initially had trouble getting it to work - it would throw "TypeError: data type not understood" while computing the logp.

This turns out to be a problem only with autograd 1.1.3; the previous version, 1.1.2, works fine. It happens because autograd now wraps numpy.int_ as a new function, which breaks the integral type check in discrete_uniform (if x.dtype != np.int_:), since the comparison is no longer against the real np.int_.

Currently it appears that discrete_uniform is only used by the tank example.

I am inclined to think that if autograd is going to wrap np.int_ then it should also make it work transparently in the numpy type hierarchy, but I don't know enough about numpy internals to know how that might be done.

A possible workaround is to replace the check with if x.dtype.kind in "iu":. Another is to use the unwrapped numpy to access numpy.int_, although it might be confusing to have two references to numpy that are not safely interchangeable.
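
A minimal illustration of why the dtype.kind check is robust:

import numpy as np

# dtype.kind is 'i' (signed) or 'u' (unsigned) for every integer dtype,
# so the check does not depend on identity with np.int_, which autograd
# 1.1.3 wraps.
def is_integral(x):
    return np.asarray(x).dtype.kind in "iu"

assert is_integral(np.array([1, 2, 3]))
assert not is_integral(np.array([1.0, 2.0]))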

Add sample summary

We have names of variables and collections of state samples; it would be nice to have a very generic method for summarizing them on the command line (and maybe a method for pretty-printing them in a notebook). Something like the following (a sketch of such a method appears after the table):

variable    mean    sd      skew    autocorr    eff_samp_size
x           0.22
y0          0.21
y1          0.11
...
y100        0.01    ...

where x is a scalar, and y is a vector.
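
A minimal sketch of such a method (names hypothetical; only mean and sd shown, the other columns would follow the same pattern):

import numpy as np

# chain: dict of variable name -> (n_samples,) or (n_samples, k) array.
# Vector variables get one row per component, as in the mock-up above.
def summarize(chain):
    print('{:<10}{:>8}{:>8}'.format('variable', 'mean', 'sd'))
    for name, samples in chain.items():
        samples = np.asarray(samples).reshape(len(samples), -1)
        for j in range(samples.shape[1]):
            label = name if samples.shape[1] == 1 else '{}{}'.format(name, j)
            col = samples[:, j]
            print('{:<10}{:>8.2f}{:>8.2f}'.format(label, col.mean(), col.std()))

summarize({'x': np.random.randn(500), 'y': np.random.randn(500, 3)})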
