eigenfoo / littlemcmc

A lightweight and performant implementation of HMC and NUTS in Python, spun out of the PyMC project.

Home Page: https://littlemcmc.readthedocs.io

License: Apache License 2.0

Python 96.88% Makefile 2.62% Shell 0.50%
hmc markov-chain-monte-carlo mcmc nuts python

littlemcmc's Introduction

Hello, I'm George! 👋

I’m a data scientist and machine learning researcher based in New York City, where I work for Flatiron Health. I like natural language processing, Bayesian modelling, open source software and coffee.

In my copious free time, I contribute to and maintain several open-source libraries (particularly the PyMC project). Away from the keyboard, I spend my time drinking coffee, taking photos and obsessing over mechanical keyboards.

Previously, I worked at Point72 Asset Management and Quantopian (acquired by Robinhood), and studied engineering at The Cooper Union with minors in computer science and mathematics.

For more about me (e.g. my blog, projects and résumé), please visit my website.

Contact

I love getting emails from non-bots, and you can also find me on Twitter, Mastodon and LinkedIn.

littlemcmc's People

Contributors

eigenfoo

littlemcmc's Issues

Rework sampling API

Ideally it would be something like

trace, stats = lmc.sample(
    logp_dlogp_func=logp_dlogp_func,
    size=1,
    draws=1000,
    tune=500,
    step="nuts",
    chains=4,
    cores=1,
    progressbar="notebook"
)

instead of

trace, stats = lmc.sample(
    logp_dlogp_func=logp_dlogp_func,
    size=1,
    draws=1000,
    tune=500,
    step=lmc.NUTS(logp_dlogp_func=logp_dlogp_func, size=1),
    chains=4,
    cores=1,
    progressbar="notebook"
)
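
One way the string form could be supported is to resolve the name to a step class inside the sampling function. The sketch below is hypothetical: HMC and NUTS are stand-in dataclasses rather than littlemcmc's real classes, and STEP_METHODS and resolve_step are names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HMC:
    """Stand-in step class (hypothetical; not littlemcmc's real HMC)."""
    logp_dlogp_func: Callable
    size: int


@dataclass
class NUTS:
    """Stand-in step class (hypothetical; not littlemcmc's real NUTS)."""
    logp_dlogp_func: Callable
    size: int


# Hypothetical registry mapping user-facing names to step classes.
STEP_METHODS = {"hmc": HMC, "nuts": NUTS}


def resolve_step(step, logp_dlogp_func, size):
    """Return a step object, building one if `step` is a string name."""
    if not isinstance(step, str):
        # An already-constructed step object is passed through unchanged.
        return step
    try:
        step_cls = STEP_METHODS[step.lower()]
    except KeyError:
        known = sorted(STEP_METHODS)
        raise ValueError(f"Unknown step method {step!r}; expected one of {known}") from None
    return step_cls(logp_dlogp_func=logp_dlogp_func, size=size)
```

With a helper like this, both calling styles shown above could coexist: a string is expanded using the logp_dlogp_func and size already passed to the sampling function, while a pre-built step object is used as given.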

Return ArviZ output

Instead of only returning NumPy arrays and Python dicts, we should offer an option to return the output as an ArviZ object. This would make the output directly usable with ArviZ's plotting and diagnostic functions.

No copyright or license file in the source files

See the notice at the bottom of the LICENSE file:

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

Broken progress bar when multiprocessing

Description
When passing cores > 1, the progress bar does not work well: sometimes (when using tqdm_notebook) nothing is shown at all, and sometimes (when using tqdm) the progress bars "fight", occasionally starting on new lines.

Reproducible Example
Simply see the quickstart tutorial.

DOC: write documentation

  • Document code itself
  • Rerun the notebooks, making sure that multiprocessing works...
  • Write a developers guide, explaining how the various modules work together. This will probably be helpful to PyMC3 later on.

s/size/model_ndim/g

Currently we have the size parameter in lmc.sample. However, model_ndim might be a more descriptive variable name (and also mirrors what PyMC3 uses).

Publish in JOSS?

Need to find out if JOSS accepts "derivative" software packages: LittleMCMC is derived mostly from PyMC3 code.

Multiprocess sampling fails on macOS

Description

Please provide a minimal, self-contained, and reproducible example.

import numpy as np
import scipy.stats
import littlemcmc as lmc


def logp_func(x, loc=0, scale=1):
    return scipy.stats.norm.logpdf(x, loc=loc, scale=scale)


def dlogp_func(x, loc=0, scale=1):
    return -(x - loc) / scale ** 2


def logp_dlogp_func(x, loc=0, scale=1):
    return logp_func(x, loc=loc, scale=scale), dlogp_func(x, loc=loc, scale=scale)


trace, stats = lmc.sample(logp_dlogp_func=logp_dlogp_func, model_ndim=1)

Please provide the full traceback.

-----------------------------------------------------------------| 0.00% [0/8000 00:00<00:00 Sampling 4 chains, 0 divergences]
Traceback (most recent call last):
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 261, in main
    old_handlers)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 297, in _serve_one
    code = spawn._main(child_r)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
Traceback (most recent call last):
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 261, in main
    run_name="__mp_main__")
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 263, in run_path
    old_handlers)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 297, in _serve_one
    code = spawn._main(child_r)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    pkg_name=pkg_name, script_name=fname)
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 96, in _run_module_code
Traceback (most recent call last):
    prepare(preparation_data)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 261, in main
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    mod_name, mod_spec, pkg_name, script_name)
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    old_handlers)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 297, in _serve_one
    run_name="__mp_main__")
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 263, in run_path
    exec(code, run_globals)
  File "/Users/george/littlemcmc/tests/foo.py", line 18, in <module>
    code = spawn._main(child_r)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    trace, stats = lmc.sample(logp_dlogp_func=logp_dlogp_func, model_ndim=1)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 188, in sample
    prepare(preparation_data)
    pkg_name=pkg_name, script_name=fname)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    exec(code, run_globals)
  File "/Users/george/littlemcmc/tests/foo.py", line 18, in <module>
    run_name="__mp_main__")
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 263, in run_path
    trace, stats = lmc.sample(logp_dlogp_func=logp_dlogp_func, model_ndim=1)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 188, in sample
    traces, stats = _mp_sample(**sample_args, **parallel_args)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 300, in _mp_sample
    traces, stats = _mp_sample(**sample_args, **parallel_args)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 300, in _mp_sample
    pkg_name=pkg_name, script_name=fname)
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 96, in _run_module_code
    pickle_backend=pickle_backend,
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in __init__
    mod_name, mod_spec, pkg_name, script_name)
    pickle_backend=pickle_backend,
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in __init__
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/george/littlemcmc/tests/foo.py", line 18, in <module>
    trace, stats = lmc.sample(logp_dlogp_func=logp_dlogp_func, model_ndim=1)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 188, in sample
    traces, stats = _mp_sample(**sample_args, **parallel_args)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 300, in _mp_sample
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in <listcomp>
    pickle_backend=pickle_backend,
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 278, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 278, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in <listcomp>
    self._process.start()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._process.start()
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 278, in __init__
    self._popen = self._Popen(self)
    self._process.start()
    self._popen = self._Popen(self)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
    self._popen = self._Popen(self)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
    return Popen(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
    return Popen(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
    super().__init__(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    super().__init__(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
    self._launch(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
    self._launch(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
    _check_not_importing_main()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 261, in main
    old_handlers)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 297, in _serve_one
    code = spawn._main(child_r)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/george/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/george/littlemcmc/tests/foo.py", line 18, in <module>
    trace, stats = lmc.sample(logp_dlogp_func=logp_dlogp_func, model_ndim=1)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 188, in sample
    traces, stats = _mp_sample(**sample_args, **parallel_args)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 300, in _mp_sample
    pickle_backend=pickle_backend,
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 433, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 278, in __init__
    self._process.start()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "tests/foo.py", line 18, in <module>
    trace, stats = lmc.sample(logp_dlogp_func=logp_dlogp_func, model_ndim=1)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 188, in sample
    traces, stats = _mp_sample(**sample_args, **parallel_args)
  File "/Users/george/littlemcmc/littlemcmc/sampling.py", line 305, in _mp_sample
    for draw in sampler:
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 469, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
  File "/Users/george/littlemcmc/littlemcmc/parallel_sampling.py", line 337, in recv_draw
    msg = ready[0].recv()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/george/miniconda3/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Please provide any additional information below.

  • I assume the final EOFError is raised because recv is called after the other end of the connection has already been closed.
  • If I wrap the code in an if __name__ == "__main__", it runs fine.
  • This code also runs fine on Linux.
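
The guard mentioned above can be sketched with the standard library alone. This is an illustration of the idiom, not littlemcmc code: worker and main are hypothetical names, and the "spawn" start method is chosen here as an assumption because, like the forkserver method seen in the traceback, it re-imports the main module in each child. In the reproducer, the lmc.sample call would take the place of the body of main.

```python
import multiprocessing as mp


def worker(conn, x):
    """Runs in the child process: send one result back and close the pipe."""
    conn.send(x * x)
    conn.close()


def main():
    # "spawn" starts a fresh interpreter that re-imports this module,
    # so process creation must not happen at import time.
    ctx = mp.get_context("spawn")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_conn, 3))
    proc.start()
    # Close the parent's copy of the child end so that recv() raises
    # EOFError instead of blocking if the child exits without sending.
    child_conn.close()
    result = parent_conn.recv()
    proc.join()
    return result


if __name__ == "__main__":
    # Only the original script runs this; children importing the module skip it.
    print(main())
```

Without the guard, each child would execute the process-starting code again while importing the module, which is the situation the RuntimeError in the traceback describes.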

Environment

  • LittleMCMC version: from source
  • Python version: 3.7
  • Operating system: macOS

Remove `report.py`

report.py is mainly used for diagnostic warnings, which is less of a concern here: we simply return all the sampler statistics to the user. We can probably safely remove it.

Support for discrete variables in Hamiltonian Monte Carlo sampler

I've read in the documentation that this library only supports continuous variables. Any thoughts on adding support for log-likelihood functions that have a mixture of continuous and discrete parameters? The following papers seem to address how to build a sampler in such a case:
Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Continuous Relaxations for Discrete Hamiltonian Monte Carlo

Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions

Improve documentation

  • Progress bar does not render well in rst; remove it.
  • Add sample function docstring.
  • Include some preamble on why LittleMCMC is necessary at all? Perhaps in a developer guide?
  • Add more meat to docs/index.rst

Consider supporting deterministic variables?

Deterministic variables are also known as generated quantities in Stan. It would be nice to support them natively somehow (i.e. alongside multiprocessing) and return them inside the trace object. We should look at how PyMC3 handles deterministics.

Improve documentation

  • Describe the necessary properties of the logp_dlogp_func
    • Shape of outputs must be same as shapes of inputs
    • Pickleable
    • Variables must be on R (i.e. unconstrained)
  • Update demo notebook with cores > 1, to demonstrate multiprocessing.
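
A function with the properties listed above might look like the following sketch. It rests on assumptions that are not confirmed by the library: the shape requirement is read as "the gradient has the same shape as the input", the target (an Exponential(1) density on a positive parameter, sampled on the log scale) is chosen purely for illustration, and is_pickleable is a hypothetical helper.

```python
import pickle

import numpy as np


def logp_dlogp_func(z):
    """Log-density and gradient for sigma ~ Exponential(1), with z = log(sigma).

    Working with z keeps the sampled variable on the whole real line.
    The log-Jacobian of sigma = exp(z) is z, so logp(z) = -exp(z) + z.
    """
    z = np.asarray(z, dtype=float)
    sigma = np.exp(z)
    logp = np.sum(-sigma + z)
    dlogp = -sigma + 1.0  # derivative of -exp(z) + z, same shape as z
    return logp, dlogp


def is_pickleable(obj):
    """Hypothetical helper: report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

Functions defined at module level with def, as above, are pickled by reference, whereas lambdas and functions nested inside other functions generally cannot be pickled, which matters once cores > 1 sends the function to worker processes.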
