
turinglang / turing.jl


Bayesian inference with probabilistic programming.

Home Page: https://turinglang.org

License: MIT License

Julia 100.00%
machine-learning probabilistic-programming julia-language artificial-intelligence bayesian-inference hamiltonian-monte-carlo turing bayesian-statistics mcmc hmc

turing.jl's People

Contributors

adscib, azev77, cpfiffer, devmotion, emilemathieu, evsmithx, fredericwantiez, github-actions[bot], harrisonwilde, hessammehr, hsm207, itsdfish, jaimerzp, kdr2, luiarthur, mohamed82008, palday, phipsgabler, pitmonticone, rdiaz02, red-portal, rikhuijzer, saranjeetkaur, sunxd3, tomroesch, torfjelde, trappmartin, willtebbutt, xukai92, yebai


turing.jl's Issues

Standardised output for inference algorithms

The current form for reporting results is a bit arbitrary. It makes sense to standardise the output and report results in the form of a joint posterior.

FYI, the Mamba.jl package has a format for reporting Monte Carlo results, which is also the default format for Stan.jl. But we might need to adapt this format to store sample weights.

Re-organise/re-structure the source code

We need a plan for better organisation of the source code. We can collect issues regarding this here.

For example, the code in /src/core/intrinsic.jl seems to be relevant only to samplers (and is a bit messy). It would be easier to maintain if we moved it to /src/samplers/ (or a sub-folder).

@assume with Binomial fails

`@assume z ~ Binomial()` gives `LoadError: BoundsError: attempt to access 1-element Array{Any,1} at index [2]`

Plan for adding comments

We really need better commenting practice for our source code. I guess we can start improving it by assigning each source file to whoever wrote it.

Note on how to comment

Use """ to add docstrings before functions so that they can be captured by Base.Docs.binding (for more detail, see the documentation).

This style of documentation can also be toggled easily in Juno/Atom, which would potentially help future development a lot.

We may also refer to other good open-source projects, e.g. normal.jl, in which the comments contain three parts: 1) description, 2) example code, 3) external link.
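
For illustration, a minimal sketch of the suggested three-part style (the function below is made up for the example, not taken from the Turing code base):

"""
    logsumexp(x)

Compute log(sum(exp, x)) in a numerically stable way.

Example: `logsumexp([1.0, 2.0, 3.0])`

External link: https://en.wikipedia.org/wiki/LogSumExp
"""
function logsumexp(x)
    m = maximum(x)
    return m + log(sum(exp.(x .- m)))
end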

Progress

Please tick once the file is well commented (file list is from the development branch).

/

  • Turing.jl

core/

  • compiler.jl
  • io.jl

samplers/

  • hmc.jl
  • is.jl
  • pgibbs.jl
  • smc.jl

trace/

  • tarray.jl

Support truncated distribution.

The current HMC sampler does not work with truncated distributions. For example, the following model throws an error:

negbindata = [0, 1, 4, 0, 2, 2, 5, 0, 1]

@model negbinmodel(y) begin
  local α, β
  α ~ Truncated(Cauchy(0,10), 0, +Inf)
  β ~ Truncated(Cauchy(0,10), 0, +Inf)
  for i = 1:length(y)
    y[i] ~ NegativeBinomial(α, β)  # α > 0, 0 < β < 1
  end
  return(α, β)
end

@sample(negbinmodel(negbindata), HMC(1000, 0.02, 1))

ERROR MESSAGE:

MethodError: no method matching pdf(::Distributions.Truncated{Distributions.Cauchy{Float64},Distributions.Continuous}, ::ForwardDiff.Dual{0,Float64})
Closest candidates are:
  pdf(::Distributions.Distribution{Distributions.Univariate,Distributions.Continuous}, ::Real) at /Users/yebai/.julia/v0.5/Distributions/src/univariates.jl:102
  pdf(!Matched::Distributions.Kolmogorov, ::Real) at /Users/yebai/.julia/v0.5/Distributions/src/univariate/continuous/kolmogorov.jl:74
  pdf(!Matched::Distributions.Arcsine{T<:Real}, ::Real) at /Users/yebai/.julia/v0.5/Distributions/src/univariate/continuous/arcsine.jl:74
  ...
 in logpdf(::Distributions.Truncated{Distributions.Cauchy{Float64},Distributions.Continuous}, ::ForwardDiff.Dual{0,Float64}, ::Bool) at transform.jl:39
 in assume(::Turing.HMCSampler{Turing.HMC}, ::Distributions.Truncated{Distributions.Cauchy{Float64},Distributions.Continuous}, ::Turing.Var, ::Turing.VarInfo) at hmc.jl:154
 in macro expansion at compiler.jl:10 [inlined]
 in negbinmodel(::Dict{Any,Any}, ::Turing.VarInfo, ::Turing.HMCSampler{Turing.HMC}) at compiler.jl:21
 in step(::#negbinmodel, ::Dict{Any,Any}, ::Turing.HMCSampler{Turing.HMC}, ::Turing.VarInfo, ::Bool) at hmc.jl:59
 in run(::Function, ::Dict{Any,Any}, ::Turing.HMCSampler{Turing.HMC}) at hmc.jl:109
 in sample(::Function, ::Dict{Any,Any}, ::Turing.HMC) at hmc.jl:184
 in macro expansion; at compiler.jl:295 [inlined]
 in anonymous at <missing>:?
 in include_string(::String, ::String) at loading.jl:441
 in include_string(::String, ::String, ::Int64) at eval.jl:28
 in include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N}) at eval.jl:32
 in (::Atom.##53#56{String,Int64,String})() at eval.jl:40
 in withpath(::Atom.##53#56{String,Int64,String}, ::String) at utils.jl:30
 in withpath(::Function, ::String) at eval.jl:46
 in macro expansion at eval.jl:57 [inlined]
 in (::Atom.##52#55{Dict{String,Any}})() at task.jl:60

Better gradient interface

It would be nice if we could have a user-friendly interface for automatic differentiation, e.g.

@model gdemo(x) = begin   
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s)) 
  end
  return(s, m, x)
end

g  = @gradient(gdemo([2.0, 3.0]), varInfo = nothing)

By default, the user does not need to pass in varInfo (i.e. differentiate through all parameters).

This would be very helpful for users who want to contribute new gradient-based inference methods.
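
As a rough, standalone illustration of what such a helper could compute under the hood (this is not the proposed Turing API; the hand-written log joint and the log-transform of s are assumptions made just for this sketch):

using ForwardDiff, Distributions

# Hand-written log joint for gdemo, parameterised by θ = (log s, m).
function logjoint(θ, x)
    s, m = exp(θ[1]), θ[2]
    lp  = logpdf(InverseGamma(2, 3), s) + θ[1]              # prior for s plus log-Jacobian of exp
    lp += logpdf(Normal(0, sqrt(s)), m)                      # prior for m
    lp += sum(logpdf(Normal(m, sqrt(s)), xi) for xi in x)    # likelihood
    return lp
end

# Gradient with respect to all parameters, which is what @gradient would do by default.
g = ForwardDiff.gradient(t -> logjoint(t, [2.0, 3.0]), [0.0, 1.0])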

Improving the HMC sampler

Here is my plan for improving the HMC sampler (with priorities from high to low):

Progress:

  • Replace dDistribution with Distribution
    - [ ] Replay detection for each prior variable
    - [ ] Implement NUTS
    - [ ] Implement the reverse mode of AD
  • Separate gradient computation from the actual HMC sampler. This makes it easier to use gradients for other purposes, e.g. sharing code between HMC and NUTS.
  • Support matrix type random variables
    - [ ] Prepare HMC sampler for compositional inference interface, see #55
  • Support variables with constrained support (e.g. the Beta distribution).

I've already looked into how to do the first one. I've checked the Distributions package and seen that most of the distributions now support Dual, except some Kolmogorov-related distributions which I'm not familiar with. I believe this one can be done soon.

Improve test coverage

Our test coverage is 68% for the development branch at the time this issue is posted, and I am aiming to improve it to over 90%. There are a few things everyone should be aware of.

  • Test coverage reaches 90%

Keep runtest.jl structured

Please check the new runtest.jl file. When adding new tests in the future, keep it structured in the current way so that we can tell which source file each test case is testing.
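
For example (the file names below are illustrative, not the actual test layout):

using Base.Test   # `using Test` on Julia ≥ 0.7

# One block per source file, so a failing test points at the file under test.
@testset "core/compiler.jl" begin
    include("compiler_tests.jl")
end
@testset "samplers/hmc.jl" begin
    include("hmc_tests.jl")
end
@testset "samplers/pgibbs.jl" begin
    include("pgibbs_tests.jl")
end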

Review of old test cases

I noticed that some comments in them are out of date. @yebai, could you have a look and remove the unnecessary ones?

Add new tests

Clearly some source files do not have dedicated test files. I guess we can add more along with adding more commands (#29), which is not actively in progress.

MCMC output

The current form for reporting MCMC output is a bit arbitrary. It makes sense to standardise the output and report results in the form of a joint posterior.

It's also helpful to implement some utility functions for the Chain type for inspecting samples from various MCMC samplers, e.g.:

IAT: integrated autocorrelation time: http://search.r-project.org/library/LaplacesDemon/html/IAT.html
ESS: effective sample size: http://search.r-project.org/library/LaplacesDemon/html/ESS.html
MCSE: Monte Carlo standard error: http://search.r-project.org/library/LaplacesDemon/html/MCSE.html
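
A minimal sketch of what these could look like for a single vector of scalar samples (standalone code, not the Chain API; truncating the autocorrelation sum at the first non-positive lag is a simplifying assumption):

using Statistics   # cor/std (in Base on Julia 0.5)

# Integrated autocorrelation time: 1 + 2 * sum of lag autocorrelations,
# truncated at the first non-positive term.
function iat(x)
    tau = 1.0
    for k in 1:div(length(x), 2)
        rho = cor(x[1:end-k], x[1+k:end])
        rho <= 0 && break
        tau += 2 * rho
    end
    return tau
end

ess(x)  = length(x) / iat(x)        # effective sample size
mcse(x) = std(x) / sqrt(ess(x))     # Monte Carlo standard error of the mean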

PS. The Mamba.jl package has a format for reporting Monte Carlo results. It is also the default format for Stan.jl. But we might need to adapt this format to store sample weights.

New Interface Design

We have some undecided interface issues since the introduction of the Hamiltonian Monte Carlo (HMC) sampler. More specifically, we need an interface to tell the HMC sampler which variable it should handle. In addition, we also need a way to compose samplers (e.g. Gibbs sampling subsets of variables with different samplers). To solve these issues altogether, I propose the following new interface.

##  Model definition
gaussdemo() =  begin  
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s))
  end
  return(s, m, x)
end

# Standard sampler interface
res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(..., :s, :m))

# Compositional inference interface
res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(..., :m),  PG(..., :s))

# Sampler's default behavior is to sample all random variables
res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(...))

# Tagging parameters for samplers like PMC, SMC2
res = sample(gaussdemo, data = Dict(:x=>[1 2]), PMC(..., params=[:m]))

# Running program without a sampler produces a draw from the prior
s, m = sample(gaussdemo)
s, m = gaussdemo() 

In summary: first, this design removes the macros @assume, @observe and @predict; instead, it introduces additional parameters to each sampler to distinguish different types of variables. Second, it introduces an interface for composing samplers (e.g. HMC and PG).

julia-config error on Ubuntu 16.04

On Ubuntu 16.04, starting with a fresh Julia install, I'm getting the following error. Could there be an undocumented dependency?

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> Pkg.update
update (generic function with 1 method)

julia> Pkg.update()
INFO: Initializing package repository /home/chad/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
INFO: Updating METADATA...
INFO: Computing changes...
INFO: No packages to install, update or remove

julia> Pkg.add("Turing")
INFO: Cloning cache of ArrayViews from git://github.com/JuliaLang/ArrayViews.jl.git
INFO: Cloning cache of Compat from git://github.com/JuliaLang/Compat.jl.git
INFO: Cloning cache of ConjugatePriors from git://github.com/JuliaStats/ConjugatePriors.jl.git
INFO: Cloning cache of Distributions from git://github.com/JuliaStats/Distributions.jl.git
INFO: Cloning cache of PDMats from git://github.com/JuliaStats/PDMats.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Cloning cache of StatsFuns from git://github.com/JuliaStats/StatsFuns.jl.git
INFO: Cloning cache of Turing from git://github.com/yebai/Turing.jl.git
INFO: Installing ArrayViews v0.6.4
INFO: Installing Compat v0.7.14
INFO: Installing ConjugatePriors v0.1.2
INFO: Installing Distributions v0.8.9
INFO: Installing PDMats v0.3.6
INFO: Installing StatsBase v0.8.0
INFO: Installing StatsFuns v0.2.0
INFO: Installing Turing v0.0.1
INFO: Building Turing
ERROR: could not open file /usr/share/julia/julia-config.jl
ERROR: could not open file /usr/share/julia/julia-config.jl
ERROR: could not open file /usr/share/julia/julia-config.jl
gcc  -O2 -shared -fPIC task.c   -o libtask.so
task.c:6:19: fatal error: julia.h: No such file or directory
compilation terminated.
Makefile:9: recipe for target 'task' failed
make: *** [task] Error 1
===============================[ ERROR: Turing ]================================

LoadError: failed process: Process(`make`, ProcessExited(2)) [2]
while loading /home/chad/.julia/v0.4/Turing/deps/build.jl, in expression starting on line 1

================================================================================

================================[ BUILD ERRORS ]================================

WARNING: Turing had build errors.

 - packages with build errors remain installed in /home/chad/.julia/v0.4
 - build the package(s) and all dependencies with `Pkg.build("Turing")`
 - build a single package by running its `deps/build.jl` script

================================================================================
INFO: Package database updated

More tests for Gibbs constructor

We should have a list of tests for the Gibbs constructor. The current implementation is a bit fragile and incomplete. For example, we should at least test the following cases (a sketch of such tests follows the list):

  1. Gibbs(HMC(..., :s, :m))
  2. Gibbs(PG(...,:s, :m))
  3. Gibbs(HMC(...,:s), PG(:s))
  4. Gibbs(PG(:s), HMC(:m))
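
A sketch of what these tests might look like (all numeric arguments are placeholders, and the exact HMC/PG/Gibbs constructor signatures may differ between Turing versions):

using Turing, Base.Test   # `using Test` on Julia ≥ 0.7

# Placeholder arguments (step sizes, leapfrog steps, particle counts) for illustration only.
@testset "Gibbs constructor" begin
    @test Gibbs(HMC(0.2, 3, :s, :m)) !== nothing           # case 1
    @test Gibbs(PG(10, :s, :m)) !== nothing                 # case 2
    @test Gibbs(HMC(0.2, 3, :s), PG(10, :s)) !== nothing    # case 3
    @test Gibbs(PG(10, :s), HMC(0.2, 3, :m)) !== nothing    # case 4
end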

Stochastic Gradient Hamiltonian Monte Carlo

Reference

Chen, T., Fox, E. B., & Guestrin, C. (2014, February 17). Stochastic Gradient Hamiltonian Monte Carlo. arXiv.org.

The proposal below is naive.

Stochastic HMC can reuse the current code with only some changes to observe():

1. Count the total number of observations in the first run of the model
2. For later iterations
i. Choose a subset of observations for each iteration (according to obs_frac (observation fraction) set by user)
ii. Only accumulate logpdf of those chosen observations (Note: the subset is fixed for each HMC iteration)

These two changes are done in this branch: https://github.com/yebai/Turing.jl/tree/stochhmc, and some simple tests have been done, which initially look fine.
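
As a standalone illustration of steps 1 and 2 above (this is not the branch's observe() implementation; the model and helper below are made up for the sketch):

using Distributions, Random   # Random needed on Julia ≥ 0.7 for randperm

# Choose a fixed subset of observations for one HMC iteration (step 2.i) and
# accumulate only their logpdf contributions (step 2.ii).
function subset_loglik(y, m, s, obs_frac)
    n   = length(y)                                # step 1: total number of observations
    idx = randperm(n)[1:round(Int, obs_frac * n)]  # subset fixed for this iteration
    return sum(logpdf(Normal(m, sqrt(s)), y[i]) for i in idx)
end

subset_loglik([2.0, 3.0, 1.5, 2.5], 0.0, 1.0, 0.5)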

Following our discussion on correlation last time, I am thinking of using the following way to generate correlated observation sets:

1. The user sets a correlation ratio corr_rate
2. When generating a new observation subset, fix a fraction of observations from the previous subset according to corr_rate and sample the remaining ones

Expected problem: when obs_frac is large, we would not be able to do so. E.g.:

We have 10 observations and obs_frac is 0.8. Say our first subset is [1,2,3,4,5,6,7,8].

The user picks corr_rate to be 0.125. Then we need to fix one observation, say [1], and sample 7 observations from [9, 10], which is impossible.
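
A standalone sketch of this subset-generation rule (obs_frac and corr_rate as described above; this is not code from the branch), which also makes the incompatibility explicit:

using Random   # for shuffle (in Base on Julia 0.5)

function next_subset(prev, n_obs, obs_frac, corr_rate)
    subset_size = round(Int, obs_frac * n_obs)
    n_keep      = round(Int, corr_rate * length(prev))   # observations kept from the previous subset
    kept        = shuffle(prev)[1:n_keep]
    pool        = setdiff(1:n_obs, prev)                  # fresh observations outside the previous subset
    n_new       = subset_size - n_keep
    n_new > length(pool) && error("obs_frac/corr_rate incompatible: need $n_new fresh observations, only $(length(pool)) available")
    return sort(vcat(kept, shuffle(pool)[1:n_new]))
end

# The problem case above: next_subset(collect(1:8), 10, 0.8, 0.125)
# needs 7 fresh observations but only [9, 10] remain, so it errors.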

ddistribution bug

The HMC branch introduces a bug in the compiler: basically, the assume macro converts all distributions to the dDistribution type. This causes problems for distributions without a corresponding dDistribution type.

More tests for the Gibbs sampler

Maybe we should implement some models from the Stan examples repo and compare Stan's results with Turing's. This would allow us to test our Gibbs sampler more systematically and provide some insight into the efficiency of the current code.

For a start, we could implement some models from the basic_estimator and misc folders.

Summarise `Chain` in `io.jl`.

The current Chain type dumps all fields to the console by default. A potentially better way is to summarise the results, e.g. print the mean and variance. This can be done using the weighted mean function etc. in the StatsBase package.
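
For example (a standalone sketch with made-up data, not the actual Chain fields; assumes a recent StatsBase):

using Statistics, StatsBase

samples = randn(1000)          # draws for one parameter
w       = weights(rand(1000))  # e.g. importance/sample weights

println("mean = ", mean(samples, w), ", std = ", std(samples, w))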

Remove predict

This is a suggestion from Brooks, which I support: remove @predict in favour of a standard return. It accomplishes the same goal without introducing new syntax. Also, the experience of the Anglican team indicates that people often associate @predict with the posterior predictive distribution, which is clearly not what it is.

Questions

Hello,

Great package. I have some questions if anyone has a second.

  1. Is there a posterior predictive check method?
  2. Can the random variables be used in any arbitrary Julia function?
  3. Are there plans for variational inference?
  4. Have you done any benchmarking vs. Stan?
  5. How does stochastic control flow work from the user's perspective?

Thanks!

Implementation of the HMC sampler

I'm going to put issues about implementing the HMC sampler here.

One of them is about the compiler. I wonder why we convert @observe x ~ Distribution(params) to a logpdf() statement at the compiler level instead of doing this inside the sampler?

It makes it hard for me to pass the Dual type to my custom distribution wrapper in order to get gradient information.

Do you think I need to amend the compiler, or is there another approach? @yebai

Refactoring code: Unify VarInfo, Trace, TaskStorage, Chain.

At the moment, we have various data structures that store intermediate (MCMC) sampling states. It probably makes sense to unify these data structures.

(I don't have a complete idea how to do this yet, but let's start thinking about this issue and post ideas here.)

Bug: running gdemo (README example) produces error

I encountered the following error when running the gdemo example in the README.md file.

ERROR: KeyError: key :modelarglist not found
 in getindex; at ./dict.jl:688 [inlined]
 in macro expansion; at /Users/yebai/.julia/v0.5/Turing/src/core/compiler.jl:290 [inlined]
 in anonymous at ./<missing>:?

Prepare development branch for Julia 0.5

There are some changes in the master branch that prepare Turing for Julia 0.5. Since we are shifting to Julia 0.5 gradually, it makes sense to merge the changes in the master branch back into the development branch.

Automated benchmarking for Gibbs, HMC

We need to write some automated benchmarking scripts for Turing. These benchmarking scripts can include both running-time and predictive-performance comparisons. We can also compare against other systems such as Anglican and LibBi.

Removing isdefined

We can remove isdefined() using the fact that users are only allowed to use TArray in @model.

Enable automated test for Windows

We've enabled automated tests for Unix and Mac. It makes sense to also enable automated tests for Windows. An example configuration file can be found at the link.

Temporary fix for package conflicts

This is a reminder that 076844d contains some temporary fixes in

  • appveyor
  • travis

in order to fix package conflicts when using the newest version of Distributions.jl.

We need to remove these when there are new releases of the relevant packages.

Fix `return` bug

@xukai92 The current compiler does not handle models without an explicit return statement. Could we return all variables by default when return is absent?

Roadmap for v0.3

  • NUTS implementation #188
  • HMC: Transforms of ϵ for each variable #67 (replace with introducing mass matrix)
  • Re-define ~: macro => function #173 (postponed to 0.4)
  • Finish: Sampler (internal) interface design #107
  • Substantially improve performance of HMC and Gibbs #7
  • Remove obsolete randoc, randc? #156
  • Maximum a posteriori (MAP) inference. #72
  • Compositional MCMC: partial refreshment of the momentum variable #189
  • Allowing user to change the compiler between safe mode and fast mode
  • Support truncated distribution. #87
  • Refactoring code: Unify VarInfo, Trace, TaskLocalStorage #96
  • Refactoring code: Better gradient interface #97
  • Support sampling from model in model #86

Updated model interface: add parameters support.

The current interface passes variables in through a dictionary. This design requires us to implicitly copy all items in the data dictionary into the model's scope, which is not very consistent with Julia's function interface. A better design might be:

@model gdemo(x) = begin   
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s)) 
  end
  return(s, m, x)
end

x = [2.0, 3.0]
sampler = Gibbs(HMC(..., :m), PG(..., :s))
chn     = @sample(gdemo(x), sampler)

The most important changes are:

  1. Allow declaring parameters in the model macro.
  2. Support = in model declarations, e.g. gdemo(x) = statement_block. We might want to deprecate the syntax without = in the future (i.e. gdemo(x) statement_block) to stay in line with Julia's assignment form of function definitions.
  3. The tilde (~) macro will determine the type of a left-hand-side (LHS) variable as follows (see the sketch below):
    - data: the LHS variable can be found in the model's parameter list.
    - parameter: otherwise.
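
A minimal sketch of that rule (the helper name is made up; the real compiler works on expressions inside the macro rather than on this simplified form):

# A symbol on the LHS of ~ is data if it appears in the model's parameter list,
# and a parameter to be sampled otherwise.
is_data(lhs::Symbol, model_args) = lhs in model_args

model_args = [:x]          # from `@model gdemo(x) = begin ... end`
is_data(:x, model_args)    # true  -> data (observe)
is_data(:s, model_args)    # false -> parameter (assume)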

EDIT (28 FEB 2017): removed obsolete comment.

Further work for the Gibbs PR

Discussion

  • we need a consistent model output (i.e. return) and storage of predicted values
  • we need to agree on a consistent way of using task.storage

Enhancement

  • the naming issue of x[i,j,k] and x[i,j][k]
  • allow the user to set how many steps each sampler in Gibbs sampling runs for
  • the current implementation has poor performance (speed)
  • unify the way of storing and producing variables

The hierarchy of samplers

I wonder if we should create a better hierarchy for samplers. The current one seems a bit arbitrary: we use ParticleSampler in the default sample function and overload it in a few places (IS and HMC). Also, when I split AD from HMC, I set the type of sampler it can accept to GradientSampler, which could be meaningful for future work.

Shall we try to design something like this:

abstract Sampler{T<:InferenceAlgorithm}
abstract ParticleSampler{T} <: Sampler{T}
abstract GradientSampler{T} <: Sampler{T}
type ImportanceSampler{IS} <: ParticleSampler{IS}
  ...
end
type SMCSampler{SMC} <: ParticleSampler{SMC}
  ...
end
type PGSampler{PG} <: ParticleSampler{PG}
  ...
end
type HMCSampler{HMC} <: GradientSampler{HMC}
  ...
end

Tests do not pass for Julia v0.5

Tests do not pass for v0.5.

I looked into the errors and also ran the tests on my machine, finding that the SMC and PG engines seem to be broken. I guess it may result from the update of task.c?
