turinglang / advancedvi.jl

Implementation of variational Bayes inference algorithms

Home Page: http://turinglang.org/AdvancedVI.jl/

License: MIT License

Julia 100.00%

advancedvi.jl's People

Contributors

devmotion, github-actions[bot], juliatagbot, luiarthur, red-portal, shravanngoswamii, theogf, torfjelde, yebai


advancedvi.jl's Issues

Hyper-parameter optimization

Any suggestions on how to do optimization for hyper-parameters? That would be a very nice feature to have at some point.

Missing API method

The API description has the following two methods for vi:

  • vi(model, alg)
  • vi(model, alg, q::VariationalPosterior, θ)

The first method is needed when the algorithm only supports a single variational family (e.g. Pathfinder). However, it offers no way to provide θ as the second method does. It would be useful if the API also included vi(model, alg, θ).
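
A hypothetical sketch of that third signature (default_variational_family is just a placeholder, not an existing function):

# Hypothetical: reuse the algorithm's own variational family while accepting user-supplied θ.
vi(model, alg, θ::AbstractVector) = vi(model, alg, default_variational_family(alg, model), θ)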

Both Bijectors and Distributions export "Distribution"

I'll try to send in a PR to fix this.

WARNING: both Bijectors and Distributions export "Distribution"; uses of it in module AdvancedVI must be qualified
ERROR: LoadError: LoadError: UndefVarError: Distribution not defined
Stacktrace:
  [1] top-level scope
    @ ~/.julia/packages/AdvancedVI/yCVq7/src/AdvancedVI.jl:96
  [2] include
    @ ./Base.jl:386 [inlined]
  [3] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1050
  [4] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:914
  [5] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:901
  [6] include
    @ ./Base.jl:386 [inlined]
  [7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1213
  [8] top-level scope
    @ none:1
  [9] eval
    @ ./boot.jl:360 [inlined]
 [10] eval(x::Expr)
    @ Base.MainInclude ./client.jl:446
 [11] top-level scope
    @ none:1
in expression starting at /home/rik/.julia/packages/AdvancedVI/yCVq7/src/AdvancedVI.jl:1
in expression starting at /home/rik/.julia/packages/Turing/uMQmD/src/Turing.jl:1

Zygote should be preferred to ReverseDiff for reverse mode default

Since VI repeatedly computes the value and gradient of the same function, Zygote appears to provide a roughly 3x speedup over ReverseDiff, apparently because Zygote does some kind of caching.
Here is a quick benchmark:

using BenchmarkTools
using Distributions
using Zygote
using ReverseDiff

d = MvNormal(rand(50), rand(50, 50) |> x -> x * x')
f(x) = logpdf(d, x)
X = rand(d, 40)
@btime ReverseDiff.gradient($X) do x
    sum(f, eachcol(x))
end
# 15.941 ms (534899 allocations: 23.11 MiB)
@btime Zygote.gradient($X) do x
    sum(f, eachcol(x))
end
# 5.405 ms (22475 allocations: 4.51 MiB)

Make use of DifferentiationInterface.jl?

Currently, there are extensions for each of the AD backends that this package supports. Would it be possible to refactor to just make use of DifferentiationInterface, and let the user swap out the ADType, so that we don't have to maintain this here?
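
As a rough sketch of what that could look like (this is not the current AdvancedVI internals; estimate_gradient is just an illustrative name), a single gradient path driven by an ADTypes backend might be:

using DifferentiationInterface
using ADTypes: AutoForwardDiff, AutoZygote
import ForwardDiff, Zygote  # the chosen backend still has to be loaded

# One code path for every backend; the backend choice is just a value.
function estimate_gradient(f, x, adtype = AutoForwardDiff())
    value, grad = DifferentiationInterface.value_and_gradient(f, adtype, x)
    return value, grad
end

# estimate_gradient(x -> sum(abs2, x), randn(5), AutoZygote())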

Pathfinder

Pathfinder algorithm has been discussed on the Slack quite a bit; I'm dropping a link here to help us keep track of it!

SVGD

Are there currently any plans to add SVGD to Turing.jl, and if so when is it expected?

Minibatches

Hey guys,

I'm working with a large dataset with a relatively large number of parameters (last-layer approximation for a neural network). Out-of-the-box VI is simply a non-starter here.

To perform parameter updates in mini-batches, is scaling the contribution of the minibatch to the log-likelihood the primary change?
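
Concretely, is this the kind of correction that's meant (a minimal sketch; loglikelihood_one and logprior are just illustrative names, not AdvancedVI API)?

# Unbiased minibatch estimator of the log joint: scale the batch log-likelihood by N/B.
function logjoint_minibatch(θ, batch, N)
    B = length(batch)
    return logprior(θ) + (N / B) * sum(x -> loglikelihood_one(θ, x), batch)
end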

Thanks for your work!

Double Stochasticity

ADVI and other methods (SVGD, etc.) can exploit double stochasticity (stochastic estimation of the expectation via samples, and stochastic estimation of the log joint via mini-batches):

M. Titsias and M. Lázaro-Gredilla. Doubly stochastic variational Bayes for non-conjugate inference.

Need a weighted loss function/ log likelihood

For imbalanced datasets, a weighted loss function works better than either oversampling or undersampling. Can we add this feature to Turing.jl and AdvancedVI? The log-likelihood would need a weight term for each class: according to the number of samples in the training dataset, we would scale the corresponding log-likelihood term by the inverse of the number of samples for that class. I would be happy to contribute, but I can't find where this is located in the source code; could someone please help?

Weighted Loss: https://medium.com/gumgum-tech/handling-class-imbalance-by-introducing-sample-weighting-in-the-loss-function-3bdebd8203b4
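
Something along these lines is what I have in mind (a sketch only; the model, the weight vector w, and the use of @addlogprob! are illustrative, not an existing AdvancedVI feature):

using Turing
using LinearAlgebra: dot
using StatsFuns: logistic

# y[i] ∈ {0, 1}; w holds one weight per class, e.g. inverse class frequencies.
@model function weighted_logreg(x, y, w)
    β ~ filldist(Normal(0, 1), size(x, 2))
    for i in eachindex(y)
        p = logistic(dot(x[i, :], β))
        # Add the weighted log-likelihood contribution of observation i by hand.
        Turing.@addlogprob! w[y[i] + 1] * logpdf(Bernoulli(p), y[i])
    end
end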

Question/feature request about amortized inference

Hello,

I have been looking for a Julia package to perform amortized Bayesian inference on simulation-based models. In many areas of science, there are models with unknown/intractable likelihood functions. Unfortunately, Bayesian inference with these models has historically been difficult. Somewhat recently there has been progress in this area using a special type of normalizing flow. This method uses a neural network to optimize summary statistics for approximate Bayesian computation. The end result is very accurate amortized inference which can be used for any model, including those without a known likelihood function. Currently, this method is only implemented in a Python package called BayesFlow.

Given Turing's interest in machine learning and Bayesian inference, I was wondering whether there is interest in adding this method to the Turing ecosystem. I think it would add a lot of value to the community.

Improving mechanism for updating parameters of distributions

Current impl

We need to be able to map parameters to a new distribution with new parameters in a non-mutating manner. Currently there are two options for a user:

  1. Provide the mapping θ ↦ D( ⋅ ; θ) directly as a function.
  2. Overload AdvancedVI.update(d::MyDistribution, θ...).

Option (1)

From our perspective, (1) is of course the best since it's trivial to work with internally. One issue: it's easy for the user to get it "wrong", e.g. the entire vi call becomes type-unstable because the user closes over a global variable pointing to the distribution they want to update, or they perform the update in a mutating fashion, etc.
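
For example, a well-behaved mapping could look like this (illustrative only: it closes over nothing global and allocates a fresh distribution on every call):

using Distributions
using LinearAlgebra: Diagonal

# θ = [μ; log σ]; a new mean-field Gaussian is constructed, non-mutatingly, on each call.
function getq(θ)
    d = length(θ) ÷ 2
    μ, σ = θ[1:d], exp.(θ[(d + 1):end])
    return MvNormal(μ, Diagonal(σ .^ 2))
end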

Option (2)

From a user perspective it's generally nicer to work with (2), where the user essentially provides the initial parameters / variational approximation and then it's left to AdvancedVI to optimize. But there are a couple of reasons why this approach falls short:

  1. Different packages overloading in different ways will quickly lead to issues.
  2. Users essentially have to do it on a case-by-case basis (unless they come up with some clever and likely hacky solution). This is more of an "issue" with Distributions.jl though, i.e. there's no easy way (as far as I'm aware) to reconstruct an arbitrary distribution with a new set of parameters.

Possible solution (thanks to @devmotion)

Ref: TuringLang/Turing.jl#1377 (comment)

In AVI we define a default impl of update, but instead of making packages overload this method, we make them pass the method as an argument to vi and subsequent functions. E.g.

vi(model, alg) = vi(AdvancedVI.update, model, alg)
function vi(updater, model, alg)
    # Do as we do now, but replace `AdvancedVI.update` with `updater`
    ...
end

This way, packages depending on AVI (e.g. Turing.jl) can choose between making their own update-function and passing this into vi, OR making use of a default impl AdvancedVI.update that we provide. And similarly, if the user knows what they're doing, they can provide the updater directly.

Issue: annoying to have to pass this updater around internally, but oh well.

Default impl

As a default impl I have the following in mind (though it's imo a bit nasty):

@generated function update(d::Distribution, θ...)
    return :($(nameof(d))(θ...))
end

EDIT: thanks to @devmotion, this is much improved:

# Ref: https://github.com/SciML/DiffEqBase.jl/blob/8e1a627d10ec40a112460860cd13948cc0532c63/src/utils.jl#L73-L75
using Base: typename

Base.@pure __parameterless_type(T) = typename(T).wrapper
parameterless_type(x) = parameterless_type(typeof(x))
parameterless_type(x::Type) = __parameterless_type(x)

AdvancedVI.update(d::Distribution, θ...) = parameterless_type(d)(θ...)

Ideally we'd have access to the fieldnames for each Distribution and θ would be a NamedTuple. Then we could use merge to combine the existing parameters already present in d with the new parameters θ. Unfortunately, no such interface exists in Distributions.jl at the moment, hence we're left with the above hack 😕
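
For illustration, the merge-based update I have in mind, written by hand for a single distribution type (since Distributions.jl doesn't expose parameters as a NamedTuple, this currently has to be done per type):

using Distributions

# Hypothetical per-type update: merge the new parameters over the existing ones.
function update(d::Normal, θ::NamedTuple)
    current = (μ = d.μ, σ = d.σ)
    merged = merge(current, θ)
    return Normal(merged.μ, merged.σ)
end

# update(Normal(0.0, 1.0), (σ = 2.0,))  # -> Normal(0.0, 2.0)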

Natural Gradients + Monte Carlo VI

There's been quite a bit of interesting work recently looking at natural gradients for variational inference with exponential-family q-distributions and non-conjugate / non-exponential-family likelihoods / priors. See [1] (applied to GPs, but the important bits aren't really GP-specific) and [2]. These turn out to be really quite straightforward to implement, so they would be a great target for us. As a starting point, you could imagine extending our current mean-field implementation to employ natural gradient descent in the parameters of the diagonal Gaussian q-distribution.
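
As a very rough sketch of what such an extension could look like for a diagonal Gaussian, using the identity from [2] that the natural gradient with respect to the natural parameters equals the ordinary gradient with respect to the mean parameters (elbo, ρ, and the function name are illustrative, not existing API):

using ForwardDiff

function natural_gradient_step(μ, σ2, elbo; ρ = 0.01)
    # Natural parameters of N(μ, Diagonal(σ2))
    λ1 = μ ./ σ2
    λ2 = -1 ./ (2 .* σ2)

    # ELBO as a function of the mean parameters m = (μ, μ.^2 .+ σ2);
    # its gradient w.r.t. m is the natural gradient w.r.t. (λ1, λ2).
    d = length(μ)
    f(m) = elbo(m[1:d], m[(d + 1):end] .- m[1:d] .^ 2)
    g = ForwardDiff.gradient(f, vcat(μ, μ .^ 2 .+ σ2))

    # Natural-gradient ascent step in natural-parameter space
    λ1 = λ1 .+ ρ .* g[1:d]
    λ2 = λ2 .+ ρ .* g[(d + 1):end]

    # Map back to (μ, σ2); λ2 must remain negative (no safeguard in this sketch).
    σ2_new = -1 ./ (2 .* λ2)
    return λ1 .* σ2_new, σ2_new
end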

There's even work moving slightly beyond exponential family distributions now [3], but this is quite early work. Might be nice to have though.

[1] - Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. "Natural gradients in practice: Non-conjugate variational inference in Gaussian process models." arXiv preprint arXiv:1803.09151 (2018).
[2] - Khan, Mohammad Emtiyaz, and Didrik Nielsen. "Fast yet simple natural-gradient descent for variational inference in complex models." 2018 International Symposium on Information Theory and Its Applications (ISITA). IEEE, 2018.
[3] - Lin, Wu, Mohammad Emtiyaz Khan, and Mark Schmidt. "Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations." arXiv preprint arXiv:1906.02914 (2019).

VI+PSIS

I've implemented PSIS here, in case anyone is interested in adding VI+PSIS to Turing (the way it's been added to Stan). VI+PSIS should result in better inference, since VI+PSIS is theoretically consistent, unlike VI alone. More importantly, it provides a good diagnostic, the Pareto shape constant k, which lets the user know when VI+PSIS has failed -- large Pareto shape constants indicate the VI algorithm has failed to provide a good approximation to the posterior.
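
As a sketch of how the diagnostic would sit on top of a fitted variational posterior (psis stands in for whatever the PSIS implementation provides; logjoint, the 0.7 threshold, and a multivariate q are assumptions of this example):

using Distributions

function check_vi_fit(q, logjoint; n_samples = 1_000)
    zs = rand(q, n_samples)                                   # draws from the variational posterior
    logw = [logjoint(z) - logpdf(q, z) for z in eachcol(zs)]  # log importance ratios
    k̂, smoothed_logw = psis(logw)                             # hypothetical call into the PSIS package
    k̂ > 0.7 && @warn "Pareto k̂ > 0.7: the variational approximation is unreliable" k̂
    return k̂, smoothed_logw
end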

Let me know if you guys need any help.

custom training loop implementation help

Hi, I am trying to set up a custom training loop following the code at the bottom of the README, but I have not been able to make it work. I am relatively new to Julia, so I am probably not the best at debugging this stuff right now. Any pointers are appreciated!

Here is the code:

using Flux
using Turing, AdvancedVI, Distributions, DynamicPPL, StatsFuns, DiffResults
using Turing: Variational
using StatsBase

function vi_custom(model, q_init=nothing; n_mc, n_iter, tol, optimizer)
    varinfo = DynamicPPL.VarInfo(model)
    num_params = sum([size(varinfo.metadata[sym].vals, 1) for sym in keys(varinfo.metadata)])
    logπ = Variational.make_logjoint(model)
    variational_objective = Variational.ELBO()
    alg = ADVI(n_mc, n_iter)
    # Set up q
    if isnothing(q_init)
        μ = randn(num_params)
        σ = StatsFuns.softplus.(randn(num_params))
    else
        μ, σs = StatsBase.params(q_init)
        σ = StatsFuns.invsoftplus.(σs)
    end
    θ = vcat(μ, σ)
    q = Variational.meanfield(model)
    converged = false
    step = 1
    diff_result = DiffResults.GradientResult(θ)
    while (step <= n_iter) && !converged
        # 1. Compute gradient and objective value; results are stored in `diff_results`
        AdvancedVI.grad!(variational_objective, alg, q, model, θ, diff_result)
        # 2. Extract gradient from `diff_result`
        ∇ = DiffResults.gradient(diff_result)
        # 3. Apply optimizer, e.g. multiplying by step-size
        Δ = apply!(optimizer, θ, ∇)
        # 4. Update parameters
        θ_prev = copy(θ)
        @. θ = θ - Δ
        # Check convergence
        converged = sqrt(sum((θ - θ_prev).^2)) < tol
        step += 1
    end
    return θ, step, q
end

@model norm(z) = begin
    s ~ InverseGamma(1, 1)
    μ ~ Normal(0, sqrt(s))
    # likelihood
    z .~ Normal(μ, sqrt(s))
end

z = rand(Normal(1., 2.), (200, 1));

θ, step, q = vi_custom(norm(z); n_mc=25, n_iter=20000, tol=0.01, optimizer = Flux.ADAM())

And here is the stacktrace and error I get:

ERROR: LoadError: MethodError: no method matching (::ELBO)(::ADVI{AdvancedVI.ForwardDiffAD{40}}, ::Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1},Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate}, ::Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}})
Closest candidates are:
  Any(::Any, ::Any, ::Any, ::Any; kwargs...) at /.julia/packages/AdvancedVI/PaSeO/src/objectives.jl:5
  Any(::AbstractRNG, ::VariationalInference, ::Any, ::Model, ::Any; weight, kwargs...) at /.julia/packages/Turing/3goIa/src/variational/VariationalInference.jl:57
Stacktrace:
 [1] (::AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}})(::Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1}) at /.julia/packages/AdvancedVI/PaSeO/src/AdvancedVI.jl:140
 [2] vector_mode_dual_eval(::AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}}, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4,Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1}}) at /.julia/packages/ForwardDiff/sdToQ/src/apiutils.jl:37
 [3] vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}}, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4,Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1}}) at /.julia/packages/ForwardDiff/sdToQ/src/gradient.jl:103
 [4] gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}}, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4,Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1}}, ::Val{true}) at /.julia/packages/ForwardDiff/sdToQ/src/gradient.jl:35
 [5] gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}}, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4,Array{ForwardDiff.Dual{ForwardDiff.Tag{AdvancedVI.var"#f#19"{ELBO,ADVI{AdvancedVI.ForwardDiffAD{40}},Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate},Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}},Tuple{}},Float64},Float64,4},1}}) at /.julia/packages/ForwardDiff/sdToQ/src/gradient.jl:33
 [6] grad!(::ELBO, ::ADVI{AdvancedVI.ForwardDiffAD{40}}, ::Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Stacked{Tuple{Bijectors.Exp{0},Identity{0}},2},Multivariate}, ::Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}}, ::Array{Float64,1}, ::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}) at /.julia/packages/AdvancedVI/PaSeO/src/AdvancedVI.jl:149
 [7] vi_custom(::Model{var"#33#34",(:z,),(),(),Tuple{Array{Float64,2}},Tuple{}}, ::Nothing; n_mc::Int64, n_iter::Int64, tol::Float64, optimizer::ADAM) at custom_training_loop.jl:27
 [8] top-level scope at custom_training_loop.jl:53
 [9] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
in expression starting at custom_training_loop.jl:53

Rethinking AdvancedVI

Alright! It's time to seriously take care of AdvancedVI :D

Here are some of the things we talked about in the meeting back in October:

  • There should be two distinct methods of optimization, depending on whether the variational distribution is given as a function (like update_q) or as a distribution whose parameters are updated.
  • Hyperparameter optimization should be nicely implemented; one proposition was:
    makelogπ(logπ, ::Nothing) = logπ
    makelogπ(logπ, hyperparams) = logπ(hyperparams)
    function vi(..., logπ; hyperparams = nothing)
        ...
        while not_converged
            logjoint = makelogπ(logπ, hyperparams)
            for i in 1:n_inner
                ...
            end
        end
    end
  • We should condense the updates of the variational parameters into a more "atomic" step! function

And here are some more personal points (disclaimer: I will be happy to take care of these different points)

  • I don't think the current ELBO approach is good: the ELBO can always be split into an entropy term (depending only on the distribution) and an expectation term over the log joint. Most VI methods take advantage of this by computing the entropy gradient analytically (and smartly!); see "Doubly Stochastic Variational Inference" by Titsias, for instance. My proposition would be to split the gradient into two parts (grad_entropy + grad_expeclog), where each part can be specialized for the problem at hand (a sketch follows after this list).
  • I would personally argue that update_q only makes sense with the current, now obsolete, implementation using distributions with immutable fields like TuringMvNormal. See again Titsias's use of the reparametrization trick.
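
A rough sketch of that split for a diagonal Gaussian q = N(μ, diag(exp.(2ω))) parameterized by (μ, ω): the entropy gradient is analytic, while the expected-log-joint gradient uses the reparametrization trick (all names are illustrative, not existing AdvancedVI functions):

using ForwardDiff

# Entropy of N(μ, diag(exp.(2ω))) is sum(ω) + const, so its gradient is analytic.
grad_entropy(μ, ω) = (zero(μ), one.(ω))

# Reparameterized Monte Carlo estimate of the gradient of E_q[log π(z)].
function grad_expeclog(logπ, μ, ω; n_samples = 10)
    gμ, gω = zero(μ), zero(ω)
    for _ in 1:n_samples
        ε = randn(length(μ))
        z = μ .+ exp.(ω) .* ε
        g = ForwardDiff.gradient(logπ, z)
        gμ .+= g
        gω .+= g .* exp.(ω) .* ε
    end
    return gμ ./ n_samples, gω ./ n_samples
end

# The full ELBO gradient is then the elementwise sum of the two parts.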

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
