Comments (12)
How many parameters are we talking about? Continuous or discrete? Bounded, and if so between what? What's the objective of the optimization?
I've actually used Hyperopt.jl for hyper-parameter optimization for VI before with, uhm, mixed success :) I feel like Hyperopt.jl already has a nice interface for this, though maybe it would be neat to have a wrapper around it to make things even easier for the user.
> How many parameters are we talking about? Continuous or discrete? Bounded, and if so between what? What's the objective of the optimization?
I don't have a specific setting in mind, just parameters for which the objective is differentiable. In a VI problem one usually optimizes the variational parameters, which AdvancedVI.jl does already, but one can also optimize hyper-parameters against the ELBO (if I remember correctly this is called ML-II optimization). Basically, run gradient descent on these parameters using the gradient of the ELBO with respect to them.
Typically, for a GP, you would differentiate the ELBO with respect to the kernel parameters and do gradient descent on those.
Hyperopt.jl seems to provide a solution in terms of sampling/grid search, which is a bit different from what I was thinking about; a sketch of the gradient-based idea is below.
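To make the ML-II idea concrete, here is a minimal, hypothetical sketch (none of these names come from AdvancedVI.jl or Hyperopt.jl): a single gradient-ascent step on the hyper-parameters, assuming `elbo(hp)` returns a differentiable ELBO estimate for fixed variational parameters.
```julia
using Zygote  # any reverse-mode AD package would do

# One ML-II step: ascend the ELBO in the hyper-parameters `hp`,
# keeping the variational parameters fixed inside `elbo`.
function ml2_step!(hp::AbstractVector, elbo; lr = 1e-2)
    ∇hp, = Zygote.gradient(elbo, hp)
    hp .+= lr .* ∇hp
    return hp
end

# Toy usage with a stand-in objective; a real `elbo` would be a
# Monte Carlo ELBO estimate as a function of the hyper-parameters.
hp = [1.0, 0.5]
ml2_step!(hp, h -> -sum(abs2, h .- 2))  # moves hp towards [2, 2]
```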
Ah, yeah, so when I was thinking of hyper-parameters I was thinking of parameters "not part of the model but part of the optimization procedure", while it seems like you're referring to parameters that are part of the model, e.g. the choice of scale for your kernel, but that you have for some reason chosen not to put a prior on?
> Ah, yeah, so when I was thinking of hyper-parameters I was thinking of parameters "not part of the model but part of the optimization procedure", while it seems like you're referring to parameters that are part of the model, e.g. the choice of scale for your kernel, but that you have for some reason chosen not to put a prior on?
Exactly! And even if I did put a prior on them, I would probably only be interested in the MAP.
Also, the complicated part is that one would alternate between variational-parameter updates and hyper-parameter updates.
So right now the assumption is that `logπ` only accepts a sample from `q` and returns a value. How about instead assuming `logπ(z, args...)` and passing these `args` to the `optimize!` function? One could simply add a keyword argument `hyperparams` to pass an array of hyper-parameters, and eventually a different optimizer for them; a small illustration of the idea is below.
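For concreteness, a tiny illustration of the forwarding idea (all names here are hypothetical, not the AdvancedVI.jl API): the extra arguments fix the hyper-parameters, recovering the one-argument log-density the rest of the machinery expects.
```julia
# A target log-density with an extra hyper-parameter σ, following the
# proposed logπ(z, args...) convention (illustrative only).
logπ(z, σ) = -sum(abs2, z) / (2σ^2)

# What optimize! could do internally, given hyperparams = (0.5,):
hyperparams = (0.5,)
logπ_fixed = z -> logπ(z, hyperparams...)

logπ_fixed(randn(3))  # usable wherever a one-argument logπ(z) was expected
```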
Or make it a nested thing where `hyperlogπ(hyperparams)` returns a `logπ(z)` function. I guess that would make it more compatible with the Turing model structure.
Hey, I got a nice working version of it. Here is my proposition.
So far the model is:
- Given a $$\log\pi$$ function taking samples $$\theta$$ as arguments.

What I propose, to work with hyper-parameters:
- $$\log\pi$$ is either:
  - A function taking samples $$\theta$$ as argument (as above). This case is inferred by having a keyword argument `hyperparams` set to `nothing`.
  - A function taking an array of hyper-parameters as argument, which itself returns another function taking samples as arguments.
Here is an example of the second form for a Gaussian-process regression problem:
```julia
using KernelFunctions, Distributions, LinearAlgebra

# `X` and `y` are the training inputs and targets (assumed in scope).
function logpi(hp)
    k = hp[1] * SqExponentialKernel()          # hp[1]: kernel variance
    K = kernelmatrix(k, X) + 1e-6 * I          # jitter for numerical stability
    prior_gp = MvNormal(zeros(length(y)), K)   # zero-mean GP prior on f
    # hp[2]: observation noise std; returns a function of the latent sample f
    return f -> sum(logpdf.(Normal.(f, hp[2]), y)) + logpdf(prior_gp, f)
end
```
Then, during training, optimization alternates between variational updates and hyper-parameter updates.
If that sounds reasonable, I can make a PR for it.
So I've given this and similar issues (e.g. callbacks) a bit of thought.
I definitely like the idea, but I think there can be cases where you might want to, e.g., optimize the hyper-parameters several times in a row for a given `θ`, no?
All in all, I think the goal should instead be to make it easy to write the optimization loop yourself, rather than adding a bunch of functionality to the `vi` call. The reason is that it's going to be difficult to cover all the wanted use-cases, e.g.:
- callbacks: what arguments do you pass to the callback?
- hyper-parameters: how many steps do you take for the hyper-parameters, and how many for the parameters? Are they continuous or discrete?

IMO, `vi` should be a call that does the simplest of things, and if you want custom behavior, e.g. optimizing hyper-parameters, you just write your own training loop. Even now it's pretty straightforward, I think:
```julia
converged = false
step = 1

prog = ProgressMeter.Progress(num_steps, 1)

diff_results_θ = DiffResults.GradientResult(θ_init)
diff_results_hypers = DiffResults.GradientResult(hypers_init)

while (step ≤ num_steps) && !converged
    # 1. Compute gradient and objective value.
    getq_θ(θ) = ApproximateDistribution(hypers, θ)
    AdvancedVI.grad!(variational_objective, alg, getq_θ, model, diff_results_θ, kwargs...)

    # 2. Extract gradient from `diff_results_θ`.
    ∇ = DiffResults.gradient(diff_results_θ)

    # 3. Apply optimizer, e.g. multiplying by step-size.
    Δ = apply!(optimizer, θ, ∇)

    # 4. Update parameters.
    @. θ = θ - Δ

    # 5. [OPTIONAL] Update hyper-parameters (using gradient computation).
    getq_hypers(hypers) = ApproximateDistribution(hypers, θ)
    AdvancedVI.grad!(variational_objective, alg, getq_hypers, model, diff_results_hypers, kwargs...)
    ∇_hypers = DiffResults.gradient(diff_results_hypers)
    Δ_hypers = apply!(optimizer_hypers, hypers, ∇_hypers)
    @. hypers = hypers - Δ_hypers

    # 6. Do whatever analysis you want.
    callback(args...)

    # 7. Update convergence flag and step counter.
    converged = hasconverged(variational_objective, alg, q, model)  # or something user-defined
    step += 1

    ProgressMeter.next!(prog)
end
```
I think if we make this very clear in the documentation and show simple examples, people should be able to do exactly what they want.
Now, you could argue that we could do both, and that's fair. But 1) I think we first need to nail everything down before we start adding convenience methods on top of `vi`, and 2) it's important that, if we do, we don't make things too "noisy", in the sense that I look at the docstring of `vi` and see a buuuunch of options, most of which I don't need, some of which indicate that they can do what I want but then turn out not to do exactly what I need, etc. All in all, I don't think this is something we need to prioritize right now? 😕
Agree/disagree?
Yep, totally agree!
Having independence between the variational-parameter and hyper-parameter optimization seems both more flexible and more practical!
So I guess the aim would be to create an extra function for hyper-parameter optimization that would call `vi` inside (a rough sketch of what I mean is below)?
My idea with having `logpi()` return a function was that it is then extremely similar to the Turing API, where you get your model once you call it on your data/hyper-parameters.
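Something along these lines, purely as a hypothetical sketch: it assumes the `logpi(hp)`-style closure from the GP example above, a `vi(logπ, alg, q)`-style method, and an `elbo(alg, q, logπ, n)` estimator as in the AdvancedVI README; the finite-difference hyper-gradient is just a crude placeholder.
```julia
using AdvancedVI

# Hypothetical wrapper: alternate variational updates (via `vi`) with a
# central-finite-difference ascent step on each hyper-parameter against
# an ELBO estimate. In practice an AD-based gradient (with common random
# numbers) would be preferable; this only shows the alternation.
function vi_with_hypers(logpi, hp, q, alg; outer_steps = 10, lr = 1e-3, ε = 1e-4)
    for _ in 1:outer_steps
        q = vi(logpi(hp), alg, q)      # variational updates for fixed hp
        for i in eachindex(hp)         # ML-II step on hp for fixed q
            hp₊ = copy(hp); hp₊[i] += ε
            hp₋ = copy(hp); hp₋[i] -= ε
            g = (elbo(alg, q, logpi(hp₊), 100) - elbo(alg, q, logpi(hp₋), 100)) / (2ε)
            hp[i] += lr * g
        end
    end
    return q, hp
end
```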
This could be done by cleverly specifying the variational family, once the reboot is deployed. Maybe I'll write an example that does that by then. For the time being, I'll close the issue. Please re-open and remind me if I forget to do this in the future!