Comments (12)

mohamed82008 commented on September 14, 2024

How many parameters are we talking? Continuous/discrete? Bounded between? What's the objective of the optimization?

torfjelde commented on September 14, 2024

I've actually used Hyperopt.jl for hyper-param optimization for VI before with, uhmm, mixed success:) I feel like Hyperopt.jl has a nice interface for this already, though maybe it would be neat to have a wrapper around that to make things even easier for the user.

theogf commented on September 14, 2024

How many parameters are we talking? Continuous/discrete? Bounded between? What's the objective of the optimization?

I don't have a specific setting, just parameters for which the objective is differentiable. Usually in a VI problem one optimizes the variational parameters (AVI.jl does this already), but one can also optimize the hyper-parameters against the ELBO (if I remember correctly this is called ML-II optimization). Basically, run gradient descent on these parameters by taking the derivative of the ELBO with respect to them.
Typically for a GP, differentiate the ELBO with respect to the kernel parameters and do gradient descent.
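
For concreteness, writing $$\lambda$$ for the variational parameters and $$\eta$$ for the model hyper-parameters (symbols chosen here just for illustration), the objective in question is the usual ELBO,

$$\mathrm{ELBO}(\lambda, \eta) = \mathbb{E}_{q_\lambda(\theta)}\left[\log p_\eta(y, \theta) - \log q_\lambda(\theta)\right],$$

and the ML-II-style scheme described above also maximizes it over $$\eta$$, using the ELBO as a surrogate for the marginal likelihood.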

Hyperopt.jl seems to provide a solution in terms of sampling/grid search, which is a bit different from what I was thinking about.

torfjelde commented on September 14, 2024

Ah, yeah, so when I was thinking of hyper-parameters I was thinking of parameters "not part of the model but part of the optimization procedure", while it seems like you're referring to parameters that are part of the model, e.g. the choice of scale for your kernel, but that you have for some reason chosen not to put a prior on?

theogf commented on September 14, 2024

Ah, yeah, so when I was thinking of hyper-parameters I was thinking of parameters "not part of the model but part of the optimization procedure", while it seems like you're referring to parameters that are part of the model, e.g. the choice of scale for your kernel, but that you have for some reason chosen not to put a prior on?

Exactly! And even if I did put a prior I would probably be only interested in the MAP

theogf commented on September 14, 2024

Also, the complicated part is that one would alternate between variational-parameter updates and hyper-parameter updates.
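
Concretely, using the ELBO notation from earlier in the thread (step sizes $$\rho$$ and $$\rho'$$ are illustrative), one round of such alternating updates would be

$$\lambda_{t+1} = \lambda_t + \rho \, \nabla_\lambda \mathrm{ELBO}(\lambda_t, \eta_t), \qquad \eta_{t+1} = \eta_t + \rho' \, \nabla_\eta \mathrm{ELBO}(\lambda_{t+1}, \eta_t).$$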

theogf commented on September 14, 2024

So right now the assumption is that logπ only accepts a sample from q and returns a value.
How about assuming instead logπ(z, args...) and passing these args to the optimize! function?
One could simply add a keyword argument hyperparams to pass an array of hyperparameters, and optionally a different optimizer for them.
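
A rough sketch of what that call site could look like (purely illustrative: `my_logjoint`, `hp0`, and the keyword names are placeholders, and the positional arguments of optimize! are assumed rather than taken from the current code):

# Hypothetical interface sketch, not AdvancedVI's actual API.
logπ(z, hyperparams) = my_logjoint(z, hyperparams)   # log-joint that now also receives extra args

optimize!(elbo, alg, q, logπ, θ;
          hyperparams = hp0,            # proposed keyword: array of hyper-parameters, or nothing
          hyperoptimizer = optimizer)   # proposed keyword: optionally a different optimizer for them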

theogf commented on September 14, 2024

Or make it a nested thing, where hyperlogπ(hyperparams) returns a logπ(z) function. I guess that would make it more compatible with the Turing model structure.

theogf commented on September 14, 2024

Hey, I got a nice working version of it. Here is my proposition.
So far the setup is:

  • Give a $$\log\pi$$ function taking samples $$\theta$$ as arguments

What I propose in order to work with hyperparameters:

  • $$\log\pi$$ is either
    • A function taking samples $$\theta$$ as argument (as above). This case is inferred from the keyword argument hyperparams being set to nothing
    • A function taking an array of hyperparameters as argument, which itself returns another function taking samples as arguments

Here is an example of the second form for a Gaussian process regression problem:

using Distributions, KernelFunctions  # for MvNormal/Normal/logpdf and the kernels

# `X` and `y` are the training inputs/targets, assumed to be in scope;
# hp[1] is the kernel variance, hp[2] the observation-noise std.
function logpi(hp)
  k = hp[1] * SqExponentialKernel()   # scaled squared-exponential kernel
  K = kernelmatrix(k, X)              # GP prior covariance over the inputs
  prior_gp = MvNormal(K)              # zero-mean GP prior
  # Gaussian likelihood of the residuals plus the GP prior, as a function of the latent f
  return f -> sum(logpdf.(Normal(0, hp[2]), f .- y)) + logpdf(prior_gp, f)
end
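
For reference, the closure returned by logpi is exactly what plays the role of the current logπ; a hypothetical usage (the values of hp here are made up) would be:

hp = [1.0, 0.1]    # e.g. kernel variance and observation-noise std
logπ = logpi(hp)   # a plain logπ(f), matching what AdvancedVI currently expects
# run some variational updates against logπ, then take a gradient step in `hp`
# (differentiating the ELBO through logpi) and rebuild logπ for the next round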

Then during training, optimization alternates between variational updates and hyper-parameter updates.
If that sounds reasonable, I can make a PR for it.

torfjelde commented on September 14, 2024

So I've given this and similar issues (e.g. callbacks) a bit of thought.

I definitely like the idea, but I think there can be cases where you might want to, e.g., optimize hyper-parameters several times in a row for a given θ, no?

All in all, I think the goal should instead be to make it easy to write the optimization loop yourself, rather than adding a bunch of functionality to the vi call. The reason is that it's going to be difficult to cover all the wanted use-cases, e.g.:

  • callbacks: what arguments do you pass to the callback?
  • hyper-parameters: how many steps do you take for the hyper-parameters, and how many do you take for the parameters? are they continuous or discrete?

IMO, vi should be a call that can do the simplest of things, and then if you want custom behavior, e.g. optimizing hyper-parameters, you just write your own training loop. Even now it's pretty straightforward, I think:

converged = false
step = 1

prog = ProgressMeter.Progress(num_steps, 1)

θ, hypers = copy(θ_init), copy(hypers_init)

diff_results_θ = DiffResults.GradientResult(θ_init)
diff_results_hypers = DiffResults.GradientResult(hypers_init)

while (step ≤ num_steps) && !converged
    # 1. Compute gradient and objective value.
    getq(θ) = ApproximateDistribution(hypers, θ)
    AdvancedVI.grad!(variational_objective, alg, getq, model, diff_results_θ, kwargs...)

    # 2. Extract gradient from `diff_results_θ`
    ∇ = DiffResults.gradient(diff_results_θ)

    # 3. Apply optimizer, e.g. multiplying by step-size
    Δ = apply!(optimizer, θ, ∇)

    # 4. Update parameters
    @. θ = θ - Δ

    # 5. [OPTIONAL] Update hyperparameters (using gradient computation)
    # (a separate closure name, so it does not clash with `getq` defined above)
    getq_hypers(hypers) = ApproximateDistribution(hypers, θ)
    AdvancedVI.grad!(variational_objective, alg, getq_hypers, model, diff_results_hypers, kwargs...)

    ∇_hypers = DiffResults.gradient(diff_results_hypers)
    Δ_hypers = apply!(optimizer_hypers, hypers, ∇_hypers)
    @. hypers = hypers - Δ_hypers

    # 6. Do whatever analysis you want
    callback(args...)

    # 7. Update
    converged = hasconverged(variational_objective, alg, q, model) # or something user-defined
    step += 1

    ProgressMeter.next!(prog)
end

I think if we make this very clear in the documentation and show simple examples, people should be able to do exactly what they want.

Now, you could argue that we could do both, and that's fair. But 1) I think we first need to nail everything down before we start adding convenience methods on top of vi, and 2) if we do, it's important to ensure we don't make things too "noisy", in the sense that I look at the docstring of vi and see a buuuunch of options, most of which I don't need, and some of which suggest they can do what I want but then it turns out they can't do exactly what I need, etc. All in all, I don't think this is something we need to prioritize right now? 😕

Agree/disagree?

theogf commented on September 14, 2024

Yep, totally agree!
Having independence between variational-parameter and hyper-parameter optimization seems both more flexible and practical!
So I guess the aim would be to create an extra function for hyper-parameter optimization that would call vi inside?

My idea with having logpi() return a function was that it is then extremely similar to the Turing API, where you get your model once you call it on your data/hyperparameters.
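
A minimal sketch of that "extra function that calls vi inside" idea, assuming a vi(logπ, alg, q) method and an opaque hyper_step helper for the hyper-parameter update (both are assumptions, not existing AdvancedVI functions):

# Illustrative only: `vi_ml2`, `hyper_step`, and the `vi(logπ, alg, q)` signature
# are assumptions made for this sketch.
function vi_ml2(hyperlogπ, alg, q0, hp0; outer_iters = 10)
  q, hp = q0, hp0
  for _ in 1:outer_iters
    q  = vi(hyperlogπ(hp), alg, q)      # variational updates at fixed hyper-parameters
    hp = hyper_step(hyperlogπ, q, hp)   # then one (or more) hyper-parameter updates given q
  end
  return q, hp
end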

Red-Portal commented on September 14, 2024

This could be done by cleverly specifying the variational family, once the reboot is deployed. Maybe I'll write an example that does that by then. For the time being, I'll close the issue. Please re-open and remind me if I forget to do this in the future!
