Comments (12)

mohamed82008 commented on September 14, 2024

How many parameters are we talking? Continuous/discrete? Bounded between? What's the objective of the optimization?

torfjelde commented on September 14, 2024

I've actually used Hyperopt.jl for hyper-param optimization for VI before with, uhmm, mixed success:) I feel like Hyperopt.jl has a nice interface for this already, though maybe it would be neat to have a wrapper around that to make things even easier for the user.

theogf commented on September 14, 2024

How many parameters are we talking? Continuous/discrete? Bounded between? What's the objective of the optimization?

I don't have a specific setting, just parameters for which the objective is differentiable. Usually in a VI problem one optimizes the variational parameters (AVI.jl does this already), but one can also optimize the hyper-parameters against the ELBO (if I remember correctly this is called ML-II optimization). Basically, run gradient descent on these parameters by taking the derivative of the ELBO with respect to them.
Typically for a GP, differentiate the ELBO with respect to the kernel parameters and do gradient descent.
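
For concreteness, writing $$\lambda$$ for the variational parameters and $$\eta$$ for the model hyper-parameters (symbols chosen here just for illustration), the objective in question is the usual ELBO,

$$\mathrm{ELBO}(\lambda, \eta) = \mathbb{E}_{q_\lambda(\theta)}\left[\log p_\eta(y, \theta) - \log q_\lambda(\theta)\right],$$

and the ML-II-style scheme described above also maximizes it over $$\eta$$, using the ELBO as a surrogate for the marginal likelihood.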

Hyperopt.jl seems to provide a solution in terms of sampling/grid search, which is a bit different from what I was thinking about.

torfjelde commented on September 14, 2024

Ah, yeah, so when I was thinking of hyper-parameters I was thinking of parameters "not part of the model but part of the optimization procedure", while it seems like you're referring to parameters that are part of the model, e.g. the choice of scale for your kernel, but that you have for some reason chosen not to put a prior on?

theogf commented on September 14, 2024

Ah, yeah, so when I was thinking of hyper-parameters I was thinking of parameters "not part of the model but part of the optimization procedure", while it seems like you're referring to parameters that are part of the model, e.g. the choice of scale for your kernel, but that you have for some reason chosen not to put a prior on?

Exactly! And even if I did put a prior I would probably be only interested in the MAP

theogf commented on September 14, 2024

Also, the complicated part is that one would alternate between variational-parameter updates and hyper-parameter updates.
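
Concretely, using the ELBO notation from earlier in the thread (step sizes $$\rho$$ and $$\rho'$$ are illustrative), one round of such alternating updates would be

$$\lambda_{t+1} = \lambda_t + \rho \, \nabla_\lambda \mathrm{ELBO}(\lambda_t, \eta_t), \qquad \eta_{t+1} = \eta_t + \rho' \, \nabla_\eta \mathrm{ELBO}(\lambda_{t+1}, \eta_t).$$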

theogf commented on September 14, 2024

So right now the assumption is that logπ only accepts a sample from q and returns a value.
How about assuming instead logπ(z, args...) and passing these args to the optimize! function?
One could simply add a keyword argument hyperparams to pass an array of hyperparameters, and optionally a different optimizer for them.
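
A rough sketch of what that call site could look like (purely illustrative: `my_logjoint`, `hp0`, and the keyword names are placeholders, and the positional arguments of optimize! are assumed rather than taken from the current code):

# Hypothetical interface sketch, not AdvancedVI's actual API.
logπ(z, hyperparams) = my_logjoint(z, hyperparams)   # log-joint that now also receives extra args

optimize!(elbo, alg, q, logπ, θ;
          hyperparams = hp0,            # proposed keyword: array of hyper-parameters, or nothing
          hyperoptimizer = optimizer)   # proposed keyword: optionally a different optimizer for them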

theogf commented on September 14, 2024

Or make it a nested thing, where hyperlogπ(hyperparams) returns a logπ(z) function. I guess that would make it more compatible with the Turing model structure.

theogf commented on September 14, 2024

Hey, I got a nice working version of it. Here is my proposition.
So far the setup is:

  • Give a $$\log\pi$$ function taking samples $$\theta$$ as arguments

What I propose in order to work with hyperparameters:

  • $$\log\pi$$ is either
    • A function taking samples $$\theta$$ as argument (as above). This case is inferred from the keyword argument hyperparams being set to nothing
    • A function taking an array of hyperparameters as argument, which itself returns another function taking samples as arguments

Here is an example of the second form for a Gaussian process regression problem:

using Distributions, KernelFunctions  # for MvNormal/Normal/logpdf and the kernels

# `X` and `y` are the training inputs/targets, assumed to be in scope;
# hp[1] is the kernel variance, hp[2] the observation-noise std.
function logpi(hp)
  k = hp[1] * SqExponentialKernel()   # scaled squared-exponential kernel
  K = kernelmatrix(k, X)              # GP prior covariance over the inputs
  prior_gp = MvNormal(K)              # zero-mean GP prior
  # Gaussian likelihood of the residuals plus the GP prior, as a function of the latent f
  return f -> sum(logpdf.(Normal(0, hp[2]), f .- y)) + logpdf(prior_gp, f)
end
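
For reference, the closure returned by logpi is exactly what plays the role of the current logπ; a hypothetical usage (the values of hp here are made up) would be:

hp = [1.0, 0.1]    # e.g. kernel variance and observation-noise std
logπ = logpi(hp)   # a plain logπ(f), matching what AdvancedVI currently expects
# run some variational updates against logπ, then take a gradient step in `hp`
# (differentiating the ELBO through logpi) and rebuild logπ for the next round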

Then during training, optimization alternates between variational updates and hyper-parameter updates.
If that sounds reasonable, I can make a PR for it.

torfjelde commented on September 14, 2024

So I've given this and similar issues (e.g. callbacks) a bit of thought.

I definitely like the idea, but I think there can be cases where you might want to, e.g., optimize hyper-parameters several times in a row for a given θ, no?

All in all, I think the goal should instead be to make it easy to write the optimization loop yourself, rather than adding a bunch of functionality to the vi call. The reason is that it's going to be difficult to cover all the wanted use-cases, e.g.:

  • callbacks: what arguments do you pass to the callback?
  • hyper-parameters: how many steps do you take for the hyper-parameters, and how many do you take for the parameters? are they continuous or discrete?

IMO, vi should be a call that can do the simplest of things, and then if you want custom behavior, e.g. optimizing hyper-parameters, you just write your own training loop. Even now it's pretty straightforward, I think:

converged = false
step = 1

prog = ProgressMeter.Progress(num_steps, 1)

θ, hypers = copy(θ_init), copy(hypers_init)

diff_results_θ = DiffResults.GradientResult(θ_init)
diff_results_hypers = DiffResults.GradientResult(hypers_init)

while (step ≤ num_steps) && !converged
    # 1. Compute gradient and objective value.
    getq(θ) = ApproximateDistribution(hypers, θ)
    AdvancedVI.grad!(variational_objective, alg, getq, model, diff_results_θ, kwargs...)

    # 2. Extract gradient from `diff_results_θ`
    ∇ = DiffResults.gradient(diff_results_θ)

    # 3. Apply optimizer, e.g. multiplying by step-size
    Δ = apply!(optimizer, θ, ∇)

    # 4. Update parameters
    @. θ = θ - Δ

    # 5. [OPTIONAL] Update hyperparameters (using gradient computation)
    # (a separate closure name, so it does not clash with `getq` defined above)
    getq_hypers(hypers) = ApproximateDistribution(hypers, θ)
    AdvancedVI.grad!(variational_objective, alg, getq_hypers, model, diff_results_hypers, kwargs...)

    ∇_hypers = DiffResults.gradient(diff_results_hypers)
    Δ_hypers = apply!(optimizer_hypers, hypers, ∇_hypers)
    @. hypers = hypers - Δ_hypers

    # 6. Do whatever analysis you want
    callback(args...)

    # 7. Update
    converged = hasconverged(variational_objective, alg, q, model) # or something user-defined
    step += 1

    ProgressMeter.next!(prog)
end

I think if we make this very clear in the documentation and show simple examples, people should be able to do exactly what they want.

Now, you could argue that we could do both, and that's fair. But 1) I think we first need to nail everything down before we start adding convenience methods on top of vi, and 2) if we do, it's important to ensure we don't make things too "noisy", in the sense that I look at the docstring of vi and see a buuuunch of options, most of which I don't need, and some of which suggest they can do what I want but then it turns out they can't do exactly what I need, etc. All in all, I don't think this is something we need to prioritize right now? 😕

Agree/disagree?

theogf commented on September 14, 2024

Yep, totally agree!
Having independence between variational-parameter and hyper-parameter optimization seems both more flexible and practical!
So I guess the aim would be to create an extra function for hyper-parameter optimization that would call vi inside?

My idea with having logpi() return a function was that it is then extremely similar to the Turing API, where you get your model once you call it on your data/hyperparameters.
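
A minimal sketch of that "extra function that calls vi inside" idea, assuming a vi(logπ, alg, q) method and an opaque hyper_step helper for the hyper-parameter update (both are assumptions, not existing AdvancedVI functions):

# Illustrative only: `vi_ml2`, `hyper_step`, and the `vi(logπ, alg, q)` signature
# are assumptions made for this sketch.
function vi_ml2(hyperlogπ, alg, q0, hp0; outer_iters = 10)
  q, hp = q0, hp0
  for _ in 1:outer_iters
    q  = vi(hyperlogπ(hp), alg, q)      # variational updates at fixed hyper-parameters
    hp = hyper_step(hyperlogπ, q, hp)   # then one (or more) hyper-parameter updates given q
  end
  return q, hp
end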

Red-Portal commented on September 14, 2024

This could be done by cleverly specifying the variational family, once the reboot is deployed. Maybe I'll write an example that does that by then. For the time being, I'll close the issue. Please re-open and remind me if I forget to do this in the future!
