scikit-optimize / scikit-optimize
Sequential model-based optimization with a `scipy.optimize` interface
Home Page: https://scikit-optimize.github.io
License: BSD 3-Clause "New" or "Revised" License
We cannot optimize the acquisition function using conventional gradient-based / second-order methods. SMAC does it in the way described on page 13 of http://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf
Some terminology:

- Given p parameters and a parameter configuration, a one-exchange neighbourhood is the set of configurations that differ from it in exactly one parameter.
- For a parameter X that is continuous, a neighbour is sampled from a Gaussian centered at X with std 0.2, keeping all other parameters constant.
- For a parameter Y that is categorical, a neighbour is any other category, keeping all other parameters constant.

Seems like they do a multi-start local search with 10 points. For each local search, starting from a point p:

- Compute the acquisition values of the one-exchange neighbours of p.
- If p has a lower acquisition value than all of its neighbours, terminate.
- Otherwise, move p to the neighbour with the minimum acquisition value and repeat.

Then return the minimum of all the 10 local searches.
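A minimal sketch of this multi-start one-exchange local search, assuming continuous dimensions are given as (low, high) tuples and categorical ones as lists, with the acquisition function left abstract:

import random

def one_exchange_neighbour(x, space):
    # Copy x and change exactly one parameter.
    x = list(x)
    i = random.randrange(len(x))
    dim = space[i]
    if isinstance(dim, tuple):  # continuous dimension: (low, high)
        low, high = dim
        # Gaussian step with std 0.2, clipped to the bounds.
        x[i] = min(max(random.gauss(x[i], 0.2), low), high)
    else:  # categorical dimension: list of categories
        x[i] = random.choice([c for c in dim if c != x[i]])
    return x

def local_search(acquisition, x, space, n_neighbours=10, max_steps=100):
    best = acquisition(x)
    for _ in range(max_steps):
        neighbours = [one_exchange_neighbour(x, space) for _ in range(n_neighbours)]
        values = [acquisition(n) for n in neighbours]
        if min(values) >= best:  # current point beats all neighbours: terminate
            break
        best = min(values)
        x = neighbours[values.index(best)]
    return x, best

def multi_start_search(acquisition, starts, space):
    # Run one local search per starting point and return the overall minimum.
    return min((local_search(acquisition, s, space) for s in starts),
               key=lambda result: result[1])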
Based on: #34 (comment)
We should aim to converge on a unified interface where possible. It is too much work to duplicate all the acquisition functions etc.
The reason I scaled all the parameters to between 0 and 1 was to make the GP fitting invariant to the scale of the parameters.
I am sure that using an anisotropic kernel, as is done now, makes it invariant to the different scales of the parameters, but it might be worth investigating.
Now that #75 has been merged, we should refactor all *_minimize functions to make use of the new API.
We may need to make a few internal changes, since sample_points returns values in the original space, while we need to feed the transformed values to the optimizer instead.
I would expect something along the following lines:

- _check_grid: a public util returning the corresponding list of Distribution objects.
- sample_grid(grid, n_samples)
- warp(grid, samples): from original to warped space
- unwarp(grid, samples): from warped to original space

Both Space.rvs(X) and Space.inverse_transform(X) return arrays of object dtype. Are we okay with that?
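As a rough illustration only (the warp and unwarp names come from the list above; restricting to purely continuous (low, high) bounds is my simplification):

import numpy as np

def warp(grid, samples):
    # Map samples from the original space to [0, 1] per dimension.
    # Categorical dimensions would need their own transformer.
    lows = np.array([low for low, high in grid], dtype=float)
    highs = np.array([high for low, high in grid], dtype=float)
    return (np.asarray(samples, dtype=float) - lows) / (highs - lows)

def unwarp(grid, samples):
    # Inverse of warp: map samples from [0, 1] back to the original space.
    lows = np.array([low for low, high in grid], dtype=float)
    highs = np.array([high for low, high in grid], dtype=float)
    return lows + np.asarray(samples, dtype=float) * (highs - lows)

# Usage: warp([(-5, 10), (0, 15)], [[2.5, 7.5]]) -> [[0.5, 0.5]]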
Hey.
Do you intend to provide a GridSearchCV drop-in replacement or only the optimizer?
The thing is that it might take a while to get that into scikit-learn, and it would be nice if people had access to it.
Cheers,
Andy
I would like to get the 0.1 release out before school starts again (i.e. September). This is just a parent issue to track the blockers.
Is there anything else?
I have been playing around with the code for some time and it doesn't seem to work, at least for the test example (or only seems to by chance):
from math import cos, pi

from skopt import gp_minimize

# Branin test function
a = 1
b = 5.1 / (4 * pi ** 2)
c = 5.0 / pi
r = 6
s = 10
t = 1 / (8 * pi)

def branin(x):
    x1 = x[0]
    x2 = x[1]
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * cos(x1) + s

bounds = [[-5, 10], [0, 15]]
res = gp_minimize(branin, bounds, search='sampling', maxiter=2,
                  random_state=0, acq='UCB')
More specifically, this line https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L282 returns a matrix of zeros.
This is because the optimized length-scale parameter of the Matern kernel is 1e-5, which drives the covariance between all distinct samples to (numerically) zero.
Should we try a different approach other than scaling the parameters down to 0 and 1?
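One possible mitigation, as an assumption on my part rather than a decision from this thread, is to bound the length scale so the marginal-likelihood optimizer cannot collapse it:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Bound the length scale away from degenerate values so the optimizer
# cannot drive it down to ~1e-5 during hyperparameter fitting.
kernel = Matern(length_scale=1.0, length_scale_bounds=(1e-2, 1e2), nu=2.5)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)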
Should be trivial
The current build takes more than 15 minutes, which is very long given that we don't have that much code yet... We should really try to trim some of the tests.
This sounds incredibly formal, bureaucratic and heavy; try to read to the end before panicking.
I think one of the first things we should do is make sure we are all on the same page on how the project will work. I suggest the following:
I don't see these as rules to be enforced but as guidelines.
I think it is important to briefly write down these kinds of "obvious" things if you want to start a long-term project (not just a hackathon hack) with people you haven't worked with much. Basically: explicit is better than implicit.
What is a good stopping criterion for blackbox optimization?
To avoid cluttering the examples with tons of code to produce nice plots, I think it would make sense to provide a few plot functions that take an OptimizeResult object directly.
See e.g. the plot_acquisition and plot_convergence functions in GPyOpt (at the end of http://diana-hep.org/carl/notebooks/Parameterized%20inference%20with%20nuisance%20parameters.html).
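A minimal sketch of what a plot_convergence helper could look like, assuming the OptimizeResult stores the observed objective values in a func_vals attribute (an assumption here):

import matplotlib.pyplot as plt
import numpy as np

def plot_convergence(res):
    # Plot the running minimum of the observed objective values.
    mins = np.minimum.accumulate(res.func_vals)
    plt.plot(range(1, len(mins) + 1), mins, marker="o")
    plt.xlabel("Number of calls")
    plt.ylabel("Minimum objective value so far")
    plt.title("Convergence plot")
    plt.show()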
At the moment, input values are assumed to live within a bounded continuous range. We should think about an API on how to specify integer and symbolic values as well, and what would be the consequences for the algorithms we implemented so far.
If we plan on getting serious with this, we should think of a better project name. One that I like would be scikit-optimize, abbreviated as skopt.
CC: @MechCoder @betatim
I propose creating a learning submodule for basically everything that is a modification of an ML algorithm. The wrapper around Gradient Boosting should be moved there.
The computed variance for each RandomForest is given in section 4.3.2 of http://arxiv.org/pdf/1211.0906v2.pdf (this will involve wrapping sklearn's DecisionTrees to return the standard deviation of each leaf).
The ExpectedImprovement makes the same assumption about the predictions being Gaussian, except for a minor modification given in Section 5.2 of https://www.cs.ubc.ca/~murphyk/Papers/gecco09.pdf.
There is also a change from sklearn's RF implementation in computing the split point, described in 4.3.2 of http://arxiv.org/pdf/1211.0906v2.pdf, but we can try without that modification.
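Pending the per-leaf standard deviations described in the paper, a naive stand-in (my simplification, not the paper's method) is the spread of the individual tree predictions:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_predict_with_std(forest, X):
    # Mean and std of the per-tree predictions of a fitted forest.
    preds = np.array([tree.predict(X) for tree in forest.estimators_])
    return preds.mean(axis=0), preds.std(axis=0)

# Usage on a toy 1D problem
rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(50, 1))
y = X.ravel() ** 2
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
mean, std = forest_predict_with_std(forest, X[:5])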
When exploring #37 and #38, I noticed that we are not very consistent with respect to the input/output shape. We should enforce one and only one way to do things.
I would suggest the following conventions:
- func: 1d array-like as input, scalar as output (as in scipy.minimize).

Everything else raises an error.
GBRT now returns the quantiles. We can get a naive approximation to the std by subtracting the 50th quantile from the 84th quantile and feeding it to the acquisition functions.
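For a Gaussian, mean + 1 std sits at roughly the 84.1% quantile, so the 84th-minus-50th quantile difference approximates the std. A sketch with plain sklearn quantile GBRTs (the model names here are illustrative):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

# One model per quantile; 0.841 is where mean + 1 std falls for a Gaussian.
q50 = GradientBoostingRegressor(loss="quantile", alpha=0.5).fit(X, y)
q84 = GradientBoostingRegressor(loss="quantile", alpha=0.841).fit(X, y)

std_approx = q84.predict(X[:5]) - q50.predict(X[:5])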
I noticed that when run with 2.7.11, there is a syntax error in space.py:

def __init__(self, *categories, prior=None):
SyntaxError: invalid syntax

A regular keyword argument cannot come after the * argument in Python 2, and simply reversing these parameters causes other issues in space.py. The Python 3 syntax is in accordance with this accepted Python enhancement proposal.
Do the devs plan on making skopt compatible with 2.7.x?
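One possible Python 2 compatible workaround (a sketch; the Categorical class name is a stand-in for whatever class this is in space.py) is to emulate the keyword-only argument with **kwargs:

class Categorical(object):
    def __init__(self, *categories, **kwargs):
        # Emulate the Python 3 keyword-only `prior=None` argument.
        prior = kwargs.pop("prior", None)
        if kwargs:
            raise TypeError("Unexpected keyword arguments: %r" % sorted(kwargs))
        self.categories = categories
        self.prior = prior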
Avoid broken examples like what happened in #29 by running them as part of Travis. Not sure if there is anything more useful we can do than to check that they run with exit code == 0.
This repo needs a license. BSD three clauses?
The other kind of three comma club
For now, we could move in the two benchmark functions defined in the tests of gp_opt.py.
We should add some convenience functions that make plots similar to what is in the examples for "generic" problems, to help people debug why things aren't converging, or why they are converging to the value they are, etc.
Maybe use something in the style of https://github.com/dfm/corner.py to show N>2 spaces, where the samples are, what the acquisition function looks like, ...
I'm not sure if the name is great, because a scikits.optimization already exists...
Currently, GradientBoostingQuantileRegressor.predict concatenates predictions vertically. I think this is a bug, isn't it?
The current three tests take 17 mins to run on Travis, while the entire sklearn test suite runs in 10 mins.
Some thoughts on "uncertainty". This issue was inspired by @MechCoder's comment in #9. The first part of this issue tries to correctly define various terms that often get used interchangeably and are easy to confuse (I confidently predict that I will make at least one error in this post). Once we have defined the terms, we can decide which of them we need in order to evaluate various acquisition functions.
Standard deviation (\sigma): the square root of the variance. It can be calculated for any sample, no matter what distribution the samples come from.

Standard error (of the mean): \sigma / \sqrt{N}, a measure of the uncertainty associated with the estimated value of the mean.

Confidence interval (CI): the N% confidence interval will contain the measured value N% of the time. Alice wants to estimate the value of a parameter t, so she constructs an estimator \hat{t} as well as a CI. The 68% CI (around \hat{t}) will contain the true value t in 68% of experiments (that is, if we clone Alice and repeat what she did many times).

N% quantile: the N% quantile starts at negative infinity and goes up to a point x; think of it as the integral of the p.d.f. between -\infty and x, which equals N%.

If \hat{t} is distributed according to a normal distribution, then the 68% CI is [\hat{t} - \sigma, \hat{t} + \sigma]. For a normal distribution, \mu - \sigma is the 16% quantile.

For our purposes we have a surrogate model (a GP or what have you) for the true, expensive function f. At a given point x, our best estimate of the true value of f is the mean \mu(x) of our surrogate model.
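A quick numerical sanity check of the quantile/CI statements above, using scipy only:

from scipy.stats import norm

# For a standard normal, mu - sigma and mu + sigma sit at the
# ~15.9% and ~84.1% quantiles, so q(84.1%) - q(50%) = sigma.
print(norm.cdf(-1.0))                    # ~0.159
print(norm.cdf(1.0))                     # ~0.841
print(norm.ppf(0.841) - norm.ppf(0.5))   # ~1.0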
Now my understanding runs out -> need help.
What is the band we get from a GP and then feed into EI and friends? Is it the "standard error on the mean" or "68% confidence interval" or "68% credible interval" or something else?
Add a Gitter room.
Right now we support lbfgs and random sampling. What are some other methods to optimize the acquisition function?
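For reference, a common way to implement the lbfgs option is multi-restart L-BFGS-B over the acquisition surface; a self-contained sketch with a stand-in acquisition function:

import numpy as np
from scipy.optimize import minimize

def acquisition(x):
    # Stand-in for EI/UCB/etc., just to make the sketch runnable.
    x = np.atleast_1d(x)
    return float(np.sum(x ** 2) + np.sin(5 * x).sum())

bounds = [(-2.0, 2.0)]
rng = np.random.RandomState(0)

# Multi-restart L-BFGS-B: start from several random points, keep the best.
starts = rng.uniform(-2, 2, size=(10, 1))
results = [minimize(acquisition, x0, method="L-BFGS-B", bounds=bounds)
           for x0 in starts]
best = min(results, key=lambda r: r.fun)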
As observed in https://github.com/MechCoder/scikit-optimize/pull/14, the approximated objective when using EI is really weird. What is the issue?
Section 4.1.2 in http://arxiv.org/pdf/1211.0906v2.pdf
import numpy as np
from skopt import forest_minimize, gp_minimize

def bench1(x):
    return np.asscalar(np.asarray(x))

def bench2(x):
    return np.asscalar(np.asarray(x, dtype=np.int))

bench1([1])    # returns 1
bench2(["1"])  # returns 1

# Works
forest_minimize(bench1, ((1.0, 4.0),))
# Fails
forest_minimize(bench2, (("1", "2", "3", "4"),))
# Works
gp_minimize(bench1, ((1.0, 4.0),), maxiter=5)
# Fails
gp_minimize(bench2, (("1", "2", "3", "4"),))
relevant: https://github.com/automl/RoBO
According to the chat with @glouppe, in https://github.com/MechCoder/scikit-optimize/pull/38 the length scale of the kernel has been fixed to 1.0. We should either ...
yield (check_minimize, minimizer, bench1, 0., [(-2.0, 2.0)], 0.05, 75)

with et_minimize produces

scikit-optimize/skopt/acquisition.py:165: RuntimeWarning: invalid value encountered in greater
  mask = std > 0

and std is:

(Pdb) print(std)
[ 0.00000000e+00 3.44874701e-01 4.35236492e-01 nan
5.35666028e-01 3.76289149e-01 0.00000000e+00 3.44874701e-01
3.03596891e-01 2.84929167e-01 nan nan
1.11601649e-01 3.44874701e-01 nan nan
2.98023224e-08 6.69582536e-01 1.68631973e-01 nan]
Would be nice to generate examples upon deployment to build a nice gallery. This would require some changes to ci_scripts/deploy.sh and to the templates, but nothing impossible.
Hi, I just discovered this project. I wonder whether the goal is really to provide only a scipy-like interface, or whether you think it would be possible to provide an ask-and-tell interface too. That would be much more convenient for use cases in which the optimization process is actually controlled by the objective function.
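A toy sketch of the ask-and-tell pattern (the Optimizer class below is made up for illustration; it samples at random where a real implementation would optimize an acquisition function):

import random

class Optimizer(object):
    # Toy ask-and-tell optimizer to illustrate the control flow.
    def __init__(self, bounds):
        self.bounds = bounds
        self.Xi, self.yi = [], []

    def ask(self):
        # A real implementation would propose the acquisition optimum here.
        return [random.uniform(low, high) for low, high in self.bounds]

    def tell(self, x, y):
        self.Xi.append(x)
        self.yi.append(y)

# The caller controls the loop instead of handing over a callback:
opt = Optimizer([(-2.0, 2.0)])
for _ in range(20):
    x = opt.ask()
    y = (x[0] - 0.5) ** 2  # stand-in for the expensive objective
    opt.tell(x, y)
best = min(zip(opt.Xi, opt.yi), key=lambda xy: xy[1])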
Collecting benchmarks:
Before implementing any more things, we should really extend the test suite with more thorough tests. At the moment, I can't even minimize a 1D parabola with the default parameters of gp_minimize...
(and I don't even understand why it fails... so many things to adjust :/)
We might want to look at other packages for good defaults.
Are the examples somewhere on the website? I can't seem to find them.
For API checks and baseline purposes, I think it would be nice to have a dummy random search method.
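A minimal sketch of such a baseline (the dummy_minimize name and signature are made up here):

import numpy as np

def dummy_minimize(func, bounds, n_calls=100, random_state=None):
    # Baseline: evaluate func at uniformly random points, return the best.
    rng = np.random.RandomState(random_state)
    lows = np.array([low for low, high in bounds], dtype=float)
    highs = np.array([high for low, high in bounds], dtype=float)
    X = rng.uniform(lows, highs, size=(n_calls, len(bounds)))
    y = np.array([func(x) for x in X])
    best = np.argmin(y)
    return X[best], y[best]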
To consolidate the package, we should generate a static version of the documentation.
I recently found https://github.com/BurntSushi/pdoc which seems to be quite nice and easy for that purpose.
Would love to have everything to reproduce fig 7 from http://arxiv.org/pdf/1012.2599v1.pdf (and some of the other figures?)
This would also serve as a way to check the correctness of our implementation (for which I currently have doubts regarding EI, as reported in #17)