scikit-optimize / scikit-optimize

Sequential model-based optimization with a `scipy.optimize` interface

Home Page: https://scikit-optimize.github.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 94.83%, Shell 4.86%, Makefile 0.31%
Topics: bayesopt, optimization, scientific-computing, machine-learning, hyperparameter, bayesian-optimization, binder, scikit-learn, hyperparameter-optimization, hyperparameter-tuning

scikit-optimize's Issues

Local-search based technique to optimize the acquisition function of trees and friends

We cannot optimize the acquisition function of tree-based models using conventional gradient-based or second-order methods. SMAC does it in the way described on page 13 of http://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf

Some terminology:

  1. Given p parameters and a parameter configuration, a one-exchange neighbour is a configuration that differs in exactly one parameter.
  2. For a continuous parameter (say X), this neighbour is sampled from a Gaussian centered at the current value of X with std 0.2, keeping all other parameters constant.
  3. For a categorical parameter (say Y), this neighbour takes any other category of Y, keeping all other parameters constant.

It seems they do a multi-start local search with 10 starting points. For each local search:

  1. Initialize a random point p.
  2. Check the acquisition values at the "4X + Y" one-exchange neighbours (four Gaussian samples for each continuous parameter, plus the one-exchange neighbours of each categorical parameter).
  3. If none of the neighbours has a lower acquisition value than p, terminate.
    Else reassign p to the neighbour with the minimum acquisition value and repeat.

Then return the best point found across the 10 local searches.
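
A minimal sketch of this multi-start local search, assuming a purely continuous space normalized to [0, 1]; acq stands in for the acquisition function and is anything callable on a 1d point:

import numpy as np

def local_search(acq, x0, n_neighbours=4, sigma=0.2, rng=None):
    # Hill-climb on the acquisition value using one-exchange neighbours:
    # perturb one coordinate at a time, keep the best improving neighbour.
    rng = rng or np.random.RandomState()
    x, best = np.asarray(x0, dtype=float), acq(x0)
    while True:
        neighbours = []
        for i in range(len(x)):
            for _ in range(n_neighbours):
                n = x.copy()
                n[i] = np.clip(rng.normal(x[i], sigma), 0.0, 1.0)
                neighbours.append(n)
        values = [acq(n) for n in neighbours]
        if min(values) >= best:  # no neighbour improves on x: terminate
            return x, best
        best = min(values)
        x = neighbours[int(np.argmin(values))]

def multi_start_local_search(acq, n_dims, n_starts=10, rng=None):
    # Run several independent local searches, return the overall best.
    rng = rng or np.random.RandomState()
    return min((local_search(acq, rng.rand(n_dims), rng=rng)
                for _ in range(n_starts)),
               key=lambda result: result[1])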

Refactor minimize functions to make use of sampling API

Now that #75 has been merged, we should refactor all *_minimize functions in order to make use of the new API.

We may need to make a few internal changes, since sample_points returns values in the original space, while the optimizer needs to be fed the transformed values instead.

I would expect something along the following lines (see the sketch after this list):

  1. Make _check_grid a public util returning the corresponding list of Distribution objects.
  2. sample_grid(grid, n_samples)
  3. warp(grid, samples): from original to warped space
  4. unwarp(grid, samples): from warped to original space
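
A sketch of how these utilities might fit together; the Distribution objects are assumed to expose rvs, transform, and inverse_transform, and all signatures here are assumptions, not a settled design:

import numpy as np

def sample_grid(grid, n_samples, rng=None):
    # Draw n_samples points in the original space, one column per dimension.
    rng = rng or np.random.RandomState()
    return np.column_stack([dist.rvs(n_samples, rng) for dist in grid])

def warp(grid, samples):
    # Map samples from the original space to the warped (transformed) space.
    return np.column_stack([dist.transform(col)
                            for dist, col in zip(grid, np.asarray(samples).T)])

def unwarp(grid, samples):
    # Inverse of warp: map from the warped space back to the original space.
    return np.column_stack([dist.inverse_transform(col)
                            for dist, col in zip(grid, np.asarray(samples).T)])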

Add scikit-learn compatible BayesSearchCV

Hey.
Do you intend to provide a drop-in GridSearchCV replacement, or only the optimizer?
The thing is that it might take a while to get that into scikit-learn, and it would be nice if people had access to it.

Cheers,
Andy
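
For illustration, a drop-in replacement would keep the familiar fit/predict workflow; a sketch of the usage being requested (BayesSearchCV and its search-space tuples are the proposal here, not an existing API at this point):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

from skopt import BayesSearchCV  # hypothetical at the time of this issue

X, y = load_iris(return_X_y=True)
search = BayesSearchCV(
    SVC(),
    # Search space: (low, high, prior) tuples per hyperparameter.
    {"C": (1e-3, 1e3, "log-uniform"), "gamma": (1e-4, 1e1, "log-uniform")},
    n_iter=32, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)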

0.1 release

I would like to get the 0.1 release out before school starts again (i.e. September). This is just a parent issue to track the blockers.

  • Consistent and backward-compatible API. Addressed by #75
  • SMAC #57
  • Local search technique that performs better than random sampling on piecewise constant predict functions (#74); postponed till we have a conclusion in #109
  • Examples (#65)
  • Support for Python 2.7 (#87)
  • Consistent return types #86
  • Name collision #76 (punting for now)
  • Need a logo #107 (code speaks louder than images, no logo required)
  • release mechanics #133
  • better defaults #166
  • merge #145
  • merge #169
  • maybe merge #162 (nice to have but don't hold the 🚄 for it)
  • stop this list from getting ever longer 📋

Is there anything else?

Matern kernel returning a zeroed out covariance matrix

I have been playing around with the code for some time and it doesn't seem to work, at least on the test example below (or works only by chance).

from math import cos, pi

from skopt import gp_minimize

a = 1
b = 5.1 / (4 * pi**2)
c = 5.0 / pi
r = 6
s = 10
t = 1 / (8 * pi)

def branin(x):
    x1 = x[0]
    x2 = x[1]
    return a * (x2 - b * x1**2 + c * x1 - r)**2 + s * (1 - t) * cos(x1) + s

bounds = [[-5, 10], [0, 15]]
res = gp_minimize(
    branin, bounds, search='sampling', maxiter=2, random_state=0,
    acq='UCB')

More specifically this line https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L282 returns a matrix of zeros.

This is because the optimized scale parameter of the Matern kernel is 1e-5, which sets the covariance between all the samples to be zero.

Should we try a different approach, other than scaling the parameters down to [0, 1]?

@glouppe @betatim What are your thoughts on this?
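
One possible mitigation, sketched here rather than proposed as the fix: bound the length-scale optimization so it cannot collapse, using scikit-learn's kernel hyperparameter bounds:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Forbid length scales below 1e-2 so the optimized kernel cannot drive the
# covariance between distinct training points to (numerically) zero.
kernel = Matern(length_scale=1.0, length_scale_bounds=(1e-2, 1e2), nu=2.5)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)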

Tests are slow! (episode II)

The current build takes more than 15 minutes, which is very long given that we don't have that much code yet... We should really try to trim some of the tests.

Explicit is better than implicit

This sounds incredibly formal, bureaucratic, and heavy; try to read to the end before panicking.

I think one of the first things we should do is make sure we are all on the same page on how the project will work. I suggest the following:

  • all changes by PR
  • a PR solves one problem (don't mix problems together in one PR) with the minimal set of changes
  • describe why you are proposing the changes you are proposing
  • try to not rush changes (definition of rush depends on how big your changes are)
  • someone else has to merge your PR
  • new code needs to come with a test
  • no merging if travis is red

I don't see this as rules to be enforced by 🚓 but as guidelines.

I think it is important to briefly write down this kind of "obvious" thing if you want to start a project that is long term (not just a hackathon hack) with people you haven't worked with much. Basically: explicit is better than implicit 😀

API for non-continuous inputs

At the moment, input values are assumed to live within a bounded continuous range. We should think about an API for specifying integer and symbolic values as well, and about the consequences for the algorithms we have implemented so far.
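
For illustration, one possible shape for such an API, with explicit dimension types (the Real/Integer/Categorical names and the skopt.space module are one possible design, not something that exists yet at this point):

from skopt.space import Categorical, Integer, Real  # hypothetical layout

space = [
    Real(1e-6, 1e-1, prior="log-uniform"),   # bounded continuous dimension
    Integer(1, 5),                           # integer-valued dimension
    Categorical(["linear", "rbf", "poly"]),  # symbolic dimension
]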

Project name

If we plan on getting serious with this, we should think of a better project name.

One that I like would be scikit-optimize, abbreviated as skopt.

CC: @MechCoder @betatim

Move GBRT into a `learning` submodule

I propose creating a learning submodule for basically everything that is a modification of an ML algorithm. The wrapper around Gradient Boosting should be moved there.

Implement RF based model selection

The computed variance for each RandomForest is given in section 4.3.2 of http://arxiv.org/pdf/1211.0906v2.pdf. (This will involve wrapping sklearn's DecisionTrees to return the standard deviation of each leaf.)

The ExpectedImprovement makes the same assumption about the predictions being Gaussian, except for a minor modification given in section 5.2 of https://www.cs.ubc.ca/~murphyk/Papers/gecco09.pdf

There is also a change from sklearn's RF implementation in how the split point is computed, described in section 4.3.2 of http://arxiv.org/pdf/1211.0906v2.pdf, but we can try without that modification.
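
A minimal sketch of one way to get a predictive mean and std out of a stock scikit-learn forest, by aggregating predictions across trees (this is the simple across-tree spread, not the full per-leaf variance of section 4.3.2):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predict_with_std(forest, X):
    # One prediction per tree, then moments across the ensemble.
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.randn(100)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
mu, std = predict_with_std(forest, X[:5])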

Expected input/output shape

When exploring #37 and #38, I noticed that we are not very consistent with respect to input/output shapes. We should enforce one and only one way of doing things.

I would suggest the following conventions:

  • func: 1d array-like as input, scalar as output (as in scipy.optimize.minimize)
  • acquisition functions: 2d array-like as input, 1d array as output.

Everything else raises an error.
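
To make the convention concrete, a sketch of the two signatures (the function bodies are placeholders):

import numpy as np

def func(x):
    # x: 1d array-like of shape (n_dims,); returns a scalar.
    x = np.asarray(x)
    return float(np.sum(x ** 2))

def acquisition(X):
    # X: 2d array-like of shape (n_points, n_dims); returns shape (n_points,).
    X = np.atleast_2d(X)
    return -np.sum(X ** 2, axis=1)  # placeholder acquisition values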

GBRT based minimization

GBRT now returns the quantiles. We can get a naive approximation to the std by subtracting the 50th quantile from the 84th quantile (for a Gaussian, the 84th quantile sits one std above the median) and feeding that to the acquisition functions.
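
A sketch of that approximation with stock scikit-learn, fitting one boosted model per quantile (the Gaussian reading of the gap is what makes it naive):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 1)
y = np.sin(6 * X[:, 0]) + 0.1 * rng.randn(200)

# One model per quantile; loss="quantile" makes each model predict alpha.
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q,
                                       random_state=0).fit(X, y)
          for q in (0.50, 0.84)}

median = models[0.50].predict(X[:5])
std = models[0.84].predict(X[:5]) - median  # ~1 sigma if residuals are Gaussian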

Incompatibility with Python 2.7.x?

I noticed that when run with 2.7.11, there is a syntax error:

in space.py
def __init__(self, *categories, prior=None):
SyntaxError: invalid syntax

In Python 2, a regular keyword argument cannot come after the *args argument. Simply reversing these parameters causes other issues in space.py.
This seems to be in accordance with PEP 3102 (keyword-only arguments), which is accepted but Python 3 only.

Do the devs plan on making skopt compatible with 2.7.x?
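
For reference, the standard Python-2-compatible workaround is to accept **kwargs and pop the keyword-only argument by hand; a sketch (not necessarily the fix skopt should adopt):

class Categorical(object):
    # Python 2 has no keyword-only arguments (PEP 3102), so emulate them.
    def __init__(self, *categories, **kwargs):
        prior = kwargs.pop('prior', None)
        if kwargs:
            raise TypeError("unexpected keyword arguments %r" % sorted(kwargs))
        self.categories = categories
        self.prior = prior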

Run examples as part of the CI

Avoid broken examples like what happened in #29 by running them as part of Travis. Not sure if there is anything more useful we can do than to check that they run with exit code == 0.
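
A minimal sketch of such a check in Python (the examples/ path is an assumption about the repo layout):

import glob
import subprocess
import sys

# Fail the build if any example script exits non-zero.
failures = [script for script in sorted(glob.glob("examples/*.py"))
            if subprocess.call([sys.executable, script]) != 0]
if failures:
    sys.exit("broken examples: %s" % ", ".join(failures))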

Diagnostic plots 📈📊📉

We should add some convenience functions that make plots similar to those in the examples for "generic" problems, to help people debug why things aren't converging, or why they are converging to the value they are, etc.

Maybe use something in the style of https://github.com/dfm/corner.py to show N>2 spaces, where the samples are, what the acquisition function looks like, ...

Slow tests

The current three tests take 17 minutes to run on Travis, while the entire sklearn test suite runs in 10 minutes.

Definition of terms related to uncertainty

Some thoughts on "uncertainty". This issue was inspired by @MechCoder's comment in #9. The first part of this issue tries to correctly define various terms that often get used interchangeably and are easy to confuse (I confidently predict that I will make at least one error in this post). Once we have defined the terms, we can decide which of them we need in order to evaluate various acquisition functions.

Standard deviation (\sigma): this is the square root of the variance. Can be calculated for any sample no matter what distribution the samples come from.

Standard error (of the mean): \sigma / \sqrt{N}, a measure of the uncertainty associated with the estimated value of the mean.

Confidence interval (CI): the N% confidence interval will contain the measured value N% of the time. Alice wants to estimate the value of a parameter t, so she constructs an estimator \hat{t} as well as a CI. The 68% CI (around \hat{t}) will contain the true value t in 68% of experiments (that is, if we clone Alice and repeat what she did many times).

N% quantile: the N% quantile starts at negative infinity and goes up to a point x; think of it as the integral of the p.d.f. between -\infty and x, which equals N%.

If \hat{t} is distributed according to a normal distribution, then the 68% CI is [\hat{t} - \sigma, \hat{t} + \sigma].

For a normal distribution, \mu - \sigma is the 16% quantile.
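
For reference, the standard normal-distribution relations connecting these terms (nothing new here, just the bookkeeping made explicit):

q_{0.16} \approx \mu - \sigma, \qquad q_{0.50} = \mu, \qquad q_{0.84} \approx \mu + \sigma,
\quad\text{so}\quad \sigma \approx \tfrac{1}{2}\,(q_{0.84} - q_{0.16}).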

For our purposes we have a surrogate model (a GP or what have you) for the true, expensive function f. At a given point x, our best estimate of the true value of f is the mean \mu(x) of our surrogate model.


Now my understanding runs out -> need help.

What is the band we get from a GP and then feed into EI and friends? Is it the "standard error on the mean" or "68% confidence interval" or "68% credible interval" or something else?

Support for categorical parameters seems to be broken

import numpy as np

def bench1(x):
    return np.asscalar(np.asarray(x))

def bench2(x):
    return np.asscalar(np.asarray(x, dtype=np.int))

>>> bench1([1])
1
>>> bench2(["1"])
1

from skopt import forest_minimize, gp_minimize
# Works
forest_minimize(bench1, ((1.0, 4.0),))
# Fails
forest_minimize(bench2, (("1", "2", "3", "4"),))

# Works
gp_minimize(bench1, ((1.0, 4.0),), maxiter=5)
# Fails
gp_minimize(bench2, (("1", "2", "3", "4"),))

ExtraTrees returns NaN for std

Running yield (check_minimize, minimizer, bench1, 0., [(-2.0, 2.0)], 0.05, 75) with et_minimize as the minimizer produces

scikit-optimize/skopt/acquisition.py:165: RuntimeWarning: invalid value encountered in greater
  mask = std > 0

and std is:

(Pdb) print(std)
[  0.00000000e+00   3.44874701e-01   4.35236492e-01              nan
   5.35666028e-01   3.76289149e-01   0.00000000e+00   3.44874701e-01
   3.03596891e-01   2.84929167e-01              nan              nan
   1.11601649e-01   3.44874701e-01              nan              nan
   2.98023224e-08   6.69582536e-01   1.68631973e-01              nan]

Doc: generate examples gallery

It would be nice to generate the examples upon deployment to build a nice gallery. This would require some changes to ci_scripts/deploy.sh and to the templates, but nothing impossible.

Ask-and-tell interface?

Hi, I just discovered this project. I wonder whether the goal is really to provide only a scipy-like interface, or whether you think it would be possible to provide an ask-and-tell interface too. That would be much more convenient for use cases in which the optimization process is actually controlled by the objective function.
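
For illustration, the kind of loop being asked for might look like this (Optimizer, ask, and tell are names for the proposed interface, not an existing skopt API at this point):

from skopt import Optimizer  # hypothetical at the time of this issue

opt = Optimizer([(-2.0, 2.0)])

best = float("inf")
for _ in range(20):
    x = opt.ask()            # the user decides when to evaluate
    y = (x[0] - 0.3) ** 2    # objective evaluated outside the library
    opt.tell(x, y)           # feed the observation back
    best = min(best, y)

print(best)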

Extend test suite

Before implementing any more things, we should really extend the test suite with more thorough tests. At the moment, I can't even minimize a 1D parabola with the default parameters of gp_minimize...

(and I don't even understand why it fails... so many things to adjust :/)

We might want to look at other packages for good defaults.

Add random search

For API checks and baseline purposes, I think it would be nice to have a dummy random search method.
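
A minimal sketch of such a baseline, assuming box bounds and uniform sampling (the dummy_minimize name and signature are illustrative, not an existing function):

import numpy as np

def dummy_minimize(func, bounds, n_calls=100, random_state=None):
    # Sample uniformly at random within the bounds and keep the best point.
    rng = np.random.RandomState(random_state)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_calls, len(bounds)))
    y = np.array([func(x) for x in X])
    best = int(np.argmin(y))
    return X[best], y[best]

x_best, y_best = dummy_minimize(lambda x: (x[0] - 0.3) ** 2, [(-2.0, 2.0)])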
