scikit-optimize / scikit-optimize
Sequential model-based optimization with a `scipy.optimize` interface
Home Page: https://scikit-optimize.github.io
License: BSD 3-Clause "New" or "Revised" License
We cannot optimize the acquisition function using conventional gradient-based / second-order methods. SMAC does it in the way described on page 13 of http://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf
Some terminology:

- Given p parameters and a parameter configuration, a one-exchange neighbourhood is the set of configurations that differ from it in exactly one parameter.
- For a parameter X that is continuous, a neighbour is sampled from a Gaussian centered at X with std 0.2, keeping all other parameters constant.
- For a parameter Y that is categorical, a neighbour is any other category, keeping all other parameters constant.

Seems like they do a multi-start local search with 10 points. For each local search, starting from a point p:

- Compute the acquisition values of the one-exchange neighbours of p.
- If p has a lower acquisition value than all of its neighbours, terminate.
- Otherwise, move p to the neighbour with the minimum acquisition value and repeat.

Then return the minimum of all the 10 local searches.
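A minimal sketch of this multi-start one-exchange local search, assuming continuous dimensions are given as (low, high) tuples and categorical ones as lists, with the acquisition function left abstract:

import random

def one_exchange_neighbour(x, space):
    # Copy x and change exactly one parameter.
    x = list(x)
    i = random.randrange(len(x))
    dim = space[i]
    if isinstance(dim, tuple):  # continuous dimension: (low, high)
        low, high = dim
        # Gaussian step with std 0.2, clipped to the bounds.
        x[i] = min(max(random.gauss(x[i], 0.2), low), high)
    else:  # categorical dimension: list of categories
        x[i] = random.choice([c for c in dim if c != x[i]])
    return x

def local_search(acquisition, x, space, n_neighbours=10, max_steps=100):
    best = acquisition(x)
    for _ in range(max_steps):
        neighbours = [one_exchange_neighbour(x, space) for _ in range(n_neighbours)]
        values = [acquisition(n) for n in neighbours]
        if min(values) >= best:  # current point beats all neighbours: terminate
            break
        best = min(values)
        x = neighbours[values.index(best)]
    return x, best

def multi_start_search(acquisition, starts, space):
    # Run one local search per starting point and return the overall minimum.
    return min((local_search(acquisition, s, space) for s in starts),
               key=lambda result: result[1])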
Based on: #34 (comment)
We should aim to converge on a unified interface where possible. It is too much work to duplicate all the acquisition functions etc.
The reason I scaled all the parameters to between 0 and 1 was to make the GP fitting invariant to the scale of the parameters.
I am sure that using an anisotropic kernel, as is done now, makes it invariant to the different scales of the parameters, but it might be worth investigating.
Now that #75 has been merged, we should refactor all *_minimize functions to make use of the new API.
We may need to make a few internal changes, since sample_points returns values in the original space, while we need to feed the transformed values to the optimizer instead.
I would expect something along the following lines:

- _check_grid: a public util returning the corresponding list of Distribution objects.
- sample_grid(grid, n_samples)
- warp(grid, samples): from original to warped space
- unwarp(grid, samples): from warped to original space

Both Space.rvs(X) and Space.inverse_transform(X) return arrays of object dtype. Are we okay with that?
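As a rough illustration only (the warp and unwarp names come from the list above; restricting to purely continuous (low, high) bounds is my simplification):

import numpy as np

def warp(grid, samples):
    # Map samples from the original space to [0, 1] per dimension.
    # Categorical dimensions would need their own transformer.
    lows = np.array([low for low, high in grid], dtype=float)
    highs = np.array([high for low, high in grid], dtype=float)
    return (np.asarray(samples, dtype=float) - lows) / (highs - lows)

def unwarp(grid, samples):
    # Inverse of warp: map samples from [0, 1] back to the original space.
    lows = np.array([low for low, high in grid], dtype=float)
    highs = np.array([high for low, high in grid], dtype=float)
    return lows + np.asarray(samples, dtype=float) * (highs - lows)

# Usage: warp([(-5, 10), (0, 15)], [[2.5, 7.5]]) -> [[0.5, 0.5]]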
Hey.
Do you intend to provide a GridSearchCV drop-in replacement or only the optimizer?
The thing is that it might take a while to get that into scikit-learn, and it would be nice if people had access to it.
Cheers,
Andy
I would like to get the 0.1 release out before school starts again (i.e. September). This is just a parent issue to track the blockers.
Is there anything else?
I have been playing around with the code for some time and it doesn't seem to work, at least for the test example (or only seems to by chance):
from math import cos, pi

from skopt import gp_minimize

# Branin test function
a = 1
b = 5.1 / (4 * pi ** 2)
c = 5.0 / pi
r = 6
s = 10
t = 1 / (8 * pi)

def branin(x):
    x1 = x[0]
    x2 = x[1]
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * cos(x1) + s

bounds = [[-5, 10], [0, 15]]
res = gp_minimize(branin, bounds, search='sampling', maxiter=2,
                  random_state=0, acq='UCB')
More specifically, this line https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L282 returns a matrix of zeros.
This is because the optimized length-scale parameter of the Matern kernel is 1e-5, which drives the covariance between all distinct samples to (numerically) zero.
Should we try a different approach other than scaling the parameters down to 0 and 1?
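One possible mitigation, as an assumption on my part rather than a decision from this thread, is to bound the length scale so the marginal-likelihood optimizer cannot collapse it:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Bound the length scale away from degenerate values so the optimizer
# cannot drive it down to ~1e-5 during hyperparameter fitting.
kernel = Matern(length_scale=1.0, length_scale_bounds=(1e-2, 1e2), nu=2.5)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)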
Should be trivial
The current build takes more than 15 minutes, which is very long given that we don't have that much code yet... We should really try to trim some of the tests.
This sounds incredibly formal, bureaucratic and heavy; try to read to the end before panicking.
I think one of the first things we should do is make sure we are all on the same page on how the project will work. I suggest the following:
I don't see these as rules to be enforced but as guidelines.
I think it is important to briefly write down these kinds of "obvious" things if you want to start a long-term project (not just a hackathon hack) with people you haven't worked with much. Basically: explicit is better than implicit.
What is a good stopping criterion for blackbox optimization?
To avoid cluttering the examples with tons of code to produce nice plots, I think it would make sense to provide a few plot functions that take an OptimizeResult object directly.
See e.g. the plot_acquisition and plot_convergence functions in GPyOpt (at the end of http://diana-hep.org/carl/notebooks/Parameterized%20inference%20with%20nuisance%20parameters.html).
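A minimal sketch of what a plot_convergence helper could look like, assuming the OptimizeResult stores the observed objective values in a func_vals attribute (an assumption here):

import matplotlib.pyplot as plt
import numpy as np

def plot_convergence(res):
    # Plot the running minimum of the observed objective values.
    mins = np.minimum.accumulate(res.func_vals)
    plt.plot(range(1, len(mins) + 1), mins, marker="o")
    plt.xlabel("Number of calls")
    plt.ylabel("Minimum objective value so far")
    plt.title("Convergence plot")
    plt.show()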
At the moment, input values are assumed to live within a bounded continuous range. We should think about an API on how to specify integer and symbolic values as well, and what would be the consequences for the algorithms we implemented so far.
If we plan on getting serious with this, we should think of a better project name. One that I like would be scikit-optimize, abbreviated as skopt.
CC: @MechCoder @betatim
I propose creating a learning submodule for basically everything that is a modification of an ML algorithm. The wrapper around Gradient Boosting should be moved there.
The computed variance for each RandomForest is given in section 4.3.2 of http://arxiv.org/pdf/1211.0906v2.pdf (this will involve wrapping sklearn's DecisionTrees to return the standard deviation of each leaf).
The ExpectedImprovement makes the same assumption about the predictions being Gaussian, except for a minor modification given in Section 5.2 of https://www.cs.ubc.ca/~murphyk/Papers/gecco09.pdf.
There is also a change from sklearn's RF implementation in computing the split point, described in 4.3.2 of http://arxiv.org/pdf/1211.0906v2.pdf, but we can try without that modification.
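Pending the per-leaf standard deviations described in the paper, a naive stand-in (my simplification, not the paper's method) is the spread of the individual tree predictions:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_predict_with_std(forest, X):
    # Mean and std of the per-tree predictions of a fitted forest.
    preds = np.array([tree.predict(X) for tree in forest.estimators_])
    return preds.mean(axis=0), preds.std(axis=0)

# Usage on a toy 1D problem
rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(50, 1))
y = X.ravel() ** 2
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
mean, std = forest_predict_with_std(forest, X[:5])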
When exploring #37 and #38, I noticed that we are not very consistent with respect to the input/output shape. We should enforce one and only one way to do things.
I would suggest the following conventions:
- func: 1d array-like as input, scalar as output (as in scipy.minimize).

Everything else raises an error.
GBRT now returns the quantiles. We can get a naive approximation to the std by subtracting the 50th quantile from the 84th quantile and feeding it to the acquisition functions.
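For a Gaussian, mean + 1 std sits at roughly the 84.1% quantile, so the 84th-minus-50th quantile difference approximates the std. A sketch with plain sklearn quantile GBRTs (the model names here are illustrative):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

# One model per quantile; 0.841 is where mean + 1 std falls for a Gaussian.
q50 = GradientBoostingRegressor(loss="quantile", alpha=0.5).fit(X, y)
q84 = GradientBoostingRegressor(loss="quantile", alpha=0.841).fit(X, y)

std_approx = q84.predict(X[:5]) - q50.predict(X[:5])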
I noticed that when run with 2.7.11, there is a syntax error in space.py:

def __init__(self, *categories, prior=None):
SyntaxError: invalid syntax

A regular keyword argument cannot come after the * argument in Python 2, and simply reversing these parameters causes other issues in space.py. The Python 3 syntax is in accordance with this accepted Python enhancement proposal.
Do the devs plan on making skopt compatible with 2.7.x?
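One possible Python 2 compatible workaround (a sketch; the Categorical class name is a stand-in for whatever class this is in space.py) is to emulate the keyword-only argument with **kwargs:

class Categorical(object):
    def __init__(self, *categories, **kwargs):
        # Emulate the Python 3 keyword-only `prior=None` argument.
        prior = kwargs.pop("prior", None)
        if kwargs:
            raise TypeError("Unexpected keyword arguments: %r" % sorted(kwargs))
        self.categories = categories
        self.prior = prior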
Avoid broken examples like what happened in #29 by running them as part of Travis. Not sure if there is anything more useful we can do than to check that they run with exit code == 0.
This repo needs a license. BSD three clauses?
The other kind of three comma club
For now, we could move in the two benchmark functions defined in the tests of gp_opt.py.
We should add some convenience functions that make plots similar to what is in the examples for "generic" problems, to help people debug why things aren't converging, or why they are converging to the value they are, etc.
Maybe use something in the style of https://github.com/dfm/corner.py to show N>2 spaces, where the samples are, what the acquisition function looks like, ...
I'm not sure if the name is great, because a scikits.optimization already exists...
Currently, GradientBoostingQuantileRegressor.predict concatenates predictions vertically. I think this is a bug, isn't it?
The current three tests take 17 mins to run on Travis, while the entire sklearn test suite runs in 10 mins.
Some thoughts on "uncertainty". This issue was inspired by @MechCoder's comment in #9. The first part of this issue tries to correctly define various terms that often get used interchangeably and are easy to confuse (I confidently predict that I will make at least one error in this post). Once we have defined the terms, we can decide which of them we need in order to evaluate various acquisition functions.
Standard deviation (\sigma): the square root of the variance. It can be calculated for any sample, no matter what distribution the samples come from.

Standard error (of the mean): \sigma / \sqrt{N}, a measure of the uncertainty associated with the estimated value of the mean.

Confidence interval (CI): the N% confidence interval will contain the measured value N% of the time. Alice wants to estimate the value of a parameter t, so she constructs an estimator \hat{t} as well as a CI. The 68% CI (around \hat{t}) will contain the true value t in 68% of experiments (that is, if we clone Alice and repeat what she did many times).

N% quantile: the N% quantile starts at negative infinity and goes up to a point x; think of it as the integral of the p.d.f. between -\infty and x, which equals N%.

If \hat{t} is distributed according to a normal distribution, then the 68% CI is [\hat{t} - \sigma, \hat{t} + \sigma]. For a normal distribution, \mu - \sigma is the 16% quantile.

For our purposes we have a surrogate model (a GP or what have you) for the true, expensive function f. At a given point x, our best estimate of the true value of f is the mean \mu(x) of our surrogate model.
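A quick numerical sanity check of the quantile/CI statements above, using scipy only:

from scipy.stats import norm

# For a standard normal, mu - sigma and mu + sigma sit at the
# ~15.9% and ~84.1% quantiles, so q(84.1%) - q(50%) = sigma.
print(norm.cdf(-1.0))                    # ~0.159
print(norm.cdf(1.0))                     # ~0.841
print(norm.ppf(0.841) - norm.ppf(0.5))   # ~1.0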
Now my understanding runs out -> need help.
What is the band we get from a GP and then feed into EI and friends? Is it the "standard error on the mean" or "68% confidence interval" or "68% credible interval" or something else?
Add a Gitter room.
Right now we support lbfgs and random sampling. What are some other methods to optimize the acquisition function?
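For reference, a common way to implement the lbfgs option is multi-restart L-BFGS-B over the acquisition surface; a self-contained sketch with a stand-in acquisition function:

import numpy as np
from scipy.optimize import minimize

def acquisition(x):
    # Stand-in for EI/UCB/etc., just to make the sketch runnable.
    x = np.atleast_1d(x)
    return float(np.sum(x ** 2) + np.sin(5 * x).sum())

bounds = [(-2.0, 2.0)]
rng = np.random.RandomState(0)

# Multi-restart L-BFGS-B: start from several random points, keep the best.
starts = rng.uniform(-2, 2, size=(10, 1))
results = [minimize(acquisition, x0, method="L-BFGS-B", bounds=bounds)
           for x0 in starts]
best = min(results, key=lambda r: r.fun)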
As observed in https://github.com/MechCoder/scikit-optimize/pull/14, the approximated objective when using EI is really weird. What is the issue?
Section 4.1.2 in http://arxiv.org/pdf/1211.0906v2.pdf
import numpy as np
from skopt import forest_minimize, gp_minimize

def bench1(x):
    return np.asscalar(np.asarray(x))

def bench2(x):
    return np.asscalar(np.asarray(x, dtype=np.int))

bench1([1])    # returns 1
bench2(["1"])  # returns 1

# Works
forest_minimize(bench1, ((1.0, 4.0),))
# Fails
forest_minimize(bench2, (("1", "2", "3", "4"),))
# Works
gp_minimize(bench1, ((1.0, 4.0),), maxiter=5)
# Fails
gp_minimize(bench2, (("1", "2", "3", "4"),))
relevant: https://github.com/automl/RoBO
According to the chat with @glouppe, in https://github.com/MechCoder/scikit-optimize/pull/38 the length scale of the kernel has been fixed to 1.0. We should either ...
yield (check_minimize, minimizer, bench1, 0., [(-2.0, 2.0)], 0.05, 75)

with et_minimize produces

scikit-optimize/skopt/acquisition.py:165: RuntimeWarning: invalid value encountered in greater
  mask = std > 0

and std is:

(Pdb) print(std)
[ 0.00000000e+00 3.44874701e-01 4.35236492e-01 nan
5.35666028e-01 3.76289149e-01 0.00000000e+00 3.44874701e-01
3.03596891e-01 2.84929167e-01 nan nan
1.11601649e-01 3.44874701e-01 nan nan
2.98023224e-08 6.69582536e-01 1.68631973e-01 nan]
Would be nice to generate examples upon deployment to build a nice gallery. This would require some changes to ci_scripts/deploy.sh and to the templates, but nothing impossible.
Hi, I just discovered this project. I wonder whether the goal is really to provide only a scipy-like interface, or whether you think it would be possible to provide an ask-and-tell interface too. That would be much more convenient for use cases in which the optimization process is actually controlled by the objective function.
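A toy sketch of the ask-and-tell pattern (the Optimizer class below is made up for illustration; it samples at random where a real implementation would optimize an acquisition function):

import random

class Optimizer(object):
    # Toy ask-and-tell optimizer to illustrate the control flow.
    def __init__(self, bounds):
        self.bounds = bounds
        self.Xi, self.yi = [], []

    def ask(self):
        # A real implementation would propose the acquisition optimum here.
        return [random.uniform(low, high) for low, high in self.bounds]

    def tell(self, x, y):
        self.Xi.append(x)
        self.yi.append(y)

# The caller controls the loop instead of handing over a callback:
opt = Optimizer([(-2.0, 2.0)])
for _ in range(20):
    x = opt.ask()
    y = (x[0] - 0.5) ** 2  # stand-in for the expensive objective
    opt.tell(x, y)
best = min(zip(opt.Xi, opt.yi), key=lambda xy: xy[1])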
Collecting benchmarks:
Before implementing any more things, we should really extend the test suite with more thorough tests. At the moment, I can't even minimize a 1D parabola with the default parameters of gp_minimize...
(and I don't even understand why it fails... so many things to adjust :/)
We might want to look at other packages for good defaults.
Are the examples somewhere on the website? I can't seem to find them.
For API checks and baseline purposes, I think it would be nice to have a dummy random search method.
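A minimal sketch of such a baseline (the dummy_minimize name and signature are made up here):

import numpy as np

def dummy_minimize(func, bounds, n_calls=100, random_state=None):
    # Baseline: evaluate func at uniformly random points, return the best.
    rng = np.random.RandomState(random_state)
    lows = np.array([low for low, high in bounds], dtype=float)
    highs = np.array([high for low, high in bounds], dtype=float)
    X = rng.uniform(lows, highs, size=(n_calls, len(bounds)))
    y = np.array([func(x) for x in X])
    best = np.argmin(y)
    return X[best], y[best]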
To consolidate the package, we should generate a static version of the documentation.
I recently found https://github.com/BurntSushi/pdoc which seems to be quite nice and easy for that purpose.
Would love to have everything to reproduce fig 7 from http://arxiv.org/pdf/1012.2599v1.pdf (and some of the other figures?)
This would also serve as a way to check the correctness of our implementation (for which I currently have doubts regarding EI, as reported in #17)