
Comments (22)

rgrosse commented on July 23, 2024

Thanks. That definitely looks wrong. Do you think you could send me the
inputs to laplace_approx, though? In particular I don't have an easy way
to get the hessian.

Roger

On Mon, Jan 21, 2013 at 10:43 AM, jamesrobertlloyd <[email protected]> wrote:

This kernel

ScoredKernel(k_opt=ProductKernel([
    MaskKernel(ndim=8, active_dimension=0,
               base_kernel=CubicKernel(offset=1.757755, output_variance=7.084045)),
    MaskKernel(ndim=8, active_dimension=7,
               base_kernel=SqExpPeriodicKernel(lengthscale=-2.701080, period=-0.380918,
                                               output_variance=-0.071214))]),
    nll=6348.096611, laplace_nle=-184450132.068237, bic_nle=12720.630212,
    noise=[-1.77276072])

applied to data set r_concrete_500_fold_10_of_10.mat

has a clearly erroneous laplace_nle

Why?



from gp-structure-search.

jamesrobertlloyd commented on July 23, 2024

Just ran a script in sandpit.debug_laplace to recreate the error and pickled the output into source/laplace_error.p. Load it with the snippet below (sorry - should have saved a tuple rather than a list!)

import pickle

with open('laplace_error.p', 'rb') as f:  # pickle files must be opened in binary mode
    data = pickle.load(f)                 # renamed to avoid shadowing the builtin `vars`

sk = data[0]         # The suspicious ScoredKernel
script = data[1][0]  # GPML script to compute the Hessian
output = data[2]     # The output of the script
result = data[3]     # The scored kernel resulting from the output - note the different
                     # (but still crazy) Laplace approx, due to the optimiser having
                     # started from different places - not important, still crazy output

output.hessian will be a good place to start - it is not positive definite - perhaps that is catching out the proj_psd command?
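For reference, a minimal check of the symptom described here (a sketch; `is_positive_definite` is a hypothetical helper, and the 2x2 matrix below just mimics the mix of O(1) and O(1e7) entries seen in the actual Hessian):

```python
import numpy as np

def is_positive_definite(H, tol=0.0):
    """True iff H is symmetric with all eigenvalues above tol."""
    H = np.asarray(H, dtype=float)
    return np.allclose(H, H.T) and bool(np.all(np.linalg.eigvalsh(H) > tol))

# A symmetric matrix with huge off-diagonal entries, like the suspect Hessian:
H_bad = np.array([[8.65e0, 1.62e7],
                  [1.62e7, 7.57e4]])
print(is_positive_definite(np.eye(2)))  # True
print(is_positive_definite(H_bad))      # False - the determinant is negative,
                                        # so one eigenvalue must be negative
```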

Good luck and let me know if you need anything else to work on this

I experimented with the step size of the finite-difference approximation used to calculate the Hessian - it still results in Hessians that lead to unusual Laplace approximations.


rgrosse commented on July 23, 2024

Thanks. I notice that not only is the hessian not PSD, it also has an entry
of 3.18e12 on the diagonal. This will almost certainly cause numerical
problems and make it give crazy answers. Is there a legitimate reason for
it to be of order 10^12, or is that a bug? If it's legitimate, we'll have
to think harder about how to compute the Laplace approximation in those
cases (and it might affect some of the other algorithms as well).
Otherwise, we'll want to track down the bug, and in the meantime place an
assert somewhere so that it gives an error rather than returning a super
duper likelihood score.



jamesrobertlloyd commented on July 23, 2024

Let's experiment to see if the Hessians from MATLAB seem legit. Varying the step size of the finite difference calculation, we get the matrices below.

I cannot discern any obvious trends. The fact that signs seem to flip every so often suggests that we are testing the numerical accuracy of the derivative formulae in GPML. Any other thoughts/observations?
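The sweep can be reproduced with a standard central-difference Hessian (a sketch; the quadratic `f` below is just a stand-in for the actual GPML nll, which the real script evaluates in MATLAB):

```python
import numpy as np

def fd_hessian(f, x, h):
    """Central-difference estimate of the Hessian of scalar f at x, step h."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

# Sweep the step size as in the tables below; for this quadratic stand-in
# the true Hessian is diag(2, 20), so any deviation is pure numerical error.
f = lambda x: float(x[0] ** 2 + 10.0 * x[1] ** 2)
x0 = np.array([0.3, -0.2])
for h in (1e-1, 1e-3, 1e-5):
    print(h, np.round(fd_hessian(f, x0, h), 3))
```

For a well-behaved function the estimates stabilise over a wide range of step sizes; the sign flips in the tables below suggest we are instead measuring noise.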

In my next comment there will be some exploratory plots...

1e-1

+8.6522e+00 +5.7413e+00 -1.5832e+01 +1.6244e+07 +5.6365e+00
+5.7413e+00 +4.1806e+00 -1.0545e+01 +1.1531e+07 +4.0605e+00
-1.5832e+01 -1.0545e+01 +1.9165e+01 -3.4706e+07 -1.0539e+01
+1.6244e+07 +1.1531e+07 -3.4706e+07 +7.5742e+04 +1.1526e+07
+5.6365e+00 +4.0605e+00 -1.0539e+01 +1.1526e+07 +3.9404e+00

1e-2

+7.3674e+00 +6.0717e+00 -1.4569e+01 +1.8389e+07 +5.6940e+00
+6.0717e+00 +5.8234e+00 -9.6200e+00 +1.2195e+07 +5.2384e+00
-1.4569e+01 -9.6200e+00 +1.2758e+01 -2.8177e+07 -9.6601e+00
+1.8389e+07 +1.2195e+07 -2.8177e+07 +8.9304e+05 +1.2214e+07
+5.6940e+00 +5.2384e+00 -9.6601e+00 +1.2214e+07 +4.6534e+00

1e-3

-1.0704e+01 -3.3397e+00 -1.1056e+01 +1.8474e+07 +2.1915e+01
-3.3397e+00 +1.9268e+01 -1.0500e+01 +1.2258e+07 +1.0551e+01
-1.1056e+01 -1.0500e+01 +1.3073e+01 -2.7650e+07 -1.0820e+01
+1.8474e+07 +1.2258e+07 -2.7650e+07 +1.0071e+07 +1.2332e+07
+2.1915e+01 +1.0551e+01 -1.0820e+01 +1.2332e+07 +1.8336e+00

1e-4

-1.2709e+02 +1.3610e+02 -1.3953e+01 +2.0614e+07 +4.7130e+01
+1.3610e+02 -3.0915e+01 -1.1274e+01 +1.3371e+07 +7.0436e+01
-1.3953e+01 -1.1274e+01 +1.6923e+01 -2.7130e+07 -1.2816e+01
+2.0614e+07 +1.3371e+07 -2.7130e+07 +4.7308e+08 +1.4162e+07
+4.7130e+01 +7.0436e+01 -1.2816e+01 +1.4162e+07 +1.7179e+02

1e-5

-1.2485e+03 -7.2327e+02 -1.0194e+02 +6.5094e+06 -5.3225e+01
-7.2327e+02 +1.3733e+03 -6.2107e+01 +1.9421e+07 +1.4022e+03
-1.0194e+02 -6.2107e+01 +1.7011e+01 -1.9966e+07 -9.0486e+01
+6.5094e+06 +1.9421e+07 -1.9966e+07 +4.0034e+10 +1.8997e+07
-5.3225e+01 +1.4022e+03 -9.0486e+01 +1.8997e+07 +1.4311e+03

1e-6

-9.1229e+03 -1.0733e+04 +1.5740e+03 -1.8915e+08 +5.8898e+02
-1.0733e+04 +2.3216e+04 +5.1945e+02 +1.0537e+08 +1.4845e+04
+1.5740e+03 +5.1945e+02 +5.6887e+02 -1.0466e+07 +4.7553e+02
-1.8915e+08 +1.0537e+08 -1.0466e+07 +3.2147e+12 -4.5512e+07
+5.8898e+02 +1.4845e+04 +4.7553e+02 -4.5512e+07 +6.4742e+03

1e-7

-1.5973e+05 +6.4187e+04 -1.0822e+04 +5.8456e+08 -8.4113e+03
+6.4187e+04 +3.0694e+04 -1.2721e+04 +1.6981e+09 +6.3806e+04
-1.0822e+04 -1.2721e+04 +5.3632e+03 +9.4772e+08 -1.1863e+04
+5.8456e+08 +1.6981e+09 +9.4772e+08 +5.6459e+13 -8.0183e+08
-8.4113e+03 +6.3806e+04 -1.1863e+04 -8.0183e+08 +9.6918e+04

1e-8

-2.3741e+05 +1.5215e+05 +1.0901e+05 +1.3436e+10 -4.3918e+05
+1.5215e+05 +1.2361e+06 +6.4673e+04 -1.4165e+10 +1.2320e+06
+1.0901e+05 +6.4673e+04 -2.2584e+03 +7.1079e+09 +6.1322e+04
+1.3436e+10 -1.4165e+10 +7.1079e+09 +1.2526e+14 -2.4051e+09
-4.3918e+05 +1.2320e+06 +6.1322e+04 -2.4051e+09 +1.2279e+06

1e-9

-3.4606e+07 -3.2188e+06 +1.7158e+06 +8.0593e+09 -6.0905e+06
-3.2188e+06 +6.0398e+06 +9.8569e+05 -1.2262e+11 +2.3448e+06
+1.7158e+06 +9.8569e+05 +5.9892e+05 -4.9899e+10 +1.0781e+06
+8.0593e+09 -1.2262e+11 -4.9899e+10 +1.3932e+14 +7.3460e+10
-6.0905e+06 +2.3448e+06 +1.0781e+06 +7.3460e+10 -1.3503e+06

1e-10

-2.9407e+08 +3.9351e+07 +2.4516e+07 -1.1775e+12 +2.5139e+07
+3.9351e+07 -1.8118e+07 +5.7634e+06 +7.2534e+11 -4.3006e+07
+2.4516e+07 +5.7634e+06 +1.0959e+06 +1.6164e+12 +5.0872e+06
-1.1775e+12 +7.2534e+11 +1.6164e+12 +1.3761e+14 -1.7572e+10
+2.5139e+07 -4.3006e+07 +5.0872e+06 -1.7572e+10 -6.7894e+07

1e-11

-3.7236e+09 -4.3814e+08 +1.6247e+08 -5.1157e+12 -1.2950e+09
-4.3814e+08 +5.2616e+07 +5.0687e+07 -8.2348e+12 +6.0310e+08
+1.6247e+08 +5.0687e+07 -4.4617e+06 +9.4263e+12 +5.4727e+07
-5.1157e+12 -8.2348e+12 +9.4263e+12 +1.3760e+14 +1.4653e+13
-1.2950e+09 +6.0310e+08 +5.4727e+07 +1.4653e+13 +1.1536e+09

1e-12

-1.2043e+10 +9.0341e+09 +8.2261e+08 -1.2115e+14 +1.1460e+10
+9.0341e+09 +6.0535e+09 -1.6399e+08 +1.6743e+13 -9.4721e+08
+8.2261e+08 -1.6399e+08 +2.2586e+08 +5.7523e+13 -1.9715e+08
-1.2115e+14 +1.6743e+13 +5.7523e+13 +4.1769e+14 -9.1880e+12
+1.1460e+10 -9.4721e+08 -1.9715e+08 -9.1880e+12 -7.9479e+09


jamesrobertlloyd commented on July 23, 2024

Continuing the above analysis - let's plot the nll surface and see what was going on. All of the plots below vary one variable over [-scale, scale]

e.g. the super high curvature parameter, 4, at a scale of 1e-2

[plot: nll surface along parameter 4 at scale 1e-2]

I will end this comment here to test the inclusion of images...


jamesrobertlloyd commented on July 23, 2024

OK - images working - and we have a clear minimum at this scale - zooming in to a scale of 1e-6 we get
[plot: nll surface along parameter 4, zoomed in to scale 1e-6]

Definitely a minimum, but with negative curvature away from the minimum - explaining why the estimates of curvature were getting larger as we decreased the step size

However, things are not so pretty for other variables - e.g. parameter 1 at a scale of 1e-6
[plot: nll surface along parameter 1 at scale 1e-6]
and the corresponding derivative
[plot: corresponding derivative along parameter 1]

i.e. we are not at a minimum - gradient is roughly constant + noise. Zooming out - the nll surface looks like

[plot: nll surface along parameter 1, zoomed out]

Definitely not a minimum!

So the optimiser had failed, and we have no reason to believe that the Laplace approx will give anything other than garbage!
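A cheap guard against this failure mode would be to check the gradient norm at the returned point before trusting the Laplace approximation (a sketch; the helper name and tolerance are assumptions, not code from the repo):

```python
import numpy as np

def at_local_minimum(grad, tol=1e-3):
    """The Laplace approximation assumes we are at an optimum, where the
    gradient is (numerically) zero; a large norm means the optimiser failed."""
    return bool(np.linalg.norm(grad) < tol)

print(at_local_minimum(np.array([1e-8, -2e-9])))  # True: converged
print(at_local_minimum(np.array([0.5, 1.2])))     # False: not at a minimum
```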


jamesrobertlloyd commented on July 23, 2024

In conclusion, my few plots suggest that the very large curvatures are not unreasonable, but the optimiser has failed. When the optimiser fails, the noise in the derivative calculations can become significant (e.g. look at the curvature of dimension 1 as delta is made smaller - nearly monotonically increasing in magnitude).

Latest version of optimiser in new release of GPML + fingers crossed? Or something more intelligent?


duvenaud commented on July 23, 2024

These plots are great. I talked with Roger the other day and he seemed to think that the bug was caused by very large values on the diagonals causing numerical problems in his python code.

The example you give seems to make a pretty compelling case that we won't solve this until we end up at a true optimum, although it might be that even there some dimensions are very jagged.

I mentioned that the optimizer seemed to be stopping at saddle points to Carl, and he asked me to send him code that could reproduce the problem, so that he could make his optimizer robust against it. That shouldn't be too hard...


jamesrobertlloyd commented on July 23, 2024

Reproducing code will be easy enough - just need to extract the script produced by sandpit.debug_laplace and tidy up some paths - I'm going out now but can produce this tomorrow.


jamesrobertlloyd commented on July 23, 2024

Reproducing code in laplace_debug - just changed a parameter of the optimiser to check we were not forcing premature stopping (we are not) - shall we catch Carl this afternoon?


jamesrobertlloyd commented on July 23, 2024

Conclusion: if the optimiser fails, then the Laplace approx is bogus - the safest thing to do is to return a marginal likelihood approximation of NaN. Still need to make sure that the optimiser is as good as possible - but we need to be robust to failure.


jamesrobertlloyd commented on July 23, 2024

The laplace function now returns NaNs when the Hessian is not PSD.


rgrosse commented on July 23, 2024

As I mentioned before, unless there's a bug I don't know about, the existing code should handle non-PSD matrices just fine. The problem is ill-conditioned matrices, regardless of whether they have negative eigenvalues. We could fix this by writing a more numerically stable version, and the only reason I didn't was that ill-conditioned Hessians seem like a sign of a bug somewhere upstream. But I'll go ahead and write the stable version later today.
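One way such a stable version might look (a sketch only - not necessarily how the actual `laplace_approx_stable` is implemented; the function name and the eigenvalue floor `eps` are assumptions):

```python
import numpy as np

def laplace_nle_sketch(nll_min, hessian, eps=1e-6):
    """Laplace approximation to the negative log marginal likelihood.

    Returns (value, ok); ok is False whenever the Hessian had to be
    repaired, so callers can flag the result instead of trusting it.
    """
    H = np.asarray(hessian, dtype=float)
    H = 0.5 * (H + H.T)                    # enforce exact symmetry
    eigvals = np.linalg.eigvalsh(H)
    ok = bool(np.all(eigvals > eps))
    eigvals = np.clip(eigvals, eps, None)  # floor tiny/negative curvature
    d = len(eigvals)
    # evidence ~= exp(-nll_min) * (2*pi)^(d/2) * det(H)^(-1/2), hence:
    nle = nll_min - 0.5 * d * np.log(2.0 * np.pi) + 0.5 * np.sum(np.log(eigvals))
    return nle, ok
```

Working with eigenvalues rather than the raw determinant keeps the log-det computation stable even when the entries span many orders of magnitude, as in the Hessians above.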


jamesrobertlloyd commented on July 23, 2024

Hmmm - correct me if I'm wrong, but I think the conclusion was that the non-PSD matrices we had tracked down were symptoms of a failed optimiser. In this case, there is no meaningful 'best approximation' to the Laplace integral, since the likelihood surface cannot be approximated as locally Gaussian around an optimum - because the point is not an optimum.

I think the safest solution is to indicate failure of the Laplace method in general. Upstream, we ultimately should be using the best optimisation techniques / diagnostics we can and catch an optimisation failure before asking for the meaningless Laplace approx to marginal likelihood.


jamesrobertlloyd commented on July 23, 2024

But perhaps the Hessian is often not PSD - but very close? Then returning a NaN whenever it is not PSD would be rather severe! Is this what you and David observed when first testing the Laplace approx?


rgrosse commented on July 23, 2024

I think it's dangerous to have it quietly return nans. We might wind up throwing out good kernels without knowing it, because of the issues you describe. I'm writing the new version of the Laplace approximation so that it has a second return value which says whether there was a problem. That way you can continue the experiment as if nothing happened, but check afterwards to see if these problems affected the results.

In general, it seems possible for the Hessian to have eigenvalues of zero -- this would happen, for instance, if there's a redundant parameterization. This would probably lead to there being slightly negative eigenvalues, which we should simply treat as zero.

Roger
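A toy illustration of the redundant-parameterization case (a made-up example, not from the repo): if the objective is f(x, y) = (x + y)^2, it depends only on the sum x + y, so the Hessian is singular.

```python
import numpy as np

# f(x, y) = (x + y)**2 depends only on the sum x + y, so one parameter
# direction is redundant and the Hessian [[2, 2], [2, 2]] is singular.
H = np.array([[2.0, 2.0],
              [2.0, 2.0]])
eigvals = np.linalg.eigvalsh(H)
print(eigvals)  # smallest eigenvalue is (numerically) zero, possibly
                # a hair negative due to floating point

# Treating slightly negative eigenvalues as zero rather than rejecting:
print(np.clip(eigvals, 0.0, None))
```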


jamesrobertlloyd commented on July 23, 2024

Good point - course of action seems good!


rgrosse commented on July 23, 2024

Btw, in the fix you committed, laplace_approx just crashes for me because comparing arrays using != doesn't give you a bool. Also, since floating point computations aren't exact, we should be using np.allclose to compare, rather than exact equality.
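For reference, the pitfall described here in a minimal form:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.0, 2.0 + 1e-12])

# `a != b` is an element-wise boolean array, not a bool, so using it
# in a plain `if` raises "truth value of an array ... is ambiguous"
try:
    if a != b:
        pass
except ValueError as err:
    print("comparison raised:", err)

# The robust check: tolerant comparison reduced to a single bool
print(np.allclose(a, b))  # True - the difference is within tolerance
```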

I've pushed the stable Laplace approximation (laplace_approx_stable in psd_matrices.py) along with this fix.

I think I might have accidentally overwritten some of the results file when I pushed the last commit, so watch out for that. I'll try to debug what happened later.


jamesrobertlloyd commented on July 23, 2024

Yep - good spot - your fix is much better than mine!


rgrosse commented on July 23, 2024

Forget what I said earlier about overwriting results files. I just misread the output of git commit, and thought I was overwriting stuff when I was just merging changes. Based on the log, everything seems to be right.


jamesrobertlloyd commented on July 23, 2024

Next step for this is to fix the optimiser - will send an example to Carl and see if it is fixed by latest version of GPML


jamesrobertlloyd commented on July 23, 2024

Closing this issue and merging it into a generic 'fix the optimiser' issue.

