Add an interpolation scheme that implements some kind of Gaussian process interpolatio

I'll just mention a nice paper by <a href="http://adsabs.harvard.edu/abs/2013A&A..

The development version of scikit-learn <a href="http://scikit-learn.org/dev/modules/g

Gaussian process interpolation,about rmjarvis/piff

Comments (48)

jmeyers314 commented on September 26, 2024 1

I think I got the anisotropic radial basis function kernel working (in sklearn). Check it out:

The interpolator is taking the 100 points in the left panel, optimizing the parameters of the 2D covariance matrix (+ nugget), and then making predictions. There's also a hidden step here where mildly noisy postage stamp images of Kolmogorov PSFs are being drawn and fit.

from piff.

jmeyers314 commented on September 26, 2024

From a very cursory glance at knn_interp.py it looks like we could probably do something very similar here with sklearn.gaussian_process.

from piff.

RobertLuptonTheGood commented on September 26, 2024

Rumour (== Dan Foreman-Mackay) has it that the next scikit-learn gaussian process code will be much better -- I'm not sure if it's just speed. I also don't know if the API changed.

from piff.

jmeyers314 commented on September 26, 2024

... although it doesn't look like there's any way in sklearn (or in DFM's george) to regress several functions (say, for fwhm, g1, g2, ...) simultaneously under a common covariance function. For a first implementation then, I suppose independent regressors may have to do.

from piff.

rmjarvis commented on September 26, 2024

I'll just mention a nice paper by Gentile et al that gives a good description of GP (aka kriging), radial basis functions, inverse distance weighting and other interpolation schemes.

They are related in form (they all can be described with the formula $\hat z = \sum_i \lambda_i z(x_i)$), but with some non-trivial differences in how $\lambda_i$ and $z(x_i)$ are calculated.

I'm also not sure which ones of these could be extended to have $z$ be a vector rather than a scalar.

from piff.

jmeyers314 commented on September 26, 2024

Looks like Gentile et al are using independent regressors: Appendix A.1, step 2.

from piff.

rmjarvis commented on September 26, 2024

Yes, that's how they did it. Some of the methods might be amenable to extension to vectors. But if not, probably PCAing the vectors and treating each principal component coefficient as a scalar would be appropriate.

For things like Kolmogorov with only a few parameters to fit (which are likely fairly orthogonal), there might not be much difference between the PCA approach and just the raw parameter values.

from piff.

jmeyers314 commented on September 26, 2024

probably PCAing the vectors and treating each principal component coefficient as a scalar would be appropriate

That reminds me... I think I've actually done this before to make an optics PSF emulator. Just need to switch the input params from [Z2, Z3, Z4, Z5, Z6, ...] to [u, v]...

from piff.

jmeyers314 commented on September 26, 2024

The development version of scikit-learn GaussianProcessRegressor does look like a significant improvement. I think I can cook up something on the latest stable version of sklearn pretty quickly, but we should definitely come back to this once the new sklearn is released.

from piff.

jmeyers314 commented on September 26, 2024

What's the purpose of kNNInterp.getFitProperties()? Don't we always want to interpolate the entirety of star.fit.params?

from piff.

cpadavis commented on September 26, 2024

You're very likely right. I can't think of a use case for it -- I think I
ended up coding it that way since it looked just like the getProperties,
only for star.fit instead of star.data. In fact, I think it's actually
bad, because it means some fit parameters could be ignored!

Edit: The only use case I can come up with is that maybe some other version of knn interp might want to interpolate some other set of properties that we are not actively fitting with our model (ie something that is in star but isn't star.fit.params). But even that doesn't actually jibe with the way it is actually written.

On Wed, Sep 21, 2016 at 1:56 PM Josh Meyers [email protected]
wrote:

What's the purpose of kNNInterp.getFitProperties()? Don't we always want
to interpolate the entirety of star.fit.params?

—
You are receiving this because you are subscribed to this thread.

Reply to this email directly, view it on GitHub
#14 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABsw5IWs9eIH7m0ZpEJE_Tj7sagocFYvks5qsZnmgaJpZM4IPG8T
.

from piff.

jmeyers314 commented on September 26, 2024

I think I got the basic implementation of this with sklearn down now. Unfortunately, it doesn't work very well. The basic problem is that while sklearn gaussian_process will let you set the parameter that describes the noise variance in the regression (affectionately termed the "nugget"), there isn't any facility to optimize the log marginal likelihood w.r.t. the nugget in the same way that the log marginal likelihood is optimized w.r.t. the correlation length scale. It looks like george and development sklearn both do have this ability though.

So... I see a few options:

Let the user set the nugget. This requires some sort of guess for how noisy the star.fit.params are, but is really easy for us to implement. I'll have to think about how to propagate the user-set nugget if we turn on PCA.
Wrap existing sklearn optimization to also optimize the nugget.
Use george, which is partially pip-installable but also depends on eigen, which might be a bit too steep of a dependency here? (though should be fast!)
Use development sklearn before it's released.
Wait for new sklearn.

I'm tempted to start with 1, add 2, and then add 3 as an optional enhancement+dependency.

from piff.

rmjarvis commented on September 26, 2024

It looks like you're testing it against a polynomial variation, rather than something that is actually generated as a Gaussian process. Do we expect it to work well in this case? Polynomials kind of violate some of the assumptions that GP makes I think.

You could generate a real Gaussian field using the PowerSpectrum class in GalSim. We normally use that for generating shear fields with some cosmological power spectrum. But you could also use it to generate size and shapes of PSFs that follow some input power spectrum. That might be a fairer test of a GP interpolator.

from piff.

jmeyers314 commented on September 26, 2024

Do we expect it to work well in this case?

I think it ought to work well for any sufficiently smooth function.

You could generate a real Gaussian field using the PowerSpectrum class in GalSim.

Agree that this would be an interesting test case.

One thing I've noticed is that the GP looks like it's trying to smooth the data more and more each iteration. The results are much nicer looking if you don't actually iterate.

from piff.

jmeyers314 commented on September 26, 2024

I've made some progress here. Caught a couple of bugs in Kolmogorov that were giving me problems before.

Here's the results for interpolating the g1 parameter for a Kolmogorov PSF, where the true g1 varies as a polynomial over the field (4 other PSF params not plotted are also varying over the field as polynomials):

Here's the same, but this time, the truth is generated from galsim.PowerSpectrum:

Not too good. But if you PCA the fitted PSF parameters first, and then only keep the first 2 PCs (which in this case explain ~99.3% of the parameter variance), you get:

I believe the improvement is due to the fact that the GP interpolator optimization step, which looks for a good correlation length, is better able to latch onto the signal (instead of noise) in the PCA case. The optimized inverse length scales are 0.002 without PCA, and 0.38 with.

I tried the same thing on the polynomial example, but there the difference was negligible, maybe due to the fact that the interpolation was already really good there.

from piff.

cpadavis commented on September 26, 2024

Cool stuff!

When you PCA the fitted PSF parameters, do you mean over star.fit.params?
Don't you have only 3 or 5 parameters for the Kolmogorov? (Are you doing
the centroid adjustments?) I'm surprised that essentially throwing out one
parameter makes such a dramatic improvement... Also, how are you generating
the terms in the PS case? Do they each get their own PS? (if size, g1, g2
are generated independently, how can a linear combination of them do
better? But on the other hand, if g1 and g2 are constrained by the overall
ellipticity length, maybe the pca is picking up on that constraint.)

On Wed, Sep 28, 2016 at 5:16 PM Josh Meyers [email protected]
wrote:

I've made some progress here. Caught a couple of bugs in Kolmogorov that
were giving me problems before.

Here's the results for interpolating the g1 parameter for a Kolmogorov
PSF, where the true g1 varies as a polynomial over the field (4 other PSF
params not plotted are also varying over the field as polynomials):

[image: f1]
https://cloud.githubusercontent.com/assets/3650485/18936655/fb4a5ad6-859d-11e6-8104-88f962db1f7f.png

Here's the same, but this time, the truth is generated from
galsim.PowerSpectrum:

[image: f3]
https://cloud.githubusercontent.com/assets/3650485/18936686/3d47deb8-859e-11e6-9e5c-ccea8cde44b7.png

Not too good. But if you PCA the fitted PSF parameters first, and then
only keep the first 2 PCs (which in this case explain ~99.3% of the
parameter variance), you get:

[image: f4]
https://cloud.githubusercontent.com/assets/3650485/18936701/67c1df04-859e-11e6-94fe-653c8244c67b.png

I believe the improvement is due to the fact that the GP interpolator
optimization step, which looks for a good correlation length, is better
able to latch onto the signal (instead of noise) in the PCA case. The
optimized inverse length scales are 0.002 without PCA, and 0.38 with.

I tried the same thing on the polynomial example, but there the difference
was negligible, maybe due to the fact that the interpolation was already
really good there.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#14 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABsw5P9ytULvjYGWPI1KvmEQf_0wqHx_ks5quwNCgaJpZM4IPG8T
.

from piff.

jmeyers314 commented on September 26, 2024

When you PCA the fitted PSF parameters, do you mean over star.fit.params?

Yep. I'm doing the 5 param version (centroid is part of star.fit.params).

if size, g1, g2 are generated independently, how can a linear combination of them do better?

That's a good point. I was lazy for the GP tests and just set the true fwhm and centering params to constants. I'll try again generating them from their own distributions though.

from piff.

rmjarvis commented on September 26, 2024

I was discussing this thread with @pmelchior and @TallJimbo on Tuesday when I was up at Princeton, and they thought there should be some place where you figure out what the correlation function of the GP is. If you aren't setting something explicit, they though sklearn is probably assuming a Gaussian (or "squared exponential" is apparently a preferred term to avoid confusion with the "Gaussian" of the GP).

In the Gentile et al discussion of kriging, they calculate something they call a variogram, gamma(h), which is connected to the correlation function C(h) as gamma(h) = C(h) - C(0). I think from what Peter and Jim were saying about this that C(0) is what sklearn calls the nugget.

Anyway, sklearn might be doing something semi-reasonable without an explicit variogram (or correlation function), but I suspect we'll want to calculate it ourselves and provide it explicitly.

from piff.

jmeyers314 commented on September 26, 2024

they thought there should be some place where you figure out what the correlation function of the GP is

Yep. This is done by the sklearn.gaussian_process.GaussianProcess.fit() method. Current sklearn only optimizes the correlation length of the squared exponential kernel (there are other kernels available, but I think squared exponential makes the most sense).

I'm less familiar with the "variogram" terminology, but you might be right that the nugget is at least related to C(0).

In the long run, I think we'll want at least some kind of prior over what kernel parameters are reasonable, potentially informed by something like observations of high stellar density fields. I don't know that we'll want to assert something outright, since it's quite likely the correlation function of PSF parameters varies exposure to exposure.

from piff.

rmjarvis commented on September 26, 2024

I think squared exponential makes the most sense).

I don't think that's likely to be the appropriate kernel for the atmosphere. Why can't we just measure it directly from the data? We should have all the information we need to do so.

from piff.

jmeyers314 commented on September 26, 2024

I don't think that's likely to be the appropriate kernel for the atmosphere. Why can't we just measure it directly from the data? We should have all the information we need to do so.

Hmmm. That's not something I've ever seen done with a GP, but could be interesting to investigate. We could certainly try estimating the parameters of a von Karman correlation function (essentially r0 and an outer scale) if a non-parametric correlation function proves too tricky.

from piff.

jmeyers314 commented on September 26, 2024

Actually, from Heymans++12 eq. 8, it looks like a von Karman correlation function may be the same as a Matern(-5/6) correlation function.

from piff.

pmelchior commented on September 26, 2024

It is indeed surprising: GP models depend profoundly on the covariance/correlation/variogram function, and yet nobody seems to fit it as part of the inference. It's often "our data looks like that" or "this is what worked best", whatever either of those statements mean.
I haven't done a literature search, but I'd guess that this has at least been tried. My second guess is that this is a non-trivial modification of the GP likelihood. It's maybe easier to split this into an iterative solution where one

solves the regression problem with GP given a covariance
solves for the covariance given the residuals of the regression model

from piff.

jmeyers314 commented on September 26, 2024

Actually, from Heymans++12 eq. 8, it looks like a von Karman correlation function may be the same as a Matern(-5/6) correlation function.

Although... it's not clear to me why the correlation of the ellipticities should look like the correlation in an instantaneous phase screen.

from piff.

rmjarvis commented on September 26, 2024

I'm pretty sure the Gentile et al implementation of kriging just involved measuring the correlations on the data directly. I guess I'm not following why this is not an easy thing to do. Is it just that sklearn doesn't have an easy way to put in an empirical correlation function?

BTW, for DES we've lately been looking at the PSF patterns of the presumably atmospheric component and there is a clear directionality to the streaks, so we've been thinking that we might even want the correlation function to be anisotropic. This I expect would be difficult to incorporate, since I think the code probably expects the kernel to be isotropic (i.e. C(r,theta) == C(r)). But at least getting the radial function from the data seems like it should be straightforward.

from piff.

pmelchior commented on September 26, 2024

Not sure about sklearn, but multivariate covariances are analytically treatable.

from piff.

jmeyers314 commented on September 26, 2024

I'm pretty sure anisotropic covariances are possible with sklearn. The theta kwarg to GaussianProcess just becomes multidimensional.

from piff.

jmeyers314 commented on September 26, 2024

@rmjarvis , I'm tooling up to try and answer your question about injecting a measured covariance function instead of optimizing a parametric one. This involves rolling a new GP interpolator by hand (sklearn won't do it), which I think I've got more or less running now, at least for the isotropic case. (Shouldn't be too hard to extend this to anisotropic).

TreeCorr seems like a natural place to do the measurement part. I only looked at this briefly, but it seems like this only handles isotropic correlation functions, right? How much work would it be to implement anisotropic correlation function measurements? Or is there another tool I should look at?

I suppose I can start in the meantime by injecting the correlation function used to generate fake data into the GP and comparing that to an optimized parametric correlation function GP.

from piff.

rmjarvis commented on September 26, 2024

TreeCorr seems like a natural place to do the measurement part. I only looked at this briefly, but it seems like this only handles isotropic correlation functions, right? How much work would it be to implement anisotropic correlation function measurements? Or is there another tool I should look at?

Yes, TreeCorr only does isotropic. Although, for the number of stars we have, I suspect the tree algorithm isn't that much of an advantage. Just doing the brute force correlation as a direct sum will already be fairly fast. (Probably even in python.) So for the anisotropic binning, we can just do that.

So probably go ahead and use TreeCorr for now for the isotropic one. I think we can leave the anisotropic one for a later issue. When we get to that, we can try to decide whether it makes more sense to add that feature to TreeCorr or just implement the direct sum in Piff.

from piff.

gbernstein commented on September 26, 2024

I've done the 2d 2PCF with all pairs in numpy (histogram2d) successfully. Takes a minute or so on my laptop to do all pairs of ~20k objects. Let me know if you want to see any code - it's nothing clever or fancy.

from piff.

aaronroodman commented on September 26, 2024

yes that would be useful - otherwise I was going to do the same…

On Oct 6, 2016, at 10:20 AM, Gary Bernstein <[email protected]mailto:[email protected]> wrote:

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHubhttps://github.com//issues/14#issuecomment-252030209, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AEEa85R5lCsHVuTQlwZhgVHbqCpcSRWwks5qxS3ugaJpZM4IPG8T.

Prof. Aaron Roodman
Chair Dept. of Particle Physics & Astrophysics
SLAC National Accelerator Laboratory
Stanford University

SLAC National Accelerator Laboratory E-mail: [email protected]:[email protected]
2575 Sand Hill Rd. Phone: 650-926-2705
MS 29
Menlo Park, CA 94025 URL: http://www.slac.stanford.edu/~roodman

from piff.

gbernstein commented on September 26, 2024

def vcorr2d(x,y,dx,dy,
            rmax=1., bins=513,
            maxpts = 30000):
    """
    Produce 2d 2-point correlation function of total displacement power
    for the supplied sample of data, using brute-force pair counting.
    Output are 2d arrays giving the 2PCF and then the number of pairs that
    went into each bin.  The 2PCF calculated is
    xi_+ - <vr1 vr2 + vt1 vt2> = <vx1 vx2 + vy1 vy2>
    Note that each pair is counted only once.  So to count all pairs one can
    average xi_+ with itself reflected about the origin.
    """

    hrange = [ [-rmax,rmax], [-rmax,rmax] ]
    if len(x) > maxpts:
        # Subsample array to get desired number of points
        rate = float(maxpts) / len(x)
        print "Subsampling rate {:5.3f}%".format(rate*100.)
        use = np.random.random(len(x)) <= rate
        x = x[use]
        y = y[use]
        dx = dx[use]
        dy = dy[use]
    print "Length ",len(x)
    # Get index arrays that make all unique pairs
    i1, i2 = np.triu_indices(len(x))
    # Omit self-pairs
    use = i1!=i2
    i1 = i1[use]
    i2 = i2[use]
    del use

    # Make separation vectors and count pairs
    yshift = y[i2]-y[i1]
    xshift = x[i2]-x[i1]
    counts = np.histogram2d(xshift,yshift, bins=bins, range=hrange)[0]

    # Accumulate displacement sums
    v =  dx + 1j*dy
    print 'xiplus' ##
    vv = dx[i1] * dx[i2] + dy[i1] * dy[i2]
    xiplus = np.histogram2d(xshift,yshift, bins=bins, range=hrange, weights=vv)[0]/counts

    return xiplus,counts

This is a vector version (dx and dy) but a scalar version just drops dy.

from piff.

jmeyers314 commented on September 26, 2024

Thanks @gbernstein . That works nicely.

I've coded up a GP that can use nearest neighbor interpolation of an empirically measured correlation function. I keep running into problems trying to Cholesky solve the implied covariance matrix though. (See algorithm 2.1 here). Cholesky seems like the way to go for a symmetric positive semidefinite matrix, but maybe there are more stable algorithms?

I wonder if part of the problem, and maybe the reason people tend to use parametric covariance functions in general, is that to work as a covariance, a function must be positive semi-definite. (See equation 4.2 and surrounding text here). It's not obvious to me one way or the other if a measured correlation function will always properly behave this way, and I think it's even less clear if any kind of interpolation of a measured covariance will behave well. Note that the Kriging literature, and Gentile, et al. in particular, also appear to use parametric forms for their covariograms -- the so-called "authorized" covariograms.

from piff.

TallJimbo commented on September 26, 2024

Naive Cholesky won't work unless you can guarantee positive definite (no semi-). There are symmetric indefinite factorizations that would be ideal, but they're sometimes hard to find, and a symmetric Eigensolver or SVD will also do the job (less efficiently).

from piff.

rmjarvis commented on September 26, 2024

There are symmetric indefinite factorizations that would be ideal, but they're sometimes hard to find, and a symmetric Eigensolver or SVD will also do the job (less efficiently).

TMV has that, so if you're willing to call out to a C/C++ function to do some work, we could use that. It does an L D LT decomposition, where L is a lower triangle matrix with unit diagonal, and D is a block diagonal matrix with 1x1 and 2x2 blocks. Any symmetric matrix can be so decomposed. If positive semi-definite, then I think the diagonals will end up all 1x1 blocks. Negative eigenvalues imply some 2x2 blocks.

from piff.

jmeyers314 commented on September 26, 2024

I think I have the GP using the measured covariance function more or less working now. I ended up using numpy SVD, and had to put an upper limit on the singular values to robustly invert the matrix. (Maybe TMV can do that too?) There are a lot of potential variations to investigate, so I don't think I can say yet how this compares to using a parametric form for the covariance, but it works well enough that I think it's worth including here as an option.

It occurs to me that we're going to have to do something more sophisticated to use GPs on real data. Thinking ahead to LSST, for each exposure we'd like to interpolate the PSF measured from ~10^4.5 stars to the positions of ~10^6 galaxies, which formally implies inverting at least one ~10^6 x 10^6 matrix, and maybe more if we think the covariances of distinct PSF parameters are different. We can probably reduce the complexity by independently working on subregions of the focal plane, but that will require some more thought/experimentation. One thing that might help is if there are any clever ways given a constant symmetric NxN matrix to then invert many symmetric N+1 x N+1 matrices each with one additional column/row.

from piff.

rmjarvis commented on September 26, 2024

One thing that might help is if there are any clever ways given a constant symmetric NxN matrix to then invert many symmetric N+1 x N+1 matrices each with one additional column/row.

There is for the Cholesky case. Also for the case where your covariance matrix is defined as X.T * X for some matrix X. (You can do a QR decomposition: X = Q*R rather than directly calculating S = X.T * X and then doing Cholesky. It's both faster and more numerically stable.)

In the latter case, you can think about this as adding a single row to X and finding the new R value (upper triangle matrix, corresponding to L.T in the normal Cholesky presentation).

And yes, TMV does this. :) It's even one of the few algorithms where TMV is faster than LAPACK.

from piff.

jmeyers314 commented on September 26, 2024

Umm actually.... I think I meant invert a Nstar x Nstar matrix, so maybe not quite as hard as I was thinking.

Also, just noticed that the latest scikit-learn (0.18) was released two weeks ago, and has many nice new GP features. It's probably worth finding out how this fares with ~10^4.5 training points and ~10^6 prediction points.

Two notable omissions though:

their radial kernels can have anisotropic scale-lengths, but only ones that are axis-aligned.
no directly measured and asserted covariances.

The code looks quite understandable though, so I'm optimistic I could monkey patch these into the rest of the sklearn framework.

from piff.

rmjarvis commented on September 26, 2024

I'd bet they would consider pull requests too if it's close but needs a feature or two.

from piff.

jmeyers314 commented on September 26, 2024

Quick note:

For 35000x35000 matrix (i.e., 35000 training PSFs, aka stars):

time to cholesky in george using default solver (i.e., using numpy.linalg): 600s
time to cholesky in george using HODLR solver (see http://dan.iel.fm/george/current/user/solvers/#george.HODLRSolver): 1s

I haven't explicitly timed it yet, but I believe sklearn is also using numpy's cholesky, so probably similar speed to the slow version of george.

Since I think 35,000 is roughly the size of problem we're anticipating for LSST, I'm now advocating we go with george in spite of its additional prerequisites (eigen).

from piff.

RobertLuptonTheGood commented on September 26, 2024

eigen is a pretty lightweight dependency (it is of course header only).

from piff.

gbernstein commented on September 26, 2024

Wow, that's insanely fast!

from piff.

jmeyers314 commented on September 26, 2024

Ugg. Looks like I was a little too quick to speak (again...).

While HODLR is indeed ~600x faster for some covariance functions, it seems the relative performance of HODLR and np.linalg depends a lot on the scale-length of the covariance. Weirdly, HODLR seems at its best the more dense the covariance matrix is (the longer the scale-length is). Will keep investigating... Regardless, it would be nice to have the option of testing different solvers using george.

from piff.

rmjarvis commented on September 26, 2024

eigen is a pretty lightweight dependency (it is of course header only).

But it's not pip installable. For now at least, that's a requirement of piff.

from piff.

jmeyers314 commented on September 26, 2024

The interfaces for sklearn and george are pretty similar. I bet it's not too hard to just wrap both.

It's pretty hard to say whether it's worth having the HODLR solver, or worth making other kinds of approximations, without some actual data available to see what the actual correlations look like.

from piff.

jmeyers314 commented on September 26, 2024

Hey @rmjarvis. Just pinging you again about this issue. Any more comments?

from piff.

rmjarvis commented on September 26, 2024

Thanks for the reminder. For some reason, I thought I was waiting for you for something on this. It looks fine to me. There was one more test that I had to loosen the tolerance for slightly on my laptop, but I think it's probably fine. I'll go ahead and merge it.

from piff.

jmeyers314 commented on September 26, 2024

No worries. My DM overlords will be happy to hear, and push around story points :)

from piff.

Gaussian process interpolation about piff HOT 48 CLOSED

Comments (48)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs