I think I got the anisotropic radial basis function kernel working (in sklearn). Check it out:
The interpolator is taking the 100 points in the left panel, optimizing the parameters of the 2D covariance matrix (+ nugget), and then making predictions. There's also a hidden step here where mildly noisy postage stamp images of Kolmogorov PSFs are being drawn and fit.
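For anyone who wants to see the idea without the sklearn machinery, here is a minimal numpy sketch of a squared-exponential kernel with a full 2x2 covariance of length scales, plus a nugget. This is not the code that made the plot. The names `aniso_rbf` and `gp_predict`, the toy data, and the fixed (not optimized) kernel parameters are all assumptions for illustration.

```python
import numpy as np

def aniso_rbf(xa, xb, cov):
    """Squared-exponential kernel with a full 2x2 matrix of length scales.

    k(a, b) = exp(-0.5 * (a - b)^T cov^{-1} (a - b))
    """
    d = xa[:, None, :] - xb[None, :, :]            # (na, nb, 2) separations
    cinv = np.linalg.inv(cov)
    r2 = np.einsum("ijk,kl,ijl->ij", d, cinv, d)   # squared Mahalanobis distance
    return np.exp(-0.5 * r2)

def gp_predict(x_train, y_train, x_test, cov, nugget):
    """Posterior mean of a zero-mean GP with the kernel above."""
    K = aniso_rbf(x_train, x_train, cov) + nugget * np.eye(len(x_train))
    Ks = aniso_rbf(x_test, x_train, cov)
    return Ks @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(100, 2))
# Toy "PSF parameter": varies along the (1, 1) direction, constant along (1, -1).
y_train = np.sin(3.0 * (x_train[:, 0] + x_train[:, 1]))

# Short correlation length along (1, 1), long along (1, -1). Chosen by hand here;
# in the real interpolator these entries (and the nugget) would be optimized.
cov = np.array([[0.10, -0.08],
                [-0.08, 0.10]])
nugget = 1e-4

resid = y_train - gp_predict(x_train, y_train, x_train, cov, nugget)
print("mean |residual| at the training points:", np.mean(np.abs(resid)))
```

The off-diagonal term in `cov` is what lets the kernel be elongated along an arbitrary direction rather than only along the u or v axis.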
from piff.
From a very cursory glance at knn_interp.py, it looks like we could probably do something very similar here with sklearn.gaussian_process.
from piff.
Rumour (== Dan Foreman-Mackay) has it that the next scikit-learn gaussian process code will be much better -- I'm not sure if it's just speed. I also don't know if the API changed.
from piff.
... although it doesn't look like there's any way in sklearn (or in DFM's george) to regress several functions (say, for fwhm, g1, g2, ...) simultaneously under a common covariance function. For a first implementation then, I suppose independent regressors may have to do.
from piff.
I'll just mention a nice paper by Gentile et al that gives a good description of GP (aka kriging), radial basis functions, inverse distance weighting and other interpolation schemes.
They are related in form (they all can be described with the formula $\hat z = \sum_i \lambda_i z(x_i)$), but with some non-trivial differences in how
I'm also not sure which ones of these could be extended to have
from piff.
Looks like Gentile et al are using independent regressors: Appendix A.1, step 2.
from piff.
Yes, that's how they did it. Some of the methods might be amenable to extension to vectors. But if not, probably PCAing the vectors and treating each principal component coefficient as a scalar would be appropriate.
For things like Kolmogorov with only a few parameters to fit (which are likely fairly orthogonal), there might not be much difference between the PCA approach and just the raw parameter values.
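A rough sketch of the PCA approach, assuming the fitted parameters are stacked into an (nstars, nparams) array. The helper names and the fake data are hypothetical; each column of `coeffs` would then be handed to its own scalar interpolator.

```python
import numpy as np

def pca_fit(params, n_keep):
    """Return the mean, the first n_keep principal axes, and variance fractions."""
    mean = params.mean(axis=0)
    _, s, vt = np.linalg.svd(params - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:n_keep], explained

def pca_transform(params, mean, components):
    """Project parameter vectors onto the kept principal axes."""
    return (params - mean) @ components.T

def pca_inverse(coeffs, mean, components):
    """Map (interpolated) PC coefficients back to parameter vectors."""
    return coeffs @ components + mean

# Fake data: 5 parameters per star driven by 2 latent quantities plus small noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 2))
mixing = rng.standard_normal((2, 5))
params = latent @ mixing + 1e-3 * rng.standard_normal((200, 5))

mean, components, explained = pca_fit(params, n_keep=2)
coeffs = pca_transform(params, mean, components)    # shape (200, 2)
recovered = pca_inverse(coeffs, mean, components)   # shape (200, 5)
print("variance fraction in first 2 PCs:", explained[:2].sum())
```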
from piff.
probably PCAing the vectors and treating each principal component coefficient as a scalar would be appropriate
That reminds me... I think I've actually done this before to make an optics PSF emulator. Just need to switch the input params from [Z2, Z3, Z4, Z5, Z6, ...] to [u, v]...
from piff.
The development version of scikit-learn GaussianProcessRegressor does look like a significant improvement. I think I can cook up something on the latest stable version of sklearn pretty quickly, but we should definitely come back to this once the new sklearn is released.
from piff.
What's the purpose of kNNInterp.getFitProperties()? Don't we always want to interpolate the entirety of star.fit.params?
from piff.
You're very likely right. I can't think of a use case for it -- I think I ended up coding it that way since it looked just like the getProperties, only for star.fit instead of star.data. In fact, I think it's actually bad, because it means some fit parameters could be ignored!
Edit: The only use case I can come up with is that maybe some other version of knn interp might want to interpolate some other set of properties that we are not actively fitting with our model (ie something that is in star but isn't star.fit.params). But even that doesn't actually jibe with the way it is actually written.
from piff.
I think I got the basic implementation of this with sklearn down now. Unfortunately, it doesn't work very well. The basic problem is that while sklearn gaussian_process will let you set the parameter that describes the noise variance in the regression (affectionately termed the "nugget"), there isn't any facility to optimize the log marginal likelihood w.r.t. the nugget in the same way that the log marginal likelihood is optimized w.r.t. the correlation length scale. It looks like george and development sklearn both do have this ability though.
So... I see a few options:
- Let the user set the nugget. This requires some sort of guess for how noisy the star.fit.params are, but is really easy for us to implement. I'll have to think about how to propagate the user-set nugget if we turn on PCA.
- Wrap existing sklearn optimization to also optimize the nugget.
- Use george, which is partially pip-installable but also depends on eigen, which might be a bit too steep of a dependency here? (though should be fast!)
- Use development sklearn before it's released.
- Wait for new sklearn.
I'm tempted to start with 1, add 2, and then add 3 as an optional enhancement+dependency.
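To make the second option concrete, here is a minimal sketch of optimizing the log marginal likelihood over the nugget as well as the length scale. It uses plain numpy and a brute-force grid rather than wrapping sklearn's optimizer, and it assumes a unit-amplitude squared-exponential kernel; the function names and toy data are made up for illustration.

```python
import numpy as np

def sq_exp(xa, xb, length):
    """Isotropic squared-exponential kernel with unit amplitude."""
    d2 = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def log_marginal_likelihood(x, y, length, nugget):
    """log p(y | x, length, nugget) for a zero-mean GP."""
    n = len(y)
    K = sq_exp(x, x, length) + nugget * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -0.5 y^T K^-1 y - 0.5 log|K| - 0.5 n log(2 pi), with log|K| = 2 sum(log diag L)
    return -0.5 * (y @ alpha) - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(2)
x = rng.uniform(size=(80, 2))
noise_sigma = 0.1                                   # true noise variance is 0.01
y = np.sin(4.0 * x[:, 0]) + noise_sigma * rng.standard_normal(80)

lengths = np.logspace(-1.5, 0.5, 9)
nuggets = np.logspace(-4, 0, 9)
best_lml, best_length, best_nugget = max(
    ((log_marginal_likelihood(x, y, l, s), l, s) for l in lengths for s in nuggets),
    key=lambda t: t[0],
)
print("best length scale:", best_length, "best nugget:", best_nugget)
```

A grid is only for clarity; a gradient-based optimizer over log(length) and log(nugget) would be the more natural thing to wrap around the existing fit.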
from piff.
It looks like you're testing it against a polynomial variation, rather than something that is actually generated as a Gaussian process. Do we expect it to work well in this case? Polynomials kind of violate some of the assumptions that GP makes I think.
You could generate a real Gaussian field using the PowerSpectrum class in GalSim. We normally use that for generating shear fields with some cosmological power spectrum. But you could also use it to generate size and shapes of PSFs that follow some input power spectrum. That might be a fairer test of a GP interpolator.
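As a stand-in for the GalSim route (this is not the PowerSpectrum API, just a numpy sketch), one can draw a realization of a Gaussian field directly at the star positions from an assumed covariance function. The squared-exponential form, the `draw_gaussian_field` name, and the numbers below are all assumptions.

```python
import numpy as np

def draw_gaussian_field(xy, length, amplitude, rng):
    """Draw one zero-mean Gaussian-field realization at positions xy."""
    d2 = np.sum((xy[:, None, :] - xy[None, :, :]) ** 2, axis=-1)
    cov = amplitude**2 * np.exp(-0.5 * d2 / length**2)
    # Use an eigendecomposition instead of Cholesky so that tiny negative
    # eigenvalues from round-off do not make the factorization fail.
    w, v = np.linalg.eigh(cov)
    w = np.clip(w, 0.0, None)
    return v @ (np.sqrt(w) * rng.standard_normal(len(xy)))

rng = np.random.default_rng(3)
xy = rng.uniform(size=(100, 2))
g1_true = draw_gaussian_field(xy, length=0.3, amplitude=0.05, rng=rng)
print("rms of the drawn g1 field:", np.std(g1_true))
```

By construction the truth then satisfies the GP assumptions exactly, which the polynomial test does not.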
from piff.
Do we expect it to work well in this case?
I think it ought to work well for any sufficiently smooth function.
You could generate a real Gaussian field using the PowerSpectrum class in GalSim.
Agree that this would be an interesting test case.
One thing I've noticed is that the GP looks like it's trying to smooth the data more and more each iteration. The results are much nicer looking if you don't actually iterate.
from piff.
I've made some progress here. Caught a couple of bugs in Kolmogorov that were giving me problems before.
Here's the results for interpolating the g1 parameter for a Kolmogorov PSF, where the true g1 varies as a polynomial over the field (4 other PSF params not plotted are also varying over the field as polynomials):
Here's the same, but this time, the truth is generated from galsim.PowerSpectrum:
Not too good. But if you PCA the fitted PSF parameters first, and then only keep the first 2 PCs (which in this case explain ~99.3% of the parameter variance), you get:
I believe the improvement is due to the fact that the GP interpolator optimization step, which looks for a good correlation length, is better able to latch onto the signal (instead of noise) in the PCA case. The optimized inverse length scales are 0.002 without PCA, and 0.38 with.
I tried the same thing on the polynomial example, but there the difference was negligible, maybe due to the fact that the interpolation was already really good there.
from piff.
Cool stuff!
When you PCA the fitted PSF parameters, do you mean over star.fit.params? Don't you have only 3 or 5 parameters for the Kolmogorov? (Are you doing the centroid adjustments?) I'm surprised that essentially throwing out one parameter makes such a dramatic improvement... Also, how are you generating the terms in the PS case? Do they each get their own PS? (if size, g1, g2 are generated independently, how can a linear combination of them do better? But on the other hand, if g1 and g2 are constrained by the overall ellipticity length, maybe the PCA is picking up on that constraint.)
from piff.
When you PCA the fitted PSF parameters, do you mean over star.fit.params?
Yep. I'm doing the 5 param version (centroid is part of star.fit.params).
if size, g1, g2 are generated independently, how can a linear combination of them do better?
That's a good point. I was lazy for the GP tests and just set the true fwhm and centering params to constants. I'll try again generating them from their own distributions though.
from piff.
I was discussing this thread with @pmelchior and @TallJimbo on Tuesday when I was up at Princeton, and they thought there should be some place where you figure out what the correlation function of the GP is. If you aren't setting something explicit, they thought sklearn is probably assuming a Gaussian (or "squared exponential" is apparently a preferred term to avoid confusion with the "Gaussian" of the GP).
In the Gentile et al discussion of kriging, they calculate something they call a variogram, gamma(h), which is connected to the correlation function C(h) as gamma(h) = C(0) - C(h). I think from what Peter and Jim were saying about this that C(0) is what sklearn calls the nugget.
Anyway, sklearn might be doing something semi-reasonable without an explicit variogram (or correlation function), but I suspect we'll want to calculate it ourselves and provide it explicitly.
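For reference, measuring an isotropic variogram from the data is only a few lines. This sketch uses the usual estimator gamma(h) = 0.5 * <(z_i - z_j)^2> over pairs at separation h; the function name, binning, and test data are assumptions, and it is not taken from Gentile et al.

```python
import numpy as np

def empirical_variogram(xy, z, bins):
    """Isotropic semivariogram: 0.5 * mean squared difference per separation bin."""
    i, j = np.triu_indices(len(z), k=1)             # each unique pair once
    h = np.hypot(*(xy[i] - xy[j]).T)                # pair separations
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2
    num, _ = np.histogram(h, bins=bins, weights=half_sq_diff)
    cnt, _ = np.histogram(h, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / cnt, cnt                       # NaN where a bin is empty

rng = np.random.default_rng(4)
xy = rng.uniform(size=(300, 2))
z = rng.standard_normal(300)                        # pure white noise, variance 1
bins = np.linspace(0.0, 1.0, 11)
gamma, counts = empirical_variogram(xy, z, bins)
print(gamma)
```

For uncorrelated noise like this the variogram is flat at the noise variance for every h > 0; a correlated field would instead rise from a small value at short separations toward a plateau.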
from piff.
they thought there should be some place where you figure out what the correlation function of the GP is
Yep. This is done by the sklearn.gaussian_process.GaussianProcess.fit() method. Current sklearn only optimizes the correlation length of the squared exponential kernel (there are other kernels available, but I think squared exponential makes the most sense).
I'm less familiar with the "variogram" terminology, but you might be right that the nugget is at least related to C(0).
In the long run, I think we'll want at least some kind of prior over what kernel parameters are reasonable, potentially informed by something like observations of high stellar density fields. I don't know that we'll want to assert something outright, since it's quite likely the correlation function of PSF parameters varies exposure to exposure.
from piff.
I think squared exponential makes the most sense).
I don't think that's likely to be the appropriate kernel for the atmosphere. Why can't we just measure it directly from the data? We should have all the information we need to do so.
from piff.
I don't think that's likely to be the appropriate kernel for the atmosphere. Why can't we just measure it directly from the data? We should have all the information we need to do so.
Hmmm. That's not something I've ever seen done with a GP, but could be interesting to investigate. We could certainly try estimating the parameters of a von Karman correlation function (essentially r0 and an outer scale) if a non-parametric correlation function proves too tricky.
from piff.
Actually, from Heymans++12 eq. 8, it looks like a von Karman correlation function may be the same as a Matern(-5/6) correlation function.
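To spell out the comparison (a sketch under the assumption that the von Karman correlation takes the Bessel-function form below, with $\theta_0$ standing in for the outer-scale parameter), the Matern family with smoothness $\nu$ and scale $\ell$ is

```latex
C_\nu(r) = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)}
           \left( \frac{\sqrt{2\nu}\, r}{\ell} \right)^{\nu}
           K_\nu\!\left( \frac{\sqrt{2\nu}\, r}{\ell} \right),
\qquad
\xi_{\mathrm{vK}}(r) \propto r^{5/6} \, K_{-5/6}\!\left( \frac{2\pi r}{\theta_0} \right).
```

Since the modified Bessel function satisfies $K_{-\nu}(x) = K_{\nu}(x)$, the two have the same functional form for $\nu = 5/6$ with $\sqrt{2\nu}/\ell = 2\pi/\theta_0$, so "Matern(-5/6)" and Matern with $\nu = 5/6$ describe the same kernel.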
from piff.
It is indeed surprising: GP models depend profoundly on the covariance/correlation/variogram function, and yet nobody seems to fit it as part of the inference. It's often "our data looks like that" or "this is what worked best", whatever either of those statements means.
I haven't done a literature search, but I'd guess that this has at least been tried. My second guess is that this is a non-trivial modification of the GP likelihood. It's maybe easier to split this into an iterative solution where one
- solves the regression problem with GP given a covariance
- solves for the covariance given the residuals of the regression model
from piff.
Actually, from Heymans++12 eq. 8, it looks like a von Karman correlation function may be the same as a Matern(-5/6) correlation function.
Although... it's not clear to me why the correlation of the ellipticities should look like the correlation in an instantaneous phase screen.
from piff.
I'm pretty sure the Gentile et al implementation of kriging just involved measuring the correlations on the data directly. I guess I'm not following why this is not an easy thing to do. Is it just that sklearn doesn't have an easy way to put in an empirical correlation function?
BTW, for DES we've lately been looking at the PSF patterns of the presumably atmospheric component and there is a clear directionality to the streaks, so we've been thinking that we might even want the correlation function to be anisotropic. This I expect would be difficult to incorporate, since I think the code probably expects the kernel to be isotropic (i.e. C(r,theta) == C(r)). But at least getting the radial function from the data seems like it should be straightforward.
from piff.
Not sure about sklearn, but multivariate covariances are analytically treatable.
from piff.
I'm pretty sure anisotropic covariances are possible with sklearn. The theta kwarg to GaussianProcess just becomes multidimensional.
from piff.
@rmjarvis , I'm tooling up to try and answer your question about injecting a measured covariance function instead of optimizing a parametric one. This involves rolling a new GP interpolator by hand (sklearn won't do it), which I think I've got more or less running now, at least for the isotropic case. (Shouldn't be too hard to extend this to anisotropic).
TreeCorr seems like a natural place to do the measurement part. I only looked at this briefly, but it seems like this only handles isotropic correlation functions, right? How much work would it be to implement anisotropic correlation function measurements? Or is there another tool I should look at?
I suppose I can start in the meantime by injecting the correlation function used to generate fake data into the GP and comparing that to an optimized parametric correlation function GP.
from piff.
TreeCorr seems like a natural place to do the measurement part. I only looked at this briefly, but it seems like this only handles isotropic correlation functions, right? How much work would it be to implement anisotropic correlation function measurements? Or is there another tool I should look at?
Yes, TreeCorr only does isotropic. Although, for the number of stars we have, I suspect the tree algorithm isn't that much of an advantage. Just doing the brute force correlation as a direct sum will already be fairly fast. (Probably even in python.) So for the anisotropic binning, we can just do that.
So probably go ahead and use TreeCorr for now for the isotropic one. I think we can leave the anisotropic one for a later issue. When we get to that, we can try to decide whether it makes more sense to add that feature to TreeCorr or just implement the direct sum in Piff.
from piff.
I've done the 2d 2PCF with all pairs in numpy (histogram2d) successfully. Takes a minute or so on my laptop to do all pairs of ~20k objects. Let me know if you want to see any code - it's nothing clever or fancy.
from piff.
yes that would be useful - otherwise I was going to do the same…
from piff.
import numpy as np

def vcorr2d(x, y, dx, dy,
            rmax=1., bins=513,
            maxpts=30000):
    """
    Produce 2d 2-point correlation function of total displacement power
    for the supplied sample of data, using brute-force pair counting.
    Output are 2d arrays giving the 2PCF and then the number of pairs that
    went into each bin.  The 2PCF calculated is
        xi_+ = <vr1 vr2 + vt1 vt2> = <vx1 vx2 + vy1 vy2>
    Note that each pair is counted only once.  So to count all pairs one can
    average xi_+ with itself reflected about the origin.
    """
    hrange = [[-rmax, rmax], [-rmax, rmax]]
    if len(x) > maxpts:
        # Subsample array to get desired number of points
        rate = float(maxpts) / len(x)
        print("Subsampling rate {:5.3f}%".format(rate * 100.))
        use = np.random.random(len(x)) <= rate
        x = x[use]
        y = y[use]
        dx = dx[use]
        dy = dy[use]
        print("Length ", len(x))
    # Get index arrays that make all unique pairs; k=1 omits self-pairs
    i1, i2 = np.triu_indices(len(x), k=1)
    # Make separation vectors and count pairs
    yshift = y[i2] - y[i1]
    xshift = x[i2] - x[i1]
    counts = np.histogram2d(xshift, yshift, bins=bins, range=hrange)[0]
    # Accumulate displacement sums
    vv = dx[i1] * dx[i2] + dy[i1] * dy[i2]
    sums = np.histogram2d(xshift, yshift, bins=bins, range=hrange, weights=vv)[0]
    # Bins with no pairs come out as NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        xiplus = sums / counts
    return xiplus, counts
This is a vector version (dx and dy) but a scalar version just drops dy.
from piff.
Thanks @gbernstein . That works nicely.
I've coded up a GP that can use nearest neighbor interpolation of an empirically measured correlation function. I keep running into problems trying to Cholesky solve the implied covariance matrix though. (See algorithm 2.1 here). Cholesky seems like the way to go for a symmetric positive semidefinite matrix, but maybe there are more stable algorithms?
I wonder if part of the problem, and maybe the reason people tend to use parametric covariance functions in general, is that to work as a covariance, a function must be positive semi-definite. (See equation 4.2 and surrounding text here). It's not obvious to me one way or the other if a measured correlation function will always properly behave this way, and I think it's even less clear if any kind of interpolation of a measured covariance will behave well. Note that the Kriging literature, and Gentile, et al. in particular, also appear to use parametric forms for their covariograms -- the so-called "authorized" covariograms.
from piff.
Naive Cholesky won't work unless you can guarantee positive definite (no semi-). There are symmetric indefinite factorizations that would be ideal, but they're sometimes hard to find, and a symmetric Eigensolver or SVD will also do the job (less efficiently).
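A minimal sketch of the eigensolver route in numpy, assuming the covariance matrix is symmetric but possibly singular or slightly indefinite. Eigenvalues below a relative floor are simply dropped, which amounts to a pseudo-inverse; the function name and the threshold are arbitrary choices for illustration.

```python
import numpy as np

def solve_symmetric_clipped(K, y, rel_floor=1e-10):
    """Solve K @ alpha = y using only eigenmodes with eigenvalue above a floor."""
    w, v = np.linalg.eigh(K)                        # K = v @ diag(w) @ v.T
    floor = rel_floor * np.max(np.abs(w))
    keep = w > floor                                # drops tiny and negative modes
    coeffs = (v[:, keep].T @ y) / w[keep]
    return v[:, keep] @ coeffs, int(np.sum(~keep))

# A positive semi-definite but rank-deficient matrix: 50 x 50 with rank 30.
rng = np.random.default_rng(5)
A = rng.standard_normal((50, 30))
K = A @ A.T
y = K @ rng.standard_normal(50)                     # guaranteed to lie in range(K)

alpha, n_dropped = solve_symmetric_clipped(K, y)
print("modes dropped:", n_dropped)
print("max |K alpha - y|:", np.max(np.abs(K @ alpha - y)))
```

This costs more than a Cholesky factorization, but it never fails on a semi-definite input, and the number of dropped modes is a handy diagnostic of how badly behaved the measured covariance is.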
from piff.
There are symmetric indefinite factorizations that would be ideal, but they're sometimes hard to find, and a symmetric Eigensolver or SVD will also do the job (less efficiently).
TMV has that, so if you're willing to call out to a C/C++ function to do some work, we could use that. It does an L D LT decomposition, where L is a lower triangle matrix with unit diagonal, and D is a block diagonal matrix with 1x1 and 2x2 blocks. Any symmetric matrix can be so decomposed. If positive semi-definite, then I think the diagonals will end up all 1x1 blocks. Negative eigenvalues imply some 2x2 blocks.
from piff.
I think I have the GP using the measured covariance function more or less working now. I ended up using numpy SVD, and had to put an upper limit on the singular values to robustly invert the matrix. (Maybe TMV can do that too?) There are a lot of potential variations to investigate, so I don't think I can say yet how this compares to using a parametric form for the covariance, but it works well enough that I think it's worth including here as an option.
It occurs to me that we're going to have to do something more sophisticated to use GPs on real data. Thinking ahead to LSST, for each exposure we'd like to interpolate the PSF measured from ~10^4.5 stars to the positions of ~10^6 galaxies, which formally implies inverting at least one ~10^6 x 10^6 matrix, and maybe more if we think the covariances of distinct PSF parameters are different. We can probably reduce the complexity by independently working on subregions of the focal plane, but that will require some more thought/experimentation. One thing that might help is if there are any clever ways given a constant symmetric NxN matrix to then invert many symmetric N+1 x N+1 matrices each with one additional column/row.
from piff.
One thing that might help is if there are any clever ways given a constant symmetric NxN matrix to then invert many symmetric N+1 x N+1 matrices each with one additional column/row.
There is for the Cholesky case. Also for the case where your covariance matrix is defined as X.T * X for some matrix X. (You can do a QR decomposition: X = Q*R rather than directly calculating S = X.T * X and then doing Cholesky. It's both faster and more numerically stable.)
In the latter case, you can think about this as adding a single row to X and finding the new R value (upper triangle matrix, corresponding to L.T in the normal Cholesky presentation).
And yes, TMV does this. :) It's even one of the few algorithms where TMV is faster than LAPACK.
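For the Cholesky case, the update for one extra row/column looks roughly like this in numpy. This is a sketch of the standard bordering trick, not TMV's implementation; `cholesky_append` is a made-up name, and a real version would use a triangular solver instead of the generic `np.linalg.solve`.

```python
import numpy as np

def cholesky_append(L, b, c):
    """Given lower-triangular L with L @ L.T = A, return the Cholesky factor
    of the bordered matrix [[A, b], [b.T, c]] without refactorizing A."""
    n = L.shape[0]
    # Solve L @ l = b. A triangular solve would make this O(n^2).
    l = np.linalg.solve(L, b)
    d2 = c - l @ l
    if d2 <= 0:
        raise np.linalg.LinAlgError("bordered matrix is not positive definite")
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l
    L_new[n, n] = np.sqrt(d2)
    return L_new

# Check against a direct factorization of a random positive definite matrix.
rng = np.random.default_rng(6)
B = rng.standard_normal((6, 6))
M = B @ B.T + 6.0 * np.eye(6)

L_small = np.linalg.cholesky(M[:5, :5])
L_big = cholesky_append(L_small, M[:5, 5], M[5, 5])
print("matches direct Cholesky:", np.allclose(L_big, np.linalg.cholesky(M)))
```

Because only the last row of the factor changes, the fixed N x N factorization can be reused for each of the many bordered matrices.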
from piff.
Umm actually.... I think I meant invert an Nstar x Nstar matrix, so maybe not quite as hard as I was thinking.
Also, just noticed that the latest scikit-learn (0.18) was released two weeks ago, and has many nice new GP features. It's probably worth finding out how this fares with ~10^4.5 training points and ~10^6 prediction points.
Two notable omissions though:
- their radial kernels can have anisotropic scale-lengths, but only ones that are axis-aligned.
- no directly measured and asserted covariances.
The code looks quite understandable though, so I'm optimistic I could monkey patch these into the rest of the sklearn framework.
from piff.
I'd bet they would consider pull requests too if it's close but needs a feature or two.
from piff.
Quick note:
For 35000x35000 matrix (i.e., 35000 training PSFs, aka stars):
time to cholesky in george using default solver (i.e., using numpy.linalg): 600s
time to cholesky in george using HODLR solver (see http://dan.iel.fm/george/current/user/solvers/#george.HODLRSolver): 1s
I haven't explicitly timed it yet, but I believe sklearn is also using numpy's cholesky, so probably similar speed to the slow version of george.
Since I think 35,000 is roughly the size of problem we're anticipating for LSST, I'm now advocating we go with george in spite of its additional prerequisites (eigen).
from piff.
eigen is a pretty lightweight dependency (it is of course header only).
from piff.
Wow, that's insanely fast!
from piff.
Ugg. Looks like I was a little too quick to speak (again...).
While HODLR is indeed ~600x faster for some covariance functions, it seems the relative performance of HODLR and np.linalg depends a lot on the scale-length of the covariance. Weirdly, HODLR seems at its best the more dense the covariance matrix is (the longer the scale-length is). Will keep investigating... Regardless, it would be nice to have the option of testing different solvers using george.
from piff.
eigen is a pretty lightweight dependency (it is of course header only).
But it's not pip installable. For now at least, that's a requirement of piff.
from piff.
The interfaces for sklearn and george are pretty similar. I bet it's not too hard to just wrap both.
It's pretty hard to say whether it's worth having the HODLR solver, or worth making other kinds of approximations, without some actual data available to see what the actual correlations look like.
from piff.
Hey @rmjarvis. Just pinging you again about this issue. Any more comments?
from piff.
Thanks for the reminder. For some reason, I thought I was waiting for you for something on this. It looks fine to me. There was one more test that I had to loosen the tolerance for slightly on my laptop, but I think it's probably fine. I'll go ahead and merge it.
from piff.
No worries. My DM overlords will be happy to hear, and push around story points :)
from piff.