Comments (8)
I was thinking of having at least 3 possible ways of specifying the jackknife regions:
- user-specified regions given in the input catalog
- a specified number of kmeans regions
- a specified number of random regions
I'm still mulling over what the best user interface would be for this, so if you have ideas about how it should work, please feel free to comment here.
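To make the three options concrete, here is a minimal sketch in plain Python of how each one could produce a region label per object. All function names here are hypothetical and not part of TreeCorr, and the k-means version is a bare-bones Lloyd iteration on flat (x, y) coordinates; on the sky one would want spherical distances instead.

```python
import random

def regions_from_catalog(labels):
    """Option 1: the user supplies one region id per object in the catalog."""
    return list(labels)

def random_regions(n_objects, n_regions, seed=0):
    """Option 3: assign every object to a uniformly random region."""
    rng = random.Random(seed)
    return [rng.randrange(n_regions) for _ in range(n_objects)]

def _nearest(point, centers):
    # Index of the center closest to `point` (flat-sky Euclidean distance).
    return min(range(len(centers)),
               key=lambda k: (point[0] - centers[k][0]) ** 2
                           + (point[1] - centers[k][1]) ** 2)

def kmeans_regions(points, n_regions, n_iter=20, seed=0):
    """Option 2: plain Lloyd's k-means on a list of (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_regions)  # start from random objects
    for _ in range(n_iter):
        labels = [_nearest(p, centers) for p in points]
        for k in range(n_regions):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty region's center where it was
                centers[k] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    # Final assignment so the labels match the returned centers.
    labels = [_nearest(p, centers) for p in points]
    return labels, centers
```

Whichever option is used, the result is the same kind of object (one integer label per catalog entry), so the rest of the jackknife machinery would not need to know how the regions were made.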
from treecorr.
Working on the Balrog paper, I've set up some wrappers to do automatic jackknifing with TreeCorr. I'm not saying what I have is the ideal interface, but I figured I'd share some thoughts that might be useful.
I do all my jackknifing with Erin Sheldon's k-means, and there are two ways to specify the regions. Either you give it N, the number of regions to generate, or you give it a file of k-means centers that you've generated yourself. I think something like this allows a reasonable range of functionality, at least as far as k-means is concerned.
Also, something I find useful is being able to get back the full vector of correlation functions over the jackknife realizations, in addition to just the covariance and the "full region" results. So I think it'd be nice to return this object, at least optionally, if you specify some keyword.
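As an illustration of why the per-realization vectors are worth keeping, here is a small sketch in plain Python that builds the jackknife mean and covariance from them using the standard (N-1)/N delete-one prefactor. The function name and the `return_realizations` keyword are hypothetical, just to show what such an option could look like.

```python
def jackknife_summary(realizations, return_realizations=False):
    """Mean and covariance from delete-one jackknife realizations.

    realizations: list of N lists, each holding the binned correlation
    function measured with one region left out.
    """
    n = len(realizations)
    nbins = len(realizations[0])
    mean = [sum(r[b] for r in realizations) / n for b in range(nbins)]
    # Standard jackknife prefactor (N - 1) / N.
    factor = (n - 1) / n
    cov = [[factor * sum((r[a] - mean[a]) * (r[b] - mean[b])
                         for r in realizations)
            for b in range(nbins)]
           for a in range(nbins)]
    if return_realizations:
        # Hypothetical keyword: also hand back the raw realizations.
        return mean, cov, [list(r) for r in realizations]
    return mean, cov
```

With the realizations in hand, a user can also build other estimates (for example a different normalization, or a covariance across several statistics) without rerunning the pair counting.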
I can share code/ideas if that's useful, and would even be happy to contribute to the repository if useful. (...After I'm done writing my thesis.)
from treecorr.
In the WZ taskforce there is definite interest in this. We have currently generated our own jackknife regions (so every dataset now has a jackknife parameter in addition to RA, DEC). What we do is then use treecorr to calculate D1D2(theta) for each combination of jackknife pairs, skipping combinations which are obviously separated too far. Then we combine them at the end of the day to create the jackknife samples (e.g. for Jackknife==1, add all the D1D2(theta) from jackknife pairs that do NOT have Jackknife==1 in them).
This is of course grossly inefficient, since treecorr could be doing it for you!
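The combination step described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: `pair_counts` is a hypothetical dictionary mapping a pair of region ids to the binned D1D2(theta) counts already measured for that pair, and region ids are assumed to run from 0 to n_regions - 1.

```python
def leave_one_out_counts(pair_counts, n_regions):
    """Combine per-region-pair counts into jackknife samples.

    pair_counts: dict mapping (j1, j2) -> list of binned pair counts.
    Returns n_regions + 1 rows: row i sums every pair that does not
    involve region i, and the last row (index n_regions) excludes
    nothing, so it is the full-sample result.
    """
    nbins = len(next(iter(pair_counts.values())))
    rows = [[0.0] * nbins for _ in range(n_regions + 1)]
    for (j1, j2), counts in pair_counts.items():
        for i in range(n_regions + 1):
            # Skip this region pair for the sample that leaves out j1 or j2.
            if j1 != i and j2 != i:
                for k, nn in enumerate(counts):
                    rows[i][k] += nn
    return rows
```

Region pairs that were skipped because they are too far apart simply never appear in the dictionary, so they contribute nothing to any row.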
The thought we had was that if we told treecorr that we have N jackknife regions, it then generates N+1 D1D2(theta)'s. Then, when you go through your paircounts, you just add to all the paircounts except the one you want to skip.
Looking through BinnedCorr2.cpp, I think this amounts to changing the following:
_npairs[k] += nn;
to something like:
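// getJackknife() is a proposed accessor for the region id of a cell.
int j1 = c1.getJackknife();
int j2 = c2.getJackknife();
// Row i skips region i. Assuming ids run from 0 to _njackknives-1,
// the last row skips nothing and so holds the full-sample counts.
for (int i = 0; i < _njackknives + 1; i++) {
    if (j1 != i && j2 != i) {
        _npairs[i][k] += nn;
    }
}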
where now _npairs needs to have _njackknives+1 rows of D1D2(theta).
Does this mess with the tree algorithm? I guess it would force your tree to have end nodes that only include unique jackknives...
Which is all a long way of saying that I like Christopher Bonnett's way, where you pass it the regions in your input catalog. We also need to be able to access the individual realizations.
from treecorr.
Your proposal would require each cell to have a unique jackknife region. Which really means that each region would need to generate its tree separately, so you'd have nregion trees. This in turn implies that you might as well do this in Python, rather than in C++, since there isn't really any advantage to having the C++ layer know about the regions.
For that reason, my intention on this issue was to implement all this in Python, not C++, more or less as you described. So I don't think what you are doing is "grossly inefficient" in that I don't think you are doing any extra calculations that could be saved if TreeCorr were smarter.
The main inefficiency is on the user side: users currently need to implement all that themselves, which I would like to fix by incorporating that algorithm into TreeCorr natively. If you'd like to take a stab at adding that feature using the code you are currently using in the WZ taskforce, feel free to send a pull request. :)
Or if you send me the algorithm you are using, that would also be helpful, since it would probably save me some debugging time.
Note: One thing that I'd like to enable when implementing this feature is a keyword to select between galaxy jackknife and pair jackknife. Cf. Oliver Friedrich's analysis of the difference between the two.
Also, I'd planned to start with Chris's proposal of giving region numbers (or ids) in the catalog, but I'd probably also want to add an option to automatically generate regions using kmeans or somesuch.
from treecorr.
This is done in PR #96.
from treecorr.
Oh yeah!
from treecorr.
I know. Finally, right? Only 5+ years later...
from treecorr.
Will be included in v4.1.0.
from treecorr.