Have the code internally compute the jackknife error. Ideally the code can split th

Feature request : Integrated jack-knife error about treecorr HOT 8 CLOSED

cbonnett commented on June 14, 2024

Feature request : Integrated jack-knife error

from treecorr.

Comments (8)

rmjarvis commented on June 14, 2024

I was thinking of having at least 3 possible ways of specifying the jackknife regions:

user-specified regions given in the input catalog
a specified number of kmeans regions
a specified number of random regions

I'm still mulling over what the best user interface would be for this, so if you have ideas about how it should work, please feel free to comment here.
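
As a rough illustration of the three options listed above, here is a minimal NumPy sketch. The function names (`random_regions`, `kmeans_regions`, `nearest_center`) are hypothetical and not part of TreeCorr, the k-means is a plain Lloyd iteration rather than any particular package, and positions are treated as flat (x, y) coordinates, which is an assumption that would not hold for wide areas on the sky. User-specified regions need no code at all: they are simply an integer label per object read from the catalog.

```python
import numpy as np

def random_regions(n_points, n_regions, seed=0):
    """Assign every point a region label drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_regions, size=n_points)

def nearest_center(xy, centers):
    """Index of the closest center for every point."""
    # Squared distances, shape (n_points, n_regions).
    d2 = ((xy[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans_regions(xy, n_regions, n_iter=20, seed=0):
    """Plain Lloyd's k-means on flat (x, y) positions; returns (labels, centers)."""
    xy = np.asarray(xy, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from n_regions distinct points picked from the data.
    centers = xy[rng.choice(len(xy), size=n_regions, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(xy, centers)
        for r in range(n_regions):
            members = xy[labels == r]
            if len(members) > 0:  # keep the old center if a region went empty
                centers[r] = members.mean(axis=0)
    # Final labels are consistent with the returned centers.
    return nearest_center(xy, centers), centers

# Example: 500 points in a 10 x 10 box split into 8 k-means regions.
xy = np.random.default_rng(1).uniform(0, 10, size=(500, 2))
labels, centers = kmeans_regions(xy, n_regions=8)
```

Returning the centers alongside the labels means the same regions can be reapplied to another catalog by calling `nearest_center` with the saved centers.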

suchyta1 commented on June 14, 2024

Working on the Balrog paper, I've set up some wrappers to do automatic jackknifing with TreeCorr. I'm not saying what I have is the ideal interface, but I figured I'd share some thoughts that might be useful.

I do all my jackknifing with Erin Sheldon's k-means, and there are two ways to specify the regions. Either you give it N, the number of regions to generate, or you give it a file that you've generated yourself, from which it reads the k-means centers. I think something like this allows a reasonable range of functionality, at least as far as k-means is concerned.

Also, something I find useful is being able to return the full vector of correlation functions over the jackknife realizations, in addition to the covariance and the "full region" results. So I think it'd be nice to at least optionally return this object if you specify some keyword.
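
To make that concrete, here is a minimal sketch of what such a return could look like. It uses the standard delete-one jackknife covariance, C = (N-1)/N * sum_i (x_i - mean)(x_i - mean)^T, where x_i is the correlation function measured with region i left out. The names `jackknife_covariance`, `jackknife_summary` and the `return_realizations` keyword are hypothetical, chosen only for this illustration.

```python
import numpy as np

def jackknife_covariance(realizations):
    """Delete-one jackknife covariance.

    realizations: array of shape (n_regions, n_bins); row i is the
    statistic measured with region i left out.
    """
    realizations = np.asarray(realizations, dtype=float)
    n = realizations.shape[0]
    diff = realizations - realizations.mean(axis=0)
    return (n - 1) / n * diff.T @ diff

def jackknife_summary(full, realizations, return_realizations=False):
    """Return (full, cov), plus the per-region realizations on request."""
    cov = jackknife_covariance(realizations)
    if return_realizations:
        return full, cov, np.asarray(realizations, dtype=float)
    return full, cov
```

For a simple mean of N numbers this covariance reduces to the familiar sample variance divided by N, which is a handy sanity check.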

I can share code/ideas if that's useful, and would even be happy to contribute to the repository if useful. (...After I'm done writing my thesis.)

cpadavis commented on June 14, 2024

In the WZ taskforce there is definite interest in this. We currently have generated our own jackknife regions (so every dataset now has some jackknife parameter in addition to RA, DEC). What we do is then use treecorr to calculate D1D2(theta) for each of the combinations of jackknife pairs, skipping combinations which are obviously separated too far. Then we combine them together at the end of the day to create the jackknife samples (e.g. for Jackknife==1, add all the D1D2(theta) from jackknife pairs that do NOT have Jackknife==1 in it).
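
The combination step described above can be sketched as follows. This is a minimal NumPy illustration, not the WZ taskforce code: it assumes the per-region-pair counts have already been stored in a hypothetical array `pair_counts[a, b, k]` (catalog-1 point in region a, catalog-2 point in region b, separation bin k), with zeros for region pairs that were skipped as too far apart.

```python
import numpy as np

def leave_one_out_counts(pair_counts):
    """Combine per-region-pair counts into delete-one jackknife samples.

    pair_counts: array of shape (n_regions, n_regions, n_bins).
    Returns an array of shape (n_regions, n_bins); row i sums every
    region pair (a, b) with a != i and b != i.
    """
    pair_counts = np.asarray(pair_counts, dtype=float)
    n = pair_counts.shape[0]
    idx = np.arange(n)
    total = pair_counts.sum(axis=(0, 1))  # all pairs, shape (n_bins,)
    rows = pair_counts.sum(axis=1)        # rows[i]: catalog-1 point in region i
    cols = pair_counts.sum(axis=0)        # cols[i]: catalog-2 point in region i
    diag = pair_counts[idx, idx]          # both in region i, subtracted twice above
    return total - rows - cols + diag
```

Subtracting a row and a column from the total, then adding the diagonal term back, avoids looping over all region pairs for each jackknife sample.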

This is of course grossly inefficient, since treecorr could be doing it for you!

The thought we had was that if we told treecorr that we have N jackknife regions, it then generates N+1 D1D2(theta)'s. Then, when you go through your paircounts, you just add to all the paircounts except the one you want to skip.

Looking through BinnedCorr2.cpp, I think this amounts to changing the following:

_npairs[k] += nn;

to something like:

int j1 = c1.getJackknife();
int j2 = c2.getJackknife();
// Row i of _npairs holds the pair counts with region i left out.
for (int i = 0; i < _njackknives + 1; i++) {
    if (j1 != i && j2 != i) {
        _npairs[i][k] += nn;
    }
}

where now _npairs needs to have _njackknives+1 rows of D1D2(theta).

Does this mess with the tree algorithm? I guess it would force your tree to have end nodes that only include unique jackknives...

Which is all a long way of saying that I like Christopher Bonnett's way, where you pass it the regions in your input catalog. We also need to be able to access the individual realizations.

rmjarvis commented on June 14, 2024

Your proposal would require each cell to belong to a single jackknife region, which really means that each region would need to build its own tree, so you'd have nregion trees. This in turn implies that you might as well do this in Python, rather than in C++, since there isn't really any advantage to having the C++ layer know about the regions.

For that reason, my intention on this issue was to implement all this in Python, not C++, more or less as you described. So I don't think what you are doing is "grossly inefficient" in that I don't think you are doing any extra calculations that could be saved if TreeCorr were smarter.
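
A Python-level version of that idea might look like the sketch below. It is only an illustration under stated assumptions: `count_pairs` is a brute-force stand-in for whatever tree-based pair counter is actually used, `region_pair_counts` is a hypothetical helper name, positions are flat (x, y) coordinates, and the optimization of skipping region pairs that are obviously too far apart is left out.

```python
import numpy as np

def count_pairs(p1, p2, bin_edges):
    """Brute-force pair counts in separation bins (stand-in for a tree code)."""
    d = np.sqrt(((p1[:, None, :] - p2[None, :, :]) ** 2).sum(axis=-1))
    counts, _ = np.histogram(d.ravel(), bins=bin_edges)
    return counts

def region_pair_counts(p1, r1, p2, r2, n_regions, bin_edges):
    """Pair counts for every (region of catalog 1, region of catalog 2) combination.

    p1, p2: positions of shape (n, 2); r1, r2: integer region labels.
    Returns an array of shape (n_regions, n_regions, n_bins).
    """
    n_bins = len(bin_edges) - 1
    out = np.zeros((n_regions, n_regions, n_bins))
    for a in range(n_regions):
        s1 = p1[r1 == a]
        for b in range(n_regions):
            s2 = p2[r2 == b]
            if len(s1) > 0 and len(s2) > 0:
                out[a, b] = count_pairs(s1, s2, bin_edges)
    return out
```

Summing the result over both region axes recovers the pair counts of the full catalogs, so no pair is counted twice or lost by splitting the work per region.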

The main inefficiency is that users currently need to implement all that themselves, so there is user inefficiency in that, which I would like to fix by incorporating that algorithm into TreeCorr natively. If you'd like to take a stab at adding that feature using the code you are currently using in the WZ taskforce, feel free to send a pull request. :)

Or if you send me the algorithm you are using, that would also be helpful, since it would probably save me some debugging time.

Note: One thing that I'd like to enable when implementing this feature is a keyword to select between galaxy jackknife and pair jackknife; cf. Oliver Friedrich's analysis of the difference between the two.

Also, I'd planned to start with Chris's proposal of giving region numbers (or ids) in the catalog, but I'd probably also want to add an option to automatically generate regions using kmeans or somesuch.

rmjarvis commented on June 14, 2024

This is done in PR #96.

cbonnett commented on June 14, 2024

Oh yeah!

rmjarvis commented on June 14, 2024

I know. Finally, right? Only 5+ years later...

rmjarvis commented on June 14, 2024

Will be included in v4.1.0
