
Comments (8)

rmjarvis commented on June 14, 2024

I was thinking of having at least 3 possible ways of specifying the jackknife regions:

  • user-specified regions given in the input catalog
  • a specified number of kmeans regions
  • a specified number of random regions

I'm still mulling over what the best user interface would be for this, so if you have ideas about how it should work, please feel free to comment here.

from treecorr.

suchyta1 commented on June 14, 2024

Working on the Balrog paper, I've set up some wrappers to do automatic jackknifing with TreeCorr. I'm not saying what I have is the ideal interface, but I figured I'd share some thoughts that might be useful.

I do all my jackknifing with Erin Sheldon's k-means, and there are two ways to specify the regions: either you give it N, the number of regions to generate, or you give it a file of k-means centers that you've generated yourself. I think something like this allows a reasonable range of functionality, at least as far as k-means is concerned.
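As a rough illustration of the second option (reading pre-computed centers), here is a minimal sketch that labels each object with its nearest center. The function name `assign_regions` is hypothetical and is not the API of Erin Sheldon's k-means code or of TreeCorr. It also assumes flat (x, y) coordinates with Euclidean distance; real RA/Dec data would need a spherical distance instead.

```python
def assign_regions(points, centers):
    """Return, for each (x, y) point, the index of the nearest center.

    Hypothetical helper: assumes flat coordinates and Euclidean distance.
    """
    labels = []
    for x, y in points:
        # Squared distance to every center; the smallest one wins.
        d2 = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
        labels.append(d2.index(min(d2)))
    return labels


# Example: two centers, four objects.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 8.0)]
labels = assign_regions(points, centers)
```

Storing the centers rather than the labels means the same regions can be reapplied to any other catalog covering the same footprint.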

Also, something I find useful is being able to return the full vector of correlation functions over the jackknife realizations, in addition to the covariance and the "full region" results. So I think it'd be nice to return this object, at least optionally, when you specify some keyword.
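To make the relationship between the realizations and the covariance concrete, here is a minimal sketch of the standard delete-one jackknife covariance computed from N leave-one-out vectors. The function name `jackknife_covariance` is made up for illustration and is not a TreeCorr call.

```python
def jackknife_covariance(realizations):
    """Delete-one jackknife mean and covariance.

    realizations: list of N leave-one-out vectors (lists of floats, equal length).
    Returns (mean, cov), with cov[a][b] = (N-1)/N * sum_i (x_i[a]-mean[a]) * (x_i[b]-mean[b]).
    """
    n = len(realizations)
    nbins = len(realizations[0])
    mean = [sum(r[k] for r in realizations) / n for k in range(nbins)]
    cov = [
        [
            (n - 1) / n * sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in realizations)
            for b in range(nbins)
        ]
        for a in range(nbins)
    ]
    return mean, cov


# Example: three realizations of a two-bin correlation function.
mean, cov = jackknife_covariance([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

Keeping the realizations around lets a user recompute the covariance for any derived quantity, which is the reason to return them alongside the covariance.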

I can share code/ideas if that's useful, and would even be happy to contribute to the repository. (...After I'm done writing my thesis.)


cpadavis commented on June 14, 2024

In the WZ taskforce there is definite interest in this. We have currently generated our own jackknife regions (so every dataset now has a jackknife parameter in addition to RA, DEC). We then use treecorr to calculate D1D2(theta) for each combination of jackknife-region pairs, skipping combinations that are obviously separated too far. At the end we combine them to create the jackknife samples (e.g. for Jackknife==1, add all the D1D2(theta) from region pairs in which neither region has Jackknife==1).

This is of course grossly inefficient, since treecorr could be doing it for you!
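The combination step described above can be sketched as follows. This is only an illustration under stated assumptions: `count_pairs` is a brute-force stand-in for a TreeCorr run on one pair of regions, `jackknife_counts` is a hypothetical name, coordinates are flat (x, y), and the shortcut of skipping widely separated region pairs is left out.

```python
from itertools import product
from math import hypot


def count_pairs(pts1, pts2, edges):
    """Brute-force cross pair counts per separation bin (stand-in for TreeCorr)."""
    counts = [0] * (len(edges) - 1)
    for (x1, y1), (x2, y2) in product(pts1, pts2):
        r = hypot(x1 - x2, y1 - y2)
        for k in range(len(counts)):
            if edges[k] <= r < edges[k + 1]:
                counts[k] += 1
                break
    return counts


def jackknife_counts(cat1, cat2, edges):
    """cat1, cat2: dict mapping region id -> list of (x, y) points.

    Returns (full, loo), where loo[i] holds the counts with region i left out.
    """
    nbins = len(edges) - 1
    # One measurement per (region of cat1, region of cat2) combination.
    table = {(a, b): count_pairs(cat1[a], cat2[b], edges) for a in cat1 for b in cat2}
    full = [sum(t[k] for t in table.values()) for k in range(nbins)]
    loo = {}
    for i in sorted(set(cat1) | set(cat2)):
        # Keep only the combinations in which neither side is region i.
        loo[i] = [
            sum(t[k] for (a, b), t in table.items() if a != i and b != i)
            for k in range(nbins)
        ]
    return full, loo
```

Each region-pair measurement is computed once and reused in every jackknife sample that does not exclude it, so no pair is counted more often than in a single full run.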

The thought we had was that if we told treecorr that we have N jackknife regions, it would then generate N+1 D1D2(theta)'s. Then, when you go through your paircounts, you add each pair to every set of paircounts except the ones for the regions that pair touches.

Looking through BinnedCorr2.cpp, I think this amounts to changing the following:

_npairs[k] += nn;

to something like:

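// Region ids are assumed to run from 1 to _njackknives, so row 0 holds the full sample.
int j1 = c1.getJackknife();
int j2 = c2.getJackknife();
for (int i = 0; i < _njackknives + 1; i++) {
    // Row i excludes every pair that touches region i.
    if (j1 != i && j2 != i) {
        _npairs[i][k] += nn;
    }
}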

where now _npairs needs to have _njackknives+1 rows of D1D2(theta).
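For readers more comfortable in Python, the same accumulation into N+1 rows might look like the sketch below. It assumes region ids run from 1 to N, so that row 0 never matches a region and ends up holding the full-sample counts; the name `accumulate` is hypothetical.

```python
def accumulate(npairs, j1, j2, k, nn):
    """Add nn pairs in bin k to every row whose excluded region is not j1 or j2.

    npairs has N+1 rows; row i is the sample with region i removed (row 0: nothing removed).
    """
    for i in range(len(npairs)):
        if j1 != i and j2 != i:
            npairs[i][k] += nn


# Example with N = 2 regions and 2 separation bins.
npairs = [[0, 0] for _ in range(3)]
accumulate(npairs, 1, 2, 0, 5)  # cross pair between regions 1 and 2: only row 0 gets it
accumulate(npairs, 1, 1, 1, 3)  # pair inside region 1: rows 0 and 2 get it
```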

Does this mess with the tree algorithm? I guess it would force your tree to have end nodes that only include unique jackknives...

Which is all a long way of saying that I like Christopher Bonnett's way, where you pass it the regions in your input catalog. We also need to be able to access the individual realizations.


rmjarvis commented on June 14, 2024

Your proposal would require each cell to have a unique jackknife region, which really means that each region would need to build its tree separately, so you'd have nregion trees. This in turn implies that you might as well do this in Python, rather than in C++, since there isn't really any advantage to having the C++ layer know about the regions.

For that reason, my intention on this issue was to implement all this in Python, not C++, more or less as you described. So I don't think what you are doing is "grossly inefficient" in that I don't think you are doing any extra calculations that could be saved if TreeCorr were smarter.

The main inefficiency is that users currently need to implement all that themselves, so there is user inefficiency in that, which I would like to fix by incorporating that algorithm into TreeCorr natively. If you'd like to take a stab at adding that feature using the code you are currently using in the WZ taskforce, feel free to send a pull request. :)

Or if you send me the algorithm you are using, that would also be helpful, since it would probably save me some debugging time.

Note: One thing that I'd like to enable when implementing this feature is a keyword to select between galaxy jackknife and pair jackknife. Cf. Oliver Friedrich's analysis of the difference between the two.

Also, I'd planned to start with Chris's proposal of giving region numbers (or ids) in the catalog, but I'd probably also want to add an option to automatically generate regions using kmeans or somesuch.


rmjarvis commented on June 14, 2024

This is done in PR #96.


cbonnett commented on June 14, 2024

Oh yeah!


rmjarvis commented on June 14, 2024

I know. Finally, right? Only 5+ years later...


rmjarvis commented on June 14, 2024

Will be included in v4.1.0

