Dump FK Table comparison (pineko, closed, 19 comments)

felixhekhorn commented on August 22, 2024
Dump FK Table comparison

Comments (19)

cschwan commented on August 22, 2024

@alecandido agreed, but I think we can have it both machine and human readable while only deviating a little from what we already do:

  • table headers get prefixed with "# " (comment character)
  • content left as is

The resulting table should be easily readable with numpy.loadtxt.
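
For concreteness, a minimal sketch of that format and of reading it back (the file name and the numbers are just placeholders):

import numpy as np

# hypothetical results_fk dump: header prefixed with the comment character,
# content left as plain whitespace-separated numbers
text = """\
#     Pineko       PineAPPL   central
 7.218832e-01  7.218831e-01   0.0000
 7.142003e-01  7.142003e-01   0.0000
"""
with open("results_fk.txt", "w") as f:
    f.write(text)

# '#' is numpy's default comment character, so the header line is skipped
data = np.loadtxt("results_fk.txt")
print(data.shape)  # (2, 3)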

cschwan commented on August 22, 2024
* rename metadata `results` to  `results_grid`

What's the idea here?

felixhekhorn commented on August 22, 2024

we can also just leave it - but then there will be results and results_fk so it's a bit inconsistent (since now there are two results ...)

alecandido commented on August 22, 2024

What's the idea here?

The point is that in rr we store the comparison at the grid level; that is the main and only comparison there. But when you come to pineko, the new comparison is the FkTable one.

For sure we don't want to change rr, because there will be pineappl at some point; since rr is upstream, it is independent of what happens afterwards. So it makes sense to change the name in the later stage instead.

cschwan commented on August 22, 2024

we can also just leave it - but then there will be results and results_fk so it's a bit inconsistent (since now there are two results ...)

I'd argue for that, because results are the true results (remember there are two entries MC and PineAPPL), and results_fk are the results with errors from the evolution. It also makes sense when you compare a grid at the process scale and an FK table; both results should be the same. And in the FK table there should be both, of course.

Moreover, results_fk should show a different table:

  • scale variations, which should be shown in results, don't make any sense in results_fk
  • instead you should compare pre- and post-evolution results and just show the difference in per mille.

So the table should look as follows:

------------------------------------
    Pineko       PineAPPL   central
                             1/1000
------------------------------------
 7.218832e-01  7.218831e-01   0.0000
 7.142003e-01  7.142003e-01   0.0000
 7.928799e-01  7.928799e-01   0.0000
 7.850020e-01  7.850020e-01   0.0000
 7.728221e-01  7.728221e-01   0.0000
 8.654099e-01  8.654099e-01   0.0000
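
For illustration, the last column could be computed along these lines (just a sketch, assuming the per-mille entry is the relative difference of the two predictions; the arrays are placeholders):

import numpy as np

# FK-table convolution and grid at the process scale, per bin (placeholder values)
pineko = np.array([7.218832e-01, 7.142003e-01, 7.928799e-01])
pineappl = np.array([7.218831e-01, 7.142003e-01, 7.928799e-01])

# relative difference in per mille, as in the "central 1/1000" column above
central = (pineko / pineappl - 1.0) * 1000.0
for fk, gr, diff in zip(pineko, pineappl, central):
    print(f"{fk:.6e}  {gr:.6e}  {diff:9.4f}")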

alecandido commented on August 22, 2024

Fine for me; I was thinking that in some sense results is the main focus, and that is a bit weird for the FkTable. Nevertheless, we can also keep it as it is; it's not going to hurt anything.

felixhekhorn commented on August 22, 2024

So the table should look as follows:

and in fact, it (almost) does:

$ python -mpineko compare data/mydis.pineappl data/myfktable.mydis.pineappl.lz4  CT14llo_NF4
LHAPDF 6.3.0 loading /home/felix/local/share/LHAPDF/CT14llo_NF4/CT14llo_NF4_0000.dat
CT14llo_NF4 PDF set, member #0, version 1; LHAPDF ID = 13207
       O0 left    O0 right       O1 left      O1 right   PineAPPL    FkTable  percent_error
0    50.000000   50.000000  2.000000e-07  2.000000e-07  12.826575  12.826106      -0.003654
1    50.000000   50.000000  6.984209e-07  6.984209e-07   9.550602   9.549435      -0.012216
2    50.000000   50.000000  2.438943e-06  2.438943e-06   7.047279   7.046536      -0.010548
3    50.000000   50.000000  8.516807e-06  8.516807e-06   5.152585   5.152081      -0.009781
4    50.000000   50.000000  2.973850e-05  2.973850e-05   3.728779   3.728443      -0.008991
5    50.000000   50.000000  1.038117e-04  1.038117e-04   2.665475   2.665247      -0.008558
6    50.000000   50.000000  3.620545e-04  3.620545e-04   1.874623   1.874467      -0.008317
7    50.000000   50.000000  1.258680e-03  1.258680e-03   1.290291   1.290190      -0.007889
8    50.000000   50.000000  4.328501e-03  4.328501e-03   0.875369   0.875307      -0.007070
9    50.000000   50.000000  1.437507e-02  1.437507e-02   0.615639   0.615608      -0.005093
10   50.000000   50.000000  4.341492e-02  4.341492e-02   0.463775   0.463782       0.001509
11   50.000000   50.000000  1.091438e-01  1.091438e-01   0.361635   0.361633      -0.000531
12   50.000000   50.000000  2.195041e-01  2.195041e-01   0.288607   0.288606      -0.000214
13   50.000000   50.000000  3.668753e-01  3.668753e-01   0.195669   0.195669       0.000099
14   50.000000   50.000000  5.397572e-01  5.397572e-01   0.089400   0.089399      -0.001851
15   50.000000   50.000000  7.295868e-01  7.295868e-01   0.018904   0.018904       0.003388
16   50.000000   50.000000  9.309441e-01  9.309441e-01   0.000235   0.000236       0.515855
17   46.538363   46.538363  1.091438e-01  1.091438e-01   0.361349   0.361348      -0.000401
18   85.950595   85.950595  1.091438e-01  1.091438e-01   0.363406   0.363405      -0.000486
19  158.740105  158.740105  1.091438e-01  1.091438e-01   0.364716   0.364714      -0.000363

Note that the bin information is also present here - something I also wish for results, because otherwise I can't compare to the data from @scarlehoff, who does apply cuts and thus, e.g., only presents numbers from bin 6 onwards ...

cschwan commented on August 22, 2024

This looks good! However, I'd suggest changing a few details, namely

  • to print the differences in per mille instead of in per cent
  • instead of O0 left and O0 right you could print
    • the actual observable names that are in the metadata,
    • and print the observable names for both columns and leave out left and right

cschwan commented on August 22, 2024

@alecandido @felixhekhorn another completely unrelated question that also came up during the PC: in DIS the left and right bin boundaries are the same, but I wonder, is this really the case also in the data? Don't we have bins in this case?

alecandido commented on August 22, 2024
  • the actual observable names that are in the metadata,

unfortunately, to do this from Python we still need to merge NNPDF/pineappl#74 (the Python interface does not give access to the metadata yet, but it is going to)

@alecandido @felixhekhorn another completely unrelated question that also came up during the PC: in DIS the left and right bin boundaries are the same, but I wonder, is this really the case also in the data? Don't we have bins in this case?

This I don't know, we should check commondata, with which I'm not very familiar... (yet)

cschwan commented on August 22, 2024
  • the actual observable names that are in the metadata,

unfortunately, to do this from Python we still need to merge N3PDF/pineappl#74 (the Python interface does not give access to the metadata yet, but it is going to)

I see, I'll prioritize this Issue now.

felixhekhorn commented on August 22, 2024

in DIS the left and right bin boundaries are the same, but I wonder, is this really the case also in the data? Don't we have bins in this case?

I don't think so - at least for the HERA fully inclusive stuff: https://www.hepdata.net/record/ins1377206?version=1&table=Table%202 (I didn't check the paper fully though ... and I know in practice this is rather required)

cschwan commented on August 22, 2024

Given what we've discussed in the last PC, I think we want the following information in the metadata that's not yet in there:

  • MC uncertainty
  • total error (difference between MC result and FK table result using the same PDF). This might require a standardized form for the results metadata and a new results_pdfset entry

alecandido commented on August 22, 2024
  • total error (difference between MC result and FK table result using the same PDF). This might require a standardized form for the results metadata and a new results_pdfset entry

If we're going to have a new standardized form for the results entry, I would make it in whatever standard format (even CSV, with whatever separator) in order to make it as easy as possible to machine-parse.
Right now they are optimized more for human reading, but in case we need to read them frequently I would write a printer for them. In case it is more generally needed, I'd include the printer in pineappl_cli.
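
As a sketch of the kind of machine-friendly dump meant here (column names and numbers are only illustrative):

from io import StringIO
import pandas as pd

# hypothetical comparison table standing in for the results entry
df = pd.DataFrame({"PineAPPL": [10.337989, 30.353310], "FkTable": [10.330937, 30.334170]})

# a plain CSV (any standard separator would do) ...
text = df.to_csv(index=False)
# ... round-trips trivially on the reading side
back = pd.read_csv(StringIO(text))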

alecandido commented on August 22, 2024

@felixhekhorn if you can, try to address this as well during your pineko transition.

Since you're moving the comparison from fkutils (if it was not already here), the information should now be available locally, so it's just a matter of dumping it in the metadata during convolution.

felixhekhorn commented on August 22, 2024

This looks good! However, I'd suggest changing a few details, namely

* to print the differences in per mille instead of in per cent

* instead of `O0 left` and `O0 right` you could print
  
  * the actual observable names that are in the metadata,
  * and print the observable names for both columns and leave out `left` and `right`

@cschwan here is what I currently have:

$ pineappl info --get results_fk data/fktables/213/LHCB_DY_13TEV_DIELECTRON.pineappl.lz4 
    yll left  yll right    PineAPPL     FkTable  permille_error
0      2.000      2.125   10.337989   10.330937       -0.682085
1      2.125      2.250   30.353310   30.334170       -0.630583
2      2.250      2.375   49.379146   49.355890       -0.470961
3      2.375      2.500   66.899600   66.879018       -0.307656
4      2.500      2.625   82.655479   82.635160       -0.245826
5      2.625      2.750   96.121822   96.097904       -0.248829
6      2.750      2.875  107.081167  107.053141       -0.261728
7      2.875      3.000  115.135111  115.104099       -0.269357
8      3.000      3.125  120.077337  120.044468       -0.273733
9      3.125      3.250  121.560013  121.526957       -0.271931
10     3.250      3.375  113.520790  113.489930       -0.271838
11     3.375      3.500   96.972783   96.946347       -0.272608
12     3.500      3.625   79.585250   79.563615       -0.271837
13     3.625      3.750   62.180007   62.162765       -0.277289
14     3.750      3.875   45.516016   45.502772       -0.290960
15     3.875      4.000   30.629132   30.619719       -0.307349
16     4.000      4.250   13.755959   13.751858       -0.298148
$ pineappl info --get results_fk_pdfset data/fktables/213/LHCB_DY_13TEV_DIELECTRON.pineappl.lz4 
NNPDF40_lo_as_01180

I cannot drop "left/right", since the columns in a data frame have to have unique names - alternatives are "min/max" (3 letters) or "l/r" (cut to 1 letter)
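
A small sketch of the min/max variant, prefixing with the observable name from the metadata (everything here is illustrative):

import pandas as pd

# hypothetical observable name and bin limits; column labels stay distinct
obs = "yll"
df = pd.DataFrame(
    {
        f"{obs} min": [2.000, 2.125],
        f"{obs} max": [2.125, 2.250],
        "PineAPPL": [10.337989, 30.353310],
        "FkTable": [10.330937, 30.334170],
    }
)
print(df)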

felixhekhorn commented on August 22, 2024

@alecandido agreed, but I think we can have it both machine and human readable while only deviating a little from what we already do:

* table headers get prefixed with "# " (comment character)

* content left as is

The resulting table should be easily readable with numpy.loadtxt.

adding "#" is not necessary, I think, as np.loadtxt explicitly exposes a skiprows argument

alecandido commented on August 22, 2024

Just to tell you: if your table is not so small, always consider using pd.read_csv, since it might be much faster than np.loadtxt (a couple of years ago it was, and I don't expect anything to have changed in the meantime).
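
For a whitespace-separated table with a commented header, the pandas route could look roughly like this (file name and column names are assumptions):

import pandas as pd

# '#'-prefixed lines are skipped; column names have to be supplied by hand
df = pd.read_csv(
    "results_fk.txt",
    sep=r"\s+",
    comment="#",
    header=None,
    names=["Pineko", "PineAPPL", "central"],
)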

cschwan commented on August 22, 2024

adding "#" is not necessary, I think, as np.loadtxt explicitly exposes a skiprows argument

It's not necessary, but it would be nice to use the functions out-of-the-box; also note that we might consider changing the header at some point (the number of lines, adding some other piece of information), or that we'd like to support other libraries. To have some flexibility I'd therefore strongly suggest using a comment character. This should be mirrored by the results metadata field, of course.

To implement it you could set the header of the left-most column to # and left-align it.
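
One possible reading of that suggestion, if the table is rendered with pandas (a sketch, not necessarily how pineko does it):

import pandas as pd

# hypothetical comparison frame standing in for the real one
df = pd.DataFrame({"PineAPPL": [10.337989, 30.353310], "FkTable": [10.330937, 30.334170]})

# turn the index into a left-most column whose header is the comment character;
# left-aligning the headers makes the header line start with '#'
out = df.reset_index().rename(columns={"index": "#"})
print(out.to_string(index=False, justify="left"))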
