kit-cms / crown

C++ based ROOT Workflow for N-tuples (CROWN)

Home Page: https://crown.readthedocs.io

License: MIT License

C++ 65.52% Shell 0.80% CMake 1.94% Python 31.65% Dockerfile 0.08%

root cms particle-physics analysis hep hep-ex ntuples wlcg

crown's People

Contributors

Stargazers

Watchers

Forkers

chrisburr eguiraud khaosmos93 julius-heitkoetter de-cristo zhiyuanlcern xiaohu-cern nfaltermann probdbro conformist89 harrypuuter moritzmolch arturakh jdriesch tzela98 demuller qyguo

crown's Issues

[BUG] Jet pt ordering

Describe the bug
The algorithm that sorts jets by their transverse momentum does not work correctly.

CROWN/src/jets.hxx

Lines 75 to 81 in 468666c

    auto good_jets_pt =
        ROOT::VecOps::Where(jetmask > 0, jet_pt, (float)0.);
    Logger::get("OrderJetsByPt")->debug("Jetpt after {}", good_jets_pt);
    // we have to convert the result into an RVec of ints since argsort
    // gives back an unsigned long vector
    auto temp =
        ROOT::VecOps::Argsort(ROOT::VecOps::Nonzero(good_jets_pt));

Here, VecOps::Argsort sorts the indices of the jets with non-zero pt rather than the jet pt values themselves. This leads to a wrong ordering.
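To illustrate the intended behaviour, here is a minimal Python sketch using plain lists. The actual code is C++ based on ROOT::VecOps; the function name `order_jets_by_pt` and the sample values are hypothetical. It sorts the indices of the selected jets using their pt as the sort key, instead of sorting the index list itself.

```python
def order_jets_by_pt(jet_pt, jetmask):
    """Return the indices of the selected jets, ordered by descending pt."""
    # Indices of jets that pass the mask (illustrative stand-in for Nonzero).
    selected = [index for index, keep in enumerate(jetmask) if keep > 0]
    # Sort the indices by the pt they point to, not by the index values.
    return sorted(selected, key=lambda index: jet_pt[index], reverse=True)


jet_pt = [30.0, 80.0, 15.0, 55.0]
jetmask = [1, 1, 0, 1]
ordered = order_jets_by_pt(jet_pt, jetmask)
```

Sorting the bare index list `[0, 1, 3]` would simply return it unchanged, which is the symptom described above.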

Trigger stuff

Trigger object matching
Writeout of trigger flags
Writeout of trigger scale factors

Consider to propagate global key to individual channels

Track unused configuration parameters

Is your feature request related to a problem? Please describe.
A functionality to track and report configuration parameters that are not used anywhere.
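One possible approach, sketched here under the assumption that the configuration parameters live in a dictionary-like object: record every key that is read and report the rest. The class name `TrackedConfig` and the parameter names are hypothetical and not part of CROWN.

```python
class TrackedConfig(dict):
    """Dictionary that remembers which keys have been read via config[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._used = set()

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self._used.add(key)  # only reached if the key exists
        return value

    def unused_keys(self):
        """Return the keys that were never read, in sorted order."""
        return sorted(set(self.keys()) - self._used)


config = TrackedConfig({"min_muon_pt": 20.0, "max_muon_eta": 2.4, "obsolete": 1})
_ = config["min_muon_pt"]
_ = config["max_muon_eta"]
unused = config.unused_keys()
```

Note that `dict.get` does not go through `__getitem__`, so a real implementation would have to cover the other access paths as well.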


Check if files are available [Request]

Currently, if a remote file is used as an input file and it is not available or accessible, CROWN can fail with an ugly segfault. This should be handled cleanly by checking beforehand that all input files are available.

The error message for local files that are not found is also a bit cryptic at the moment:

[2022-01-24 15:28:09.758] [main] [info] input_file 1: /i/do/not/exist.root
[2022-01-24 15:28:09.758] [main] [info] Output directory: test.root
[2022-01-24 15:28:09.787] [main] [info] Starting Setup of Dataframe
terminate called after throwing an instance of 'std::runtime_error'
  what():  GetBranchNames: error in opening the tree Events
[2]    819003 abort      ./tagandprobe_data_2018 test.root /i/do/not/exist.root
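A pre-check for local files could look roughly like the following Python sketch. It is only an illustration of the idea, not CROWN's implementation: the function name and the list of remote prefixes are assumptions, and remote files are skipped because checking them needs a separate access test.

```python
from pathlib import Path

# Assumed prefixes that mark a file as remote; adjust as needed.
REMOTE_PREFIXES = ("root://", "https://", "http://")


def find_missing_local_files(input_files):
    """Return the local input files that do not exist on disk."""
    missing = []
    for name in input_files:
        if name.startswith(REMOTE_PREFIXES):
            continue  # remote access is not checked in this sketch
        if not Path(name).is_file():
            missing.append(name)
    return missing


missing = find_missing_local_files(
    ["/i/do/not/exist.root", "root://some.server//store/file.root"]
)
if missing:
    print("Input files not found: " + ", ".join(missing))
```

Reporting all missing files at once, before the dataframe is set up, avoids the abort shown in the log above.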

Git repo status sanity

Describe the bug
Depending on the setup, the git check of the repository can fail. Additionally, it only covers the status of the main repository, not the configuration. This meta-information should be reworked to contain everything necessary for a reproduction.

https://github.com/KIT-CMS/CROWN/blob/main/code_generation/code_generation.py#L217-L221

mm channel, zptmassreiweighting

The bug appears in ntuple production. With debug=true, the output states:
Warning in TStreamerInfo::BuildCheck:
The StreamerInfo of class RooBinningCategory read from file data/zpt/htt_scalefactors_legacy_2018.root
has the same version (=1) as the active class but a different checksum.
You should update the version to ClassDef(RooBinningCategory,2).
Do not try to write objects with the current class definition,
the files will not be readable.

Warning in TStreamerInfo::CompareContent: The following data member of
the on-file layout version 1 of class 'RooBinningCategory' differs from
the in-memory layout version 1:
RooTemplateProxy _inputVar; //
vs
RooTemplateProxy _inputVar; //
Error in TClass::RegisterStreamerInfo: Register StreamerInfo for RooBinningCategory on non-empty slot (1).

[Request] cmake settings concerning shifts

Add configuration options to the cmake build command that make it easy to

produce nominal only
produce only a specific shift
produce only a set of shifts
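The three cases above could be mapped onto a single option value, as in this Python sketch of the selection logic on the code generation side. The function name, the special values "all" and "none", and the shift names are hypothetical.

```python
def select_shifts(available_shifts, requested):
    """Select shifts from a request string.

    requested is "all", "none" (nominal only), or a comma-separated
    list of shift names.
    """
    if requested == "all":
        return list(available_shifts)
    if requested == "none":
        return []
    wanted = [name.strip() for name in requested.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in available_shifts]
    if unknown:
        raise ValueError(f"Unknown shifts requested: {unknown}")
    return wanted


available = ["shiftA_Up", "shiftA_Down", "shiftB_Up"]
nominal_only = select_shifts(available, "none")
single_shift = select_shifts(available, "shiftA_Up")
shift_set = select_shifts(available, "shiftA_Up, shiftB_Up")
```

Raising an error for unknown names makes a typo in the build command visible at configuration time instead of silently producing nominal only.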

Selections for good Jets

Fix Introduction in the Documentation

In the 'Running the Framework' section, the documentation needs to specify the -DANALYSIS=config and -DSAMPLES=samples flags. How the config and samples are handled is currently unclear.

Extend Producer calls to be able to treat different inputs with the same producer instance

Modify the code so that the inputs can be changed based on the scope. If a list is provided as input, we convert it internally to a dict, and always use a scope dict for the inputs.

E.g. for q_1

q_1 = Producer(
    name="q_1",
    call="quantities::charge({df}, {output}, 0, {input})",
    input={"mt": [q.ditaupair, nanoAOD.Muon_charge],
           "et": [q.ditaupair, nanoAOD.Electron_charge],
           "tt": [q.ditaupair, nanoAOD.Tau_charge],
           "em": [q.ditaupair, nanoAOD.Electron_charge], },
    output=[q.q_1],
    scopes=["mt", "et", "tt", "em"],
)
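The internal conversion described above could be sketched as follows. This is a minimal illustration, with plain strings standing in for the quantity objects; the function name `normalize_inputs` is hypothetical.

```python
def normalize_inputs(inputs, scopes):
    """Return a dict mapping every scope to its own list of inputs."""
    if isinstance(inputs, dict):
        missing = [scope for scope in scopes if scope not in inputs]
        if missing:
            raise ValueError(f"No inputs defined for scopes: {missing}")
        return {scope: list(inputs[scope]) for scope in scopes}
    # A plain list means: use the same inputs in every scope.
    return {scope: list(inputs) for scope in scopes}


from_list = normalize_inputs(["ditaupair", "Muon_charge"], ["mt", "mm"])
from_dict = normalize_inputs(
    {"mt": ["ditaupair", "Muon_charge"], "et": ["ditaupair", "Electron_charge"]},
    ["mt", "et"],
)
```

With this, the rest of the code only ever has to deal with the dict form.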

Convert p4 quantities to floats

You need to be careful here: this writes out a double, and you may end up with an ntuple that is twice as large ;) The same applies to the other quantities below.

        [](const ROOT::Math::PtEtaPhiMVector &p4) { return (float)p4.pt(); },

Originally posted by @stwunsch in #17 (comment)

Remove README and link to docs

Selections for good Electrons

Additional Lepton Veto

[Request] Dataset Management and Batch System Submission

Is your feature request related to a problem? Please describe.

We need a clear way to treat and handle different datasets, and we need to decide exactly what the batch submission should look like.

I guess this would be the time for a new brainstorming session on how we want to handle this.

Tau ID scale factor dependence (pt,eta,dm) the same for all channels?

Modification Rule to remove Producers from a ProducerGroup

Currently, a call of something like

configuration.add_modification_rule(
        channels,
        RemoveProducer(producers=genparticles.gen_taujet_pt_2, samples=["emb"]),
    )

does not work, since genparticles.gen_taujet_pt_2 is included within a ProducerGroup, and only Producers added on the top level can be removed via this function. Consider adding such a functionality, to make this more transparent to the user and avoid ugly workarounds with lots of code repetition.
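A recursive removal could look like the following Python sketch. The two classes are stripped-down, hypothetical stand-ins for the real Producer and ProducerGroup classes, and the attribute name `producers` is an assumption.

```python
class Producer:
    def __init__(self, name):
        self.name = name


class ProducerGroup(Producer):
    def __init__(self, name, subproducers):
        super().__init__(name)
        self.producers = list(subproducers)


def remove_producer(producers, target):
    """Return a list without target, descending into nested groups."""
    result = []
    for producer in producers:
        if producer is target:
            continue
        if isinstance(producer, ProducerGroup):
            # Caveat: this modifies the group in place.
            producer.producers = remove_producer(producer.producers, target)
        result.append(producer)
    return result


pt_1 = Producer("gen_taujet_pt_1")
pt_2 = Producer("gen_taujet_pt_2")
group = ProducerGroup("gen_taujet_quantities", [pt_1, pt_2])
remaining = remove_producer([group], pt_2)
```

Because the group is modified in place, a group shared between several scopes or samples would need to be copied first, so that the removal only affects the intended ones.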

[Request] Add lepton and MET 4 vectors into the ntuple

Is your feature request related to a problem? Please describe.
Adding lepton and MET 4 vectors to the output will help in the calculation of friend trees later, this way we have them available directly

Error in the generation of the C++ code should result in a cmake error

E.g. when I have an error in the Python code:

AttributeError: 'VectorProducer' object has no attribute 'name'

I still get a build file:

-- Add build target for file analysis_emb.cxx.
-- Add test for target analysis_emb
-- Configuring done
-- Generating done
-- Build files have been written to: /work/sbrommer/ntuple_prototype/build
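One way to make such a failure visible is to turn any exception in the generation step into a non-zero exit code, which the calling cmake step can then check (for example through the RESULT_VARIABLE of execute_process, assuming the script is launched that way). A minimal Python sketch, in which the wrapper name and the two stand-in generation functions are hypothetical:

```python
import sys
import traceback


def run_code_generation(generate):
    """Run generate() and translate any exception into an exit code."""
    try:
        generate()
    except Exception:
        traceback.print_exc()  # keep the original error visible in the log
        return 1
    return 0


def broken_generation():
    # Stand-in for a generation step that fails.
    raise AttributeError("'VectorProducer' object has no attribute 'name'")


def working_generation():
    # Stand-in for a generation step that succeeds.
    return None


if __name__ == "__main__":
    sys.exit(run_code_generation(working_generation))
```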

-DSHIFT not working if shift is only in specific channel

https://github.com/KIT-CMS/CROWN/blob/main/code_generation/configuration.py#L607
error if shift is not in all scopes

[Request] Cutflow Diagram

Is your feature request related to a problem? Please describe.
Add a cutflow diagram to the ntuple: a representation of the filter report that is already printed after an ntuple is produced, i.e. including this

Flag_goodVertices: pass=21000      all=21000      -- eff=100.00 % cumulative eff=100.00 %
Flag_globalSuperTightHalo2016Filter: pass=20998      all=21000      -- eff=99.99 % cumulative eff=99.99 %
Flag_HBHENoiseFilter: pass=20998      all=20998      -- eff=100.00 % cumulative eff=99.99 %
Flag_HBHENoiseIsoFilter: pass=20995      all=20998      -- eff=99.99 % cumulative eff=99.98 %
Flag_EcalDeadCellTriggerPrimitiveFilter: pass=20993      all=20995      -- eff=99.99 % cumulative eff=99.97 %
Flag_BadPFMuonFilter: pass=20991      all=20993      -- eff=99.99 % cumulative eff=99.96 %
Flag_eeBadScFilter: pass=20991      all=20991      -- eff=100.00 % cumulative eff=99.96 %
Flag_ecalBadCalibFilter: pass=20987      all=20991      -- eff=99.98 % cumulative eff=99.94 %
GoodElMuPairs: pass=1822       all=20987      -- eff=8.68 % cumulative eff=8.68 %

into the sample
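As a sketch of how the printed report could be turned into structured data that is easy to store or plot, the following Python snippet parses lines of the format shown above. The function name and the output layout are assumptions; only the line format is taken from the report.

```python
import re

# Matches lines like:
# "Flag_goodVertices: pass=21000   all=21000   -- eff=100.00 % cumulative eff=100.00 %"
REPORT_LINE = re.compile(
    r"^(?P<name>\S+): pass=(?P<passed>\d+)\s+all=(?P<total>\d+)\s+-- "
    r"eff=(?P<eff>[\d.]+) % cumulative eff=(?P<cumulative>[\d.]+) %$"
)


def parse_cutflow(report_text):
    """Return one dict per filter line found in the report text."""
    rows = []
    for line in report_text.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match:
            rows.append(
                {
                    "name": match["name"],
                    "pass": int(match["passed"]),
                    "all": int(match["total"]),
                }
            )
    return rows
```

Lines that do not match the pattern are ignored, so the function can be fed the full log output.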

Z PT Reweighting

[BUG]

systematics missing in config:
jecUncEC2YearUp
jecUnjecUncEC2YearUpcHFUDown

TauES uncertainty sources

Splitting the TauES uncertainty sources based on the tau decay modes.

Add Metadata to the generated samples

Adding some metadata to the sample should help in further processing the samples and will make reproducibility of the samples easier

Some ideas for metadata that could be useful:

List of all shifts considered in the ntuple
List of all variable names (although this is maybe too much)
current framework commit hash
is the current repository clean
cmake options used (currently this would be era, sample, config)

[Request] Function to attempt a correct ordering of producers

Is your feature request related to a problem? Please describe.
Currently, it is possible to use an incorrect ordering of producers in the Python config. This results in a runtime error, since there is no check whether the order set in the config is actually valid.

Describe the solution you'd like
Add a new function, something like ResolveProducerOrdering, and attempt to find a matching producer order there. If no valid ordering can be created, this should result in an error during the config step.
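Finding such an order is a topological sort of the dependency graph between producers. The sketch below uses `graphlib` from the Python standard library (Python 3.9 or newer). It assumes that each producer can be described by its input and output quantity names; the function name, the data layout and the example producers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_producer_ordering(producers, provided=()):
    """Return producer names in an order that satisfies all dependencies.

    producers maps a producer name to a tuple (inputs, outputs).
    provided contains quantities that are read directly from the input file.
    """
    produced_by = {}
    for name, (_, outputs) in producers.items():
        for quantity in outputs:
            produced_by[quantity] = name

    graph = {}
    for name, (inputs, _) in producers.items():
        dependencies = set()
        for quantity in inputs:
            if quantity in produced_by:
                dependencies.add(produced_by[quantity])
            elif quantity not in provided:
                raise ValueError(f"{name}: nothing produces input {quantity!r}")
        graph[name] = dependencies

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        raise ValueError(f"No valid producer ordering: {error}") from error


producers = {
    "q_1": (["ditaupair", "Muon_charge"], ["q_1"]),
    "pair": (["good_muons"], ["ditaupair"]),
    "muons": (["Muon_pt"], ["good_muons"]),
}
order = resolve_producer_ordering(producers, provided={"Muon_pt", "Muon_charge"})
```

Both failure modes, a missing input and a circular dependency, surface as an error at the config step rather than at runtime.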

2016 era

The 2016 era is split into two, 2016preVFP and 2016postVFP. So this should be expanded in the configs and so on.

[BUG] Writeout of global quantities added via ProducerRules

If a Producer is added via a ProducerRule and the Producer belongs to the global scope, its output quantities are added to the outputs defined for the global scope, and not to the actual analysis scopes.

e.g.

configuration.add_modification_rule(
        "global",
        AppendProducer(
            producers=[event.npartons],
            samples=["dyjets"],
        ),
    )

will result in an additional output file for the global scope, only containing the q.npartons variable.

only one scope possible in ExtendedVectorProducer

The ExtendedVectorProducer (previously TriggerVectorProducer) only works for one scope.

CROWN/code_generation/producer.py

Lines 304 to 308 in 638b0f8

    if not isinstance(scope, list):
        scope = [scope]
    if len(scope) != 1:
        log.error("TriggerVectorProducer can only use one scope per instance !")
        raise Exception

It should be checked whether this can be extended to manage more than one scope, since some features might need it.

Check correct output of the test input

Currently we run against a dummy nanoAOD, but don't test that the correct thing comes out.

TTBar Corrections

Pileup Reweighting

Readout weights
simple config for calculating the weights

Add flamegraph to CI test

We could add the rendering of a flamegraph to every commit, so we can monitor the performance changes more clearly

[BUG] Trigger filterbit documentation inconsistency

Describe the bug
When doing trigger object matching, a user can set a filterbit which should correspond to the filterbits mentioned in the docs: https://crown.readthedocs.io/en/latest/namespaces.html#_CPPv47trigger and moreover in the CMS code for the generation of the filterbits: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/triggerObjects_cff.py#L17

However, the values listed there start at 0, so the value zero is already used for the first bit. The default value for no filterbit matching should therefore be moved to -1.
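The proposed convention can be sketched in a few lines of Python. The function name is hypothetical, and the real matching code is C++. A negative filterbit disables the check, so that bit 0 remains usable as a regular value.

```python
def passes_filterbit(trigger_object_bits, filterbit=-1):
    """Check a 0-based filter bit; a negative filterbit disables the check."""
    if filterbit < 0:
        return True
    return bool((trigger_object_bits >> filterbit) & 1)


first_bit_set = passes_filterbit(0b0001, 0)
first_bit_missing = passes_filterbit(0b0010, 0)
no_matching_requested = passes_filterbit(0b0000)
```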

JSON Filter for Data

Tau Corrections Readout

Tau ID SFs

Running the Framework

When running the compiled program, it is not clear which flags to use for the input and output. Currently the error message reads Require exactly two additional input arguments (the input and output paths to the ROOT files) but got 0. A help message along the lines of list of required flags: --input: *description* --output: *description* --otherflags?: *description* would be useful.

[BUG] Throw exception if two producers have the same name, resulting in overwriting of generated code

If two producers have the same name, and live in the same scope, the code generation will overwrite the files generated from these producers, without any check. It should result in an error transparent to the user
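Such a check could run before any file is written. The following Python sketch assumes that the producer names are available per scope. The function name and data layout are hypothetical.

```python
from collections import Counter


def check_unique_producer_names(producers_by_scope):
    """Raise a ValueError if a producer name appears twice within a scope."""
    for scope, names in producers_by_scope.items():
        duplicates = sorted(
            name for name, count in Counter(names).items() if count > 1
        )
        if duplicates:
            raise ValueError(
                f"Scope '{scope}' contains duplicate producer names: {duplicates}"
            )


# The same name in two different scopes is fine in this sketch.
check_unique_producer_names({"mt": ["q_1", "pt_1"], "et": ["q_1"]})
```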

[BUG] DeltaR requirement in Pair Selection

Describe the bug
The DeltaR requirement for the separation of the two pair candidates is not included in the algorithm.

Expected behavior
It should be included
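For reference, the separation is computed as DeltaR = sqrt(Δη² + Δφ²), with Δφ wrapped into [-π, π]. A minimal Python sketch of the check follows. The function names and the threshold value of 0.5 are assumptions for illustration, not the values used in the analysis.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def pair_is_separated(eta1, phi1, eta2, phi2, min_delta_r):
    """True if the two candidates are further apart than min_delta_r."""
    return delta_r(eta1, phi1, eta2, phi2) > min_delta_r


# Two candidates close to phi = +pi and phi = -pi are actually neighbours.
close_pair = pair_is_separated(0.0, 3.0, 0.0, -3.0, min_delta_r=0.5)
far_pair = pair_is_separated(0.0, 0.0, 1.0, 1.0, min_delta_r=0.5)
```

The wrap-around in phi is the detail that is easiest to get wrong, which is why the example uses a pair straddling ±π.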

STXS Flags

[Request] Allow more than one file as input

Add the functionality to pass an arbitrary number of input files, similar to e.g. hadd. The first or the last command line argument (to be decided) would be the name of the output file, and all remaining arguments would be treated as input files. This would make the framework more flexible and make it easier to design batch system jobs as well as smaller local productions.
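The proposed command line layout is sketched below in Python with `argparse`, purely to illustrate the interface. The compiled program is C++, and the choice of putting the output file first is an assumption, since the position is still to be decided.

```python
import argparse


def parse_arguments(argv):
    """Parse 'output input1 [input2 ...]' in the style of hadd."""
    parser = argparse.ArgumentParser(
        description="Produce one ntuple from one or more input files."
    )
    parser.add_argument("output", help="path of the output ROOT file")
    parser.add_argument("inputs", nargs="+", help="one or more input ROOT files")
    return parser.parse_args(argv)


args = parse_arguments(["output.root", "input_1.root", "input_2.root"])
```

A parser of this kind also provides a help message listing the expected arguments.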

Muon Corrections Readout

Functionality to setup a reproduction of an ntuple with a single script

Since the git commits and status are stored within the ntuple file, if we store any potential diffs in the installation folder as well, it should be possible to use any installation directory to setup an exact copy of the setup used. This would be very useful for reproduction of samples or error / bug fixing

MuonIDIso_SF = ProducerGroup(
    name="MuonIDIso_SF",
    call=None,
    input=None,
    output=None,
    scopes=["mt"],
    subproducers=[
        Muon_1_ID_SF,
        Muon_1_Iso_SF,
    ],
)

could look something like this:

MuonIDIso_SF = ProducerGroup(
    name="MuonIDIso_SF",
    call=None,
    input=None,
    output=None,
    scopes=["mt", "mm"],
    subproducers={
        "mt": [
            Muon_1_ID_SF,
            Muon_1_Iso_SF,
        ],
        "mm": [
            Muon_1_ID_SF,
            Muon_1_Iso_SF,
            Muon_2_ID_SF,
            Muon_2_Iso_SF,
        ],
    },
)

For inputs, this is already possible.

Add MET things

For now, PUPPIMET only

[Request] Add feature to run on a limited set of events

Add an additional cmake option to limit the number of events, to be used for debugging purposes
