kit-cms / crown
C++ based ROOT Workflow for N-tuples (CROWN)
Home Page: https://crown.readthedocs.io
License: MIT License
Describe the bug
The algorithm that sorts jets by their transverse momentum does not work correctly.
Lines 75 to 81 in 468666c
VecOps::Argsort is applied here to the indices of the jets with non-zero pt and not to the jet pt itself. This leads to a wrong ordering.
Is your feature request related to a problem? Please describe.
A functionality to track and report on configuration parameters that are not used anywhere.
Currently, if a remote file is used as an input file and that file is not available or accessible, CROWN fails with an ugly segfault. This should be handled cleanly by checking beforehand that all input files are available.
The error message for local files that are not found is also a bit cryptic at the moment:
[2022-01-24 15:28:09.758] [main] [info] input_file 1: /i/do/not/exist.root
[2022-01-24 15:28:09.758] [main] [info] Output directory: test.root
[2022-01-24 15:28:09.787] [main] [info] Starting Setup of Dataframe
terminate called after throwing an instance of 'std::runtime_error'
what(): GetBranchNames: error in opening the tree Events
[2] 819003 abort ./tagandprobe_data_2018 test.root /i/do/not/exist.root
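The up-front check requested above could be sketched as follows. This is a minimal Python illustration, not CROWN code: the function names `find_missing_inputs` and `check_inputs_or_exit` are hypothetical, and remote paths (anything containing `://`) are skipped, since checking them would need a separate accessibility test that is not shown here.

```python
import os
import sys


def find_missing_inputs(paths):
    """Return the local input paths that do not exist or are not readable."""
    missing = []
    for path in paths:
        # Assumption: remote files (e.g. root:// URLs) cannot be checked with
        # os.path, so they are skipped here and need their own check.
        if "://" in path:
            continue
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            missing.append(path)
    return missing


def check_inputs_or_exit(paths):
    """Print one readable message per missing file, then exit with an error."""
    missing = find_missing_inputs(paths)
    for path in missing:
        print(f"input file not found or not readable: {path}", file=sys.stderr)
    if missing:
        sys.exit(1)
```

Running such a check before the dataframe is set up would turn the abort above into one readable line per missing file.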
Describe the bug
Depending on the setup, it can happen that the git check of the repository fails. Additionally, this only includes the status of the main repository, not the configuration. Rework this meta-information to contain all information necessary for a reproduction
https://github.com/KIT-CMS/CROWN/blob/main/code_generation/code_generation.py#L217-L221
The bug appears in ntuple production; with debug=true it states:
Warning in TStreamerInfo::BuildCheck:
The StreamerInfo of class RooBinningCategory read from file data/zpt/htt_scalefactors_legacy_2018.root
has the same version (=1) as the active class but a different checksum.
You should update the version to ClassDef(RooBinningCategory,2).
Do not try to write objects with the current class definition,
the files will not be readable.
Warning in TStreamerInfo::CompareContent: The following data member of
the on-file layout version 1 of class 'RooBinningCategory' differs from
the in-memory layout version 1:
RooTemplateProxy _inputVar; //
vs
RooTemplateProxy _inputVar; //
Error in TClass::RegisterStreamerInfo: Register StreamerInfo for RooBinningCategory on non-empty slot (1).
Add some way of configuration to the cmake build command, to make it easy to
In the 'Running the Framework' section the -DANALYSIS=config -DSAMPLES=samples
flags need to be specified / how the config and samples are handled is unclear.
Modify the code so that the inputs can be changed based on the scope. If a list is provided as input, we convert it internally to a dict, and always use a scope dict for the inputs.
E.g. for q_1
q_1 = Producer(
    name="q_1",
    call="quantities::charge({df}, {output}, 0, {input})",
    input={
        "mt": [q.ditaupair, nanoAOD.Muon_charge],
        "et": [q.ditaupair, nanoAOD.Electron_charge],
        "tt": [q.ditaupair, nanoAOD.Tau_charge],
        "em": [q.ditaupair, nanoAOD.Electron_charge],
    },
    output=[q.q_1],
    scopes=["mt", "et", "tt", "em"],
)
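The internal list-to-dict conversion described above might look like the sketch below. The helper name `normalize_inputs` is hypothetical, and plain strings stand in for the quantity objects; it only illustrates the idea of always ending up with a scope dict.

```python
def normalize_inputs(inputs, scopes):
    """Return a scope -> input list dict, whatever form `inputs` was given in."""
    if isinstance(inputs, dict):
        # A dict must define inputs for every scope the producer runs in.
        missing = [scope for scope in scopes if scope not in inputs]
        if missing:
            raise ValueError(f"no inputs defined for scopes: {missing}")
        return {scope: list(inputs[scope]) for scope in scopes}
    # A plain list means the same inputs are used in every scope.
    return {scope: list(inputs) for scope in scopes}
```

With this, `normalize_inputs(["ditaupair", "Muon_charge"], ["mt", "mm"])` yields one identical entry per scope, so the rest of the code only ever has to deal with the dict form.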
You have to be careful here: this writes out a double, and you may end up with an ntuple that is twice as large ;) The same applies to the other quantities below.
[](const ROOT::Math::PtEtaPhiMVector &p4) { return (float)p4.pt(); },
Originally posted by @stwunsch in #17 (comment)
Is your feature request related to a problem? Please describe.
We need a clear way to treat and handle different datasets, and to decide exactly what the batch submission should look like.
I guess this would be the time for a new brainstorming session on how we want to handle this.
Currently, a call such as
configuration.add_modification_rule(
    channels,
    RemoveProducer(producers=genparticles.gen_taujet_pt_2, samples=["emb"]),
)
does not work, since genparticles.gen_taujet_pt_2 is included within a ProducerGroup, and only Producers added at the top level can be removed via this function. Consider adding such a functionality to make this more transparent to the user and avoid ugly workarounds with lots of code repetition.
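One way to remove a producer that sits inside a group is to walk the producer tree recursively. The sketch below uses a hypothetical `Node` class as a stand-in for both Producer and ProducerGroup; it is not the CROWN class hierarchy, just an illustration of the traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Producer (no children) or a ProducerGroup."""

    name: str
    subproducers: list = field(default_factory=list)


def remove_producer(producers, target_name):
    """Return a copy of the tree without any producer called target_name."""
    kept = []
    for producer in producers:
        if producer.name == target_name:
            continue  # drop the producer at whatever depth it is found
        children = remove_producer(producer.subproducers, target_name)
        kept.append(Node(producer.name, children))
    return kept
```

Because the function returns a new tree, the original group definition stays untouched and can still be used for samples where the producer should run.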
Is your feature request related to a problem? Please describe.
Adding lepton and MET 4-vectors to the output will help in the calculation of friend trees later; this way we have them available directly.
E.g. when I have an error in the Python code:
AttributeError: 'VectorProducer' object has no attribute 'name'
I still get a build file:
-- Add build target for file analysis_emb.cxx.
-- Add test for target analysis_emb
-- Configuring done
-- Generating done
-- Build files have been written to: /work/sbrommer/ntuple_prototype/build
https://github.com/KIT-CMS/CROWN/blob/main/code_generation/configuration.py#L607
error if shift is not in all scopes
Is your feature request related to a problem? Please describe.
Add a cutflow diagram to the ntuple, i.e. a representation of the filter report that is already printed after an ntuple is produced, so including this
Flag_goodVertices: pass=21000 all=21000 -- eff=100.00 % cumulative eff=100.00 %
Flag_globalSuperTightHalo2016Filter: pass=20998 all=21000 -- eff=99.99 % cumulative eff=99.99 %
Flag_HBHENoiseFilter: pass=20998 all=20998 -- eff=100.00 % cumulative eff=99.99 %
Flag_HBHENoiseIsoFilter: pass=20995 all=20998 -- eff=99.99 % cumulative eff=99.98 %
Flag_EcalDeadCellTriggerPrimitiveFilter: pass=20993 all=20995 -- eff=99.99 % cumulative eff=99.97 %
Flag_BadPFMuonFilter: pass=20991 all=20993 -- eff=99.99 % cumulative eff=99.96 %
Flag_eeBadScFilter: pass=20991 all=20991 -- eff=100.00 % cumulative eff=99.96 %
Flag_ecalBadCalibFilter: pass=20987 all=20991 -- eff=99.98 % cumulative eff=99.94 %
GoodElMuPairs: pass=1822 all=20987 -- eff=8.68 % cumulative eff=8.68 %
into the sample
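As a starting point for storing the cutflow, the printed report could be parsed into numbers first. The sketch below assumes the line format shown above (`name: pass=N all=M ...`); the names `FILTER_LINE` and `parse_cutflow` are hypothetical.

```python
import re

# Matches lines such as "Flag_goodVertices: pass=21000 all=21000 -- eff=..."
FILTER_LINE = re.compile(
    r"^\s*(?P<name>\S+):\s+pass=(?P<passed>\d+)\s+all=(?P<total>\d+)"
)


def parse_cutflow(report):
    """Turn the printed filter report into a list of (name, passed, total)."""
    cutflow = []
    for line in report.splitlines():
        match = FILTER_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a filter line
        cutflow.append(
            (match["name"], int(match["passed"]), int(match["total"]))
        )
    return cutflow
```

The resulting list keeps the filter order, so it could be written out as a histogram with one bin per filter.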
systematics missing in config:
jecUncEC2YearUp
jecUnjecUncEC2YearUpcHFUDown
Splitting the TauES uncertainty sources based on the tau decay modes.
Adding some metadata to the sample should help in further processing of the samples and will make reproducing them easier.
Some ideas which metadata could be useful
Is your feature request related to a problem? Please describe.
Currently, it is possible to use an incorrect ordering of producers in the Python config. This results in a runtime error, since there is no check whether the order set in the config is actually valid.
Describe the solution you'd like
Add a new function, something like ResolveProducerOrdering, and attempt to find a matching producer order there. If no valid ordering can be created, this should result in an error during the config step.
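Finding a valid order is a topological sort over the producer dependencies. The sketch below uses Python's standard `graphlib` module (Python 3.9 or newer); the function name `resolve_producer_ordering` and the `(inputs, outputs)` tuple layout are assumptions made for this example, not the existing CROWN data model.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_producer_ordering(producers):
    """Order producers so that each one runs after the producers of its inputs.

    `producers` maps a producer name to an (inputs, outputs) pair of
    quantity names.
    """
    produced_by = {}
    for name, (_, outputs) in producers.items():
        for quantity in outputs:
            produced_by[quantity] = name

    graph = {}
    for name, (inputs, _) in producers.items():
        # Assumption: inputs that no producer provides come from the input file.
        graph[name] = {produced_by[q] for q in inputs if q in produced_by}

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        raise ValueError(f"no valid producer ordering: {error}") from error
```

A cyclic dependency raises a `ValueError` at config time, which matches the request that an unresolvable ordering should fail during the config step rather than at runtime.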
The 2016 era is split into two, 2016preVFP and 2016postVFP. So this should be expanded in the configs and so on.
If a Producer is added via a ProducerRule and the Producer to be added belongs to the global scope, the output quantities are added to the outputs defined for the global scope, and not to the actual analysis scopes,
e.g.
configuration.add_modification_rule(
    "global",
    AppendProducer(
        producers=[event.npartons],
        samples=["dyjets"],
    ),
)
will result in an additional output file for the global scope, only containing the q.npartons variable.
The ExtendedVectorProducer (previously TriggerVectorProducer) only works for one scope.
CROWN/code_generation/producer.py
Lines 304 to 308 in 638b0f8
Currently we run against a dummy nanoAOD, but don't test that the correct thing comes out.
We could add the rendering of a flamegraph to every commit, so we can monitor the performance changes more clearly
Describe the bug
When doing trigger object matching, a user can set a filterbit which should correspond to the filterbits mentioned in the docs: https://crown.readthedocs.io/en/latest/namespaces.html#_CPPv47trigger and moreover in the CMS code for the generation of the filterbits: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/triggerObjects_cff.py#L17
However, the values listed here are shifted by 0, and the value zero is also used for the first bit used, so the default value for no filterbit matching should be moved to -1.
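The reason a default of -1 is needed can be illustrated with a small sketch. The function `passes_filterbit` is hypothetical and only shows the logic: bit positions are counted from 0, so 0 is a valid request for the first bit and cannot also mean "no requirement".

```python
def passes_filterbit(trigger_object_bits, filterbit=-1):
    """Check one bit of a trigger object's filter bit word.

    `filterbit` counts bit positions from 0; -1 means no filterbit
    requirement, so every trigger object passes.
    """
    if filterbit < 0:
        return True
    return ((trigger_object_bits >> filterbit) & 1) == 1
```

With this convention, `passes_filterbit(bits, 0)` really tests the first bit, while omitting the argument disables the check.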
When running the compiled program, the appropriate flags to use for the input / output are not clear. Currently the error message reads: Require exactly two additional input arguments (the input and output paths to the ROOT files) but got 0
A help message along the lines of: list of required flags: --input: *description* --output: *description* --otherflags?: *description* would be useful.
If two producers have the same name and live in the same scope, the code generation will overwrite the files generated from these producers without any check. This should instead result in an error that is transparent to the user.
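A check for such name clashes could be as simple as counting (scope, name) pairs before any file is written. The function name `find_duplicate_producers` and the pair-based input are assumptions for this sketch.

```python
from collections import Counter


def find_duplicate_producers(producers):
    """Return the (scope, name) pairs that occur more than once.

    `producers` is an iterable of (scope, name) pairs.
    """
    counts = Counter(producers)
    return sorted(pair for pair, count in counts.items() if count > 1)
```

If the returned list is not empty, the code generation could stop with an error naming the clashing producers instead of silently overwriting files.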
Describe the bug
The DeltaR requirement for the separation of the two pair candidates is not included in the algorithm.
Expected behavior
It should be included
Add the functionality to pass as many input files as needed, similar to e.g. hadd. So make the first or the last command line argument (to be decided) the name of the output file, and treat all remaining arguments as input files. This would make the framework more flexible and make it easier to design batch system jobs as well as smaller local productions.
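The proposed hadd-like interface can be sketched with Python's `argparse`. This is only an illustration of the argument layout, not the interface of the compiled CROWN executable; it assumes the output file comes first, which the text above leaves open.

```python
import argparse


def parse_arguments(argv):
    """Parse an hadd-like command line: one output file, then the inputs."""
    parser = argparse.ArgumentParser(
        description="Sketch of an hadd-like command line (hypothetical)."
    )
    parser.add_argument("output", help="path of the output ROOT file")
    parser.add_argument(
        "inputs", nargs="+", help="one or more input ROOT files"
    )
    return parser.parse_args(argv)
```

Calling `parse_arguments(["out.root", "a.root", "b.root"])` gives `output == "out.root"` and `inputs == ["a.root", "b.root"]`; argparse also generates a `--help` message from the descriptions for free.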
Since the git commits and status are stored within the ntuple file, if we store any potential diffs in the installation folder as well, it should be possible to use any installation directory to set up an exact copy of the setup used. This would be very useful for reproduction of samples or for fixing errors and bugs.
Describe the bug
The test for building docs is failing: https://github.com/KIT-CMS/CROWN/runs/3066748634?check_suite_focus=true
When two filters are defined in the producer order, the optimize step does not preserve the order of those two filters. This can lead to unwanted results, since the filters can depend on each other, and the order they are defined in.
Is your feature request related to a problem? Please describe.
This would decrease configuration overhead for setups with multiple scopes.
MuonIDIso_SF = ProducerGroup(
    name="MuonIDIso_SF",
    call=None,
    input=None,
    output=None,
    scopes=["mt"],
    subproducers=[
        Muon_1_ID_SF,
        Muon_1_Iso_SF,
    ],
)
could look something like this:
MuonIDIso_SF = ProducerGroup(
    name="MuonIDIso_SF",
    call=None,
    input=None,
    output=None,
    scopes=["mt", "mm"],
    subproducers={
        "mt": [
            Muon_1_ID_SF,
            Muon_1_Iso_SF,
        ],
        "mm": [
            Muon_1_ID_SF,
            Muon_1_Iso_SF,
            Muon_2_ID_SF,
            Muon_2_Iso_SF,
        ],
    },
)
for inputs, that is already possible
For now, PUPPIMET only
mt_tot
Add an additional cmake option to limit the number of events, to be used for debugging purposes