kit-cms / crown

C++ based ROOT Workflow for N-tuples (CROWN)

Home Page: https://crown.readthedocs.io

License: MIT License

C++ 65.52% Shell 0.80% CMake 1.94% Python 31.65% Dockerfile 0.08%
root cms particle-physics analysis hep hep-ex ntuples wlcg

crown's People

Contributors

a-monsch, arturakh, conformist89, eguiraud, felix-phy, harrypuuter, khaosmos93, mburkart, nfaltermann, nshadskiy, ralfschmieder, stwunsch, winterchristian

crown's Issues

[BUG] Jet pt ordering

Describe the bug
The algorithm that sorts jets by their transverse momentum does not work correctly.

CROWN/src/jets.hxx

Lines 75 to 81 in 468666c

auto good_jets_pt =
    ROOT::VecOps::Where(jetmask > 0, jet_pt, (float)0.);
Logger::get("OrderJetsByPt")->debug("Jetpt after {}", good_jets_pt);
// we have to convert the result into an RVec of ints since argsort
// gives back an unsigned long vector
auto temp =
    ROOT::VecOps::Argsort(ROOT::VecOps::Nonzero(good_jets_pt));

Here, VecOps::Argsort sorts the indices of the jets with non-zero pt (the output of Nonzero), not the jet pt values themselves. This leads to a wrong ordering.
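The intended behaviour can be pictured with a minimal sketch in plain Python rather than ROOT::VecOps. The function name order_jets_by_pt is hypothetical and not part of CROWN; it only illustrates sorting the selected jet indices by the pt values they point to.

```python
from typing import List, Sequence


def order_jets_by_pt(jet_pt: Sequence[float], jetmask: Sequence[int]) -> List[int]:
    """Return the indices of jets passing the mask, sorted by descending pt."""
    # Keep only the indices of jets that pass the mask.
    good_indices = [i for i, keep in enumerate(jetmask) if keep > 0]
    # Sort those indices by the pt they point to, highest pt first.
    # The sort is stable, so jets with equal pt keep their original order.
    return sorted(good_indices, key=lambda i: jet_pt[i], reverse=True)
```

For jet_pt = [30.0, 55.0, 20.0, 80.0] and jetmask = [1, 1, 0, 1], this returns [3, 1, 0], whereas sorting the selected indices themselves would return [0, 1, 3].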

Trigger stuff

  • Trigger object matching
  • Writeout of trigger flags
  • Writeout of trigger scale factors

Track unused configuration parameters

Is your feature request related to a problem? Please describe.
A functionality to track and report configuration parameters that are not used anywhere.
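One possible approach, sketched here under the assumption that parameters are read through a dictionary-like object, is to record every key that is accessed and report the rest. The class name TrackedConfig and its methods are hypothetical and do not exist in CROWN.

```python
from typing import Any, Dict, Set


class TrackedConfig:
    """Dictionary-like wrapper that remembers which keys were read."""

    def __init__(self, parameters: Dict[str, Any]) -> None:
        self._parameters = dict(parameters)
        self._used: Set[str] = set()

    def __getitem__(self, key: str) -> Any:
        value = self._parameters[key]  # raises KeyError for unknown keys
        self._used.add(key)
        return value

    def unused(self) -> Set[str]:
        """Return the names of all parameters that were never read."""
        return set(self._parameters) - self._used
```

After the code generation has run, calling unused() would give the set of parameters to report.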


Check if files are available [Request]

Currently, if a remote file is used as an input file and is not available or accessible, CROWN fails with an ugly segfault. This should be handled cleanly by checking beforehand whether all input files are available.

The error message for local files that are not found is also a bit cryptic at the moment:

[2022-01-24 15:28:09.758] [main] [info] input_file 1: /i/do/not/exist.root
[2022-01-24 15:28:09.758] [main] [info] Output directory: test.root
[2022-01-24 15:28:09.787] [main] [info] Starting Setup of Dataframe
terminate called after throwing an instance of 'std::runtime_error'
  what():  GetBranchNames: error in opening the tree Events
[2]    819003 abort      ./tagandprobe_data_2018 test.root /i/do/not/exist.root
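The check for local files could look roughly like the following Python sketch. CROWN itself is a C++ program, so this only illustrates the logic; the function name find_missing_local_files is hypothetical, and remote inputs are deliberately skipped because checking them needs a separate mechanism that is not shown here.

```python
import os
from typing import Iterable, List


def find_missing_local_files(paths: Iterable[str]) -> List[str]:
    """Return all local input paths that do not point to a readable file."""
    missing = []
    for path in paths:
        # Remote inputs (e.g. root://...) cannot be checked with os.path;
        # they are skipped here and would need a separate check.
        if "://" in path:
            continue
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            missing.append(path)
    return missing
```

If the returned list is not empty, the program could print the missing paths and exit before the dataframe setup starts.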

mm channel, zptmassreiweighting

The bug appears in ntuple production. With debug=true, it states:
Warning in TStreamerInfo::BuildCheck:
The StreamerInfo of class RooBinningCategory read from file data/zpt/htt_scalefactors_legacy_2018.root
has the same version (=1) as the active class but a different checksum.
You should update the version to ClassDef(RooBinningCategory,2).
Do not try to write objects with the current class definition,
the files will not be readable.

Warning in TStreamerInfo::CompareContent: The following data member of
the on-file layout version 1 of class 'RooBinningCategory' differs from
the in-memory layout version 1:
RooTemplateProxy _inputVar; //
vs
RooTemplateProxy _inputVar; //
Error in TClass::RegisterStreamerInfo: Register StreamerInfo for RooBinningCategory on non-empty slot (1).

Fix Introduction in the Documentation

In the 'Running the Framework' section, the -DANALYSIS=config and -DSAMPLES=samples flags need to be specified, and it is unclear how the config and samples are handled.

Extend Producer calls to be able to treat different inputs with the same producer instance

Modify the code so that it is possible to change the inputs based on the scope. If a list is provided as input, we convert it internally to a dict, and always use a scope dict for the inputs.

E.g. for q_1

q_1 = Producer(
    name="q_1",
    call="quantities::charge({df}, {output}, 0, {input})",
    input={"mt": [q.ditaupair, nanoAOD.Muon_charge],
           "et": [q.ditaupair, nanoAOD.Electron_charge],
           "tt": [q.ditaupair, nanoAOD.Tau_charge],
           "em": [q.ditaupair, nanoAOD.Electron_charge], },
    output=[q.q_1],
    scopes=["mt", "et", "tt", "em"],
)
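The internal conversion from a list to a scope dict could be sketched as follows. The helper name to_scope_dict is hypothetical, and plain strings stand in for the quantity objects used in the real configuration.

```python
from typing import Dict, List, Sequence, Union

Inputs = Union[Sequence[str], Dict[str, Sequence[str]]]


def to_scope_dict(inputs: Inputs, scopes: Sequence[str]) -> Dict[str, List[str]]:
    """Return a dict mapping every scope to its list of inputs."""
    if isinstance(inputs, dict):
        missing = [scope for scope in scopes if scope not in inputs]
        if missing:
            raise ValueError(f"No inputs defined for scopes: {missing}")
        return {scope: list(inputs[scope]) for scope in scopes}
    # A plain list means: use the same inputs in every scope.
    return {scope: list(inputs) for scope in scopes}
```

With this, the rest of the code only ever has to deal with the dict form, regardless of what the user wrote in the config.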

Convert p4 quantities to floats

You have to be careful here: this writes out a double, and you may end up with an ntuple that is twice as large ;) The same applies to the other quantities below.

        [](const ROOT::Math::PtEtaPhiMVector &p4) { return (float)p4.pt(); },

Originally posted by @stwunsch in #17 (comment)
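The size difference can be illustrated with a small Python sketch using the standard array module, where the type code "d" is a C double and "f" is a C float. The pt values are made up for illustration.

```python
from array import array

pt_values = [41.3, 27.8, 15.2]  # illustrative values only

as_double = array("d", pt_values)  # stored as C doubles
as_float = array("f", pt_values)   # stored as C floats

# Total payload size in bytes for each representation.
double_bytes = as_double.itemsize * len(as_double)
float_bytes = as_float.itemsize * len(as_float)
```

On common platforms a double takes 8 bytes and a float 4 bytes, so the double column needs twice the space before compression.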

[Request] Dataset Management and Batch System Submission

Is your feature request related to a problem? Please describe.

We need a clear way to treat and handle different datasets, and we need to decide exactly what the batch submission should look like.

I guess this would be the time for a new brainstorming session on how we want to handle this.

Modification Rule to remove Producers from a ProducerGroup

Currently, a call like

configuration.add_modification_rule(
        channels,
        RemoveProducer(producers=genparticles.gen_taujet_pt_2, samples=["emb"]),
    )

does not work, since genparticles.gen_taujet_pt_2 is included within a ProducerGroup, and only Producers added at the top level can be removed via this function. Consider adding such a functionality to make this more transparent to the user and to avoid ugly workarounds with lots of code repetition.
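A recursive removal could be sketched as below. SimpleProducer, SimpleGroup and remove_producer are hypothetical stand-ins, not the CROWN classes; the sketch only shows how a producer nested at any depth could be found and removed.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleProducer:
    name: str


@dataclass
class SimpleGroup:
    name: str
    subproducers: List[Union["SimpleGroup", SimpleProducer]] = field(default_factory=list)


def remove_producer(group: SimpleGroup, name: str) -> bool:
    """Remove the first producer called `name` at any nesting depth."""
    for index, member in enumerate(group.subproducers):
        if isinstance(member, SimpleGroup):
            # Descend into nested groups before giving up.
            if remove_producer(member, name):
                return True
        elif member.name == name:
            del group.subproducers[index]
            return True
    return False
```

The boolean return value lets the caller report an error if the producer to be removed was not found anywhere.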

Error in the generation of the C++ code should result in a cmake error

E.g. when I have an error in the Python code:

AttributeError: 'VectorProducer' object has no attribute 'name'

I still get a build file:

-- Add build target for file analysis_emb.cxx.
-- Add test for target analysis_emb
-- Configuring done
-- Generating done
-- Build files have been written to: /work/sbrommer/ntuple_prototype/build
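One way to make the failure visible, assuming the Python code generation is started from cmake via execute_process, is to turn any exception into a non-zero exit code. cmake can then capture that code with RESULT_VARIABLE and stop with message(FATAL_ERROR ...). The function name run_code_generation is hypothetical.

```python
import traceback
from typing import Callable


def run_code_generation(generate: Callable[[], None]) -> int:
    """Run the generation step and translate any exception into an exit code."""
    try:
        generate()
    except Exception:
        # Print the Python error so that it shows up in the cmake log.
        traceback.print_exc()
        return 1
    return 0
```

The script entry point would then end with sys.exit(run_code_generation(main)), where main is whatever function performs the generation.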

[Request] Cutflow Diagram

Is your feature request related to a problem? Please describe.
Add a cutflow diagram to the ntuple, i.e. a representation of the filter efficiencies that are already printed after an ntuple is produced. This would mean including

Flag_goodVertices: pass=21000      all=21000      -- eff=100.00 % cumulative eff=100.00 %
Flag_globalSuperTightHalo2016Filter: pass=20998      all=21000      -- eff=99.99 % cumulative eff=99.99 %
Flag_HBHENoiseFilter: pass=20998      all=20998      -- eff=100.00 % cumulative eff=99.99 %
Flag_HBHENoiseIsoFilter: pass=20995      all=20998      -- eff=99.99 % cumulative eff=99.98 %
Flag_EcalDeadCellTriggerPrimitiveFilter: pass=20993      all=20995      -- eff=99.99 % cumulative eff=99.97 %
Flag_BadPFMuonFilter: pass=20991      all=20993      -- eff=99.99 % cumulative eff=99.96 %
Flag_eeBadScFilter: pass=20991      all=20991      -- eff=100.00 % cumulative eff=99.96 %
Flag_ecalBadCalibFilter: pass=20987      all=20991      -- eff=99.98 % cumulative eff=99.94 %
GoodElMuPairs: pass=1822       all=20987      -- eff=8.68 % cumulative eff=8.68 %

into the sample
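The numbers needed for such a cutflow can be derived from the pass and all counts alone. The sketch below uses a hypothetical helper cutflow_table and assumes that the cumulative efficiency is taken relative to the number of events seen by the first filter, which is consistent with the report above.

```python
from typing import List, Sequence, Tuple


def cutflow_table(filters: Sequence[Tuple[str, int, int]]) -> List[Tuple[str, float, float]]:
    """Return (name, efficiency %, cumulative efficiency %) for each filter.

    Each input entry is (name, passed, all). The cumulative efficiency is
    taken relative to the `all` count of the first filter (an assumption).
    """
    if not filters:
        return []
    total = filters[0][2]
    table = []
    for name, passed, seen in filters:
        eff = 100.0 * passed / seen if seen else 0.0
        cumulative = 100.0 * passed / total if total else 0.0
        table.append((name, eff, cumulative))
    return table
```

The resulting rows could be written into a histogram or a small table stored next to the ntuple.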

[BUG]

systematics missing in config:
jecUncEC2YearUp
jecUnjecUncEC2YearUpcHFUDown

Add Metadata to the generated samples

Adding some metadata to the samples should help with further processing and make the samples easier to reproduce.

Some ideas for metadata that could be useful:

  • List of all shifts considered in the ntuple
  • List of all variable names (although this is maybe too much)
  • current framework commit hash
  • is the current repository clean
  • cmake options used (currently this would be era, sample, config)
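Collecting part of this metadata could be sketched as follows. The helper names git_output and build_metadata and the dictionary keys are hypothetical; the git commands mentioned in the note (rev-parse HEAD and status --porcelain) are standard.

```python
import subprocess
from typing import Dict, Sequence


def git_output(args: Sequence[str], repo_dir: str) -> str:
    """Run a git command in repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def build_metadata(commit: str, status: str, shifts: Sequence[str],
                   cmake_options: Dict[str, str]) -> Dict[str, object]:
    """Assemble the metadata record from already collected pieces."""
    return {
        "commit": commit,
        # `git status --porcelain` prints nothing for a clean repository.
        "is_clean": status == "",
        "shifts": sorted(shifts),
        "cmake_options": dict(cmake_options),
    }
```

In a real setup, commit would come from git_output(["rev-parse", "HEAD"], repo_dir) and status from git_output(["status", "--porcelain"], repo_dir).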

[Request] Function to attempt a correct ordering of producers

Is your feature request related to a problem? Please describe.
Currently, it is possible to use an incorrect ordering of producers in the Python config. This results in a runtime error, since there is no check whether the order set in the config is actually valid.

Describe the solution you'd like
Add a new function, something like ResolveProducerOrdering, that attempts to find a valid producer order. If no valid ordering can be created, this should result in an error during the config step.
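A minimal sketch of such a function is shown below. It assumes that each producer can be reduced to a name, a set of input quantities and a set of output quantities; the tuple layout and the name resolve_producer_ordering are hypothetical. The approach is a simple greedy dependency resolution that keeps the config order wherever it is already valid.

```python
from typing import List, Sequence, Set, Tuple

# (name, inputs, outputs): a simplified stand-in for a producer
ProducerSpec = Tuple[str, Set[str], Set[str]]


def resolve_producer_ordering(producers: Sequence[ProducerSpec],
                              available: Set[str]) -> List[str]:
    """Order producers so that every input exists before it is used."""
    known = set(available)
    pending = list(producers)
    ordered: List[str] = []
    while pending:
        # Pick the first producer (in config order) whose inputs are all known.
        for spec in pending:
            name, inputs, outputs = spec
            if inputs <= known:
                ordered.append(name)
                known |= outputs
                pending.remove(spec)
                break
        else:
            # No pending producer can run: report this at config time.
            unresolved = [name for name, _, _ in pending]
            raise ValueError(f"No valid producer ordering for: {unresolved}")
    return ordered
```

The ValueError corresponds to the requested error during the config step; available would hold the quantities already present in the input file.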

2016 era

The 2016 era is split into two: 2016preVFP and 2016postVFP. This should be reflected in the configs and everywhere else the era is used.

[BUG] Writeout of global quantities added via ProducerRules

If a Producer is added via a ProducerRule and the Producer to be added belongs to the global scope, the output quantities are added to the outputs defined for the global scope, and not to the actual analysis scopes.

e.g.

configuration.add_modification_rule(
        "global",
        AppendProducer(
            producers=[event.npartons],
            samples=["dyjets"],
        ),
    )

will result in an additional output file for the global scope, only containing the q.npartons variable.

only one scope possible in ExtendedVectorProducer

The ExtendedVectorProducer (previously TriggerVectorProducer) only works for one scope.

if not isinstance(scope, list):
    scope = [scope]
if len(scope) != 1:
    log.error("TriggerVectorProducer can only use one scope per instance !")
    raise Exception

It should be checked whether this can be extended to manage more than one scope, since some features might need it.

Add flamegraph to CI test

We could add the rendering of a flamegraph to every commit, so we can monitor the performance changes more clearly

[BUG] Trigger filterbit documentation inconsistency

Describe the bug
When doing trigger object matching, a user can set a filterbit which should correspond to the filterbits mentioned in the docs: https://crown.readthedocs.io/en/latest/namespaces.html#_CPPv47trigger and moreover in the CMS code for the generation of the filterbits: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/triggerObjects_cff.py#L17

However, the values listed here are shifted by 0, and the value zero is also used for the first bit used, so the default value for no filterbit matching should be moved to -1
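The proposed convention can be sketched in a few lines of Python. The function name matches_filterbit is hypothetical, and the actual matching in CROWN is done in C++; the sketch only shows why -1 is a safer default than 0 once bit 0 is a valid choice.

```python
def matches_filterbit(trigger_filter_bits: int, filterbit: int = -1) -> bool:
    """Check whether bit number `filterbit` is set in the trigger object bits.

    A negative `filterbit` disables the check, which leaves 0 free to
    address the first bit.
    """
    if filterbit < 0:
        return True
    return bool((trigger_filter_bits >> filterbit) & 1)
```

With this convention, matches_filterbit(bits, 0) tests the first bit, while the default call matches_filterbit(bits) accepts every trigger object.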

Running the Framework

When running the compiled program, it is not clear which arguments to use for the input and output. Currently the error message reads "Require exactly two additional input arguments (the input and output paths to the ROOT files) but got 0". A help message along the lines of "list of required flags: --input: *description* --output: *description* --otherflags?: *description*" would be useful.

[Request] Allow more than one file as input

Add the functionality to pass as many input files as needed, similar to e.g. hadd. The first or the last command line argument (to be decided) would be the name of the output file, and all remaining arguments are treated as input files. This would make the framework more flexible and make it easier to design batch system jobs as well as smaller local productions.
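The argument layout can be sketched with Python's argparse, choosing the variant with the output file first, as hadd does. The compiled CROWN executable is C++, so this is only an illustration of the interface; the function name parse_arguments and the help texts are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_arguments(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `output input [input ...]`, similar to the argument order of hadd."""
    parser = argparse.ArgumentParser(
        description="Produce one ntuple from one or more input ROOT files."
    )
    parser.add_argument("output", help="path of the output ROOT file")
    parser.add_argument("inputs", nargs="+", help="one or more input ROOT files")
    return parser.parse_args(argv)
```

A side effect of declaring the arguments this way is that a usage message listing the required arguments is generated automatically when they are missing.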

Functionality to setup a reproduction of an ntuple with a single script

Since the git commits and status are stored within the ntuple file, if we store any potential diffs in the installation folder as well, it should be possible to use any installation directory to set up an exact copy of the setup used. This would be very useful for reproducing samples and for fixing errors and bugs.

[BUG] Optimize Step does not preserve Filter ordering

When two filters are defined in the producer order, the optimize step does not preserve the order of those two filters. This can lead to unwanted results, since the filters can depend on each other, and the order they are defined in.

Add option to modify subproducers based on the scope

Is your feature request related to a problem? Please describe.
This would decrease configuration overhead for setups with multiple scopes.

MuonIDIso_SF = ProducerGroup(
    name="MuonIDIso_SF",
    call=None,
    input=None,
    output=None,
    scopes=["mt"],
    subproducers=[
        Muon_1_ID_SF,
        Muon_1_Iso_SF,
    ],
)

could look something like this:

MuonIDIso_SF = ProducerGroup(
    name="MuonIDIso_SF",
    call=None,
    input=None,
    output=None,
    scopes=["mt", "mm"],
    subproducers={
        "mt": [
            Muon_1_ID_SF,
            Muon_1_Iso_SF,
        ],
        "mm": [
            Muon_1_ID_SF,
            Muon_1_Iso_SF,
            Muon_2_ID_SF,
            Muon_2_Iso_SF,
        ],
    },
)

For inputs, this is already possible.

Add MET things

For now, PUPPIMET only

  • Readout of MET
  • Readout of METCoV
  • Propagation of Corrections for
    • Jets,
    • electrons,
    • taus,
    • muons (?)
  • Readout of MetUncertainties --> MetUnclustered
  • MET Recoil Corrections (special for WJets)
  • Calculation of MET related quantities ala mt_tot
