$Z^\prime \to t\bar{t}$ Background Estimation

Adapted from background estimation for the 2023 CMSDAS $b^\ast \to tW$ exercise, using the updated version of 2DAlphabet

Getting started (in bash shell)

First, ensure that you have SSH keys tied to your GitHub account and that they've been added to the ssh-agent:

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_xyz   # replace id_xyz with the name of your key file

This step is necessary for cloning some of the Combine tools used in the 2DAlphabet installation.

Setup CMSSW and 2DAlphabet environment:

Assuming you've already created the ~/public/CMSDAS2023/ directory, first create the CMSSW environment:

ssh -XY <username>@lxplus.cern.ch
export SCRAM_ARCH=slc7_amd64_gcc700
cd public/CMSDAS2023/
cmsrel CMSSW_10_6_14
cd CMSSW_10_6_14/src
cmsenv

Now set up 2DAlphabet:

cd ~/public/CMSDAS2023/CMSSW_10_6_14/src/
git clone https://github.com/mdmorris/2DAlphabet.git
git clone --branch 102x https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
curl -s https://raw.githubusercontent.com/lcorcodilos/CombineHarvester/master/CombineTools/scripts/sparse-checkout-ssh.sh | bash
scram b clean; scram b -j 4
cmsenv

Now, create a virtual environment in which to install 2DAlphabet:

python -m virtualenv twoD-env
source twoD-env/bin/activate
cd 2DAlphabet
python setup.py develop

Then, check that the 2DAlphabet installation worked by opening a Python shell:

python

Then, inside the Python shell:

import ROOT
r = ROOT.RooParametricHist()  # should succeed without errors if the Combine libraries were built correctly

Finally, clone this repo to the src directory as well:

cd ~/public/CMSDAS2023/CMSSW_10_6_14/src/
git clone https://github.com/mdmorris/BstarToTW_CMSDAS2023_BackgroundEstimation.git

OR fork the code to your own GitHub account and set the upstream:

git clone https://github.com/<USERNAME>/BstarToTW_CMSDAS2023_BackgroundEstimation.git
cd BstarToTW_CMSDAS2023_BackgroundEstimation
git remote add upstream https://github.com/mdmorris/BstarToTW_CMSDAS2023_BackgroundEstimation.git
git remote -v

What to do after reconnecting to LXPLUS:

Go back to the directory where you installed 2DAlphabet and where the virtual environment resides:

ssh -XY <username>@lxplus.cern.ch
cd ~/public/CMSDAS2023/CMSSW_10_6_14/src/
cmsenv
source twoD-env/bin/activate

Then you should be good to go!

Background estimate

For this exercise we will use the 2DAlphabet GitHub package. This package uses .json configuration files to specify the input histograms used in the fit and the associated uncertainties. These uncertainties are passed to the Higgs Combine backend, the fitting package used widely within CMS. 2DAlphabet serves as an interface to Combine that lets you use the 2DAlphabet method without having to maintain your own custom version of Combine.

Input root files

ROOT files with the pass and fail histograms can be found in:

/afs/cern.ch/user/m/mmorris/public/ttbarAllHad/twodalphabet/
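
If you want to sanity-check the inputs before fitting, a quick PyROOT look at one of these files can confirm that the pass and fail histograms (MtwvMtPass and MtwvMtFail, as used by the configuration described below) are present. The file name here is a placeholder; substitute one of the actual files from the directory above.

import ROOT

# Placeholder file name -- replace <somefile> with one of the actual ROOT files in the directory above
f = ROOT.TFile.Open("/afs/cern.ch/user/m/mmorris/public/ttbarAllHad/twodalphabet/<somefile>.root")
f.ls()  # list the contents of the file

# The 2D pass/fail histograms referenced by the configuration (HIST = MtwvMt$region)
h_pass = f.Get("MtwvMtPass")
h_fail = f.Get("MtwvMtFail")
print(h_pass.Integral(), h_fail.Integral())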

Configuration file

The configuration file that you will be using is called bstar.json, located in this repository. Let's take a look at this file and see the various parts (a schematic sketch of the overall structure follows the list below):

  • GLOBAL

    • This section contains meta information regarding the location (path), filenames (FILE), and input histogram names (HIST) for all ROOT files used in the background estimation procedure.
    • Everything in this section is used in a file-wide find-and-replace: wherever the name of one of these sub-objects appears elsewhere in the file, it is replaced by the value assigned to it in this section.
    • Additionally, the SIGNAME list should include the name(s) of all signals you wish to investigate, so that they are added to the workspace when you run the python script.
      • If you wanted to investigate limits for only three signals, for example, you'd just add their names as given in the ROOT files to this list.
      • For this exercise, the default is signalLH2400, the 2.4 TeV signal sample. You'll want to change this as the exercise progresses.
  • REGIONS

    • This section contains the various regions we are interested in transferring between.
    • Each region contains a PROCESSES object, listing the signals and backgrounds to be included in the fit, as well as a BINNING object, which is defined elsewhere in the config file.
    • The name of each region in REGIONS depends on the input histogram names, as well as on your choice of HIST name in the GLOBAL section above.
      • For instance, in this file we declared HIST = MtwvMt$region, where $region will be expanded as the name given in REGIONS.
      • We chose this name because the input histograms are titled MtwvMtPass and MtwvMtFail for the Pass and Fail regions, respectively.
  • PROCESSES

    • In this section we define all of the various process ROOT files that will be used to produce the fit. These include data, signals, and backgrounds.
    • Each process contains its own set of options:
      • SYSTEMATICS: a list of systematic uncertainties, whose properties are defined elsewhere in the config file
      • SCALE: how much to scale this process by in the fit
      • COLOR: the color used when plotting this process (ROOT color scheme)
      • TYPE: one of DATA, BKG, or SIGNAL
      • TITLE: label in the plot legend (LaTeX compatible)
      • ALIAS: if the process has a non-standard filename, this is the string that replaces the process placeholder in the GLOBAL section's FILE option, so that the file for this process gets picked up properly
      • LOC: the location of the file, using the definitions laid out in GLOBAL
  • SYSTEMATICS

    • This contains the names of all systematic uncertainties you want to apply to the various processes.
    • The CODE key describes the type of systematic that will be used in Combine.
    • The VAL key assigns the size of that uncertainty. For instance, a VAL of 1.018 for the lumi (luminosity) systematic corresponds to a 1.8% uncertainty on the yield.
  • BINNING

    • This section allows us to name and define custom binning schemes. After naming a scheme, one defines several variables for both X and Y:
      • NAME: allows us to denote what is being plotted on the given axis
      • TITLE: the axis label for the plot (LaTeX enabled)
      • BINS: a list of bins
      • SIGSTART, SIGEND: the bins defining the window [SIGSTART, SIGEND] that is blinded (if the blinded option is selected)
  • OPTIONS

    • A list of boolean and other options to be considered when generating the fit
    • (explanation WIP)
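
To make the layout above concrete, here is a minimal, hypothetical sketch of a configuration with the same sections, written as a Python dictionary and dumped to JSON. All paths, file patterns, process names, codes, and numbers are placeholders for illustration only; refer to bstar.json in this repository for the real contents.

import json

# A hypothetical, minimal configuration illustrating the sections described above.
# All paths, file patterns, process names, codes, and numbers are placeholders;
# see bstar.json in this repository for the real contents.
config = {
    "GLOBAL": {
        "path": "root://some/input/area",      # substituted wherever "path" appears below
        "FILE": "selection_$process.root",     # $process is expanded per process (or its ALIAS)
        "HIST": "MtwvMt$region",               # $region is expanded per region (Pass/Fail)
        "SIGNAME": ["signalLH2400"]            # signals to add to the workspace
    },
    "REGIONS": {
        "Pass": {"PROCESSES": ["data_obs", "ttbar", "signalLH2400"], "BINNING": "default"},
        "Fail": {"PROCESSES": ["data_obs", "ttbar", "signalLH2400"], "BINNING": "default"}
    },
    "PROCESSES": {
        # data and signal entries follow the same pattern as this background entry
        "ttbar": {
            "SYSTEMATICS": ["lumi"], "SCALE": 1.0, "COLOR": 2,
            "TYPE": "BKG", "TITLE": "t#bar{t}", "LOC": "path/FILE:HIST"
        }
    },
    "SYSTEMATICS": {
        "lumi": {"CODE": 0, "VAL": 1.018}      # e.g. a 1.8% uncertainty on the yield
    },
    "BINNING": {
        "default": {
            "X": {"NAME": "mt", "TITLE": "m_{t} [GeV]", "BINS": [50, 100, 150, 200],
                  "SIGSTART": 100, "SIGEND": 150},
            "Y": {"NAME": "mtw", "TITLE": "m_{tW} [GeV]", "BINS": [500, 1000, 2000, 4000]}
        }
    },
    "OPTIONS": {"blindedPlots": True}          # boolean and other fit options (names illustrative)
}

print(json.dumps(config, indent=4))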

Running the ML fit

By default, the ttbar.py script (which uses the 2DAlphabet Python API) should set up a workspace, perform the ML fit, and plot the distributions.

python ttbar.py

The output is stored in the ttbarfits/ output directory by default.

Running the ML fit for b-tag and y analysis regions

Run the python ttbar.py command for all six regions (a scripted alternative is sketched after the commands below):

cd regions/2016
python ttbar.py cen0b
python ttbar.py cen1b
python ttbar.py cen2b
python ttbar.py fwd0b
python ttbar.py fwd1b
python ttbar.py fwd2b
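
If you would rather not type the six commands by hand, a small wrapper like the following loops over the regions. It simply shells out to ttbar.py, so it assumes you run it from regions/2016 with the CMSSW environment and the virtualenv active.

import subprocess

# Run the fit for each b-tag / rapidity region in turn.
# Assumes this is executed from regions/2016 with cmsenv and the virtualenv active.
regions = ["cen0b", "cen1b", "cen2b", "fwd0b", "fwd1b", "fwd2b"]
for region in regions:
    print("Fitting region:", region)
    subprocess.check_call(["python", "ttbar.py", region])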

This will create separate directories labelled by the region and transfer function, for example:

ttbarfits_cen0b_3x1

In the ttbarfits directories, data cards are saved in the signal{XXXXX} subdirectories. In order to combine the data cards from all six regions into one inclusive data card, run (still in the regions/2016 directory):

./combine_cards.sh

The combineTool.py jobs are submitted to HTCondor; when the jobs have completed, the combined data cards and asymptotic-limit ROOT files can be found in the new directory regions/2016/ttbarfits_inclusive.

In order to plot the limits, run the limits.ipynb notebook.

Repeat these steps for 2017 and 2018 in regions/2017 and regions/2018.

The .json files for the inclusive histograms are located in inclusive/. The .json files for the split b-tag and y regions are located in regions/2016, regions/2017, and regions/2018.

Statistical Tests

FTests

To run the FTest comparisons, first run 2DAlphabet using all of the transfer functions you wish to compare. Save the output directories in the format:

ttbarfits_cen_ftest_1x2
ttbarfits_fwd_ftest_1x2

for the central and forward 1x2 transfer functions, and likewise for all of the other transfer functions.
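
As a reminder of what the test does: for two nested transfer functions with $p_1 < p_2$ parameters fit to $n$ bins, the conventional F-statistic for deciding whether the extra parameters are justified is

$F = \frac{(\chi^2_1 - \chi^2_2)/(p_2 - p_1)}{\chi^2_2/(n - p_2)}$

where $\chi^2_i$ is the goodness-of-fit of transfer function $i$, and the observed value is compared to an F-distribution with $(p_2 - p_1,\, n - p_2)$ degrees of freedom. FTest.py computes a statistic of this form from the saved fit results (the exact goodness-of-fit measure it uses may differ slightly).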

To compare 1x2 to 2x2 for central and forward, add the following lines to the end of FTest.py (below if __name__=="__main__":):

    directory = '/eos/home-m/mmorris/Documents/TTbarResonance/backgroundEstimate/CMSSW_10_6_14/src/BstarToTW_CMSDAS2023_BackgroundEstimation/tight/2018/'
    
    FTest('1x2','2x2', directory, 'cen')
    FTest('1x2','2x2', directory, 'fwd')

and change the directory variable to the appropriate directory.

Run the FTests and find the plots saved in the ftests directory:

mkdir ftests
python FTest.py

Systematics

Systematic uncertainties were described in the config file section above. Add the Top pT uncertainties to the appropriate processes in the config file, then re-run the fit, after first copying the old Combine card somewhere safe. Compare the pre- and post-Top pT Combine cards using diff.
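
The command-line diff works fine for this; if you prefer to stay in Python, the equivalent check is a few lines with difflib. Combine data cards are plain text, so either approach will show the added Top pT lines. The card file names below are placeholders for the pre- and post-Top pT cards you saved.

import difflib

# Placeholder card names -- substitute the actual pre- and post-Top-pT Combine cards.
with open("card_before_toppt.txt") as f_old, open("card_after_toppt.txt") as f_new:
    old_lines = f_old.readlines()
    new_lines = f_new.readlines()

# Print a unified diff of the two data cards (the added Top pT lines should stand out).
for line in difflib.unified_diff(old_lines, new_lines, fromfile="before", tofile="after"):
    print(line, end="")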

Limit setting

Limits for each signal are calculated using the perform_limit function in ttbar.py. The limits can then be plotted using the set_limits.py script, or interactively with the limits.ipynb notebook. The mass points and cross sections for each signal are located in signal_xs.json.
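
To convert the signal-strength limits from Combine into cross-section limits, the limits on $r$ are scaled by the theory cross section of each mass point. A minimal sketch of that bookkeeping, assuming signal_xs.json simply maps signal names to cross sections in pb (check the file for its actual schema), might look like:

import json

# Assumed schema: {"signalLH2400": <cross section in pb>, ...} -- check signal_xs.json for the real layout.
with open("signal_xs.json") as f:
    signal_xs = json.load(f)

# Hypothetical limit on the signal strength r for the 2.4 TeV point,
# e.g. taken from the asymptotic-limit output of Combine.
r_limit = 0.5

xsec_limit = r_limit * signal_xs["signalLH2400"]  # limit on sigma x BR in pb
print("95% CL upper limit:", xsec_limit, "pb")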
