GithubHelp home page GithubHelp logo

vincecr0ft / pynfold Goto Github PK

View Code? Open in Web Editor NEW
3.0 1.0 0.0 2.49 MB

Unfolding the inverse problem

Home Page: https://pynfold.readthedocs.io/

License: MIT License

Python 98.02% Makefile 1.22% Dockerfile 0.76%

pynfold's Introduction

PynFold

pynfold.fold

An interface for pythonic solutions to the inverse problem also known as unfolding.

pynfold.fold(type, response, x_shape, y_shape = None, data = None, data_hist = None, data_bins = None)

All parameters are optional but cannot return unless sufficient information is supplied.

  • Parameters:

    • type:
      • Naive (naive) - Matrix Inversion, Correction Factors
      • Dimensionality - Truncated SVD, Richardson-Lucy/D'Agostini/Iterative/'Bayesian'
      • Composite - Tikonov/damped LSQ, Fully Bayesian, RUN
    • response
      • a two dimensional mapping true:measured in lists of numpy arrays
      • can also be set using fold.set_response(xy)
      • a response matrix can be supplied using fold.set_response_matrix(A)
      • a response function can be supplied using fold.set_response_function(foo) this is evaluated for a uniform distribution over the required range.
    • x_shape - discretisation of function in x. In Nystrom methods these are the abscissas for the quadrature rule and in Galerkin methods the span of the basis vectors. This is the desired shape and should be of lower dimensionality then the y_shape.
    • y_shape - discretisation of function in x. In Nystrom methods these are the abscissas for the quadrature rule and in Galerkin methods the span of the basis vectors. If a histogram of data points is supplied then the shape is taken to match this..
    • data - the distribution to unfold.
      • data - a data vector which can be discretised by either Nystrom or Galerkin methods.
      • data_hist - a vector of bin contents e.g. the outcome of a particle detector experiment with implicit Nystrom discretisation.
      • data_bins - shape of data vector. y_shape is matched to this shape.
  • Methods:

    • Fill(x, y), Miss(x) - set response manually for individual events (float) x and y. (RooUnfold Legacy mode)
    • response_matrix()
      • returns A - a matrix of probabilities of dim(m, n) = dim(y), dim(x)
    • Unfold() - Performs deconvolution of data given the response according to specified algorithm.
    • h_reco()
      • Returns g - a histogram of shape x_shape estimated from the Unfold() step
    • b_reco()
      • Returns g - a b-spline of with knots as x_shape estimated from the Unfold() step.

Notes

Unfolding Folklore

Prof. Dr. Volker Blobel compiled a list of rules of thumb to be used when approaching the inverse problem in high energy physics.

  1. Use at least 100 times more events in your response (MC) than you have data if possible.
  2. To avoid an underconstrained system you should use at least twice as many data points(bins) as you have estimators (truth bins).
  3. The response, though independent from the data, should be derived over a distribution as close to the target distribution as possible. (The uniform distribution used when a response_function is set to the fold is rarely optimal.)

Work in progress.

Currently (partly) Available

  • Naive - Matrix Inversion, Composite - damped LSQ , Dimensionality - Richardson-Lucy and Dimensionality - Truncated SVD all available using RooUnfold type settings.
  • HistogramPdf for Nystrom discretisation implemented but lacking functor+minimiser to fit a pseudo inverse.
  • response_matrix is currently the only available kernel. Weighted polynomials (as used in RUN) and kde under preparation.
  • Fully Bayesian Unfolding was available under version 1.6 but has been temporarilly removed to avoid dependency issues.

up-coming

  • Fit mode - construct the likelihood for the estimators and retrieve full minos intervals.
  • spline-mode - discretise data and reconstructed functions using b-splies as basis functions in Galerkin expansion.
  • Regularisation Optimisation - Control/Visualise the effect of varying regularisation strengths.
  • Background estimates, Systematics and multidimensional distributions

Examples

A typical smearing function:

def smear(xt):
    xeff = 0.3 + (1.0 - 0.3) / 20. * (xt + 10.0)  # efficiency                                                     
    x = np.random.rand()
    if x > xeff:
        return None
    xsmear = np.random.normal(-2.5, 0.2)
    return xt + xsmear

define data and response

x = np.random.normal(0.3, 2.5,1000)
data = np.array([smear(i) for i in x])
data_hist, bins = np.histogram(data[data != np.array(None)], 10)

now define a simple default fold and check each input in tern.

f = fold()
f.set_data_hist(data_hist, bins)
f.set_response(np.array([x,data]))

though the 'true' distribution isn't used in the Unfolding calculation, the optimised binning is useful.

true_hist, x_bins = np.histogram(x, 5)
f.set_x_shape(shape=x_bins)

our response is obtained as:

print f.response_matrix()

and we can test the output in forward mode.

print data_hist
print 'should equal'
print true_hist
print 'times'
print f.response_matrix()

perform calculation

A = f.response_matrix()
f = np.asarray(true_hist)
print 'let us try it!'
print  f*A

And now in inverse/unfolded mode

reco, var = f.Unfold()

in this case the reco vector is the nystrom discretisation of weights for the 'true' function f(x) or the reconstructed histogram. such that f should equal reco if the number of events are the same (e.g. 100 times less events means scaling f by 0.01)

Specifying regularisation strength for some algorithms the regularisation strength needs to be supplied e.g. RL - n iterations, tikonov - lagrange mulitplier etc) this is done dynamically.

f = fold('composite', response= np.array([x,data]), x_shape= x_bins, data_hist= data_hist, data_bins= bins)
reco, var = f.Unfold(.25)

pynfold's People

Contributors

vincecr0ft avatar

Stargazers

Patrick J Steffanic avatar Archit Agrawal avatar Kyle Cranmer avatar

Watchers

 avatar

pynfold's Issues

Unfolding Object

Should be initialisable and contain:

  • relevant Histogram objects

  • response/migration matrix

  • error handling in case object defined without necessary properties.

  • set priors (nominally flat)

  • plotting functionality (separate issue?)

Avoid ROOT dependencies

Avoid ROOT dependencies inside, let's have it be pure python. If people want to use ROOT, they should use something else to convert ROOT histograms into numpy format.

OneHist should just be a numpy array.

TwoHist should just be a numpy matrix (don't call it a 2-d histogram).

TUnfold

A clone of the TUnfold method developed by Stefan Schmitt

  • A notebook detailing the base algorithm
  • Implementation in PynFold backend
  • description and examples of implementation

Find RooUnfold equivalent to InverseProblem solutions.

The nomenclature of these fields might be misleading. This is a regularised unfolding, but also iterative. The precise algorithm used should be exactly the same

  • if there is no RooUnfold equivalent find another package that is.
  • write a notebook to describe algorithm with rigour.

Additional functions

Expand base functionality to include features not directly classified as 'unfolding' but that uses the same data stored within the base class.

  • identify any expansions

Inversion without regularisation

A simple inversion of the response matrix without regularisation.

  • A notebook to detail the base algorithm
  • Implementation in pynfold backend
  • description and examples of implementation.

Regularised SVD

Singular value decomposition (SVD; as proposed by Höcker and Kartvelishvili and implemented in TSVDUnfold)

  • Notebook describing base algorithm
  • implementation in pynfold backend
  • documentation and example of implementation

Bin-by-bin

The bin-by-bin unfolding method aka simple correction factors.

  • Notebook detailing the base algorithm
  • implementation in pynfold backend
  • documentation and examples of implementation

Add option to create Theano graph for calcualation

The priority of this issue depends on how well the numpy linear algebra solutions perform. If calculations are particularly slow then this might need bumping up.

  • expand pynfold base class to use theano for matrix algebra.

Implement inverse-problem in pynfold

Implement Tikonov regularisation algorithm in pynfold. First iteration might use a notebook depending on the status of the pynfold backend milestone.

  • Implement algorithm

Histogram Objects

To enable comparisons between PynFold and RooUnfold it should be possible to read several input formats and provide a custom Histogram object.

  • check if numpy array (else)

  • check if numpy histogram (else)

  • check if ROOT installed and if so is THF

  • apply histogram operations (subtract, multiply, divide etc.)

  • ensure that operations propagate errors correctly.

  • Basic support for 1 dimensional histograms of equal size (truth and measured)

  • 1 dimensional histograms of different size

  • Basic 2 dimensional histograms (response matrix)

  • multidimensional hists (truth and measured)

Iterative

Iterative ("Bayesian"; as proposed by D'Agostini)

  • notebook on base algorithm
  • backend implementation
  • verification of algorithm
  • documentation of implementation

Compare PynFold to RooUnfold to Non-ROOT alternatives.

PyFold should either be able to take in ROOT Histograms AND numpy arrays or use root_numpy to convert algorithms such that the histograms can be compared bin to bin.

  • Devise method for comparing results
    • Bin counting (if algorithm is deterministic)
    • χ² test (otherwise)
  • Compare results

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.