
PhD Thesis. Some results in Extreme Value Theory with applications to High-Throughput Screening and Bioinformatics.



Extreme Value Analysis of Huge Datasets: Tail Estimation Methods in High-Throughput Screening and Bioinformatics

Abstract

The thesis presents results in Extreme Value Theory with applications to High-Throughput Screening and Bioinformatics. The methods described in the thesis, however, are applicable to the statistical analysis of huge datasets in general. The main results are covered in four papers.

The first paper develops novel methods to handle false rejections in High-Throughput Screening experiments where testing is done at extreme significance levels, with low degrees of freedom, and when the true null distribution may differ from the theoretical one. We introduce efficient and accurate estimators of the False Discovery Rate and related quantities, and provide methods for estimating the true null distribution resulting from data preprocessing, as well as techniques for comparing it with the theoretical null distribution. Extreme Value Statistics provides a natural analysis tool: a simple polynomial model for the tail of the distribution of p-values. We describe the properties of the estimators of the model parameters and point to model-checking tools for both independent and dependent data. The methods are applied to two large-scale genomic studies and to an fMRI brain scan experiment.
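To make the idea of a polynomial tail model concrete, here is a minimal Python sketch. It is not the estimator from the thesis. It assumes that, below a user-chosen threshold u, the distribution of p-values behaves like P(p <= x) = c * x**d, fits c and d by maximum likelihood on the p-values at or below u, and plugs the fit into a simple conservative FDR estimate that assumes uniform null p-values. The function names, the threshold u, and the parameter pi0 (the assumed proportion of true nulls) are all hypothetical choices made for illustration.

```python
import math
import random


def fit_polynomial_tail(p_values, u):
    """Fit P(p <= x) ~ c * x**d for 0 < x <= u by maximum likelihood.

    Conditional on p <= u, the model implies log(u / p) is exponential
    with rate d, which gives the closed-form estimate used below.
    """
    n = len(p_values)
    tail = [p for p in p_values if 0.0 < p <= u]
    if not tail:
        raise ValueError("no p-values at or below the threshold u")
    log_sum = sum(math.log(u / p) for p in tail)
    if log_sum <= 0.0:
        raise ValueError("all tail p-values equal u; cannot estimate d")
    d_hat = len(tail) / log_sum
    # Match the fitted tail to the empirical fraction of p-values <= u.
    c_hat = (len(tail) / n) / u ** d_hat
    return c_hat, d_hat


def estimate_fdr(alpha, c_hat, d_hat, pi0=1.0):
    """Plug-in FDR estimate at level alpha, assuming uniform null p-values.

    pi0 is the assumed proportion of true nulls; pi0 = 1 is conservative.
    """
    fitted_tail = c_hat * alpha ** d_hat
    return min(1.0, pi0 * alpha / fitted_tail)


if __name__ == "__main__":
    # Illustration on simulated data: all nulls true, so p-values are uniform.
    random.seed(0)
    p_values = [random.random() for _ in range(200_000)]
    c_hat, d_hat = fit_polynomial_tail(p_values, u=0.01)
    print("c_hat:", c_hat, "d_hat:", d_hat)
    print("estimated FDR at 1e-4:", estimate_fdr(1e-4, c_hat, d_hat))
```

With uniform p-values the fitted exponent should come out close to 1, and the estimated FDR close to 1, since every rejection is then a false one. An exponent well below 1 would indicate more very small p-values than the uniform null predicts.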

The second paper gives a rigorous mathematical basis for the above methods. We present asymptotic formulas for the distribution tails of what are probably the most commonly used statistical tests, under non-normality, dependence, and non-homogeneity, and derive bounds for the absolute and relative errors of the approximations.

In papers three and four we study high-level excursions of the Shepp statistic for the Wiener process and for a Gaussian random walk. The application areas include finance and insurance, and sequence alignment scoring and database searches in Bioinformatics.
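As a rough illustration of the kind of statistic involved, the sketch below computes a scan-type maximum for a random walk: the largest increment S_j - S_i over all windows with 0 <= i < j and j - i <= max_window. Treating this as the Shepp statistic of the walk is an assumption made here for illustration; the precise definitions are given in the papers themselves. The function name and the max_window parameter are hypothetical.

```python
import random


def shepp_statistic(steps, max_window):
    """Largest increment S_j - S_i with 0 <= i < j and j - i <= max_window.

    steps are the increments of the walk; S_k is the k-th partial sum.
    """
    if max_window < 1 or not steps:
        raise ValueError("need at least one step and max_window >= 1")
    # Partial sums S_0 = 0, S_1, ..., S_n.
    sums = [0.0]
    for x in steps:
        sums.append(sums[-1] + x)
    best = float("-inf")
    for i in range(len(steps)):
        upper = min(i + max_window, len(steps))
        for j in range(i + 1, upper + 1):
            best = max(best, sums[j] - sums[i])
    return best


if __name__ == "__main__":
    # Gaussian random walk with standard normal increments.
    random.seed(1)
    walk_steps = [random.gauss(0.0, 1.0) for _ in range(1_000)]
    print(shepp_statistic(walk_steps, max_window=20))
```

The double loop costs on the order of n * max_window operations, which keeps the sketch short; a sliding-window minimum over the partial sums would bring this down to linear time for long walks.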

Reference

BibTeX

@phdthesis{Zholud2011,
  Author = {Dmitrii Zholud},
  Year = {2011},
  Title = {{E}xtreme {V}alue {A}nalysis of {H}uge {D}atasets: {T}ail {E}stimation {M}ethods in {H}igh-{T}hroughput {S}creening and {B}ioinformatics},
  School = {Gothenburg Univ. ISBN: 978-91-628-8354-6}
}

Update 2018
