lbelzile / mev

Modelling extreme values

Home Page: https://lbelzile.github.io/mev

R 57.02% C++ 12.63% C 0.01% HTML 29.18% TeX 1.16%
extreme-value-statistics threshold-selection max-stable simulation likelihood-functions

mev's Introduction

mev: Modelling Extreme values

An R package for the analysis of univariate, multivariate and functional extreme values. The package includes routines for univariate analyses: multiple threshold selection diagnostics, optimization, bias correction and tangent exponential model approximations, nonparametric spectral measure estimation using empirical likelihood methods, etc. Multivariate functionalities revolve around simulation algorithms for multivariate models, empirical likelihood and empirical dependence measures. Likelihood functions are also provided for elliptical processes and user-provided methodologies.

To install from GitHub, use

remotes::install_github("lbelzile/mev")

after installing remotes.

Functionalities

The functionalities of the package are sorted below by topic.

Univariate

The package focuses on likelihood-based inference for parametric models.

Log likelihood, score and information matrices for the following univariate models:

  • gpd: generalized Pareto distribution (alternative parametrizations gpde, gpdN, gpdr)
  • gev: generalized extreme value distribution (alternative parametrizations gevN, gevr)
  • pp: inhomogeneous Poisson process for extremes
  • rlarg: asymptotic r-largest order statistics

Fitting procedures and higher order asymptotic inference for univariate extremes

  • fit.* for maximum likelihood estimation
  • *.bcor for bias correction via score vectors or by subtraction
  • *.pll: profile likelihood for objects
  • *.tem for tangent exponential model approximation to profile likelihood

Two additional penultimate models and utilities for approximations

  • egp: extended generalized Pareto models of Papastathopoulos and Tawn (2013)
  • extgp: extended generalized Pareto models of Naveau et al. for rainfall
  • smith.penult: Smith (1987) penultimate approximations to parametric models

Threshold selection

Multiple functions can be used for threshold selection for the peaks over threshold method

  • automrl: automatic threshold selection for mean residual life plots
  • cvselect: threshold selection via coefficient of variation
  • tstab.egp: threshold stability plots for egp models
  • infomat.test: information matrix test for time series
  • NC.diag: Northrop and Coleman (2014) score tests
  • tstab.gp: threshold stability plot for generalized Pareto distribution
  • vmetric.diag: metric-based threshold selection of Varty et al.
  • W.diag: Wadsworth (2016) sequential analysis threshold diagnostics
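
As a rough illustration of the quantity behind mean residual life plots, the sketch below computes the empirical mean excess over a few candidate thresholds in base R. The helper name `mean_excess` and the toy data are assumptions made for illustration; this is not the algorithm used by automrl.

```r
# Empirical mean excess e(u) = mean(x - u | x > u) for each candidate threshold.
# `mean_excess` is a hypothetical helper, not a function exported by mev.
mean_excess <- function(x, thresholds) {
  vapply(thresholds, function(u) {
    exc <- x[x > u] - u                       # exceedances above u
    if (length(exc) == 0L) NA_real_ else mean(exc)
  }, numeric(1))
}

x <- 1:10              # toy data
u <- c(0, 5, 8)        # candidate thresholds
me <- mean_excess(x, u)
```

For a generalized Pareto tail with scale $\sigma$ and shape $\xi < 1$, the mean excess is linear in the threshold, $e(u) = (\sigma + \xi u)/(1-\xi)$, so approximate linearity above some level is what such a plot is inspected for.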

Multivariate

Some (incomplete) functionalities for multivariate models. There is currently no function to optimize multivariate threshold models, but likelihoods are provided for logistic, Brown-Resnick, Hüsler-Reiss and extremal Student models.

  • ibvpot: interpretation of bivariate models (extension of evir for all bivariate models from evd)
  • likmgp, clikmgp: (censored) likelihood for multivariate generalized Pareto
  • expme: exponent measure of parametric extreme value models

Two tests, one for max-stability and the other for asymptotic independence

  • maxstabtest: test of max-stability
  • scoreindep: score test of asymptotic independence for bivariate logistic model

Nonparametric

Estimation of the angular distribution using empirical estimation or empirical likelihood, with or without smoothing

  • angmeas: rank-based estimation of the angular measure
  • angmeasdir: Dirichlet mixture smoothing of angular measure

Simulation

Sampling algorithms for parametric models, multivariate and spatial extreme values, angular distributions and (generalized) R-Pareto processes, using accept-reject or (approximate) composition sampling.

  • rrlarg: simulation of $r$-largest observations from point process of extremes
  • rdir: simulation of Dirichlet vectors
  • mvrnorm: simulation of multivariate normal vectors
  • rmev: exact simulation of multivariate extreme value distributions
  • rmevspec: random samples from angular distributions of multivariate extreme value models.
  • rparp: simulation from R-Pareto processes
  • rparpcs: simulation from Pareto processes (max) using composition sampling
  • rparpcshr: simulation of generalized Huesler-Reiss Pareto vectors via composition sampling
  • rgparp: simulation from generalized R-Pareto processes
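
To give a flavour of the simplest sampler in this list, here is a minimal base R sketch of the standard construction of Dirichlet vectors by normalizing independent gamma variables. The name `rdirichlet_sketch` is hypothetical, and nothing here is claimed about how rdir is implemented.

```r
# Draw n Dirichlet(alpha) vectors by normalizing independent gamma variables.
# `rdirichlet_sketch` is a hypothetical name; it is not mev's rdir.
rdirichlet_sketch <- function(n, alpha) {
  d <- length(alpha)
  # Fill by row so that column j holds gamma draws with shape alpha[j]
  g <- matrix(rgamma(n * d, shape = rep(alpha, times = n)),
              nrow = n, ncol = d, byrow = TRUE)
  g / rowSums(g)   # each row now sums to one
}

set.seed(1)
samp <- rdirichlet_sketch(5, alpha = c(1, 2, 3))
```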

Extremal dependence measures

Measures of tail dependence $\theta$, $\eta$, $\chi$ and $\varphi$.

  • taildep: estimators of coefficients of tail dependence $\eta$ and tail correlation $\chi$
  • extcoef: estimators of the extremal coefficient
  • xasym: estimators of the extremal asymmetry coefficient
  • angextrapo: bivariate tail dependence $\eta$ across rays
  • lambdadep: bivariate function of Wadsworth and Tawn (2013)
  • ext.index: extremal index estimators based on interexceedance time and gap of exceedances
  • extremo: pairwise extremogram as a function of distance for spatial data
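
As an informal illustration of $\chi$, the sketch below estimates the conditional exceedance probability $\Pr(F_Y(Y) > u \mid F_X(X) > u)$ at a fixed level $u$ after a rank transform of each margin. The helper `chi_hat` is hypothetical and is not the estimator implemented in taildep.

```r
# Empirical chi(u): proportion of joint exceedances among exceedances of x,
# with margins transformed to approximately uniform via ranks.
# `chi_hat` is a hypothetical helper, not part of mev.
chi_hat <- function(x, y, u = 0.9) {
  n <- length(x)
  ux <- rank(x) / (n + 1)
  uy <- rank(y) / (n + 1)
  mean(ux > u & uy > u) / mean(ux > u)
}

x <- 1:100
chi_dep <- chi_hat(x, x, u = 0.9)        # complete dependence
chi_neg <- chi_hat(x, rev(x), u = 0.9)   # large x paired with small y
```

In practice the estimate is examined as $u$ approaches 1; values tending to zero suggest asymptotic independence.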

Datasets

Various datasets collected here and there, (exclusively?) for univariate peaks over threshold analysis

  • abisko: Abisko rainfall
  • eskrain: Eskdalemuir observatory daily rainfall
  • geomagnetic: magnitude of geomagnetic storms
  • maiquetia: Maiquetia daily rainfall series
  • nidd: river Nidd daily flow
  • venice: Venice sea level data
  • w1500m: women 1500m track records

Spatial

Some functionalities for fitting spatial data

  • distg: matrix of pairwise distance with geometric anisotropy
  • Variogram models (unexported functions powerexp.cor, power.vario, schlather.vario)
  • Lambda2cov: convert variogram to covariance of conditional random field

Miscellaneous

Functions used internally that could be of more general use.

  • emplik: empirical likelihood for vector mean
  • wecdf: weighted empirical distribution function
  • spline.corr and tem.corr: corrections for Fraser-Reid objects to remove singularities near the mode
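
For readers unfamiliar with the weighted empirical distribution function, a minimal base R sketch follows. The helper `weighted_ecdf` and its interface are assumptions for illustration and do not reproduce wecdf.

```r
# Weighted empirical distribution function:
# F(q) = sum(w_i * 1{x_i <= q}) / sum(w_i).
# `weighted_ecdf` is a hypothetical helper, not mev's wecdf.
weighted_ecdf <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  w <- w / sum(w)   # normalize weights
  function(q) vapply(q, function(qi) sum(w[x <= qi]), numeric(1))
}

Fw <- weighted_ecdf(c(1, 2, 3), w = c(1, 1, 2))
vals <- Fw(c(0, 1, 2.5, 3))
```

With equal weights this reduces to the usual empirical distribution function.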

mev's People

Contributors

lbelzile

mev's Issues

gpd.infomat not scaling linearly with option "exp"

Submitted via email by Paulo da Costa Perreira.

The function gpd.infomat does not scale linearly with the sample size when argument option="exp".
Also, no warning/error is triggered when data outside the range of the distribution are provided to the infomat functions.

This bug impacts the functions gpd.Fscore and gpd.bcor.

The root correction in gpd.Fscore was -1/3 instead of -0.3, which leads to systematic bias.

Information matrix of the Poisson point process

There is a mistake in my code for the observed information matrix of the Poisson point process likelihood. The observed information should coincide with the negative Hessian of the log-likelihood, but I do not necessarily get positive definite matrices...

The entries of the expected information matrix are given in Sharkey and Tawn (2018), but it would be wise to rederive them.

ext.index

This function presents problems when I use apply or call it within a loop. For example, for a 1000x500 matrix (samples are columns) it calculates the first 10 but no more. I have tried with a loop instead and I realize that sometimes it does not calculate, but I don't understand why. Has anyone encountered the same problem? I wanted to compare it with other estimates, calculating the mean value over some number of replications...
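
One way to diagnose this kind of failure is to wrap each call so that an error in one column is recorded rather than stopping the whole run. The sketch below uses a hypothetical stand-in estimator, `fragile_est`, and a hypothetical wrapper, `safe_apply`; it does not call ext.index, whose behaviour on the data in question is not known from the report.

```r
# Stand-in estimator that fails on some inputs (hypothetical, illustration only)
fragile_est <- function(x) {
  if (any(x < 0)) stop("negative values not supported")
  mean(x)
}

# Apply an estimator column by column; on error, return NA and keep the message
safe_apply <- function(X, est) {
  msgs <- rep(NA_character_, ncol(X))
  vals <- vapply(seq_len(ncol(X)), function(j) {
    tryCatch(est(X[, j]), error = function(e) {
      msgs[j] <<- conditionMessage(e)   # record why column j failed
      NA_real_
    })
  }, numeric(1))
  list(values = vals, errors = msgs)
}

X <- cbind(c(1, 2, 3), c(-1, 2, 3), c(4, 5, 6))
res <- safe_apply(X, fragile_est)
failed <- which(is.na(res$values))   # columns to inspect by hand
```

The recorded messages show which replications fail and why, which helps to produce a reproducible example.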

Make package more dependency lean

Consider replacing calls to the package nloptr, with its buggy Fortran F77 routines, unless the latter is updated.

All remaining calls are in profile.R.

Improve fit of `extgp`

Request of P. Naveau and P. Ailliot

There are issues with discrete data for the extgp model. The following code is worth looking at.

pextgpd <- function(theta, x) {  # distribution function, ignoring discretization
  # theta = (sigma, alpha, xi, ym)
  p <- (1 - pmax(1 + theta[3] * (x - theta[4]) / theta[1], 0)^(-1 / theta[3]))^theta[2]  # returns NaN if x < theta[4]
  p[is.na(p)] <- 0
  return(p)
}

qextgpd <- function(theta, p, step) {  # quantile function with discretization (for Q-Q plots)
  qcont <- theta[4] + theta[1] / theta[3] * ((1 - p^(1 / theta[2]))^(-theta[3]) - 1)  # without discretization
  return(step * floor((qcont - step) / step))  # quantile after discretization
}

LLexpGPD <- function(theta, y, step) {  # discretized likelihood; returns the negative log-likelihood
  # theta = (sigma, alpha, xi, ym)
  out <- sum(log(pextgpd(theta, y + step) - pextgpd(theta, pmax(y, theta[4]))))
  return(-out)
}

Unscaled features in `tstab.gpd` lead to failure of optimization routines

Reported by John Ery.

Error in t(c(1, -thresh[i] + thresh[1])) %*% gpdu$vcov : 
  requires numeric/complex matrix/vector arguments
In addition: Warning message:
In gp.fit(xdat = na.omit(as.vector(xdat)), threshold = threshold,  :
  Cannot calculate standard error based on observed information

This error is caused by unscaled features (approximately 10e9); the numerical tolerance is too small, leading to lack of convergence in the optimizer and warning/failure of the routine.

Perhaps it would make sense to scale the data before fitting and then use location-scale properties to map the estimates back to the original scale.
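
The rescaling idea can be sketched in base R as follows, assuming threshold exceedances are fitted with a hand-written generalized Pareto likelihood and `optim`. The helpers `gp_nll` and `fit_gp_rescaled`, the choice of the sample mean as scaling constant, and the toy data are all assumptions for illustration; this is not the code of tstab.gpd or of the package's fitting routines.

```r
# Negative log-likelihood of the generalized Pareto distribution
# with scale sigma = par[1] and shape xi = par[2] (hypothetical helper)
gp_nll <- function(par, x) {
  sigma <- par[1]; xi <- par[2]
  if (sigma <= 0) return(1e10)
  if (abs(xi) < 1e-8) return(length(x) * log(sigma) + sum(x) / sigma)
  z <- 1 + xi * x / sigma
  if (any(z <= 0)) return(1e10)   # parameter value outside the support
  length(x) * log(sigma) + (1 + 1 / xi) * sum(log(z))
}

# Fit on exceedances divided by their mean, then map the scale estimate back.
# If X is GP(sigma, xi), then X / s is GP(sigma / s, xi): the shape is unchanged.
fit_gp_rescaled <- function(x) {
  s <- mean(x)
  opt <- optim(c(1, 0.1), gp_nll, x = x / s)
  c(scale = opt$par[1] * s, shape = opt$par[2])
}

# Deterministic toy exceedances: GP quantiles with sigma = 2 and xi = 0.2
p <- (seq_len(200) - 0.5) / 200
x <- 2 / 0.2 * ((1 - p)^(-0.2) - 1)
fit_small <- fit_gp_rescaled(x)
fit_large <- fit_gp_rescaled(1e9 * x)   # same data in much larger units
```

Because the optimizer always sees data of order one, its default tolerances behave the same way whatever the original units.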

rmev for hr model

The parameter matrix must be conditionally non-negative definite to obtain non-NA values. There is currently no test for this condition and no warning if NA values are returned.

`W.diag` plots mess up user settings.

The current code resets the parameters of the graphics console, which prevents people from adding graphs to existing plots. This is particularly unwanted if the code returns no plot (plots=NA).

regp2 returns invalid type

I am currently testing different rainfall distributions, especially the extended gp2.
I am experiencing some difficulties with the regp2 function.

For instance when I execute:

x = regp2(n=1000, kappa=2, sigma=1, xi=0.5,type=1)

I get the following message:

Error in qegp2.G(unifsamp, prob, kappa, delta, type) :
Invalid `type' argument

Release mev 1.14

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • devtools::check_mac_release()
  • revdepcheck::revdep_check(num_workers = 4)
  • pkgdown::build_site()

Submit to CRAN:

  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()

Simulation of Pareto processes

  • Create a wrapper for multivariate generalized Pareto vectors
  • Investigate the case of unequal thresholds
  • Add more families for composition sampling
  • Merge functions with an algorithm argument
  • Check that the Hüsler-Reiss and the Brown-Resnick have the same parametrization throughout.

rmev with asymmetric logistic model

rmev(n=1, d=2, param=2, asy=list(0.5, 0.5, c(0.5, 0.5)), model="alog")
returns the error "Expecting a single value: [extent=0]".

Bug submitted by email by Michaël Lalancette.

`gpd.ll` calculates log(0), causing `gpd.nll` to silently pass error to `tstab.gpd`.

Hi, many thanks for the package and continued development.

For the example data below, tstab.gpd returns the error Error in seq.default(1, lux, len = nknots) : 'length.out' must be a non-negative number. Forking this package and tracing the error, it appears to occur in this line of gpd.ll, where the third element of 1 + xi/sigma * dat is 0, the log of which is -Inf, giving said error.

Could there (i) be a sensible solution to this possible bug, or (ii) a more sensible error message when it occurs? There are also errors in the other threshold selection methods shown here, presumably for a similar reason.

Reprex for this issue:

library(mev)

# data
diffs <- c(2.81951297074556, 1.08243174850941, 9.10901987552643, 3.40175648033619, 
           1.64474362134933, 1.6418571472168, 7.74637460708618, 42.545136988163, 
           4.42913325130939, 33.4073157981038, 0.791920393705368, 2.95044100284576, 
           10.998235322535, 20.4508430361748, 2.69683688879013, 3.33915970474482, 
           37.7822144702077, 10.698377430439, 9.73197823762894, 0.390749335289001, 
           0.218639880418777, 11.9363431185484, 0.366057872772217, 2.119623452425, 
           41.309825360775, 30.4940503239632, 3.6598953306675, 2.0952250957489, 
           5.28838634490967, 20.3412247598171, 0.577117383480072, 5.04886768013239, 
           5.99352183938026, 0.845351293683052, 3.17710891366005, 1.17714792490005, 
           1.11236909031868, 0.418272614479065, 0.561639964580536, 5.2942131832242, 
           41.5214887857437, 2.95898319780826, 0.183028936386108, 0.626213699579239, 
           0.42288264632225, 1.17204022407532, 1.44027414917946, 5.32458829879761, 
           16.4689251184464, 1.69967293739319, 6.59124094247818, 1.27000004053116, 
           3.48916530609131, 2.80104877054691, 0.683028936386108, 1.500114813447, 
           0.186223056167364, 1.50851613283157, 0.0966406241059303, 0.127125412225723, 
           0.282202571630478, 0.886843204498291, 37.8948299884796, 1.78267043828964, 
           0.401593267917633, 4.40598256886005, 0.0817966535687447, 3.25152472406626, 
           8.40313673019409, 0.254000008106232, 18.2489268779755, 1.08870995044708, 
           1.74532070755959, 0.331796653568745, 0.0915144681930542, 0.604640662670135, 
           6.03872404247522, 1.6589712202549, 8.79172092676163, 10.5326889157295, 
           0.980914600193501, 3.05491060763597, 4.65537348389626, 11.4831943511963, 
           0.818241983652115, 0.117766812443733, 17.0479534268379, 1.39983341097832, 
           0.254000008106232, 0.53671869635582, 0.0588834062218666, 0.15039786696434, 
           0.983999967575073, 0.423241868615151, 1.34176522493362)

plot(diffs)

thcan <- quantile(
  diffs, 
  seq(0.5, 0.99, length.out = 25)
)

tstab.gpd(xdat = diffs, 
          thresh = thcan,  
          method = "profile"
)
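
The failure mode described in this issue, an observation sitting exactly on the boundary of the support so that the logarithm of zero is taken, can be guarded against explicitly. The sketch below is a hand-written generalized Pareto log-likelihood that returns -Inf in that case; `gp_loglik` is a hypothetical helper and is not the package's gpd.ll, so it only illustrates one possible convention.

```r
# Generalized Pareto log-likelihood with an explicit support check.
# par = c(sigma, xi); dat = threshold exceedances.
# `gp_loglik` is a hypothetical helper, not mev's gpd.ll.
gp_loglik <- function(par, dat) {
  sigma <- par[1]; xi <- par[2]
  if (sigma <= 0) return(-Inf)
  z <- 1 + xi * dat / sigma
  if (any(z <= 0)) return(-Inf)   # on or beyond the boundary of the support
  if (abs(xi) < 1e-8) {
    return(-length(dat) * log(sigma) - sum(dat) / sigma)   # exponential limit
  }
  -length(dat) * log(sigma) - (1 + 1 / xi) * sum(log(z))
}

ll_ok  <- gp_loglik(c(1, 0.5), c(1, 2))    # all observations inside the support
ll_bad <- gp_loglik(c(1, -0.5), c(1, 2))   # 1 + xi * 2 / sigma equals 0
```

A caller can then test `is.finite()` on the result and raise an informative error, instead of letting a non-finite value propagate into later steps.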
