An R package for maximum likelihood estimation of univariate densities.

Home Page: https://jonasmoss.github.io/univariateML/

License: Other

R 97.78% TeX 2.22%
estimation density maximum-likelihood

univariateml's Introduction

univariateML

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Overview

univariateML is an R package for user-friendly maximum likelihood estimation of a selection of parametric univariate densities. In addition to basic estimation capabilities, this package supports visualization through plot and qqmlplot, model selection by AIC and BIC, confidence sets through the parametric bootstrap with bootstrapml, and convenience functions such as the density, distribution function, quantile function, and random sampling at the estimated distribution parameters.

Installation

Use the following command from inside R to install from CRAN.

install.packages("univariateML")

Or install the development version from GitHub.

# install.packages("devtools")
devtools::install_github("JonasMoss/univariateML")

Usage

The core of univariateML consists of the ml*** functions, where *** is a distribution suffix such as norm, gamma, or weibull.

library("univariateML")
mlweibull(egypt$age)
#> Maximum likelihood estimates for the Weibull model 
#>  shape   scale  
#>  1.404  33.564

Now we can visually assess the fit of the Weibull model to the egypt data with a plot.

hist(egypt$age, freq = FALSE, xlab = "Mortality", main = "Egypt")
lines(mlweibull(egypt$age))

Supported densities

| Name | univariateML function | Package |
| --- | --- | --- |
| Cauchy distribution | mlcauchy | stats |
| Gumbel distribution | mlgumbel | extraDistr |
| Laplace distribution | mllaplace | extraDistr |
| Logistic distribution | mllogis | stats |
| Normal distribution | mlnorm | stats |
| Student t distribution | mlstd | fGarch |
| Generalized Error distribution | mlged | fGarch |
| Skew Normal distribution | mlsnorm | fGarch |
| Skew Student t distribution | mlsstd | fGarch |
| Skew Generalized Error distribution | mlsged | fGarch |
| Beta prime distribution | mlbetapr | extraDistr |
| Exponential distribution | mlexp | stats |
| Gamma distribution | mlgamma | stats |
| Inverse gamma distribution | mlinvgamma | extraDistr |
| Inverse Gaussian distribution | mlinvgauss | actuar |
| Inverse Weibull distribution | mlinvweibull | actuar |
| Log-logistic distribution | mlllogis | actuar |
| Log-normal distribution | mllnorm | stats |
| Lomax distribution | mllomax | extraDistr |
| Rayleigh distribution | mlrayleigh | extraDistr |
| Weibull distribution | mlweibull | stats |
| Log-gamma distribution | mllgamma | actuar |
| Pareto distribution | mlpareto | extraDistr |
| Beta distribution | mlbeta | stats |
| Kumaraswamy distribution | mlkumar | extraDistr |
| Logit-normal | mllogitnorm | logitnorm |
| Uniform distribution | mlunif | stats |
| Power distribution | mlpower | extraDistr |

Implementations

Analytic formulae for the maximum likelihood estimates are used whenever they exist. Most ml*** functions without analytic solutions have a custom-made Newton-Raphson solver. These can be much faster than a naïve solution using nlm or optim. For example, in the benchmark below the median run time of mlbeta is more than 30 times shorter than that of the naïve solution using nlm.

# install.packages("microbenchmark")
set.seed(313)
x <- rbeta(500, 2, 7)

microbenchmark::microbenchmark(
  univariateML = univariateML::mlbeta(x),
  naive = nlm(function(p) -sum(dbeta(x, p[1], p[2], log = TRUE)), p = c(1, 1)))
#> Unit: microseconds
#>          expr     min       lq      mean   median       uq     max neval
#>  univariateML   259.2   348.75   557.959   447.05   536.40  5103.5   100
#>         naive 15349.1 15978.35 16955.165 16365.45 17082.25 48941.4   100
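To make the idea of a custom Newton-Raphson solver concrete, here is a minimal sketch for the gamma distribution written in base R. It is an illustration only and is not taken from the package source; the function name gamma_mle_newton, the starting value, and the stopping rule are assumptions made for this example. The shape a solves log(a) - digamma(a) = s, where s = log(mean(x)) - mean(log(x)), and the rate then follows as a / mean(x).

```r
# Newton-Raphson for the gamma MLE (illustrative sketch, not the package's code).
gamma_mle_newton <- function(x, tol = 1e-10, max_iter = 100) {
  stopifnot(is.numeric(x), all(x > 0), length(unique(x)) > 1)
  s <- log(mean(x)) - mean(log(x))             # strictly positive by Jensen's inequality
  # Closed-form approximation to the root, used as the starting value.
  shape <- (3 - s + sqrt((s - 3)^2 + 24 * s)) / (12 * s)
  for (i in seq_len(max_iter)) {
    f <- log(shape) - digamma(shape) - s       # score equation in the shape
    f_prime <- 1 / shape - trigamma(shape)     # derivative of the score equation
    step <- f / f_prime
    shape <- shape - step
    if (abs(step) < tol * shape) break         # relative step size is small enough
  }
  c(shape = shape, rate = shape / mean(x))
}

set.seed(313)
x <- rgamma(500, shape = 2, rate = 3)
fit <- gamma_mle_newton(x)

# Reference: generic optimisation of the negative log-likelihood on the log scale.
nll <- function(p) -sum(dgamma(x, exp(p[1]), exp(p[2]), log = TRUE))
ref <- optim(c(0, 0), nll)
```

Because only one scalar equation has to be solved, each iteration costs a handful of special-function evaluations, which is the kind of saving a general-purpose optimiser cannot exploit.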

The maximum likelihood estimators in this package have all been subject to testing; see the tests folder for details.

Documentation

For an overview of the package and its features see the overview vignette. For an illustration of how this package can make an otherwise long and laborious process much simpler, see the copula vignette.

How to Contribute or Get Help

Please read CONTRIBUTING.md for details about how to contribute or get help.

univariateml's People

Contributors

jonasmoss


univariateml's Issues

CRAN release

Would it be possible to publish a new CRAN version in the next 1-2 weeks? (I want to publish a new package which relies on univariateML.) I can help prepare everything if necessary.

Unit tests: Add support to unit tests.

Add testing of the support attribute to the unit tests.

Should add tests for dml, pml and qml for all ml options.

  • dnorm
  • dlogis
  • dcauchy
  • dgumbel
  • dlaplace
  • dexp
  • dlomax
  • drayleigh
  • dgamma
  • dweibull
  • dlnorm
  • dinvgamma
  • dbetapr
  • dwald
  • dbeta
  • dkumar
  • dunif
  • dpower
  • dpareto

Consider Rfast!

Rfast is a package with many univariate densities implemented. Most if not all of the implementations have a higher speed than the implementations in this package and the overlap is large.

Do one of these:

  1. Import Rfast and make use of their implementations;
  2. Make univariateML GPL and adopt the code from Rfast.

Add "safety" pml, qml, rml.

Not every density has a CDF, quantile function, or random variate generator. Make safe surrogates in the pml, qml and rml functions.
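One possible shape for such surrogates, sketched in base R under the assumption that only a one-argument, vectorised density function is available: the CDF is obtained by numerical integration, the quantile function by root finding on that CDF, and random variates by inverse transform sampling. All function names below are hypothetical and do not exist in the package.

```r
# Numerical CDF: integrate the density from the lower end of the support.
cdf_from_density <- function(q, dens, lower = -Inf) {
  vapply(q, function(qi) {
    if (qi <= lower) return(0)
    stats::integrate(dens, lower, qi, rel.tol = 1e-8)$value
  }, numeric(1))
}

# Numerical quantile function: invert the CDF with a root finder.
# The search interval is extended upwards automatically if needed.
quantile_from_density <- function(p, dens, lower = -Inf, interval = c(0, 1)) {
  vapply(p, function(prob) {
    stats::uniroot(function(q) cdf_from_density(q, dens, lower) - prob,
                   interval = interval, extendInt = "upX", tol = 1e-9)$root
  }, numeric(1))
}

# Random variates by inverse transform sampling.
random_from_density <- function(n, dens, lower = -Inf, interval = c(0, 1)) {
  quantile_from_density(stats::runif(n), dens, lower, interval)
}

# Example with a density whose exact CDF and quantiles are known.
dens <- function(x) dexp(x, rate = 2)
p_num <- cdf_from_density(c(0.1, 0.5, 2), dens, lower = 0)
q_num <- quantile_from_density(c(0.1, 0.5, 0.9), dens, lower = 0)
```

These surrogates are much slower than dedicated p***, q*** and r*** functions, so they would only make sense as a fallback.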

Better automated tests.

Currently, most of the automated tests for the ml*** functions are copy-pasta. This has two downsides: 1.) It is hard to verify if each test is complete. 2.) If we want to implement more tests or change the object structure, we would have to modify 22 tests.

One test file that avoids this problem is test_input_checks.R. This goes through every ml*** file with just a couple of lines.

Most of the tests are straight-forward such as checking whether the input checks work, if the objects have all the needed attributes, etc. I propose the following pattern:

fun = function(x) eval(call("attr", match.call()[[1]], "type"))
attr(fun, "type") = "continuous"

Then

fun()
#> [1] "continuous"

These function attributes can then be used to do both testing and populating attributes of the univariateML objects. Take mlcauchy and consider

attr(mlcauchy, "model") = "Cauchy"
attr(mlcauchy, "name") = "mlcauchy"
attr(mlcauchy, "density") = "stats::dcauchy"
attr(mlcauchy, "parameters") = c("location", "scale")
attr(mlcauchy, "test_call") = "stats::dcauchy(n, 0, 1)"
attr(mlcauchy, "support") = c(-Inf, Inf)

This setup allows us to test a lot of things easily (currently untested), such as the existence of the appropriate d*** and r*** functions and whether the parameter names are correct. (They should be the first parameters after x.)

Potential problems: The attributes can be removed from the function mlcauchy, which will break the function in most scenarios. An alternative is to start each function with a variable such as name = quote(mlcauchy) which identifies the function instead of using match.call().
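A small sketch of how such function attributes could drive a generic test is given below. It uses a toy stand-in rather than the real mlcauchy, and the helper names resolve and check_metadata are hypothetical. It also assumes that the r*** function lives in the same package as the d*** function and differs only in the first letter of its name.

```r
# Toy stand-in for an ml*** function carrying its own metadata.
mlcauchy_toy <- function(x) NULL
attr(mlcauchy_toy, "density") <- "stats::dcauchy"
attr(mlcauchy_toy, "parameters") <- c("location", "scale")

# Resolve a "pkg::name" string to the exported function.
resolve <- function(qualified) {
  parts <- strsplit(qualified, "::", fixed = TRUE)[[1]]
  getExportedValue(parts[1], parts[2])
}

# Check that d*** and r*** exist and that the declared parameter names are
# the first formals after the leading argument (x for d***, n for r***).
check_metadata <- function(fun) {
  dens_name <- attr(fun, "density")
  params <- attr(fun, "parameters")
  rand_name <- sub("::d", "::r", dens_name, fixed = TRUE)
  d_formals <- names(formals(resolve(dens_name)))
  r_formals <- names(formals(resolve(rand_name)))
  k <- length(params)
  identical(d_formals[1 + seq_len(k)], params) &&
    identical(r_formals[1 + seq_len(k)], params)
}

ok <- check_metadata(mlcauchy_toy)
```

A single loop over all ml*** functions calling a check like this would replace one hand-written test per distribution.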

Extended input checks

These should probably throw an error:

> mat_3cols <- replicate(3, rexp(100))
> mlexp(mat_3cols)
Maximum likelihood estimates for the Exponential model 
rate  
1.07
> mlexp(letters[1:5])
Maximum likelihood estimates for the Exponential model 
rate  
  NA  
Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
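A minimal sketch of the kind of check that would reject both inputs is shown here. The helper name check_ml_input and the error messages are hypothetical, and the package may implement its checks differently.

```r
# Reject non-numeric input and arrays with more than one non-trivial dimension.
check_ml_input <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric, not an object of class '", class(x)[1], "'.")
  }
  if (length(dim(x)) > 1 && sum(dim(x) > 1) > 1) {
    stop("x must be a vector or an array with a single non-trivial dimension.")
  }
  invisible(as.numeric(x))  # drop dimensions so downstream code sees a plain vector
}

mat_3cols <- replicate(3, rexp(100))
res_matrix <- tryCatch(check_ml_input(mat_3cols), error = function(e) e)
res_letters <- tryCatch(check_ml_input(letters[1:5]), error = function(e) e)
```

Both res_matrix and res_letters hold error conditions, while a plain numeric vector or a one-column matrix passes through unchanged in value.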

mlnaka() does not ensure parameter bounds

There seems to be a bug in mlnaka():

set.seed(1)
x <- rexp(50)
fit <- mlnaka(x)
fit
#> Maximum likelihood estimates for the Nakagami model 
#> shape   scale  
#> 0.4638  1.7477 

pml(x, fit)
#> Error: 'shape and scale are not valid parameter vectors

As far as I understand, the shape parameter must be >= 0.5, but that's not ensured by the fitting procedure.

Make an overview vignette.

Make an overview vignette. It should contain:

  • AIC and BIC comparisons
  • Plotting
  • Ordinary PP plots and QQ plots
  • Comparative QQ plots
  • Parametric bootstrapping.
  • Usage of density, cdfs, quantile and random variate generation functions.

Then remove parts of the large readme and redirect them to the vignette.

Make P-P plots and Q-Q plots.

Should make qqmlplot modelled after stats::qqplot. In addition, a ppmlplot should be made.

  • qqmlplot: Basic qq plot functionality.
  • ppmlplot: Basic pp plot functionality.
  • Basic documentation modeled after stats::qqplot.
  • Basic examples for both.
  • Make unit tests for 100% coverage.
  • Write more documentation with references and an explanation about the difference between this simple qq plot and the more sophisticated qqplot used in stats::qqnorm. Must have references here.
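For orientation, the points of a basic Q-Q plot and P-P plot against a fitted model can be computed in a few lines of base R. The sketch below is not the qqmlplot or ppmlplot implementation; the helper names qq_points and pp_points are hypothetical, and an exponential model with its analytic MLE stands in for a fitted univariateML object.

```r
# Q-Q plot: fitted quantiles at the plotting positions against sorted data.
qq_points <- function(x, qfun) {
  p <- stats::ppoints(length(x))       # plotting positions in (0, 1)
  list(theoretical = qfun(p), sample = sort(x))
}

# P-P plot: plotting positions against the fitted CDF at the sorted data.
pp_points <- function(x, pfun) {
  list(theoretical = stats::ppoints(length(x)), sample = pfun(sort(x)))
}

set.seed(1)
x <- rexp(200, rate = 2)
rate_hat <- 1 / mean(x)                # analytic MLE of the exponential rate

qq <- qq_points(x, function(p) qexp(p, rate = rate_hat))
pp <- pp_points(x, function(q) pexp(q, rate = rate_hat))

plot(qq$theoretical, qq$sample,
     xlab = "Fitted quantiles", ylab = "Sample quantiles")
abline(0, 1)                           # points near this line indicate a good fit
```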

Add more densities.

We should add more distributions to the package. Two points to keep in mind: First, the densities should have few parameters, so no normal mixtures with 10 parameters. Second, they must be implemented in another package with at least the density function d*** implemented.

extraDistr

The package extraDistr has these non-implemented distributions that might be interesting to our users.

  • TruncNormal
  • Frechet
  • GEV
  • Gompertz
  • ShiftGomp
  • GPD
  • LocationScaleT
  • BirnbaumSaunders

actuar

The actuar package contains a lot of heavy-tailed distributions, such as:

  • Loglogistic: Transform and use the current logistic ML.
  • Loggamma: Transform and use the current gamma ML.
  • InverseWeibull: Transform and use the current Weibull ML.
  • Burr: Might have the same kind of problems as Lomax.
  • InverseGaussian: Use this package instead of statmod to reduce the number of dependencies.

Some other distributions

Here are some other distributions. Be sure to check the quality of the implementation before making an ML function. If the implementations are too bad, make a small package containing only the d***, p*** functions and submit it to CRAN.

  • Foldnorm: In VGAM
  • LogitNormal: logitnorm package. Very easy to implement.
  • GenGamma: In flexsurv package.
  • EMG: In EMG package

mllomax: When does the MLE exist? Does the current algorithm work?

The MLE doesn't always exist for the Lomax distribution; its maximum would be at lambda = 0 in these cases. When it doesn't exist, the sequence f(x; lambda, kappa) will converge to an exponential. This is handled by checking if the optimizer gets really close to 0 in lambda. The current algorithm is a simple Newton-Raphson, but the function is not convex. Will it always work?

  1. Find and implement a simple check for the existence of the MLE.
  2. Will the algorithm work when the MLE exists? If not, make one that works.

Check all ML functions against the literature.

First task

Make details for each distribution.

Second Task

Check all ML functions against the literature.

  • dnorm
  • dlogis
  • dcauchy
  • dgumbel
  • dlaplace
  • dexp
  • dlomax
  • drayleigh
  • dgamma
  • dweibull
  • dlnorm
  • dinvgamma
  • dbetapr
  • dwald
  • dbeta
  • dkumar
  • dunif
  • dpower
  • dpareto
  • dllogis
  • dinvweibull
  • dlgamma
  • dlogitnorm

Support for discrete distributions

Would be nice to support some distributions with discrete support (like Poisson or Binomial). Are there any reasons to not support them in this package? I might work on a PR at some point, but wanted to check with you first.

P.S.: Great, great package! I always wanted to do something like that, but never got around to it. Relieved that I can cross that off my list ;)

Default plotting range not working for uniform distributions

Function plot_wrangler (and hence the plot, lines and points methods) doesn't work for uniform distributions if range = NULL:

library(univariateML)
plot(mlunif(0:1))

produces the following error:

Error in abs(support[1]) : non-numeric argument to mathematical function

Docs: Parametric bootstrap.

  • Add reference to parametric bootstraps.
  • Note the importance of pivots.
  • Explain / document the rôle of the map and reducers, maybe with a reference.
  • More tests for reducers and mappers.
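As a rough illustration of what a map and a reducer do in a parametric bootstrap, here is a base R sketch for an exponential model. It is not the bootstrapml source; the function name bootstrap_exp and its defaults are assumptions made for this example. The map transforms each bootstrap estimate, and the reducer summarises the transformed draws.

```r
# Parametric bootstrap for an exponential model with a map and a reducer.
bootstrap_exp <- function(x, reps = 1000, map = identity,
                          reducer = function(v) stats::quantile(v, c(0.025, 0.975))) {
  n <- length(x)
  rate_hat <- 1 / mean(x)                       # MLE on the observed data
  draws <- replicate(reps, {
    x_star <- stats::rexp(n, rate = rate_hat)   # simulate from the fitted model
    map(1 / mean(x_star))                       # refit, then transform the estimate
  })
  reducer(draws)                                # summarise the bootstrap draws
}

set.seed(1)
x <- rexp(100, rate = 2)
ci_rate <- bootstrap_exp(x)                            # percentile interval for the rate
ci_mean <- bootstrap_exp(x, map = function(r) 1 / r)   # percentile interval for the mean
```

Keeping the map and reducer as arguments means the same resampling loop serves intervals, standard errors, or any other summary of a transformed parameter.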

Attributes on the ml*** functions.

Sometimes we need to know, for example, the support of an ml*** function before calling it. For instance, the package kdensity needs both the support and the name of the density. Currently this requires finding the attr(object, "support") part of the code.

One way to fix this is to let the functions themselves have attributes.

bug in model_select

Hello,
It seems that the fix resulted in a bug in the model_select function.
For all available distributions:

set.seed(1)
x <- actuar::rllogis(500, shape = 3, rate = 2)
univariateML::model_select(x, criterion = "bic")
Error in log(nos) : non-numeric argument to mathematical function

For normal distribution:

univariateML::model_select(x, models = "norm", criterion = "bic")
Maximum likelihood estimates for the Normal model 
  mean      sd  
0.5834  0.3815 

For log-logistic distribution:

univariateML::model_select(x, models = "llogis", criterion = "bic")
Error in log(nos) : non-numeric argument to mathematical function

I cannot easily see why it happens.

installation from GitHub

Hello,
when I tried to install the development version of the package, I got a warning for the actuar package:

> devtools::install_github("JonasMoss/univariateML")
Downloading GitHub repo JonasMoss/univariateML@HEAD
Skipping 1 packages not available: actuar

It seems to result from actuar's R dependency (≥ 4.1.0) in its latest version.
Is there a way to solve this issue besides updating the R version or installing the package via CRAN?

Weighted likelihood fitting

Could be interesting to allow users to supply weights (useful for imbalanced samples or varying coefficients models). We would need to check feasibility first, since other packages this library depends on might not allow for weights. Don't have time right now, but might come back to this in the future.
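For models with analytic estimators, weights are straightforward to add: maximising the weighted log-likelihood sum(w * log f(x)) gives weighted versions of the usual formulas. The base R sketch below shows this for the exponential and normal models. The function names are hypothetical, and nothing here implies that the package's ml*** functions accept weights.

```r
# Weighted MLE of the exponential rate: sum(w) / sum(w * x).
weighted_mlexp <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  c(rate = sum(w) / sum(w * x))
}

# Weighted MLE of the normal mean and standard deviation.
weighted_mlnorm <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  mu <- sum(w * x) / sum(w)
  c(mean = mu, sd = sqrt(sum(w * (x - mu)^2) / sum(w)))
}

x <- c(1, 2, 4)
w <- c(2, 1, 3)
fit_exp <- weighted_mlexp(x, w)
fit_norm <- weighted_mlnorm(x, w)
```

With integer weights the result coincides with the unweighted estimate on the data with each observation repeated w times, which gives a simple sanity check. Distributions fitted numerically would need the weights threaded through their solvers.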

References for every distribution.

Add references to papers or books for the density and the package the density can be found in.

  • [] dnorm
  • [] dlogis
  • [] dcauchy
  • [] dgumbel
  • [] dlaplace
  • [] dexp
  • [] dlomax
  • [] drayleigh
  • [] dgamma
  • [] dweibull
  • [] dlnorm
  • [] dinvgamma
  • [] dbetapr
  • [] dwald
  • [] dbeta
  • [] dkumar
  • [] dunif
  • [] dpower
  • [] dpareto
