KernelEstimator

A Julia package for nonparametric kernel density estimation and regression. The package currently includes univariate kernel density estimation, local constant regression (Nadaraya-Watson regression) and local linear regression. It can also compute bootstrap confidence bands [4].

This package provides Gamma and Beta kernels to deal with bounded density estimation and regression. These two kernels are free of boundary bias for one-sided and two-sided bounded data respectively; see [2, 3]. In particular, this package provides least squares cross-validation (LSCV) bandwidth selection functions for the Gamma and Beta kernels.

Bandwidth selection is critical in kernel estimation, and LSCV is always recommended. Likelihood cross-validation is also provided but should be avoided because of its known drawbacks. For regression problems, the bandwidth of local constant regression is selected by LSCV, while that of local linear regression is chosen by corrected AIC [6].

To install and use this package in Julia,

using Pkg
Pkg.add("KernelEstimator")
using KernelEstimator

See the examples/ directory for usage.

This package calculates densities via the direct approach, i.e. summing the kernel contributions of all data points. To define a new kernel, write a function that takes the same arguments as gaussiankernel and outputs the kernel weights at the given points. If no bandwidth selection function is provided for it, lscv with numerical integration is used by default.
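
For illustration, here is a minimal sketch of a custom kernel, assuming the argument pattern that gaussiankernel appears to use (evaluation point, data vector, bandwidth, a preallocated weight vector and the data length); biweightkernel is a hypothetical name, and the exact contract is not documented here, so verify it against src/kernel.jl before relying on it:

# Hypothetical biweight (quartic) kernel; the signature mirrors the assumed
# gaussiankernel pattern and is not guaranteed by the package.
function biweightkernel(xeval::Real, xdata::Vector{Float64}, h::Real,
                        w::Vector{Float64}, n::Int)
    for i in 1:n
        u = (xdata[i] - xeval) / h
        # biweight weight 15/16 * (1 - u^2)^2 on |u| <= 1, scaled by 1/h
        w[i] = abs(u) <= 1 ? 15 / 16 * (1 - u^2)^2 / h : 0.0
    end
    return w
end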

Functions

This package provides two major functions: kerneldensity for kernel density estimation and npr for nonparametric regression. For kernel density estimation, you can simply use

xdata = randn(1000)
kerneldensity(xdata)

or specify some options

xeval = range(-3, 3, length=100)
bw = bwlscv(xdata, gaussiankernel)
kerneldensity(xdata, xeval=xeval, lb=-Inf, ub=Inf, kernel=gaussiankernel, h=bw)

xeval specifies the positions at which to evaluate the density; it defaults to xdata. lb and ub are the lower and upper bounds of the data. If you set either of them to a finite value, your choice of kernel function is overridden and gammakernel is used with a warning. If you set both, betakernel is used, with a warning if your choice differs.
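
For example, a short sketch of the bounded cases (the data here are illustrative):

x = rand(500)                      # data supported on [0, 1]
kerneldensity(x, lb=0.0)           # one finite bound: gammakernel is used (with a warning)
kerneldensity(x, lb=0.0, ub=1.0)   # two finite bounds: betakernel is used (with a warning)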

For kernel regression, you can use

using Distributions
x = rand(Beta(4, 2), 500) * 10
y = 2 .* x.^2 .+ x .* rand(Normal(0, 5), 500)
npr(x, y)

or change the default by

npr(x, y, xeval=x, reg=locallinear, kernel=betakernel, lb=0.0, ub=10.0)

reg specifies the order of the local polynomial regression: localconstant for local constant regression, or locallinear for local linear regression. locallinear has better theoretical properties for predicting y and is the default, but it is more computationally intensive.

There is also a function that computes bootstrap confidence bands for the regression.

bootstrapCB(x, y; xeval=x, B=500, reg=locallinear, lb=-Inf, ub=Inf, kernel=gaussiankernel)

B specifies the number of bootstrap samples.
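
Judging from the confidence band demo below, the return value appears to be a matrix whose first and second rows hold the lower and upper band; for example:

cb = bootstrapCB(x, y, xeval=xeval, B=500)
lower, upper = cb[1, :], cb[2, :]   # band limits at each xeval point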

The following functions are also provided:

  • localconstant(xdata::RealVector, ydata::RealVector; xeval::RealVector=xdata, kernel::Function=gaussiankernel, h::Real=bwlocalconstant(xdata, ydata, kernel)), local constant (Nadaraya-Watson) regression

  • locallinear(xdata::RealVector, ydata::RealVector; xeval::RealVector=xdata, kernel::Function=gaussiankernel, h::Real=bwlocallinear(xdata, ydata, kernel)), local linear regression

and bandwidth selection functions:

  • bwnormal(xdata::Vector), bandwidth selection for density estimation by reference to the normal distribution

  • bwlscv(xdata::RealVector, kernel::Function), bandwidth selection for density estimation by least squares cross-validation

  • bwlcv(xdata::RealVector, kernel::Function), bandwidth selection for density estimation by likelihood cross-validation

  • bwlocalconstant(xdata, ydata::Vector, kernel), bandwidth selection for local constant regression using LSCV

  • bwlocallinear(xdata, ydata::Vector, kernel), bandwidth selection for local linear regression using corrected AIC (see reference [6] and the sketch below)
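
A bandwidth can also be computed explicitly and passed through h; a sketch combining the signatures above (x, y and xeval as in the regression example):

h = bwlocallinear(x, y, gaussiankernel)      # corrected-AIC bandwidth
yfit = locallinear(x, y, xeval=xeval, h=h)   # local linear fit with that bandwidth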

The meanings of the arguments:

  • xeval is the point(s) at which the density or fitted values are evaluated

  • xdata is the input X

  • ydata is the response vector y; it must have the same length as xdata

  • reg is the regression function, localconstant or locallinear

  • kernel should be a function; it defaults to gaussiankernel

  • h is the bandwidth and should be a real scalar; if h is negative, the default bandwidth selection method is used to find a bandwidth and replace it (see the sketch after this list)

  • lb and ub are the boundaries for x; they must be provided when using the Beta or Gamma kernel
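
As noted for h above, a negative value delegates to the default bandwidth selection method; a minimal sketch:

kerneldensity(xdata, h=-1.0)   # negative h: the bandwidth is chosen automatically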

Demos

  • Kernel density estimate

     using Distributions
     x = rand(Normal(), 500)
     xeval = range(minimum(x), maximum(x), length=100)
     den = kerneldensity(x, xeval=xeval)
    
  • Local regression

     y = 2 .* x.^2 .+ rand(Normal(), 500)
     yfit0 = localconstant(x, y, xeval=xeval)
     yfit1 = locallinear(x, y, xeval=xeval)
     yfit0 = npr(x, y, xeval=xeval, reg=localconstant)
     yfit1 = npr(x, y, xeval=xeval, reg=locallinear)
    
  • Confidence Band

     cb = bootstrapCB(x, y, xeval=xeval)
     using Gadfly, Colors
     plot(layer(x=x, y=y, Geom.point),
       layer(x=xeval, y=yfit1, Geom.line, Theme(default_color=colorant"black")),
       layer(x=xeval, y=cb[1,:], Geom.line, Theme(default_color=colorant"red")),
       layer(x=xeval, y=cb[2,:], Geom.line, Theme(default_color=colorant"red")))
    

References

  • [1] Lecture notes from Dr. Song Xi Chen.

  • [2] Chen, S. X. (1999). Beta kernel estimators for density functions. Computational Statistics & Data Analysis, 31(2), 131-145.

  • [3] Chen, S. X. (2000). Probability density function estimation using gamma kernels. Annals of the Institute of Statistical Mathematics, 52(3), 471-480.

  • [4] Härdle, W. and Marron, J. S. (1991). Bootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, 19(2), 778-796.

  • [5] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. The Annals of Statistics, 21(4), 1926-1947.

  • [6] Hurvich, C. M., Simonoff, J. S. and Tsai, C.-L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B, 60(2), 271-293.

Contributors

panlanfeng, pkofod, tkelman


Issues

installation issue

Hi,

on Julia 0.4.5 (Windows 7, 64bit), Pkg.add("KernelEstimator") generates lots of error messages when trying to install Cubature.

However, this works: first install Cubature, then Yeppp and finally KernelEstimator.

By the way, the REQUIRE file says "julia 0.3". Is that wrong?

/Paul S

zero density error at bounds

There are examples of situations where the density at a bound is returned as zero when it shouldn't be.

An example:

using KernelEstimator, Distributions, StatsPlots

dist = Exponential(1)
x = rand(dist, 1000);
histogram(x, normalize=:pdf, alpha=0.2, label=false)
xs = LinRange(0, maximum(x)+1, 399);
den = kerneldensity(x, xeval=xs, kernel=gammakernel, lb=0.0, ub=Inf);
plot!(xs, den, lw=2, colour=:red, label="kerneldensity")
plot!(xs, pdf(dist, xs), color=:black, lw=2, label="true")

(Screenshot of the resulting plot: the estimated density is zero at the lower bound, unlike the true density.)

and all(den .> 0) results in false

Basic demo fails with DomainError()

xdata = randn(1000)
kerneldensity(xdata)
DomainError:
in nan_dom_err at .\math.jl:196 [inlined]
in log(::Float64) at .\math.jl:202
in gaussiankernel(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Int64) at ~\.julia\v0.5\KernelEstimator\src\kernel.jl:132
in #kerneldensity#11(::Array{Float64,1}, ::Float64, ::Float64, ::KernelEstimator.#gaussiankernel, ::Float64, ::Function, ::Array{Float64,1}) at ~\.julia\v0.5\KernelEstimator\src\density.jl:12
in kerneldensity(::Array{Float64,1}) at ~\.julia\v0.5\KernelEstimator\src\density.jl:4
in include_string(::String, ::String) at .\loading.jl:441
in eval(::Module, ::Any) at .\boot.jl:234
in (::Atom.##67#70)() at ~\.julia\v0.5\Atom\src\eval.jl:40
in withpath(::Atom.##67#70, ::Void) at ~\.julia\v0.5\CodeTools\src\utils.jl:30
in withpath(::Function, ::Void) at ~\.julia\v0.5\Atom\src\eval.jl:46
in macro expansion at ~\.julia\v0.5\Atom\src\eval.jl:109 [inlined]
in (::Atom.##66#69)() at .\task.jl:60

support for custom kernels

I'd like to use my own kernel function, based on the one used here in opt_h.m. I'm unaware of a name for it, but the expression is

15/16 * (1-u^2)^2 * (abs(u) <= 1)

This is easy enough to code up, but bwlocalconstant is structured so that the choices of h0, hlb and hub are hard-coded depending on the choice of kernel.

I can get around the issue by defining my own version of bwlocalconstant, but it might be more robust to fix it in the package so that others can also use their own kernels.

Can't install: Unsatisfiable requirements

(@v1.9) pkg> add KernelEstimator
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package CUDA [052768ef]:
 CUDA [052768ef] log:
 ├─possible versions are: 0.1.0-4.3.2 or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions: 0.1.0-4.3.2
 ├─restricted by compatibility requirements with SpecialFunctions [276daf66] to versions: 0.1.0-2.6.3 or uninstalled, leaving only versions: 0.1.0-2.6.3
 │ └─SpecialFunctions [276daf66] log:
 │   ├─possible versions are: 0.7.0-2.2.0 or uninstalled
 │   ├─restricted to versions * by an explicit requirement, leaving only versions: 0.7.0-2.2.0
 │   └─restricted by compatibility requirements with KernelEstimator [857edff2] to versions: 0.7.0-0.10.3
 │     └─KernelEstimator [857edff2] log:
 │       ├─possible versions are: 0.3.3 or uninstalled
 │       └─restricted to versions * by an explicit requirement, leaving only versions: 0.3.3
 ├─restricted by julia compatibility requirements to versions: [2.3.0, 2.5.0-4.3.2] or uninstalled, leaving only versions: [2.3.0, 2.5.0-2.6.3]
 └─restricted by compatibility requirements with LLVM [929cbde3] to versions: 3.3.3-4.3.2 or uninstalled — no versions left
   └─LLVM [929cbde3] log:
     ├─possible versions are: 0.9.0-5.2.0 or uninstalled
     ├─restricted by compatibility requirements with CUDA [052768ef] to versions: 1.5.0-5.2.0
     │ └─CUDA [052768ef] log: see above
     └─restricted by julia compatibility requirements to versions: 4.0.0-5.2.0 or uninstalled, leaving only versions: 4.0.0-5.2.0

Multivariate regression?

Hi there, nice package. Are there any plans or capabilities for multiple regression (several independent variables), or multivariate regression (more than one dependent variable)?

Ridging to handle small bandwidth?

Hi,
Thank you so much for your code.
I was wondering whether there is a plan for dealing with small bandwidths, which can create numerical issues.
Cheng, Hall, and Titterington (1997) discuss ridging when bandwidth values are too small.
Do you plan to implement something similar?
The "np" package by Racine and Hayfield in R uses that, and I would love to see it implemented in Julia for my research purposes.
Thank you!
Thomas Vigié
