ajd98 / kde

Python library for kernel density estimation, including interface for WESTPA data

kde

Overview

This repository provides a Python library for kernel density estimation. In comparison to other Python implementations of kernel density estimation, key features of this library include:

  1. Support for weighted samples.
  2. A variety of kernels, including a smooth, compact kernel.
  3. Interface for kernel density estimation from WESTPA data sets (https://westpa.github.io/westpa/).

Basics

Kernel density estimation is a technique for estimating a probability density function from empirical data. Suppose we have observations xᵢ ∈ V, where i = 1, ..., n and V is some feature space, typically ℝᵈ. Given a metric 𝒹: V × V → ℝ⁺∪{0}, a kernel function K: ℝ → ℝ⁺∪{0} satisfying ∫ᵥK(𝒹(x,y))dy = 1 for all x ∈ V, and a bandwidth h ∈ ℝ⁺, the kernel density estimate p: V → ℝ⁺∪{0} is defined as:

p(x) := 1/(hn) ΣᵢK(𝒹(x,xᵢ)/h)

This library simplifies calculation by including only a set of metrics 𝒹 that may be expressed as 𝒹(x,xᵢ) = q(x-xᵢ) for some norm q:V → ℝ⁺∪{0}:

p(x) := 1/(hn) ΣᵢK(q(x-xᵢ)/h)

Similarly, a weighted version of the kernel density estimate may be defined as:

p(x) := 1/h ΣᵢwᵢK(q(x-xᵢ)/h)

where wᵢ is the weight of the ith sample, and Σᵢwᵢ=1.
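
As an illustration of the weighted estimate above, here is a minimal NumPy sketch restricted to one dimension, where q is the absolute value and the 1/h prefactor normalizes the estimate. The names weighted_kde_1d and gaussian_kernel are hypothetical and are not part of this library's API; the sketch only mirrors the formula.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel, normalized to integrate to 1 over the real line."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def weighted_kde_1d(x, samples, weights=None, h=1.0, kernel=gaussian_kernel):
    """Evaluate p(x) = 1/h * sum_i w_i K(|x - x_i| / h) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    if weights is None:
        # Uniform weights recover the unweighted estimate 1/(hn) * sum_i K(...).
        weights = np.full(samples.shape[0], 1.0 / samples.shape[0])
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()  # enforce sum_i w_i = 1
    # Scaled distance between every evaluation point and every sample.
    u = np.abs(x[:, None] - samples[None, :]) / h
    return (kernel(u) * weights[None, :]).sum(axis=1) / h
```

Passing weights=None gives every sample the weight 1/n, so the same function covers both the weighted and the unweighted definition.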

This package includes the following kernel functions:

kernel | equation
bump | K(x) ∝ 𝟙_A(x) exp(1/(x²-1))
cosine | K(x) ∝ 𝟙_A(x) cos(πx/2)
epanechnikov | K(x) ∝ 𝟙_A(x) (1-x²)
gaussian | K(x) ∝ exp(-x²/2)
logistic | K(x) ∝ 1/(exp(-x)+2+exp(x))
quartic | K(x) ∝ 𝟙_A(x) (1-x²)²
tophat | K(x) ∝ 𝟙_A(x)
triangle | K(x) ∝ 𝟙_A(x) (1-‖x‖)
tricube | K(x) ∝ 𝟙_A(x) (1-‖x‖³)³

In the above definitions, 𝟙_A is the indicator function of the set A = {x: ‖x‖ < 1}, so every kernel that includes it vanishes outside A.
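
The kernel shapes listed above can be written down directly in NumPy. The following is a sketch of the unnormalized shapes only (the table gives each kernel up to a proportionality constant). The names KERNEL_SHAPES and _compact are hypothetical; the library's own compiled implementations may differ.

```python
import numpy as np

def _compact(shape):
    """Evaluate shape(u) only where |u| < 1 and return 0 elsewhere."""
    def kernel(u):
        u = np.atleast_1d(np.asarray(u, dtype=float))
        out = np.zeros_like(u)
        inside = np.abs(u) < 1.0  # indicator function of A
        out[inside] = shape(u[inside])
        return out
    return kernel

def _gaussian(u):
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / 2.0)

def _logistic(u):
    u = np.asarray(u, dtype=float)
    return 1.0 / (np.exp(-u) + 2.0 + np.exp(u))

# Unnormalized kernel shapes, keyed by the names used in this README.
KERNEL_SHAPES = {
    "bump": _compact(lambda u: np.exp(1.0 / (u**2 - 1.0))),
    "cosine": _compact(lambda u: np.cos(np.pi * u / 2.0)),
    "epanechnikov": _compact(lambda u: 1.0 - u**2),
    "gaussian": _gaussian,
    "logistic": _logistic,
    "quartic": _compact(lambda u: (1.0 - u**2) ** 2),
    "tophat": _compact(lambda u: np.ones_like(u)),
    "triangle": _compact(lambda u: 1.0 - np.abs(u)),
    "tricube": _compact(lambda u: (1.0 - np.abs(u) ** 3) ** 3),
}
```

Evaluating the compact kernels only inside A avoids the division by zero that the bump kernel would otherwise hit at ‖x‖ = 1.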

For more information on the mathematical theory of kernel density estimation, see the following references:

  1. Rosenblatt, Murray. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Statist. 27 (1956), no. 3, 832–837. doi:10.1214/aoms/1177728190.
  2. Parzen, Emanuel. On Estimation of a Probability Density Function and Mode. Ann. Math. Statist. 33 (1962), no. 3, 1065–1076. doi:10.1214/aoms/1177704472.

Installation

This library requires NumPy, SciPy, and Cython. In addition, gcc is required for compilation. To install, run make from the directory in which this README file is found.

CUDA installation requires the CUDA toolkit and has been tested with CUDA version 8.0.44 on GTX 1080 cards. To install the CUDA backend, run make cuda from the directory in which this README file is found. The CUDA backend may be used by passing the cuda=True keyword argument to the evaluate method of the KDE class.

Use

Kernel density estimation with arbitrary data

Before using this library, you will need to make sure that it may be imported by Python. To do so, add the top-level directory of this git repository (the directory containing this README file) to your PYTHONPATH environment variable. If this does not work, you may also add the following commands to the top of your Python script:

import sys
sys.path.append("path to this git repository")

Then, import the kde module via Python.

Kernel density estimation is performed via the KDE class, accessible as kde.KDE.

class kde.KDE(training_points, kernel='gaussian', weights=None, metric='euclidean_distance', bw=1)

Parameters:

Parameter | Data type | Description
training_points | numpy.ndarray | The values of the samples in ℝⁿ or (S¹)ⁿ = S¹×S¹×...×S¹
kernel | string | The kernel. Options are "bump", "cosine", "epanechnikov", "gaussian", "logistic", "quartic", "tophat", "triangle", and "tricube". See above for kernel definitions.
weights | numpy.ndarray or None | The weights of the samples. If None, the samples are uniformly weighted.
metric | string | The norm from which to induce the metric for distance between points. Options are 'euclidean_distance' and 'euclidean_distance_ntorus'. 'euclidean_distance_ntorus' assumes the sample space is an n-torus (S¹×S¹×...×S¹) where each dimension runs between -180 and 180, and the distance is the minimum euclidean distance to a periodic image.
bw | float | The bandwidth of the kernel
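
To make the 'euclidean_distance_ntorus' description concrete, here is a small NumPy sketch of a minimum-image distance on an n-torus whose coordinates run between -180 and 180. The function name ntorus_distance and its period argument are hypothetical; this illustrates the idea and is not the library's compiled implementation.

```python
import numpy as np

def ntorus_distance(x, y, period=360.0):
    """Minimum Euclidean distance between x and any periodic image of y."""
    delta = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Wrap each coordinate difference into [-period/2, period/2).
    delta = (delta + period / 2.0) % period - period / 2.0
    # The periodic images form an orthogonal lattice, so wrapping each
    # coordinate independently yields the closest image.
    return np.sqrt((delta**2).sum(axis=-1))
```

For example, the angles 170 and -170 are 20 apart under this metric rather than 340, which matters for periodic coordinates such as dihedral angles.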

Methods:

Method | Description
set_kernel_type(kernel) | Set the kernel to kernel. See above for options.
evaluate(p, cuda=False) | Evaluate the kernel density estimate at each position of p, an m-by-k numpy array, where m is the number of samples and k is the number of features. If cuda=True, use the CUDA backend (requires compilation with the cuda option; see Installation above).
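
Putting the constructor and methods together, a typical session might look like the sketch below. It uses only the signatures documented above. The array layout of training_points (one row per sample, one column per feature) is an assumption carried over from the description of evaluate, and the try/except guard is only there so that the snippet still runs where the library has not been built.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: one row per sample, one column per feature.
training_points = rng.normal(size=(500, 1))
weights = np.full(500, 1.0 / 500)  # weights sum to 1
query_points = np.linspace(-4.0, 4.0, 81).reshape(-1, 1)

try:
    import kde  # needs this repository on PYTHONPATH and a completed make
except ImportError:
    kde = None

if kde is not None:
    estimator = kde.KDE(training_points, kernel='gaussian', weights=weights,
                        metric='euclidean_distance', bw=0.3)
    density = estimator.evaluate(query_points)

    # Switch kernels on the same estimator and evaluate again.
    estimator.set_kernel_type('epanechnikov')
    density_epanechnikov = estimator.evaluate(query_points)
```

The bandwidth of 0.3 is an arbitrary illustrative value; a suitable bw depends on the data.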

Kernel density estimation with WESTPA data

This library provides classes for interacting with WESTPA data sets, enabling kernel density estimation from WESTPA data via Python scripts and via the command line.

From within a Python script, import the kde module, which provides the kde.WKDE class for interacting with WESTPA data sets. The WKDE class should be initialized as:

kde.WKDE(westh5, first_iter=None, last_iter=None, load_func=None, bw=1)

Parameters:

Parameter | Data type | Description
westh5 | h5py HDF5 File object | The WESTPA data file (typically named 'west.h5')
first_iter | int or None | The first weighted ensemble iteration from which to use data. If None, start at iteration 1.
last_iter | int or None | The last weighted ensemble iteration from which to use data (inclusive).
load_func | Python function or None | Load data using the specified Python function. The function will be called as load_func(iter_group, niter) where iter_group is the HDF5 group corresponding to a weighted ensemble iteration, and niter is an integer denoting the index of the weighted ensemble iteration. The function should return a numpy array of shape (nsegs, ntimepoints, ndim) where nsegs is the number of segments in that iteration, ntimepoints is the number of sub-iteration timepoints, and ndim is the number of dimensions of the coordinate. If None (default), load the progress coordinate data.
bw | float | The bandwidth to use for the kernel. See the bw parameter of the kde.KDE class for more information.
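
As a sketch of what a load_func might look like, the function below reads an auxiliary dataset from the iteration group and returns it in the (nsegs, ntimepoints, ndim) shape described above. The dataset name "auxdata/dist" is hypothetical, and the sketch assumes auxiliary data is stored under each iteration group; adapt both to your own west.h5 file.

```python
import numpy as np

AUX_NAME = "auxdata/dist"  # hypothetical dataset name; replace with your own

def load_aux(iter_group, niter):
    """Return auxiliary data for one iteration as (nsegs, ntimepoints, ndim)."""
    # niter is part of the required call signature but is not needed here.
    data = np.asarray(iter_group[AUX_NAME], dtype=float)
    if data.ndim == 2:
        # A scalar-valued dataset has shape (nsegs, ntimepoints);
        # add the trailing coordinate axis so that ndim = 1.
        data = data[:, :, np.newaxis]
    return data
```

Such a function would then be passed at initialization, for example kde.WKDE(westh5, load_func=load_aux, bw=1).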

Following initialization, call the evaluate method as <WKDE class instance>.evaluate(points) to evaluate the kernel density estimate at each point in points. A Gaussian kernel is set by default; to use another kernel, call the set_kernel_type method (see the documentation for kde.KDE above) before calling evaluate.

To interact with WESTPA data from the command line, run python -m kde.w_kde; include the -h or --help flag for more information.
