8-u8 / gpboost

This project forked from fabsig/gpboost






GPBoost icon

GPBoost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

Table of Contents

  1. Get Started
  2. Modeling background
  3. News
  4. Open issues - contribute
  5. References
  6. License

Get started

GPBoost is a software library for combining tree-boosting with Gaussian process and grouped random effects models (aka mixed effects models or latent Gaussian models). It also allows for independently applying tree-boosting as well as Gaussian process and (generalized) linear mixed effects models (LMMs and GLMMs). The GPBoost library is predominantly written in C++; it has a C interface, and there are both a Python package and an R package.

For more information, you may want to have a look at the documentation and usage examples of the Python and R packages.

Modeling background

The GPBoost library allows for combining tree-boosting with Gaussian process and grouped random effects models in order to leverage advantages of both techniques and to remedy drawbacks of these two modeling approaches.

Background on tree-boosting and Gaussian process / grouped random effects models

Tree-boosting has the following advantages and disadvantages:

| Advantages of tree-boosting | Disadvantages of tree-boosting |
| --- | --- |
| Achieves state-of-the-art predictive accuracy | Assumes conditional independence of samples |
| Automatic modeling of non-linearities, discontinuities, and complex high-order interactions | Produces discontinuous predictions for, e.g., spatial data |
| Robust to outliers in and multicollinearity among predictor variables | Can have difficulty with high-cardinality categorical variables |
| Scale-invariant to monotone transformations of the predictor variables | |
| Automatic handling of missing values in predictor variables | |

Gaussian processes (GPs) and grouped random effects models (aka mixed effects models or latent Gaussian models) have the following advantages and disadvantages:

| Advantages of GPs / random effects models | Disadvantages of GPs / random effects models |
| --- | --- |
| Probabilistic predictions, which allow for uncertainty quantification | Zero or a linear prior mean (predictor, fixed effects) function |
| Incorporation of reasonable prior knowledge, e.g. for spatial data: "close samples are more similar to each other than distant samples", and a function should vary continuously / smoothly over space | |
| Modeling of dependency which, among other things, can allow for more efficient learning of the fixed effects (predictor) function | |
| Grouped random effects can be used for modeling high-cardinality categorical variables | |

GPBoost and LaGaBoost algorithms

The GPBoost library implements two algorithms for combining tree-boosting with Gaussian process and grouped random effects models: the GPBoost algorithm (Sigrist, 2020) for data with a Gaussian likelihood (conditional distribution of data) and the LaGaBoost algorithm (Sigrist, 2021) for data with non-Gaussian likelihoods.

For Gaussian likelihoods (GPBoost algorithm), it is assumed that the response variable (aka label) y is the sum of a potentially non-linear mean function F(X) and random effects Zb:

y = F(X) + Zb + xi

where xi is an independent error term and X are predictor variables (aka covariates or features).
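As an illustrative sketch (not code from the GPBoost library), the Gaussian model above can be simulated with NumPy for a single grouped random effect, where Z is the incidence matrix mapping each sample to its group. The choice of F, the variances, and the seed are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_groups = 500, 20

X = rng.uniform(size=(n, 2))                # predictor variables (features)
group = rng.integers(0, n_groups, size=n)   # categorical grouping variable

F = np.sin(4 * X[:, 0]) + X[:, 1] ** 2      # potentially non-linear mean function F(X)
b = rng.normal(0.0, 1.0, size=n_groups)     # random effects b ~ N(0, sigma_b^2)
Zb = b[group]                               # Zb: each sample gets its group's effect
xi = rng.normal(0.0, 0.1, size=n)           # independent error term

y = F + Zb + xi                             # response: y = F(X) + Zb + xi
```

Samples in the same group share the same realization of Zb, which is exactly the dependency that tree-boosting alone cannot model.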

For non-Gaussian likelihoods (LaGaBoost algorithm), it is assumed that the response variable y follows some distribution p(y|m) and that a (potentially multivariate) parameter m of this distribution is related to a non-linear function F(X) and random effects Zb:

y ~ p(y|m)
m = G(F(X) + Zb)

where G() is a so-called link function.
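For a concrete non-Gaussian case, one can take a Bernoulli likelihood with a logistic (inverse-logit) link; this particular choice is an assumption for illustration, not the only likelihood supported:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_groups = 500, 20

X = rng.uniform(size=(n, 2))
group = rng.integers(0, n_groups, size=n)

F = np.sin(4 * X[:, 0]) + X[:, 1] ** 2      # non-linear function F(X)
b = rng.normal(0.0, 1.0, size=n_groups)     # grouped random effects
eta = F + b[group]                          # latent predictor F(X) + Zb

m = 1.0 / (1.0 + np.exp(-eta))              # G() = logistic link maps eta to (0, 1)
y = rng.binomial(1, m)                      # y ~ p(y | m) = Bernoulli(m)
```

Here m is the success probability of the Bernoulli distribution, i.e. the (multivariate in general, scalar here) parameter of p(y|m).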

In the GPBoost library, the random effects can consist of

  • Gaussian processes (including random coefficient processes)
  • Grouped random effects (including nested, crossed, and random coefficient effects)
  • Combinations of the above
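To make the Zb notation concrete, the following sketch (illustrative, with made-up data) builds the incidence matrix Z for a grouped random intercept and shows how a random coefficient (random slope) modifies it:

```python
import numpy as np

group = np.array([0, 0, 1, 2, 1])           # grouping variable with 3 groups
n, n_groups = len(group), 3

# Random intercept: Z[i, j] = 1 if sample i belongs to group j, else 0
Z = np.zeros((n, n_groups))
Z[np.arange(n), group] = 1.0

b = np.array([1.0, -2.0, 0.5])              # one random effect per group
Zb = Z @ b                                  # equals b[group]: each sample gets its group's effect

# Random coefficient (random slope): scale each row of Z by the sample's covariate value
x_cov = np.array([0.5, 1.0, 2.0, 0.3, 1.5])
Z_slope = Z * x_cov[:, None]
```

For Gaussian processes, Z plays an analogous role of relating the latent process values at observed locations to the samples.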

Learning the above-mentioned models means learning both the covariance parameters (aka hyperparameters) of the random effects and the predictor function F(X). Both the GPBoost and the LaGaBoost algorithms iteratively learn the covariance parameters and add a tree to the ensemble of trees F(X) using a gradient and/or a Newton boosting step. In the GPBoost library, covariance parameters can (currently) be learned using (Nesterov accelerated) gradient descent, Fisher scoring (aka natural gradient descent), and Nelder-Mead. Further, trees are learned using the LightGBM library.
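The covariance-parameter step can be sketched for a single-level random-intercept model with a Gaussian likelihood: minimize the negative marginal log-likelihood over the variance parameters. This is an illustrative version using a numerical gradient on hypothetical data; GPBoost itself uses analytic gradients, Fisher scoring, or Nelder-Mead, as noted above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_groups = 200, 10
group = rng.integers(0, n_groups, size=n)
Z = np.eye(n_groups)[group]                                    # incidence matrix
# Simulated residuals y - F(X) with true variances (1.0, 0.25)
y = Z @ rng.normal(0.0, 1.0, n_groups) + rng.normal(0.0, 0.5, n)

def neg_log_lik(params):
    sigma2_b, sigma2 = np.exp(params)                          # log-scale keeps variances positive
    V = sigma2_b * Z @ Z.T + sigma2 * np.eye(n)                # marginal covariance of y
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (logdet + y @ np.linalg.solve(V, y))

# One gradient-descent step with a central-difference numerical gradient
params = np.log(np.array([0.5, 0.5]))                          # initial variance parameters
eps, lr = 1e-5, 1e-3
grad = np.array([
    (neg_log_lik(params + eps * e) - neg_log_lik(params - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
params_new = params - lr * grad
```

In the full algorithms, such a covariance-parameter update alternates with adding one tree to the ensemble F(X) in every boosting iteration.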

See Sigrist (2020) and Sigrist (2021) for more details.

News

Open issues - contribute

Software issues

Computational issues

  • Add GPU support for Gaussian processes
  • Add CHOLMOD support

Methodological issues

  • Add a spatio-temporal Gaussian process model (e.g. a separable one)
  • Add possibility to predict latent Gaussian processes and random effects (e.g. random coefficients)
  • Implement more approaches such that computations scale well (memory and time) for Gaussian process models and mixed effects models with more than one grouping variable for non-Gaussian data
  • Support sample weights

References

License

This project is licensed under the terms of the Apache License 2.0. See LICENSE for additional details.

gpboost's People

Contributors

fabsig, fonnesbeck, lorenzwalthert
