jamesrobertlloyd / gpss-research

Kernel structure discovery research code - likely to be unstable

License: MIT License


gpss-research's Introduction

This is part of the Automatic Statistician project.

Automatic Bayesian Covariance Discovery

This repo contains the source code to run the system described in the paper

Automatic Construction and Natural-Language Description of Nonparametric Regression Models by James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum and Zoubin Ghahramani, appearing in AAAI 2014.

Abstract

This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural-language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e.g. smoothness, trends, periodicity, changepoints). Taken together with the compositional structure of our language of models this allows us to automatically describe functions in simple terms. Second, the use of flexible nonparametric models and a rich language for composing them in an open-ended manner also results in state-of-the-art extrapolation performance evaluated over 13 real time series data sets from various domains.

Feel free to email the authors with any questions:
James Lloyd ([email protected])
David Duvenaud ([email protected])
Roger Grosse ([email protected])

Data used in the paper

Related Repo

Source code to run an earlier version of the system, appearing in Structure Discovery in Nonparametric Regression through Compositional Kernel Search by David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum and Zoubin Ghahramani, can be found at github.com/jamesrobertlloyd/gp-structure-search/.

gpss-research's People

Contributors

duvenaud, jamesrobertlloyd


gpss-research's Issues

Unexpected number of expansions in additive mode

/scratch/home/Research/GPs/gpss-research/experiments/2013-09-26.py

( M(0, SE(ell=0.3, sf=5.5)) + ( M(0, FT(ell=-1.9, p=-0.0, sf=3.2)) x M(0, LN(off=-0.8, ell=1.3, loc=1950.7)) ) )

yielded, among other expansions,

( M(0, SE(ell=0.3, sf=5.5)) + M(0, FT(ell=-1.9, p=-0.0, sf=3.2)) )

This seems wrong, but I might be mistaken.
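A minimal sketch (hypothetical; not the repo's actual representation) of what the additive components of an expression should be, assuming kernel expressions are nested sums and products of base kernels. Distributing products over sums, SE + (FT x LN) has exactly two additive terms; an expansion that drops the LN factor would not be one of them.

```python
# Hypothetical sketch: kernel expressions as nested tuples
# ('sum', ...) / ('prod', ...) with strings as base kernels.
def additive_components(expr):
    """Distribute products over sums, returning the list of additive terms."""
    if isinstance(expr, str):
        return [expr]
    op, *args = expr
    if op == 'sum':
        terms = []
        for a in args:
            terms.extend(additive_components(a))
        return terms
    if op == 'prod':
        # Cartesian product of the additive terms of each factor
        terms = [[]]
        for a in args:
            terms = [t + [c] for t in terms for c in additive_components(a)]
        return [('prod', *t) if len(t) > 1 else t[0] for t in terms]
    raise ValueError(op)

# SE + (FT x LN) has exactly two additive components:
expr = ('sum', 'SE', ('prod', 'FT', 'LN'))
print(additive_components(expr))  # ['SE', ('prod', 'FT', 'LN')]
```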

Blackout (and others) are numerically unstable

The derivative w.r.t. the location can be infinity * 0, which evaluates to NaN - this can be fixed either by changing the order of calculation or by thresholding quantities at realmax.

Another (fiddly) way would be to use signed log transforms
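A sketch of the thresholding fix (not the repo's code): clipping intermediates at the largest finite float - MATLAB's realmax - before multiplying turns the inf * 0 = NaN case into a well-defined 0.

```python
import numpy as np

# The location gradient of a changepoint/blackout sigmoid can become
# inf * 0 (an overflowing prefactor times an underflowing tail), which
# evaluates to NaN. Clipping at the largest finite float avoids this.
REALMAX = np.finfo(float).max

def stable_prod(a, b):
    return np.clip(a, -REALMAX, REALMAX) * np.clip(b, -REALMAX, REALMAX)

with np.errstate(over='ignore', invalid='ignore'):
    prefactor = np.exp(800.0)   # overflows to inf
    tail = 0.0                  # underflowed sigmoid tail
    naive = prefactor * tail    # nan
print(np.isnan(naive), stable_prod(prefactor, tail))  # True 0.0
```

Changing the order of calculation (e.g. working with the log of the prefactor and exponentiating last) avoids the overflow altogether, at the cost of restructuring the gradient code.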

Try the log transform of some data

A teaser for learning output warping, or a demonstration of a deficiency?

How would we compare marginal likelihoods? Check out the warped GP paper.
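On comparing marginal likelihoods: a model fit to log(y) is not directly comparable to a model fit to y until the log-Jacobian of the transform is added back - this is the correction the warped-GP view makes explicit. A sketch with a stand-in likelihood (the Gaussian here is hypothetical, standing in for a GP marginal likelihood):

```python
import numpy as np

# p_Y(y) = p_Z(log y) * |dz/dy| = p_Z(log y) / y, so in log space the
# transformed model's likelihood gets a -sum(log y) correction before
# it can be compared against a model of y directly.
def loglik_in_y_space(loglik_of_z, y):
    z = np.log(y)                         # z = log y, dz/dy = 1/y
    return loglik_of_z(z) - np.sum(np.log(y))

def gauss_loglik(z):                      # stand-in for a GP marginal likelihood
    z = np.asarray(z)
    return -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2 * np.pi)

y = np.array([1.0, 2.0, 0.5])
print(loglik_in_y_space(gauss_loglik, y))
```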

Unstandardised data?

Most aspects of the algorithm can scale appropriately, but it is hard to control everything - should we just standardise data before running the search?

Make suitable for multi-d again

Changepoints etc. should select a dimension to act upon (but should pass all data shape and variables downstream)

The 10 fold cross validation needs to be updated

The new data shape parameters need to behave correctly

Combine anticorrelated components

If the sum of two kernel components dramatically reduces uncertainty (at points where the uncertainty is greater than zero e.g. blackouts / changepoints) then these components probably belong together e.g. A + A + B -> 2A + B
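A hypothetical detection rule for this: if posterior draws of two components are strongly anticorrelated - the variance of their sum is far below the sum of their variances - the split is an artefact of a non-identifiable decomposition and the components should be merged. Sketch (function name and threshold are illustrative):

```python
import numpy as np

def should_combine(f1_draws, f2_draws, threshold=0.25):
    """True if draws (n_samples x n_points) of two components mostly cancel."""
    var_sum = np.var(f1_draws + f2_draws, axis=0).mean()
    var_parts = (np.var(f1_draws, axis=0) + np.var(f2_draws, axis=0)).mean()
    return var_sum < threshold * var_parts

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 50))
print(should_combine(a, -a + 0.01 * rng.standard_normal((500, 50))))  # True
print(should_combine(a, rng.standard_normal((500, 50))))              # False
```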

Reintroduce the laplace approx

This will increase the need for anticorrelated-component detection (and combination), since the Laplace approximation will recognise such decompositions as acceptable.

SE lengthscale restarts

Should sometimes be very large, e.g. twice the data range - this is the neutral value (i.e. effectively infinity).
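A sketch of such a restart proposal (hypothetical, not the repo's code): mostly draws spread around the data range, but occasionally exactly twice the data range, the "neutral" value at which SE is effectively constant over the data.

```python
import numpy as np

def sample_log_ell(x, rng, p_neutral=0.2):
    """Propose a log-lengthscale restart for an SE kernel on inputs x."""
    data_range = float(x.max() - x.min())
    if rng.random() < p_neutral:
        return np.log(2.0 * data_range)   # neutral value: stands in for infinity
    return np.log(data_range) + rng.standard_normal()  # spread around the range

rng = np.random.default_rng(0)
x = np.linspace(1950.0, 2000.0, 100)
samples = [sample_log_ell(x, rng) for _ in range(5)]
```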

Mixture of lengthscales kernel

Rather than the broad mixture that is RQ - maybe try a tighter mixture e.g. a Gaussian centred on a particular lengthscale?
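For context, RQ is a broad (gamma) scale mixture of SE kernels over inverse squared lengthscales. A numerical sketch of the tighter alternative (hypothetical, names illustrative): average SE kernels under a narrow Gaussian on the log-lengthscale, centred on a particular value.

```python
import numpy as np

def se(r, ell):
    """SE correlation at distance r with lengthscale ell."""
    return np.exp(-0.5 * (r / ell) ** 2)

def tight_mixture(r, ell0, width=0.1, n=101):
    """SE mixture under a narrow Gaussian on log-lengthscale around ell0."""
    log_ells = np.log(ell0) + width * np.linspace(-3, 3, n)
    w = np.exp(-0.5 * ((log_ells - np.log(ell0)) / width) ** 2)
    w /= w.sum()  # renormalise the truncated Gaussian weights
    return sum(wi * se(r, np.exp(le)) for wi, le in zip(w, log_ells))

print(tight_mixture(0.0, ell0=1.0))  # ~1.0 at r = 0, like any SE mixture
```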

Sum, Product, Changepoint should be operators

This will likely tidy up some duplicate code since we know that all operators have operands etc

Also - if we record properties like commutativity / distributivity etc. we can abstract their behaviour.
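A hypothetical sketch of the operator refactor: every operator holds its operands, and recorded algebraic properties (commutativity here) drive shared behaviour such as canonicalisation for duplicate detection.

```python
class Operator:
    commutative = False
    symbol = '?'

    def __init__(self, *operands):
        self.operands = list(operands)

    def canonical(self):
        # Commutative operators sort their operands, so equivalent
        # expressions hash/compare identically.
        ops = sorted(self.operands) if self.commutative else self.operands
        return '(' + f' {self.symbol} '.join(map(str, ops)) + ')'

class Sum(Operator):
    commutative = True
    symbol = '+'

class Product(Operator):
    commutative = True
    symbol = 'x'

class ChangePoint(Operator):
    commutative = False   # CP(A, B) != CP(B, A)
    symbol = 'CP'

print(Sum('SE', 'Per').canonical() == Sum('Per', 'SE').canonical())  # True
```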

Does the jitter heuristic work?

Does the search multiply by Lin again?
Is the jitter size correct? (Too big and we lose optimised values; too small and spurious Lins will appear again.)

Mask kernels should allow None dimensions

e.g. for Const kernel which does not depend on dimension

Alternatively - we should not always use masks - only when appropriate

One of these solutions needed to make hashing in multi-d correct
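A hypothetical sketch of the dims=None option: a mask kernel with no dimensions ignores its inputs entirely (as Const should), so the same object hashes identically whatever the input dimensionality, while a normal mask shows the base kernel only the selected columns.

```python
import numpy as np

def se(X, X2, ell=1.0):
    """SE kernel between rows of X (n x d) and X2 (m x d)."""
    d = X[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.sum((d / ell) ** 2, axis=-1))

def masked(base, dims):
    """Wrap a kernel to act on selected dims; None = depends on no dims."""
    if dims is None:
        return lambda X, X2: base(X[:, :0], X2[:, :0])  # no columns: constant 1
    return lambda X, X2: base(X[:, dims], X2[:, dims])

X = np.random.default_rng(1).standard_normal((4, 3))
print(np.allclose(masked(se, None)(X, X), 1.0))  # True
print(masked(se, [0, 2])(X, X).shape)            # (4, 4)
```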

More data

Earthquakes
EEG
Changepoint papers?
Fault detection papers?
Multiresolution paper?
Fix some of the current data sets that were subsampled.
