gpclust's Introduction

GPclust

Clustering time series using Gaussian processes and variational Bayes.

User guide and tutorials are available via the included notebooks.

The currently implemented models are:

  • MOG - Mixture of Gaussians
  • MOHGP - Mixtures of Hierarchical Gaussian processes
  • OMGP - Overlapping mixtures of Gaussian processes

Citation

The underlying algorithm is based on the 2012 NIPS paper:

http://books.nips.cc/papers/files/nips25/NIPS2012_1314.pdf

@article{hensman2012fast,
  title={Fast variational inference in the conjugate exponential family},
  author={Hensman, James and Rattray, Magnus and Lawrence, Neil D},
  journal={Advances in Neural Information Processing Systems},
  year={2012}
}

The code also implements clustering of Hierarchical Gaussian Processes using that inference framework, as detailed in the following two works:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6802369

@article{hensman2014fast,
  author={Hensman, J. and Rattray, M. and Lawrence, N.},
  journal={Pattern Analysis and Machine Intelligence, IEEE Transactions on},
  title={Fast nonparametric clustering of structured time-series},
  year={2014},
  volume={PP},
  number={99},
  keywords={Biological system modeling;Computational modeling;Data models;Gaussian processes;Optimization;Time series analysis},
  doi={10.1109/TPAMI.2014.2318711},
  ISSN={0162-8828}
}

http://www.biomedcentral.com/1471-2105/14/252

@article{hensman2013hierarchical,
  title={Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters},
  author={Hensman, James and Lawrence, Neil D and Rattray, Magnus},
  journal={BMC bioinformatics},
  volume={14},
  number={1},
  pages={1--12},
  year={2013},
  publisher={BioMed Central}
}

Additionally, the Overlapping Mixtures of Gaussian Processes model is implemented using the variational methods described in the works above. The model itself was published in this paper:

@article{Lazaro-Gredilla2012,
  title = {{Overlapping Mixtures of Gaussian Processes for the data association problem}},
  author = {L{\'{a}}zaro-Gredilla, Miguel and {Van Vaerenbergh}, Steven and Lawrence, Neil D.},
  doi = {10.1016/j.patcog.2011.10.004},
  journal = {Pattern Recognition},
  month = {apr},
  number = {4},
  pages = {1386--1395},
  url = {},
  volume = {45},
  year = {2012}
}

Dependencies

This work depends on the GPy project, as well as the numpy/scipy stack. matplotlib is optional for plotting.

I've tested the demos with GPy v0.8, but it should work with later versions also.

Contributors

  • James Hensman
  • Valentine Svensson
  • Max Zwiessele

gpclust's Issues

plot error (OMGP)

Python 3.6.8

I ran the following code:

from GPclust import OMGP
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
Y = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
m = OMGP(X, Y, K=2, variance=0.01, prior_Z='DP')
m.plot()

This error message appeared:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/matplotlib/colors.py in to_rgba(c, alpha)
    173     try:
--> 174         rgba = _colors_full_map.cache[c, alpha]
    175     except (KeyError, TypeError):  # Not in cache, or unhashable.

KeyError: (0.38499836444719515, None)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
8 frames
ValueError: Invalid RGBA argument: 0.38499836444719515

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4243                         "acceptable for use with 'x' with size {xs}, "
   4244                         "'y' with size {ys}."
-> 4245                         .format(nc=n_elem, xs=x.size, ys=y.size)
   4246                     )
   4247                 # Both the mapping *and* the RGBA conversion failed: pretty

ValueError: 'c' argument has 10 elements, which is not acceptable for use with 'x' with size 10, 'y' with size 10.
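
The final message reports 10 colour values for 10 points, so the counts agree and the mismatch is presumably one of array shape: X and Y above are 10x1 columns, while the colour values are likely a flat length-10 array. That reading is an assumption; it has not been checked against the GPclust plotting code or the installed matplotlib version. A minimal workaround sketch, using a hypothetical helper flatten_scatter_args that is not part of GPclust, is to flatten all three arrays before they reach scatter:

```python
import numpy as np

def flatten_scatter_args(x, y, c):
    """Return 1-D float copies of x, y and c (hypothetical helper, not GPclust API)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    c = np.asarray(c, dtype=float).ravel()
    if not (x.size == y.size == c.size):
        raise ValueError("x, y and c must contain the same number of elements")
    return x, y, c

# Same inputs as in the report: 10x1 columns.
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
Y = np.linspace(0.0, 1.0, 10).reshape(-1, 1)

# Stand-in for one column of per-point cluster probabilities (assumed shape: (10,)).
weights = np.random.RandomState(0).rand(10)

x1, y1, c1 = flatten_scatter_args(X, Y, weights)
```

The flattened arrays can then be passed as plt.scatter(x1, y1, c=c1, cmap='RdBu', vmin=0.0, vmax=1.0), which treats c1 as values to be colour-mapped rather than as a single colour specification.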

multiple_pdinv does NOT return inverses

Hi. In the utilities.py file, the function multiple_pdinv is described as:

def multiple_pdinv(A):
    """
    Arguments
    ---------
    A : A DxDxN numpy array (each A[:,:,i] is pd)

    Returns
    -------
    invs : the inverses of A
    hld: 0.5* the log of the determinants of A
    """

but when I call the function with random PD matrices, it doesn't always return the inverse.

In particular, calling the function with the PD matrix

A = np.asarray([[2.,-1.,0.],[-1.,2.,-1.],[0.,-1.,2.]])[:,:,None]

returns the correct

[[ 0.75  0.5   0.25]
 [ 0.5   1.    0.5 ]
 [ 0.25  0.5   0.75]]

but calling it with a random matrix, say

A = [[[ 1.00487128]
  [ 0.10450152]
  [ 0.78902973]]
 [[ 0.64999066]
  [ 1.92596954]
  [ 0.08218574]]
 [[ 0.59447227]
  [ 0.43842888]
  [ 1.17832435]]]

returns

invs[:,:,0] = [[ 1.4981143   0.14944808 -1.01359122]
 [-0.48097944  0.47961461  0.28862138]
 [-0.57684638 -0.2538517   1.25263636]]

but the inverse is

np.linalg.inv(A) = [[ 1.66485487 -0.40496621 -0.68925095]
 [-0.40496621  0.66577222 -0.04341129]
 [-0.68925095 -0.04341129  1.21254673]]

Am I missing an assumption on what types of tensors go into multiple_pdinv?
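
One thing worth checking with the numbers as posted: the random matrix printed above is not symmetric, so it is not positive definite in the sense a Cholesky-based routine expects. The sketch below uses only NumPy and the values copied from this report; it makes no claim about how multiple_pdinv works internally. It compares both reported results against the inverse of the matrix as printed and against the inverse of the symmetric matrix built from its lower triangle.

```python
import numpy as np

# Values copied from the report (the 3x3 slice A[:, :, 0]).
A_printed = np.array([[1.00487128, 0.10450152, 0.78902973],
                      [0.64999066, 1.92596954, 0.08218574],
                      [0.59447227, 0.43842888, 1.17832435]])

reported_invs = np.array([[ 1.4981143,   0.14944808, -1.01359122],
                          [-0.48097944,  0.47961461,  0.28862138],
                          [-0.57684638, -0.2538517,   1.25263636]])

reported_np_inv = np.array([[ 1.66485487, -0.40496621, -0.68925095],
                            [-0.40496621,  0.66577222, -0.04341129],
                            [-0.68925095, -0.04341129,  1.21254673]])

# A positive-definite input must at least be symmetric.
is_symmetric = np.allclose(A_printed, A_printed.T)

# Symmetric matrix that uses only the lower triangle of the printed values.
A_lower_sym = np.tril(A_printed) + np.tril(A_printed, -1).T

# Which matrix does each reported result actually invert?
invs_matches_printed = np.allclose(
    np.linalg.inv(A_printed), reported_invs, atol=1e-5)
np_inv_matches_lower_sym = np.allclose(
    np.linalg.inv(A_lower_sym), reported_np_inv, atol=1e-5)

print(is_symmetric, invs_matches_printed, np_inv_matches_lower_sym)
```

On these values the symmetry check fails, the first reported result is the ordinary inverse of the matrix as printed, and the second is the inverse of the lower-triangle symmetrization. That suggests the two results were computed from different matrices, or that the printed matrix is not the one actually passed in, so making the input explicitly symmetric before calling the function would be a sensible first test.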

A bit unclear on the dimensions of the "times" array

Hello,

I am trying to reproduce your drosophila example. I was unable to get the kalinka dataset through urlretrieve but I got it from a thread on the GP's github page. Unfortunately, I get stuck here:

replicates, times = np.loadtxt('/Users/n593117/GPclust/data/kalinka09_pdata.csv', delimiter=',').T

with this error:

  /Library/Python/2.7/site-packages/numpy/lib/npyio.pyc in floatconv(x)
    657         if b'0x' in x:
    658             return float.fromhex(asstr(x))
--> 659         return float(x)
    660 
    661     typ = dtype.type

 ValueError: could not convert string to float: an

This is too bad because now I can't print out 'times', and I am a bit confused as to what the dimensions of 'times' are supposed to be. At first I thought it would be just a 57x1 array for the 57 time values, but I think it's a bit more involved than that. Could you please help?
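
On the shape question: the unpacking in the loadtxt line above only works if the loaded array has exactly two columns, so replicates and times would each be 1-D with one entry per row of the file. The sketch below illustrates that with made-up CSV text; the real layout of kalinka09_pdata.csv is not shown in this thread, so the header and the stray "an" entry are assumptions. It uses np.genfromtxt, which turns unparseable fields into NaN instead of raising.

```python
import io
import numpy as np

# Hypothetical file contents: a header row plus one unparseable field.
csv_text = u"replicate,time\n0,1.0\n0,2.5\n1,1.0\n1,an\n1,4.0\n"

# genfromtxt converts fields it cannot parse as floats into NaN.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Keep only the rows in which every field was parsed successfully.
keep = np.isfinite(raw).all(axis=1)

# Transposing a two-column array unpacks into two 1-D arrays.
replicates, times = raw[keep].T

print(raw.shape, replicates.shape, times.shape)
```

Inspecting the rows where keep is False shows which lines of the real file could not be parsed.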

Index Signals Within Clusters (MOHGP)

I have been using MOHGP, following your notebook MOHGP_demo. How do I find out which signals (indices) have been assigned to each cluster? Thanks
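
A minimal sketch of one way to do this, assuming the fitted model exposes an N x K matrix of cluster responsibilities (one row per signal, one column per cluster). The attribute name phi used in the note below is an assumption, and cluster_members is a hypothetical helper, not part of GPclust. Each signal is assigned to the cluster with the largest responsibility.

```python
import numpy as np

def cluster_members(phi):
    """Map each cluster index to the indices of the signals assigned to it.

    phi is an (N, K) array of responsibilities; the assignment is the
    column with the largest value in each row (a hard assignment).
    """
    phi = np.asarray(phi, dtype=float)
    labels = np.argmax(phi, axis=1)
    return {int(k): np.flatnonzero(labels == k) for k in np.unique(labels)}

# Made-up responsibilities for 4 signals and 3 clusters.
phi_demo = np.array([[0.90, 0.10, 0.00],
                     [0.20, 0.70, 0.10],
                     [0.80, 0.10, 0.10],
                     [0.05, 0.05, 0.90]])

members = cluster_members(phi_demo)
for k, idx in sorted(members.items()):
    print(k, idx.tolist())
```

With a fitted model m, the call would be cluster_members(m.phi) if the attribute is indeed named phi. Clusters that receive no signals do not appear in the result.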

Demos giving error due to mismatch of arguments.

Hi,
I was trying to run some of the demo files using both the 'master' and 'devel' branches of GPy,
but most of the demos break with the following error.
Thanks.

In [1]: run MOHGP_sine_demo.py
/home/muhammad/mlprojects/GPy/GPy/util/linalg.py:48: UserWarning: warning: caught this exception:'module' object has no attribute '_dotblas'

warnings.warn("warning: caught this exception:" + str(e))

TypeError Traceback (most recent call last)
/home/muhammad/mlprojects/colvb/examples/MOHGP_sine_demo.py in <module>()
37 #Ky2 = kern.white(1,0.01)
38 Ky = Ky1 + Ky2
---> 39 m = MOHGP(X,Kf,Ky,Y, K=Nclust, prior_Z = 'DP', alpha=alpha)
40 #m.ensure_default_constraints()
41 #m.checkgrad(verbose=1)

/home/muhammad/mlprojects/GPy/GPy/core/parameterization/parameterized.py in __call__(self, *args, **kw)
17 self.in_init = True
18 #import ipdb;ipdb.set_trace()
---> 19 self = super(ParametersChangedMeta, self).__call__(*args, **kw)
20 logger.debug("finished init")
21 self.in_init = False

/home/muhammad/mlprojects/colvb/colvb/MOHGP.pyc in __init__(self, X, kernF, kernY, Y, K, alpha, prior_Z)
33 self.YTY = np.dot(self.Y.T,self.Y)
34
---> 35 collapsed_mixture.__init__(self, N, K, prior_Z, alpha)
36
37 def _set_params(self,x):

/home/muhammad/mlprojects/colvb/colvb/col_mix.pyc in __init__(self, N, K, prior_Z, alpha)
34 self.set_vb_param(np.random.randn(self.N*self.K))
35
---> 36 col_vb.__init__(self)
37
38 def set_vb_param(self,phi_):

/home/muhammad/mlprojects/colvb/colvb/col_vb.pyc in __init__(self)
23 def __init__(self):
24 """"""
---> 25 GPy.core.model.Model.__init__(self)
26
27 #stuff for monitoring the different methods

TypeError: __init__() takes exactly 2 arguments (1 given)

Model initialization

Line 25 in col_vb.py:
GPy.core.model.Model.__init__(self)

Line 19 in GPy.model.py:
def __init__(self, name)

There seems to be a parameter mismatch: a name needs to be passed to the model initialization. What version of GPy do you use to run colvb?
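
The mismatch can be reproduced without GPy. The sketch below uses stand-in classes (Model, ColVBBroken and ColVBFixed are hypothetical names, not the real GPy or colvb classes) to show the failing call and the kind of change that resolves it: forwarding a name to the base initializer. The default name "col_vb" is an arbitrary choice for illustration.

```python
class Model(object):
    """Stand-in for a base class whose initializer requires a name."""
    def __init__(self, name):
        self.name = name


class ColVBBroken(Model):
    def __init__(self):
        # Mirrors the failing call: the required 'name' argument is missing.
        Model.__init__(self)


class ColVBFixed(Model):
    def __init__(self, name="col_vb"):
        # Forward a name to the base class initializer.
        Model.__init__(self, name)


try:
    ColVBBroken()
    raised = False
except TypeError as err:
    raised = True
    print("broken:", err)

fixed = ColVBFixed()
print("fixed:", fixed.name)
```

Whether the real fix belongs in colvb (passing a name) or in pinning an older GPy version depends on which GPy release colvb was written against, which is the open question above.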

Possible error in GPclust plot routine

Here is the code snippet that plots the error on the model prediction. Is the use of YY_var[:, 0] correct?

plt.fill_between(XX[:,0],

   YY_mu[:, 0] - 2 * np.sqrt(YY_var[:, 0]),
   YY_mu[:, 0] + 2 * np.sqrt(YY_var[:, 0]),
   alpha=0.1,
   facecolor=col)

If not, YY_var[:, 0] needs to be replaced with the diagonal elements of the matrix. Is that correct?

Maybe numpy.diagonal(YY_var)?
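
The answer depends on the shape of YY_var. If the prediction returns per-point variances as an N x 1 column, then YY_var[:, 0] is already the right quantity; if it returns a full N x N covariance matrix, the diagonal is what the error band needs. Which of the two the GPclust routine returns is not stated here, so the sketch below, with a hypothetical helper marginal_variance, handles both cases.

```python
import numpy as np

def marginal_variance(var):
    """Return per-point variances as a 1-D array (hypothetical helper).

    Accepts a flat array, an (N, 1) column of variances, or a full
    (N, N) covariance matrix, from which the diagonal is taken.
    """
    var = np.asarray(var, dtype=float)
    if var.ndim == 1:
        return var
    if var.ndim == 2 and var.shape[1] == 1:
        return var[:, 0]
    if var.ndim == 2 and var.shape[0] == var.shape[1]:
        return np.diagonal(var).copy()
    raise ValueError("unexpected variance shape: %r" % (var.shape,))

# The same three variances presented in both layouts.
column = np.array([[0.1], [0.2], [0.3]])
full = np.array([[0.10, 0.02, 0.01],
                 [0.02, 0.20, 0.03],
                 [0.01, 0.03, 0.30]])

band_from_column = 2 * np.sqrt(marginal_variance(column))
band_from_full = 2 * np.sqrt(marginal_variance(full))
```

Printing YY_var.shape just before the fill_between call would settle which branch applies in practice.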

Different time series for each time course

I've been looking at how to modify GPclust to handle the situation in which each time course has a different X. The application I'm working on is clustering patients with MND. They have various metrics recorded at irregular intervals (e.g. one person was sampled on days 3, 34, 64, 71 and 99; another on days 12, 54, 102, 103 and 120, etc.). I suspect that there are different 'types' of progression and would like to see if I can detect clusters.

I'll also need to look into whether I can add a time offset as a parameter, for each time course (as I don't know when each time course 'starts', i.e. when day zero was for each person). Finally, each person has ~10 different metrics (all recorded together at each interval) - I'll need to look into how to use a multiple-output GP in the clustering framework.

I noticed in MOHGP.py you mention that "#prediction as per my notes" - I'm trying to go from your paper to the code, but if there's some intermediate reasoning somewhere, that would be super helpful!
