GithubHelp home page GithubHelp logo

kdd-opensource / coreq Goto Github PK

View Code? Open in Web Editor NEW
1.0 3.0 0.0 53 KB

CORelation EQuivalence (COREQ) algorithm to compute low redundancy correlation matrices

License: MIT License

Makefile 0.50% C 88.17% Python 11.34%

coreq's Introduction

COREQ

An implementation of the COREQ (CORrelation EQuivalence) algorithm to compute low redundancy estimates for large correlation matrices. The algorithm is described and analysed in detail in the paper:

Erik Scharwächter, Fabian Geier, Lukas Faber, Emmanuel Müller. Low Redundancy Estimation of Correlation Matrices for Time Series using Triangular Bounds. In: Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018.

Contact and citation

  • Corresponding author: Erik Scharwächter
  • Please cite our paper if you use or modify our code for your own work.

Description

Please check our our project website at https://hpi.de/mueller/coreq.html for more information on the method, supplementary material and visualizations.

Installation

Automatically build and install the BlockCorr module for Python2 and Python3 with

$ make
$ make install

or use the setup.py script (check --help for details).

Usage

Example: A dataset of 1000 time series (length 50) that consists of two perfectly separated clusters of size 700 and 300.

>>> import numpy as np
>>> import BlockCorr
>>> np.random.seed(0)
>>> X = np.random.permutation(np.concatenate((np.repeat(np.random.randn(1, 50), 700, axis=0),
            np.repeat(np.random.randn(1, 50), 300, axis=0))))
>>> X.shape
(1000, 50)
>>> alpha = 0.8
>>> labels, pivots, pivot_corr_triu, computations = BlockCorr.COREQ(X, BlockCorr.ESTIMATE_AVERAGE, alpha)
clustering finished with 1300 correlation computations --- 2 clusters detected
>>> pivot_corr_triu
array([ 1.        , -0.05918059,  1.        ])

>>> # full pivot correlation matrix:
>>> pivot_corr = np.zeros((len(pivots), len(pivots)))
>>> pivot_corr[np.triu_indices(len(pivots))] = pivot_corr_triu
>>> pivot_corr[np.tril_indices(len(pivots))] = pivot_corr_triu

>>> # full time series correlation matrix:
>>> cluster_matrix = np.zeros((X.shape[0], len(pivots)))
>>> cluster_matrix[range(0,len(labels)), labels] = 1
>>> R = cluster_matrix.dot(pivot_corr).dot(cluster_matrix.transpose())
>>> np.abs(R - np.corrcoef(X)).sum()
0.0

License

The code is released under the MIT license.

coreq's People

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.