machenslab / dpca

An implementation of demixed Principal Component Analysis (a supervised linear dimensionality reduction technique)

License: MIT License


dpca's Introduction

demixed Principal Component Analysis (dPCA)

dPCA is a linear dimensionality reduction technique that automatically discovers and highlights the essential features of complex population activities. The population activity is decomposed into a few demixed components that capture most of the variance in the data and that highlight the dynamic tuning of the population to various task parameters, such as stimuli, decisions, rewards, etc.

D Kobak+, W Brendel+, C Constantinidis, CE Feierstein, A Kepecs, ZF Mainen, X-L Qi, R Romo, N Uchida, CK Machens
Demixed principal component analysis of neural population data
eLife 2016, https://elifesciences.org/content/5/e10989
(arXiv link: http://arxiv.org/abs/1410.6031)

This repository provides easy-to-use Python and MATLAB implementations of dPCA as well as example code.

Use dPCA

Simple example code for surrogate data can be found in dpca_demo.ipynb and dpca_demo.m.

Python package

The Python package is tested against Python 2.7 and Python 3.4. To install, first make sure that numpy, cython, scipy, scikit-learn (sklearn) and numexpr are available (itertools is part of the Python standard library). Then copy the files from the Python subfolder to a location on the Python search path.

Alternatively, from the terminal you can install the package by running:

$  cd /path/to/dPCA/python
$  python setup.py install

The API of dPCA is similar to sklearn's. To use dPCA, first import it,
from dpca import dPCA
then initialize it,
dpca = dPCA(labels, n_components, regularizer)
and then call the fitting function on your data to obtain the latent components Z,
Z = dpca.fit_transform(X)

The main parameters are:

  • X - A multidimensional array containing the trial-averaged data, passed to fit_transform. E.g. X[n,t,s,d] could correspond to the mean response of the n-th neuron at time t in trials with stimulus s and decision d. The observable (e.g. the neuron index) needs to come first.
  • labels - Optional; a string of characters (or an integer number of variables) describing the parameter axes, e.g. 'tsd' to denote the time, stimulus and decision axes. All marginalizations (e.g. time-stimulus) are referred to by subsets of those characters (e.g. 'ts').
  • n_components - Dictionary or integer; if an integer, the same number of components is used in each marginalization; otherwise each (key, value) pair gives the number of components (value) for a marginalization (key).

More detailed documentation, and additional options, can be found in dpca.py.
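
For concreteness, the following is a minimal sketch of this workflow on random surrogate data (array sizes and variable names are arbitrary, and depending on how the package was installed the module may need to be imported as dPCA rather than dpca; see dpca_demo.ipynb for a complete, tested example):

import numpy as np
from dPCA import dPCA  # or: from dpca import dPCA, depending on the install

# surrogate trial-averaged data: 50 neurons x 4 stimuli x 30 time points
N, S, T = 50, 4, 30
X = np.random.randn(N, S, T)
X -= X.reshape((N, -1)).mean(axis=1)[:, None, None]  # center each neuron

# 's' and 't' name the stimulus and time axes (the axes after the neuron axis)
dpca = dPCA.dPCA(labels='st', n_components=3)
Z = dpca.fit_transform(X)

# Z is a dict with one set of latent components per marginalization ('s', 't', 'st')
for marg, comps in Z.items():
    print(marg, comps.shape)  # each entry has shape (n_components, S, T), e.g. (3, 4, 30)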

MATLAB package

Add the Matlab subfolder to the Matlab search path.

Example code in dpca_demo.m generates surrogate data and provides a walkthrough for running PCA and dPCA analysis and plotting the results.

Support

Email [email protected] (Python) or [email protected] (Matlab) with any questions.

Contributors

A big thanks for 3rd party contributions goes to cboulay.

dpca's People

Contributors

ahwillia, cboulay, charlesbmi, dkobak, wielandbrendel


dpca's Issues

Unable to find vcvarsall.bat

I've tried installing dPCA on two Windows 10 x64 computers and run into the same error message when building the 'dPCA.nan_shuffle' extension.

Python version is 2.7.14 64-bit. It is an Anaconda distribution.

I tried installing using python setup.py install after confirming that all the required packages have been installed.

Small typo in python docs for `join` keyword?

Apologies if this is a misunderstanding on my part, but I think the example is slightly wrong here?

    join : None or dict
        ...
        e.g. if we are only interested in the time-modulated stimulus components.
        In this case, we would pass {'ts' : ['t','ts']}.

Shouldn't we pass {'ts' : ['s','ts']} instead?

Two questions about analysing the olfactory task in your eLife paper

Hello,

I just read your eLife paper and your MATLAB code. My understanding is that dPCA needs every neuron to be recorded in all trials, but as far as I know, the olfactory categorization task of Kepecs and colleagues in 2008 used electrodes that could be advanced every day, so each time the electrodes were advanced they recorded different neurons. So I think the analysis should be limited to a single day and repeated for each day; am I correct?
The second question is that I don't understand the 're-stretching' procedure. I understand the first and second steps (setting the Ti and deltaT), but how is the stretching then done?

Thank you very much.

Running dpca_demo.m with Octave in Colab

The Python version works fine, but I don't have MATLAB and need to use dSCA (demixed shared component analysis), which is built on the dPCA MATLAB version.

So I tried installing Octave and octave-statistics on Colab and added the following commands to dpca_demo.m:

pkg load statistics
cd /content/dPCA/matlab

I also deleted the minimal plot and the explained-variance computation in step one (they otherwise raise errors), but I still get an error in:

%% Step 2: PCA in each marginalization separately
dpca_perMarginalization(firingRatesAverage, @dpca_plot_default, ...
'combinedParams', combinedParams);

error: 'containers' undefined near line 86 column 20
error: called from
dpca_marginalize at line 86 column 18
dpca_perMarginalization at line 71 column 18

Is it possible to run this in Octave?

ValueError when filling unbalanced trialX parameter with NaNs as indicated.

My trialX is slightly unbalanced, so I followed the instructions ("If different combinations of features have different numbers of trials, then set n_samples to the maximum number of trials and fill unoccupied data points with NaN."), but this results in a ValueError: array must not contain infs or NaNs.

Full traceback:
File "/home/pietro/pythonprojects/starecase/DemixedPCA/my_dPCA.py", line 113, in
significance_masks = dpca.significance_analysis(trial_average_data,single_trial_data,axis='t',n_shuffles=10,n_splits=10,n_consecutive=10)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 864, in significance_analysis true_score = compute_mean_score(X,trialX,n_splits)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 821, in compute_mean_score trainZ = self.fit_transform(trainX)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 168, in fit_transform
self._fit(X,trialX=trialX)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 570, in _fit
self.P, self.D = self._randomized_dpca(regX,regmXs,pinvX=pregX)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 472, in _randomized_dpca
U,s,V = randomized_svd(np.dot(C,rX),n_components=self.n_components,n_iter=self.n_iter,random_state=np.random.randint(10e5))

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 364, in randomized_svd
power_iteration_normalizer, random_state)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 266, in randomized_range_finder
Q, _ = linalg.qr(safe_sparse_dot(A, Q), mode='economic')

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/scipy/linalg/decomp_qr.py", line 126, in qr
a1 = numpy.asarray_chkfinite(a)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1215, in asarray_chkfinite
"array must not contain infs or NaNs")

ValueError: array must not contain infs or NaNs

Problems understanding how to generate firingRates array (MATLAB)

Dear Dmitry,

Thanks for the code and the eLife paper on dPCA. I am trying to get started with it for my data set; however, I struggle to understand how to generate the firingRates matrix (despite looking through the demo).

Let's say I have:

N= 30 neurons;
T= A trial length of 60 data points
maxTrialNum= 135.

So, e.g., this is already represented as a 3D 30x60x135 array in my data structure.

And then I have two different stimulus conditions (S) and two different decisions (D) in my behaviour, which are represented as two 1D 135x1 arrays (with 0 and 1 for right/left stimuli/decisions).

I struggle to wrap my head around how to produce your

firingRates: N x S x D x T x maxTrialNum

data structure from this to get started with dPCA (should it end up being a 30x2x2x60x135 array?). I guess my head stops working beyond 3 dimensions.
Anyhow thanks a lot for the tool and have a nice weekend,
Eduardo
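
One way to do this bookkeeping, sketched below in numpy with hypothetical variable names (the MATLAB indexing is analogous), is to loop over the stimulus/decision conditions, copy each condition's trials into the trial axis, NaN-pad the remainder, and average over trials at the end:

import numpy as np

# stand-ins for the data described above
N, T, n_trials = 30, 60, 135
data3d = np.random.randn(N, T, n_trials)      # neurons x time x trials
stim = np.random.randint(0, 2, n_trials)      # 0/1 stimulus label per trial
dec = np.random.randint(0, 2, n_trials)       # 0/1 decision label per trial

S, D = 2, 2
counts = np.array([[np.sum((stim == s) & (dec == d)) for d in range(D)]
                   for s in range(S)])        # trials per (stimulus, decision) condition
maxTrialNum = counts.max()

firingRates = np.full((N, S, D, T, maxTrialNum), np.nan)
for s in range(S):
    for d in range(D):
        idx = np.where((stim == s) & (dec == d))[0]
        # data3d[:, :, idx] is N x T x k; place it in the (s, d) slot, NaN-padded
        firingRates[:, s, d, :, :len(idx)] = data3d[:, :, idx]

firingRatesAverage = np.nanmean(firingRates, axis=-1)   # N x S x D x T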

Bug in fit_transform when using dpca.join method

Hi!

I was trying to use the dPCA code to run a dPCA fit on some data of shape [N x T x S], where N is neurons, T is time, and S is stimulus conditions. I ran the analysis previously in MATLAB and combined parameters related to stimulus/stimulus-time interactions using:
combinedParams = { { 1, [1 2] }, {2} };
To do the same in the Python code, I followed the example in the dPCA.py code docstring for the dPCA function to set:

dpca = dPCA.dPCA( labels = 'st' )
dpca.join = { 'st': [ 's', 'st' ] }
dpca.protect = ['t']

When I tried to run fit_transform I ran into an error:

    Z = dpca.fit_transform( mean_data_for_dpca )
  File "/snel/home/lwimala/bin/dPCA/python/dPCA/dPCA.py", line 170, in fit_transform
    return self.transform(X)
  File "/snel/home/lwimala/bin/dPCA/python/dPCA/dPCA.py", line 963, in transform
    X_transformed[key] = np.dot(self.D[key].T, X.reshape((X.shape[0],-1))).reshape((self.D[key].shape[1],) + X.shape[1:])
KeyError: 's'

I looked into the error a little bit and figured out that the issue had to do with self.marginalizations not being updated to join the combined parameters in the _marginalize function. To fix the bug, I hacked together this quick fix that adds handling of the marginalizations within _marginalize (I used code from the get_parameter_combinations function to update the marginalizations properly).

Added to the end of _marginalize in dPCA.py (Line 318):

# recompute self.marginalizations (needed if performing regularization optimization)
self.marginalizations = self._get_parameter_combinations(join=False)

# handle updating marginalization names if join is passed (taken from 'get_parameter_combinations' function)
if isinstance(self.join, dict):
    for key, combs in self.join.items():
        tmp = [self.marginalizations[comb] for comb in combs]
        for comb in combs:
            del self.marginalizations[comb]
        self.marginalizations[key] = tmp

This fix enabled me to compute the dPCA transformation for my data and replicate my results that I found using the MATLAB package from your code distribution. Just wanted to raise an issue since this wasn't posted before. Thanks!

Error when executing matlab/dpca_demo

Right after extracting the repository and trying to run dpca_demo.m I ran into

Elapsed time is 0.266575 seconds.
Iteration #1 out of 2.......................... [1 s]
Iteration #2 out of 2.......................... [1 s]
Repetition # 1 out of 5... [0 s]
Repetition # 2 out of 5... [0 s]
Repetition # 3 out of 5... [0 s]
Repetition # 4 out of 5... [0 s]
Repetition # 5 out of 5... [0 s]
Cell contents reference from a non-cell array object.

Error in dpca_classificationPlot (line 91)
title([options.marginalizationNames{i} ' # ' num2str(i)])

Error in dpca_demo (line 235)
dpca_classificationPlot(accuracy, [], [], [], decodingClasses)

Merging components before fitting

I am wondering if I can use the code to merge effects and fit on the combined marginalization, similar to what the paper describes: a marginalization's main effect (e.g. sensory) together with its interaction with time in a single component.
I have been trying to use the join parameter but have stumbled on a couple of issues. I am also not sure whether the fact that the _marginalize method calls the parameter generator with join=False by default is part of the problem.

How to get noise component?

I am running a fit on my neural data. I am wondering how I can access the noise component Xnoise of the fit, as described in the paper?

Getting "Buffer dtype mismatch, expected 'long' but got 'long long'" when doing significance analysis

Hello,

I have been applying dPCA in MATLAB and now I am trying to do the same in Python (in order to learn). When I try to do the significance analysis I get the following error:

Compute score of shuffled data: 0 / 100 Traceback (most recent call last):

File "", line 1, in
significance_masks = dpca.significance_analysis(Matr_CondT, firingRates, n_shuffles=100, n_splits=10, n_consecutive=10)

File "C:\Users\amengual\anaconda3\lib\site-packages\dPCA\dPCA.py", line 875, in significance_analysis
self.shuffle_labels(trialX)

File "C:\Users\amengual\anaconda3\lib\site-packages\dPCA\dPCA.py", line 735, in shuffle_labels
nan_shuffle.shuffle2D(trialX)

File "dPCA\nan_shuffle.pyx", line 9, in dPCA.nan_shuffle.shuffle2D

ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'

I am sorry in advance, I am still learning the language, and any help on this would be really appreciated.

Thanks a lot in advance

How to use the dPCA function

I want to transform data of shape (429945, 25) (25 features) to (429945, 2) or (429945, 3) (2-3 features) using the dPCA function. How do I do it? I'm getting confused by your function.
Also, from sklearn.decomposition import dPCA is not working. Please help me fix the problem.
(Two screenshots attached.)

components_ method in python

The code works great, but it seems that the dPCA.components_ attribute was never implemented, although I think that dPCA.D contains exactly what components_ should?
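
A hedged sketch of what this looks like in practice, assuming a fitted dpca object and a data array X as in the usage sketch in the README section above (the shape of the matrices in dpca.D is inferred from the transform code quoted in the fit_transform issue earlier on this page):

# dpca.D appears to hold one neurons-to-components matrix per marginalization,
# playing the role that components_ plays in sklearn estimators
for marg, D in dpca.D.items():
    print(marg, D.shape)                       # (n_neurons, n_components)
    Z_marg = D.T @ X.reshape(X.shape[0], -1)   # manual projection, as transform() does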

LICENSE file?

I am potentially interested in trying to extend this code for research purposes. This might involve porting some of this code to Julia, since I have been using that for some of my other projects. I also may play around with the MATLAB code a bit (specifically, I'd like to modify the dpca_marginalize function).

Can I recommend that you add a permissive license (e.g. MIT) to this repo so that others can repurpose your code (under the condition that they give proper attribution/recognition to your work)?

‘re-balancing’ or not

I'm a little confused about whether dpca.m uses 're-balancing' or not. I had thought that dpca.m didn't use 're-balancing' (the default setting is to accept 'balanced' data), while the input of the dpca function is "Xfull" with D+1 dimensions, just like 'firingRatesAverage'. According to the eLife paper (Kobak et al., 2016), if you want to 're-balance' the data, you replace X with $\tilde{X}$. The 'Xfull' with D+1 dimensions is the same as $\tilde{X}$ (except that $\tilde{X}$ is flattened). So, does dpca.m use 're-balancing' because it accepts $\tilde{X}$ as input rather than X? Moreover, is the default setting to deal with 'unbalanced' data rather than 'balanced' data?
I'm new to dPCA and MATLAB, and sorry about my poor English; I hope I have made my question clear. Thanks a lot!

installation using pip

I am trying to install using: pip install dPCA but I get the following error
...
Collecting dPCA
Downloading https://files.pythonhosted.org/packages/b1/e0/6a0b83a5c8f5f23bd0e77d48fe5dc63558c34852d87e5bd1caef91951be9/dPCA-0.1.tar.gz (117kB)
100% |████████████████████████████████| 122kB 1.6MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-2bcojpbi/dPCA/setup.py", line 2, in
from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2bcojpbi/dPCA/

----------------------------------------

If I install the dependencies manually,

Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-tjy39pjk/dPCA/setup.py", line 16, in
ext_module = cythonize("dPCA/nan_shuffle.pyx")
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 966, in cythonize
aliases=aliases)
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 810, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 109, in nonempty
raise ValueError(error_msg)
ValueError: 'dPCA/nan_shuffle.pyx' doesn't match any files

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-tjy39pjk/dPCA/

----------------------------------------

Has anybody solved this? Thanks!

Grouping marginalizations in Python

Is there a straightforward way to replicate the MATLAB implementation's 'combinedParams' behavior using the Python dPCA code? I would like to do a grouping similar to the stimulus, decision, interaction, and time grouping shown in the MATLAB demo.

For an example stimulus-group, would I simply add the s and the st components to get the "first stimulus-group component", and also add the explained variances?

(I know this project is not under maintenance anymore, so I can also use the MATLAB version if that is easier.)
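
A hedged sketch of how such a grouping might be expressed with the join argument, assuming trial-averaged data of shape N x S x D x T and labels='sdt' (the marginalization key strings appear to follow the order of the characters in labels; note also the fit_transform/join bug reported earlier on this page):

from dPCA import dPCA

# stimulus, decision and the s/d interaction each absorb their interaction
# with time; the purely time-dependent marginalization 't' is left ungrouped
join = {'s':  ['s',  'st'],
        'd':  ['d',  'dt'],
        'sd': ['sd', 'sdt']}

dpca = dPCA.dPCA(labels='sdt', join=join)
dpca.protect = ['t']   # as in the examples above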

dpca_plot_default seems to swap stimulus and decision

The order of the decision seems to be reversed when plotting.

One suspects this because setting the firing rates for a particular decision to 0 between two time points results in components that show the abrupt jump in value for the opposite decision.

To reproduce this, simply add a new line in dpca_demo.m after line 63 with firingRates(:, 1, 1, 10:15, :) = 0; the resulting plots show that the decision 2 / stimulus 1 line jumps between time points 10 and 15 (it should have been decision 1 / stimulus 1 that shows this behaviour).

TypeError: Wrong type for labels. Please either set labels to the number of variables or provide the axis labels as a single string of characters (like "ts" for time and stimulus)

Hi,

I'm trying to run dPCA on this dataset. I suppose that my problem stems either from the shape of the input array or from my erroneous understanding of labels. The shape of the input array is (127, 2, 2, 6, 6). So I've got 127 average cell firing rates for trials with 4 variables having 2, 2, 6 and 6 levels. Is the shape correct? The labels defined below don't seem to cut it. Thanks a bunch!

from dPCA import dPCA

labels = ['cdfo'] # choice, decision, spatial frequency, orientation
dpca = dPCA.dPCA(labels, 5, regularizer = "auto")
demixed_mouse1 = dpca.fit_transform(dPCADataMouse1T5)

UPD: the same error is thrown when no labels are specified:

dpca = dPCA.dPCA()
demixed_mouse1 = dpca.fit_transform(dPCADataMouse1T5)

Installing dPCA under Python 3.6

I just installed the dPCA Python package under Python 3.6 (Anaconda Python distribution). It took quite a bit of time to work out how, so I thought I would share how I got it to work in case other people have the same problems.

Trying to install using PIP I got an error ValueError: 'dPCA/nan_shuffle.pyx' doesn't match any files

Trying to install using python setup.py install I got an error: error: Unable to find vcvarsall.bat

This was due to Cython not being able to find a C++ compiler. Following instructions here, I created a file Anaconda\Lib\distutils\distutils.cfg containing:

[build]
compiler=mingw32

[build_ext]
compiler = mingw32 

However I still could not install with either PIP or setup.py, getting an error:

File "C:\Users\takam\Anaconda3\lib\distutils\cygwinccompiler.py", line 126, in
 __init__
    if self.ld_version >= "2.10.90":
TypeError: '>=' not supported between instances of 'NoneType' and 'str'

Using the Microsoft Visual Studio 2015 build tools as the compiler rather than mingw32 fixed this problem. I downloaded and installed the tools from here and edited the file Anaconda\Lib\distutils\distutils.cfg to read:

[build]
compiler=msvc

[build_ext]
compiler = msvc

I could then install dPCA from the setup.py file with: python setup.py install

Problems with array shape when running dPCA in Python

Hi,
I constructed the initial neural activity matrix (dF_dff) size (N, S, D, T, E):
N = number of neurons
S = number of stimuli
D = number of decisions
T = number of time points per trial
E = max number of trials per condition

In my case dF_dff shape is (1307, 2, 2, 83, 43), and I take the average on the first dimension (neurons), obtaining dF_dff_average, shape (2, 2, 83, 43).
To run the dPCA, I am doing the following:

label = 'tsd'
join = [{'s': ['s', 'ts']}, {'d': ['d', 'td']}, {'t'}, {'sd': ['tsd']}]
dpca = dPCA.dPCA(labels=label, join=join, n_components=2, regularizer='auto')
dpca.protect = ['t']
Z = dpca.fit_transform(dF_dff_average, dF_dff)

But this doesn't work. I get an index error:

IndexError: index 43 is out of bounds for axis 3 with size 43

Maybe I am doing something wrong. Hopefully you can help me figure out what the problem is.
Thank you

Number of marginalizations seems off

So I am applying this to a Neurons x Time points x Categories matrix. Below is a description of my inputs:

  % size(binnedDataStack) = [276    44     8];
  margNames = {'Time', 'Category'};
  combinedParams = {{1, [1 2]}, {2, [2 1]}};

Running the following...
[W, V, whichMarg] = dpca(binnedDataStack, 15);

whichMarg contains 3's, despite my defining only 2 marginalizations. This leads to errors when a 3rd marginalization is looked up in margNames. Am I using this correctly? If not, what should I change?

unable to install dPCA on Mac

The new solution gives this error: "[Errno 2] No such file or directory:
'/content/dPCA/python/'/Users/addison/Desktop/Online_Course/Neuromatch/Users/addison/Desktop/Online_Course/Neuromatch python: can't open file '/content/dPCA/python/setup.py': [Errno 2] No such file or directory"

permute() in dpca_plot_default.m

In line 121 of dpca_plot_default.m:

data = permute(data, [dims(end) dims(1:end-1)]);

Should, I think, be:

data = permute(data, [numel(dims) 1:numel(dims)-1]);

Python significance analysis for multiple task parameters

When computing significance, we found that the Python code to test significance did not generalise to cases where a second task parameter (such as decision) was added. I have created a notebook that replicates this error for random data, as in your demo. See:

https://github.com/vdplasthijs/dPCA/blob/master/dPCA_significance_test.ipynb

Here, one can either set D=1 or D=2, the latter resulting in the error when dpca.significance_analysis() is called.

While (I think) I was able to resolve the initial error (please see the just-added pull request), I still think additional changes are needed, because although no errors are raised, the decision component is (unexpectedly) not deemed significant (see the final figure of the notebook for D=2, when using the updated dPCA code).

Thank you!
