machenslab / dpca

An implementation of demixed Principal Component Analysis (a supervised linear dimensionality reduction technique)

License: MIT License


dpca's Introduction

demixed Principal Component Analysis (dPCA)

dPCA is a linear dimensionality reduction technique that automatically discovers and highlights the essential features of complex population activities. The population activity is decomposed into a few demixed components that capture most of the variance in the data and that highlight the dynamic tuning of the population to various task parameters, such as stimuli, decisions, rewards, etc.

D Kobak+, W Brendel+, C Constantinidis, CE Feierstein, A Kepecs, ZF Mainen, X-L Qi, R Romo, N Uchida, CK Machens
Demixed principal component analysis of neural population data
eLife 2016, https://elifesciences.org/content/5/e10989
(arXiv link: http://arxiv.org/abs/1410.6031)

This repository provides easy-to-use Python and MATLAB implementations of dPCA as well as example code.

Use dPCA

Simple example code for surrogate data can be found in dpca_demo.ipynb and dpca_demo.m.

Python package

The Python package is tested against Python 2.7 and Python 3.4. To install, first make sure that numpy, cython, scipy, scikit-learn (sklearn) and numexpr are available (itertools is part of the Python standard library). Then copy the files from the Python subfolder to a location on the Python search path.

Alternatively, from the terminal you can install the package by running:

$  cd /path/to/dPCA/python
$  python setup.py install

The API of dPCA is similar to sklearn's. To use dPCA, first import it,
from dpca import dPCA
then initialize it,
dpca = dPCA(labels, n_components, regularizer)
and then call the fitting function on your data to obtain the latent components Z,
Z = dpca.fit_transform(X)

The main parameters are:

  • X - A multidimensional array containing the trial-averaged data, passed to fit_transform. E.g. X[n,t,s,d] could correspond to the mean response of the n-th neuron at time t in trials with stimulus s and decision d. The observable (e.g. the neuron index) needs to come first.
  • labels - Optional; a string of characters (or an integer number of variables) describing the parameter axes, e.g. 'tsd' to denote the time, stimulus and decision axes. All marginalizations (e.g. time-stimulus) are referred to by subsets of those characters (e.g. 'ts').
  • n_components - Dictionary or integer; if an integer, the same number of components is used in each marginalization; otherwise each (key, value) pair gives the number of components (value) for a marginalization (key).

More detailed documentation, and additional options, can be found in dpca.py.
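
For concreteness, the following is a minimal sketch of this workflow on random surrogate data (array sizes and variable names are arbitrary, and depending on how the package was installed the module may need to be imported as dPCA rather than dpca; see dpca_demo.ipynb for a complete, tested example):

import numpy as np
from dPCA import dPCA  # or: from dpca import dPCA, depending on the install

# surrogate trial-averaged data: 50 neurons x 4 stimuli x 30 time points
N, S, T = 50, 4, 30
X = np.random.randn(N, S, T)
X -= X.reshape((N, -1)).mean(axis=1)[:, None, None]  # center each neuron

# 's' and 't' name the stimulus and time axes (the axes after the neuron axis)
dpca = dPCA.dPCA(labels='st', n_components=3)
Z = dpca.fit_transform(X)

# Z is a dict with one set of latent components per marginalization ('s', 't', 'st')
for marg, comps in Z.items():
    print(marg, comps.shape)  # each entry has shape (n_components, S, T), e.g. (3, 4, 30)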

MATLAB package

Add the Matlab subfolder to the Matlab search path.

Example code in dpca_demo.m generates surrogate data and provides a walkthrough for running PCA and dPCA analysis and plotting the results.

Support

Email [email protected] (Python) or [email protected] (Matlab) with any questions.

Contributors

A big thanks for 3rd party contributions goes to cboulay.

dpca's People

Contributors

ahwillia, cboulay, charlesbmi, dkobak, wielandbrendel


dpca's Issues

Unable to find vcvarsall.bat

I've tried installing dPCA on two Windows 10 x64 computers and run into the same error message when building the 'dPCA.nan_shuffle' extension.

Python version is 2.7.14 64-bit. It is an Anaconda distribution.

I tried installing using python setup.py install after confirming that all the required packages have been installed.

Small typo in python docs for `join` keyword?

Apologies if this is a misunderstanding on my part, but I think the example is slightly wrong here?

    join : None or dict
        ...
        e.g. if we are only interested in the time-modulated stimulus components.
        In this case, we would pass {'ts' : ['t','ts']}.

Shouldn't we pass {'ts' : ['s','ts']} instead?

Two questions about analysing the olfactory task in your eLife paper

Hello,

I just read your eLife paper and your MATLAB code. My understanding is that dPCA needs every neuron to be recorded in all trials, but as far as I know, the olfactory categorization task of Kepecs and colleagues in 2008 used electrodes that could be advanced every day, so each time the electrodes were advanced they recorded different neurons. So I think the analysis should be limited to a single day and repeated for each day; am I correct?
The second question is that I don't understand the 're-stretching' procedure. I understand the first and second steps (setting the Ti and deltaT), but how is the stretching then done?

Thank you very much.

Running dpca_demo.m with Octave in Colab

The Python version works fine, but I don't have MATLAB and need to use dSCA (demixed shared component analysis), which is built on the dPCA MATLAB version.

So I tried installing Octave and octave-statistics on Colab and added the following commands to dpca_demo.m:

pkg load statistics
cd /content/dPCA/matlab

I also deleted the minimal plot and the explained-variance computation in step one (they otherwise raise errors), but I still get an error in:

%% Step 2: PCA in each marginalization separately
dpca_perMarginalization(firingRatesAverage, @dpca_plot_default, ...
'combinedParams', combinedParams);

error: 'containers' undefined near line 86 column 20
error: called from
dpca_marginalize at line 86 column 18
dpca_perMarginalization at line 71 column 18

Is it possible to run this in Octave?

ValueError when filling unbalanced trialX parameter with NaNs as indicated.

My trialX is slightly unbalanced, so I followed the instructions ("If different combinations of features have different numbers of trials, then set n_samples to the maximum number of trials and fill unoccupied data points with NaN."), but this results in a ValueError: array must not contain infs or NaNs.

Full traceback:
File "/home/pietro/pythonprojects/starecase/DemixedPCA/my_dPCA.py", line 113, in
significance_masks = dpca.significance_analysis(trial_average_data,single_trial_data,axis='t',n_shuffles=10,n_splits=10,n_consecutive=10)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 864, in significance_analysis true_score = compute_mean_score(X,trialX,n_splits)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 821, in compute_mean_score trainZ = self.fit_transform(trainX)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 168, in fit_transform
self._fit(X,trialX=trialX)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 570, in _fit
self.P, self.D = self._randomized_dpca(regX,regmXs,pinvX=pregX)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 472, in _randomized_dpca
U,s,V = randomized_svd(np.dot(C,rX),n_components=self.n_components,n_iter=self.n_iter,random_state=np.random.randint(10e5))

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 364, in randomized_svd
power_iteration_normalizer, random_state)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 266, in randomized_range_finder
Q, _ = linalg.qr(safe_sparse_dot(A, Q), mode='economic')

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/scipy/linalg/decomp_qr.py", line 126, in qr
a1 = numpy.asarray_chkfinite(a)

File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1215, in asarray_chkfinite
"array must not contain infs or NaNs")

ValueError: array must not contain infs or NaNs

Problems understanding how to generate firingRates array (MATLAB)

Dear Dmitry,

Thanks for the code and the eLife paper on dPCA. I am trying to get started with it for my data set; however, I struggle to understand how to generate the firingRates matrix (despite looking through the demo).

Let's say I have:

N= 30 neurons;
T= A trial length of 60 data points
maxTrialNum= 135.

So, e.g., this is already represented as a 3D 30x60x135 array in my data structure.

And then I have two different stimulus conditions (S) and two different decisions (D) in my behaviour, which are represented as two 1D 135x1 arrays (with 0 and 1 for right/left stimuli/decisions).

I struggle to wrap my head around how to produce your

firingRates: N x S x D x T x maxTrialNum

data structure from this to get started with dPCA (should it end up being a 30x2x2x60x135 array?). I guess my head stops working beyond 3 dimensions.
Anyhow thanks a lot for the tool and have a nice weekend,
Eduardo
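
One way to do this bookkeeping, sketched below in numpy with hypothetical variable names (the MATLAB indexing is analogous), is to loop over the stimulus/decision conditions, copy each condition's trials into the trial axis, NaN-pad the remainder, and average over trials at the end:

import numpy as np

# stand-ins for the data described above
N, T, n_trials = 30, 60, 135
data3d = np.random.randn(N, T, n_trials)      # neurons x time x trials
stim = np.random.randint(0, 2, n_trials)      # 0/1 stimulus label per trial
dec = np.random.randint(0, 2, n_trials)       # 0/1 decision label per trial

S, D = 2, 2
counts = np.array([[np.sum((stim == s) & (dec == d)) for d in range(D)]
                   for s in range(S)])        # trials per (stimulus, decision) condition
maxTrialNum = counts.max()

firingRates = np.full((N, S, D, T, maxTrialNum), np.nan)
for s in range(S):
    for d in range(D):
        idx = np.where((stim == s) & (dec == d))[0]
        # data3d[:, :, idx] is N x T x k; place it in the (s, d) slot, NaN-padded
        firingRates[:, s, d, :, :len(idx)] = data3d[:, :, idx]

firingRatesAverage = np.nanmean(firingRates, axis=-1)   # N x S x D x T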

Bug in fit_transform when using dpca.join method

Hi!

I was trying to use the dPCA code to run a dPCA fit on some data of shape [N x T x S], where N is neurons, T is time, and S is stimulus conditions. I ran the analysis previously in MATLAB and combined parameters related to stimulus/stimulus-time interactions using:
combinedParams = { { 1, [1 2] }, {2} };
To do the same in the Python code, I followed the example in the dPCA.py code docstring for the dPCA function to set:

dpca = dPCA.dPCA( labels = 'st' )
dpca.join = { 'st': [ 's', 'st' ] }
dpca.protect = ['t']

When I tried to run fit_transform I ran into an error:

    Z = dpca.fit_transform( mean_data_for_dpca )
  File "/snel/home/lwimala/bin/dPCA/python/dPCA/dPCA.py", line 170, in fit_transform
    return self.transform(X)
  File "/snel/home/lwimala/bin/dPCA/python/dPCA/dPCA.py", line 963, in transform
    X_transformed[key] = np.dot(self.D[key].T, X.reshape((X.shape[0],-1))).reshape((self.D[key].shape[1],) + X.shape[1:])
KeyError: 's'

I looked into the error a little bit and figured out that the issue had to do with self.marginalizations not being updated to join the combined parameters in the _marginalize function. To fix the bug, I hacked together this quick fix that adds handling of the marginalizations within _marginalize (I used code from the get_parameter_combinations function to update the marginalizations properly).

Added to the end of _marginalize in dPCA.py (Line 318):

# recompute self.marginalizations (needed if performing regularization optimization)
self.marginalizations = self._get_parameter_combinations(join=False)

# handle updating marginalization names if join is passed (taken from 'get_parameter_combinations' function)
if isinstance(self.join, dict):
    for key, combs in self.join.items():
        tmp = [self.marginalizations[comb] for comb in combs]
        for comb in combs:
            del self.marginalizations[comb]
        self.marginalizations[key] = tmp

This fix enabled me to compute the dPCA transformation for my data and replicate my results that I found using the MATLAB package from your code distribution. Just wanted to raise an issue since this wasn't posted before. Thanks!

Error when executing matlab/dpca_demo

Right after extracting the repository and trying to run dpca_demo.m I ran into

Elapsed time is 0.266575 seconds.
Iteration #1 out of 2.......................... [1 s]
Iteration #2 out of 2.......................... [1 s]
Repetition # 1 out of 5... [0 s]
Repetition # 2 out of 5... [0 s]
Repetition # 3 out of 5... [0 s]
Repetition # 4 out of 5... [0 s]
Repetition # 5 out of 5... [0 s]
Cell contents reference from a non-cell array object.

Error in dpca_classificationPlot (line 91)
title([options.marginalizationNames{i} ' # ' num2str(i)])

Error in dpca_demo (line 235)
dpca_classificationPlot(accuracy, [], [], [], decodingClasses)

Merging components before fitting

I am wondering if I can use the code to merge effects and fit on the combined marginalization, similar to what the paper describes: a marginalization's main effect (e.g. sensory) together with its interaction with time in a single component.
I have been trying to use the join parameter but have stumbled on a couple of issues. I am also not sure whether the fact that the _marginalize method calls the parameter generator with join=False by default is part of the problem.

How to get noise component?

I am running a fit on my neural data. I am wondering how I can access the noise component Xnoise of the fit, as described in the paper?

Getting "Buffer dtype mismatch, expected 'long' but got 'long long'" when doing significance analysis

Hello,

I have been applying dPCA in MATLAB and now I am trying to do the same in Python (in order to learn). When I try to do the significance analysis I get the following error:

Compute score of shuffled data: 0 / 100 Traceback (most recent call last):

File "", line 1, in
significance_masks = dpca.significance_analysis(Matr_CondT, firingRates, n_shuffles=100, n_splits=10, n_consecutive=10)

File "C:\Users\amengual\anaconda3\lib\site-packages\dPCA\dPCA.py", line 875, in significance_analysis
self.shuffle_labels(trialX)

File "C:\Users\amengual\anaconda3\lib\site-packages\dPCA\dPCA.py", line 735, in shuffle_labels
nan_shuffle.shuffle2D(trialX)

File "dPCA\nan_shuffle.pyx", line 9, in dPCA.nan_shuffle.shuffle2D

ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'

I am sorry in advance, I am still learning the language, and any help on this would be really appreciated.

Thanks a lot in advance

How to use the dPCA function

I want to transform data of shape (429945, 25) (25 features) to (429945, 2) or (429945, 3) (2-3 features) using the dPCA function. How do I do it? I'm getting confused by your function.
Also, from sklearn.decomposition import dPCA is not working. Please help me fix the problem.
(Two screenshots attached.)

components_ method in python

The code works great, but it seems that the dPCA.components_ attribute was never implemented, although I think that dPCA.D contains exactly what components_ should?
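
A hedged sketch of what this looks like in practice, assuming a fitted dpca object and a data array X as in the usage sketch in the README section above (the shape of the matrices in dpca.D is inferred from the transform code quoted in the fit_transform issue earlier on this page):

# dpca.D appears to hold one neurons-to-components matrix per marginalization,
# playing the role that components_ plays in sklearn estimators
for marg, D in dpca.D.items():
    print(marg, D.shape)                       # (n_neurons, n_components)
    Z_marg = D.T @ X.reshape(X.shape[0], -1)   # manual projection, as transform() does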

LICENSE file?

I am potentially interested in trying to extend this code for research purposes. This might involve porting some of this code to Julia, since I have been using that for some of my other projects. I also may play around with the MATLAB code a bit (specifically, I'd like to modify the dpca_marginalize function).

Can I recommend that you add a permissive license (e.g. MIT) to this repo so that others can repurpose your code (under the condition that they give proper attribution/recognition to your work)?

‘re-balancing’ or not

I'm a little confused about whether dpca.m uses 're-balancing' or not. I had thought that dpca.m didn't use 're-balancing' (the default setting is to accept 'balanced' data), while the input of the dpca function is "Xfull" with D+1 dimensions, just like 'firingRatesAverage'. According to the eLife paper (Kobak et al., 2016), if you want to 're-balance' the data, you replace X with $\tilde{X}$. The 'Xfull' with D+1 dimensions is the same as $\tilde{X}$ (except that $\tilde{X}$ is flattened). So, does dpca.m use 're-balancing' because it accepts $\tilde{X}$ as input rather than X? Moreover, is the default setting to deal with 'unbalanced' data rather than 'balanced' data?
I'm new to dPCA and MATLAB, and sorry about my poor English; I hope I have made my question clear. Thanks a lot!

installation using pip

I am trying to install using: pip install dPCA but I get the following error
...
Collecting dPCA
Downloading https://files.pythonhosted.org/packages/b1/e0/6a0b83a5c8f5f23bd0e77d48fe5dc63558c34852d87e5bd1caef91951be9/dPCA-0.1.tar.gz (117kB)
100% |████████████████████████████████| 122kB 1.6MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-2bcojpbi/dPCA/setup.py", line 2, in
from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2bcojpbi/dPCA/

----------------------------------------

If I install the dependencies manually,

Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-tjy39pjk/dPCA/setup.py", line 16, in
ext_module = cythonize("dPCA/nan_shuffle.pyx")
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 966, in cythonize
aliases=aliases)
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 810, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 109, in nonempty
raise ValueError(error_msg)
ValueError: 'dPCA/nan_shuffle.pyx' doesn't match any files

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-tjy39pjk/dPCA/

----------------------------------------

Has anybody solved this? Thanks!

Grouping marginalizations in Python

Is there a straightforward way to replicate the MATLAB implementation's 'combinedParams' behavior using the Python dPCA code? I would like to do a grouping similar to the stimulus, decision, interaction, and time grouping shown in the MATLAB demo.

For an example stimulus-group, would I simply add the s and the st components to get the "first stimulus-group component", and also add the explained variances?

(I know this project is not under maintenance anymore, so I can also use the MATLAB version if that is easier.)
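
A hedged sketch of how such a grouping might be expressed with the join argument, assuming trial-averaged data of shape N x S x D x T and labels='sdt' (the marginalization key strings appear to follow the order of the characters in labels; note also the fit_transform/join bug reported earlier on this page):

from dPCA import dPCA

# stimulus, decision and the s/d interaction each absorb their interaction
# with time; the purely time-dependent marginalization 't' is left ungrouped
join = {'s':  ['s',  'st'],
        'd':  ['d',  'dt'],
        'sd': ['sd', 'sdt']}

dpca = dPCA.dPCA(labels='sdt', join=join)
dpca.protect = ['t']   # as in the examples above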

dpca_plot_default seems to swap stimulus and decision

The order of the decision seems to be reversed when plotting.

One suspects this because setting the firing rates for a particular decision to 0 between two time points results in components that show the abrupt jump in value for the opposite decision.

To reproduce this, simply add a new line in dpca_demo.m after line 63 with firingRates(:, 1, 1, 10:15, :) = 0; the resulting plots show that the decision 2 / stimulus 1 line jumps between time points 10 and 15 (it should have been decision 1 / stimulus 1 that shows this behaviour).

TypeError: Wrong type for labels. Please either set labels to the number of variables or provide the axis labels as a single string of characters (like "ts" for time and stimulus)

Hi,

I'm trying to run dPCA on this dataset. I suppose that my problem stems either from the shape of the input array or from my erroneous understanding of labels. The shape of the input array is (127, 2, 2, 6, 6). So I've got 127 average cell firing rates for trials with 4 variables having 2, 2, 6 and 6 levels. Is the shape correct? The labels defined below don't seem to cut it. Thanks a bunch!

from dPCA import dPCA

labels = ['cdfo'] # choice, decision, spatial frequency, orientation
dpca = dPCA.dPCA(labels, 5, regularizer = "auto")
demixed_mouse1 = dpca.fit_transform(dPCADataMouse1T5)

UPD: the same error is thrown when no labels are specified:

dpca = dPCA.dPCA()
demixed_mouse1 = dpca.fit_transform(dPCADataMouse1T5)

Installing dPCA under Python 3.6

I just installed the dPCA Python package under Python 3.6 (Anaconda Python distribution). It took quite a bit of time to work out how, so I thought I would share how I got it to work in case other people have the same problems.

Trying to install using PIP I got an error ValueError: 'dPCA/nan_shuffle.pyx' doesn't match any files

Trying to install using python setup.py install I got an error: error: Unable to find vcvarsall.bat

This was due to Cython not being able to find a C++ compiler. Following instructions here, I created a file Anaconda\Lib\distutils\distutils.cfg containing:

[build]
compiler=mingw32

[build_ext]
compiler = mingw32 

However I still could not install with either PIP or setup.py, getting an error:

File "C:\Users\takam\Anaconda3\lib\distutils\cygwinccompiler.py", line 126, in
 __init__
    if self.ld_version >= "2.10.90":
TypeError: '>=' not supported between instances of 'NoneType' and 'str'

Using the Microsoft Visual Studio 2015 build tools as the compiler rather than mingw32 fixed this problem. I downloaded and installed the tools from here and edited the file Anaconda\Lib\distutils\distutils.cfg to read:

[build]
compiler=msvc

[build_ext]
compiler = msvc

I could then install dPCA from the setup.py file with: python setup.py install

Problems with array shape when running dPCA in Python

Hi,
I constructed the initial neural activity matrix (dF_dff) size (N, S, D, T, E):
N = number of neurons
S = number of stimuli
D = number of decisions
T = number of time points per trial
E = max number of trials per condition

In my case dF_dff shape is (1307, 2, 2, 83, 43), and I take the average on the first dimension (neurons), obtaining dF_dff_average, shape (2, 2, 83, 43).
To run the dPCA, I am doing the following:

label = 'tsd'
join = [{'s': ['s', 'ts']}, {'d': ['d', 'td']}, {'t'}, {'sd': ['tsd']}]
dpca = dPCA.dPCA(labels=label, join=join, n_components=2, regularizer='auto')
dpca.protect = ['t']
Z = dpca.fit_transform(dF_dff_average, dF_dff)

But this doesn't work. I get an index error:

IndexError: index 43 is out of bounds for axis 3 with size 43

Maybe I am doing something wrong. Hopefully you can help me figure out what the problem is.
Thank you

Number of marginalizations seems off

So I am applying this to a Neurons x Time points x Categories matrix. Below is a description of my inputs:

  % size(binnedDataStack) = [276    44     8];
  margNames = {'Time', 'Category'};
  combinedParams = {{1, [1 2]}, {2, [2 1]}};

Running the following...
[W, V, whichMarg] = dpca(binnedDataStack, 15);

whichMarg contains 3's, despite my defining only 2 marginalizations. This leads to errors when a 3rd marginalization is looked up in margNames. Am I using this correctly? If not, what should I change?

unable to install dPCA on Mac

The new solution gives this error: "[Errno 2] No such file or directory:
'/content/dPCA/python/'/Users/addison/Desktop/Online_Course/Neuromatch/Users/addison/Desktop/Online_Course/Neuromatch python: can't open file '/content/dPCA/python/setup.py': [Errno 2] No such file or directory"

permute() in dpca_plot_default.m

In line 121 of dpca_plot_default.m:

data = permute(data, [dims(end) dims(1:end-1)]);

Should, I think, be:

data = permute(data, [numel(dims) 1:numel(dims)-1]);

Python significance analysis for multiple task parameters

When computing significance, we found that the Python code to test significance did not generalise to cases where a second task parameter (such as decision) was added. I have created a notebook that replicates this error for random data, as in your demo. See:

https://github.com/vdplasthijs/dPCA/blob/master/dPCA_significance_test.ipynb

Here, one can either set D=1 or D=2, the latter resulting in the error when dpca.significance_analysis() is called.

While (I think) I was able to resolve the initial error (please see the just-added pull request), I still think additional changes are needed, because although no errors are raised, the decision component is (unexpectedly) not deemed significant (see the final figure of the notebook for D=2, when using the updated dPCA code).

Thank you!
