
Comments (3)

m-guggenmos avatar m-guggenmos commented on July 20, 2024

Hi Haley, thanks for your interest and your question. I haven't worked on this since the paper, but I remember being very unhappy about the way we circumvented the case of a negative covariance. Essentially, it is a heuristic that assumes the cross-validated denominator is unlikely to fall below 1/4 of the non-cross-validated denominator. The number 1/4 is arbitrary, which is unsatisfying.
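For concreteness, the heuristic might be sketched as follows (this is a minimal illustration, not the paper's actual code; the function name, fold structure, and use of a single numerator term are assumptions):

```python
import numpy as np

def cv_pearson_with_floor(a_train, a_test, b_train, b_test):
    """Cross-validated Pearson r whose denominator terms are floored at
    1/4 of their non-cross-validated counterparts (the heuristic above)."""
    # cross-validated numerator: covariance between conditions across folds
    num = np.cov(a_train, b_test)[0, 1]
    # cross-validated scaling factors: covariance of each pattern with
    # itself across the train/test split (can go negative in noisy data)
    var_a_cv = np.cov(a_train, a_test)[0, 1]
    var_b_cv = np.cov(b_train, b_test)[0, 1]
    # floor each term at 1/4 of the ordinary (non-cross-validated) variance
    var_a = max(var_a_cv, 0.25 * np.var(a_train, ddof=1))
    var_b = max(var_b_cv, 0.25 * np.var(b_train, ddof=1))
    return num / np.sqrt(var_a * var_b)
```

Because the floor keeps both scaling factors strictly positive, the square root in the denominator is always defined, at the cost of the arbitrary 1/4 cutoff.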

I wonder whether an easy solution would be to compute the coefficient of determination (r²) instead of the Pearson correlation. This would avoid taking the square root in the denominator, while the maximum still would be 1. What do you think?

from megmvpa.

haleyk avatar haleyk commented on July 20, 2024

Thanks for the quick response! I've been thinking about the r² idea, and my concern is that if both patterns have a negative train/test covariance, the product could masquerade as a large positive denominator and land back in the range of values produced by positive train/test covariances, with no easy way to distinguish the two cases. Other published papers simply use the training-set variance as the denominator; your solution captures an important piece of information that the variance solution neglects, so I'm going to keep thinking about how to apply it. A slightly more informative option might be the product of the train and test variances for each pattern, rather than the train variance alone, but that still misses information about how well the train and test sets match each other.

I haven't totally thought this through, but I wonder whether one could compute an average 'coefficient of covariance' for a given train/test split, using the diagonal elements of the cross-validated covariance matrix, or simply the average covariance of each single pattern across the train/test split, and then weight each fold in the overall average RDM by this stability. This would probably depend on the type of cross-validation folds used, and it would still allow negative distances, so it might be difficult to interpret. But it could incorporate the overall train/test covariance without requiring it to appear in each individual pair of patterns as part of the correlation's denominator.
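One way to make the fold-weighting idea concrete (purely a sketch; the stability measure, the clipping of negative weights, and all names here are assumptions, not anything from the paper):

```python
import numpy as np

def weighted_rdm(fold_rdms, fold_stabilities):
    """Average per-fold RDMs, weighting each fold by a 'stability' score,
    e.g. the mean train/test covariance of each pattern with itself
    (the diagonal of the cross-validated covariance matrix)."""
    w = np.asarray(fold_stabilities, dtype=float)
    w = np.clip(w, 0.0, None)   # down-weight unstable folds; drop negative ones
    w = w / w.sum()
    # weighted sum over the fold axis of the stacked (n_folds, n, n) RDMs
    return np.tensordot(w, np.asarray(fold_rdms), axes=1)
```

A fold whose patterns barely replicate across the train/test split then contributes little to the final RDM, without its unstable covariance entering any individual denominator.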


m-guggenmos avatar m-guggenmos commented on July 20, 2024

You're absolutely right, the r² idea doesn't really make sense. I think one key point is that the pattern (co)variances in the denominator should not be negative, because by definition they should be scaling factors that normalize the covariance in the numerator. A negative scaling factor simply doesn't fulfill this rationale and leads to uninterpretable results.

By the product of the train and test variance, you mean that instead of $\mathrm{cov}(\mathbf{x_{[A]}}, \mathbf{x_{[B]}})$ in the denominator we would have $\sqrt{\mathrm{var}(\mathbf{x_{[A]}})\,\mathrm{var}(\mathbf{x_{[B]}})}$ (cf. formula 7 in the paper)? That might also be an idea.
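The difference between the two candidate scaling factors can be seen in a few lines (a sketch, assuming `x_train` and `x_test` are the train- and test-fold instances of the same pattern):

```python
import numpy as np

def scaling_factors(x_train, x_test):
    """Two candidate denominator terms for one pattern: the
    cross-validated covariance vs. the proposed product of
    train and test variances."""
    cv_cov = np.cov(x_train, x_test)[0, 1]            # can be negative
    var_product = np.sqrt(np.var(x_train, ddof=1) *
                          np.var(x_test, ddof=1))     # always >= 0
    return cv_cov, var_product
```

When train and test replicate each other well the two agree, but only the covariance version can go negative, which is exactly the problematic case.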

I was not sure whether I got your ideas in the second paragraph:

average 'coefficient of covariance' for a given train and test set split using the diagonal elements of the matrix

Average across what?

the average covariance of each single pattern across the train/test set

Would this not be identical to the variance of the pattern in the full dataset (assuming that train+test=full dataset)?

I remember that when working on the paper I was convinced that the scaling factors in the denominator had to be covariances between train and test for the measure to be truly cross-validated. But I'm not so sure anymore, and I don't remember why I thought that. Maybe one of the solutions (e.g., using the training variance), or your product solution, would still constitute a cross-validated measure? This could be tested with simulated data containing enough trials in training and test that negative scaling factors in the denominator do not occur: whatever fix one proposes for the empirical negative-covariance issue, it should not lead to systematically different results on such data.
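That sanity check could be sketched roughly like this (a hypothetical simulation; the signal model, noise level, fold sizes, and the symmetrized numerator are all assumptions):

```python
import numpy as np

def simulate_check(n_trials=200, n_channels=50, noise=0.5, seed=0):
    """Simulate two condition patterns with ample trials, split into
    train/test halves, and compare two denominator variants. With this
    much data the cross-validated covariances stay positive, so a valid
    fix should closely agree with the original formula here."""
    rng = np.random.default_rng(seed)
    signal_a = rng.normal(size=n_channels)
    signal_b = rng.normal(size=n_channels)

    def pattern(signal):
        # trial-averaged train/test patterns = signal + averaged noise
        trials = signal + noise * rng.normal(size=(n_trials, n_channels))
        return trials[: n_trials // 2].mean(0), trials[n_trials // 2:].mean(0)

    a_tr, a_te = pattern(signal_a)
    b_tr, b_te = pattern(signal_b)
    # symmetrized cross-validated numerator
    num = 0.5 * (np.cov(a_tr, b_te)[0, 1] + np.cov(a_te, b_tr)[0, 1])
    # variant 1: cross-validated covariance scaling factors
    denom_cv = np.sqrt(np.cov(a_tr, a_te)[0, 1] * np.cov(b_tr, b_te)[0, 1])
    # variant 2: product-of-variances scaling factors
    denom_prod = np.sqrt(
        np.sqrt(np.var(a_tr, ddof=1) * np.var(a_te, ddof=1)) *
        np.sqrt(np.var(b_tr, ddof=1) * np.var(b_te, ddof=1)))
    return num / denom_cv, num / denom_prod
```

Running this over many seeds and checking that the two variants do not diverge systematically would be one way to test whether a replacement denominator still behaves like the cross-validated original in the well-behaved regime.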

