ddof parameter in FData.cov (scikit-fda, closed)

pcuestas commented on September 26, 2024
ddof parameter in FData.cov

from scikit-fda.

Comments (7)

vnmabus commented on September 26, 2024

Note that this parameter should also be added to var.

pcuestas commented on September 26, 2024

I understand that ddof should also default to 1 in var. This will change the default behavior of var, which currently normalizes by N.
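As a rough illustration of what the change means numerically (plain NumPy, not the scikit-fda API): for curves discretized on a common grid, the pointwise variance divides the sum of squared deviations by N - ddof, so moving the default from ddof=0 to ddof=1 rescales every value by N/(N - 1). The array below is a made-up stand-in for an FDataGrid data matrix.

```python
import numpy as np

# Made-up stand-in for an FDataGrid data matrix:
# 4 sample curves evaluated on a common grid of 3 points.
data_matrix = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 2.0],
    [2.0, 2.0, 4.0],
    [3.0, 3.0, 4.0],
])
n_samples = data_matrix.shape[0]

# Pointwise variance across samples: sum of squared deviations / (N - ddof).
var_biased = np.var(data_matrix, axis=0, ddof=0)    # divides by N
var_unbiased = np.var(data_matrix, axis=0, ddof=1)  # divides by N - 1

# The two conventions differ only by the constant factor N / (N - 1).
print(var_biased)
print(var_unbiased)
print(np.allclose(var_unbiased, var_biased * n_samples / (n_samples - 1)))
```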

vnmabus commented on September 26, 2024

That is right. I do not think that changing it will cause problems, as it is not a widely used function.

pcuestas commented on September 26, 2024

Some tests are failing because of the change to var.

================================== short test summary info ===================================
FAILED skfda/ml/clustering/_kmeans.py::skfda.ml.clustering._kmeans.FuzzyCMeans
FAILED skfda/tests/test_neighbors.py::TestNeighbors::test_score_functional_response - AssertionError: 
FAILED skfda/tests/test_scoring.py::TestScoreFunctionsGrid::test_all - AssertionError: 
FAILED skfda/tests/test_scoring.py::TestScoreFunctionGridBasis::test_all - AssertionError: 0.9859361757845293 != 0.992968013919044 within 2 places (0.00703183813451...
FAILED skfda/tests/test_scoring.py::TestScoreZeroDenominator::test_zero_r2 - AssertionError: 

I am going to write ddof=0 in the calls to var that appear in the code:

This solves the issues with the tests.

vnmabus commented on September 26, 2024

Do not "just" write ddof=0. It would be best to analyze the intended denominator in each place.

pcuestas commented on September 26, 2024

Sorry, I thought those functions were already intentionally using $1/N$ as the variance normalization.

First case

A function _var is defined:

def _var(
    x: FDataGrid,
    weights: NDArrayFloat | None = None,
) -> FDataGrid:
    from ..exploratory.stats import mean, var
    if weights is None:
        return var(x)
    return mean(  # type: ignore[no-any-return]
        np.power(x - mean(x, weights=weights), 2),
        weights=weights,
    )
and it is only used in two places.

Here:

num = _var(y_true - y_pred, weights=sample_weight)
den = _var(y_true, weights=sample_weight)
# Divisions by zero allowed
with np.errstate(divide='ignore', invalid='ignore'):
    score = 1 - num / den
where the normalization constant is irrelevant, because it cancels out in the division on line 250.
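A quick NumPy check of the cancellation argument, using made-up discretized responses rather than FDataGrid objects (the helper r2_from_variances is hypothetical). Because the numerator and denominator are both divided by N - ddof, the pointwise score comes out the same for ddof=0 and ddof=1, as long as both calls use the same value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up discretized responses: 10 samples on a grid of 5 points.
y_true = rng.normal(size=(10, 5))
y_pred = y_true + 0.3 * rng.normal(size=(10, 5))


def r2_from_variances(y_true, y_pred, ddof):
    """Pointwise 1 - var(residual) / var(y_true), same ddof in both terms."""
    num = np.var(y_true - y_pred, axis=0, ddof=ddof)
    den = np.var(y_true, axis=0, ddof=ddof)
    return 1 - num / den


# The 1 / (N - ddof) factor appears in both numerator and denominator,
# so it cancels and the score does not depend on ddof.
print(np.allclose(r2_from_variances(y_true, y_pred, ddof=0),
                  r2_from_variances(y_true, y_pred, ddof=1)))
```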

And here:

ss_res = mean(
    np.power(y_true - y_pred, 2),
    weights=sample_weight,
)
ss_tot = _var(y_true, weights=sample_weight)
# Divisions by zero allowed
with np.errstate(divide='ignore', invalid='ignore'):
    score: FDataGrid = 1 - ss_res / ss_tot
where stats.var is called if and only if sample_weight is None. In that case ss_res is also normalized by N (through mean), so I think ddof should be 0 in this first case.
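A small NumPy sketch of why the two normalizations must agree here, with made-up data standing in for FDataGrid values. Since ss_res comes from mean, it is a sum of squares divided by N. Only when ss_tot also divides by N (ddof=0) does 1 - ss_res / ss_tot equal the usual ratio of raw sums of squares; with ddof=1 the residual term is scaled by (N - 1)/N.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 8

# Made-up discretized responses: 8 samples on a grid of 4 points.
y_true = rng.normal(size=(n_samples, 4))
y_pred = y_true + 0.5 * rng.normal(size=(n_samples, 4))

# Textbook pointwise R^2 from raw sums of squares (no normalization at all).
sum_res = np.sum((y_true - y_pred) ** 2, axis=0)
sum_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
r2_reference = 1 - sum_res / sum_tot

# As in the snippet above: ss_res is a plain (unweighted) mean, i.e. divided by N.
ss_res = np.mean((y_true - y_pred) ** 2, axis=0)

# ss_tot with each convention.
score_ddof0 = 1 - ss_res / np.var(y_true, axis=0, ddof=0)
score_ddof1 = 1 - ss_res / np.var(y_true, axis=0, ddof=1)

# Only the ddof=0 version reproduces the textbook value.
print(np.allclose(score_ddof0, r2_reference))
print(np.allclose(score_ddof1, r2_reference))
```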

Second case (variance = fdata.var())

Here the variance is used only to calculate the tolerance used by the K-Means algorithm to check for convergence:

def _tolerance(self, fdata: Input) -> float:
    variance = fdata.var()
    mean_variance = np.mean(variance[0].data_matrix)
    return float(mean_variance * self.tol)

I believe ddof=0 is acceptable here, but my only argument against ddof=1 is that the tests were written for the previous definition of the tolerance, which used the variance with ddof=0.
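To see how much the choice matters for the tolerance, here is a minimal NumPy mirror of _tolerance. It assumes the data matrix is a plain (n_samples, n_points) array and treats ddof as the parameter under discussion; the helper name tolerance and the sample data are hypothetical. Switching to ddof=1 only multiplies the threshold by N/(N - 1), although even that small shift may change the iteration at which convergence is declared, which would explain the test differences.

```python
import numpy as np


def tolerance(data_matrix, tol, ddof=0):
    """Hypothetical NumPy mirror of _tolerance: tol times the mean pointwise variance.

    data_matrix has shape (n_samples, n_points); ddof is the assumed new knob.
    """
    pointwise_variance = np.var(data_matrix, axis=0, ddof=ddof)
    return float(np.mean(pointwise_variance) * tol)


rng = np.random.default_rng(2)
curves = rng.normal(size=(20, 50))  # made-up sample of 20 discretized curves

tol_biased = tolerance(curves, tol=1e-4, ddof=0)
tol_unbiased = tolerance(curves, tol=1e-4, ddof=1)

# ddof=1 inflates the threshold by N / (N - 1), i.e. 20 / 19 here.
print(tol_unbiased / tol_biased)
```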

vnmabus commented on September 26, 2024

I think that your analysis is correct.
