Comments (2)
From the definition, it's actually going into the `if` condition at the end, which is there to avoid division by zero:
def matthews_corrcoef(y_true, y_pred, *, sample_weight=None):
    """Compute the Matthews correlation coefficient (MCC).

    The Matthews correlation coefficient is used in machine learning as a
    measure of the quality of binary and multiclass classifications. It takes
    into account true and false positives and negatives and is generally
    regarded as a balanced measure which can be used even if the classes are of
    very different sizes. The MCC is in essence a correlation coefficient value
    between -1 and +1. A coefficient of +1 represents a perfect prediction, 0
    an average random prediction and -1 an inverse prediction. The statistic
    is also known as the phi coefficient. [source: Wikipedia]

    Binary and multiclass labels are supported. Only in the binary case does
    this relate to information about true and false positives and negatives.
    See references below.

    Read more in the :ref:`User Guide <matthews_corrcoef>`.

    Parameters
    ----------
    y_true : array, shape = [n_samples]
        Ground truth (correct) target values.

    y_pred : array, shape = [n_samples]
        Estimated targets as returned by a classifier.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

        .. versionadded:: 0.18

    Returns
    -------
    mcc : float
        The Matthews correlation coefficient (+1 represents a perfect
        prediction, 0 an average random prediction and -1 an inverse
        prediction).

    References
    ----------
    .. [1] :doi:`Baldi, Brunak, Chauvin, Andersen and Nielsen, (2000). Assessing the
       accuracy of prediction algorithms for classification: an overview.
       <10.1093/bioinformatics/16.5.412>`

    .. [2] `Wikipedia entry for the Matthews Correlation Coefficient
       <https://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_.

    .. [3] `Gorodkin, (2004). Comparing two K-category assignments by a
        K-category correlation coefficient
        <https://www.sciencedirect.com/science/article/pii/S1476927104000799>`_.

    .. [4] `Jurman, Riccadonna, Furlanello, (2012). A Comparison of MCC and CEN
        Error Measures in MultiClass Prediction
        <https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0041882>`_.

    Examples
    --------
    >>> from sklearn.metrics import matthews_corrcoef
    >>> y_true = [+1, +1, +1, -1]
    >>> y_pred = [+1, -1, +1, +1]
    >>> matthews_corrcoef(y_true, y_pred)
    -0.33...
    """
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    check_consistent_length(y_true, y_pred, sample_weight)
    if y_type not in {"binary", "multiclass"}:
        raise ValueError("%s is not supported" % y_type)

    lb = LabelEncoder()
    lb.fit(np.hstack([y_true, y_pred]))
    y_true = lb.transform(y_true)
    y_pred = lb.transform(y_pred)

    C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
    t_sum = C.sum(axis=1, dtype=np.float64)
    p_sum = C.sum(axis=0, dtype=np.float64)
    n_correct = np.trace(C, dtype=np.float64)
    n_samples = p_sum.sum()
    cov_ytyp = n_correct * n_samples - np.dot(t_sum, p_sum)
    cov_ypyp = n_samples**2 - np.dot(p_sum, p_sum)
    cov_ytyt = n_samples**2 - np.dot(t_sum, t_sum)

    if cov_ypyp * cov_ytyt == 0:
        return 0.0
    else:
        return cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
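
To see when that last branch is hit, here is a small pure-Python sketch that recomputes the three covariance terms from the excerpt above. The helper name `mcc_terms` and the sample labels are made up for illustration; this is not part of scikit-learn. With a classifier that always predicts the same class, `cov_ypyp` comes out as zero, so the product in the `if` is zero and the function would return 0.0 instead of dividing.

```python
def mcc_terms(y_true, y_pred):
    """Return (cov_ytyp, cov_ytyt, cov_ypyp), following the excerpt's formulas.

    Hypothetical helper for illustration only; it ignores sample weights.
    """
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)

    # Confusion matrix: rows are true labels, columns are predicted labels.
    C = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        C[index[t]][index[p]] += 1

    t_sum = [sum(row) for row in C]                              # per true class
    p_sum = [sum(C[i][j] for i in range(k)) for j in range(k)]   # per predicted class
    n_correct = sum(C[i][i] for i in range(k))
    n_samples = sum(p_sum)

    cov_ytyp = n_correct * n_samples - sum(t * p for t, p in zip(t_sum, p_sum))
    cov_ypyp = n_samples**2 - sum(p * p for p in p_sum)
    cov_ytyt = n_samples**2 - sum(t * t for t in t_sum)
    return cov_ytyp, cov_ytyt, cov_ypyp


# A constant prediction: every sample is predicted as class 1.
cov_ytyp, cov_ytyt, cov_ypyp = mcc_terms([1, 1, 1, 0], [1, 1, 1, 1])
print(cov_ytyp, cov_ytyt, cov_ypyp)   # 0 6 0
print(cov_ypyp * cov_ytyt == 0)       # True, so the zero-denominator branch is taken
```

The same happens when `y_true` contains a single class, since then `cov_ytyt` is zero.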
from scikit-learn.
Basically, the solution is to provide a zero_division parameter so that this corner case (a zero denominator) is handled explicitly instead of silently returning 0.0: #28509
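
A rough sketch of what such a parameter could look like, in pure Python. This is an assumption for illustration: the function name `mcc_sketch`, the default value, and the exact semantics of `zero_division` are hypothetical and may differ from what #28509 actually implements.

```python
from math import isnan, sqrt


def mcc_sketch(y_true, y_pred, zero_division=0.0):
    """Unweighted MCC with a caller-chosen value for the zero-denominator case.

    Hypothetical sketch, not the scikit-learn implementation.
    """
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)

    # Confusion matrix: rows are true labels, columns are predicted labels.
    C = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        C[index[t]][index[p]] += 1

    t_sum = [sum(row) for row in C]
    p_sum = [sum(C[i][j] for i in range(k)) for j in range(k)]
    n_correct = sum(C[i][i] for i in range(k))
    n_samples = sum(p_sum)

    cov_ytyp = n_correct * n_samples - sum(t * p for t, p in zip(t_sum, p_sum))
    cov_ypyp = n_samples**2 - sum(p * p for p in p_sum)
    cov_ytyt = n_samples**2 - sum(t * t for t in t_sum)

    denominator = cov_ytyt * cov_ypyp
    if denominator == 0:
        # Corner case: let the caller decide what "undefined" should mean.
        return zero_division
    return cov_ytyp / sqrt(denominator)


constant_pred = ([1, 1, 1, 0], [1, 1, 1, 1])
print(mcc_sketch(*constant_pred))                                     # 0.0
print(isnan(mcc_sketch(*constant_pred, zero_division=float("nan"))))  # True
```

Returning NaN here would let a caller tell an undefined score apart from a genuinely uncorrelated prediction, which a hard-coded 0.0 cannot do.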