
torcheval's Introduction

TorchEval


This library is currently in Alpha and does not yet have a stable release. The API may change and may not be backward compatible. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.

A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training and tools for PyTorch model evaluations.

Installing TorchEval

Requires Python >= 3.8 and PyTorch >= 1.11

From pip:

pip install torcheval

For the nightly build:

pip install --pre torcheval-nightly

From source:

git clone https://github.com/pytorch/torcheval
cd torcheval
pip install -r requirements.txt
python setup.py install
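
To confirm the install, here is a minimal smoke test (a sketch; any metric works, and the expected output assumes two of the three predicted labels match):

import torch
from torcheval.metrics.functional import multiclass_accuracy

# 2 of 3 predictions match the targets, so this should print tensor(0.6667)
print(multiclass_accuracy(torch.tensor([0, 1, 2]), torch.tensor([0, 1, 1])))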

Quick Start

Take a look at the quickstart notebook, or fork it on Colab.

There are more examples in the examples directory:

cd torcheval
python examples/simple_example.py

Documentation

Documentation can be found at pytorch.org/torcheval

Using TorchEval

TorchEval can be run on CPU, GPU, and in a multi-process or multi-GPU setting. Metrics are provided through two interfaces: functional and class-based. The functional interfaces can be found in torcheval.metrics.functional and are useful when your program runs in a single process. For multi-process or multi-GPU configurations, the class-based interfaces, found in torcheval.metrics, provide a much simpler experience. The class-based interfaces also allow you to defer some of the metric computation by calling update() multiple times before compute(), which can be advantageous even in a single-process setting because it avoids repeated computation overhead.

Single Process

For a single-process program, the simplest option is a functional metric: simply import the metric function and feed in your outputs and targets. The example below shows a minimal PyTorch training loop that evaluates the multiclass accuracy of every fourth batch of data.

Functional Version (immediate computation of metric)

import torch
from torcheval.metrics.functional import multiclass_accuracy

NUM_BATCHES = 16
BATCH_SIZE = 8
INPUT_SIZE = 10
NUM_CLASSES = 6
eval_frequency = 4

model = torch.nn.Sequential(torch.nn.Linear(INPUT_SIZE, NUM_CLASSES), torch.nn.ReLU())
optim = torch.optim.Adagrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()

metric_history = []
for batch in range(NUM_BATCHES):
    input = torch.rand(size=(BATCH_SIZE, INPUT_SIZE))
    target = torch.randint(size=(BATCH_SIZE,), high=NUM_CLASSES)
    outputs = model(input)

    loss = loss_fn(outputs, target)
    optim.zero_grad()
    loss.backward()
    optim.step()

    # metric only computed every 4 batches,
    # data from previous three batches is lost
    if (batch + 1) % eval_frequency == 0:
        metric_history.append(multiclass_accuracy(outputs, target))

Single Process with Deferred Computation

Class Version (enables deferred computation of metric)

import torch
from torcheval.metrics import MulticlassAccuracy

NUM_BATCHES = 16
BATCH_SIZE = 8
INPUT_SIZE = 10
NUM_CLASSES = 6
eval_frequency = 4

model = torch.nn.Sequential(torch.nn.Linear(INPUT_SIZE, NUM_CLASSES), torch.nn.ReLU())
optim = torch.optim.Adagrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()
metric = MulticlassAccuracy()

metric_history = []
for batch in range(NUM_BATCHES):
    input = torch.rand(size=(BATCH_SIZE, INPUT_SIZE))
    target = torch.randint(size=(BATCH_SIZE,), high=NUM_CLASSES)
    outputs = model(input)

    loss = loss_fn(outputs, target)
    optim.zero_grad()
    loss.backward()
    optim.step()

    # metric only computed every 4 batches,
    # data from previous three batches is included
    metric.update(outputs, target)
    if (batch + 1) % eval_frequency == 0:
        metric_history.append(metric.compute())
        # remove old data so that the next call
        # to compute is only based off next 4 batches
        metric.reset()

Multi-Process or Multi-GPU

For usage on multiple devices, a minimal example is given below. In the normal torch.distributed paradigm, each device is allocated its own process, and each process gets a unique numerical ID called a "global rank", counting up from 0.

Class Version (enables deferred computation and multi-processing)

import os
import torch
from torcheval.metrics.toolkit import sync_and_compute
from torcheval.metrics import MulticlassAccuracy

# Using torch.distributed
local_rank = int(os.environ["LOCAL_RANK"]) #rank on local machine, i.e. unique ID within a machine
global_rank = int(os.environ["RANK"]) #rank in global pool, i.e. unique ID within the entire process group
world_size  = int(os.environ["WORLD_SIZE"]) #total number of processes or "ranks" in the entire process group

device = torch.device(
    f"cuda:{local_rank}"
    if torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    else "cpu"
)

metric = MulticlassAccuracy(device=device)
num_epochs, num_batches = 4, 8

for epoch in range(num_epochs):
    for i in range(num_batches):
        input = torch.randint(high=5, size=(10,), device=device)
        target = torch.randint(high=5, size=(10,), device=device)

        # Add data to metric locally
        metric.update(input, target)

        # metric.compute() returns the metric value from
        # all seen data on the local process since the last reset()
        local_compute_result = metric.compute()

        # sync_and_compute(metric) syncs metric data across all ranks and computes the metric value
        global_compute_result = sync_and_compute(metric)
        if global_rank == 0:
            print(global_compute_result)

    # metric.reset() clears the data on each process so that subsequent
    # calls to compute() only act on new data
    metric.reset()
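
Note that the snippet above assumes the default process group has already been initialized and that LOCAL_RANK, RANK, and WORLD_SIZE are set in the environment, e.g. by torchrun. A minimal setup sketch (the backend choice is an assumption; adapt it to your hardware):

import torch
import torch.distributed as dist

# Launch with: torchrun --nproc_per_node=<num_gpus> your_script.py
# torchrun sets the LOCAL_RANK, RANK, and WORLD_SIZE variables read above.
dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")

# ... run the metric update/sync loop from the example above ...

dist.destroy_process_group()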

See the examples directory for more examples.

Contributing

We welcome PRs! See the CONTRIBUTING file.

License

TorchEval is BSD licensed, as found in the LICENSE file.

torcheval's Issues

RetrievalRecall, RetrievalPrecision require different, 1D input than MulticlassRecall, MulticlassPrecision which accept batch input

๐Ÿ› Describe the bug

The different behavior of RetrievalRecall and RetrievalPrecision makes it difficult to compute standard metrics such as Precision@k or Recall@k for multiclass classification problems.

Would it be possible to have them accept the same shape of input, e.g. inputs of shape (batch_size, num_classes) and targets of shape (batch_size, num_classes)?

Example code below:

To install: pip install --pre torcheval-nightly; using '0.0.7'.

import torch
from torch.nn import functional as F
from torcheval.metrics import RetrievalRecall


batch_size = 10
num_classes = 20
# generate random predictions
preds = torch.rand(batch_size, num_classes)
# generate random targets
targets = torch.randint(0, num_classes, (batch_size,))

recall = RetrievalRecall(num_queries=batch_size, k=5)

# first make the targets one hot (RetrievalRecall does not accept num_classes arguments, requires binary targets)
targets_one_hot = F.one_hot(targets.type(torch.long), num_classes)
targets_one_hot.shape

# indexes associate each prediction with a target
indexes = torch.arange(batch_size).repeat(num_classes, 1).T

recall.update(preds.ravel(), targets_one_hot.ravel(), indexes=indexes.ravel())

recall.compute().mean() # -> 0.1


from torcheval.metrics import MulticlassRecall, MulticlassPrecision

recall = MulticlassRecall(num_classes=num_classes)
precision = MulticlassPrecision(num_classes=num_classes)
recall.update(preds, targets)
precision.update(preds, targets)
recall.compute(), precision.compute() # -> 0.1, 0.1

Current workaround:

import torch
from torch.nn import functional as F
from torcheval.metrics import RetrievalRecall


class MulticlassRetrievalRecall(RetrievalRecall):
    def __init__(self, batch_size, num_classes, **kwargs):
        super().__init__(num_queries=batch_size, **kwargs)
        self.num_classes = num_classes
        
    def update(self, input, target):
        target_one_hot = F.one_hot(target.type(torch.long), self.num_classes)
        indexes = torch.arange(len(input)).repeat(self.num_classes, 1).T
        super().update(input.ravel(), target_one_hot.ravel(), indexes=indexes.ravel())

Usage:

recall_multi = MulticlassRetrievalRecall(batch_size, num_classes, k=5)
recall_multi.update(preds, targets)
recall_multi.compute().mean() # -> 0.1

Open to any tips on how best to do this! Thanks for this helpful canonical library :)

Versions

python collect_env.py

Collecting environment information...
PyTorch version: 2.1.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.6.2 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.22.2
Libc version: N/A

Python version: 3.11.6 (main, Nov  2 2023, 04:39:43) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.6.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] torch==2.1.1
[pip3] torchaudio==2.1.1
[pip3] torchdata==0.7.1
[pip3] torcheval==0.0.7
[pip3] torcheval-nightly==2023.12.21
[pip3] torchtext==0.16.1
[pip3] torchvision==0.16.1
[conda] numpy                     1.24.3          py310hb93e574_0  
[conda] numpy-base                1.24.3          py310haf87e8b_0  
[conda] torch                     2.0.1                    pypi_0    pypi

FLOPs and ModuleSummary Documentation

📚 The doc issue

I recently discovered this library and it looks very promising! As a newcomer, I have a couple of specific ideas about module documentation and hope they might be able to benefit the project in some way. Please excuse me if these suggestions are overly-specific: they just encapsulate a few small roadblocks that I encountered when I was first playing with the library.

Specifically, I am interested in the FLOPs and module evaluation advertised in the goals in Issue #82 . Here are my specific suggestions:

  1. Minor fix for the docstring of FlopTensorDispatchMode. As it currently stands, the example code won't run due to a small variable naming issue. I'm making a quick PR to this effect (hope that's okay). Since flops.py was linked in Issue #82, it was the first file I stepped into in the library, so I suspect others may have the same experience in the future.

  2. Update the docs with the recommended usage for obtaining the number of FLOPs and a module summary. At first I wasn't sure of the preferred way to get the number of FLOPs. The first thing I tried was the example code in FlopTensorDispatchMode, but it seems like the preferred way would be to use torcheval.tools.get_summary_table on a module. In this vein, I think we should direct users away from FlopTensorDispatchMode somehow.

Thanks for considering these suggestions; I hope to continue to be engaged in this!

Suggest a potential alternative/fix

  1. PR will be made shortly after this issue is posted.

  2. My feeling is that when users search 'FLOPs' in the docs or wander into flops.py, they should be directed towards torcheval.tools.get_summary_table. Add an example snippet to the ModuleSummary docs, possibly in torcheval.tools.get_summary_table, and possibly add guidance in the docstring of FlopTensorDispatchMode to direct users toward ModuleSummary (a rough usage sketch follows below).
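
For reference, here is a rough usage sketch of the module-summary route, based on the get_module_summary snippet that appears in a later issue on this page (the model and printed attributes are illustrative):

import torch
from torch import nn
from torcheval.tools.module_summary import get_module_summary

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU())
ms = get_module_summary(model, module_args=(torch.randn(2, 10),))
# forward/backward FLOPs and the parameter count are exposed on the summary object
print(ms.flops_forward, ms.flops_backward, ms.num_parameters)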

Multiple metrics sharing the same state

🚀 The feature

Metrics class that shares the state for multiple metrics.

Motivation, pitch

Usually we need to compute multiple metrics for a task, and it is very inefficient to store multiple copies of the inputs and targets for each metric.
It would be preferable to have a metric class that shares state across multiple metrics.

Alternatives

No response

Additional context

from functools import partial
from math import nan
from typing import Any, Callable, Iterable, List, Mapping, Optional, Union

import torch
from chanfig import FlatDict
from torch import Tensor
from torch import distributed as dist
from torcheval.metrics import Metric
from torcheval.metrics import functional as tef
from torchmetrics import functional as tmf


class flist(list):  # pylint: disable=R0903
    def __format__(self, *args, **kwargs):
        return " ".join([x.__format__(*args, **kwargs) for x in self])


class Metrics(Metric):
    r"""
    Metric class that wraps around multiple metrics.

    Typically, there are many metrics that we want to compute.
    Computing them one by one is inefficient, especially when evaluating in distributed environment.
    This class wraps around multiple metrics and computes them at the same time.

    Attributes:
        metrics: A dictionary of metrics to be computed.
        input: The input tensor of latest batch.
        target: The target tensor of latest batch.
        inputs: All input tensors.
        targets: All target tensors.

    Args:
        *args: A single mapping of metrics.
        **metrics: Metrics.

    """

    metrics: FlatDict[str, Callable]
    _input: Tensor
    _target: Tensor
    _inputs: List[Tensor]
    _targets: List[Tensor]
    index: str
    best_fn: Callable

    def __init__(self, *args, **metrics: FlatDict[str, Callable]):
        self.metrics = FlatDict()
        super().__init__()
        self._add_state("_input", torch.empty(0))  # pylint: disable=E1101
        self._add_state("_target", torch.empty(0))  # pylint: disable=E1101
        self._add_state("_inputs", [])
        self._add_state("_targets", [])
        if len(args) == 1 and isinstance(args[0], Mapping):
            self.metrics.merge(args[0])
        elif len(args) != 0:
            raise ValueError("Metrics only accepts a single mapping as positional argument")
        self.metrics.merge(metrics)

    @torch.inference_mode()
    def update(self, input: Any, target: Any) -> None:  # pylint: disable=W0622
        if not isinstance(input, torch.Tensor):
            input = torch.tensor(input)  # pylint: disable=E1101
        if not isinstance(target, torch.Tensor):
            target = torch.tensor(target)  # pylint: disable=E1101
        input, target = input.to(self.device), target.to(self.device)
        self._input, self._target = input, target
        self._inputs.append(input)
        self._targets.append(target)

    @property
    def val(self) -> FlatDict[str, float]:
        return self.compute()

    @property
    def avg(self) -> FlatDict[str, float]:
        return self.average()

    def compute(self) -> FlatDict[str, float]:
        ret = FlatDict()
        for name, metric in self.metrics.items():
            ret[name] = self.calculate(metric, self.input, self.target)
        return ret

    def average(self) -> FlatDict[str, float]:
        ret = FlatDict()
        for name, metric in self.metrics.items():
            ret[name] = self.calculate(metric, self.inputs, self.targets)
        return ret

    @staticmethod
    @torch.inference_mode()
    def calculate(func, input: Tensor, target: Tensor) -> Union[flist, float]:  # pylint: disable=W0622
        if input.numel() == 0 == target.numel():
            return nan
        score = func(input, target)
        return score.item() if score.numel() == 1 else flist(score.tolist())

    @torch.inference_mode()
    def merge_state(self, metrics: Iterable):
        raise NotImplementedError()

    @property
    @torch.inference_mode()
    def input(self):
        if not dist.is_initialized() or dist.get_world_size() == 1:
            return self._input
        synced_input = [None for _ in range(dist.get_world_size())]
        dist.all_gather_object(synced_input, self._input)
        return torch.cat([t.to(self.device) for t in synced_input], 0)  # pylint: disable=E1101

    @property
    @torch.inference_mode()
    def target(self):
        if not dist.is_initialized() or dist.get_world_size() == 1:
            return self._target
        synced_target = [None for _ in range(dist.get_world_size())]
        dist.all_gather_object(synced_target, self._target)
        return torch.cat([t.to(self.device) for t in synced_target], 0)  # pylint: disable=E1101

    @property
    @torch.inference_mode()
    def inputs(self):
        if not self._inputs:
            return torch.empty(0)  # pylint: disable=E1101
        if not dist.is_initialized() or dist.get_world_size() == 1:
            return torch.cat(self._inputs, 0)  # pylint: disable=E1101
        synced_inputs = [None for _ in range(dist.get_world_size())]
        dist.all_gather_object(synced_inputs, self._inputs)
        return torch.cat([t.to(self.device) for i in synced_inputs for t in i], 0)  # pylint: disable=E1101

    @property
    @torch.inference_mode()
    def targets(self):
        if not self._targets:
            return torch.empty(0)  # pylint: disable=E1101
        if not dist.is_initialized() or dist.get_world_size() == 1:
            return torch.cat(self._targets, 0)  # pylint: disable=E1101
        synced_targets = [None for _ in range(dist.get_world_size())]
        dist.all_gather_object(synced_targets, self._targets)
        return torch.cat([t.to(self.device) for i in synced_targets for t in i], 0)  # pylint: disable=E1101

    def __repr__(self):
        keys = tuple(i for i in self.metrics.keys())
        return f"{self.__class__.__name__}{keys}"

    def __format__(self, format_spec):
        val, avg = self.compute(), self.average()
        return "\n".join(
            [f"{key}: {val[key].__format__(format_spec)} ({avg[key].__format__(format_spec)})" for key in self.metrics]
        )


class IndexMetrics(Metrics):
    r"""
    IndexMetrics is a subclass of Metrics that supports scoring.

    Score is a single value that best represents the performance of the model.
    It is the core metrics that we use to compare different models.
    For example, in classification, we usually use auroc as the score.

    IndexMetrics requires two additional arguments: `index` and `best_fn`.
    `index` is the name of the metric that we use to compute the score.
    `best_fn` is a function that takes a list of values and returns the best value.
    `best_fn` is not used by IndexMetrics itself; it is meant to be accessed by other classes.

    Attributes:
        index: The name of the metric that we use to compute the score.
        best_fn: A function that takes a list of values and returns the best value.

    Args:
        *args: A single mapping of metrics.
        index: The name of the metric that we use to compute the score. Defaults to the first metric.
        best_fn: A function that takes a list of values and returns the best value. Defaults to `max`.
        **metrics: Metrics.
    """

    index: str
    best_fn: Callable

    def __init__(
        self, *args, index: Optional[str] = None, best_fn: Optional[Callable] = max, **metrics: FlatDict[str, Callable]
    ):
        super().__init__(*args, **metrics)
        self.index = index or next(iter(self.metrics.keys()))
        self.metric = self.metrics[self.index]
        self.best_fn = best_fn or max

    def score(self, scope: str) -> Union[float, flist]:
        if scope == "batch":
            return self.batch_score()
        if scope == "average":
            return self.average_score()
        raise ValueError(f"Unknown scope: {scope}")

    def batch_score(self) -> Union[float, flist]:
        return self.calculate(self.metric, self.input, self.target)

    def average_score(self) -> Union[float, flist]:
        return self.calculate(self.metric, self.inputs, self.targets)


def binary_metrics():
    return Metrics(auroc=tef.binary_auroc, auprc=tef.binary_auprc, acc=tef.binary_accuracy)


def multiclass_metrics(num_classes: int):
    auroc = partial(tef.multiclass_auroc, num_classes=num_classes)
    auprc = partial(tef.multiclass_auprc, num_classes=num_classes)
    acc = partial(tef.multiclass_accuracy, num_classes=num_classes)
    return Metrics(auroc=auroc, auprc=auprc, acc=acc)


def multilabel_metrics(num_labels: int):
    auroc = partial(tmf.classification.multilabel_auroc, num_labels=num_labels)
    auprc = partial(tef.multilabel_auprc, num_labels=num_labels)
    return Metrics(auroc=auroc, auprc=auprc, acc=tef.multilabel_accuracy)


def regression_metrics():
    return Metrics(
        pearson=tmf.pearson_corrcoef,
        spearman=tmf.spearman_corrcoef,
        r2=tef.r2_score,
        mse=tef.mean_squared_error,
    )

The FID result cannot be aligned with pytorch-fid/torch-fidelity

๐Ÿ› Describe the bug

Just use FrechetInceptionDistance on any images.

import torch
from torcheval.metrics import FrechetInceptionDistance
imgs_1 = ...
imgs_2 = ...
fid = FrechetInceptionDistance(device=device)
fid.update(imgs_1 , True)
fid.update(imgs_2 , False)
print(fid.compute())

I have found the causes.

  1. The model and weights of InceptionV3 are not strictly the same as the TensorFlow version, which is used by most papers. For now I replace it with the InceptionV3 model from pytorch-fid.
  2. The insufficient precision of torcheval.metrics.image.fid.FrechetInceptionDistance._calculate_frechet_distance leads to wrong results. I fix this by setting all the variables (fake_sum, real_cov_sum, ...) to float64.

With these modifications, the FID results are aligned. I suggest the maintainers reimplement the FID to align it with other libraries, otherwise researchers won't use torcheval to calculate FID.
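
For context, here is a minimal float64 sketch of the Fréchet distance between two Gaussians (the standard formula, not TorchEval's internal implementation; function and argument names are mine):

import torch

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID = ||mu1 - mu2||^2 + tr(S1 + S2 - 2 * sqrtm(S1 @ S2)), computed in float64
    mu1, mu2 = mu1.double(), mu2.double()
    sigma1, sigma2 = sigma1.double(), sigma2.double()
    diff = mu1 - mu2
    # tr(sqrtm(S1 @ S2)) equals the sum of square roots of the eigenvalues of S1 @ S2
    eigvals = torch.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = eigvals.sqrt().real.sum()
    return diff.dot(diff) + torch.trace(sigma1) + torch.trace(sigma2) - 2 * tr_covmean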

Versions

PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.4
[pip3] pytorch-fid==0.3.0
[pip3] pytorch-lightning==1.4.2
[pip3] torch==2.0.1
[pip3] torch-ema==0.3
[pip3] torch-fidelity==0.3.0
[pip3] torchaudio==2.0.2
[pip3] torcheval==0.0.7
[pip3] torchmetrics==0.5.0
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.24.4 pypi_0 pypi
[conda] pytorch-fid 0.3.0 pypi_0 pypi
[conda] pytorch-lightning 1.4.2 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 2.0.1 pypi_0 pypi
[conda] torch-ema 0.3 pypi_0 pypi
[conda] torch-fidelity 0.3.0 pypi_0 pypi
[conda] torchaudio 2.0.2 pypi_0 pypi
[conda] torcheval 0.0.7 pypi_0 pypi
[conda] torchmetrics 0.5.0 pypi_0 pypi
[conda] torchvision 0.15.2 pypi_0 pypi
[conda] triton 2.0.0 pypi_0 pypi

Multi Process Error

๐Ÿ› Describe the bug

When I run "distributed_example.py", I find that if the number of processes exceeds 3, an error occurs: "gathered_metric_list" always remains None. I have 7 GPUs. The same error occurs in "test_toolkit.py" under the test folder. What might be the cause of this problem?

Versions

PyTorch version: 1.12.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.0 (default, Nov 6 2019, 21:49:08) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.4.0-142-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX TITAN X
GPU 3: GeForce GTX TITAN X
GPU 4: GeForce GTX TITAN X
GPU 5: GeForce GTX TITAN X
GPU 6: GeForce GTX TITAN X

Nvidia driver version: 460.56
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.12.0
[pip3] torcheval==0.0.5
[pip3] torcheval-nightly==2022.11.19
[pip3] torchmetrics==0.10.3
[pip3] torchtnt==0.0.3
[conda] blas 1.0 mkl http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl 2020.2 256 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy 1.23.4
[conda] pytorch 1.13.0 py3.8_cpu_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] pytorch-mutex 1.0 cpu http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torch 1.12.0
[conda] torcheval 0.0.5
[conda] torcheval-nightly 2022.11.19
[conda] torchmetrics 0.10.3
[conda] torchtnt 0.0.3

Add COCO mAP

🚀 The feature

COCO mAP

COCO mAP (mean average precision) is a widely used evaluation metric for object detection models, especially for the COCO dataset. Unlike the PASCAL VOC evaluation, which has a single IoU (Intersection over Union) threshold for assessing the detection model, the COCO mAP evaluator averages the mAP of 80 classes over 10 IoU thresholds from 0.5 to 0.95 with a step size of 0.05 (AP@[0.5:0.05:0.95]). This is to avoid the bias that a single threshold may induce in the evaluation metric and to provide a more complete analysis of the detection model.
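
For concreteness, the threshold grid and the final averaging look roughly like this (the AP tensor below is hypothetical; computing per-class AP from matched detections is the real work):

import torch

# The 10 IoU thresholds of AP@[0.5:0.05:0.95]
iou_thresholds = torch.linspace(0.5, 0.95, steps=10)

# Hypothetical per-class, per-threshold AP values with shape (80, 10);
# COCO mAP is the mean over both classes and IoU thresholds.
ap = torch.rand(80, 10)
coco_map = ap.mean()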

Motivation, pitch

COCO mAP has an official API, but it lacks maintenance and is outdated.

Alternatives

No response

Additional context

No response

Can't import under torch 1.11

๐Ÿ› Describe the bug

Import fails with torch 1.11

import torcheval
/github/home/venv/lib/python3.8/site-packages/fairseq2/callbacks/metrics.py:7: in <module>
    import torcheval
/github/home/venv/lib/python3.8/site-packages/torcheval/__init__.py:9: in <module>
    from . import metrics, tools
/github/home/venv/lib/python3.8/site-packages/torcheval/tools/__init__.py:7: in <module>
    from torcheval.tools.module_summary import (
/github/home/venv/lib/python3.8/site-packages/torcheval/tools/module_summary.py:17: in <module>
    from torcheval.tools.flops import (
/github/home/venv/lib/python3.8/site-packages/torcheval/tools/flops.py:152: in <module>
    aten.mm.default: _matmul_flop_jit,
E   AttributeError: 'builtin_function_or_method' object has no attribute 'default'
=============================== warnings summary ===============================
../../../github/home/venv/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py:4
../../../github/home/venv/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py:4
  /github/home/venv/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    if not hasattr(tensorboard, '__version__') or LooseVersion(tensorboard.__version__) < LooseVersion('1.15'):

../../../github/home/venv/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:326
  /github/home/venv/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:326: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
    np.bool8: (False, True),

AFAICT the issue is that aten.mm.default doesn't exist in this older PyTorch version:

https://github.com/pytorch/torcheval/blob/main/torcheval/tools/flops.py#L156-L162

Versions

can't run this in CI. You can see the CI details there https://github.com/fairinternal/fairseq2/actions/runs/3885387652/jobs/6629321543

Add Wasserstein Distance

🚀 The feature

Wasserstein Distance

We'd like to start building out statistical metrics. In this issue we cover the Wasserstein distance, also called the Earth Mover's Distance, which is a measure of the similarity between two distributions.

The Wasserstein distance between two distributions is intuitively the minimum weight of soil (times distance moved) that would need to be moved if the two distributions were represented by two piles of soil. It is not tractable in high dimensions, so we will restrict ourselves to 1 dimension for this issue.

How To

Make sure to take a look at SciPy's implementation. Also take a look at the quickstart guide, which explains how to implement metrics and has a basic implementation of the KS-test statistic, which is quite similar to the earth mover's distance. (Note: we must implement this function in pure PyTorch; the SciPy example in the quickstart is just for simplicity.)

Requirements

Implement the wasserstein_distance function:

def wasserstein_1d(x: torch.Tensor, y: torch.Tensor, x_weights: Optional[torch.Tensor] = None, y_weights: Optional[torch.Tensor] = None) -> torch.Tensor:

And the class

class Wasserstein1D(Metric[torch.Tensor]):
  def __init__(self, device: Optional[torch.device] = None)

Class-based implementations keep internal states which can be accumulated as training occurs with calls to update(). The internal states can then be used to calculate the full metric with compute(), which returns the result. In addition, the class interface must have a merge_state() function which explains how to aggregate the internal state variables if they are being updated independently in different processes. For WD, the implementation will probably be similar to AUC, where the internal states of the class based implementation are just a list of all the samples. The only arg the constructor needs is device.

The functional implementation just takes one set of samples from the first distribution (x) and one set of samples from the second distribution (y) and returns the earth mover's distance. To keep our implementations clean, we have a well-defined set of input checks, which can be seen below.

<metric>(input, target, *params)  # Returns the computed metric for the given predictions (input) and target values

# supporting functions

_<metric>_param_check(...)  # Checks that the parameters (like number of classes) are valid
_<metric>_update(input, target)  # Returns the intermediate variables used to calculate the metric; these should be the same as the state variables for the class-based version
_<metric>_update_input_check(...)  # Checks that the input and target are congruent with the metric definition and parameters (e.g. they are the right shape for the given number of classes)
_<metric>_compute(*state_vars)  # Computes the metric given the state variables

Examples:

Functional:

>>> from torcheval.metrics.functional import wasserstein_1d
>>> wasserstein_1d(torch.tensor([0,1,2]), torch.tensor([0,1,1]))
0.33333333333333337
>>> wasserstein_1d(torch.tensor([0,1,2]), torch.tensor([0,1,1]), torch.tensor([1,2,0]), torch.tensor([1,1,1]))
0.0
>>> wasserstein_1d(torch.tensor([0,1,2,2]), torch.tensor([0,1]))
0.75

Classy

>>> from torcheval.metrics import Wasserstein1D
>>> metric = Wasserstein1D()
>>> metric.update(torch.tensor([0,1,2,2]), torch.tensor([0,1]))
>>> metric.compute()
0.75
>>> metric = Wasserstein1D()
>>> metric.update(torch.tensor([0,1,2]), torch.tensor([0,1,1]), torch.tensor([1,2,0]), torch.tensor([1,1,1]))
>>> metric.compute()
0
>>> metric = Wasserstein1D()
>>> metric.update(torch.tensor([0,1,2]), torch.tensor([0,1,1]))
>>> metric.compute()
0.33333333333333337
>>> metric.update(torch.tensor([1,1,1]), torch.tensor([1,1,1]))
>>> metric.compute()
0.16666666666666663
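
For anyone picking this up, here is a rough pure-PyTorch sketch of the 1D case that mirrors SciPy's CDF-based computation (names and edge-case handling are assumptions, not a final design; it reproduces the unweighted and weighted examples above):

import torch
from typing import Optional

def wasserstein_1d(
    x: torch.Tensor,
    y: torch.Tensor,
    x_weights: Optional[torch.Tensor] = None,
    y_weights: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    x_sorted, x_idx = torch.sort(x)
    y_sorted, y_idx = torch.sort(y)
    # Pool all sample positions and measure the gaps between consecutive ones
    all_values = torch.sort(torch.cat([x, y])).values
    deltas = torch.diff(all_values)
    # Empirical CDFs of x and y evaluated on each gap
    x_cdf_idx = torch.searchsorted(x_sorted, all_values[:-1], right=True)
    y_cdf_idx = torch.searchsorted(y_sorted, all_values[:-1], right=True)
    if x_weights is None:
        x_cdf = x_cdf_idx / x.numel()
    else:
        cum = torch.cumsum(x_weights[x_idx].double(), dim=0)
        x_cdf = torch.cat([cum.new_zeros(1), cum])[x_cdf_idx] / x_weights.sum()
    if y_weights is None:
        y_cdf = y_cdf_idx / y.numel()
    else:
        cum = torch.cumsum(y_weights[y_idx].double(), dim=0)
        y_cdf = torch.cat([cum.new_zeros(1), cum])[y_cdf_idx] / y_weights.sum()
    # W1 is the integral of |F_x - F_y| over the real line
    return torch.sum(torch.abs(x_cdf - y_cdf) * deltas)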

Steps required

  • Create a new wasserstein_1d function in a new file fbcode/torcheval/metrics/functional/statistical/wasserstein.py
  • Create a new Wasserstein1D class in a new file fbcode/torcheval/metrics/statistical/wasserstein.py
  • Add the function and class to the init files for easy importing (functional: 1, 2) (class based: 3, 4)
  • Create new test cases to cover the new functional metric: torcheval/tests/metrics/functional/statistical/test_wasserstein.py
  • Create new test cases to cover the new class metric: torcheval/tests/metrics/statistical/test_wasserstein.py

Testing your changes

Take a look at the contributors guide to see how to run unit tests.

A good suite of tests should do the following:

  • A random data test like this one in binned auprc -- be sure to add the random data getter to the random data module
  • You should use SciPy's implementation with random inputs as a 1-to-1 comparison.
  • All input types and shapes should be exercised.
  • Every arg should be exercised; arg combinations that interact in any way should also be given their own specific test.
  • If your input needs special characteristics to test some cases/arg combinations (e.g. the input must be sorted), be sure to hard-code inputs and outputs.
  • The idea is to make sure every feature of the code you wrote is tested.
  • All error endpoints should be triggered and checked with assertRaisesRegex.

Use MetricClassTester to test the class based code. Make sure to utilize multiple updates across different machines by setting the number of elements in the update lists to be a multiple of num_processes (4 per proc is a good target normally)

Please feel free to ask any questions!

Disagreement for macro f1 with torchmetrics and sklearn

๐Ÿ› Describe the bug

Torcheval gives a different answer from sklearn and torchmetrics for macro MulticlassF1Score when there are classes in the target that are not included in the prediction:

Example:

from sklearn.metrics import f1_score
from torch import tensor
from torchmetrics.classification import MulticlassF1Score
from torcheval.metrics.functional import multiclass_f1_score

target = tensor([2, 1, 0, 4])
preds = tensor([2, 1, 0, 1])
print('Preds: ', preds)
print('Target: ', target)
n_classes = 5
metric = MulticlassF1Score(num_classes=n_classes, average='macro')
torchmetrics_f1 = metric(preds, target)
scikit_f1 = f1_score(target.tolist(), preds.tolist(), average='macro')
torcheval_f1 = multiclass_f1_score(preds, target, average='macro', num_classes=n_classes)
print(f"Num classes: {n_classes:d}, torchmetrics f1 = {torchmetrics_f1:8.6f}, sklearn f1: {scikit_f1:8.6f}, torcheval: {torcheval_f1:8.6f}")

Output:

Preds:  tensor([2, 1, 0, 1])
Target:  tensor([2, 1, 0, 4])
WARNING:root:Warning: Some classes do not exist in the target. F1 scores for these classes will be cast to zeros.
Num classes: 5, torchmetrics f1 = 0.666667, sklearn f1: 0.666667, torcheval: 0.777778

The expectation is that the torcheval result would match torchmetrics and sklearn, which are consistent with calculating this by hand (worked out below).
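
For reference, the hand computation (assuming, as sklearn does by default when labels are not specified, that the macro average runs over the labels present in target or preds):

# preds = [2, 1, 0, 1], target = [2, 1, 0, 4]
# class 0: tp=1, fp=0, fn=0 -> f1 = 1.0
# class 1: tp=1, fp=1, fn=0 -> f1 = 2/3
# class 2: tp=1, fp=0, fn=0 -> f1 = 1.0
# class 4: tp=0, fp=0, fn=1 -> f1 = 0.0
# (class 3 appears in neither preds nor target)
macro_f1 = (1.0 + 2 / 3 + 1.0 + 0.0) / 4  # = 0.666...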

Package versions:
torch : 2.1.0a0+gita014d1b
torchmetrics: 1.0.3
torcheval : 0.0.6
sklearn : 1.2.2

Versions

Collecting environment information...
PyTorch version: 2.1.0a0+gita014d1b
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 5.4.22801-aaa1e3d8

OS: SUSE Linux Enterprise Server 15 SP4 (x86_64)
GCC version: (SUSE Linux) 7.5.0
Clang version: 15.0.0 (324a8e7de6a18594c06a0ee5d8c0eda2109c6ac6)
CMake version: version 3.20.4
Libc version: glibc-2.31

Python version: 3.9.16 (main, Jan 11 2023, 16:05:54) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.14.21-150400.24.46_12.0.83-cray_shasta_c-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 5.4.22801
MIOpen runtime version: 2.19.0
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7763 64-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 1
Stepping: 1
BogoMIPS: 4890.91
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
Virtualization: AMD-V
L1d cache: 2 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 256 MiB (8 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-15,64-79
NUMA node1 CPU(s): 16-31,80-95
NUMA node2 CPU(s): 32-47,96-111
NUMA node3 CPU(s): 48-63,112-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.0
[pip3] torch==2.1.0a0+gita014d1b
[pip3] torcheval==0.0.6
[pip3] torchmetrics==1.0.3
[pip3] torchtnt==0.2.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py39h5eee18b_1
[conda] mkl_fft 1.3.6 py39h417a72b_1
[conda] mkl_random 1.2.2 py39h417a72b_1
[conda] numpy 1.22.4 pypi_0 pypi
[conda] torch 2.1.0a0+gita014d1b pypi_0 pypi
[conda] torcheval 0.0.6 pypi_0 pypi
[conda] torchmetrics 1.0.3 pypi_0 pypi
[conda] torchtnt 0.2.0 pypi_0 pypi

Throughput metric is not taking into account the number of processes

๐Ÿ› Describe the bug

I'm using torcheval.metrics.Throughput to compute a "tokens per second" number for my training loop.
To avoid synchronization overhead, I don't sync the metric every time I log it, but only once in a while.
In this example I'm using an 8-process job with 8 GPUs.

metrics["tps"] = torcheval.metrics.Throughput(device=device)
...

metrics["tps"].update(tgt_num_tokens, elapsed_time_sec=state.timer.interval_time_seconds)
... 

should_log = (step % freq == 0)
should_sync = step % self.sync_frequency == 0
if should_log:
    if should_sync:
        val = torcheval.metrics.toolkit.sync_and_compute(metrics["tps"]).item()
    else:
        val = metrics["tps"].compute().item()
    self.log_metric({"train/tps": val})


This creates big spikes in the log graph, because the throughput is not averaged over the number of workers, so when I sync I get 8x the TPS.

Is this the expected behavior? I would find it less surprising if sync returned the average throughput, or if non-sync returned an estimated global throughput.
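
A possible workaround in the meantime (a sketch, assuming the synced value is the sum of the per-rank throughputs, which is what the 8x spikes suggest):

import torch.distributed as dist
from torcheval.metrics.toolkit import sync_and_compute

# divide the synced (summed) throughput by the number of ranks to get a per-worker average
val = sync_and_compute(metrics["tps"]).item() / dist.get_world_size()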

Versions

[conda] torch                     2.0.0+cu117              pypi_0    pypi
[conda] torchaudio                2.0.1+cu117              pypi_0    pypi
[conda] torcheval                 0.0.5                    pypi_0    pypi
[conda] torchsnapshot             0.1.0                    pypi_0    pypi
[conda] torchsnapshot-nightly     2022.11.28               pypi_0    pypi
[conda] torchtnt                  0.0.7                    pypi_0    pypi
[conda] torchx                    0.5.0                    pypi_0    pypi
[conda] triton                    2.0.0                    pypi_0    pypi

BinaryAccuracy over 1 when labels are boolean.

๐Ÿ› Describe the bug

BinaryAccuracy does not behave the same when the labels (targets) are boolean as when they are ints.

import torch
from torcheval.metrics import BinaryAccuracy

ba = BinaryAccuracy()
outputs  = [torch.randn(32) for _ in range(10)]
labels = [torch.randint(0, 2, (32,)).bool() for _ in range(10)]
for out, lbl in zip(outputs, labels):
    ba.update(out, lbl)
ba.compute()
> tensor(15.2000)

compared to

ba = BinaryAccuracy()
outputs  = [torch.randn(32) for _ in range(10)]
labels = [torch.randint(0, 2, (32,)) for _ in range(10)]  # no bool casting...
for out, lbl in zip(outputs, labels):
    ba.update(out, lbl)
ba.compute()
> tensor(0.5344)
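
Until this is fixed, a simple workaround (a sketch) is to cast boolean targets to integers before updating:

ba = BinaryAccuracy()
for out, lbl in zip(outputs, labels):
    ba.update(out, lbl.long())  # cast bool targets to 0/1 ints
ba.compute()  # now back in [0, 1]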

Versions

PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N[/A](https://vscode-remote+ssh-002dremote-002bcape-002dv100.vscode-resource.vscode-cdn.net/A)

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 15:55:03)  [GCC 10.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-1030-gcp-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.99
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 515.86.01
cuDNN version: Could not collect
HIP runtime version: N[/A](https://vscode-remote+ssh-002dremote-002bcape-002dv100.vscode-resource.vscode-cdn.net/A)
MIOpen runtime version: N[/A](https://vscode-remote+ssh-002dremote-002bcape-002dv100.vscode-resource.vscode-cdn.net/A)
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] adan-pytorch==0.1.0
...
[conda] torcheval                 0.0.5                    pypi_0    pypi
[conda] torchmetrics              0.9.3                    pypi_0    pypi
[conda] torchtnt                  0.0.4                    pypi_0    pypi
[conda] torchvision               0.14.1               py39_cu117    pytorch

Disagreement for auroc with sklearn

๐Ÿ› Describe the bug

Torcheval gives a different answer from sklearn for roc auc

mse_roc_auc_trh=0.553, mse_roc_auc_sk=0.596

Here is my code repository: https://github.com/crazyn2/mini-ad
Please set download=True in datamodules/cifar10.py and run this command:

python main/cifar10/msd/aev1v3msdv1.py --seed 0 --pre_epochs 200 --progress_bar --visual --epochs 20 --normal_class 1 --log_path /home/zby/Workspaces/mini-ad --batch_size 100 --n_trials 2 --sampler random --monitor mse

Versions

PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.2.0-39-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3080 Ti
GPU 1: NVIDIA GeForce RTX 3080 Ti
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             24
On-line CPU(s) list:                0-23
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz
CPU family:                         6
Model:                              85
Thread(s) per core:                 2
Core(s) per socket:                 12
Socket(s):                          1
Stepping:                           7
CPU max MHz:                        4800.0000
CPU min MHz:                        1200.0000
BogoMIPS:                           6999.82
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization:                     VT-x
L1d cache:                          384 KiB (12 instances)
L1i cache:                          384 KiB (12 instances)
L2 cache:                           12 MiB (12 instances)
L3 cache:                           19.3 MiB (1 instance)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-23
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] kmeans-pytorch==0.3
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.2
[pip3] pytorch-lightning==2.0.9.post0
[pip3] torch==2.1.1
[pip3] torch-tb-profiler==0.4.3
[pip3] torchaudio==2.1.0
[pip3] torchdata==0.7.1
[pip3] torcheval==0.0.7
[pip3] torchmetrics==1.2.1
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.16.1
[pip3] torchvision==0.16.1
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl    defaults
[conda] kmeans-pytorch            0.3                      pypi_0    pypi
[conda] mkl                       2023.1.0         h213fc3f_46344    defaults
[conda] mkl-service               2.4.0           py311h5eee18b_1    defaults
[conda] mkl_fft                   1.3.8           py311h5eee18b_0    defaults
[conda] mkl_random                1.2.4           py311hdb19cb5_0    defaults
[conda] numpy                     1.26.2          py311h08b1b3b_0    defaults
[conda] numpy-base                1.26.2          py311hf175353_0    defaults
[conda] pytorch-cuda              12.1                 ha16c6d3_5    pytorch
[conda] pytorch-lightning         2.0.9.post0              pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     2.1.1                    pypi_0    pypi
[conda] torch-tb-profiler         0.4.3                    pypi_0    pypi
[conda] torchaudio                2.1.0               py311_cu121    pytorch
[conda] torchdata                 0.7.1                    pypi_0    pypi
[conda] torcheval                 0.0.7                    pypi_0    pypi
[conda] torchmetrics              1.2.1                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchtext                 0.16.1                   pypi_0    pypi
[conda] torchvision               0.16.1                   pypi_0    pypi
[conda] triton                    2.1.0                    pypi_0    pypi

Cannot find package

๐Ÿ› Describe the bug

from torchtnt.utils.version import is_torch_version_geq_1_12
I found that this package does not exist. What is the version of torchtnt?
Or is something else wrong?

Versions

from torchtnt.utils.version import is_torch_version_geq_1_12
I found that this package does not exist. What is the version of torchtnt?
Or is something else wrong?

Bug in MulticlassRecall example when adding one additional class

๐Ÿ› Describe the bug

The example from the docs leads to a bug when modified slightly: https://pytorch.org/torcheval/stable/generated/torcheval.metrics.MulticlassRecall.html#torcheval.metrics.MulticlassRecall

>>> metric = MulticlassRecall(num_classes=4)
>>> input = torch.tensor([[0.9, 0.1, 0, 0], [0.1, 0.2, 0.4, 0.3], [0, 1.0, 0, 0], [0, 0, 0.2, 0.8]])
>>> target = torch.tensor([0, 1, 2, 3])
>>> metric.update(input, target)
>>> metric.compute()
tensor(0.5000)

Adding an extra class and specifying a "macro" average leads to a bug:

metric = MulticlassRecall(num_classes=5, average="macro")
input = torch.tensor([[0.9, 0.1, 0, 0, 0], [0.1, 0.2, 0.4, 0.3, 0], [0, 1.0, 0, 0, 0], [0, 0, 0.2, 0.8, 0]])
target = torch.tensor([0, 1, 2, 3])
metric.update(input, target)
metric.compute()

Yields:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/me/projects/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/projects/.venv/lib/python3.11/site-packages/torcheval/metrics/classification/recall.py", line 243, in compute
    return _recall_compute(
           ^^^^^^^^^^^^^^^^
  File "/Users/me/projects/.venv/lib/python3.11/site-packages/torcheval/metrics/functional/classification/recall.py", line 195, in _recall_compute
    recall = num_tp / num_labels
             ~~~~~~~^~~~~~~~~~~~
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 0

Versions

python collect_env.py

Collecting environment information...
PyTorch version: 2.1.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.6.2 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.22.2
Libc version: N/A

Python version: 3.11.6 (main, Nov  2 2023, 04:39:43) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.6.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] torch==2.1.1
[pip3] torchaudio==2.1.1
[pip3] torchdata==0.7.1
[pip3] torcheval==0.0.7
[pip3] torcheval-nightly==2023.12.21
[pip3] torchtext==0.16.1
[pip3] torchvision==0.16.1
[conda] numpy                     1.24.3          py310hb93e574_0  
[conda] numpy-base                1.24.3          py310haf87e8b_0  
[conda] torch                     2.0.1                    pypi_0    pypi

RuntimeError: "bitwise_and_cpu" not implemented for 'Float' when using binary_precison & binary_recall

๐Ÿ› Describe the bug

I get RuntimeError: "bitwise_and_cpu" not implemented for 'Float' when using binary_precision & binary_recall:


import torch
from torcheval.metrics.functional import binary_accuracy, binary_precision, binary_recall, binary_f1_score

label = torch.FloatTensor([1., 1., 1., 1., 1., 1.])
pred = torch.FloatTensor([0.5539, 0.5593, 0.5662, 0.4550, 0.4690, 0.4465])

acc = binary_accuracy(input=pred, target=label)
f1 = binary_f1_score(input=pred, target=label)
prec = binary_precision(input=pred, target=label)
rec = binary_recall(input=pred, target=label)


Traceback (most recent call last):
  File "test.py", line 11, in <module>
    prec = binary_precision(input=pred, target=label)
  File "/home/kidpaul/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/kidpaul/.local/lib/python3.8/site-packages/torcheval/metrics/functional/classification/precision.py", line 50, in binary_precision
    num_tp, num_fp, num_label = _binary_precision_update(input, target, threshold)
  File "/home/kidpaul/.local/lib/python3.8/site-packages/torcheval/metrics/functional/classification/precision.py", line 228, in _binary_precision_update
    num_tp = (input & target).sum()
RuntimeError: "bitwise_and_cpu" not implemented for 'Float'

Versions

--2023-04-22 20:23:49-- https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8003::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-04-22 20:23:49 ERROR 404: Not Found.


Python: 3.8.10
torch: 1.11.0+cu102
torcheval: 0.0.6

get_module_summary assertion fail

๐Ÿ› Describe the bug

I have a sub-module shared by different modules, like l in the snippet below. get_module_summary reports an AssertionError in this situation.

import torch
from torch import nn
from torcheval.tools.module_summary import get_module_summary

l = nn.Linear(10, 10)
s1 = nn.Sequential(l)
s2 = nn.Sequential(l)
s = nn.Sequential(s1, s2)

ms = get_module_summary(s, module_args=(torch.randn(10, 100, 10, dtype=torch.float32),))
flops_forward_eval, flops_back_eval = ms.flops_forward, ms.flops_backward
params_eval = ms.num_parameters

print(flops_forward_eval, flops_back_eval, params_eval)

The error:

Traceback (most recent call last):
  File "/mnt/home/x/projects/Y/test_x.py", line 10, in <module>
    ms = get_module_summary(s, module_args=(torch.randn(10, 100, 10, dtype=torch.float32),))
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torcheval/tools/module_summary.py", line 348, in get_module_summary
    module_summary_data = _get_module_flops_and_activation_sizes(
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torcheval/tools/module_summary.py", line 258, in _get_module_flops_and_activation_sizes
    res = module(*module_args, **module_kwargs)
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1215, in _call_impl
    hook_result = hook(self, input, result)
  File "/mnt/home/x/miniconda3/lib/python3.9/site-packages/torcheval/tools/flops.py", line 306, in f
    assert parents[-1] == name
AssertionError
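
A sketch of a possible workaround, assuming the assertion is triggered by the same sub-module appearing under two different parents: summarize a copy of the model in which the shared layer is duplicated. The duplicated model no longer shares weights, so this is only useful for flop/parameter estimates.

import copy

import torch
from torch import nn
from torcheval.tools.module_summary import get_module_summary

l = nn.Linear(10, 10)
s1 = nn.Sequential(l)
s2 = nn.Sequential(copy.deepcopy(l))  # break the sharing for the summary only
s = nn.Sequential(s1, s2)

ms = get_module_summary(s, module_args=(torch.randn(10, 100, 10, dtype=torch.float32),))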

Versions

Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: glibc-2.17

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB

Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7742 64-Core Processor
Stepping: 0
CPU MHz: 2250.000
CPU max MHz: 2250.0000
CPU min MHz: 1500.0000
BogoMIPS: 4491.63
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s): 16-31
NUMA node2 CPU(s): 32-47
NUMA node3 CPU(s): 48-63
NUMA node4 CPU(s): 64-79
NUMA node5 CPU(s): 80-95
NUMA node6 CPU(s): 96-111
NUMA node7 CPU(s): 112-127
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca

Versions of relevant libraries:
[pip3] flake8==3.7.9
[pip3] mypy==0.971
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==2.0.0
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.13.1
[pip3] torch-complex==0.4.3
[pip3] torch-optimizer==0.3.0
[pip3] torch-stoi==0.1.2
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.13.1
[pip3] torchdata==0.4.1
[pip3] torcheval==0.0.6
[pip3] torchinfo==1.7.2
[pip3] torchmetrics==0.11.4
[pip3] torchtnt==0.0.7
[pip3] torchvision==0.14.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.6.0 hecad31d_10 conda-forge
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.23.5 py39h14f4228_0
[conda] numpy-base 1.23.5 py39h31eccc5_0
[conda] pytorch 1.13.1 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-cuda 11.6 h867d48c_1 pytorch
[conda] pytorch-lightning 2.0.0 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch-complex 0.4.3 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torch-stoi 0.1.2 pypi_0 pypi
[conda] torch-tb-profiler 0.4.0 pypi_0 pypi
[conda] torchaudio 0.13.1 py39_cu116 pytorch
[conda] torchdata 0.4.1 pypi_0 pypi
[conda] torcheval 0.0.6 pypi_0 pypi
[conda] torchinfo 1.7.2 pypi_0 pypi
[conda] torchmetrics 0.11.4 pypi_0 pypi
[conda] torchtnt 0.0.7 pypi_0 pypi
[conda] torchvision 0.14.1 py39_cu116 pytorch

Mask applied twice for metrics with 'weighted' average option

🐛 Describe the bug

Hi, I wonder if I caught a small bug. When using the "weighted" average option for both multiclass f1 and recall, it appears the mask is applied twice: first when the mask itself is applied, and again when the weighted average is computed. I could be wrong; please let me know if this is the case. I can submit a PR, or feel free to do it.

This is coming up when the num_classes is larger than the number of unique values in both input and target.

Unweighted works fine:

import torch
from torcheval.metrics.functional import multiclass_f1_score
input = torch.tensor([0, 2, 1, 4])
target = torch.tensor([0, 1, 2, 4])
multiclass_f1_score(input, target, num_classes=5)

tensor(0.5000)

This is fine

input = torch.tensor([0, 2, 1, 4])
target = torch.tensor([0, 1, 2, 3])
multiclass_f1_score(input, target, num_classes=5, average='weighted')

tensor(0.2500)

This is not

input = torch.tensor([0, 2, 1, 4])
target = torch.tensor([0, 1, 2, 4])
multiclass_f1_score(input, target, num_classes=5, average='weighted')

WARNING:root:Warning: Some classes do not exist in the target. F1 scores for these classes will be cast to zeros.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[15], line 3
      1 input = torch.tensor([0, 2, 1, 4])
      2 target = torch.tensor([0, 1, 2, 4])
----> 3 multiclass_f1_score(input, target, num_classes=5, average='weighted')

File /python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /python3.11/site-packages/torcheval/metrics/functional/classification/f1_score.py:115, in multiclass_f1_score(input, target, num_classes, average)
    111 _f1_score_param_check(num_classes, average)
    112 num_tp, num_label, num_prediction = _f1_score_update(
    113     input, target, num_classes, average
    114 )
--> 115 return _f1_score_compute(num_tp, num_label, num_prediction, average)

File /python3.11/site-packages/torcheval/metrics/functional/classification/f1_score.py:228, in _f1_score_compute(num_tp, num_label, num_prediction, average)
    226     return f1.mean()
    227 elif average == "weighted":
--> 228     return (f1 * (num_label[mask] / num_label.sum())).sum()
    229 else:  # average is None
    230     return f1

IndexError: The shape of the mask [5] at index 0 does not match the shape of the indexed tensor [4] at index 0
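
A possible interim workaround (a sketch, assuming `average=None` is unaffected by the masking path): compute the per-class scores and apply the class weights manually.

import torch
from torcheval.metrics.functional import multiclass_f1_score

input = torch.tensor([0, 2, 1, 4])
target = torch.tensor([0, 1, 2, 4])
num_classes = 5

# Per-class F1, then a manual weighted average by target class frequency.
f1_per_class = multiclass_f1_score(input, target, num_classes=num_classes, average=None)
class_counts = torch.bincount(target, minlength=num_classes).float()
weighted_f1 = (f1_per_class * class_counts / class_counts.sum()).sum()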

Versions

NA

Updating `Mean` with 0 leads to 'No calls to update() have been made...' warning

🐛 Describe the bug

When updating a Mean metric with a zero, calling compute() generates a warning that no calls to update() have been made.

from torcheval import metrics
import torch


mean = metrics.Mean()
mean.update(torch.tensor(0.0))

# Generates the warning 'No calls to update() have been made.'
mean.compute()

While the result is correct, the warning is not. It suggests that the code that should be updating the metric is not in fact updating it. Yet the metric is updated as it should be, so it should not complain.
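
A hypothetical sketch (not torcheval's actual implementation) of the behaviour argued for here: key the warning off an explicit update counter instead of the accumulated value, so a legitimate zero-valued update does not trigger it.

import logging

import torch


class MeanSketch:
    """Toy running mean that only warns when update() was truly never called."""

    def __init__(self) -> None:
        self.weighted_sum = torch.tensor(0.0)
        self.total_weight = torch.tensor(0.0)
        self.num_updates = 0

    def update(self, value: torch.Tensor, weight: float = 1.0) -> "MeanSketch":
        self.weighted_sum = self.weighted_sum + value * weight
        self.total_weight = self.total_weight + weight
        self.num_updates += 1
        return self

    def compute(self) -> torch.Tensor:
        if self.num_updates == 0:  # counter check, not a check on the accumulated value
            logging.warning("No calls to update() have been made.")
            return torch.tensor(float("nan"))
        return self.weighted_sum / self.total_weight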

Versions

The bug is present in the latest commit to main: a975ef6.

Collecting environment information...
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Manjaro Linux (x86_64)
GCC version: (GCC) 13.2.1 20230801
Clang version: 16.0.6
CMake version: version 3.27.5
Libc version: glibc-2.38

Python version: 3.11.5 (main, Sep  2 2023, 14:16:33) [GCC 13.2.1 20230801] (64-bit runtime)
Python platform: Linux-6.4.16-1-MANJARO-x86_64-with-glibc2.38
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060 6GB
Nvidia driver version: 535.104.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      43 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             12
On-line CPU(s) list:                0-11
Vendor ID:                          AuthenticAMD
Model name:                         AMD Ryzen 5 2600X Six-Core Processor
CPU family:                         23
Model:                              8
Thread(s) per core:                 2
Core(s) per socket:                 6
Socket(s):                          1
Stepping:                           2
BogoMIPS:                           7201.75
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization:                     AMD-V
L1d cache:                          192 KiB (6 instances)
L1i cache:                          384 KiB (6 instances)
L2 cache:                           3 MiB (6 instances)
L3 cache:                           16 MiB (2 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] torch==2.0.1
[pip3] torcheval==0.0.7
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[conda] Could not collect

More precise definition of perplexity when ignore index is not None

When we compute perplexity, we usually aggregate all token-level NLLs and divide by the number of tokens.

This is simple and straightforward to implement only when all tokens are included in the computation.

If we allow an ignore index, I think the implementation produces values that are not the perplexity.

Let's consider input ids containing two sequences of different lengths.

16,22,14,-100,-100
88,74,69,87,77

And the token-level loss (I just masked out ignored tokens)

0.1,0.2,0.1,0.0,0.0
0.3,0.5,0.1,0.4,0.2

Then, since perplexity is a sequence-level concept, the perplexity for each sequence might be the following.

Exp(0.4/3)=a
Exp(1.5/5)=b

And then, reduce by mean.

(a+b)/2

However, to the best of my knowledge, your implementation would give us the following.

Exp((0.4+1.5)/8)

If you consider the latter to be the definition of perplexity when an ignore index is involved, then this doesn't matter.

However, I think the former is a much more precise implementation of perplexity.

What do you think?
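
To make the two definitions above concrete, here is a small sketch using the token losses from the example (an illustration of the proposal, not torcheval code):

import torch

token_loss = torch.tensor([[0.1, 0.2, 0.1, 0.0, 0.0],
                           [0.3, 0.5, 0.1, 0.4, 0.2]])
mask = torch.tensor([[1., 1., 1., 0., 0.],
                     [1., 1., 1., 1., 1.]])

# Sequence-level definition: exp of each sequence's mean NLL, then the mean.
per_sequence_ppl = torch.exp(token_loss.sum(dim=1) / mask.sum(dim=1))
proposed = per_sequence_ppl.mean()                 # (exp(0.4/3) + exp(1.5/5)) / 2

# Pooled definition: a single exp over all non-ignored tokens.
pooled = torch.exp(token_loss.sum() / mask.sum())  # exp((0.4 + 1.5) / 8)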

Error in masking in the function multiclass_recall

🐛 Describe the bug

I think I found a bug around this part of the code in the functional multiclass_recall. When one of the classes is missing from both the prediction and the label, only num_tp is masked and not num_labels, which causes a shape mismatch between num_tp and num_labels. For example,

import torch
from torcheval.metrics.functional import multiclass_recall

pred = torch.tensor([0,1,2,5])
label = torch.tensor([0,2,1,3])

num_class = 6

multiclass_recall(pred, label, num_classes=num_class, average="macro")

raises the following error:

Traceback (most recent call last):
  File "playground.py", line 1205, in <module>
    multiclass_recall(input1, label1, num_classes=num_class, average="macro")
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torcheval/metrics/functional/classification/recall.py", line 151, in multiclass_recall
    return _recall_compute(num_tp, num_labels, num_predictions, average)
  File "/opt/conda/lib/python3.7/site-packages/torcheval/metrics/functional/classification/recall.py", line 193, in _recall_compute
    recall = num_tp / num_labels
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 0
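
A possible interim workaround (a sketch, assuming `average=None` does not go through the same masking path): compute per-class recall and average only over the classes that appear in the target.

import torch
from torcheval.metrics.functional import multiclass_recall

pred = torch.tensor([0, 1, 2, 5])
label = torch.tensor([0, 2, 1, 3])
num_class = 6

per_class_recall = multiclass_recall(pred, label, num_classes=num_class, average=None)
present = torch.bincount(label, minlength=num_class) > 0  # classes that occur in the target
macro_recall = per_class_recall[present].mean()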

Versions

Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.17

Python version: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.19.91-26.al7.x86_64-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 470.103.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping: 4
CPU MHz: 2499.998
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat

Versions of relevant libraries:
[pip3] mypy==1.2.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.21.5
[pip3] torch==1.12.1
[pip3] torcheval==0.0.6
[pip3] torchtext==0.13.1
[pip3] torchtnt==0.0.7
[pip3] torchvision==0.13.1
[pip3] triton==2.0.0.post1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.5 py37he7a7128_2
[conda] numpy-base 1.21.5 py37hf524024_2
[conda] pytorch 1.12.1 py3.7_cuda11.3_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torcheval 0.0.6 pypi_0 pypi
[conda] torchtext 0.13.1 py37 pytorch
[conda] torchtnt 0.0.7 pypi_0 pypi
[conda] torchvision 0.13.1 py37_cu113 pytorch
[conda] triton 2.0.0.post1 pypi_0 pypi

The score computed by `multiclass_f1_score` for binary classification is wrong. It is not f1 score but accuracy.

🐛 Describe the bug

The score computed by multiclass_f1_score for binary classification is wrong. It is not the F1 score but accuracy, as shown in the following code:

import torch
from torcheval.metrics.functional import multiclass_f1_score, binary_f1_score

actual = torch.repeat_interleave(torch.tensor([1, 0]), repeats=torch.tensor([100, 100]))
pred = torch.repeat_interleave(torch.tensor([1, 0, 1, 0]), repeats=torch.tensor([55, 45, 34, 66]))

multiclass_f1_score(pred, actual, num_classes=2)
# tensor(0.6050)

(actual == pred).sum()/200
# tensor(0.6050)

binary_f1_score(pred, actual)
# tensor(0.5820)
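
A sketch that makes the difference visible, assuming the default average is "micro" (which pools all classes and, for single-label input, reduces to accuracy), while binary_f1_score scores only the positive class:

per_class = multiclass_f1_score(pred, actual, num_classes=2, average=None)
# per_class[1] is the F1 of the positive class and should match binary_f1_score(pred, actual)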

Versions

torcheval 0.0.7

[Proposal] Verify that Input Tensor is a probability in classification metrics

🚀 The feature

Add verification to the family of functions like _update_input_check in classification metrics that the input tensor contains probabilities, i.e., all elements satisfy 0 <= x <= 1.

Motivation, pitch

Hi torcheval community!
This is an enhancement suggestion I came up with while using torcheval for an image classification project.

Here is the binary_accuracy (classification) documentation. Judging from the default argument threshold = 0.5, torcheval assumes that the input is a probability. However, the documentation never states that the input must be a probability, and there is no validation in the _input_check functions either.

There is a problem when the model does not include a Sigmoid or Softmax layer. This often happens when people use torch loss classes like BCEWithLogitsLoss or CrossEntropyLoss, which apply sigmoid/softmax internally.

Here is code containing such a bug.

class Trainer:
    def __init__(self, model: nn.Module):
        self._loss = nn.BCEWithLogitsLoss()  # applies `sigmoid` internally
        self._model = model  # model does not end with a `sigmoid` layer
...
    def train(self, batch):
        img, correct_label = batch
        out = self._model(img)
        loss = self._loss(out, correct_label)

        # `out` is not a probability, but binary_accuracy runs without complaint
        acc = binary_accuracy(out, correct_label)

IMO, this incorrect code can be written quite naturally, because the scope of responsibility of the binary_accuracy function is unclear.
Clarifying the responsibility of the classification metric functions in torcheval would remove these bug cases.

Alternatives

Approach 1: Limit use cases by adding a cautionary note to the documentation

Add a note to the docs that "input is assumed to be a probability".

Pros: easy, no effect on existing code.
Cons: users remain responsible for ensuring the input is a probability.

Approach 2: Add validation code

Add a check that 0 <= input <= 1 to the _input_check family (see the sketch below).

Pros: the confusion described above can be avoided.
Cons: needs to be implemented and may affect existing code.
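
A minimal sketch of what Approach 2 could look like (the helper name is hypothetical, not an existing torcheval function):

import torch

def _check_is_probability(input: torch.Tensor) -> None:
    # Hypothetical check: reject raw logits so users apply sigmoid/softmax first.
    if torch.any(input < 0) or torch.any(input > 1):
        raise ValueError(
            "Expected `input` to contain probabilities in [0, 1]; "
            "apply sigmoid/softmax to logits before computing the metric."
        )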

If this proposal is deemed viable, I am willing to contribute it!

Thank you.

Additional context

No response

Add CTR metric

🚀 The feature

Objective
We want to implement a functional metric and a class metric to calculate CTR (click-through rate) based on the inputs:
CTR = sum(I(click) * weight) / sum(weight)

Examples:

input: torch.tensor([0, 1, 0, 1, 1, 0, 0, 1])
weight: 1.0 (default value)
ctr: torch.tensor(0.5)

input: torch.tensor([0, 1, 0, 1, 1, 0, 0, 1])
weight: torch.tensor([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
ctr: torch.tensor(0.58333)

For multi-tasks situation
input: torch.tensor([[0, 1, 0, 1], [1, 0, 0, 1]])
weight: 1.0 (default value)
ctr: torch.tensor([0.5, 0.5])
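
A sketch of what the functional form could look like (the name and signature are illustrative, not an existing torcheval API), following the formula and examples above:

import torch

def click_through_rate(input: torch.Tensor, weight=1.0) -> torch.Tensor:
    # CTR = sum(I(click) * weight) / sum(weight), reduced over the last dimension.
    input = input.float()
    weight = torch.broadcast_to(torch.as_tensor(weight, dtype=torch.float32), input.shape)
    return (input * weight).sum(dim=-1) / weight.sum(dim=-1)

click_through_rate(torch.tensor([0, 1, 0, 1, 1, 0, 0, 1]))                  # tensor(0.5000)
click_through_rate(torch.tensor([0, 1, 0, 1, 1, 0, 0, 1]),
                   torch.tensor([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]))  # tensor(0.5833)
click_through_rate(torch.tensor([[0, 1, 0, 1], [1, 0, 0, 1]]))              # tensor([0.5000, 0.5000])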

Motivation, pitch

Click-through rate is important for evaluating model performance and is a commonly expected out-of-the-box metric that someone might want to get started with.

Alternatives

No response

Additional context

No response

Docs return description of binary_confusion_matrix incorrect

Not sure if I'm being daft here but I think this line is incorrect:

Compute binary confusion matrix, a 2 by 2 tensor with counts ( (true positive, false negative) , (false positive, true negative) )

It says that the returned tensor contains the values:

( (true positive, false negative) , (false positive, true negative) )

But if you look at the examples, it shows (e.g.):

>>> input = torch.tensor([0, 1, 0.7, 0.6])
>>> target = torch.tensor([0, 1, 1, 0])
>>> binary_confusion_matrix(input, target)
        tensor([[1, 1],
                [0, 2]])

Whereas for those inputs, I count:

tn = 1 #indices = 0
tp = 2 #indices = 1,2
fp = 1 #indices = 3
fn = 0

So I believe that would mean the actual tensor being returned is either [[fp, tn], [fn, tp]] or [[tn, fp], [fn, tp]]. From my own experiments, I'm pretty sure it's the latter.
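
A quick sanity check (a sketch, assuming the default 0.5 threshold used in the docs example):

import torch

input = torch.tensor([0, 1, 0.7, 0.6])
target = torch.tensor([0, 1, 1, 0])
pred = (input >= 0.5).long()

tn = ((pred == 0) & (target == 0)).sum()  # 1
fp = ((pred == 1) & (target == 0)).sum()  # 1
fn = ((pred == 0) & (target == 1)).sum()  # 0
tp = ((pred == 1) & (target == 1)).sum()  # 2

# [[tn, fp], [fn, tp]] == [[1, 1], [0, 2]], matching the documented example output.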

Been scratching my head all morning about why my results look wrong and I think this is why.

Torcheval pointing to wrong directory for nvrtc-builtins64_121.dll file.

🐛 Describe the bug

Attempting to use torcheval's r2_score, but it isn't finding the file in the Conda environment. I had to copy the file from anaconda3/envs/{myenv}/bin to the local GPU Computing Toolkit/CUDA/v12.4/bin for it to work. The local GPU toolkit was downloaded from NVIDIA.

import torch
from torcheval.metrics.functional import r2_score
# ... training loop ...
# prediction == <class 'torch.Tensor'>
# target == <class 'torch.Tensor'>
r2 = r2_score(prediction, target)
# remaining code 

traceback

			"outputType": "error",
			"originalError": {
				"output_type": "error",
				"ename": "RuntimeError",
				"evalue": "nvrtc: error: failed to open nvrtc-builtins64_121.dll.\n  Make sure that nvrtc-builtins64_121.dll is installed correctly.\nnvrtc compilation failed: \n\n#define NAN __int_as_float(0x7fffffff)\n#define POS_INFINITY __int_as_float(0x7f800000)\n#define NEG_INFINITY __int_as_float(0xff800000)\n\n\ntemplate<typename T>\n__device__ T maximum(T a, T b) {\n  return isnan(a) ? a : (a > b ? a : b);\n}\n\ntemplate<typename T>\n__device__ T minimum(T a, T b) {\n  return isnan(a) ? a : (a < b ? a : b);\n}\n\nextern \"C\" __global__\nvoid fused_sub_div_neg_add(float* trss_1, float* tsum_squared_obs_1, float* tv_, float* aten_add, float* aten_sub) {\n{\n  float tsum_squared_obs_1_1 = __ldg(tsum_squared_obs_1 + 0ll);\n  float tv__1 = __ldg(tv_ + 0ll);\n  aten_sub[0ll] = tsum_squared_obs_1_1 - tv__1;\n  float v = __ldg(trss_1 + 0ll);\n  aten_add[0ll] = (0.f - v / (tsum_squared_obs_1_1 - tv__1)) + 1.f;\n}\n}\n",
				"traceback": [
					"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
					"\u001b[1;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
					"Cell \u001b[1;32mIn[16], line 62\u001b[0m\n\u001b[0;32m     59\u001b[0m \u001b[38;5;66;03m#################################   \u001b[39;00m\n\u001b[0;32m     60\u001b[0m    \u001b[38;5;66;03m#r2 score\u001b[39;00m\n\u001b[0;32m     61\u001b[0m    \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;28mtype\u001b[39m(prediction))\n\u001b[1;32m---> 62\u001b[0m    R2 \u001b[38;5;241m=\u001b[39m \u001b[43mr2_score\u001b[49m\u001b[43m(\u001b[49m\u001b[43mprediction\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtarget\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     63\u001b[0m    train_r2_list\u001b[38;5;241m.\u001b[39mappend(R2) \u001b[38;5;66;03m# all scores recorded\u001b[39;00m\n\u001b[0;32m     64\u001b[0m    mean_train_epoch_r2 \u001b[38;5;241m+\u001b[39m\u001b[38;5;241m=\u001b[39m R2 \u001b[38;5;66;03m# per batch list\u001b[39;00m\n",
					"File \u001b[1;32mc:\\Users\\snedd\\anaconda3\\envs\\ptorch\\Lib\\site-packages\\torch\\utils\\_contextlib.py:115\u001b[0m, in \u001b[0;36mcontext_decorator.<locals>.decorate_context\u001b[1;34m(*args, **kwargs)\u001b[0m\n\u001b[0;32m    112\u001b[0m \u001b[38;5;129m@functools\u001b[39m\u001b[38;5;241m.\u001b[39mwraps(func)\n\u001b[0;32m    113\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mdecorate_context\u001b[39m(\u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[0;32m    114\u001b[0m     \u001b[38;5;28;01mwith\u001b[39;00m ctx_factory():\n\u001b[1;32m--> 115\u001b[0m         \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
					"File \u001b[1;32mc:\\Users\\snedd\\anaconda3\\envs\\ptorch\\Lib\\site-packages\\torcheval\\metrics\\functional\\regression\\r2_score.py:79\u001b[0m, in \u001b[0;36mr2_score\u001b[1;34m(input, target, multioutput, num_regressors)\u001b[0m\n\u001b[0;32m     75\u001b[0m _r2_score_param_check(multioutput, num_regressors)\n\u001b[0;32m     76\u001b[0m sum_squared_obs, sum_obs, sum_squared_residual, num_obs \u001b[38;5;241m=\u001b[39m _r2_score_update(\n\u001b[0;32m     77\u001b[0m     \u001b[38;5;28minput\u001b[39m, target\n\u001b[0;32m     78\u001b[0m )\n\u001b[1;32m---> 79\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_r2_score_compute\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     80\u001b[0m \u001b[43m    \u001b[49m\u001b[43msum_squared_obs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     81\u001b[0m \u001b[43m    \u001b[49m\u001b[43msum_obs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     82\u001b[0m \u001b[43m    \u001b[49m\u001b[43msum_squared_residual\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     83\u001b[0m \u001b[43m    \u001b[49m\u001b[43mnum_obs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     84\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmultioutput\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     85\u001b[0m \u001b[43m    \u001b[49m\u001b[43mnum_regressors\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     86\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
					"File \u001b[1;32mc:\\Users\\snedd\\anaconda3\\envs\\ptorch\\Lib\\site-packages\\torcheval\\metrics\\functional\\regression\\r2_score.py:126\u001b[0m, in \u001b[0;36m_r2_score_compute\u001b[1;34m(sum_squared_obs, sum_obs, rss, num_obs, multioutput, num_regressors)\u001b[0m\n\u001b[0;32m    121\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m num_regressors \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m num_obs \u001b[38;5;241m-\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m    122\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[0;32m    123\u001b[0m         \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mThe `num_regressors` must be smaller than n_samples - 1, \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    124\u001b[0m         \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mgot num_regressors=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mnum_regressors\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, n_samples=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mnum_obs\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m    125\u001b[0m     )\n\u001b[1;32m--> 126\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_compute\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m    127\u001b[0m \u001b[43m    \u001b[49m\u001b[43msum_squared_obs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    128\u001b[0m \u001b[43m    \u001b[49m\u001b[43msum_obs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    129\u001b[0m \u001b[43m    \u001b[49m\u001b[43mrss\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    130\u001b[0m \u001b[43m    \u001b[49m\u001b[43mnum_obs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    131\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmultioutput\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    132\u001b[0m \u001b[43m    \u001b[49m\u001b[43mnum_regressors\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    133\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
					"\u001b[1;31mRuntimeError\u001b[0m: nvrtc: error: failed to open nvrtc-builtins64_121.dll.\n  Make sure that nvrtc-builtins64_121.dll is installed correctly.\nnvrtc compilation failed: \n\n#define NAN __int_as_float(0x7fffffff)\n#define POS_INFINITY __int_as_float(0x7f800000)\n#define NEG_INFINITY __int_as_float(0xff800000)\n\n\ntemplate<typename T>\n__device__ T maximum(T a, T b) {\n  return isnan(a) ? a : (a > b ? a : b);\n}\n\ntemplate<typename T>\n__device__ T minimum(T a, T b) {\n  return isnan(a) ? a : (a < b ? a : b);\n}\n\nextern \"C\" __global__\nvoid fused_sub_div_neg_add(float* trss_1, float* tsum_squared_obs_1, float* tv_, float* aten_add, float* aten_sub) {\n{\n  float tsum_squared_obs_1_1 = __ldg(tsum_squared_obs_1 + 0ll);\n  float tv__1 = __ldg(tv_ + 0ll);\n  aten_sub[0ll] = tsum_squared_obs_1_1 - tv__1;\n  float v = __ldg(trss_1 + 0ll);\n  aten_add[0ll] = (0.f - v / (tsum_squared_obs_1_1 - tv__1)) + 1.f;\n}\n}\n"
				]
			}

Versions

PyTorch version: 2.2.2
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:28:07) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070
Nvidia driver version: 552.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3792
DeviceID=CPU0
Family=198
L2CacheSize=2048
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3792
Name=Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.2
[pip3] torchaudio==2.2.2
[pip3] torcheval==0.0.7
[pip3] torchvision==0.17.2
[conda] blas                      1.0                         mkl
[conda] mkl                       2023.1.0         h6b88ed4_46358
[conda] mkl-service               2.4.0           py312h2bbff1b_1
[conda] mkl_fft                   1.3.8           py312h2bbff1b_0
[conda] mkl_random                1.2.4           py312h59b6b97_0
[conda] numpy                     1.26.4          py312hfd52020_0
[conda] numpy-base                1.26.4          py312h4dde369_0
[conda] pytorch                   2.2.2           py3.12_cuda12.1_cudnn8_0    pytorch
[conda] pytorch-cuda              12.1                 hde6ce7c_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                2.2.2                    pypi_0    pypi
[conda] torcheval                 0.0.7                    pypi_0    pypi
[conda] torchvision               0.17.2                   pypi_0    pypi

Comparison to torchmetrics

Hello! torcheval looks great!

I'd be interested to know how torcheval compares to torchmetrics. Are there certain shortcomings in torchmetrics that torcheval hopes to address? Any other insights into what inspired the creation of torcheval might help users understand what makes this project unique 😄
