yyzharry / imbalanced-regression

[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression

Home Page: http://dir.csail.mit.edu

License: MIT License

Languages: Python 84.81%, Jupyter Notebook 15.19%
Topics: imbalanced-data, imbalanced-learning, imbalanced-classification, imbalance, regression, long-tail, imbalanced-regression, icml, icml-2021, computer-vision, natural-language-processing, healthcare

imbalanced-regression's Introduction

Delving into Deep Imbalanced Regression

This repository contains the implementation code for the paper:
Delving into Deep Imbalanced Regression
Yuzhe Yang, Kaiwen Zha, Ying-Cong Chen, Hao Wang, Dina Katabi
38th International Conference on Machine Learning (ICML 2021), Long Oral
[Project Page] [Paper] [Video] [Blog Post]



Deep Imbalanced Regression (DIR) aims to learn from imbalanced data with continuous targets,
tackle potential missing data for certain regions, and generalize to the entire target range.

Beyond Imbalanced Classification: Brief Introduction for DIR

Existing techniques for learning from imbalanced data focus on targets with categorical indices, i.e., the targets are different classes. However, many real-world tasks involve continuous and even infinite target values. We systematically investigate Deep Imbalanced Regression (DIR), which aims to learn continuous targets from natural imbalanced data, deal with potential missing data for certain target values, and generalize to the entire target range.

We curate and benchmark large-scale DIR datasets for common real-world tasks in computer vision, natural language processing, and healthcare domains, ranging from single-value prediction (e.g., age, text similarity score, health condition score) to dense-value prediction (e.g., depth).

Usage

We separate the codebase for different datasets into different subfolders. Please go into the subfolders for more information (e.g., installation, dataset preparation, training, evaluation & models).

Highlights

(1) ✔️ New Task: Deep Imbalanced Regression (DIR)

(2) ✔️ New Techniques:

Label distribution smoothing (LDS) | Feature distribution smoothing (FDS)

(3) ✔️ New Benchmarks:

  • Computer Vision: 💡 IMDB-WIKI-DIR (age) / AgeDB-DIR (age) / NYUD2-DIR (depth)
  • Natural Language Processing: 📋 STS-B-DIR (text similarity score)
  • Healthcare: 🏥 SHHS-DIR (health condition score)

Apply LDS and FDS on Other Datasets / Models

We provide examples of how to apply LDS and FDS on other customized datasets and/or models.

LDS

To apply LDS on your customized dataset, you will first need to estimate the effective label distribution:

from collections import Counter
import numpy as np
from scipy.ndimage import convolve1d
from utils import get_lds_kernel_window

# preds, labels: [Ns,], "Ns" is the number of total samples
preds, labels = ..., ...
# assign each label to its corresponding bin (start from 0)
# with your defined get_bin_idx(), return bin_index_per_label: [Ns,] 
bin_index_per_label = [get_bin_idx(label) for label in labels]

# calculate empirical (original) label distribution: [Nb,]
# "Nb" is the number of bins
Nb = max(bin_index_per_label) + 1
num_samples_of_bins = dict(Counter(bin_index_per_label))
emp_label_dist = [num_samples_of_bins.get(i, 0) for i in range(Nb)]

# lds_kernel_window: [ks,], here for example, we use gaussian, ks=5, sigma=2
lds_kernel_window = get_lds_kernel_window(kernel='gaussian', ks=5, sigma=2)
# calculate effective label distribution: [Nb,]
eff_label_dist = convolve1d(np.array(emp_label_dist), weights=lds_kernel_window, mode='constant')
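
For reference, get_bin_idx() above is user-defined. A minimal sketch for equal-width bins could look like the following (label_min and bin_width are placeholder assumptions you would set for your own target range):

# hypothetical equal-width binning over the target range
label_min, bin_width = 0., 1.

def get_bin_idx(label):
    # map a continuous label to a non-negative integer bin index (bin 0 starts at label_min)
    return int((label - label_min) // bin_width)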

With the estimated effective label distribution, one straightforward option is to use the loss re-weighting scheme:

from loss import weighted_mse_loss

# Use re-weighting based on effective label distribution, sample-wise weights: [Ns,]
eff_num_per_label = [eff_label_dist[bin_idx] for bin_idx in bin_index_per_label]
weights = [np.float32(1 / x) for x in eff_num_per_label]

# calculate loss
loss = weighted_mse_loss(preds, labels, weights=weights)
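
If you are not using the loss.py from this repo, a minimal weighted MSE along these lines can serve as a stand-in (a sketch; it assumes preds, labels, and weights are torch tensors of matching shape, so convert the weight list above with torch.tensor(weights) first):

import torch

def weighted_mse_loss(inputs, targets, weights=None):
    # per-sample squared error, optionally scaled by the LDS-based weights
    loss = (inputs - targets) ** 2
    if weights is not None:
        loss = loss * weights.expand_as(loss)
    return torch.mean(loss)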

FDS

To apply FDS on your customized data/model, you will first need to define the FDS module in your network:

import torch.nn as nn
from fds import FDS

config = dict(feature_dim=..., start_update=0, start_smooth=1, kernel='gaussian', ks=5, sigma=2)

class Network(nn.Module):
    def __init__(self, **config):
        super().__init__()
        self.feature_extractor = ...
        self.regressor = nn.Linear(config['feature_dim'], 1)  # FDS operates before the final regressor
        self.FDS = FDS(**config)

    def forward(self, inputs, labels, epoch):
        features = self.feature_extractor(inputs)  # features: [batch_size, feature_dim]
        # smooth the feature distributions over the target space
        smoothed_features = features    
        if self.training and epoch >= config['start_smooth']:
            smoothed_features = self.FDS.smooth(smoothed_features, labels, epoch)
        preds = self.regressor(smoothed_features)
        
        return {'preds': preds, 'features': features}
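
A single training step with this network then passes both the labels and the current epoch to the forward call (a sketch; the optimizer, train_loader, and plain MSE loss below are assumptions, not part of the repo's API, and 'epoch' is the epoch index from the outer training loop shown below):

import torch.nn.functional as F

# hypothetical inner training loop for one epoch
model.train()
for (inputs, labels) in train_loader:
    outputs = model(inputs, labels, epoch)  # FDS smoothing kicks in once epoch >= start_smooth
    loss = F.mse_loss(outputs['preds'].squeeze(-1), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()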

During training, you will need to update the FDS statistics after each training epoch:

model = Network(**config)

for epoch in range(num_epochs):
    for (inputs, labels) in train_loader:
        # standard training pipeline
        ...

    # update FDS statistics after each training epoch
    if epoch >= config['start_update']:
        # collect features and labels for all training samples
        ...
        # training_features: [num_samples, feature_dim], training_labels: [num_samples,]
        training_features, training_labels = ..., ...
        model.FDS.update_last_epoch_stats(epoch)
        model.FDS.update_running_stats(training_features, training_labels, epoch)
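
One way to collect training_features and training_labels above is an extra pass over the training set with gradients disabled (a sketch; it assumes the Network defined earlier, whose forward returns a dict with a 'features' entry, and a loader yielding (inputs, labels)):

import torch

# gather encoder features and labels over the whole training set (no gradients needed;
# in eval mode the forward returns the raw, unsmoothed features)
feature_list, label_list = [], []
model.eval()
with torch.no_grad():
    for (inputs, labels) in train_loader:
        outputs = model(inputs, labels, epoch)
        feature_list.append(outputs['features'])
        label_list.append(labels)
model.train()

training_features = torch.cat(feature_list, dim=0)  # [num_samples, feature_dim]
training_labels = torch.cat(label_list, dim=0)      # [num_samples,]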

Updates

  • [06/2021] We provide a hands-on tutorial of DIR. Check it out!
  • [05/2021] We created a blog post for this work (a Chinese version is also available here). Check it out for more details!
  • [05/2021] Paper accepted to ICML 2021 as a Long Talk. We have released the code and models. You can find all reproduced checkpoints via this link, or go into each subfolder for the models for each dataset.
  • [02/2021] arXiv version posted. Please stay tuned for updates.

Citation

If you find this code or idea useful, please cite our work:

@inproceedings{yang2021delving,
  title={Delving into Deep Imbalanced Regression},
  author={Yang, Yuzhe and Zha, Kaiwen and Chen, Ying-Cong and Wang, Hao and Katabi, Dina},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2021}
}

Contact

If you have any questions, feel free to contact us through email ([email protected] & [email protected]) or Github issues. Enjoy!

imbalanced-regression's People

Contributors

eliasmgprado · kaiwenzha · yyzharry


imbalanced-regression's Issues

Questions about smoothing function

Dear Author,
I noticed that in the FDS.smooth code, there is:
"feature[labels == label] = self.calibrate_mean_var(feature[labels == label], ...)"
Is this an in-place operation with respect to the backward pass? The features receive gradients from the regressor/decoder during backpropagation, so this in-place assignment may cause an error.
Looking forward to your reply! Thanks!
I found that in 2021, some issues also pointed out this potential bug when applying FDS to their work. I think we could easily use torch.masked_scatter to fix the problem, like this:
features = torch.masked_scatter(
    features,
    (labels == label).unsqueeze(1).repeat(1, features.shape[1]),
    self.calibrate_mean_var(
        features[labels == label],
        self.running_mean_last_epoch[int(label - self.bucket_start)],
        self.running_var_last_epoch[int(label - self.bucket_start)],
        self.smoothed_mean_last_epoch[int(label - self.bucket_start)],
        self.smoothed_var_last_epoch[int(label - self.bucket_start)]))
Is this right?

Using FDS in ML project

Hello, in your paper on the problem of deep imbalanced regression, I had the privilege of learning about the LDS and FDS smoothing methods. In one of my machine learning projects, predicting convective-cloud precipitation, I wanted to use FDS because of the imbalance between non-precipitation and precipitation samples. I wonder how to smooth the feature statistics without knowing the labels in the test set. My data is very imbalanced (70% of the samples have no precipitation), which makes the feature statistics of the label intervals particularly similar (about 98%), so the smoothing effect is still not significant. Are there any important points to pay attention to when using FDS in a machine learning project?

Gaussian kernel size and SQRT inverse reweighting for LDS

Hi, congratulations on your ICML paper; it sounds very useful, and I loved the insight of Figure 2. I am trying to implement the paper right now in one of my projects. I have a couple of questions regarding LDS, if you don't mind me asking here.

First, I am a bit puzzled at this line in your code:

kernel_window = gaussian_filter1d(base_kernel, sigma=sigma) / max(gaussian_filter1d(base_kernel, sigma=sigma))

If I understand correctly, you are using gaussian_filter1d to create a Gaussian kernel of a small size (e.g. 5 in the paper) and then you convolve this with the label distribution using convolve1d. But isn't gaussian_filter1d supposed to do this (with the full window as the kernel) in the first place? Looking around the Internet, I find that the reason people use small Gaussian kernels in, e.g., image processing is usually computational: beyond a width of about 3 standard deviations, a larger kernel would be useless. However, in the paper, it appears that you actually get better results with small kernels? Could you elaborate on this a little bit, please?

My second question is about this line:

value_dict = {k: np.sqrt(v) for k, v in value_dict.items()}

In the paper, I found a place where you talk about reweighting the loss by something proportional to the inverse of the smoothed label distribution (Algorithm 1 in the appendix), but nothing about this reweighting by the inverse sqrt as you seem to be doing here by default. Could you also elaborate a bit on this, please?

Thank you for your time!

The problem of regression to the pixel value of the picture

I had the honor of reading your paper, which is very solid. I am now working on a task where the label is a 256 × 256 image and the model must predict each pixel value (imbalanced continuous values between 0 and 1). I want to apply your LDS strategy, but my task is slightly different from yours. Although my dataset has only 10,000 images, if each pixel is regarded as a label, the number of labels becomes very large (256 × 256 × 10000). Do you think your method is still applicable?
I did an experiment where I calculated the value distribution of each label (a 256 × 256 image) separately and then applied the LDS strategy. It did not work out too well.

The covariance and variance in FDS

In the paper, the bin statistic is defined as the covariance within a target bin. But in the code ("utils.py", "fds.py"), the variable used is the variance, not the covariance.

def calibrate_mean_var(matrix, m1, v1, m2, v2, clip_min=0.1, clip_max=10):
    if torch.sum(v1) < 1e-10:
        return matrix
    if (v1 == 0.).any():
        valid = (v1 != 0.)
        factor = torch.clamp(v2[valid] / v1[valid], clip_min, clip_max)
        matrix[:, valid] = (matrix[:, valid] - m1[valid]) * torch.sqrt(factor) + m2[valid]
        return matrix

    factor = torch.clamp(v2 / v1, clip_min, clip_max)
    return (matrix - m1) * torch.sqrt(factor) + m2

def update_running_stats(self, features, labels, epoch):
        if epoch < self.epoch:
            return

        assert self.feature_dim == features.size(1), "Input feature dimension is not aligned!"
        assert features.size(0) == labels.size(0), "Dimensions of features and labels are not aligned!"

        for label in torch.unique(labels):
            if label > self.bucket_num - 1 or label < self.bucket_start:
                continue
            elif label == self.bucket_start:
                curr_feats = features[labels <= label]
            elif label == self.bucket_num - 1:
                curr_feats = features[labels >= label]
            else:
                curr_feats = features[labels == label]
            curr_num_sample = curr_feats.size(0)
            curr_mean = torch.mean(curr_feats, 0)
            curr_var = torch.var(curr_feats, 0, unbiased=True if curr_feats.size(0) != 1 else False)

            self.num_samples_tracked[int(label - self.bucket_start)] += curr_num_sample
            factor = self.momentum if self.momentum is not None else \
                (1 - curr_num_sample / float(self.num_samples_tracked[int(label - self.bucket_start)]))
            factor = 0 if epoch == self.start_update else factor
            self.running_mean[int(label - self.bucket_start)] = \
                (1 - factor) * curr_mean + factor * self.running_mean[int(label - self.bucket_start)]
            self.running_var[int(label - self.bucket_start)] = \
                (1 - factor) * curr_var + factor * self.running_var[int(label - self.bucket_start)]

And in the paper "Return of frustratingly easy domain adaptation" ,in the whitening and recoloring procedure, the covariance is used. Which one should i use in feature statistics calibration?
Or , maybe i got it wrong. The variable v1, v2 in "calibrate_mean_var" function is the covariance of target bin?

how to update statistics in FDS

When updating the statistics in FDS, do we need to calculate the mean and variance of all samples in each bin after every epoch finishes? Or do we need to update the running sums of z and z^2 after each mini-batch?

Implementation of SMOGN and RRT for deep regression

Hi,

Could you help clarify your implementation for SMOGN and RRT in the paper?

  • For SMOGN, did you do the interpolation based on raw images? I.e. simply interpolating between the raw pixel values?
  • For RRT, how did you train the encoder step? Do you train the entire model first, and then freeze the encoder and only train the last layer?

Thanks for clarifying!

Question about validation

Dear Author:
I hope this issue finds you well! First of all, your work is constructive.
I want to use FDS in my project, but I found that many bins near the bins where the minority samples are located are empty. Can the method be used in such a case?
Also, I think there may be a more correct order for running the code. Why is FDS.update_running_stats called after FDS.update_last_epoch_stats? If FDS.update_running_stats comes after FDS.update_last_epoch_stats, the model uses the smoothed running statistics from the epoch before the last epoch, rather than the running statistics updated in the last epoch. Would it be correct to swap the order of FDS.update_running_stats and FDS.update_last_epoch_stats?
Looking forward to your reply. Thanks!

About SHHS-DIR Code

Hi, the SHHS-DIR data requires authorization, so it is not convenient to upload it. Would it be possible to upload the code from the paper that is based on this dataset?
Thanks

bucket_num bucket_start

Would you please tell me how to choose the best bucket_num and bucket_start parameters? The range of labels in my dataset is (-2, 16).

Hi @YyzHarry,

Hi @YyzHarry,
I want to use your LDS code to solve my problem, and I have a question: does the input data format for LDS have to be CSV? Is npz-format data OK? Can it be applied to high-dimensional data? I would appreciate it if you could give me some guidance.

move get_transform to __init__

Hi,

Thank you for your great work.

A little suggestion: you could move the data transforms into the __init__ of the dataset class instead of invoking get_transform each time an item is fetched.

I think this would improve the speed of preparing the data. I hope this is helpful for you.

Thank you.

Ningtao

School of AI, Xidian University, China
Robarts Research Institute, Western University, Canada

FDS performs worse than vanilla

Hi !

Thank you for sharing your work on LDS and FDS. It really inspired me in my work.

I'm trying to predict price (normalized with a Z-score) given features about the restaurant and the delivery (time, location, distance, etc.). My data is imbalanced and I want my model to perform well whatever the price value is.
To help me understand your work, I re-implemented LDS and FDS (based on your advice). I can see that LDS improves my model's performance, but FDS did worse (see graph below).
[graph omitted]

Moreover, when I look at how the statistics are smoothed, I can see that LDS smooths better than FDS (see graph below).
[graph omitted]

Have you ever experienced cases/data where FDS performs worse than vanilla?

Thank you in advance for your help!

My code:

# imports
import time

import numpy as np
import torch
import torch.nn.functional as F

from pandas import DataFrame, Series
from scipy.ndimage import convolve1d, gaussian_filter1d
from torch import nn
from torch.utils.data import TensorDataset, DataLoader


# classes
class FDSLayer(nn.Module):
    """

    """
    def __init__(self, input_dim, n_bins: int = 100, kernel_size: int = 5, alpha: float = .9, start_smooth: int = 1):
        super(FDSLayer, self).__init__()
        self.register_buffer('mean', torch.zeros((n_bins, input_dim)))
        self.register_buffer('var', torch.ones((n_bins, input_dim)))
        self.register_buffer('smoothed_mean', torch.zeros((n_bins, input_dim)))
        self.register_buffer('smoothed_var', torch.ones((n_bins, input_dim)))

        self.input_dim = input_dim
        self.alpha = alpha
        self.n_bins = n_bins
        self.kernel_size = kernel_size
        self.half_kernel_size = (kernel_size - 1) // 2
        self.sigma = 2
        self.start_smooth = start_smooth

    def smooth(self, inputs, labels, epoch):
        if epoch < self.start_smooth:
            return inputs
        else:
            labels = labels.squeeze()
            bin_indexes = self._get_bin_indexes(labels)

            factor = torch.clamp(torch.sqrt(self.smoothed_var / self.var), .1, 10)
            return (inputs - self.mean[bin_indexes]) * factor[bin_indexes] + self.smoothed_mean[bin_indexes]

    def _get_bin_indexes(self, labels):
        _, bin_edges = torch.histogram(labels, self.n_bins, density=True)

        return torch.bucketize(labels, bin_edges[1:-1])

    def _get_kernel_window(self):
        base_kernel = [0.] * self.half_kernel_size + [1.] + [0.] * self.half_kernel_size
        kernel_window = gaussian_filter1d(base_kernel, sigma=self.sigma)
        kernel_window = kernel_window / sum(kernel_window)

        return torch.FloatTensor(kernel_window)

    def update_running_stats(self, features, labels):
        labels = labels.squeeze()
        bin_indexes = self._get_bin_indexes(labels)

        new_mean = torch.zeros((self.n_bins, self.input_dim))
        new_var = torch.ones((self.n_bins, self.input_dim))
        for b in torch.unique(bin_indexes):
            a = features[bin_indexes == b]
            if a.size(0) != 0:
                new_mean[b] = torch.mean(a, dim=0)
                new_var[b] = torch.var(a, dim=0, unbiased=True if a.size(0) != 1 else False)

        self.mean = self.alpha * self.mean + (1 - self.alpha) * new_mean
        self.var = self.alpha * self.var + (1 - self.alpha) * new_var

        fds_kernel_window = self._get_kernel_window()

        smoothed_mean = F.conv1d(F.pad(self.mean.view(1, self.n_bins, self.input_dim).permute(2, 0, 1),
                                       pad=(self.half_kernel_size, self.half_kernel_size),
                                       mode='reflect'),
                                 weight=fds_kernel_window.view(1, 1, -1),
                                 padding=0).permute(2, 0, 1).squeeze()
        self.smoothed_mean = smoothed_mean

        smoothed_var = F.conv1d(F.pad(self.var.view(1, self.n_bins, self.input_dim).permute(2, 0, 1),
                                      pad=(self.half_kernel_size, self.half_kernel_size),
                                      mode='reflect'),
                                weight=fds_kernel_window.view(1, 1, -1),
                                padding=0).permute(2, 0, 1).squeeze()
        self.smoothed_var = smoothed_var


class MLPNetwork(nn.Module):
    def __init__(self, input_dim: int, hidden_units: tuple = (128, 128, 128), lds: bool = False, fds: bool = False,
                 n_bins: int = 20, kernel_size: int = 5, alpha: float = .9, start_smooth: int = 1):
        super(MLPNetwork, self).__init__()

        self.lds = lds
        self.fds = fds
        self.n_bins = n_bins
        self.kernel_size = kernel_size
        self.half_kernel_size = kernel_size // 2
        self.sigma = 2

        input_layer = nn.Linear(input_dim, hidden_units[0])
        self.layers = nn.ModuleList(
            [input_layer] +
            [nn.Linear(hidden_units[i - 1], hidden_units[i]) for i in range(1, len(hidden_units))]
        )
        if self.fds:
            self.fds_layer = FDSLayer(hidden_units[-1], n_bins=n_bins, kernel_size=kernel_size, alpha=alpha,
                                      start_smooth=start_smooth)
        self.output_layer = nn.Linear(hidden_units[-1], 1)

    def forward(self, inputs, labels=None, epoch=None):
        x = inputs
        for layer in self.layers:
            x = torch.relu(layer(x))
        smoothed_features = x.view(x.size(0), -1)
        smoothed_features_ = smoothed_features
        if self.training and self.fds:
            smoothed_features_ = self.fds_layer.smooth(smoothed_features_, labels, epoch)
        x = self.output_layer(smoothed_features_)
        return x, smoothed_features

    def fit(self, inputs, labels, val_inputs, val_labels, epochs: int = 200, batch_size: int = 1024):
        if isinstance(inputs, DataFrame):
            inputs = inputs.values
        if isinstance(val_inputs, DataFrame):
            val_inputs = val_inputs.values
        if isinstance(labels, Series):
            labels = labels.values
        if isinstance(val_labels, Series):
            val_labels = val_labels.values

        # Create train dataloader
        inputs = torch.FloatTensor(inputs)
        labels = torch.FloatTensor(labels)
        if self.lds:
            empirical_label_distribution, bin_edges = torch.histogram(labels, bins=self.n_bins, density=True)
            lds_kernel_window = self._get_kernel_window()
            effective_label_distribution = F.conv1d(F.pad(empirical_label_distribution.view(1, 1, -1),
                                                          pad=(self.half_kernel_size, self.half_kernel_size),
                                                          mode='reflect'),
                                                    weight=lds_kernel_window.view(1, 1, -1),
                                                    padding=0).squeeze()
            weights = 1 / effective_label_distribution[torch.bucketize(labels, bin_edges[1:-1])]
        else:
            weights = torch.ones(labels.size())
        train_dataset = TensorDataset(inputs, labels.view(-1, 1), weights)
        train_dataloader = DataLoader(train_dataset, batch_size, shuffle=True, num_workers=4)

        # Create validation dataloader
        val_inputs = torch.FloatTensor(val_inputs)
        val_labels = torch.FloatTensor(val_labels).view(-1, 1)
        val_dataset = TensorDataset(val_inputs, val_labels)
        val_dataloader = DataLoader(val_dataset, batch_size, shuffle=True, num_workers=4)

        loss_fn = self.weighted_mae_loss
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)

        # Train
        for epoch in range(epochs):
            print(f'Epoch {epoch + 1}/{epochs}')
            t0 = time.time()
            train_loss = 0.

            self.train(True)
            if self.fds:
                latent_feature_record, label_record = [], []
            for i, (train_inputs, train_labels, train_weights) in enumerate(train_dataloader, 1):
                optimizer.zero_grad(set_to_none=True)

                if self.fds:
                    predictions, feat = self(train_inputs, train_labels, epoch)
                    latent_feature_record.extend(feat.data.squeeze().numpy())
                    label_record.extend(train_labels.data.squeeze().numpy())
                else:
                    predictions, _ = self(train_inputs, train_labels)

                loss = loss_fn(predictions, train_labels, train_weights)
                loss.backward()
                optimizer.step()

                train_loss += loss.item()
            self.train(False)

            train_loss = train_loss / i

            val_loss = 0.
            for j, (val_inputs, val_labels) in enumerate(val_dataloader, 1):
                with torch.no_grad():
                    predictions = self(val_inputs, val_labels, epoch)[0]

                    loss = loss_fn(predictions, val_labels)
                    val_loss += loss
            val_loss = val_loss / j

            if self.fds:
                latent_features = torch.from_numpy(np.vstack(latent_feature_record))
                labels_ = torch.from_numpy(np.hstack(label_record))
                self.fds_layer.update_running_stats(latent_features, labels_)

            print(f'time: {time.time() - t0:.1f}s - loss: {train_loss:.4f} - val_loss: {val_loss:.4f}\n')

    def _get_kernel_window(self):
        base_kernel = [0.] * self.half_kernel_size + [1.] + [0.] * self.half_kernel_size
        kernel_window = gaussian_filter1d(base_kernel, sigma=self.sigma)
        kernel_window = kernel_window / sum(kernel_window)

        return torch.FloatTensor(kernel_window)

    def predict(self, inputs):
        if isinstance(inputs, DataFrame):
            inputs = inputs.values

        inputs = torch.FloatTensor(inputs)

        return self(inputs)

    @staticmethod
    def weighted_mae_loss(predictions, targets, weights=None):
        loss = torch.abs(predictions - targets).squeeze()
        if weights is not None:
            loss *= weights.squeeze().expand_as(loss)
        loss = torch.mean(loss)
        return loss

Negative Age

Hi, I ran inference with your checkpoint, but all my outputs are negative?
You said that you hadn't used any preprocessing on the labels, right?

How Interpolation and Extrapolation works?

Hi @YyzHarry,

I hope this message finds you well. I am reaching out to you regarding your paper, specifically about the discussion on interpolation and extrapolation of sample labels. While reviewing the code, I couldn't understand whether this functionality is implemented as a separate module or if the same LDS and FDS symmetric kernel is utilized for this purpose as well.

Could you kindly provide clarification or guidance on this matter? Many thanks.

Question for Algorithm 1

Hi, I have a question for the pseudo code in Supplementary A section.

For the Algorithm 1, the LDS is only used to compute the weights for loss inverse re-weighting scheme. Why not use the smoothed labels to train the models if LDS captures the real imbalance that affects regression problems?

In addition, could you provide code for computing the effective label density distribution?

Thank you.

Incorrect Focal-R mse loss?

Hi authors,

page 6 from your paper:
Precisely, Focal-R loss based on L1 distance can be written as (1/n) Σ_{i=1}^{n} σ(|β e_i|)^γ · e_i, where e_i is the L1 error for the i-th sample and σ(·) is the sigmoid function.

  • QUESTION 1: in the focal_mse loss,
    (2 * torch.sigmoid(beta * torch.abs(inputs - targets)) - 1) ** gamma

    shouldn't the error term be torch.abs((inputs - targets) ** 2) rather than just torch.abs(inputs - targets), since this is the MSE variant? Am I correct?
  • QUESTION 2: why is there the 2 * torch.sigmoid(...) - 1 scaling? There is no -1 or 2* in the formula in your paper.

Smooth Z calculation

In equation 6 of the paper (calculating the smoothed z), what is the subscript b referring to? From my understanding, it is the target bucket for the specific example from which z was calculated. If so, how is this implemented at test time? Wouldn't we require knowledge of the bin to which the example belongs?

It appears that the FDS layer accesses the label too:

labels = labels.squeeze(1)

Feels like I'm misunderstanding something

Hi, question about the Appendix

According to the paper, there are a lot of details in the appendix, but I can't find the appendix on Google. Could you do me a favor and point me to it? Thanks!

about test error

Hello, I have some questions about the error PDF. How can I obtain the right error PDF? Each label has a different number of samples, so should I average the error within each label, or make the labels have the same number of samples?

Code availability

Hi, I would like to test your solution on an imbalanced dataset that I have. Do you think you could make the code available?

Applying LDS/FDS to classic machine learning models

Hi! This work is really fantastic! However, I found it hard to apply LDS/FDS to classic machine learning models like random forest. For example, after getting the effective label density with LDS, how should I use this?

SHHS dataset

Excuse me.
Is it convenient to provide the SHHS data and its preprocessing scripts?
Thank you in advance!

prediction value processing

Hello, I read your paper with interest and want to ask a question about processing the prediction results. How do you limit the final prediction ŷ to lie between 0 and 99? Or is it taken directly from the regression function without any post-processing?

validation question

Hello, I read your paper with interest and want to apply it to my custom data.
In this regard, two problems occurred.

First, when I evaluate on my custom data, I get the following error (screenshot omitted). Do you know what it is?

Second, can I extract each individual prediction result, rather than the averaged prediction result?

Thank you.

Bug when using huber/focal loss + LDS

In loss.py, "weights" is the last parameter of the weighted_huber_loss and weighted_focal loss functions (screenshots omitted),

but in train.py,

loss = globals()[f"weighted_{args.loss}_loss"](outputs, targets, weights)

passes weights positionally, with no keyword name for weights.

Maybe it should be

loss = globals()[f"weighted_{args.loss}_loss"](outputs, targets, weights=weights)

otherwise it makes mistakes when using huber + LDS or focal + LDS.

feature smoothing when doing backward propogation

Hi, thanks for sharing this wonderful project! There's a small question I want to ask. I got the following error when applying the FDS module:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64]], which is output 0 of SelectBackward, is at version 7; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

When I add feature.detach(), the error is gone, but is it correct to do so? From my understanding, this module updates the feature with the previous means and variances; does it affect the backpropagation? Thanks in advance for any help!

Bins in FDS and LDS - not usable in general approach, only for given datasets

Hi Team,
I liked the ideas in your paper, but from reading the paper and the provided code, it sounds like the provided FDS and LDS code can be applied to any dataset/model. Is that really true?

  • It looks like you are using only integers (as you are predicting age) to make a dictionary of histogram bins in both FDS and LDS. In the paper you say:
    "We use a minimum bin size of 1, i.e., y_{b+1} − y_b = 1, and group features with the same target value in the same bin."
    I imagine this makes a lot of things easier, but if you are facing an imbalanced regression problem and your labels are floats between 0 and 5, this version of the code won't help you. Do you by any chance have code with a more general approach?
  • I did not find an explanation for this clipping (maybe empirically it gave better results?):
    value_dict = {k: np.clip(v, 5, 1000) for k, v in value_dict.items()} # clip weights for inverse re-weight
  • There is also another clipping here (I guess again for better empirical results?):
    factor = torch.clamp(v2[valid] / v1[valid], clip_min, clip_max)

Note:
I like the ideas in the paper, but due to the lack of documentation/explanation I am currently spending a lot of time generalizing the code and trying to figure out why you made some of these choices (e.g., the clippings).

About FDS in the test time

Hi, thanks for the awesome work! I have a question about how FDS behaves at test time. It seems that feature smoothing is disabled during testing, since the FDS module is only called during training for agedb-dir, as follows:

if self.training and self.fds:
    if epoch >= self.start_smooth:
        encoding_s = self.FDS.smooth(encoding_s, targets, epoch)

Could you explain why you don't do feature smoothing during testing?

About SHHS-DIR dataset

Thanks a lot for your contribution; your work is really awesome.
I am very interested in your work. However, while reading the code, I did not find the SHHS-DIR dataset.
Could you publish the SHHS-DIR dataset or its sampling method? Thank you!
