
umbertogriffo / focal-loss-keras

278 stars · 10 watchers · 67 forks · 99 KB

Binary and Categorical Focal loss implementation in Keras.

Languages: Python 95.56%, Makefile 4.44%
Topics: deep-learning, deep-neural-networks, keras, loss-functions, cross-entropy-loss, binary-classification, categorical-cross-entropy

focal-loss-keras's Introduction

Hi 👋, I'm Umberto

A passionate Data/Machine Learning Engineer based in Lisbon

  • 🦷 I'm currently working at Promaton.

  • 🚘 I developed an MLOps framework at Cazoo to support the data science community in deploying, updating, and maintaining models at scale.

  • 🔭 I previously worked at Bose, where I enhanced the Bose Data Platform by leading and supporting Analytics & AI/ML workloads and applying MLOps best practices to help Data Scientists take their models to production.

  • 🚌 Before that, I worked at tb.lx by Daimler Trucks & Buses, where I implemented scalable batch and streaming data pipelines to feed event-driven applications and developed Machine Learning models for Predictive Maintenance solutions.

  • ๐Ÿ‘จโ€๐Ÿ’ป All of my open-source projects are available here on GitHub

  • 💬 Ask me about anything; I am happy to help.


focal-loss-keras's People

Contributors

decewei · dependabot[bot] · umbertogriffo


focal-loss-keras's Issues

In the loss, shouldn't it be 1 - y_pred?

I am not sure, but I think there might be an error here:
pt_1 = array_ops.where(y_true > 0, y_pred, tf.ones_like(y_pred))
pt_0 = array_ops.where(y_true == 0, 1 - y_pred, tf.zeros_like(y_pred))
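
For context, here is a minimal sketch of how a binary focal loss of this shape typically consumes pt_1 and pt_0 (the standard formulation, not necessarily this repo's exact code): whether pt_0 should hold y_pred or 1 - y_pred depends entirely on the term it later feeds.

import tensorflow as tf
from tensorflow.keras import backend as K

def binary_focal_loss(gamma=2., alpha=.25):
    def binary_focal_loss_fixed(y_true, y_pred):
        # pt_1: predicted probability on the positive examples;
        # pt_0: predicted probability on the negative examples.
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        eps = K.epsilon()
        pt_1 = K.clip(pt_1, eps, 1. - eps)
        pt_0 = K.clip(pt_0, eps, 1. - eps)
        # The negative branch already applies log(1 - pt_0), so pt_0 = y_pred
        # is consistent here; selecting 1 - y_pred would invert it twice.
        return -K.mean(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1)) \
               - K.mean((1. - alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0))
    return binary_focal_loss_fixed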

There are some conditions under which categorical focal loss does not improve from the start

This may not be a problem in your code but in the algorithm itself. The loss function (categorical focal loss, CFL) worked well for some models/data but did not converge at all in other cases. For example, CFL with ResNet50 for bacterial detection did not converge at all, reaching only 26% train accuracy on 4 classes, while categorical cross-entropy had no such problem (90% test accuracy). Yet CFL with a 1D-CNN on the same data converged well, with 99% test accuracy.

About the alpha parameter in focal loss

From the paper, the alphas are weights for each example. So why is alpha = 0.25 kept fixed? Does this mean giving equal weight to all the examples?

I may be wrong, but this is what I understood from the paper.

ValueError: Unknown loss function: categorical_focal_loss_fixed

Hi, I was using this library, but I got this error when reloading the model (after training it successfully).

File "/home/THEUSER/anaconda3/envs/tf_gpu/lib/python3.9/site-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
File "/home/THEUSER/anaconda3/envs/tf_gpu/lib/python3.9/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 199, in load_model_from_hdf5
model.compile(**saving_utils.compile_args_from_training_config(
File "/home/THEUSER/anaconda3/envs/tf_gpu/lib/python3.9/site-packages/tensorflow/python/keras/saving/saving_utils.py", line 218, in compile_args_from_training_config
loss = _deserialize_nested_config(losses.deserialize, loss_config)
File "/home/THEUSER/anaconda3/envs/tf_gpu/lib/python3.9/site-packages/tensorflow/python/keras/saving/saving_utils.py", line 259, in _deserialize_nested_config
return deserialize_fn(config)
File "/home/THEUSER/anaconda3/envs/tf_gpu/lib/python3.9/site-packages/tensorflow/python/keras/losses.py", line 1854, in deserialize
return deserialize_keras_object(
File "/home/THEUSER/anaconda3/envs/tf_gpu/lib/python3.9/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 377, in deserialize_keras_object
raise ValueError(
ValueError: Unknown loss function: categorical_focal_loss_fixed

Do you know how to fix it? Thanks a lot.
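
The usual fix for this class of error is to re-register the custom loss under the exact name Keras serialized. A sketch, assuming the model was compiled with this repo's categorical_focal_loss; 'model.h5' is a hypothetical path, and gamma/alpha must match the values used at training time:

from tensorflow.keras.models import load_model

# categorical_focal_loss is the factory from this repo; the inner function it
# returns is what Keras stored under the name 'categorical_focal_loss_fixed'.
model = load_model(
    'model.h5',
    custom_objects={'categorical_focal_loss_fixed': categorical_focal_loss(gamma=2., alpha=.25),
                    'categorical_focal_loss': categorical_focal_loss},
)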

A question on the "alpha" param for the categorical focal loss

From the categorical focal loss implementation, it seems that the "alpha" array is used to weight the losses among the different categories, i.e., the bigger the value, the more the loss for that category is weighted relative to the others.

For example, if we have C = 3 classes (C1 to C3) and alpha = [0.2, 0.4, 0.2], does it mean that the loss of C1 is weighted the same as that of C3, while the loss of C2 is weighted twice as much as those of C1 and C3?

In this case, alpha = [0.2, 0.4, 0.2] should have the same effect as alpha = [0.4, 0.8, 0.4].
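
A quick numerical check of that last claim (a sketch with made-up values): scaling the whole alpha vector by a constant scales the loss, and hence the gradients, by the same constant, so [0.4, 0.8, 0.4] behaves like [0.2, 0.4, 0.2] up to an effective learning-rate change.

import numpy as np

y_true = np.array([0., 1., 0.])            # one-hot label: class C2
y_pred = np.array([0.2, 0.7, 0.1])         # softmax output
ce = -y_true * np.log(y_pred)              # per-class cross entropy

for alpha in ([0.2, 0.4, 0.2], [0.4, 0.8, 0.4]):
    print(np.sum(np.array(alpha) * ce))    # second value is exactly 2x the first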

Wrong computation of alpha in the multiclass scenario

Hi,

Firstly, sorry for the misclick that sent this issue while incomplete.

Secondly, I believe you are calculating alpha wrongly in the multiclass scenario. It is just a constant applied to all cases, whereas in the binary scenario it really does address class imbalance, with alpha and 1 - alpha. Shouldn't we therefore choose a different alpha for every class, for example using class weights from scikit-learn?

I believe that for the multiclass scenario there should be an additional parameter called weights (or similar), which could for example be the dictionary output by scikit-learn's class weights.

alpha = weights[numpy.argmax(y_true)]

This line would select the proper alpha for the current class: a big number for smaller classes and a small number for bigger classes.
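
A sketch of how such per-class weights could be computed with scikit-learn (y_int is a hypothetical array of integer class labels):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_int = np.array([0, 0, 0, 0, 1, 1, 2])    # hypothetical imbalanced labels
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_int), y=y_int)
print(weights)  # rarer classes get larger weights: approx. [0.583 1.167 2.333]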

Bug in categorical_focal_loss?

I have tested the code of categorical_focal_loss and I think I have found a bug. If I understand it correctly, the focal loss should equal the cross-entropy when gamma = 0.0 and alpha = 1.0. However, that is not the case: comparing the output against the Keras cross-entropy, the result is only equal up to a factor.

import numpy as np
import tensorflow as tf

# categorical_focal_loss is the factory defined in this repo

a = tf.constant([1., 0., 0., 0., 1., 0., 0., 0., 1.], shape=[3, 3])
print(a)

b = tf.constant([.9, .05, .05, .5, .89, .6, .05, .01, .94], shape=[3, 3])
print(b)

loss = tf.keras.backend.categorical_crossentropy(a, b)
print(np.around(loss, 5))

cfl = categorical_focal_loss(gamma=0., alpha=1.)
loss = cfl(a, b)
print(np.around(loss, 5))

It turns out that your function should return the sum of the loss instead of the mean, that is:

return K.sum(loss, axis=-1)

Am I right?

Multi-label Classification task

I just wanted to know if this can be applied to a multi-label classification problem with sigmoid output activations, i.e. where multiple labels can be 1 at the same time, so the sum of the probabilities is not necessarily 1 (as it would be with softmax).

I came to your repo from this issue. Please let me know which loss function I can use in this scenario. I looked at the code but wasn't entirely sure that the binary_focal_loss function is suitable for this problem; it looked to me as if it is only for binary classification, not for a multi-label classification task.
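
Since the binary focal loss is computed element-wise on each sigmoid output, it should in principle extend to multi-label targets. A minimal sketch (binary_focal_loss is the factory from this repo, e.g. as sketched in the first issue above; layer sizes are made up):

from tensorflow.keras import layers, models

# Each of the 5 labels gets an independent sigmoid, so any subset can be 1.
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,)),
    layers.Dense(5, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss=binary_focal_loss(gamma=2., alpha=.25),
              metrics=['binary_accuracy'])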

A fix so that you can weight specific pixels, as in a 2D segmentation problem with only partially / weakly labeled pixels

Let's assume a 2D multi-category segmentation problem where label batches have size [b, h, w, 1] and y_pred is [b, h, w, c], where c is the number of classes. Now suppose you only have labels for some of the pixels; call this mask w, of size [b, h, w] with values in {0, 1}, an indicator of whether a label is absent / present.

tf.keras computes weighted losses and metrics as loss() * w.

To have tf.keras take advantage of this weight, you have to remove K.mean() from the end of the categorical loss. It should be just:

# Sum over the class axis, leaving one loss value per pixel
return K.sum(loss, axis=-1)

This results in an output of shape [b, h, w], the same as the shape of w.
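
A quick shape check of that claim (plain cross-entropy stands in for the focal term; all sizes are made up):

import tensorflow as tf
from tensorflow.keras import backend as K

b, h, w, c = 2, 4, 4, 3
y_true = tf.one_hot(tf.random.uniform([b, h, w], maxval=c, dtype=tf.int32), c)
y_pred = tf.nn.softmax(tf.random.normal([b, h, w, c]))

# Summing only over the class axis leaves one loss value per pixel.
loss = K.sum(-y_true * K.log(K.clip(y_pred, K.epsilon(), 1.)), axis=-1)
print(loss.shape)  # (2, 4, 4) -- matches a [b, h, w] pixel mask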

Different batch size, different loss value

The Keras custom loss function expects inputs of shape (batch_size, output_shape) or (output_shape, batch_size), but your code calculates the loss of a single output picture, so the loss value varies with the batch size. The first reported value was calculated with batch_size = 1, the second with batch_size = 10. [screenshots comparing the two loss values omitted]

Is the categorical one implemented right?

I think the cross entropy part is wrong.

def categorical_focal_loss(gamma=2., alpha=.25):
    def categorical_focal_loss_fixed(y_true, y_pred):
        """
        :param y_true: A tensor of the same shape as `y_pred`
        :param y_pred: A tensor resulting from a softmax
        :return: Output tensor.
        """
        # Scale predictions so that the class probas of each sample sum to 1
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)

        # Clip the prediction value to prevent NaN's and Inf's
        epsilon = K.epsilon()
        y_pred = K.clip(y_pred, epsilon, 1. - epsilon)

        # Calculate Cross Entropy
        cross_entropy = -y_true * K.log(y_pred)

        # Calculate Focal Loss
        loss = alpha * K.pow(1 - y_pred, gamma) * cross_entropy

        # Compute mean loss in mini_batch
        return K.mean(K.sum(loss, axis=-1))

    return categorical_focal_loss_fixed

So currently cross_entropy = -y_true * K.log(y_pred), which already ignores all the true-negative cases. I think the right implementation should be as below:

...
loss = - y_true * K.log(y_pred) * K.pow(1 - y_pred, gamma) * alpha \
                        - (1 - y_true) * K.log(1 - y_pred) * K.pow(y_pred, gamma) * (1 - alpha)
...

Let me know if this makes sense. When I test this loss on my model, the new one works much better than the old one.
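
For reference, a self-contained sketch of the variant proposed above (assuming the same Keras backend conventions as the repo; the final reduction is my guess):

import tensorflow as tf
from tensorflow.keras import backend as K

def symmetric_categorical_focal_loss(gamma=2., alpha=.25):
    def loss_fn(y_true, y_pred):
        epsilon = K.epsilon()
        y_pred = K.clip(y_pred, epsilon, 1. - epsilon)
        # The positive term focuses on under-confident positives, while the
        # added negative term penalizes over-confident false positives.
        loss = - y_true * K.log(y_pred) * K.pow(1 - y_pred, gamma) * alpha \
               - (1 - y_true) * K.log(1 - y_pred) * K.pow(y_pred, gamma) * (1 - alpha)
        return K.mean(K.sum(loss, axis=-1))
    return loss_fn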

categorical focal loss

Hi,
Thanks for everything. Can you do the same for categorical focal loss? Other packages don't work.

Multi-class classification case

In the RetinaNet paper, gamma is a single parameter, but we see in the formula that each class i gets a different class weight alpha_i in order to account for class imbalance.

In this implementation, alpha is also given as a single value, just like gamma, instead of one alpha value per class, and I cannot understand why.

Obviously I'm talking about the multi-class case, not the binary one. In the binary case a single alpha is enough, as one class is weighted with alpha and the other with (1 - alpha). But I'm wondering what happens in a three-class classification?

Did I miss something here? Thank you for your help.
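
One way the per-class alpha_i from the paper could be wired in (a hypothetical extension for illustration, not the repo's current API):

import numpy as np
from tensorflow.keras import backend as K

def categorical_focal_loss_per_class(alpha, gamma=2.):
    # alpha: array of shape (num_classes,), one weight per class (hypothetical).
    alpha = K.constant(np.asarray(alpha, dtype='float32'))

    def loss_fn(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1. - eps)
        cross_entropy = -y_true * K.log(y_pred)
        # Broadcasting applies alpha_i to the i-th class column.
        return K.mean(K.sum(alpha * K.pow(1 - y_pred, gamma) * cross_entropy, axis=-1))

    return loss_fn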

Can I use it with TensorFlow 1.x?

I tried to use it in my model, which is based on TensorFlow 1.x, but it throws an error. I have one-hot encoded multiclass labels. How can I use focal loss in my code?

Error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[36,525] = 38 is not in [0, 38)
[[{{node GatherV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

A cleaner pattern to make custom_objects simpler

import tensorflow as tf


class categorical_focal_loss:
    '''
    Softmax version of focal loss.

           m
      FL = sum  -alpha * (1 - p_o,c)^gamma * y_o,c * log(p_o,c)
          c=1

      where m = number of classes, c = class and o = observation

    Parameters:
      alpha -- the same as weighing factor in balanced cross entropy
      gamma -- focusing parameter for modulating factor (1-p)

    Default value:
      gamma -- 2.0 as mentioned in the paper
      alpha -- 0.25 as mentioned in the paper

    References:
        Official paper: https://arxiv.org/pdf/1708.02002.pdf
        https://www.tensorflow.org/api_docs/python/tf/keras/backend/categorical_crossentropy

    Usage:
     model.compile(loss=[categorical_focal_loss(alpha=.25, gamma=2)], metrics=["accuracy"], optimizer=adam)
    '''

    def __init__(self, gamma=2., alpha=.25):
        self._gamma = gamma
        self._alpha = alpha
        self.__name__ = 'categorical_focal_loss'

    def __call__(self, y_true, y_pred):
        '''
        :param y_true: A tensor of the same shape as `y_pred`
        :param y_pred: A tensor resulting from a softmax
        :return: Output tensor.
        '''
        # Scale predictions so that the class probas of each sample sum to 1
        y_pred /= tf.keras.backend.sum(y_pred, axis=-1, keepdims=True)

        # Clip the prediction value to prevent NaN's and Inf's
        epsilon = tf.keras.backend.epsilon()
        y_pred = tf.keras.backend.clip(y_pred, epsilon, 1. - epsilon)

        # Calculate Cross Entropy
        cross_entropy = -y_true * tf.keras.backend.log(y_pred)

        # Calculate Focal Loss
        loss = self._alpha * tf.keras.backend.pow(1 - y_pred, self._gamma) * cross_entropy

        # Sum the losses in mini_batch
        return tf.keras.backend.sum(loss, axis=1)

With this pattern, I don't need dill when using load_model.
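
For example, reloading then only needs the class itself (a sketch; 'model.h5' is a hypothetical path, and gamma/alpha should match training time):

from tensorflow.keras.models import load_model

# The callable instance is rebuilt directly; no dill-based deserialization needed.
model = load_model('model.h5',
                   custom_objects={'categorical_focal_loss': categorical_focal_loss(gamma=2., alpha=.25)})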

Line 77 is unnecessary

Good code.

One little remark:

In line 77 a division by the sum is done; however, this is unnecessary since the input y_pred already comes from a softmax, which means its elements sum to 1.

y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
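
Indeed, softmax outputs already sum to 1, as a quick check shows; arguably the division only guards against floating-point drift or non-softmax inputs:

import tensorflow as tf

probs = tf.nn.softmax(tf.random.normal([2, 3]))
print(tf.reduce_sum(probs, axis=-1))  # ~[1. 1.] up to float error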
