
focal-loss's Introduction

focal-loss

This code is an unofficial implementation of the focal loss from Focal Loss for Dense Object Detection. https://arxiv.org/abs/1708.02002

It is implemented as an MXNet Python custom operator.

The RetinaNet implementation is at https://github.com/unsky/RetinaNet

usage

Assuming you have put focal_loss.py in your operator path, you can use:

from your_operators.focal_loss import *

cls_prob = mx.sym.Custom(op_type='FocalLoss', name='cls_prob', data=cls_score, labels=label, alpha=0.25, gamma=2)

focal loss with softmax on KITTI (10 classes)

These are my experiments on KITTI with 10 classes; the performance on the hard classes is great!!

| [email protected] | car | van | Truck | cyclist | pedestrian | person_sitting | tram | misc | dontcare |
|---|---|---|---|---|---|---|---|---|---|
| baseline (Faster R-CNN + OHEM (1:2)) | 0.7892 | 0.7462 | 0.8465 | 0.623 | 0.4254 | 0.1374 | 0.5035 | 0.5007 | 0.1329 |
| Faster R-CNN + focal loss with softmax | 0.797 | 0.874 | 0.8959 | 0.7914 | 0.5700 | 0.2806 | 0.7884 | 0.7052 | 0.1433 |


For the parameters used in this experiment, see #5.

Note: very important!!

In my experiments, I had to use the initialization strategy from section 3.3 of the paper.

Specifically, initialize the bias of the final classification layer to b = -log((1 - pi)/pi), so that the predicted probability of every rare class starts out at roughly pi (the paper uses pi = 0.01).

Without such an initialization, in the presence of class imbalance, the loss due to the frequent class can dominate the total loss and cause instability in early training.
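For illustration, here is a minimal NumPy sketch of that bias initialization (pi = 0.01 is the paper's value; the parameter name in the comment is hypothetical and depends on your symbol):

    import numpy as np

    pi = 0.01  # prior probability assigned to each rare class at the start of training
    bias_init = -np.log((1 - pi) / pi)  # approximately -4.595

    # Applied to the final classification layer's bias, e.g. in MXNet
    # (the parameter name 'cls_score_bias' is an assumption; check your symbol):
    # arg_params['cls_score_bias'][:] = bias_init
    print(bias_init)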

Alternatively, you can try my strategy instead:

Train the model with the classical softmax for a few epochs first (for example, 3 on the KITTI dataset) and choose a small learning rate; the training loss will then behave well:

(figure: training loss curve)
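A sketch of this two-stage recipe, assuming focal_loss.py registers the 'FocalLoss' op as shown in the usage section above (the module wiring is schematic, not this repo's actual training script):

    import mxnet as mx

    def build_symbol(use_focal_loss, num_classes=10):
        data = mx.sym.Variable('data')
        label = mx.sym.Variable('label')
        cls_score = mx.sym.FullyConnected(data=data, num_hidden=num_classes, name='cls_score')
        if use_focal_loss:
            # importing focal_loss.py registers the 'FocalLoss' custom op
            return mx.sym.Custom(op_type='FocalLoss', name='cls_prob',
                                 data=cls_score, labels=label, alpha=0.25, gamma=2)
        return mx.sym.SoftmaxOutput(data=cls_score, label=label, name='cls_prob')

    # Stage 1: a few epochs (e.g. 3) with the classical softmax.
    # mod = mx.mod.Module(build_symbol(False), ...); mod.fit(train_iter, num_epoch=3, ...)
    # Stage 2: rebuild with focal loss, reuse the stage-1 weights, small learning rate.
    # mod = mx.mod.Module(build_symbol(True), ...)
    # mod.fit(train_iter, arg_params=stage1_args, aux_params=stage1_auxs,
    #         optimizer_params={'learning_rate': 1e-4}, ...)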

For a discussion of alpha, see #4.

Focal loss with softmax now works well.

The focal loss value is not computed in focal_loss.py, because this layer only needs to forward cls_prob; the main job of focal_loss.py is to backpropagate the focal loss gradient.
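For illustration, a simplified sketch of such a custom operator (this is not the repository's exact focal_loss.py: it omits use_ignore handling and any batch-specific reshaping, and it normalizes the gradient by the batch size as one plausible choice; the gradient formulas are derived later in this README):

    import mxnet as mx
    import numpy as np

    class FocalLossOp(mx.operator.CustomOp):
        def __init__(self, alpha, gamma):
            super(FocalLossOp, self).__init__()
            self._alpha = alpha
            self._gamma = gamma

        def forward(self, is_train, req, in_data, out_data, aux):
            # forward just emits the softmax probabilities; the loss value
            # itself is computed in metric.py, not here
            self.assign(out_data[0], req[0], mx.nd.softmax(in_data[0], axis=-1))

        def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
            p = out_data[0].asnumpy()                        # (N, C) probabilities
            label = in_data[1].asnumpy().astype('int64').reshape(-1)
            n = p.shape[0]
            pt = np.maximum(p[np.arange(n), label], 1e-14).reshape(-1, 1)
            a, g = self._alpha, self._gamma
            # dFL/dx_j for j != t (see the derivation below)
            grad = a * np.power(1 - pt, g - 1) * (g * (-pt * p) * np.log(pt) + p * (1 - pt))
            # dFL/dx_t for the ground-truth class t
            grad[np.arange(n), label] = (a * np.power(1 - pt, g) *
                                         (g * pt * np.log(pt) + pt - 1)).reshape(-1)
            # normalize by batch size (one common choice)
            self.assign(in_grad[0], req[0], mx.nd.array(grad / n, ctx=in_data[0].context))

    @mx.operator.register('FocalLoss')
    class FocalLossProp(mx.operator.CustomOpProp):
        def __init__(self, alpha=0.25, gamma=2):
            super(FocalLossProp, self).__init__(need_top_grad=False)
            self._alpha = float(alpha)
            self._gamma = float(gamma)

        def list_arguments(self):
            return ['data', 'labels']

        def list_outputs(self):
            return ['cls_prob']

        def infer_shape(self, in_shape):
            return in_shape, [in_shape[0]], []

        def create_operator(self, ctx, shapes, dtypes):
            return FocalLossOp(self._alpha, self._gamma)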

The focal loss value should be calculated in metric.py, and the normalization is applied there.

Note that this layer does not support use_ignore.

For example:

    import numpy as np
    import mxnet as mx

    # get_rcnn_names() comes from the surrounding training code
    class RCNNLogLossMetric(mx.metric.EvalMetric):
        def __init__(self, cfg):
            super(RCNNLogLossMetric, self).__init__('RCNNLogLoss')
            self.e2e = cfg.TRAIN.END2END
            self.ohem = cfg.TRAIN.ENABLE_OHEM
            self.pred, self.label = get_rcnn_names(cfg)

        def update(self, labels, preds):
            pred = preds[self.pred.index('rcnn_cls_prob')]
            if self.ohem or self.e2e:
                label = preds[self.pred.index('rcnn_label')]
            else:
                label = labels[self.label.index('rcnn_label')]

            last_dim = pred.shape[-1]
            pred = pred.asnumpy().reshape(-1, last_dim)
            label = label.asnumpy().reshape(-1,).astype('int32')

            # filter out ignored samples (label == -1) with keep_inds
            keep_inds = np.where(label != -1)[0]
            label = label[keep_inds]
            cls = pred[keep_inds, label]  # probability of the true class

            cls += 1e-14  # avoid log(0)
            gamma = 2
            alpha = 0.25

            # focal loss: -alpha * (1 - p_t)^gamma * log(p_t)
            cls_loss = alpha * (-1.0 * np.power(1 - cls, gamma) * np.log(cls))

            # normalize by the number of kept samples
            cls_loss = np.sum(cls_loss) / len(label)
            self.sum_metric += cls_loss
            self.num_inst += label.shape[0]
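This metric can then be registered like any other MXNet metric, for example (schematic wiring; cfg comes from your training configuration):

    eval_metrics = mx.metric.CompositeEvalMetric()
    eval_metrics.add(RCNNLogLossMetric(cfg))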

The values should look like the following.

forward value:

(figure: forward focal loss values)

backward gradient value:

(figure: backward gradient values)

You can check the gradient values in your debugger if needed. By the way, this is my derivation of the backward pass; if there is a mistake, please let me know.

softmax activation:

$$p_j = \frac{e^{x_j}}{\sum_k e^{x_k}}$$

cross entropy with softmax, where $t$ is the ground-truth class:

$$CE = -\log(p_t), \qquad \frac{\partial CE}{\partial x_j} = p_j - \mathbf{1}[j = t]$$

focal loss with softmax:

$$FL = -\alpha \, (1 - p_t)^{\gamma} \log(p_t)$$

$$\frac{\partial FL}{\partial x_t} = \alpha \, (1 - p_t)^{\gamma} \left( \gamma \, p_t \log(p_t) + p_t - 1 \right)$$

$$\frac{\partial FL}{\partial x_j} = \alpha \, p_j \, (1 - p_t)^{\gamma - 1} \left( (1 - p_t) - \gamma \, p_t \log(p_t) \right), \qquad j \neq t$$
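To sanity-check these gradients, here is a small standalone NumPy script (an illustration, not part of this repo) comparing the analytic gradient against finite differences:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def focal_loss(x, t, alpha=0.25, gamma=2.0):
        p = softmax(x)
        return -alpha * (1 - p[t]) ** gamma * np.log(p[t])

    def focal_grad(x, t, alpha=0.25, gamma=2.0):
        # analytic gradient from the derivation above
        p = softmax(x)
        pt = p[t]
        g = alpha * p * (1 - pt) ** (gamma - 1) * ((1 - pt) - gamma * pt * np.log(pt))  # j != t
        g[t] = alpha * (1 - pt) ** gamma * (gamma * pt * np.log(pt) + pt - 1)           # j == t
        return g

    x, t, eps = np.random.randn(10), 3, 1e-6
    num = np.array([(focal_loss(x + eps * np.eye(10)[j], t) -
                     focal_loss(x - eps * np.eye(10)[j], t)) / (2 * eps)
                    for j in range(10)])
    assert np.allclose(num, focal_grad(x, t), atol=1e-6)
    print('analytic gradient matches finite differences')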


focal-loss's Issues

What is the meaning of the parameter α in multi-class classification?

In binary classification, α serves to up-weight the class with fewer samples. By the same reasoning, in multi-class classification there should be a separate α per class. How can a single uniform α achieve this?

Is this the correct formula for delta focal loss?

Hi @unsky,
please can you check whether I read your Focal Loss formulas correctly?

For CE, delta is:

  • if (i == j) then delta = 1-p
  • if (i != j) then delta = -p

For Focal Loss (when gamma=2), delta is:

  • if (i == j) then delta = (1-p)* alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
  • if (i != j) then delta = (-p)* alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)

where:

  • pt = softmax(i) is the probability of the correct class,
  • p = softmax(j),

and i is the ground-truth class id.

Focal Loss in Keras

Hi,
Is there any implementation of Focal Loss in Keras, or an ongoing effort? I would really appreciate collaborating on such work; please e-mail me if so.

train error

I put the file focal_loss.py under operator_py and added the following to the symbol file, in the function get_symbol_rfcn():
    if cfg.TRAIN.ENABLE_FOCAL_LOSS:
        cls_prob = mx.sym.Custom(op_type='FocalLoss', name='cls_prob', data=cls_score, alpha=0.25, gamma=2,
                                 cls_score=cls_score, bbox_pred=bbox_pred, labels=label,
                                 bbox_targets=bbox_target, bbox_weights=bbox_weight)
        bbox_loss_ = bbox_weight * mx.sym.smooth_l1(name='bbox_loss_', scalar=2.0, data=(bbox_pred - bbox_target))
        bbox_loss = mx.sym.MakeLoss(name='bbox_loss', data=bbox_loss_, grad_scale=1.0 / cfg.TRAIN.BATCH_ROIS_FOCAL_LOSS)
        label = labels
but I get the following errors:
    experiments/rfcn/../../rfcn/../lib/bbox/bbox_transform.py:129: RuntimeWarning: overflow encountered in exp
      pred_w = np.exp(dw) * widths[:, np.newaxis]
    experiments/rfcn/../../rfcn/../lib/bbox/bbox_transform.py:130: RuntimeWarning: overflow encountered in exp
      pred_h = np.exp(dh) * heights[:, np.newaxis]
    Epoch[0] Batch [100] Speed: 12.93 samples/sec Train-RPNAcc=0.856513, RPNLogLoss=0.000257, RPNL1Loss=12776018.528416, RCNNAcc=0.871269, RCNNLogLoss=nan, RCNNL1Loss=nan,
    Epoch[0] Batch [200] Speed: 12.80 samples/sec Train-RPNAcc=0.870054, RPNLogLoss=0.000180, RPNL1Loss=6419790.438247, RCNNAcc=0.870754, RCNNLogLoss=nan, RCNNL1Loss=nan,
    Epoch[0] Batch [300] Speed: 12.76 samples/sec Train-RPNAcc=0.873430, RPNLogLoss=0.000149, RPNL1Loss=4286969.716520, RCNNAcc=0.867086, RCNNLogLoss=nan, RCNNL1Loss=nan,
    Epoch[0] Batch [400] Speed: 12.66 samples/sec Train-RPNAcc=0.875658, RPNLogLoss=0.000133, RPNL1Loss=3217899.976960, RCNNAcc=0.863909,

What did I do wrong?

About the training result

Thanks for open-sourcing your code!
I tried your method on my own dataset and set everything up exactly as you described, step by step. But the result dropped sharply (70% -> 22%).
The loss curve also looks like yours. And I am puzzled about the RCNNFocalLoss: it is almost zero!?

Is there something wrong when i != j?

In the code,

dx = self._alpha*np.power(1 - pt, self._gamma - 1) * (self._gamma * (-1 * pt * pro_) * np.log(pt) + pro_ * (1 - pt)) * 1.0

However, from my derivation, it should be:
dx = self._alpha*np.power(1 - pt, self._gamma - 1) * (self._gamma * (-1 * pt * pro_) * np.log(pt) + pt * (1 - pt)) * 1.0

Is that right?

How to implement it in Faster R-CNN?

@unsky
Hi, thanks for your cool code,
but I am a little confused about how to use it. I want to implement it in Faster R-CNN; should I just change the softmax into focal loss, or something else? Can you tell me how to use it?
thanks so much.

How does it work?

Hi unsky,
thank you for sharing your code. I want to know: does the Focal Loss work well? How much improvement does it give?

test on pascal voc dataset?

hi @unsky,

have you tested focal loss on the PASCAL VOC dataset? By the way, can you share your parameters, like the hyperparameters in solver.prototxt and some parameters of the RPN and Fast R-CNN?

thanks.

Initialization problem

I initialized following section 3.3 of the paper, but the loss is very large. How did you solve this?

Initialization

What is your initialization of the detector? Is it exactly the same as the original paper, setting bias = -log((1-pi)/pi), or do you use the normal softmax for a few epochs as you previously described?

About the focal loss layer

Hi @unsky,
The performance in your experiment is amazing. By the way, did you replace SoftmaxWithLoss with the focal loss layer only in the RPN, or in both the RPN and Fast R-CNN?

Apply focal loss on sampled training examples?

I have seen your implementation of focal loss in the README. I notice that the examples with label == -1 are filtered out before the loss computation. Do you perform sampling when training the classification subnet, i.e., like Faster R-CNN, make positive : negative = 1 : 3 and then apply the focal loss? I am a little confused, because in the paper the authors say they use all examples to compute the focal loss.
