
focal-loss's Introduction

focal-loss

This code is an unofficial implementation of the focal loss from Focal Loss for Dense Object Detection. https://arxiv.org/abs/1708.02002

It is implemented as an MXNet Python custom operator.

The RetinaNet implementation is at https://github.com/unsky/RetinaNet

usage

Assuming you have put focal_loss.py in your operator path, you can use:

from your_operators.focal_loss import *

cls_prob = mx.sym.Custom(op_type='FocalLoss', name='cls_prob', data=cls_score, labels=label, alpha=0.25, gamma=2)

focal loss with softmax on KITTI (10 classes)

These are my experiments on KITTI with 10 classes; the performance on the hard classes is great!!

| [email protected] | car | van | Truck | cyclist | pedestrian | person_sitting | tram | misc | dontcare |
|---|---|---|---|---|---|---|---|---|---|
| baseline (Faster R-CNN + OHEM (1:2)) | 0.7892 | 0.7462 | 0.8465 | 0.623 | 0.4254 | 0.1374 | 0.5035 | 0.5007 | 0.1329 |
| Faster R-CNN + focal loss with softmax | 0.797 | 0.874 | 0.8959 | 0.7914 | 0.5700 | 0.2806 | 0.7884 | 0.7052 | 0.1433 |


For the parameters used in this experiment, see #5.

Note: very important!!

In my experiments, I had to use the initialization strategy from section 3.3 of the paper.

Specifically, initialize the bias of the final classification layer to b = -log((1 - pi)/pi), so that the predicted probability of every rare class starts out at roughly pi (the paper uses pi = 0.01).

Without such an initialization, in the presence of class imbalance, the loss due to the frequent class can dominate the total loss and cause instability in early training.
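For illustration, here is a minimal NumPy sketch of that bias initialization (pi = 0.01 is the paper's value; the parameter name in the comment is hypothetical and depends on your symbol):

    import numpy as np

    pi = 0.01  # prior probability assigned to each rare class at the start of training
    bias_init = -np.log((1 - pi) / pi)  # approximately -4.595

    # Applied to the final classification layer's bias, e.g. in MXNet
    # (the parameter name 'cls_score_bias' is an assumption; check your symbol):
    # arg_params['cls_score_bias'][:] = bias_init
    print(bias_init)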

Alternatively, you can try my strategy instead:

Train the model with the classical softmax for a few epochs first (for example, 3 on the KITTI dataset) and choose a small learning rate; the training loss will then behave well:

(figure: training loss curve)
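A sketch of this two-stage recipe, assuming focal_loss.py registers the 'FocalLoss' op as shown in the usage section above (the module wiring is schematic, not this repo's actual training script):

    import mxnet as mx

    def build_symbol(use_focal_loss, num_classes=10):
        data = mx.sym.Variable('data')
        label = mx.sym.Variable('label')
        cls_score = mx.sym.FullyConnected(data=data, num_hidden=num_classes, name='cls_score')
        if use_focal_loss:
            # importing focal_loss.py registers the 'FocalLoss' custom op
            return mx.sym.Custom(op_type='FocalLoss', name='cls_prob',
                                 data=cls_score, labels=label, alpha=0.25, gamma=2)
        return mx.sym.SoftmaxOutput(data=cls_score, label=label, name='cls_prob')

    # Stage 1: a few epochs (e.g. 3) with the classical softmax.
    # mod = mx.mod.Module(build_symbol(False), ...); mod.fit(train_iter, num_epoch=3, ...)
    # Stage 2: rebuild with focal loss, reuse the stage-1 weights, small learning rate.
    # mod = mx.mod.Module(build_symbol(True), ...)
    # mod.fit(train_iter, arg_params=stage1_args, aux_params=stage1_auxs,
    #         optimizer_params={'learning_rate': 1e-4}, ...)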

For a discussion of alpha, see #4.

Focal loss with softmax now works well.

The focal loss value is not computed in focal_loss.py, because this layer only needs to forward cls_prob; the main job of focal_loss.py is to backpropagate the focal loss gradient.
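For illustration, a simplified sketch of such a custom operator (this is not the repository's exact focal_loss.py: it omits use_ignore handling and any batch-specific reshaping, and it normalizes the gradient by the batch size as one plausible choice; the gradient formulas are derived later in this README):

    import mxnet as mx
    import numpy as np

    class FocalLossOp(mx.operator.CustomOp):
        def __init__(self, alpha, gamma):
            super(FocalLossOp, self).__init__()
            self._alpha = alpha
            self._gamma = gamma

        def forward(self, is_train, req, in_data, out_data, aux):
            # forward just emits the softmax probabilities; the loss value
            # itself is computed in metric.py, not here
            self.assign(out_data[0], req[0], mx.nd.softmax(in_data[0], axis=-1))

        def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
            p = out_data[0].asnumpy()                        # (N, C) probabilities
            label = in_data[1].asnumpy().astype('int64').reshape(-1)
            n = p.shape[0]
            pt = np.maximum(p[np.arange(n), label], 1e-14).reshape(-1, 1)
            a, g = self._alpha, self._gamma
            # dFL/dx_j for j != t (see the derivation below)
            grad = a * np.power(1 - pt, g - 1) * (g * (-pt * p) * np.log(pt) + p * (1 - pt))
            # dFL/dx_t for the ground-truth class t
            grad[np.arange(n), label] = (a * np.power(1 - pt, g) *
                                         (g * pt * np.log(pt) + pt - 1)).reshape(-1)
            # normalize by batch size (one common choice)
            self.assign(in_grad[0], req[0], mx.nd.array(grad / n, ctx=in_data[0].context))

    @mx.operator.register('FocalLoss')
    class FocalLossProp(mx.operator.CustomOpProp):
        def __init__(self, alpha=0.25, gamma=2):
            super(FocalLossProp, self).__init__(need_top_grad=False)
            self._alpha = float(alpha)
            self._gamma = float(gamma)

        def list_arguments(self):
            return ['data', 'labels']

        def list_outputs(self):
            return ['cls_prob']

        def infer_shape(self, in_shape):
            return in_shape, [in_shape[0]], []

        def create_operator(self, ctx, shapes, dtypes):
            return FocalLossOp(self._alpha, self._gamma)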

The focal loss value should be calculated in metric.py, and the normalization is applied there.

Note that this layer does not support use_ignore.

For example:

    import numpy as np
    import mxnet as mx

    # get_rcnn_names() comes from the surrounding training code
    class RCNNLogLossMetric(mx.metric.EvalMetric):
        def __init__(self, cfg):
            super(RCNNLogLossMetric, self).__init__('RCNNLogLoss')
            self.e2e = cfg.TRAIN.END2END
            self.ohem = cfg.TRAIN.ENABLE_OHEM
            self.pred, self.label = get_rcnn_names(cfg)

        def update(self, labels, preds):
            pred = preds[self.pred.index('rcnn_cls_prob')]
            if self.ohem or self.e2e:
                label = preds[self.pred.index('rcnn_label')]
            else:
                label = labels[self.label.index('rcnn_label')]

            last_dim = pred.shape[-1]
            pred = pred.asnumpy().reshape(-1, last_dim)
            label = label.asnumpy().reshape(-1,).astype('int32')

            # filter out ignored samples (label == -1) with keep_inds
            keep_inds = np.where(label != -1)[0]
            label = label[keep_inds]
            cls = pred[keep_inds, label]  # probability of the true class

            cls += 1e-14  # avoid log(0)
            gamma = 2
            alpha = 0.25

            # focal loss: -alpha * (1 - p_t)^gamma * log(p_t)
            cls_loss = alpha * (-1.0 * np.power(1 - cls, gamma) * np.log(cls))

            # normalize by the number of kept samples
            cls_loss = np.sum(cls_loss) / len(label)
            self.sum_metric += cls_loss
            self.num_inst += label.shape[0]
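This metric can then be registered like any other MXNet metric, for example (schematic wiring; cfg comes from your training configuration):

    eval_metrics = mx.metric.CompositeEvalMetric()
    eval_metrics.add(RCNNLogLossMetric(cfg))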

The values should look like the following.

forward value:

(figure: forward focal loss values)

backward gradient value:

(figure: backward gradient values)

You can check the gradient values in your debugger if needed. By the way, this is my derivation of the backward pass; if there is a mistake, please let me know.

softmax activation:

$$p_j = \frac{e^{x_j}}{\sum_k e^{x_k}}$$

cross entropy with softmax, where $t$ is the ground-truth class:

$$CE = -\log(p_t), \qquad \frac{\partial CE}{\partial x_j} = p_j - \mathbf{1}[j = t]$$

focal loss with softmax:

$$FL = -\alpha \, (1 - p_t)^{\gamma} \log(p_t)$$

$$\frac{\partial FL}{\partial x_t} = \alpha \, (1 - p_t)^{\gamma} \left( \gamma \, p_t \log(p_t) + p_t - 1 \right)$$

$$\frac{\partial FL}{\partial x_j} = \alpha \, p_j \, (1 - p_t)^{\gamma - 1} \left( (1 - p_t) - \gamma \, p_t \log(p_t) \right), \qquad j \neq t$$
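To sanity-check these gradients, here is a small standalone NumPy script (an illustration, not part of this repo) comparing the analytic gradient against finite differences:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def focal_loss(x, t, alpha=0.25, gamma=2.0):
        p = softmax(x)
        return -alpha * (1 - p[t]) ** gamma * np.log(p[t])

    def focal_grad(x, t, alpha=0.25, gamma=2.0):
        # analytic gradient from the derivation above
        p = softmax(x)
        pt = p[t]
        g = alpha * p * (1 - pt) ** (gamma - 1) * ((1 - pt) - gamma * pt * np.log(pt))  # j != t
        g[t] = alpha * (1 - pt) ** gamma * (gamma * pt * np.log(pt) + pt - 1)           # j == t
        return g

    x, t, eps = np.random.randn(10), 3, 1e-6
    num = np.array([(focal_loss(x + eps * np.eye(10)[j], t) -
                     focal_loss(x - eps * np.eye(10)[j], t)) / (2 * eps)
                    for j in range(10)])
    assert np.allclose(num, focal_grad(x, t), atol=1e-6)
    print('analytic gradient matches finite differences')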


focal-loss's Issues

What is the meaning of the parameter α in multi-class classification?

In binary classification, α serves to up-weight the class with fewer samples. By the same reasoning, in multi-class classification there should be a separate α per class. How can a single uniform α achieve this?

Is this the correct formula for delta focal loss?

Hi @unsky,
please can you check whether I read your Focal Loss formulas correctly?

For CE, delta is:

  • if (i == j) then delta = 1-p
  • if (i != j) then delta = -p

For Focal Loss (when gamma=2), delta is:

  • if (i == j) then delta = (1-p)* alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
  • if (i != j) then delta = (-p)* alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)

where:

  • pt = softmax(i) is the probability of the correct class,
  • p = softmax(j),

and i is the ground-truth class id.

Focal Loss in Keras

Hi,
Is there any implementation of Focal Loss in Keras, or an ongoing effort? I would really appreciate collaborating on such work; please e-mail me if so.

train error

I put the file focal_loss.py under operator_py and added the following to the symbol file, in the function get_symbol_rfcn():
    if cfg.TRAIN.ENABLE_FOCAL_LOSS:
        cls_prob = mx.sym.Custom(op_type='FocalLoss', name='cls_prob', data=cls_score, alpha=0.25, gamma=2,
                                 cls_score=cls_score, bbox_pred=bbox_pred, labels=label,
                                 bbox_targets=bbox_target, bbox_weights=bbox_weight)
        bbox_loss_ = bbox_weight * mx.sym.smooth_l1(name='bbox_loss_', scalar=2.0, data=(bbox_pred - bbox_target))
        bbox_loss = mx.sym.MakeLoss(name='bbox_loss', data=bbox_loss_, grad_scale=1.0 / cfg.TRAIN.BATCH_ROIS_FOCAL_LOSS)
        label = labels
but I get the following errors:
    experiments/rfcn/../../rfcn/../lib/bbox/bbox_transform.py:129: RuntimeWarning: overflow encountered in exp
      pred_w = np.exp(dw) * widths[:, np.newaxis]
    experiments/rfcn/../../rfcn/../lib/bbox/bbox_transform.py:130: RuntimeWarning: overflow encountered in exp
      pred_h = np.exp(dh) * heights[:, np.newaxis]
    Epoch[0] Batch [100] Speed: 12.93 samples/sec Train-RPNAcc=0.856513, RPNLogLoss=0.000257, RPNL1Loss=12776018.528416, RCNNAcc=0.871269, RCNNLogLoss=nan, RCNNL1Loss=nan,
    Epoch[0] Batch [200] Speed: 12.80 samples/sec Train-RPNAcc=0.870054, RPNLogLoss=0.000180, RPNL1Loss=6419790.438247, RCNNAcc=0.870754, RCNNLogLoss=nan, RCNNL1Loss=nan,
    Epoch[0] Batch [300] Speed: 12.76 samples/sec Train-RPNAcc=0.873430, RPNLogLoss=0.000149, RPNL1Loss=4286969.716520, RCNNAcc=0.867086, RCNNLogLoss=nan, RCNNL1Loss=nan,
    Epoch[0] Batch [400] Speed: 12.66 samples/sec Train-RPNAcc=0.875658, RPNLogLoss=0.000133, RPNL1Loss=3217899.976960, RCNNAcc=0.863909,

What did I do wrong?

About the training result

Thanks for open-sourcing your code!
I tried your method on my own dataset and set everything up exactly as you described, step by step. But the result dropped sharply (70% -> 22%).
The loss curve also looks like yours. And I am puzzled about the RCNNFocalLoss: it is almost zero!?

Is there something wrong when i != j?

In the code,

dx = self._alpha*np.power(1 - pt, self._gamma - 1) * (self._gamma * (-1 * pt * pro_) * np.log(pt) + pro_ * (1 - pt)) * 1.0

However, from my derivation, it should be:
dx = self._alpha*np.power(1 - pt, self._gamma - 1) * (self._gamma * (-1 * pt * pro_) * np.log(pt) + pt * (1 - pt)) * 1.0

Is that right?

How to implement it in Faster R-CNN?

@unsky
Hi, thanks for your cool code,
but I am a little confused about how to use it. I want to implement it in Faster R-CNN; should I just change the softmax into focal loss, or something else? Can you tell me how to use it?
thanks so much.

How does it work?

Hi unsky,
thank you for sharing your code. I want to know: does the Focal Loss work well? How much improvement does it give?

test on pascal voc dataset?

hi @unsky,

have you tested focal loss on the PASCAL VOC dataset? By the way, can you share your parameters, like the hyperparameters in solver.prototxt and some parameters of the RPN and Fast R-CNN?

thanks.

Initialization problem

I initialized following section 3.3 of the paper, but the loss is very large. How did you solve this?

Initialization

What is your initialization of the detector? Is it exactly the same as the original paper, setting bias = -log((1-pi)/pi), or do you use the normal softmax for a few epochs as you previously described?

About the focal loss layer

Hi @unsky,
The performance in your experiment is amazing. By the way, did you replace SoftmaxWithLoss with the focal loss layer only in the RPN, or in both the RPN and Fast R-CNN?

Apply focal loss on sampled training examples?

I have seen your implementation of focal loss in the README. I notice that the examples with label == -1 are filtered out before the loss computation. Do you perform sampling when training the classification subnet, i.e., like Faster R-CNN, make positive : negative = 1 : 3 and then apply the focal loss? I am a little confused, because in the paper the authors say they use all examples to compute the focal loss.
