pytorch-retinanet's People

Contributors

tarrencev

pytorch-retinanet's Issues

Code state/documentation

Hi @kuangliu, does this code follow the paper exactly? Also, are you able to document it sufficiently so that we can get started on it using VOC?

Changing the base network (to VGG, ResNet, DenseNet, etc.)

It would be nice to be able to train with other standard backbones as well, since FPN requires too much memory. Right now, changing the base network in retinanet.py (e.g., to VGG) does not work out of the box.

Has anybody tried this and gotten it working?
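
For reference, here is a minimal sketch of one way to do it (my own assumption about how to slice torchvision's vgg16, not code from this repo): expose three VGG16 stages at strides 8/16/32 so the FPN lateral connections have something analogous to C3-C5 to attach to.

import torch.nn as nn
import torchvision.models as models

class VGGBackbone(nn.Module):
    """Expose VGG16 stages at strides 8/16/32 (analogous to C3-C5).
    Note the channel counts (256, 512, 512) differ from ResNet50's
    (512, 1024, 2048), so the FPN lateral 1x1 convs must be adapted."""
    def __init__(self):
        super(VGGBackbone, self).__init__()
        features = models.vgg16(pretrained=True).features
        self.stage3 = features[:17]    # through pool3 -> stride 8
        self.stage4 = features[17:24]  # through pool4 -> stride 16
        self.stage5 = features[24:31]  # through pool5 -> stride 32

    def forward(self, x):
        c3 = self.stage3(x)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        return c3, c4, c5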

Can I get the same result as the paper?

Hi, thank you for sharing your implementation.

Is it possible to get the same (or similar) mAP as the paper?
I would really appreciate it if you could tell me the final performance.

How to solve the 'ZeroDivisionError: float division by zero' problem?

My GPU memory is 2G.
Therefore, I use batch=1 to run this program.
However, I hit an error that forces me to stop.
What should I do to get results similar to the paper's with my configuration?
Thank you~

Error:
Traceback (most recent call last):
File "./t.py", line 1098, in
train(epoch)
File "./t.py", line 1059, in train
loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "./t.py", line 984, in forward
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.data[0]/num_pos, cls_loss.data[0]/num_pos), end=' | ')
ZeroDivisionError: float division by zero
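
The division that fails is loc_loss.data[0]/num_pos, so the likely cause is a batch that contains no positive anchors, which is easy to hit with batch=1. A minimal guard, sketched against what the traceback shows (the surrounding loss code is an assumption):

num_pos = max(num_pos, 1)  # a batch with no ground-truth boxes has zero positive anchors
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.data[0]/num_pos, cls_loss.data[0]/num_pos))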

Various Small Fixes

  • The new convolution layers' initialization doesn't match the original RetinaNet paper. The bias should be set to 0 and the weights drawn from a normal distribution with σ = 0.01.
  • The last layers of the classification and box-regression subnets don't have the proper initialization (this might explain the NaN errors; the paper notes that omitting it causes instability). The bias of each should be set according to the π value chosen in the paper (see the sketch after this list):
π = 0.01
 − log((1 − π)/π) = -4.59511985013459
  • ResNet isn't pretrained. This seems very fixable. My current solution is to subclass ResNet and then override the forward method. This allows the weights to be loaded while the network produces a completely different output. You structured your model differently, so I'm not sure if this would work for your structure.
  • The COCO dataloader is missing. Seems like something that would be good to verify works, as VOC seems to work well (judging from the loss).
  • A detection script that can predict boxes on an image would help in inspecting the network's outputs. I cannot figure out how to do this in an elegant way, so I would love your input. I'm confused about how this implementation works with anchor points and how to transform the predictions back to image space.
  • The activation function should be sigmoid. You fixed this 👍
  • The learning-rate schedule does not match the paper, but the paper was working with a pretrained ResNet and training on the COCO dataset.
  • The RetinaNet paper recommends flip augmentation; it seems very easy to add.
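
A minimal sketch of the two initialization fixes above (assuming each subnet is a stack of Conv2d layers whose last layer produces the logits; the function name is hypothetical):

import math
import torch.nn as nn

def init_subnet(subnet, pi=0.01, is_classifier=False):
    # weights ~ N(0, sigma=0.01), biases 0, per the RetinaNet paper
    convs = [m for m in subnet.modules() if isinstance(m, nn.Conv2d)]
    for m in convs:
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.constant_(m.bias, 0.0)
    if is_classifier:
        # prior so training starts by predicting background everywhere
        nn.init.constant_(convs[-1].bias, -math.log((1 - pi) / pi))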

Anyway, just let me know what you think, and I can open a pull request for these things if you'd like. I have already written most of these fixes for work and would love to merge them into your repo.

focal loss back propagation

This is not an issue but a question.

I think the terms (1-p)^gamma and p^gamma in the focal loss are for weighting only. They should not be backpropagated during gradient descent. Am I correct?

If so, do you need to detach() the variables used to compute the weight terms in your focal loss function?
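
For reference, a sketch of the detached variant this question describes (my own illustration, not the repo's loss; note that as the loss is usually implemented, gradients do flow through the modulating factor):

import torch.nn.functional as F

def focal_loss_detached(x, t, alpha=0.25, gamma=2.0):
    # x: raw logits, t: float {0,1} targets of the same shape
    p = x.sigmoid().detach()             # no gradient through the weight term
    pt = p * t + (1 - p) * (1 - t)       # pt = p if t == 1 else 1 - p
    w = (alpha * t + (1 - alpha) * (1 - t)) * (1 - pt).pow(gamma)
    return F.binary_cross_entropy_with_logits(x, t, weight=w, reduction='sum')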

Questions about datagen.py

Hi, I ran into a problem when using this project to train on the WIDER FACE detection dataset. I created labels in the format your script expects (name.jpg xmin ymin xmax ymax). When I debugged to check whether I had it right, the dataloader printed data of these shapes:
(1L, 3L, 600L, 600L)
(1L, 67995L, 4L)
(1L, 67995L)
Why does the second dimension become 67995? I found that the Dataset's __getitem__ method creates data of shapes
(1L, 3L, 600L, 600L)
(1L, 4L)
(1L)
I haven't used PyTorch before. Is something wrong with PyTorch, or with my label file?
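
This is expected behavior: the collate_fn calls the encoder, which turns the per-image box list into one regression/class target per anchor. For a 600x600 input with 9 anchors per cell over strides 8-128, that is exactly 67995 anchors. A quick check (assuming the repo's ceil-based feature-map sizing):

import math
num_anchors = sum(math.ceil(600 / 2 ** (i + 3)) ** 2 * 9 for i in range(5))
print(num_anchors)  # 67995 = 9 * (75^2 + 38^2 + 19^2 + 10^2 + 5^2)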

transform_issues

In boxes[:,1::2].clamp_(min=0, max=h-1), the boxes' ymax is not clamped.
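
For reference, clamping both axes of xyxy boxes would look like this (a sketch, assuming the columns are (xmin, ymin, xmax, ymax)):

boxes[:, 0::2].clamp_(min=0, max=w - 1)  # xmin, xmax
boxes[:, 1::2].clamp_(min=0, max=h - 1)  # ymin, ymax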

A question about the RPN network

Does your code not use an RPN? It looks like the final classification and regression are done directly at every pixel of each feature map, with the intermediate RPN removed.

mAP calculation

Hi, can you also provide an evaluation script to compute mAP after training? Thanks.

Loss is NaN

When I'm training RetinaNet, cls_loss easily becomes NaN. Does anyone know the reason?

loc_loss: 0.084 | cls_loss: nan | train_loss: nan | avg_loss: nan

How to run train.py?

python train.py
==> Preparing data..

Epoch: 0
Traceback (most recent call last):
File "train.py", line 114, in
train(epoch)
File "train.py", line 75, in train
loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
File "/home/hs/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/hs/hs/pytorch/pytorch-retinanet-580/loss.py", line 99, in forward
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
File "/home/hs/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 320, in rdiv
return self.reciprocal() * other
RuntimeError: reciprocal is not implemented for type torch.cuda.LongTensor
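
The divisor num_pos is a LongTensor here, and dividing a float scalar by it routes through reciprocal(), which is not implemented for torch.cuda.LongTensor. One plausible fix (an assumption about the surrounding loss.py code) is to convert it to a Python number first:

num_pos = pos.sum().item()  # plain Python int instead of a LongTensor
num_pos = max(num_pos, 1)   # also guards the no-positive-anchor case
loc_loss = loc_loss / num_pos
cls_loss = cls_loss / num_pos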

Is there any pretrained model available?

This PyTorch version is quite helpful for research! I want to do transfer learning on other datasets. Is anyone willing to share a pretrained model on VOC or COCO? Thanks very much!

alternative loss is numerically unstable

loss = -w*pt.log() / 2

Because of this line, the loss function is numerically unstable:
pt.log() becomes -inf when the pt value goes to zero.
pt is a sigmoid output, so it can reach zero.
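
Since pt is a sigmoid output, log(pt) can be computed stably with log-sigmoid instead of applying sigmoid first and log afterwards. A sketch against the repo's focal_loss_alt (assuming pt = (2*xt+1).sigmoid() as in that function):

import torch.nn.functional as F

# log(sigmoid(z)) == F.logsigmoid(z), which never returns -inf
loss = -w * F.logsigmoid(2 * xt + 1) / 2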

Problems encountered during testing

print('Loading model..')
net = RetinaNet()
net.load_state_dict(torch.load('./checkpoint/params.pth')['net'])
net.eval()

Specify retain_graph=True when calling backward the first time

Hello, when I run retinanet.py, it shows this error:

Traceback (most recent call last):
  File "retinanet.py", line 58, in <module>
    test()
  File "retinanet.py", line 56, in test
    cls_preds.backward(cls_grads)
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

How can I fix it? Can you give some advice? Thank you.
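
The test() function calls backward() twice on outputs of the same graph, and the first call frees the graph's buffers. A minimal fix, following the error message (the loc_preds/loc_grads names are my assumption about the repo's test code):

loc_preds.backward(loc_grads, retain_graph=True)  # keep buffers for the second call
cls_preds.backward(cls_grads)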

Undefined names: TOTAL_BAR_LENGTH, last_time, term_width

Perhaps these four lines should not be commented out. https://github.com/kuangliu/pytorch-retinanet/blob/master/utils.py#L201-L204

flake8 testing of https://github.com/kuangliu/pytorch-retinanet on Python 2.7.13

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./utils.py:211:19: F821 undefined name 'TOTAL_BAR_LENGTH'
    cur_len = int(TOTAL_BAR_LENGTH*current/total)
                  ^

./utils.py:212:20: F821 undefined name 'TOTAL_BAR_LENGTH'
    rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1
                   ^

./utils.py:223:28: F821 undefined name 'last_time'
    step_time = cur_time - last_time
                           ^

./utils.py:235:20: F821 undefined name 'term_width'
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
                   ^

./utils.py:235:35: F821 undefined name 'TOTAL_BAR_LENGTH'
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
                                  ^

./utils.py:239:20: F821 undefined name 'term_width'
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)):
                   ^

./utils.py:239:35: F821 undefined name 'TOTAL_BAR_LENGTH'
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)):
                                  ^

7       F821 undefined name 'TOTAL_BAR_LENGTH'

Please help me, I don't know how to run train.py

Can you tell me?

Traceback (most recent call last):
File "train.py", line 15, in
from loss import FocalLoss
File "H:\datasets\tianchi_lvcai\tianchi_lvcai_fusai\pytorch-retinanet-master\loss.py", line 7, in
from utils import one_hot_embedding
File "H:\datasets\tianchi_lvcai\tianchi_lvcai_fusai\pytorch-retinanet-master\utils.py", line 242, in
_, term_width = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)
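
os.popen('stty size', 'r') returns an empty string when no terminal is attached (e.g. on Windows), so the split yields nothing to unpack. A common workaround (a sketch, not the repo's code) is to fall back to a fixed width:

import os

try:
    _, term_width = os.popen('stty size', 'r').read().split()
    term_width = int(term_width)
except ValueError:
    term_width = 80  # no terminal available; use a default width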

encoder

@kuangliu, I am using PyTorch 0.12, but I find that in the DataEncoder class's encode function, "boxes = boxes[max_idx]" does not work. Do you mean that this operation enlarges the boxes tensor to the anchor count? Could I be unable to use this operation because of the version difference?

Thanks

index out of range

When I use this model to train on COCO 2017 with num_classes set to 80, I get an error like this:
RuntimeError: index out of range at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/TH/generic/THTensorMath.c:277
which occurs at "pytorch-retinanet/utils.py", line 230, in one_hot_embedding: return y[labels].
What happened, and why? How can I solve this problem?
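
One likely cause (an assumption, since the dataset code isn't shown): COCO category ids are not contiguous and run up to 90, so indexing an 81-row one-hot table with a raw category id goes out of range. Remapping the ids to contiguous labels before encoding avoids this; coco below stands for a pycocotools COCO handle:

# map raw COCO category ids (1..90, with gaps) to contiguous labels 1..80
cat_ids = sorted(coco.getCatIds())
id_to_label = {cid: i + 1 for i, cid in enumerate(cat_ids)}  # 0 stays background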

Expected object of type torch.LongTensor but found type torch.FloatTensor

torch version: 0.4.1
When I use Python 2.7 to train the project, I hit the following problem:
python train.py
==> Preparing data..

Epoch: 0
Traceback (most recent call last):
File "train.py", line 114, in
train(epoch)
File "train.py", line 68, in train
for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/wq/retinaNet_pytorch/pytorch-retinanet-master_old/datagen.py", line 120, in collate_fn
loc_target, cls_target = self.encoder.encode(boxes[i].long(), labels[i].long(), input_size=(w,h))
File "/home/wq/retinaNet_pytorch/pytorch-retinanet-master_old/encoder.py", line 78, in encode
anchor_boxes = self._get_anchor_boxes(input_size)
File "/home/wq/retinaNet_pytorch/pytorch-retinanet-master_old/encoder.py", line 52, in _get_anchor_boxes
xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other

I trained this project on VOC2012, downloaded myself. I tried to correct this problem, but my changes caused other, similar problems.
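
The multiplication that fails is xy * grid_size, where xy comes out of meshgrid as a LongTensor while grid_size is a float. A plausible fix (an assumption about encoder.py, which isn't shown in full) is to build the grid in float, and to keep the boxes float as well, since box regression needs fractional coordinates:

# in _get_anchor_boxes: build the grid in float
xy = meshgrid(fm_w, fm_h).float() + 0.5
xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, 9, 2)

# in collate_fn: don't cast boxes to long; regression targets need floats
loc_target, cls_target = self.encoder.encode(boxes[i].float(), labels[i].long(), input_size=(w, h))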

Question about `anchor_areas`

The __init__ function in encoder.py first sets the anchor area for each feature map (p3 --> p7):
self.anchor_areas = [32 * 32., 64 * 64., 128 * 128., 256 * 256., 512 * 512.]
and then combines them with the anchor locations:

wh = self.anchor_wh[i].view(1, 1, 9, 2).expand(fm_h, fm_w, 9, 2)
box = torch.cat([xy, wh], 3)

I do think the anchor areas should be adjusted to the actual object sizes, especially when the input image is small. Given that we encode the boxes in advance, we have to take care when setting the anchor areas.

Is this understanding right?

value cannot be converted to type double without overflow:

Is it necessary to install a specific PyTorch version, or to apply some trick, to get rid of this?

==> Preparing data..

Epoch: 0
loc_loss: 0.116 | cls_loss: 3791.763 | train_loss: 3791.878 | avg_loss: 3791.878
loc_loss: 0.088 | cls_loss: 1283.638 | train_loss: 1283.725 | avg_loss: 2537.802
loc_loss: 0.093 | cls_loss: 8380.014 | train_loss: 8380.107 | avg_loss: 4485.237
loc_loss: 0.095 | cls_loss: 2.312 | train_loss: 2.407 | avg_loss: 3364.530
Traceback (most recent call last):
File "train.py", line 114, in
train(epoch)
File "train.py", line 75, in train
loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/xiangyong/Workbench/pytorch-retinanet-kuangliu/loss.py", line 92, in forward
cls_loss = self.focal_loss_alt(masked_cls_preds, cls_targets[pos_neg])
File "/home/xiangyong/Workbench/pytorch-retinanet-kuangliu/loss.py", line 60, in focal_loss_alt
return loss.sum()
RuntimeError: value cannot be converted to type double without overflow: inf

Issue collection

Hey guys, I've been super busy these two weeks. Finally I have some time to work on this.
For now, let's fix the issues one by one.

@njtuzzy:

@kuangliu @Mendel1 In the encoder file, the output of "get_anchor_boxes" is in "xcenter, ycenter, width, height" format; it seems that it does not need to be changed to xxwh (I guess you mean xywh) using the change_box_order function?

anchor_boxes are ordered as xywh,
and boxes are converted from xyxy to xywh with change_box_order:

boxes = change_box_order(boxes, 'xyxy2xywh')

Now they are both xywh. Any problems?
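
For reference, a minimal sketch of such a conversion helper (assuming xyxy means (xmin, ymin, xmax, ymax) and xywh means (xcenter, ycenter, width, height)):

import torch

def change_box_order(boxes, order):
    """Convert boxes between corner and center-size representations."""
    assert order in ['xyxy2xywh', 'xywh2xyxy']
    a = boxes[:, :2]
    b = boxes[:, 2:]
    if order == 'xyxy2xywh':
        return torch.cat([(a + b) / 2, b - a], 1)  # centers, sizes
    return torch.cat([a - b / 2, a + b / 2], 1)    # min corner, max corner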

Out of memory

@kuangliu I am using a Titan X graphics card with 12 GB of memory. Somehow, even with batch_size=2, it still runs out of memory. May I ask what hardware and software configuration you are using? Thanks.

smoothL1loss size_average

In F.smooth_l1_loss, if size_average is True, the loss is divided by (num_samples * location_vector_length). You set size_average to False but divide only by num_samples. Why is that?

How to solve the [Loss = "NaN"] problem?

When I train with the most recent version (commit fda946, using focal_loss_alt()), the loss is still NaN.
Same result as with the previous focal_loss()...

Are there any additional settings needed?
Is there any solution?

Thank you.

Focal loss is very small

Hi,

I have run your code and it worked properly, but I have the following concerns.

My pseudocode works like this:
cls_targets = [batch_size, anchor_boxes, classes] # classes is 21 (voc_labels + background) [16, 67995, 21]
cls_preds = [batch_size, anchor_boxes] # anchor box values range from -1 to 20 [67995, 21]

Now I remove all the anchor boxes with -1 (ignore_boxes):
cls_targets = [batch_size * valid_anchor_boxes, classes] # [54933, 21]
cls_preds = [batch_size * valid_anchor_boxes, classes] # [54933, 21] This is a one-hot encoding vector

Then I followed your code and implemented the focal loss as-is, but my loss values come out very small: a random initialization gives a score of about 0.12, and the loss quickly drops to 0.0012 and below.

Is there something I am missing?

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class FocalLoss_tensorflow(nn.Module):
    def __init__(self, num_classes=20,
                 focusing_param=2.0,
                 balance_param=0.25):
        super(FocalLoss_tensorflow, self).__init__()  # was FocalLoss_2, a NameError
        self.num_classes = num_classes
        self.focusing_param = focusing_param
        self.balance_param = balance_param

    def focal_loss(self, x, y):
        """Sigmoid focal loss; column 0 (background) is dropped from both
        the predictions and the one-hot targets."""
        x = x[:, 1:]
        sigmoid_p = F.sigmoid(x)
        anchors, classes = x.shape

        # build one-hot targets, then drop the background column
        t = torch.FloatTensor(anchors, classes + 1)
        t.zero_()
        t.scatter_(1, y.data.cpu().view(-1, 1), 1)
        t = Variable(t[:, 1:]).cuda()

        zeros = Variable(torch.zeros(sigmoid_p.size())).cuda()
        # (t - p) for positives, p for negatives, as in the TF reference code
        pos_p_sub = ((t >= sigmoid_p).float() * (t - sigmoid_p)) + ((t < sigmoid_p).float() * zeros)
        neg_p_sub = ((t >= zeros).float() * zeros) + ((t <= zeros).float() * sigmoid_p)

        per_entry_cross_ent = (-1) * self.balance_param * (pos_p_sub ** self.focusing_param) * torch.log(torch.clamp(sigmoid_p, 1e-8, 1.0)) \
            - (1 - self.balance_param) * (neg_p_sub ** self.focusing_param) * torch.log(torch.clamp(1.0 - sigmoid_p, 1e-8, 1.0))
        # NB: the paper normalizes the total loss by the number of positive
        # anchors; .mean() divides by anchors * classes, which is why the
        # values here look so small.
        return per_entry_cross_ent.mean()

    def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
        batch_size, num_boxes = cls_targets.size()
        pos = cls_targets > 0
        num_pos = pos.data.long().sum()

        # location loss over positive anchors only
        mask = pos.unsqueeze(2).expand_as(loc_preds)
        masked_loc_preds = loc_preds[mask].view(-1, 4)
        masked_loc_targets = loc_targets[mask].view(-1, 4)
        loc_loss = F.smooth_l1_loss(masked_loc_preds, masked_loc_targets, size_average=False)
        loc_loss = loc_loss / num_pos

        # classification loss over all non-ignored anchors (label > -1)
        pos_neg = cls_targets > -1
        mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
        masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)
        cls_loss = self.focal_loss(masked_cls_preds, cls_targets[pos_neg])
        return loc_loss, cls_loss

Question 1:
I am still not quite getting whether I should use 0 as my background class, and how normalization is done when the focal loss is applied.

classification branch

@kuangliu Hi,

for the focal loss, the classification branch uses the sigmoid function.

Why is a background class considered in the classification branch? For COCO, for example, num_classes should be 80 instead of 81.
