pytorch-retinanet's People

Contributors

tarrencev

pytorch-retinanet's Issues

Code state/documentation

Hi @kuangliu, does this code follow the paper exactly? Also, are you able to document it sufficiently so that we can get started on it using VOC?

Changing the base network (to VGG, ResNet, DenseNet, etc.)

It would be nice to be able to train with other standard backbones as well, since FPN requires too much memory. Right now, changing the base network in retinanet.py (e.g., to VGG) does not work out of the box.

Has anybody tried this and gotten it working?
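
For reference, here is a minimal sketch of one way to do it (my own assumption about how to slice torchvision's vgg16, not code from this repo): expose three VGG16 stages at strides 8/16/32 so the FPN lateral connections have something analogous to C3-C5 to attach to.

import torch.nn as nn
import torchvision.models as models

class VGGBackbone(nn.Module):
    """Expose VGG16 stages at strides 8/16/32 (analogous to C3-C5).
    Note the channel counts (256, 512, 512) differ from ResNet50's
    (512, 1024, 2048), so the FPN lateral 1x1 convs must be adapted."""
    def __init__(self):
        super(VGGBackbone, self).__init__()
        features = models.vgg16(pretrained=True).features
        self.stage3 = features[:17]    # through pool3 -> stride 8
        self.stage4 = features[17:24]  # through pool4 -> stride 16
        self.stage5 = features[24:31]  # through pool5 -> stride 32

    def forward(self, x):
        c3 = self.stage3(x)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        return c3, c4, c5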

Can I get the same result as the paper?

Hi, thank you for sharing your implementation.

Is it possible to get the same (or similar) mAP as the paper?
I would really appreciate it if you could tell me the final performance.

How to solve the 'ZeroDivisionError: float division by zero' problem?

My GPU memory is 2G.
Therefore, I use batch=1 to run this program.
However, I hit an error that forces me to stop.
What should I do to get results similar to the paper's with my configuration?
Thank you~

Error:
Traceback (most recent call last):
File "./t.py", line 1098, in
train(epoch)
File "./t.py", line 1059, in train
loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "./t.py", line 984, in forward
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.data[0]/num_pos, cls_loss.data[0]/num_pos), end=' | ')
ZeroDivisionError: float division by zero
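
The division that fails is loc_loss.data[0]/num_pos, so the likely cause is a batch that contains no positive anchors, which is easy to hit with batch=1. A minimal guard, sketched against what the traceback shows (the surrounding loss code is an assumption):

num_pos = max(num_pos, 1)  # a batch with no ground-truth boxes has zero positive anchors
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.data[0]/num_pos, cls_loss.data[0]/num_pos))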

Various Small Fixes

  • The new convolution layers' initialization doesn't match the original RetinaNet paper. The bias should be set to 0 and the weights drawn from a normal distribution with σ = 0.01.
  • The last layers of the classification and box-regression subnets don't have the proper initialization (this might explain the NaN errors; the paper notes that omitting it causes instability). The bias of each should be set according to the π value chosen in the paper (see the sketch after this list):
π = 0.01
 − log((1 − π)/π) = -4.59511985013459
  • ResNet isn't pretrained. This seems very fixable. My current solution is to subclass ResNet and then override the forward method. This allows the weights to be loaded while the network produces a completely different output. You structured your model differently, so I'm not sure if this would work for your structure.
  • The COCO dataloader is missing. Seems like something that would be good to verify works, as VOC seems to work well (judging from the loss).
  • A detection script that can predict boxes on an image would help in inspecting the network's outputs. I cannot figure out how to do this in an elegant way, so I would love your input. I'm confused about how this implementation works with anchor points and how to transform the predictions back to image space.
  • The activation function should be sigmoid. You fixed this 👍
  • The learning-rate schedule does not match the paper, but the paper was working with a pretrained ResNet and training on the COCO dataset.
  • The RetinaNet paper recommends flip augmentation; it seems very easy to add.
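
A minimal sketch of the two initialization fixes above (assuming each subnet is a stack of Conv2d layers whose last layer produces the logits; the function name is hypothetical):

import math
import torch.nn as nn

def init_subnet(subnet, pi=0.01, is_classifier=False):
    # weights ~ N(0, sigma=0.01), biases 0, per the RetinaNet paper
    convs = [m for m in subnet.modules() if isinstance(m, nn.Conv2d)]
    for m in convs:
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.constant_(m.bias, 0.0)
    if is_classifier:
        # prior so training starts by predicting background everywhere
        nn.init.constant_(convs[-1].bias, -math.log((1 - pi) / pi))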

Anyway, just let me know what you think, and I can open a pull request for these things if you'd like. I have already written most of these fixes for work and would love to merge them into your repo.

focal loss back propagation

This is not an issue but a question.

I think the terms (1-p)^gamma and p^gamma in the focal loss are for weighting only. They should not be backpropagated during gradient descent. Am I correct?

If so, do you need to detach() the variables used to compute the weight terms in your focal loss function?
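
For reference, a sketch of the detached variant this question describes (my own illustration, not the repo's loss; note that as the loss is usually implemented, gradients do flow through the modulating factor):

import torch.nn.functional as F

def focal_loss_detached(x, t, alpha=0.25, gamma=2.0):
    # x: raw logits, t: float {0,1} targets of the same shape
    p = x.sigmoid().detach()             # no gradient through the weight term
    pt = p * t + (1 - p) * (1 - t)       # pt = p if t == 1 else 1 - p
    w = (alpha * t + (1 - alpha) * (1 - t)) * (1 - pt).pow(gamma)
    return F.binary_cross_entropy_with_logits(x, t, weight=w, reduction='sum')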

Questions about datagen.py

Hi, I ran into a problem when using this project to train on the WIDER FACE detection dataset. I created labels in the format your script expects (name.jpg xmin ymin xmax ymax). When I debugged to check whether I had it right, the dataloader printed data of these shapes:
(1L, 3L, 600L, 600L)
(1L, 67995L, 4L)
(1L, 67995L)
Why does the second dimension become 67995? I found that the Dataset's __getitem__ method creates data of shapes
(1L, 3L, 600L, 600L)
(1L, 4L)
(1L)
I haven't used PyTorch before. Is something wrong with PyTorch, or with my label file?
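
This is expected behavior: the collate_fn calls the encoder, which turns the per-image box list into one regression/class target per anchor. For a 600x600 input with 9 anchors per cell over strides 8-128, that is exactly 67995 anchors. A quick check (assuming the repo's ceil-based feature-map sizing):

import math
num_anchors = sum(math.ceil(600 / 2 ** (i + 3)) ** 2 * 9 for i in range(5))
print(num_anchors)  # 67995 = 9 * (75^2 + 38^2 + 19^2 + 10^2 + 5^2)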

transform_issues

In boxes[:,1::2].clamp_(min=0, max=h-1), the boxes' ymax is not clamped.
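
For reference, clamping both axes of xyxy boxes would look like this (a sketch, assuming the columns are (xmin, ymin, xmax, ymax)):

boxes[:, 0::2].clamp_(min=0, max=w - 1)  # xmin, xmax
boxes[:, 1::2].clamp_(min=0, max=h - 1)  # ymin, ymax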

A question about the RPN network

Does your code not use an RPN? It looks like the final classification and regression are done directly at every pixel of each feature map, with the intermediate RPN removed.

mAP calculation

Hi, can you also provide an evaluation script to compute mAP after training? Thanks.

Loss is NaN

When I'm training RetinaNet, cls_loss easily becomes NaN. Does anyone know the reason?

loc_loss: 0.084 | cls_loss: nan | train_loss: nan | avg_loss: nan

How to run train.py?

python train.py
==> Preparing data..

Epoch: 0
Traceback (most recent call last):
File "train.py", line 114, in
train(epoch)
File "train.py", line 75, in train
loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
File "/home/hs/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/hs/hs/pytorch/pytorch-retinanet-580/loss.py", line 99, in forward
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
File "/home/hs/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 320, in rdiv
return self.reciprocal() * other
RuntimeError: reciprocal is not implemented for type torch.cuda.LongTensor
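
The divisor num_pos is a LongTensor here, and dividing a float scalar by it routes through reciprocal(), which is not implemented for torch.cuda.LongTensor. One plausible fix (an assumption about the surrounding loss.py code) is to convert it to a Python number first:

num_pos = pos.sum().item()  # plain Python int instead of a LongTensor
num_pos = max(num_pos, 1)   # also guards the no-positive-anchor case
loc_loss = loc_loss / num_pos
cls_loss = cls_loss / num_pos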

Is there any pretrained model available?

This PyTorch version is quite helpful for research! I want to do transfer learning on other datasets. Is anyone willing to share a pretrained model on VOC or COCO? Thanks very much!

alternative loss is numerically unstable

loss = -w*pt.log() / 2

Because of this line, the loss function is numerically unstable:
pt.log() becomes -inf when the pt value goes to zero.
pt is a sigmoid output, so it can reach zero.
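
Since pt is a sigmoid output, log(pt) can be computed stably with log-sigmoid instead of applying sigmoid first and log afterwards. A sketch against the repo's focal_loss_alt (assuming pt = (2*xt+1).sigmoid() as in that function):

import torch.nn.functional as F

# log(sigmoid(z)) == F.logsigmoid(z), which never returns -inf
loss = -w * F.logsigmoid(2 * xt + 1) / 2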

Problems encountered during testing

print('Loading model..')
net = RetinaNet()
net.load_state_dict(torch.load('./checkpoint/params.pth')['net'])
net.eval()

Specify retain_graph=True when calling backward the first time

Hello, when I run retinanet.py, it shows this error:

Traceback (most recent call last):
  File "retinanet.py", line 58, in <module>
    test()
  File "retinanet.py", line 56, in test
    cls_preds.backward(cls_grads)
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

How can I fix it? Can you give some advice? Thank you.
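
The test() function calls backward() twice on outputs of the same graph, and the first call frees the graph's buffers. A minimal fix, following the error message (the loc_preds/loc_grads names are my assumption about the repo's test code):

loc_preds.backward(loc_grads, retain_graph=True)  # keep buffers for the second call
cls_preds.backward(cls_grads)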

Undefined names: TOTAL_BAR_LENGTH, last_time, term_width

Perhaps these four lines should not be commented out. https://github.com/kuangliu/pytorch-retinanet/blob/master/utils.py#L201-L204

flake8 testing of https://github.com/kuangliu/pytorch-retinanet on Python 2.7.13

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./utils.py:211:19: F821 undefined name 'TOTAL_BAR_LENGTH'
    cur_len = int(TOTAL_BAR_LENGTH*current/total)
                  ^

./utils.py:212:20: F821 undefined name 'TOTAL_BAR_LENGTH'
    rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1
                   ^

./utils.py:223:28: F821 undefined name 'last_time'
    step_time = cur_time - last_time
                           ^

./utils.py:235:20: F821 undefined name 'term_width'
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
                   ^

./utils.py:235:35: F821 undefined name 'TOTAL_BAR_LENGTH'
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
                                  ^

./utils.py:239:20: F821 undefined name 'term_width'
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)):
                   ^

./utils.py:239:35: F821 undefined name 'TOTAL_BAR_LENGTH'
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)):
                                  ^

7       F821 undefined name 'TOTAL_BAR_LENGTH'

Please help me, I don't know how to run train.py

Can you tell me?

Traceback (most recent call last):
File "train.py", line 15, in
from loss import FocalLoss
File "H:\datasets\tianchi_lvcai\tianchi_lvcai_fusai\pytorch-retinanet-master\loss.py", line 7, in
from utils import one_hot_embedding
File "H:\datasets\tianchi_lvcai\tianchi_lvcai_fusai\pytorch-retinanet-master\utils.py", line 242, in
_, term_width = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)
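
os.popen('stty size', 'r') returns an empty string when no terminal is attached (e.g. on Windows), so the split yields nothing to unpack. A common workaround (a sketch, not the repo's code) is to fall back to a fixed width:

import os

try:
    _, term_width = os.popen('stty size', 'r').read().split()
    term_width = int(term_width)
except ValueError:
    term_width = 80  # no terminal available; use a default width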

encoder

@kuangliu, I am using PyTorch 0.12, but I find that in the DataEncoder class's encode function, "boxes = boxes[max_idx]" does not work. Do you mean that this operation enlarges the boxes tensor to the anchor count? Could I be unable to use this operation because of the version difference?

Thanks

index out of range

When I use this model to train on COCO 2017 with num_classes set to 80, I get an error like this:
RuntimeError: index out of range at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/TH/generic/THTensorMath.c:277
which occurs at "pytorch-retinanet/utils.py", line 230, in one_hot_embedding: return y[labels].
What happened, and why? How can I solve this problem?
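
One likely cause (an assumption, since the dataset code isn't shown): COCO category ids are not contiguous and run up to 90, so indexing an 81-row one-hot table with a raw category id goes out of range. Remapping the ids to contiguous labels before encoding avoids this; coco below stands for a pycocotools COCO handle:

# map raw COCO category ids (1..90, with gaps) to contiguous labels 1..80
cat_ids = sorted(coco.getCatIds())
id_to_label = {cid: i + 1 for i, cid in enumerate(cat_ids)}  # 0 stays background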

Expected object of type torch.LongTensor but found type torch.FloatTensor

torch version: 0.4.1
When I use Python 2.7 to train the project, I hit the following problem:
python train.py
==> Preparing data..

Epoch: 0
Traceback (most recent call last):
File "train.py", line 114, in
train(epoch)
File "train.py", line 68, in train
for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/wq/retinaNet_pytorch/pytorch-retinanet-master_old/datagen.py", line 120, in collate_fn
loc_target, cls_target = self.encoder.encode(boxes[i].long(), labels[i].long(), input_size=(w,h))
File "/home/wq/retinaNet_pytorch/pytorch-retinanet-master_old/encoder.py", line 78, in encode
anchor_boxes = self._get_anchor_boxes(input_size)
File "/home/wq/retinaNet_pytorch/pytorch-retinanet-master_old/encoder.py", line 52, in _get_anchor_boxes
xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other

I trained this project on VOC2012, downloaded myself. I tried to correct this problem, but my changes caused other, similar problems.
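
The multiplication that fails is xy * grid_size, where xy comes out of meshgrid as a LongTensor while grid_size is a float. A plausible fix (an assumption about encoder.py, which isn't shown in full) is to build the grid in float, and to keep the boxes float as well, since box regression needs fractional coordinates:

# in _get_anchor_boxes: build the grid in float
xy = meshgrid(fm_w, fm_h).float() + 0.5
xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, 9, 2)

# in collate_fn: don't cast boxes to long; regression targets need floats
loc_target, cls_target = self.encoder.encode(boxes[i].float(), labels[i].long(), input_size=(w, h))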

Question about `anchor_areas`

The __init__ function in encoder.py first sets the anchor area for each feature map (p3 --> p7):
self.anchor_areas = [32 * 32., 64 * 64., 128 * 128., 256 * 256., 512 * 512.]
and then combines them with the anchor locations:

wh = self.anchor_wh[i].view(1, 1, 9, 2).expand(fm_h, fm_w, 9, 2)
box = torch.cat([xy, wh], 3)

I do think the anchor areas should be adjusted to the actual object sizes, especially when the input image is small. Given that we encode the boxes in advance, we have to take care when setting the anchor areas.

Is this understanding right?

value cannot be converted to type double without overflow:

Is it necessary to install a specific PyTorch version, or to apply some trick, to get rid of this?

==> Preparing data..

Epoch: 0
loc_loss: 0.116 | cls_loss: 3791.763 | train_loss: 3791.878 | avg_loss: 3791.878
loc_loss: 0.088 | cls_loss: 1283.638 | train_loss: 1283.725 | avg_loss: 2537.802
loc_loss: 0.093 | cls_loss: 8380.014 | train_loss: 8380.107 | avg_loss: 4485.237
loc_loss: 0.095 | cls_loss: 2.312 | train_loss: 2.407 | avg_loss: 3364.530
Traceback (most recent call last):
File "train.py", line 114, in
train(epoch)
File "train.py", line 75, in train
loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/xiangyong/Workbench/pytorch-retinanet-kuangliu/loss.py", line 92, in forward
cls_loss = self.focal_loss_alt(masked_cls_preds, cls_targets[pos_neg])
File "/home/xiangyong/Workbench/pytorch-retinanet-kuangliu/loss.py", line 60, in focal_loss_alt
return loss.sum()
RuntimeError: value cannot be converted to type double without overflow: inf

Issue collection

Hey guys, I've been super busy these two weeks. Finally I have some time to work on this.
For now, let's fix the issues one by one.

@njtuzzy:

@kuangliu @Mendel1 In the encoder file, the output of "get_anchor_boxes" is in "xcenter, ycenter, width, height" format; it seems that it does not need to be changed to xxwh (I guess you mean xywh) using the change_box_order function?

anchor_boxes are ordered as xywh,
and boxes are converted from xyxy to xywh with change_box_order:

boxes = change_box_order(boxes, 'xyxy2xywh')

Now they are both xywh. Any problems?
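
For reference, a minimal sketch of such a conversion helper (assuming xyxy means (xmin, ymin, xmax, ymax) and xywh means (xcenter, ycenter, width, height)):

import torch

def change_box_order(boxes, order):
    """Convert boxes between corner and center-size representations."""
    assert order in ['xyxy2xywh', 'xywh2xyxy']
    a = boxes[:, :2]
    b = boxes[:, 2:]
    if order == 'xyxy2xywh':
        return torch.cat([(a + b) / 2, b - a], 1)  # centers, sizes
    return torch.cat([a - b / 2, a + b / 2], 1)    # min corner, max corner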

Out of memory

@kuangliu I am using a Titan X graphics card with 12 GB of memory. Somehow, even with batch_size=2, it still runs out of memory. May I ask what hardware and software configuration you are using? Thanks.

smoothL1loss size_average

In F.smooth_l1_loss, if size_average is True, the loss is divided by (num_samples * location_vector_length). You set size_average to False but divide only by num_samples. Why is that?

How to solve the [Loss = "NaN"] problem?

When I train with the most recent version (commit fda946, using focal_loss_alt()), the loss is still NaN.
Same result as with the previous focal_loss()...

Are there any additional settings needed?
Is there any solution?

Thank you.

Focal loss is very small

Hi,

I have run your code and it worked properly, but I have the following concerns.

My pseudocode works like this:
cls_targets = [batch_size, anchor_boxes, classes] # classes is 21 (voc_labels + background) [16, 67995, 21]
cls_preds = [batch_size, anchor_boxes] # anchor box values range from -1 to 20 [67995, 21]

Now I remove all the anchor boxes with -1 (ignore_boxes):
cls_targets = [batch_size * valid_anchor_boxes, classes] # [54933, 21]
cls_preds = [batch_size * valid_anchor_boxes, classes] # [54933, 21] This is a one-hot encoding vector

Then I followed your code and implemented the focal loss as-is, but my loss values come out very small: a random initialization gives a score of about 0.12, and the loss quickly drops to 0.0012 and below.

Is there something I am missing?

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class FocalLoss_tensorflow(nn.Module):
    def __init__(self, num_classes=20,
                 focusing_param=2.0,
                 balance_param=0.25):
        super(FocalLoss_tensorflow, self).__init__()  # was FocalLoss_2, a NameError
        self.num_classes = num_classes
        self.focusing_param = focusing_param
        self.balance_param = balance_param

    def focal_loss(self, x, y):
        """Sigmoid focal loss; column 0 (background) is dropped from both
        the predictions and the one-hot targets."""
        x = x[:, 1:]
        sigmoid_p = F.sigmoid(x)
        anchors, classes = x.shape

        # build one-hot targets, then drop the background column
        t = torch.FloatTensor(anchors, classes + 1)
        t.zero_()
        t.scatter_(1, y.data.cpu().view(-1, 1), 1)
        t = Variable(t[:, 1:]).cuda()

        zeros = Variable(torch.zeros(sigmoid_p.size())).cuda()
        # (t - p) for positives, p for negatives, as in the TF reference code
        pos_p_sub = ((t >= sigmoid_p).float() * (t - sigmoid_p)) + ((t < sigmoid_p).float() * zeros)
        neg_p_sub = ((t >= zeros).float() * zeros) + ((t <= zeros).float() * sigmoid_p)

        per_entry_cross_ent = (-1) * self.balance_param * (pos_p_sub ** self.focusing_param) * torch.log(torch.clamp(sigmoid_p, 1e-8, 1.0)) \
            - (1 - self.balance_param) * (neg_p_sub ** self.focusing_param) * torch.log(torch.clamp(1.0 - sigmoid_p, 1e-8, 1.0))
        # NB: the paper normalizes the total loss by the number of positive
        # anchors; .mean() divides by anchors * classes, which is why the
        # values here look so small.
        return per_entry_cross_ent.mean()

    def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
        batch_size, num_boxes = cls_targets.size()
        pos = cls_targets > 0
        num_pos = pos.data.long().sum()

        # location loss over positive anchors only
        mask = pos.unsqueeze(2).expand_as(loc_preds)
        masked_loc_preds = loc_preds[mask].view(-1, 4)
        masked_loc_targets = loc_targets[mask].view(-1, 4)
        loc_loss = F.smooth_l1_loss(masked_loc_preds, masked_loc_targets, size_average=False)
        loc_loss = loc_loss / num_pos

        # classification loss over all non-ignored anchors (label > -1)
        pos_neg = cls_targets > -1
        mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
        masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)
        cls_loss = self.focal_loss(masked_cls_preds, cls_targets[pos_neg])
        return loc_loss, cls_loss

Question 1:
I am still not quite getting whether I should use 0 as my background class, and how normalization is done when the focal loss is applied.

classification branch

@kuangliu Hi,

for the focal loss, the classification branch uses the sigmoid function.

Why is a background class considered in the classification branch? For COCO, for example, num_classes should be 80 instead of 81.
