Focal Loss for Dense Rotation Object Detection

License: MIT License

Python 97.42% Makefile 0.01% C++ 0.05% Cuda 2.52%

retinanet_tensorflow_rotation's Issues

Please do not use baidu netdisk to store models

Your work is very helpful. Well done! But it is frustrating to wait an hour to download a 100MB file. It makes me feel like I'm living twenty years in the past.

OOM using RTX 2080

I cropped the images to 800x800 with overlap=200 and got OOM after a few steps (~300). I never hit this problem when training with your R2CNN project, even with 1024x1024 crops. Does RetinaNet need more memory? How can I deal with this problem?

rbbox_overlaps gives results different from cv2, while rotate_gpu_nms works correctly

Missing trained weights ?

Hi,

Many thanks for the great repo. However, I have noticed that the link in the trained_weigths README.md (https://github.com/DetectionTeamUCAS/Models/RetinaNet_Tensorflow) seems to return a 404. Do you plan to publish the weights for your detector, now or in the future?

Thanks again for the code :)

Pretrained model files

Where should I download the two files resnet101_v1d.ckpt and resnet50_v1d.ckpt? The links provided no longer work.

When I train, the loss values often show 0.000 and NaN

When I train with my own data, the loss values often show 0.000 and NaN. Strangely, the same data trains fine with the R2CNN code you shared.

The training program hangs at _, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op])

Hello author, when I run the training program it hangs at this statement. Why?

Data type problem

Whenever I switch to a different dataset, I always get the following error. How should I solve it?
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 8, current size 0)

About load checkpoint

When I load the resnet50_v1d checkpoint, it shows:

NotFoundError (see above for traceback)Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Tensor name "resnet50_v1d/C1/conv0/BatchNorm/beta" not found in checkpoint files './data/pretrained_weights/resnet50_v1d.ckpt'

I downloaded resnet50_v1.ckpt from the link in the README. Can you give me some suggestions about this?
Thanks!

Could you please tell me how to deal with this import problem?

ModuleNotFoundError: No module named 'libs.box_utils.rbbox_overlaps'

I see the relevant settings in setup.py, but I cannot import the compiled .cpp extension. Could you tell me what causes this problem?

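About batch_size and epoch

Hello, and sorry to bother you! I would like to ask two questions.
In this code:
Does one epoch mean one full pass over all samples?
If so, is the number of steps needed for one pass equal to samples/batch_size?

I ask because I want to change batch_size.
Thanks!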
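
If one epoch is taken to mean one full pass over all samples, the step count asked about here follows from a ceiling division. This is a minimal sketch under that assumed definition; the function name `steps_per_epoch` and the sample counts are illustrative and do not come from the repository.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, num_gpus: int = 1) -> int:
    """Steps needed to see every sample once (assumed definition of an epoch).

    batch_size is interpreted as the per-GPU batch size, so each step
    consumes batch_size * num_gpus samples. That interpretation is an
    assumption, not something confirmed by the repository.
    """
    if num_samples <= 0 or batch_size <= 0 or num_gpus <= 0:
        raise ValueError("num_samples, batch_size and num_gpus must be positive")
    return math.ceil(num_samples / (batch_size * num_gpus))


# Illustrative numbers only: 20000 cropped images.
print(steps_per_epoch(20000, batch_size=1))
print(steps_per_epoch(20000, batch_size=8))
```

Changing batch_size therefore changes how many steps correspond to one pass, so step-based settings in the config would need to be rescaled accordingly.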

About creating tfrecord from my own dataset

Hello! I have an issue using my own dataset.

I tested the code with TensorFlow 1.15.0
and CUDA 10.1.

I get the error

Out of range: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)

with a tfrecord file built from my own dataset.
I used your convert_data_to_tfrecord.py file, but I did not crop the data beforehand.
Does this error come from not cropping my data to 600x600?
Note that the images in my dataset all have different sizes.

I would be very thankful for any help!

Hello, when will the R3Det source code be released?

Thanks!

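Question about the number of GPUs

With a single GPU, is it enough to change the GPU id for training, or does the training code also need corresponding changes?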

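Out of memory when training on my own data

I have already run the default configuration successfully on the DOTA dataset, but when I switch to my own data, which is similar to DOTA and also remote sensing imagery, I get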
out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered

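Training with the updated code gives no boxes on the detection result images

1. I trained with the updated code. I first cropped the training images, which produced more than 20,000 crops. My GPU is weak and slow, so I have only trained 5k steps so far, but when I test with the resulting weights there are no detections on the images at all. Is something wrong? (It is less than one epoch, but having not a single result still feels odd.)
2. In TensorBoard, totalloss has dropped to about 0.9, and gtboxes_h and gtbox_r both show boxes, but final_detection shows no boxes at all. Is this normal?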

Sigmoid question

Little question: Why did you use sigmoid instead of softmax there?

RetinaNet_Tensorflow_Rotation/libs/networks/build_whole_network.py

Line 75 in b03a7ea

 rpn_box_probs = tf.sigmoid(rpn_box_scores, name='rpn_{}_classification_sigmoid'.format(level)) 
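
One common reason, offered as background rather than as the author's answer: focal loss in RetinaNet is usually applied to independent per-class sigmoid outputs, so no explicit background class is needed and an anchor can score low on every class. Below is a minimal pure-Python sketch of the per-logit sigmoid focal loss. The defaults `alpha=0.25` and `gamma=2.0` are the commonly quoted values and are an assumption here, not values read from this repository.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_focal_loss(logit: float, target: int,
                       alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary (class vs. not-class) decision.

    target is 1 if the anchor belongs to this class, otherwise 0.
    alpha and gamma are assumed defaults, not taken from the repo.
    """
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p          # probability of the true label
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)                        # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction is down-weighted far more than an uncertain one.
print(sigmoid_focal_loss(5.0, 1), sigmoid_focal_loss(0.0, 1))
```

With softmax, the class scores compete and a dedicated background class is normally required; with sigmoid, each class is its own binary problem, which is the setting the focal-loss weighting term is defined for.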

How to test an image

How do I test an image with this network, and where should I download and put the weight file?
Please ...

compile error

error information:

python setup.py build_ext --inplace
running build_ext
skipping 'bbox.c' Cython extension (up-to-date)
skipping 'nms.c' Cython extension (up-to-date)
building 'cython_bbox' extension
creating build
creating build/temp.linux-x86_64-3.7
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -B /home/huangwei/anaconda3/envs/tensorflow-R3det/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/huangwei/anaconda3/envs/tensorflow-R3det/lib/python3.7/site-packages/numpy/core/include -I/home/huangwei/anaconda3/envs/tensorflow-R3det/include/python3.7m -c bbox.c -o build/temp.linux-x86_64-3.7/bbox.o -Wno-cpp -Wno-unused-function
bbox.c: In function ‘__Pyx__ExceptionSave’:
bbox.c:9439:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’
*type = tstate->exc_type;
^
bbox.c:9440:20: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’
*value = tstate->exc_value;
^
bbox.c:9441:17: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’
*tb = tstate->exc_traceback;
^
bbox.c: In function ‘__Pyx__ExceptionReset’:
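
Errors of this form (`PyThreadState` has no member named `exc_type`) typically appear when C files generated by an older Cython are compiled against Python 3.7 or later. The log above also shows the `.c` files being skipped as up to date, so they are not regenerated. The sketch below lists generated `.c` files that sit next to a `.pyx` source, so they can be reviewed and deleted before rebuilding with a current Cython. The helper name and the assumption that each generated file has a same-named `.pyx` sibling are hypothetical, not taken from the repository.

```python
from pathlib import Path


def stale_cython_c_files(root: str) -> list:
    """Return .c files under root that have a same-named .pyx sibling.

    Such files are assumed to be Cython output that can be regenerated.
    """
    root_path = Path(root)
    return sorted(
        c_file
        for c_file in root_path.rglob("*.c")
        if c_file.with_suffix(".pyx").exists()
    )


# Example usage (run from the directory that contains setup.py):
#   for path in stale_cython_c_files("."):
#       print(path)
```

After removing the listed files and upgrading Cython, rerunning `python setup.py build_ext --inplace` should regenerate them, assuming the `.pyx` sources are present.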

cfgs file for ResNet152_v1 ...

Hi,

Can you share the cfgs file for ResNet152_v1, the most accurate model?
I would like to try this on my dataset.

About the implementation of the Feature Refinement Module

Hello, thank you for your wonderful work!
I have a question: where in your source can I find the implementation of the FRM described in your paper?
Thank you! :)

compile error

environments:
ubuntu 18.04 cuda 10.0 tensorflow-gpu1.13.1
curexc_type
bbox.c:9512:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
bbox.c:9513:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
error: command 'gcc' failed with exit status 1
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1

About training with batch size > 1 on one GPU

Does this code support training on one GPU with a batch size of 2?

I find that multi-gpu-read-tfrecord.py does not limit the batch size, while read-tf-record.py requires the batch size to be 1.

So, can I train with a batch size of 2 on one GPU?

error: (-215) intersection.size() <= 8 in function rotatedRectangleIntersection

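This error appears randomly during training: sometimes only after 200,000 steps, sometimes very quickly. In the OpenCV project on GitHub I saw that others have mentioned this problem and suggested changing float to double in intersection.cpp, but I could not find that file. Also, I removed OpenCV 3.4.2 with conda remove, yet the program still runs and does not report No module named 'cv2'; I don't know why. Has anyone else encountered this problem?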

Hello, how do I compute mAP for a custom, non-DOTA dataset? Is there any reference code?

from libs.box_utils.cython_utils.cython_bbox import bbox_overlaps raises an error

Hello author, why does this code raise an error?

Thanks for your work! How can I use a more powerful backbone? I ran into a problem!

Closed because of impoliteness

Mask-RCNN Model

Waiting for the Mask-RCNN implementation.
This is just 5% more accuracy than the previous best.

What is the difference between anchor_utils.make_anchors and generate_anchors.generate_anchors_pre?

I am freezing the trained weights to a pb file for use from C++, but the py_function 'generate_anchors_pre' cannot be frozen, so I need a TF function such as 'make_anchors'.
I found that anchor_utils already has one, but you did not use it. Is there any problem with the function 'anchor_utils.make_anchors'?

bbox division-by-zero problem

/home/lyy/hq/RetinaNet_Tensorflow_Rotation-master/libs/box_utils/bbox_transform.py:99: RuntimeWarning: divide by zero encountered in log
targets_dh = np.log(gt_rois[:, 3] / ex_rois[:, 3])
This error appears during training. How can I solve it?
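
The warning means the ratio passed to `np.log` is zero for at least one box, which points to a ground-truth box whose column 3 is zero. A minimal sketch that drops such degenerate boxes before target encoding follows. It assumes boxes are stored as `[x_c, y_c, w, h, theta]` (only the use of column 3 is confirmed by the message above) and uses a hypothetical `min_size` threshold.

```python
def drop_degenerate_boxes(boxes, min_size=1e-3):
    """Keep only boxes whose width and height both exceed min_size.

    boxes: iterable of [x_c, y_c, w, h, theta] (assumed layout).
    min_size: hypothetical threshold in pixels.
    """
    return [b for b in boxes if b[2] > min_size and b[3] > min_size]


gt_boxes = [
    [10.0, 10.0, 20.0, 5.0, -45.0],   # valid
    [10.0, 10.0, 20.0, 0.0, -45.0],   # zero height: would give log(0)
    [1.0, 1.0, 0.0, 3.0, 0.0],        # zero width
]
print(drop_degenerate_boxes(gt_boxes))
```

Filtering at annotation-loading time, rather than clamping inside the loss, keeps zero-area labels from contributing meaningless regression targets.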

Simple question about training

I am training successfully, with one minor issue, under CUDA 10 and TensorFlow 1.14.
I want to ask two things.

Does this code have a total number of epochs after which training ends automatically, or does it keep going until I stop it myself?
I can see that it saves the weights during training, but how many iterations make up one epoch?
------------------------------------------ Train config
RESTORE_FROM_RPN = False
FIXED_BLOCKS = 1 # allow 0~3
FREEZE_BLOCKS = [True, False, False, False, False] # for gluoncv backbone
USE_07_METRIC = True

MUTILPY_BIAS_GRADIENT = 2.0 # if None, will not multipy
GRADIENT_CLIPPING_BY_NORM = 10.0 # if None, will not clip

CLS_WEIGHT = 1.0
REG_WEIGHT = 1.0
USE_IOU_FACTOR = False

BATCH_SIZE = 1
EPSILON = 1e-5
MOMENTUM = 0.9
LR = 5e-4
DECAY_STEP = [SAVE_WEIGHTS_INTE*12, SAVE_WEIGHTS_INTE*16, SAVE_WEIGHTS_INTE*20]
MAX_ITERATION = SAVE_WEIGHTS_INTE*20
WARM_SETP = int(1.0 / 4.0 * SAVE_WEIGHTS_INTE)
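
The name `MAX_ITERATION` suggests training stops on its own once that step count is reached, though the issue itself does not confirm this. The sketch below shows how the schedule resolves, assuming the multipliers 12, 16 and 20 apply to `SAVE_WEIGHTS_INTE`. The value 27000 is hypothetical, since the real value is not shown in this issue, and `should_stop` is an illustrative helper rather than repository code.

```python
SAVE_WEIGHTS_INTE = 27000  # hypothetical value, not taken from the issue

# Learning-rate decay points and the final step, as multiples of the save interval.
DECAY_STEP = [SAVE_WEIGHTS_INTE * 12, SAVE_WEIGHTS_INTE * 16, SAVE_WEIGHTS_INTE * 20]
MAX_ITERATION = SAVE_WEIGHTS_INTE * 20
WARM_SETP = int(1.0 / 4.0 * SAVE_WEIGHTS_INTE)  # identifier spelled as in the config


def should_stop(global_step: int) -> bool:
    """Illustrative stop condition: end once MAX_ITERATION steps are done."""
    return global_step >= MAX_ITERATION


print(DECAY_STEP, MAX_ITERATION, WARM_SETP)
```

If `SAVE_WEIGHTS_INTE` is set to the number of steps in one pass over the data, then `MAX_ITERATION` corresponds to 20 such passes; whether the author intends that reading is not stated here.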

Another question is about a warning. Do you have any idea whether it is serious enough to affect the results? Thanks in advance!

The warning is below:
WARNING:tensorflow:Entity <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x7fb846a0c550>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x7fb846a0c550>>: AssertionError: Bad argument number for Name: 3, expecting 4

compile error

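Sharing some pitfalls I ran into and solved

① Reading the tfrecord:
To generate the tfrecord file with the author's method, you first crop the images and generate xml files. However, some txt files have two extra lines of unneeded text above the coordinate data, which breaks xml generation; you can skip these two lines in the code.
② Training reports shufflebatch=0:
Search results on Baidu all say something is uninitialized. From the next_batch code I concluded that the images and labels were not being read correctly from the tfrecord file. After debugging I found that the relative path could not be resolved, so I switched to an absolute path and it worked.
③ Pretrained models:
Put all of the files, res101.ckpt, resnet101_v1d.ckpt.index and so on, into the pretrained weights folder.
④ Cannot find some bbox function or file:
You need to compile the setup file as the author describes; it cannot be compiled on Win10.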
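
For item ②, an empty input queue is consistent with the reader finding no TFRecord files at all, which is also one plausible cause of the `PaddingFIFOQueue ... insufficient elements` errors reported in other issues here. Below is a minimal check, assuming the files are located with a glob pattern. The function name and the example pattern are hypothetical.

```python
import glob
import os


def resolve_tfrecords(pattern: str) -> list:
    """Expand a (possibly relative) glob pattern to absolute, non-empty files.

    Raises FileNotFoundError early instead of letting the input queue
    close with zero elements.
    """
    abs_pattern = os.path.abspath(pattern)
    matches = sorted(p for p in glob.glob(abs_pattern) if os.path.getsize(p) > 0)
    if not matches:
        raise FileNotFoundError(
            "No non-empty TFRecord files match {!r}".format(abs_pattern)
        )
    return matches


# Example usage with a hypothetical location:
#   files = resolve_tfrecords("../data/tfrecord/DOTA_train*")
```

Running this check before building the input pipeline turns a late, opaque queue error into an immediate message that names the path actually searched.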

Question about IoU computation

Hello, I see in iou_rotate.py that the overlap is computed with OpenCV. Why not use the GPU? Is there any difference between these two methods?
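
For reference, rotated IoU reduces to intersecting two convex quadrilaterals and dividing by the union area. The sketch below does this in pure Python with Sutherland-Hodgman clipping. It is neither OpenCV's routine nor the repository's GPU kernel, and the `(x_c, y_c, w, h, theta_degrees)` box layout is an assumption. Small numerical differences between such implementations are plausible, for example from float precision or the handling of touching edges.

```python
import math


def rbox_corners(xc, yc, w, h, theta_deg):
    """Corners of a rotated box in counter-clockwise order."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(xc + dx * c - dy * s, yc + dx * s + dy * c) for dx, dy in half]


def polygon_area(poly):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_polygon(subject, clipper):
    """Clip `subject` against a convex counter-clockwise `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        prev = inputs[-1]
        # Signed side test: >= 0 means on or left of the edge a->b (inside).
        prev_side = (bx - ax) * (prev[1] - ay) - (by - ay) * (prev[0] - ax)
        for cur in inputs:
            cur_side = (bx - ax) * (cur[1] - ay) - (by - ay) * (cur[0] - ax)
            if (cur_side >= 0) != (prev_side >= 0):
                # The segment prev->cur crosses the clip line; add the crossing.
                t = prev_side / (prev_side - cur_side)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if cur_side >= 0:
                output.append(cur)
            prev, prev_side = cur, cur_side
    return output


def rotated_iou(box_a, box_b):
    """IoU of two rotated boxes given as (x_c, y_c, w, h, theta_degrees)."""
    inter_poly = clip_polygon(rbox_corners(*box_a), rbox_corners(*box_b))
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


print(rotated_iou((0, 0, 2, 2, 0), (1, 0, 2, 2, 0)))
```

A reference like this can be used to spot-check a handful of box pairs against both the OpenCV path and the GPU path when their results disagree.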

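Trained 550k steps; testing on DOTA v1.0 with the default config file gives an mAP of only 0.49

This is strange. I used the config file provided by the author without modification to train on DOTA, but the result is about 10 points lower than expected. Looking at each class, small-vehicle should reach an mAP of 0.66, but I only get 0.18. Earlier, with R2CNN, the small-vehicle mAP was also very low and the overall mAP was about 10 points lower. I suspect the anchor sizes are too large to detect small objects. Also, do I need to change parameters in cfg.py by hand to reach the reported mAP?
I tested on a single RTX 2080. Below are a few images saved during prediction. Some look good, with almost everything detected; others clearly miss every small-vehicle. What might be causing this?
(The first two images look fine; in the last two, no small-vehicle was detected.)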

Error in SCRDet link

Hi, I can't find SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects; its link is broken. Can you help?

Rotation is not as accurate as R2CNN

I have trained a model, but why does the output so often form a plus sign? A horizontal detection comes with another, vertical detection, making a plus sign. I don't know why this happens.

Import rbbx_overlaps fails

When I ran "python multi_gpu_train.py", I got an error importing rbbx_overlaps as follows

Traceback (most recent call last):
File "hello_world.py", line 17, in <module>
from libs.networks import build_whole_network
File "../libs/networks/build_whole_network.py", line 13, in <module>
from libs.losses import losses
File "../libs/losses/losses.py", line 9, in <module>
from libs.box_utils.iou_rotate import iou_rotate_calculate2
File "../libs/box_utils/iou_rotate.py", line 10, in <module>
from libs.box_utils.rbbox_overlaps import rbbx_overlaps
ImportError: ../libs/box_utils/rbbox_overlaps.cpython-35m-x86_64-linux-gnu.so: failed to map segment from shared object

All my packages have the correct versions. I have run the "python setup.py build_ext --inplace" commands to generate those .so files. Any suggestions?

By the way, I use English since my laptop has no Pinyin input. Please feel free to reply in Chinese. Thank you.

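Thanks to the author for the work! Training on my own dataset still produces a loss of 0!

Thanks to the author. With the code updated a few days ago, training on my dataset no longer gives a NaN loss, but a loss of 0 still appears, as shown below: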

[2019-07-31 19:41:10] global_step:1035 current_step:1035 per_cost_time:1.036s
cls_loss:1.436 reg_loss:0.588 total_losses:2.024

[2019-07-31 19:41:15] global_step:1040 current_step:1040 per_cost_time:1.011s
cls_loss:1.184 reg_loss:0.614 total_losses:1.798

[2019-07-31 19:41:20] global_step:1045 current_step:1045 per_cost_time:1.066s
cls_loss:0.467 reg_loss:0.000 total_losses:0.467

[2019-07-31 19:41:26] global_step:1050 current_step:1050 per_cost_time:1.079s
cls_loss:1.418 reg_loss:0.469 total_losses:1.887

[2019-07-31 19:41:31] global_step:1055 current_step:1055 per_cost_time:1.070s
cls_loss:1.206 reg_loss:0.575 total_losses:1.781

[2019-07-31 19:41:36] global_step:1060 current_step:1060 per_cost_time:1.025s
cls_loss:1.335 reg_loss:0.463 total_losses:1.798

[2019-07-31 19:41:42] global_step:1065 current_step:1065 per_cost_time:1.039s
cls_loss:1.209 reg_loss:0.550 total_losses:1.759

[2019-07-31 19:41:47] global_step:1070 current_step:1070 per_cost_time:0.951s
cls_loss:1.000 reg_loss:0.512 total_losses:1.512

[2019-07-31 19:41:52] global_step:1075 current_step:1075 per_cost_time:1.074s
cls_loss:0.891 reg_loss:0.636 total_losses:1.526

The URL to download the pretrained model weights returns 404

Can you update the url or upload to baiduyun? Thanks! @yangxue0827

Is it possible to train completely from scratch without pretrained weights? How exactly should I do that?

Problems during training: out of memory, invalid argument

Hello author, during training I also hit a NaN loss or a regloss of 0.0000. In addition, when I first trained with 1024*1024 images I got
out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
2019-07-26 12:37:44.321824: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321869: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7faecb400000
2019-07-26 12:37:44.321836: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321837: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321896: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7faec9800000
2019-07-26 12:37:44.321903: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7faeccc00000
2019-07-26 12:37:44.321892: E tensorflow/stream_executor/stream.cc:318] Error recording event in stream: error recording CUDA event on stream 0x55dba75d5170: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-07-26 12:37:44.321837: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321971: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7fb050800000
2019-07-26 12:37:44.321974: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2019-07-26 12:37:44.321981: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:206] Unexpected Event status: 1

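Compiling box_utils on Windows

On Linux, the files under the box_utils and cython_utils folders can be compiled with the default gcc compiler, but setup.py does not work on Windows. Can anyone provide a modified setup.py so that this project can run on Windows?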

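The code runs but produces no detections

I installed and ran the code and tested several top-down drone images containing cars and several images from the DOTA dataset. With --show_box the processed images are saved successfully, but no boxes are drawn on them. Inspecting the intermediate variables in test_dota.py, I found that det_boxes_r_ always comes back as an empty array. My environment is anaconda3 + python3.5 + tensorflow1.12 + cuda 9.1. In cfgs.py I only changed NET_NAME = 'resnet_v1_50'. Switching to other pretrained networks gives the same result: the program runs but detects nothing.

The same images produce results with R2CNN_Faster-RCNN_Tensorflow; boxes are drawn on them.

Do you have any suggestions?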

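Regression loss is NaN

Hello, when training RetinaNet to regress rotated boxes, the loss decreases steadily and then the regression loss suddenly becomes NaN or inf. Debugging shows that the feature maps coming out of the backbone are already NaN. What might be the cause? Lowering the learning rate did not fix it.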

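Config file question

What do the two parameters IMG_SHORT_SIDE_LEN and IMG_MAX_LENGTH in cfgs.py mean? If the images in my training dataset are 1024*1024, do these two parameters need to be changed? Do any other files need to be changed?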
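
A sketch of the usual meaning of such a parameter pair, offered as an assumption rather than a reading of this repository's code: the image is scaled so its shorter side equals `IMG_SHORT_SIDE_LEN`, unless that would push the longer side past `IMG_MAX_LENGTH`, in which case the longer side is capped instead. The defaults 800 and 1000 below are illustrative only.

```python
def resized_shape(height, width, short_side_len=800, max_len=1000):
    """Return (new_height, new_width) under short-side resizing with a cap.

    short_side_len plays the role of IMG_SHORT_SIDE_LEN and max_len the
    role of IMG_MAX_LENGTH (assumed semantics, illustrative defaults).
    """
    scale = short_side_len / min(height, width)
    if max(height, width) * scale > max_len:
        # Scaling by the short side would overshoot; cap the long side instead.
        scale = max_len / max(height, width)
    return round(height * scale), round(width * scale)


print(resized_shape(1024, 1024))  # square input: only the short-side rule applies
print(resized_shape(600, 1200))   # elongated input: the long-side cap applies
```

Under these assumed semantics, a 1024*1024 dataset is resized to the short-side value on both axes, so setting the short-side parameter to 1024 would keep the original resolution at the cost of more GPU memory.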

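Questions about input image size and learning rate settings

Hello author, must the input image size be 600 or 800? Also, how does the learning rate setting relate to the number of GPUs and batch_size?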

Error loading checkpoint after training finished

Here are the lines I changed in my cfgs.py

VERSION = 'resnet_v1_50_20190729' # a new name
NET_NAME = 'resnet50_v1d' # 'MobilenetV2'

Then I started to train and had a series of files in ./output/trained_weights/resnet_v1_50_20190729, including:
checkpoints
DOTA_1000model.ckpt.meta
DOTA_1000model.ckpt.index
DOTA_1000model.ckpt.data-00000-of-00001
etc.

Then I tested and got the following errors:
2019-07-29 21:13:08.984880: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key resnet50_v1d/C1/conv0/BatchNorm/beta not found in checkpoint
...
tensorflow.python.framework.errors_impl.NotFoundError: Key resnet50_v1d/C1/conv0/BatchNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
...
tensorflow.python.framework.errors_impl.NotFoundError: Key resnet50_v1d/C1/conv0/BatchNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at ../libs/networks/build_whole_network.py:286) ]]
...
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
...

This error is similar to #14. I put those DOTA_1000model.ckpt files into ./data/pretrained_weights and it didn't help. Any suggestions?

Pretrained weights for mobilenetv2

Hello, can you provide the pretrained weights for mobilenetv2?
