detectionteamucas / retinanet_tensorflow_rotation
Focal Loss for Dense Rotation Object Detection
License: MIT License
Your work is very helpful. Well done! But it is frustrating to wait an hour to download a 100MB file. It makes me feel like I'm living twenty years in the past.
I cropped the images to 800x800 with overlap=200 and got OOM after a few steps (~300). I never hit this problem when training your R2CNN project, even with 1024x1024 crops. Does RetinaNet need more memory? How can I deal with this problem?
Hi,
Many thanks for the great repo. However, I noticed that the link in the trained_weigths README.md ( https://github.com/DetectionTeamUCAS/Models/RetinaNet_Tensorflow ) seems to return a 404. Do you plan to publish the weights for your detector, now or in the future?
Thanks again for the code :)
Where can I download resnet101_v1d.ckpt and resnet50_v1d.ckpt? The links provided no longer work.
When I train with my own data, the loss values are often 0.000 or NaN. Strangely, training the R2CNN you shared on the same data works fine.
When I switch to a different dataset, I always get the following error. How should I solve this problem?
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 8, current size 0)
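"current size 0" means the input queue never received a single example. One plausible cause (not confirmed in the issue) is a tfrecord file that is empty or is not the file the reader actually opens. A quick check that does not need TensorFlow is to count the records by walking the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC). The function name is hypothetical, and CRCs are not verified.

```python
import os
import struct

def count_tfrecord_records(path):
    """Count records in a TFRecord file by walking its length headers (CRCs are not checked)."""
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            # Skip the 4-byte length CRC, the payload, and the 4-byte payload CRC.
            f.seek(4 + length + 4, os.SEEK_CUR)
            if f.tell() > size:
                raise ValueError("truncated record payload")
            count += 1
    return count
```

A count of 0 for the file your reader points at suggests the conversion step wrote nothing, for example because no image/label pairs were matched.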
When I load the resnet50_v1d checkpoint, it shows:
NotFoundError (see above for traceback)Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Tensor name "resnet50_v1d/C1/conv0/BatchNorm/beta" not found in checkpoint files './data/pretrained_weights/resnet50_v1d.ckpt'
I downloaded resnet50_v1.ckpt from the link in the README. Can you give me some suggestions about this?
Thanks!
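The graph looks for keys under the resnet50_v1d/ scope, but the downloaded file is resnet50_v1.ckpt, so one plausible explanation is that the checkpoint stores a different network under a different scope. In TensorFlow 1.x, tf.train.list_variables(ckpt_path) returns the (name, shape) pairs stored in a checkpoint, and [v.op.name for v in tf.global_variables()] gives the graph's names. The pure-Python helper below (hypothetical, not part of the repo) compares the two lists and reports the top-level scopes on each side.

```python
def diagnose_missing_keys(graph_names, ckpt_names):
    """Return graph variables absent from the checkpoint, plus the top-level scopes on each side."""
    ckpt = set(ckpt_names)
    missing = sorted(n for n in graph_names if n not in ckpt)
    graph_scopes = sorted({n.split("/", 1)[0] for n in graph_names})
    ckpt_scopes = sorted({n.split("/", 1)[0] for n in ckpt_names})
    return missing, graph_scopes, ckpt_scopes

# Usage with TF 1.x (names come from your own graph and checkpoint):
#   ckpt_names = [name for name, _ in tf.train.list_variables(ckpt_path)]
#   graph_names = [v.op.name for v in tf.global_variables()]
#   print(diagnose_missing_keys(graph_names, ckpt_names))
```

If the scopes differ entirely (say resnet50_v1d in the graph and some other prefix in the file), the checkpoint is for a different backbone, and no amount of copying files into place will fix the restore.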
ModuleNotFoundError: No module named 'libs.box_utils.rbbox_overlaps'
I see the relevant setting in setup.py, but I cannot import the .cpp file. Could you please tell me what causes this problem?
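Hello! Sorry to bother you! I'd like to ask two questions.
In this code:
Does one epoch mean going through all samples once?
If so, is the number of steps for one pass equal to samples / batch_size?
I ask because I want to change batch_size.
Thanks!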
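The arithmetic behind the question, as a minimal sketch. It assumes each step consumes batch_size images per GPU and that the reader simply cycles through the tfrecord; the source does not confirm how this repo counts steps, and the function name is hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size, num_gpus=1):
    """Steps needed for one full pass over the data (a partial last batch counts as a step)."""
    images_per_step = batch_size * num_gpus
    return math.ceil(num_samples / images_per_step)
```

For example, 20,000 crops at batch size 2 on one GPU is 10,000 steps per pass; doubling batch_size halves the steps needed for the same number of passes.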
Hello! I have an issue using my own dataset.
I tested the code with TensorFlow 1.15.0 and CUDA 10.1.
I got the following error with my own dataset's tfrecord file:
Out of range: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
I used your convert_data_to_tfrecord.py, but I didn't crop the data beforehand.
Does this error come from not cropping my data to 600x600?
And yes, the images in my dataset all have different sizes...
I would be very grateful for any help!
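Whether the reader strictly requires 600x600 crops is not stated in the issue, but other reports here describe cropping large images into fixed-size overlapping tiles first (for example 800x800 with overlap=200). A minimal sketch of computing such tile windows; the function names are hypothetical, and images smaller than the crop get a single (smaller) window that you may need to pad.

```python
def crop_offsets(length, crop, overlap):
    """Start offsets of windows of size `crop` covering [0, length) with the given overlap."""
    if crop <= overlap:
        raise ValueError("crop must be larger than overlap")
    if length <= crop:
        return [0]  # the whole side fits in one window
    stride = crop - overlap
    offsets = list(range(0, length - crop, stride))
    offsets.append(length - crop)  # last window flush with the border
    return offsets

def crop_windows(width, height, crop, overlap):
    """(x0, y0, x1, y1) tiles covering a width x height image; edge tiles may be smaller than crop."""
    return [(x, y, min(x + crop, width), min(y + crop, height))
            for y in crop_offsets(height, crop, overlap)
            for x in crop_offsets(width, crop, overlap)]
```

Placing the last window flush with the border keeps every tile inside the image instead of running off the edge, at the cost of a larger overlap for that final tile. Boxes must be shifted by (x0, y0) and clipped to each tile when writing the labels.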
Thanks!
If I train on a single GPU, is it enough to change the GPU id, or does the training code also need corresponding changes?
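I've successfully run the default configuration on the DOTA dataset, but when I switch to my own data, which is similar to DOTA (also remote sensing imagery), I get: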
out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
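1. I'm training with the updated code. I first cropped the training images, which gave more than 20,000 crops. My GPU is weak and slow, so I have only trained 5k steps so far. But when I test with the trained weights, there are no detections at all on the images. Is something wrong? (It hasn't reached one epoch yet, but getting no results at all still seems strange.)
2. In TensorBoard, the total loss has dropped to about 0.9, and gtboxes_h and gtbox_r both show boxes, but final_detection shows no boxes at all. Is this normal?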
Little question: Why did you use sigmoid instead of softmax there?
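For context, the focal loss paper this repo builds on uses a per-class sigmoid: each class is an independent binary problem, and background is encoded as "every class negative", so no extra background logit is needed. A minimal per-anchor sketch of sigmoid focal loss with the paper's defaults alpha=0.25 and gamma=2; it is written in plain Python for clarity, is not the repo's TensorFlow code, and the function names are hypothetical.

```python
import math

def _log_sigmoid(z):
    """log(sigmoid(z)), computed without overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def sigmoid_focal_loss(logits, target_class, alpha=0.25, gamma=2.0):
    """Focal loss for one anchor.

    logits: one raw score per foreground class.
    target_class: index of the ground-truth class, or None for a background anchor.
    """
    loss = 0.0
    for k, z in enumerate(logits):
        log_p = _log_sigmoid(z)        # log p_k
        log_not_p = _log_sigmoid(-z)   # log(1 - p_k)
        p = math.exp(log_p)
        if k == target_class:
            # Positive class: down-weight well-classified (p close to 1) examples.
            loss += -alpha * (1.0 - p) ** gamma * log_p
        else:
            # Negative class: down-weight easy negatives (p close to 0).
            loss += -(1.0 - alpha) * p ** gamma * log_not_p
    return loss
```

With gamma=0 this reduces to alpha-weighted binary cross-entropy; a softmax would instead force the class scores of an anchor to compete, which needs an explicit background class.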
How do I test an image with this network, and where should I download and put the weight file?
Please ...
Error information:
python setup.py build_ext --inplace
running build_ext
skipping 'bbox.c' Cython extension (up-to-date)
skipping 'nms.c' Cython extension (up-to-date)
building 'cython_bbox' extension
creating build
creating build/temp.linux-x86_64-3.7
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -B /home/huangwei/anaconda3/envs/tensorflow-R3det/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/huangwei/anaconda3/envs/tensorflow-R3det/lib/python3.7/site-packages/numpy/core/include -I/home/huangwei/anaconda3/envs/tensorflow-R3det/include/python3.7m -c bbox.c -o build/temp.linux-x86_64-3.7/bbox.o -Wno-cpp -Wno-unused-function
bbox.c: In function ‘__Pyx__ExceptionSave’:
bbox.c:9439:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’
*type = tstate->exc_type;
^
bbox.c:9440:20: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’
*value = tstate->exc_value;
^
bbox.c:9441:17: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’
*tb = tstate->exc_traceback;
^
bbox.c: In function ‘__Pyx__ExceptionReset’:
Hi,
Can you share the cfgs file for ResNet152_v1, the most accurate model?
I would like to try this on my dataset.
Hello, thank you for your wonderful work!
I have a question: where in your source code can I find the implementation of FRM, which is described in your paper?
Thank you! :)
Environment:
Ubuntu 18.04, CUDA 10.0, tensorflow-gpu 1.13.1
curexc_type
bbox.c:9512:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
bbox.c:9513:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
error: command 'gcc' failed with exit status 1
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1
Does this code support training on one GPU with a batch size of 2?
I found that multi-gpu-read-tfrecord.py does not limit the batch size, while read-tf-record.py requires the batch size to be 1.
So, can I train with a batch size of 2 on one GPU?
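The source does not say why read-tf-record.py fixes the batch size to 1. One common reason in detection code is that images differ in size and in number of boxes, so a batch only stacks cleanly after padding. A minimal sketch of padding ground-truth boxes per batch while keeping the real counts so the loss can ignore the padded rows; the function name and the 6-value box layout (x, y, w, h, theta, label) are hypothetical.

```python
def pad_gtboxes(batch_boxes, box_dim, pad_value=0.0):
    """Pad each image's box list to the longest one in the batch.

    Returns (padded, counts), where counts[i] is the number of real boxes in image i.
    """
    max_n = max((len(b) for b in batch_boxes), default=0)
    padded = []
    for boxes in batch_boxes:
        rows = [list(b) for b in boxes]
        # Fill with dummy rows; downstream code must ignore rows past counts[i].
        rows += [[pad_value] * box_dim for _ in range(max_n - len(boxes))]
        padded.append(rows)
    counts = [len(b) for b in batch_boxes]
    return padded, counts
```

The images themselves need the same treatment (pad to the largest height and width in the batch), and anchor matching must skip padded boxes, which is likely why a batch-size-1 reader is simpler.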
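This error appears randomly during training: sometimes only after 200,000 steps, sometimes very quickly. In the OpenCV project on GitHub I saw others mention this problem and suggest changing float to double in intersection.cpp, but I couldn't find that file. Also, I removed OpenCV 3.4.2 with conda remove, yet the program still runs and never raises a No module named 'cv2' error, and I don't know why. Has anyone run into this problem?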
Waiting for the Mask-RCNN implementation.
This is only 5% higher accuracy than the previous best.
I am freezing the trained weights to a .pb file for use from C++, but the py_function 'generate_anchors_pre' cannot be frozen, so I need a TF-op version such as 'make_anchors'.
I found that anchor_utils already has one, but you did not use it. Is there any problem with the function 'anchor_utils.make_anchors'?
/home/lyy/hq/RetinaNet_Tensorflow_Rotation-master/libs/box_utils/bbox_transform.py:99: RuntimeWarning: divide by zero encountered in log
targets_dh = np.log(gt_rois[:, 3] / ex_rois[:, 3])
This error occurred during training. How can I fix it?
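numpy emits "divide by zero encountered in log" when the argument is 0, which here means some gt_rois[:, 3] value is zero (a degenerate ground-truth box) and the target becomes -inf. Two common remedies, offered as assumptions rather than the repo's fix: filter zero-size boxes when building the dataset, or clamp both sides before the log, e.g. np.log(np.maximum(gt_rois[:, 3], eps) / np.maximum(ex_rois[:, 3], eps)). The scalar version below shows the clamp; the function name and eps value are hypothetical.

```python
import math

def log_ratio_target(gt, anchor, eps=1e-5):
    """log(gt / anchor) with both sides clamped away from zero, so a degenerate box stays finite."""
    return math.log(max(gt, eps) / max(anchor, eps))
```

Clamping only hides the symptom; a large negative target from a near-zero box can still destabilize regression, so removing such boxes is usually the safer choice.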
I am training successfully under CUDA 10 and TensorFlow 1.14, with one minor issue.
I want to ask two things:
MUTILPY_BIAS_GRADIENT = 2.0 # if None, will not multiply
GRADIENT_CLIPPING_BY_NORM = 10.0 # if None, will not clip
CLS_WEIGHT = 1.0
REG_WEIGHT = 1.0
USE_IOU_FACTOR = False
BATCH_SIZE = 1
EPSILON = 1e-5
MOMENTUM = 0.9
LR = 5e-4
DECAY_STEP = [SAVE_WEIGHTS_INTE*12, SAVE_WEIGHTS_INTE*16, SAVE_WEIGHTS_INTE*20]
MAX_ITERATION = SAVE_WEIGHTS_INTE*20
WARM_SETP = int(1.0 / 4.0 * SAVE_WEIGHTS_INTE)
The issue is below:
WARNING:tensorflow:Entity <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x7fb846a0c550>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x7fb846a0c550>>: AssertionError: Bad argument number for Name: 3, expecting 4
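For orientation, a minimal sketch of how the schedule settings in the cfgs excerpt above could combine. It assumes, without confirmation from the source, a linear warmup to LR over WARM_SETP steps followed by division by 10 at each entry of DECAY_STEP. The function name is hypothetical, and SAVE_WEIGHTS_INTE = 10000 in the usage is a made-up value for the arithmetic.

```python
def learning_rate(step, base_lr, decay_steps, warmup_steps):
    """Piecewise-constant LR with linear warmup (assumed reading of the cfgs, not the repo's code)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    lr = base_lr
    for boundary in decay_steps:
        if step >= boundary:
            lr /= 10.0
    return lr

# Hypothetical values mirroring the cfgs excerpt:
SAVE_WEIGHTS_INTE = 10000
DECAY_STEP = [SAVE_WEIGHTS_INTE * 12, SAVE_WEIGHTS_INTE * 16, SAVE_WEIGHTS_INTE * 20]
WARM_SETP = int(1.0 / 4.0 * SAVE_WEIGHTS_INTE)
```

Under this reading, with MAX_ITERATION equal to the last DECAY_STEP entry, the third decay never takes effect, because training stops at that step.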
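① tfrecord reading issue:
To generate tfrecord files following the author's method, you first crop the images and generate xml files. However, some txt files have two extra, unneeded lines of text above the coordinate data, which makes xml generation fail. You can skip these two lines in the code.
② During training, it reports shufflebatch=0:
The Baidu search results all say it is some kind of initialization problem. Analyzing the next_batch code, I concluded that the images and labels were not being read correctly from the tfrecord file. After debugging, I found that the relative path could not be resolved, so I changed it to an absolute path and it worked.
③ Pretrained model issue:
You need to put all of the files, i.e. the complete res101.ckpt, resnet101_v1d.ckpt.index, and so on, into the pretrained weights folder.
④ Cannot find some bbox function or file:
You need to compile via setup.py following the author's instructions; it cannot be compiled on Win10.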
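Tips ① to ③ above can each be turned into a small check. The sketch below is illustrative only: the function names are hypothetical, and the label layout "x1 y1 x2 y2 x3 y3 x4 y4 category difficult" with header lines such as imagesource:... and gsd:... is the DOTA-style format, assumed here to match your txt files. The checkpoint check applies to TensorFlow's V2 format (a .index file plus .data-* shards).

```python
import glob
import os

def parse_dota_label(lines):
    """Parse 'x1 y1 ... x4 y4 category [difficult]' lines, skipping header and blank lines (tip 1)."""
    objects = []
    for line in lines:
        parts = line.split()
        if len(parts) < 9:
            continue  # e.g. 'imagesource:...', 'gsd:...', or an empty line
        try:
            coords = [float(v) for v in parts[:8]]
        except ValueError:
            continue  # not a coordinate line
        difficult = int(parts[9]) if len(parts) > 9 else 0
        objects.append((coords, parts[8], difficult))
    return objects

def resolve_tfrecord_pattern(pattern, base_dir):
    """Resolve a relative tfrecord path/glob against base_dir and fail loudly on no match (tip 2)."""
    full = pattern if os.path.isabs(pattern) else os.path.join(base_dir, pattern)
    matches = sorted(glob.glob(os.path.abspath(full)))
    if not matches:
        raise FileNotFoundError(f"no tfrecord file matches {full!r}")
    return matches

def checkpoint_files_present(prefix):
    """True if a V2 checkpoint prefix has its .index file and at least one .data-* shard (tip 3)."""
    has_index = os.path.isfile(prefix + ".index")
    shards = glob.glob(glob.escape(prefix) + ".data-*")
    return has_index and bool(shards)
```

Relative paths resolve against the current working directory, not the script's location, so passing base_dir=os.path.dirname(os.path.abspath(__file__)) is one way to make the reader independent of where the script is launched.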
Hi, I noticed that iou_rotate.py uses OpenCV to compute the overlap. Why not use the GPU? What is the difference between the two approaches?
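OpenCV's cv2.rotatedRectangleIntersection computes the intersection polygon of two rotated rectangles on the CPU, one pair at a time. A GPU kernel does the same geometry for many box pairs in parallel, so the difference is mainly speed, plus possible small numerical differences. The sketch below shows that geometry in pure Python using Sutherland-Hodgman polygon clipping. It is not the repo's iou_rotate.py, and the (cx, cy, w, h, theta_degrees) box layout with counter-clockwise angles is an assumption that differs from OpenCV's angle conventions.

```python
import math

def rbox_to_polygon(cx, cy, w, h, theta_deg):
    """Corners of a rotated rectangle, in counter-clockwise order."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))]

def polygon_area(poly):
    """Shoelace formula."""
    area = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def clip_polygon(subject, clipper):
    """Sutherland-Hodgman: clip `subject` by the convex, counter-clockwise `clipper`."""
    def inside(p, a, b):
        # p is on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of line p-q with line a-b (only called when p and q straddle a-b).
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        input_list, output = output, []
        if not input_list:
            break  # nothing left: the polygons do not overlap
        prev = input_list[-1]
        for cur in input_list:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output

def rotated_iou(box1, box2):
    """IoU of two rotated boxes given as (cx, cy, w, h, theta_degrees)."""
    p1, p2 = rbox_to_polygon(*box1), rbox_to_polygon(*box2)
    inter = clip_polygon(p1, p2)
    inter_area = polygon_area(inter) if len(inter) >= 3 else 0.0
    union = polygon_area(p1) + polygon_area(p2) - inter_area
    return inter_area / union if union > 0 else 0.0
```

For example, a 10x2 box and the same box rotated by 90 degrees form a plus sign whose IoU is only 4/36, about 0.11, which is low enough to survive a typical rotated NMS threshold.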
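Strangely, I trained on DOTA directly with the author's config file, without modification, but the result is about 10 points lower than expected. Looking at the per-class results, small-vehicle should reach an mAP of about 0.66, but mine is only 0.18. When I used R2CNN before, small-vehicle mAP was also very low and the overall mAP was about 10 points lower. I suspect the anchor sizes are too large to detect small objects? Also, do I need to manually change parameters in cfg.py to reach the reported mAP?
I tested on a single RTX 2080. Below are a few images saved during prediction. Some work well and almost everything is detected, but in others clearly not a single small-vehicle is detected. What might cause this?
(The first two images are OK; in the last two, no small-vehicle was detected at all.)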
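One way to test the anchor-size suspicion is to list the anchor side lengths per pyramid level. The values below are the RetinaNet paper defaults (base sizes 32 to 512 on P3 to P7, scales 2^0, 2^(1/3), 2^(2/3)) and not necessarily what this repo's cfgs.py uses, so substitute the values from your own config. The function name is hypothetical. Sizes are in pixels of the resized network input, so objects shrink further if the image is downscaled before inference.

```python
def anchor_sizes(base_sizes, scales, first_level=3):
    """Anchor side lengths (input-image pixels) for each pyramid level, rounded to 0.1 px."""
    return {
        f"P{first_level + i}": [round(float(base * s), 1) for s in scales]
        for i, base in enumerate(base_sizes)
    }

# RetinaNet paper defaults (assumed here; check your own cfgs.py):
sizes = anchor_sizes([32, 64, 128, 256, 512], [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)])
print(sizes["P3"])  # -> [32.0, 40.3, 50.8]
```

A vehicle that is only 10 to 15 pixels long after resizing overlaps even the smallest 32-pixel anchor poorly, so it may never be matched as a positive during training.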
Hi, I can't find SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects, and its link is broken. Can you help?
I have trained a model, but the output quite often forms a plus sign: a horizontal detection comes with a second, vertical detection on the same object. I don't know why this is happening.
When I ran "python multi_gpu_train.py", I got the following error importing rbbx_overlaps:
Traceback (most recent call last):
File "hello_world.py", line 17, in <module>
from libs.networks import build_whole_network
File "../libs/networks/build_whole_network.py", line 13, in <module>
from libs.losses import losses
File "../libs/losses/losses.py", line 9, in <module>
from libs.box_utils.iou_rotate import iou_rotate_calculate2
File "../libs/box_utils/iou_rotate.py", line 10, in <module>
from libs.box_utils.rbbox_overlaps import rbbx_overlaps
ImportError: ../libs/box_utils/rbbox_overlaps.cpython-35m-x86_64-linux-gnu.so: failed to map segment from shared object
All my packages have the correct versions. I have run the "python setup.py build_ext --inplace" commands to generate those .so files. Any suggestions?
By the way, I use English since my laptop has no Pinyin input. Please feel free to reply in Chinese. Thank you.
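Thanks to the author. Since the code update a few days ago, the loss on my dataset no longer becomes NaN, but the regression loss still sometimes drops to 0, as shown below: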
[2019-07-31 19:41:10] global_step:1035 current_step:1035 per_cost_time:1.036s
cls_loss:1.436 reg_loss:0.588 total_losses:2.024
[2019-07-31 19:41:15] global_step:1040 current_step:1040 per_cost_time:1.011s
cls_loss:1.184 reg_loss:0.614 total_losses:1.798
[2019-07-31 19:41:20] global_step:1045 current_step:1045 per_cost_time:1.066s
cls_loss:0.467 reg_loss:0.000 total_losses:0.467
[2019-07-31 19:41:26] global_step:1050 current_step:1050 per_cost_time:1.079s
cls_loss:1.418 reg_loss:0.469 total_losses:1.887
[2019-07-31 19:41:31] global_step:1055 current_step:1055 per_cost_time:1.070s
cls_loss:1.206 reg_loss:0.575 total_losses:1.781
[2019-07-31 19:41:36] global_step:1060 current_step:1060 per_cost_time:1.025s
cls_loss:1.335 reg_loss:0.463 total_losses:1.798
[2019-07-31 19:41:42] global_step:1065 current_step:1065 per_cost_time:1.039s
cls_loss:1.209 reg_loss:0.550 total_losses:1.759
[2019-07-31 19:41:47] global_step:1070 current_step:1070 per_cost_time:0.951s
cls_loss:1.000 reg_loss:0.512 total_losses:1.512
[2019-07-31 19:41:52] global_step:1075 current_step:1075 per_cost_time:1.074s
cls_loss:0.891 reg_loss:0.636 total_losses:1.526
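A plausible reading of reg_loss:0.000 alongside a non-zero cls_loss (not confirmed by the source) is that no anchor was matched as positive in that step, e.g. a crop with no objects, or with objects too small to reach the positive IoU threshold for any anchor. If the regression loss sums only over positive anchors and divides by max(num_pos, 1), it is then exactly 0. A minimal sketch of that normalization. The label encoding (1 positive, 0 negative, -1 ignored) and the smooth-L1 beta are assumptions, not this repo's code.

```python
def smooth_l1(x, beta=1.0 / 9.0):
    """Smooth L1: quadratic below beta, linear above."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def regression_loss(deltas, targets, labels):
    """Sum smooth-L1 over positive anchors only, normalized by the number of positives."""
    total, num_pos = 0.0, 0
    for d, t, lab in zip(deltas, targets, labels):
        if lab == 1:
            total += sum(smooth_l1(a - b) for a, b in zip(d, t))
            num_pos += 1
    return total / max(num_pos, 1)  # no positives -> loss is exactly 0.0
```

If this is the cause, the zero entries should line up with images or crops that contain no (matchable) objects, which you can confirm by logging the number of positive anchors per step.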
Can you update the url or upload to baiduyun? Thanks! @yangxue0827
Hello author, during training I also ran into the loss being NaN or reg_loss being 0.0000. In addition, when I first trained with 1024*1024 images, I got:
out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
2019-07-26 12:37:44.321824: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321869: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7faecb400000
2019-07-26 12:37:44.321836: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321837: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321896: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7faec9800000
2019-07-26 12:37:44.321903: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7faeccc00000
2019-07-26 12:37:44.321892: E tensorflow/stream_executor/stream.cc:318] Error recording event in stream: error recording CUDA event on stream 0x55dba75d5170: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-07-26 12:37:44.321837: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-07-26 12:37:44.321971: I tensorflow/stream_executor/stream.cc:4793] stream 0x55dba725bbb0 did not memcpy host-to-device; source: 0x7fb050800000
2019-07-26 12:37:44.321974: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2019-07-26 12:37:44.321981: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:206] Unexpected Event status: 1
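On Linux, the files in the box_utils and cython_utils folders compile with the default gcc compiler, but setup.py does not work on Windows. Could someone provide a modified setup.py so that this project can run on Windows?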
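I installed and ran the code, testing a few top-down drone images containing cars and a few images from the DOTA dataset. With --show_box the processed images were saved successfully, but no boxes were drawn on any of them. Inspecting intermediate variables in test_dota.py, I found that det_boxes_r_ always returns an empty array. My environment is Anaconda3 + Python 3.5 + TensorFlow 1.12 + CUDA 9.1. In cfgs.py I only changed NET_NAME = 'resnet_v1_50'. Switching to other pretrained networks gives the same result: the program runs end to end but detects nothing.
The same images do produce results, with boxes drawn, when run through the R2CNN_Faster-RCNN_Tensorflow algorithm.
Do you have any suggestions?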
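Hello, when training RetinaNet to regress rotated boxes, the loss decreases steadily, and then the regression loss suddenly becomes NaN or inf. Debugging showed that the feature maps coming out of the backbone are already NaN. What might cause this? Lowering the learning rate did not help.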
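NaN feature maps mean that some weight already became non-finite in an earlier update. Two checks that often help are sketched here in plain Python, not as the repo's code: stop as soon as a gradient is non-finite, and clip by global norm. Clipping by global norm is what tf.clip_by_global_norm does: it scales every gradient by clip_norm / max(global_norm, clip_norm). The GRADIENT_CLIPPING_BY_NORM = 10.0 setting quoted in another issue suggests the repo has a knob of this kind. Degenerate targets, such as the log(0) in bbox_transform.py reported in another issue, are another possible NaN source. The function name is hypothetical.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm).

    grads: list of gradient vectors (lists of floats).
    Raises on NaN/inf, since rescaling cannot repair non-finite values.
    """
    flat = [g for vec in grads for g in vec]
    if any(not math.isfinite(g) for g in flat):
        raise FloatingPointError("non-finite gradient encountered")
    global_norm = math.sqrt(sum(g * g for g in flat))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in vec] for vec in grads], global_norm
```

Logging the returned global_norm each step also shows whether the divergence is preceded by a spike, which points to exploding gradients rather than bad input data.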
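What do the parameters IMG_SHORT_SIDE_LEN and IMG_MAX_LENGTH in the config file cfgs.py mean? If the images in my training set are 1024*1024, do I need to change these two parameters? Do any other files need to be changed?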
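The source does not define these two settings. Judging by the names alone, IMG_SHORT_SIDE_LEN is most likely the target length of the image's shorter side, and IMG_MAX_LENGTH a cap on the longer side. A minimal sketch of that common resize rule, assuming this reading; the function name is hypothetical.

```python
def short_side_resize(h, w, short_side_len, max_length):
    """Scale so the short side equals short_side_len, unless that pushes the long side past max_length."""
    scale = short_side_len / min(h, w)
    if max(h, w) * scale > max_length:
        scale = max_length / max(h, w)  # the long-side cap wins
    return round(h * scale), round(w * scale)
```

Under this reading, 1024x1024 training images with both settings at 800 would be downscaled to 800x800 before entering the network, which also shrinks small objects by the same factor.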
Hello author, must the input image size be 600 or 800? Also, how should the learning rate be set relative to the number of GPUs and the batch_size?
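Nothing in the source says how this repo ties the learning rate to the number of GPUs or to batch_size. A widely used heuristic is the linear scaling rule: keep the learning rate proportional to the total batch size (batch_size * num_gpus), usually combined with a warmup. A minimal sketch, with a hypothetical function name. The reference point (for example LR = 5e-4 at BATCH_SIZE = 1, as in a cfgs excerpt quoted in another issue) is an assumption you would adapt to your own setup.

```python
def scaled_lr(base_lr, base_batch, batch_size, num_gpus=1):
    """Linear scaling rule: LR grows in proportion to the total images per step."""
    return base_lr * (batch_size * num_gpus) / base_batch
```

For example, going from 1 image per step to 2 images on each of 2 GPUs would scale 5e-4 to 2e-3. Treat the result as a starting point: detection training can diverge at large learning rates, so lower it if the loss becomes NaN.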
Here are the lines I changed in my cfgs.py:
VERSION = 'resnet_v1_50_20190729' # a new name
NET_NAME = 'resnet50_v1d' # 'MobilenetV2'
Then I started training and got a series of files in ./output/trained_weights/resnet_v1_50_20190729, including:
checkpoints
DOTA_1000model.ckpt.meta
DOTA_1000model.ckpt.index
DOTA_1000model.ckpt.data-00000-of-00001
etc.
Then I tested and got the following errors:
2019-07-29 21:13:08.984880: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key resnet50_v1d/C1/conv0/BatchNorm/beta not found in checkpoint
...
tensorflow.python.framework.errors_impl.NotFoundError: Key resnet50_v1d/C1/conv0/BatchNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
...
tensorflow.python.framework.errors_impl.NotFoundError: Key resnet50_v1d/C1/conv0/BatchNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at ../libs/networks/build_whole_network.py:286) ]]
...
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
...
This error is similar to #14. I put those DOTA_1000model.ckpt files into ./data/pretrained_weights and it didn't help. Any suggestions?
Hello, can you provide pretrained_weights for MobileNetV2?