
asff's Introduction

Learning Spatial Fusion for Single-Shot Object Detection

By Songtao Liu, Di Huang, Yunhong Wang

Introduction

In this work, we propose a novel, data-driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns how to spatially filter conflicting information to suppress inconsistency across feature scales, thus improving the scale invariance of features, and it introduces nearly free inference overhead. For more details, please refer to our arXiv paper.
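
For intuition, here is a minimal PyTorch sketch of the core fusion step: per-pixel softmax weights blend the three (already resized) pyramid levels. The names, the compress_c value, and the assumption of equal per-level channels are illustrative only; the actual module in this repo also handles resizing and expand convolutions (see the levels_weight issue further below for the real fusion lines).

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFSketch(nn.Module):
    def __init__(self, channels, compress_c=16):
        super().__init__()
        # 1x1 convs compress each level into a small embedding that is
        # used only for predicting the fusion weights.
        self.weight_level_0 = nn.Conv2d(channels, compress_c, 1)
        self.weight_level_1 = nn.Conv2d(channels, compress_c, 1)
        self.weight_level_2 = nn.Conv2d(channels, compress_c, 1)
        self.weight_levels = nn.Conv2d(compress_c * 3, 3, 1)

    def forward(self, level_0, level_1, level_2):
        # Each level_i: (N, C, H, W), already resized to this level's resolution.
        w = torch.cat([self.weight_level_0(level_0),
                       self.weight_level_1(level_1),
                       self.weight_level_2(level_2)], dim=1)
        w = F.softmax(self.weight_levels(w), dim=1)  # (N, 3, H, W), sums to 1 per pixel
        return (level_0 * w[:, 0:1] +
                level_1 * w[:, 1:2] +
                level_2 * w[:, 2:3])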

Updates:

  • YOLOX is here! Come and use the stronger YOLO!

  • Add MobileNet V2!

    • The previous models were all trained with the wrong anchor setting; we fixed this error for the MobileNet model.
    • We do not currently support RFB, DropBlock, or Feature Adaption for MobileNetV2.
    • FP16 training for MobileNet is not working yet; we have not tracked down the cause.
    • FP16 testing for MobileNet drops about 0.2 mAP.
  • Add a demo.py file

  • Faster NMS (adopt official implementation)

COCO

System                                | test-dev mAP | Time (V100) | Time (2080ti)
--------------------------------------|--------------|-------------|--------------
YOLOv3 608                            | 33.0         | 20ms        | 26ms
YOLOv3 608 + BoFs                     | 37.0         | 20ms        | 26ms
YOLOv3 608 (our baseline)             | 38.8         | 20ms        | 26ms
YOLOv3 608 + ASFF                     | 40.6         | 22ms        | 30ms
YOLOv3 608 + ASFF*                    | 42.4         | 22ms        | 30ms
YOLOv3 800 + ASFF*                    | 43.9         | 34ms        | 38ms
YOLOv3 MobileNetV1 416 + BoFs         | 28.6         | -           | 22ms
YOLOv3 MobileNetV2 416 (our baseline) | 29.0         | -           | 22ms
YOLOv3 MobileNetV2 416 + ASFF         | 30.6         | -           | 24ms

Citing

Please cite our paper in your publications if it helps your research:

@article{liu2019asff,
    title   = {Learning Spatial Fusion for Single-Shot Object Detection},
    author  = {Liu, Songtao and Huang, Di and Wang, Yunhong},
    journal = {arXiv preprint arXiv:1911.09516},
    year    = {2019}
}

Contents

  1. Installation
  2. Datasets
  3. Training
  4. Evaluation
  5. Models

Installation

  • Install PyTorch-1.3.1 by selecting your environment on the website and running the appropriate command.
  • Clone this repository.
    • Note: We currently only support PyTorch-1.0.0+ and Python 3+.
  • Compile the DCN layer (ported from DCNv2 implementation):
./make.sh

Prerequisites

  • We also use apex, numpy, opencv, tqdm, pyyaml, matplotlib, scikit-image, etc.

    • Note: We use apex for distributed training and synchronized batch normalization. For FP16 training, since the current apex version has some issues, we use the old version of FP16_Optimizer and keep that code in ./utils/fp16_utils; a usage sketch follows this list.
  • We also support tensorboard if you have installed it.
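
A minimal FP16 training sketch under the setup above. The import path mirrors the tracebacks quoted in the issues below; the stand-in model and the hyper-parameters (taken from the config dumps further down) are assumptions, not the repo's exact training loop.

import torch
import torch.nn as nn
import torch.optim as optim
from utils.fp16_utils.fp16_optimizer import FP16_Optimizer  # vendored old-apex optimizer

model = nn.Conv2d(3, 8, 3).cuda().half()  # stand-in for the YOLOv3 model
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

loss = model(torch.randn(1, 3, 64, 64).cuda().half()).float().sum()
optimizer.zero_grad()
optimizer.backward(loss)  # replaces loss.backward(); applies loss scaling
optimizer.step()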

Demo

python demo.py -i /path/to/your/image \
--cfg config/yolov3_baseline.cfg -d COCO \
--checkpoint /path/to/your/weights --half --asff --rfb -s 608
  • Note:
    • -i, --img: image path.
    • --cfg: config files.
    • -d: choose datasets, COCO or VOC.
    • -c, --checkpoint: pretrained weights.
    • --half: FP16 testing.
    • -s: evaluation image size, from 320 to 608 as in YOLOv3.

Datasets

Note: We currently only support COCO and VOC.
To make things easy, we provide simple COCO and VOC dataset loaders that inherit torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.

Moreover, we also implement the Mix-up strategy from BoFs and distributed random resizing as in YOLOv3.
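
A minimal loading sketch. The module path, class name, and constructor arguments here are assumptions for illustration; check the dataset package in this repo for the exact signatures.

import torch
from torch.utils.data import DataLoader
from dataset.cocodataset import COCODataset  # assumed location/name

dataset = COCODataset(data_dir='data/COCO', img_size=608)  # assumed arguments
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)

for batch in loader:
    imgs = batch[0]  # image tensor; remaining entries carry targets/metadata
    break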

COCO Dataset

Install the MS COCO dataset at /path/to/coco from the official website. The default location is ./data/COCO, and a soft link is recommended:

ln -s /path/to/coco ./data/COCO

It should have this basic structure:

$COCO/
$COCO/annotations/
$COCO/images/
$COCO/images/test2017/
$COCO/images/train2017/
$COCO/images/val2017/

The COCO dataset has released the new train2017 and val2017 splits; by default, we train our models on train2017 and evaluate on val2017.

VOC Dataset

Install the VOC dataset as ./data/VOC. We also recommend a soft-link:

ln -s /path/to/VOCdevkit ./data/VOC

Training

  • First download the mix-up pretrained Darknet-53 PyTorch base network weights at: https://drive.google.com/open?id=1phqyYhV1K9KZLQZH1kENTAPprLBmymfP
    or from our BaiduYun Driver

  • For MobileNetV2, we use the PyTorch official weights (with key names changed to fit our code), or download from our BaiduYun Driver.

  • By default, we assume you have downloaded the file into the ASFF/weights dir.

  • Since random resizing consumes much more GPU memory, we implement FP16 training with an old version of apex.

  • We currently ONLY test the code with distributed training on multiple GPUs (10 2080ti or 4 Tesla V100).

  • To train the YOLOv3 baseline (ours) using the train script, simply specify the parameters listed in main.py as flags or manually change them in config/yolov3_baseline.cfg:

python -m torch.distributed.launch --nproc_per_node=10 --master_port=${RANDOM+10000} main.py \
--cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 10 \
--checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --log_dir log/COCO -s 608 
  • Note:

    • --cfg: config files.

    • --tfboard: use tensorboard.

    • --distributed: distributed training (we only test the code with distributed training)

    • -d: choose datasets, COCO or VOC.

    • --ngpu: number of GPUs.

    • -c, --checkpoint: pretrained weights or resume weights. You can pick up training from a checkpoint by specifying its path as one of the training parameters (again, see main.py for options); an example resume command follows this list.

    • --start_epoch: used for resume training.

    • --half: FP16 training.

    • --log_dir: log dir for tensorboard.

    • -s: evaluation image size, from 320 to 608 as in YOLOv3.
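
For example, a resume run might look like the following (the checkpoint filename here is hypothetical; by default, checkpoints land in the save dir):

python -m torch.distributed.launch --nproc_per_node=10 --master_port=${RANDOM+10000} main.py \
--cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 10 \
--checkpoint save/yolov3_coco_epoch100.pth --start_epoch 100 --half --log_dir log/COCO -s 608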

  • To train YOLOv3 with ASFF or ASFF*, you only need to add a few additional flags:

python -m torch.distributed.launch --nproc_per_node=10 --master_port=${RANDOM+10000} main.py \
--cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 10 \
--checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --asff --rfb --dropblock \
--log_dir log/COCO_ASFF -s 608 
  • Note:
    • --asff: add the ASFF module to YOLOv3.
    • --rfb: use the RFB module on ASFF.
    • --dropblock: use DropBlock.

Evaluation

To evaluate a trained network, you can use the following command:

python -m torch.distributed.launch --nproc_per_node=10 --master_port=${RANDOM+10000} eval.py \
--cfg config/yolov3_baseline.cfg -d COCO --distributed --ngpu 10 \
--checkpoint /path/to/your/weights --half --asff --rfb -s 608
  • Note:
    • --vis: Visualization of ASFF.
    • --testset: evaluate on COCO test-dev.
    • -s: evaluation image size.

By default, it will directly output the mAP results on COCO val2017 or the VOC2007 test set.

Models


asff's Issues

TypeError: forward() takes 3 positional arguments but 4 were given

File "main.py", line 386, in main
    loss_dict = model.forward(imgs, targets, epoch)
  File "/home/workspace/git/python/detection/ASFF/models/yolov3_asff.py", line 149, in forward
    x, anchor_loss, iou_loss, l1_loss, conf_loss, cls_loss = header(fused, targets)
  File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/workspace/git/python/detection/ASFF/models/yolov3_head.py", line 265, in forward
    loss_wh = (self.l1_loss(output[...,2:4], l1_target[...,2:4],tgt_scale)).sum() / batchsize
  File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes 3 positional arguments but 4 were given

Hello, thank you for sharing your great idea and code.

When running training on COCO, I met the error "TypeError: forward() takes 3 positional arguments but 4 were given", as posted above. In the code:

self.l1_loss = nn.L1Loss(reduction='none')

While https://github.com/ruinmessi/ASFF/blob/master/models/yolov3_head.py#L264-L266

loss_xy = (tgt_scale*self.bcewithlog_loss(output[...,:2], l1_target[...,:2])).sum() / batchsize
loss_wh = (self.l1_loss(output[...,2:4], l1_target[...,2:4], tgt_scale)).sum() / batchsize

Maybe it should be the following, since nn.L1Loss(reduction='none').forward only accepts (input, target), so tgt_scale must multiply the result rather than being passed as a third argument:

loss_xy = (tgt_scale*self.bcewithlog_loss(output[...,:2], l1_target[...,:2])).sum() / batchsize
loss_wh = (tgt_scale*self.l1_loss(output[...,2:4], l1_target[...,2:4])).sum() / batchsize

Replace FPN with ASFF

When I replace the FPN with ASFF in Retinaface, the model size doubles, but the result is inferior to FPN.

ImportError: torchvision/_C.so: undefined symbol

ImportError: /home/guobaozi/anaconda3/envs/PkuNet_apex/lib/python3.7/site-packages/torchvision/_C.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

python3.7
pytorch1.3
cuda10.1

Details:

Traceback (most recent call last):
  File "demo.py", line 136, in <module>
    demo()
  File "demo.py", line 116, in demo
    outputs = postprocess(outputs, num_class, 0.01, 0.65)
  File "/media/guobaozi/C022AA4B225A6D42/guobaozi_cv/ASFF-master/utils/utils.py", line 62, in postprocess
    detections_class[:, :4], detections_class[:, 4]*detections_class[:, 5], nms_thre)
  File "/home/guobaozi/anaconda3/envs/PkuNet_apex/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 32, in nms
    _C = _lazy_import()
  File "/home/guobaozi/anaconda3/envs/PkuNet_apex/lib/python3.7/site-packages/torchvision/extension.py", line 12, in _lazy_import
    from torchvision import _C as C
ImportError: /home/guobaozi/anaconda3/envs/PkuNet_apex/lib/python3.7/site-packages/torchvision/_C.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Support CPU

Does this repository support running on CPU? Thanks.

Mismatch between the .pth you provide and the ASFF model

RuntimeError: Error(s) in loading state_dict for YOLOv3:
size mismatch for module_list.18.conv.weight: copying a param with shape torch.Size([1024, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3])

How long does it take to train the ASFF model?

In the paper, you said you trained the ASFF model for 300 epochs on 4 V100 GPUs. Could you share how long the 300-epoch training took?

The entire network is trained with stochastic gradient descent (SGD) on 4 GPUs (NVIDIA Tesla V100) with 16 images per GPU. All models are trained for 300 epochs with the first 4 epochs of warmup and the cosine learning rate schedule [26] from 0.001 to 0.00001.

vis model bug

utils/cocoapi_evaluator.py line 186 changes the value of i. When using vis (True) mode, the image obtained below does not correspond to the outputs above. It can be modified to the following:

for ind in range(bboxes.shape[0]):
    label = self.dataset.class_ids[int(cls[ind])]
    A = {"image_id": id_, "category_id": label, "bbox": bboxes[ind].numpy().tolist(),
         "score": scores[ind].numpy().item(), "segmentation": []}  # COCO json format
    data_dict.append(A)

Not good for detecting small objects?

@ruinmessi
Thanks for sharing such great work.
I am trying your work on a dataset in which every object in the image is quite small.
I am using only 1 GPU for training, and the model trained without any errors, but the loss didn't decrease.
Maybe the problem comes from the fact that I didn't use many of the methods you used for training this network, such as synchronized batch normalization, FP16 training, etc.
But I am wondering whether that is the main problem, or whether this network struggles with detecting small objects?

Here is one image from the dataset I am using for training.

Error occurring when running demo.py; how can I solve it? THX

python demo.py -i example/test.jpg --cfg config/yolov3_baseline.cfg -d COCO --checkpoint weights/YOLOv3-baseline_38.8.pth --half --rfb -s 608

Missing key(s) in state_dict: "module_list.19.Feature_adaption.rfb.branch_0.0.weight", "module_list.19.Feature_adaption.rfb.branch_0.0.bias",

Training log of baseline / ASFF model?

Could you release the training logs of the baseline model and the ASFF model? I tried to train a baseline model with fewer epochs (30) for quick testing, but I found that the validation mAP is quite low (0.08 at epoch 16), and the conf loss / cls loss seem not to converge (near 100...). Is this normal?

Effect of DropBlock and RFB block

Hi. Thanks for sharing the code.

I had two questions.
In section 4.4 of the paper you mentioned that the

final model is YOLOv3 with ASFF*, which is an enhanced ASFF version by integrating other lightweight modules (i.e. DropBlock [7] and RFB [23]) with 1.5× longer training time than the models in Section 4.1.

  1. How does the final model (YOLOv3 with ASFF*) perform without the DropBlock and RFB modules?
  2. How does the model (YOLOv3 baseline) perform with the DropBlock and RFB modules?

This will help us more clearly understand the performance improvement due to the ASFF block. Thanks.

2080ti environment question

Could the author share the details of the 2080ti environment (software versions, etc.)? Thanks!

levels_weight[...] indexing format?

levels_weight_v = torch.cat((level_0_weight_v, level_1_weight_v, level_2_weight_v), 1)
levels_weight = self.weight_levels(levels_weight_v)
levels_weight = F.softmax(levels_weight, dim=1)

fused_out_reduced = level_0_resized * levels_weight[:, 0:1, :, :] + \
                    level_1_resized * levels_weight[:, 1:2, :, :] + \
                    level_2_resized * levels_weight[:, 2:, :, :]
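
For what it's worth, here is one reading of the shapes in this snippet (compress_c is an assumed name for the per-level compressed channel count):

# level_i_weight_v : (N, compress_c, H, W), one per pyramid level
# levels_weight_v  : (N, 3*compress_c, H, W) after torch.cat along dim=1
# levels_weight    : (N, 3, H, W) after the 1x1 conv weight_levels;
#                    softmax over dim=1 makes the three fusion weights
#                    sum to 1 at every spatial position
# levels_weight[:, 0:1, :, :] keeps the channel axis (shape (N, 1, H, W)),
# so it broadcasts against level_0_resized of shape (N, C, H, W).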

error in eval.py

Hi, when I run eval.py, a mismatch error always occurs.

The command in the terminal is:

python -m torch.distributed.launch --nproc_per_node=2 --master_port=${RANDOM+10000} eval.py \
--cfg config/yolov3_baseline.cfg -d COCO --distributed --ngpu 2 \
--checkpoint weights/YOLOv3-ASFF_40.6.pth --half --asff --rfb -s 608

What confuses me is which pretrained model I should choose.

The errors are as follows:

size mismatch for level_0_fusion.weight_level_0.conv.weight: copying a param with shape torch.Size([16, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 512, 1, 1]).
size mismatch for level_0_fusion.weight_level_0.batch_norm.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_0.batch_norm.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_0.batch_norm.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_0.batch_norm.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_1.conv.weight: copying a param with shape torch.Size([16, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 512, 1, 1]).
size mismatch for level_0_fusion.weight_level_1.batch_norm.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_1.batch_norm.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_1.batch_norm.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_1.batch_norm.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_2.conv.weight: copying a param with shape torch.Size([16, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 512, 1, 1]).
size mismatch for level_0_fusion.weight_level_2.batch_norm.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_2.batch_norm.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_2.batch_norm.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_level_2.batch_norm.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for level_0_fusion.weight_levels.weight: copying a param with shape torch.Size([3, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 24, 1, 1]).
size mismatch for level_1_fusion.weight_level_0.conv.weight: copying a param with shape torch.Size([16, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 256, 1, 1]).

Thanks for any help.

YOLOv3 baseline training gets stuck

Hello,

When I run the YOLOv3 baseline training script:

python -m torch.distributed.launch --nproc_per_node=10 --master_port=287343 main.py \
        --cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 8 \
        --checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --log_dir log/COCO -s 608

The process got stuck at:

index created!
Training YOLOv3 strong baseline!
loading pytorch ckpt... weights/darknet53_feature_mx.pth
using cuda
index created!
Training YOLOv3 strong baseline!
loading pytorch ckpt... weights/darknet53_feature_mx.pth
using cuda
loading pytorch ckpt... weights/darknet53_feature_mx.pth
loading pytorch ckpt... weights/darknet53_feature_mx.pth
using cuda
using cuda
loading pytorch ckpt... weights/darknet53_feature_mx.pth
using cuda
loading pytorch ckpt... weights/darknet53_feature_mx.pth
using cuda

I use 8 2080Ti GPUs, and the state of the GPUs is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:1A:00.0 Off |                  N/A |
| 27%   31C    P8    20W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:1B:00.0 Off |                  N/A |
| 27%   29C    P8    18W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:3D:00.0 Off |                  N/A |
| 27%   30C    P8    23W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:3E:00.0 Off |                  N/A |
| 27%   29C    P8    13W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 208...  On   | 00000000:88:00.0 Off |                  N/A |
| 27%   28C    P8     9W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 208...  On   | 00000000:89:00.0 Off |                  N/A |
| 27%   30C    P8    17W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce RTX 208...  On   | 00000000:B1:00.0 Off |                  N/A |
| 27%   29C    P8    11W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce RTX 208...  On   | 00000000:B2:00.0 Off |                  N/A |
| 27%   31C    P8    25W / 250W |   1186MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    159506      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    1    159507      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    2    159508      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    3    159509      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    4    159510      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    5    159511      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    6    159512      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
|    7    159513      C   ...hen/anaconda3/envs/pytorch13/bin/python  1175MiB |
+-----------------------------------------------------------------------------+

Single GPU

If I have 1 GPU, I should set --ngpu 1, right?

distributed testing

Hi there, thanks for sharing your code!
I have tested your pre-trained model "YOLOv3 800+ ASFF*" with distributed testing (on 4 TITAN XPs) and single-GPU testing (on one TITAN XP) on COCO minival. I noticed that, besides the inference time gap between them, the single-GPU testing performance is 1% mAP lower than the distributed testing result. Could you please explain that?

The single GPU result: 67.32ms, 42.6mAP
Distributed test on 4 GPUs result: 88.41ms, 43.6mAP

Thank you so much for your time!

cache?

Great work! A basic question: my COCO dataset directory only contains images and annotations. What is the cache for, and where can I download it?

Windows does not support distributed training

The following appears during training:
... (omitted) ...
File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\apex\parallel\optimized_sync_batchnorm_kernel.py", line 29, in forward
    if torch.distributed.is_initialized():
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'

anchor guiding

First of all, thanks to the author for the excellent code, especially the attempts at BoF and GA. Looking at the code, I feel that your Guided Anchoring implementation is not the same as in the original paper. Could you find time to explain the implementation ideas of the GA part? Thank you.

Multi-scale training?

BoFs used multi-scale training; how about ASFF? Does ASFF get its high mAP with multi-scale training or not?

'YOLOv3' object has no attribute 'module'

Hello, I wonder why, after training the model for about 10 epochs, I met this error. Is my PyTorch version causing the problem? I changed my torch version to 1.0.1 but met another problem, an import DCN error. Many thanks if you can reply :)

torch.save(model.module.state_dict(), os.path.join(args.save_dir,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py"
AttributeError: 'YOLOv3' object has no attribute 'module'

output visualization problem

I simply built a YOLOv3 ASFF model like this:

model = build_model()
a_t = torch.Tensor(a).unsqueeze(0).to(device)
out = model(a_t)[0]

Here out is a tensor of shape (48555, 85).

I think the first 4 dims are the box location and the other 81 are scores. However, when I filtered the locations with score bigger than 0.1 (which seems reasonable), the result was bad:


It gets more than 48,000 results, which are apparently not real objects.

How do I get the actual objects from these outputs?
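
Raw YOLO outputs still need score thresholding and NMS. A minimal sketch, assuming the repo's postprocess helper with the arguments demo.py passes it (as quoted in the torchvision ImportError issue above):

from utils.utils import postprocess  # helper invoked by demo.py

num_classes = 80                # COCO
outputs = model(a_t)            # raw predictions, e.g. (N, 48555, 85)
detections = postprocess(outputs, num_classes, 0.01, 0.65)  # conf_thre, nms_thre
# detections[0]: boxes kept for the first image after filtering + NMS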

Encountered an error when running demo.py

I got a maximum recursion depth error when I ran the following demo.py command:

python3 demo.py -i test.jpg --cfg config/yolov3_baseline.cfg -d COCO --checkpoint YOLOv3-mobile-asff.pth --asff -s 416

/home/test/anaconda3/envs/detectron2/lib/python3.6/site-packages/torchvision-0.5.0a0+1e857d9-py3.6-linux-x86_64.egg/torchvision/io/_video_opt.py:17: UserWarning: video reader based on ffmpeg c++ ops not available
Setting Arguments.. : Namespace(asff=True, cfg='config/yolov3_baseline.cfg', checkpoint='YOLOv3-mobile-asff.pth', dataset='COCO', half=False, img='test.jpg', rfb=False, test_size=416, use_cuda=True)
successfully loaded config file: {'MODEL': {'TYPE': 'YOLOv3', 'BACKBONE': 'mobile'}, 'TRAIN': {'LR': 0.001, 'MOMENTUM': 0.9, 'DECAY': 0.0005, 'BURN_IN': 5, 'MAXEPOCH': 300, 'COS': True, 'SYBN': True, 'MIX': True, 'NO_MIXUP_EPOCHS': 30, 'LABAL_SMOOTH': True, 'BATCHSIZE': 5, 'IMGSIZE': 608, 'IGNORETHRE': 0.7, 'RANDRESIZE': True}, 'TEST': {'CONFTHRE': 0.01, 'NMSTHRE': 0.65, 'IMGSIZE': 608}}
For mobilenet, we currently don't support dropblock, rfb and FeatureAdaption
Training YOLOv3 with ASFF!
loading pytorch ckpt... YOLOv3-mobile-asff.pth
using cuda
Traceback (most recent call last):
File "/home/test/anaconda3/envs/detectron2/lib/python3.6/site-packages/torchvision-0.5.0a0+1e857d9-py3.6-linux-x86_64.egg/torchvision/ops/boxes.py", line 31, in nms
File "/home/test/anaconda3/envs/detectron2/lib/python3.6/site-packages/torchvision-0.5.0a0+1e857d9-py3.6-linux-x86_64.egg/torchvision/ops/boxes.py", line 31, in nms
File "/home/test/anaconda3/envs/detectron2/lib/python3.6/site-packages/torchvision-0.5.0a0+1e857d9-py3.6-linux-x86_64.egg/torchvision/ops/boxes.py", line 31, in nms
[Previous line repeated 997 more times]
RecursionError: maximum recursion depth exceeded

Out of memory at the end of epoch 7

Hello, thank you for your great work.

I found an out of memory error at the end of epoch 7.

I use 8 2080Ti GPUs, and my training script is:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=233323 main.py \
        --cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 8 \
        --checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --asff --rfb --dropblock \
        --log_dir log/COCO_ASFF -s 608

The error log is:

[Epoch 7/300][Iter 2930/2957][lr 0.001000][Loss: anchor 9.82, iou 10.44, l1 32.70, conf 27.21, cls 79.91, imgsize 544, time: 6.14]
[Epoch 7/300][Iter 2940/2957][lr 0.001000][Loss: anchor 9.65, iou 10.57, l1 32.50, conf 26.13, cls 76.35, imgsize 512, time: 6.48]
[Epoch 7/300][Iter 2950/2957][lr 0.001000][Loss: anchor 12.56, iou 13.53, l1 41.10, conf 31.59, cls 95.52, imgsize 512, time: 6.16]
Traceback (most recent call last):
  File "main.py", line 454, in <module>
    main()
  File "main.py", line 388, in main
    optimizer.backward(loss)
  File "/mnt/WXRG0353/sfchen/ASFF/utils/fp16_utils/fp16_optimizer.py", line 483, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/mnt/WXRG0353/sfchen/ASFF/utils/fp16_utils/loss_scaler.py", line 45, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/mnt/WXRG0333/sfchen/anaconda3/envs/pytorch13/lib/python3.7/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/mnt/WXRG0333/sfchen/anaconda3/envs/pytorch13/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 8.16 GiB (GPU 5; 10.73 GiB total capacity; 1.76 GiB already allocated; 8.16 GiB free; 29.42 MiB cached)

and the nvidia-smi message:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:1A:00.0 Off |                  N/A |
| 34%   55C    P2   100W / 250W |   2922MiB / 10989MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:1B:00.0 Off |                  N/A |
| 35%   58C    P2    92W / 250W |   2920MiB / 10989MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:3D:00.0 Off |                  N/A |
| 33%   52C    P2   113W / 250W |   2912MiB / 10989MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:3E:00.0 Off |                  N/A |
| 33%   51C    P2   101W / 250W |   2920MiB / 10989MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 208...  On   | 00000000:88:00.0 Off |                  N/A |
| 32%   49C    P2    90W / 250W |   2922MiB / 10989MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 208...  On   | 00000000:89:00.0 Off |                  N/A |
| 27%   29C    P8     4W / 250W |     11MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce RTX 208...  On   | 00000000:B1:00.0 Off |                  N/A |
| 27%   28C    P8     1W / 250W |     11MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce RTX 208...  On   | 00000000:B2:00.0 Off |                  N/A |
| 27%   28C    P8     1W / 250W |     11MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    173812      C   ...hen/anaconda3/envs/pytorch13/bin/python  2911MiB |
|    1    173813      C   ...hen/anaconda3/envs/pytorch13/bin/python  2909MiB |
|    2    173814      C   ...hen/anaconda3/envs/pytorch13/bin/python  2901MiB |
|    3    173815      C   ...hen/anaconda3/envs/pytorch13/bin/python  2909MiB |
|    4    173816      C   ...hen/anaconda3/envs/pytorch13/bin/python  2911MiB |
+-----------------------------------------------------------------------------+

Some questions about the code

1. Where is the GT assignment strategy in the code?

2. What tricks are used in this code, other than BoF?

3. Your test size is 608 on both COCO and VOC? Do you not use multi-scale testing?

thanks

Win10 cannot train!

I can't train this on Win10.
It shows: AttributeError: module 'torch.distributed' has no attribute 'is_initialized'

How to fix it?
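
A possible workaround (an assumption, not an official fix): old Windows builds of PyTorch shipped without the distributed package, so the apex sync-BN kernel's query fails. A guard like the hypothetical patch below, applied at the line quoted in the traceback above, would avoid the AttributeError in single-process runs:

import torch.distributed as dist

# Only query the process group when the distributed package exists.
if dist.is_available() and dist.is_initialized():
    world_size = dist.get_world_size()
else:
    world_size = 1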

Can't detect anything on the VOC dataset with demo.py

Hello, I trained weights on the VOC2007 dataset and got an evaluation result of mAP = 40. But when I use my weights with demo.py, I don't get any predictions.
The training script is:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=${RANDOM+10000} main.py --cfg config/yolov3_baseline.cfg -d VOC --ngpu 1 --distributed --checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --asff --rfb --dropblock -s 608
and the parameters for demo.py are:
--img XX --checkpoint XX --asff --rfb --half -d VOC -s 608
I also tried some of your weights, but they don't run with the parameter -d VOC.
So should I change some code in demo.py to fit VOC?

Question about the speed.

In the paper "Objects as Points", the running time is tested on a machine with an Intel Core i7-8086K CPU, a Titan Xp GPU, PyTorch 0.4.1, CUDA 9.0, and CUDNN 7.1. It gets 39.2 mAP at 28 FPS with flip testing. In your paper, you annotate this result as V100 (28 (V100) 39.2 57.1 42.8 19.9 43.0 51.4), but the performance of a Tesla V100 is much better than a Titan Xp's. I am confused about the test speed.

evaluate

Traceback (most recent call last):
  File "main.py", line 456, in <module>
    main()
  File "main.py", line 346, in main
    torch.save(model.module.state_dict(), os.path.join(args.save_dir,
  File "/home/lianguofei/workspace/ASFF/py35env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 539, in __getattr__
    type(self).__name__, name))
AttributeError: 'YOLOv3' object has no attribute 'module'

Hello, could you please tell me how to solve this?
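
A common workaround (an assumption about the cause, not the author's fix): model.module only exists when the model is wrapped in DataParallel/DistributedDataParallel, so fall back to the bare model when saving.

import torch

# Save whichever state_dict exists: wrapped (DDP/DataParallel) or bare model.
state = model.module.state_dict() if hasattr(model, 'module') else model.state_dict()
torch.save(state, checkpoint_path)  # checkpoint_path is hypothetical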

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

ssh://[email protected]:10074/usr/local/bin/python -u /project/ASFF/main.py --cfg=config/yolov3_baseline.cfg -d=VOC --tfboard --checkpoint=weights/darknet53_feature_mx.pth --start_epoch=0 --half --log_dir log/VOC -s=240 --checkpoint=
Setting Arguments.. : Namespace(asff=False, cfg='config/yolov3_baseline.cfg', checkpoint='', dataset='VOC', debug=False, distributed=False, dropblock=False, eval_interval=10, half=True, local_rank=0, log_dir='log/VOC', n_cpu=4, ngpu=2, no_wd=False, rfb=False, save_dir='save', start_epoch=0, test=False, test_size=240, testset=False, tfboard=True, use_cuda=True, vis=False)
successfully loaded config file: {'MODEL': {'TYPE': 'YOLOv3', 'BACKBONE': 'darknet53'}, 'TRAIN': {'LR': 0.001, 'MOMENTUM': 0.9, 'DECAY': 0.0005, 'BURN_IN': 5, 'MAXEPOCH': 300, 'COS': True, 'SYBN': True, 'MIX': True, 'NO_MIXUP_EPOCHS': 30, 'LABAL_SMOOTH': True, 'BATCHSIZE': 4, 'IMGSIZE': 608, 'IGNORETHRE': 0.7, 'RANDRESIZE': True}, 'TEST': {'CONFTHRE': 0.01, 'NMSTHRE': 0.6, 'IMGSIZE': 608}}
Training YOLOv3 strong baseline!
using cuda
using tfboard
Traceback (most recent call last):
  File "/project/ASFF/main.py", line 455, in <module>
    main()
  File "/project/ASFF/main.py", line 389, in main
    optimizer.backward(loss)
  File "/project/ASFF/utils/fp16_utils/fp16_optimizer.py", line 483, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/project/ASFF/utils/fp16_utils/loss_scaler.py", line 45, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 4, 76, 76, 25]], which is output 0 of CloneBackward, is at version 9; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Process finished with exit code 1
