tianzhi0549 / fcos

FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)

Home Page: https://arxiv.org/abs/1904.01355

License: Other

Dockerfile 0.51% Python 76.98% C++ 2.77% Cuda 18.17% C 1.58%
fcos object-detection one-stage anchor-free pytorch computer-vision iccv2019

fcos's Introduction

FCOS: Fully Convolutional One-Stage Object Detection

This project hosts the code for implementing the FCOS algorithm for object detection, as presented in our paper:

FCOS: Fully Convolutional One-Stage Object Detection;
Zhi Tian, Chunhua Shen, Hao Chen, and Tong He;
In: Proc. Int. Conf. Computer Vision (ICCV), 2019.
arXiv preprint arXiv:1904.01355 

The full paper is available at: https://arxiv.org/abs/1904.01355.

Implementation based on Detectron2 is included in AdelaiDet.

A real-time model with 46FPS and 40.3 in AP on COCO minival is also available here.

Highlights

  • Totally anchor-free: FCOS completely avoids the complicated computation related to anchor boxes and all hyper-parameters of anchor boxes.
  • Better performance: The very simple one-stage detector achieves much better performance (38.7 vs. 36.8 in AP with ResNet-50) than Faster R-CNN. Check out more models and experimental results here.
  • Faster training and testing: With the same hardware and a ResNet-50-FPN backbone, FCOS also requires fewer training hours (6.5h vs. 8.8h) than Faster R-CNN. FCOS also takes 12ms less inference time per image than Faster R-CNN (44ms vs. 56ms).
  • State-of-the-art performance: Our best model based on ResNeXt-64x4d-101 and deformable convolutions achieves 49.0% in AP on COCO test-dev (with multi-scale testing).
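
As a rough illustration of what "anchor-free" means here: instead of refining predefined anchor boxes, each feature-map location regresses its distances to the four sides of a ground-truth box. The sketch below is a minimal, pure-Python rendering of that idea; the function names `ltrb_targets` and `is_inside` are hypothetical and are not taken from the repository.

```python
def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the left, top, right and bottom
    sides of box = (x0, y0, x1, y1). Names are illustrative only."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def is_inside(x, y, box):
    # A location can only be a positive sample for a box if all four
    # distances are strictly positive, i.e. the point lies inside the box.
    return min(ltrb_targets(x, y, box)) > 0


box = (10.0, 20.0, 110.0, 80.0)
print(ltrb_targets(60.0, 50.0, box))
print(is_inside(5.0, 50.0, box))
```

No anchor scales or aspect ratios appear anywhere in this computation, which is the hyper-parameter saving the first bullet refers to.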

Updates

  • FCOS with Fast And Diverse (FAD) neural architecture search is available at FAD. (30/10/2020)
  • Script for exporting ONNX models. (21/11/2019)
  • New NMS (see #165) speeds up ResNe(x)t based models by up to 30% and MobileNet based models by 40%, with exactly the same performance. Check out here. (12/10/2019)
  • New models with much improved performance are released. The best model achieves 49% in AP on COCO test-dev with multi-scale testing. (11/09/2019)
  • FCOS with VoVNet backbones is available at VoVNet-FCOS. (08/08/2019)
  • A trick of using a small central region of the BBox for training improves AP by nearly 1 point as shown here. (23/07/2019)
  • FCOS with HRNet backbones is available at HRNet-FCOS. (03/07/2019)
  • FCOS with AutoML searched FPN (R50, R101, ResNeXt101 and MobileNetV2 backbones) is available at NAS-FCOS. (30/06/2019)
  • FCOS has been implemented in mmdetection. Many thanks to @yhcao6 and @hellock. (17/05/2019)

Required hardware

We use 8 Nvidia V100 GPUs.
But 4 1080Ti GPUs can also train a fully-fledged ResNet-50-FPN based FCOS since FCOS is memory-efficient.

Installation

Testing-only installation

Users who only want to use FCOS as an object detector in their projects can install it with pip. To do so, run:

pip install torch  # install pytorch if you do not have it
pip install git+https://github.com/tianzhi0549/FCOS.git
# run this command line for a demo 
fcos https://github.com/tianzhi0549/FCOS/raw/master/demo/images/COCO_val2014_000000000885.jpg

Please check out here for the interface usage.

For a complete installation

This FCOS implementation is based on maskrcnn-benchmark. Therefore the installation is the same as original maskrcnn-benchmark.

Please check INSTALL.md for installation instructions. You may also want to see the original README.md of maskrcnn-benchmark.

A quick demo

Once the installation is done, you can follow the steps below to run a quick demo.

# assume that you are under the root directory of this project,
# and you have activated your virtual environment if needed.
wget https://huggingface.co/tianzhi/FCOS/resolve/main/FCOS_imprv_R_50_FPN_1x.pth?download=true -O FCOS_imprv_R_50_FPN_1x.pth
python demo/fcos_demo.py

Inference

The inference command line on the COCO minival split:

python tools/test_net.py \
    --config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml \
    MODEL.WEIGHT FCOS_imprv_R_50_FPN_1x.pth \
    TEST.IMS_PER_BATCH 4    

Please note that:

  1. If your model's name is different, please replace FCOS_imprv_R_50_FPN_1x.pth with your own.
  2. If you encounter an out-of-memory error, please try reducing TEST.IMS_PER_BATCH to 1.
  3. If you want to evaluate a different model, please change --config-file to its config file (in configs/fcos) and MODEL.WEIGHT to its weights file.
  4. Multi-GPU inference is available; please refer to #78.
  5. We improved the post-processing efficiency by using multi-label NMS (see #165), which saves 18ms on average. The inference metric in the following tables has been updated accordingly.

Models

For your convenience, we provide the following trained models (more models are coming soon).

ResNe(x)ts:

All ResNe(x)t based models are trained with 16 images in a mini-batch and frozen batch normalization (i.e., consistent with models in maskrcnn_benchmark).

Model | Multi-scale training | Testing time / im | AP (minival) | Link
FCOS_imprv_R_50_FPN_1x | No | 44ms | 38.7 | download
FCOS_imprv_dcnv2_R_50_FPN_1x | No | 54ms | 42.3 | download
FCOS_imprv_R_101_FPN_2x | Yes | 57ms | 43.0 | download
FCOS_imprv_dcnv2_R_101_FPN_2x | Yes | 73ms | 45.6 | download
FCOS_imprv_X_101_32x8d_FPN_2x | Yes | 110ms | 44.0 | download
FCOS_imprv_dcnv2_X_101_32x8d_FPN_2x | Yes | 143ms | 46.4 | download
FCOS_imprv_X_101_64x4d_FPN_2x | Yes | 112ms | 44.7 | download
FCOS_imprv_dcnv2_X_101_64x4d_FPN_2x | Yes | 144ms | 46.6 | download

Note that imprv denotes the improvements in Table 3 of our paper. These almost cost-free changes improve the performance by ~1.5% in total, so we highly recommend using them. The following are the original models presented in our initial paper.

Model | Multi-scale training | Testing time / im | AP (minival) | AP (test-dev) | Link
FCOS_R_50_FPN_1x | No | 45ms | 37.1 | 37.4 | download
FCOS_R_101_FPN_2x | Yes | 59ms | 41.4 | 41.5 | download
FCOS_X_101_32x8d_FPN_2x | Yes | 110ms | 42.5 | 42.7 | download
FCOS_X_101_64x4d_FPN_2x | Yes | 113ms | 43.0 | 43.2 | download

MobileNets:

We update batch normalization for MobileNet based models. If you want to use SyncBN, please install pytorch 1.1 or later.

Model | Training batch size | Multi-scale training | Testing time / im | AP (minival) | Link
FCOS_syncbn_bs32_c128_MNV2_FPN_1x | 32 | No | 26ms | 30.9 | download
FCOS_syncbn_bs32_MNV2_FPN_1x | 32 | No | 33ms | 33.1 | download
FCOS_bn_bs16_MNV2_FPN_1x | 16 | No | 44ms | 31.0 | download

[1] 1x and 2x mean the model is trained for 90K and 180K iterations, respectively.
[2] All results are obtained with a single model and without any test-time data augmentation such as multi-scale testing or flipping.
[3] c128 denotes the model has 128 (instead of 256) channels in towers (i.e., MODEL.RESNETS.BACKBONE_OUT_CHANNELS in config).
[4] dcnv2 denotes deformable convolutional networks v2. Note that for ResNet based models, we apply deformable convolutions from stage c3 to c5 in backbones. For ResNeXt based models, only stage c4 and c5 use deformable convolutions. All models use deformable convolutions in the last layer of detector towers.
[5] The model FCOS_imprv_dcnv2_X_101_64x4d_FPN_2x with multi-scale testing achieves 49.0% in AP on COCO test-dev. Please use TEST.BBOX_AUG.ENABLED True to enable multi-scale testing.
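
To put the 1x and 2x schedules in footnote [1] into more familiar terms, the iteration counts can be converted to an approximate number of passes over the training set. The sketch below assumes the 16-image mini-batch stated for the ResNe(x)t models and a training set of roughly 118,000 images; that dataset size is an assumption for illustration, not a figure given in this README.

```python
def approx_epochs(iterations, images_per_batch, dataset_size):
    """Approximate number of passes over the dataset for a given schedule."""
    return iterations * images_per_batch / dataset_size


# Assumption: rough size of the COCO training split; adjust for your data.
DATASET_SIZE = 118_000

for name, iters in (("1x", 90_000), ("2x", 180_000)):
    print(name, round(approx_epochs(iters, 16, DATASET_SIZE), 1), "epochs (approx.)")
```

Under these assumptions the 1x schedule corresponds to about 12 epochs and the 2x schedule to about 24.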

Training

The following command line will train FCOS_imprv_R_50_FPN_1x on 8 GPUs with Synchronous Stochastic Gradient Descent (SGD):

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_imprv_R_50_FPN_1x

Note that:

  1. If you want to use fewer GPUs, please change --nproc_per_node to the number of GPUs. No other settings need to be changed. The total batch size does not depend on nproc_per_node. If you want to change the total batch size, please change SOLVER.IMS_PER_BATCH in configs/fcos/fcos_R_50_FPN_1x.yaml.
  2. The models will be saved into OUTPUT_DIR.
  3. If you want to train FCOS with other backbones, please change --config-file.
  4. If you want to train FCOS on your own dataset, please follow the instructions in #54.
  5. Training with 8 GPUs and with 4 GPUs now gives the same performance. The previous performance gap arose because we did not synchronize num_pos between GPUs when computing the loss.
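
Note 1 implies that SOLVER.IMS_PER_BATCH is a global figure that gets divided among the processes. A minimal sketch of that split follows, assuming the total must divide evenly by the GPU count; the helper name `images_per_gpu` is hypothetical and not part of the repository.

```python
def images_per_gpu(total_batch_size, num_gpus):
    """Split a global batch size evenly across GPUs (illustrative helper)."""
    if total_batch_size % num_gpus != 0:
        # Assumption: an uneven split is treated as a configuration error.
        raise ValueError(
            f"total batch size {total_batch_size} is not divisible by {num_gpus} GPUs"
        )
    return total_batch_size // num_gpus


for gpus in (8, 4, 2):
    print(gpus, "GPUs ->", images_per_gpu(16, gpus), "images per GPU")
```

With a total of 16 images, moving from 8 GPUs to 4 doubles the per-GPU load, which is worth keeping in mind for memory when reducing --nproc_per_node.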

ONNX

Please refer to the directory onnx for an example of exporting the model to ONNX. A converted model can be downloaded here. We recommend using PyTorch >= 1.4.0 (or nightly) and torchvision >= 0.5.0 (or nightly) for ONNX models.

Contributing to the project

Any pull requests or issues are welcome.

Citations

Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows.

@inproceedings{tian2019fcos,
  title   =  {{FCOS}: Fully Convolutional One-Stage Object Detection},
  author  =  {Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  booktitle =  {Proc. Int. Conf. Computer Vision (ICCV)},
  year    =  {2019}
}
@article{tian2021fcos,
  title   =  {{FCOS}: A Simple and Strong Anchor-free Object Detector},
  author  =  {Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  journal =  {IEEE T. Pattern Analysis and Machine Intelligence (TPAMI)},
  year    =  {2021}
}

Acknowledgments

We would like to thank @yqyao for the tricks of center sampling and GIoU. We also thank @bearcatt for his suggestion of positioning the center-ness branch with box regression (refer to #89).

License

For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.

fcos's People

Contributors

103yiran, apacha, ausk, belowmit, bernhardschaefer, botcs, chhshen, climbsrocks, coincheung, fmassa, godricly, henrywang1, isameer, jario-jin, jiayuan-gu, keineahnung2345, killthekitten, leviviana, newstzpz, renebidart, rodrigoberriel, soumith, stan-haochen, stanstarks, tianzhi0549, wat3rbro, xudangliatiger, yelantf, zhangliliang, zimenglan-sysu-512

fcos's Issues

/FCOS/maskrcnn_benchmark/layers

#from maskrcnn_benchmark import _C
from ._utils import _C

I use "from ._utils import _C" instead of "from maskrcnn_benchmark import _C" in roi_align.py and roi_pool.py.

Is this right?
I modified it this way and it works now, but I do not know whether it is correct.

centerness loss is larger than other loss

I trained FCOS on my own dataset, but the centerness loss is much larger than loss_cls and loss_reg. Could you please give me some advice on how to solve this?

2019-04-29 07:05:16,317 maskrcnn_benchmark.trainer INFO: eta: 11:21:07  iter: 13000  loss: 0.7219 (0.7213)  loss_centerness: 0.5492 (0.5514)  loss_cls: 0.0237 (0.0238)  loss_reg: 0.1435 (0.1462)  time: 1.1004 (1.1045)  data: 0.6328 (0.6421)  lr: 0.010000  max mem: 7608
2019-04-29 07:05:39,170 maskrcnn_benchmark.trainer INFO: eta: 11:21:39  iter: 13020  loss: 0.7145 (0.7212)  loss_centerness: 0.5515 (0.5515)  loss_cls: 0.0237 (0.0238)  loss_reg: 0.1377 (0.1459)  time: 1.1148 (1.1060)  data: 0.5771 (0.6409)  lr: 0.010000  max mem: 7608
2019-04-29 07:06:03,733 maskrcnn_benchmark.trainer INFO: eta: 11:24:04  iter: 13040  loss: 0.7170 (0.7211)  loss_centerness: 0.5514 (0.5515)  loss_cls: 0.0239 (0.0239)  loss_reg: 0.1373 (0.1457)  time: 1.1881 (1.1105)  data: 0.5283 (0.6373)  lr: 0.010000  max mem: 7608
2019-04-29 07:06:28,364 maskrcnn_benchmark.trainer INFO: eta: 11:26:21  iter: 13060  loss: 0.7417 (0.7218)  loss_centerness: 0.5507 (0.5514)  loss_cls: 0.0235 (0.0239)  loss_reg: 0.1642 (0.1464)  time: 1.1700 (1.1148)  data: 0.6051 (0.6388)  lr: 0.010000  max mem: 7608
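
One thing worth checking before treating this as a fault: the paper trains center-ness with binary cross-entropy against soft targets in [0, 1], and with soft targets that loss has a floor above zero even for a perfect prediction. The sketch below only illustrates that property of the loss; it says nothing about whether this particular run has converged.

```python
import math


def bce(target, prob):
    """Binary cross-entropy for a single soft target and predicted probability."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


# Even when the prediction equals the target exactly, the loss is the
# entropy of the target, which is zero only for targets of exactly 0 or 1.
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"target={t:.1f}  best achievable BCE={bce(t, t):.4f}")
```

Because of this floor, the absolute value of loss_centerness is not directly comparable with loss_cls or loss_reg.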

Could FCOS overfit one single train image

Hi, I am trying to overfit a single image using FCOS (training and testing on one image without any other transforms such as horizontal flip) to check whether I am using your code correctly. I get a very strange result: it seems the model cannot fully overfit a single training image.

Inference:
[image: prediction]

Ground-truth:
[image: ground-truth bounding box]

I have borrowed code from the Mask R-CNN repo and use the FCOS code from rpn/fcos, but I didn't check other differences between the two repos. Am I missing something that may cause this problem?

or

Is it that FCOS just can't fully overfit a single image the way Mask R-CNN can, because it uses multiple binary classifiers (Sigmoid Focal Loss) instead of SoftmaxFocalLoss, so the classifiers for other classes (those not present in the training image) won't be trained on a single-image dataset?

Could you give me some hints on debugging this problem?

Many thanks.
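
For reference, a per-class sigmoid focal loss can be sketched as below. This is a generic formulation under assumed defaults (alpha=0.25 and gamma=2.0, matching the LOSS_ALPHA and LOSS_GAMMA values in a config dump quoted in a later issue on this page), not the repository's own implementation. It suggests that a class absent from the image still contributes a loss term through its negative targets.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary classifier output (generic sketch)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    if target == 1:
        # Positive sample: down-weight easy positives via (1 - p) ** gamma.
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    # Negative sample: down-weight easy negatives via p ** gamma.
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


# A class that never appears in the image has target 0 at every location,
# yet its loss is non-zero until the classifier outputs a low probability.
print(sigmoid_focal_loss(0.0, 0))
print(sigmoid_focal_loss(-5.0, 0))
```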

training error, cannot start

Hi! I tried to train on coco_train2017 following the steps you showed, but it raises the following error:
2019-05-15 10:23:24,814 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in
main()
File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/work/songping/anaconda3/envs/FCOS/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--skip-test', '--config-file', 'configs/fcos/fcos_R_50_FPN_1x.yaml', 'DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'training_dir/fcos_R_50_FPN_1x']' died with <Signals.SIGSEGV: 11>.
Could you help me solve the problem? Thank you.

training error about GC?

Hello author, thank you very much for releasing the FCOS code; this is really great work! But I encountered a problem while training the model: sometimes training runs normally, and sometimes I get this error:

Fatal Python error: GC object already tracked

Thread 0x00007f58161ba700 (most recent call first):

Thread 0x00007f58159b9700 (most recent call first):

Thread 0x00007f58151b8700 (most recent call first):

Thread 0x00007f58169bb700 (most recent call first):

Thread 0x00007f5825cbc700 (most recent call first):
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 296 in wait
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 865 in run
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 917 in _bootstrap_inner
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f58264bd700 (most recent call first):
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 296 in wait
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 865 in run
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 917 in _bootstrap_inner
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f5826cbe700 (most recent call first):
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 296 in wait
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 865 in run
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 917 in _bootstrap_inner
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f582813e700 (most recent call first):
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 296 in wait
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 865 in run
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 917 in _bootstrap_inner
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/threading.py", line 885 in _bootstrap

Current thread 0x00007f5903479740 (most recent call first):
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489 in call
File "/home/zzy/fcos/FCOS/maskrcnn_benchmark/modeling/backbone/resnet.py", line 334 in forward
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 494 in call
File "/home/zzy/fcos/FCOS/maskrcnn_benchmark/modeling/backbone/resnet.py", line 140 in forward
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 494 in call
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/container.py", line 97 in forward
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 494 in call
File "/home/zzy/fcos/FCOS/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 49 in forward
File "/home/zzy/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 494 in call
File "/home/zzy/fcos/FCOS/maskrcnn_benchmark/engine/trainer.py", line 66 in do_train
File "/home/zzy/fcos/FCOS/tools/train_net.py", line 73 in train
File "/home/zzy/fcos/FCOS/tools/train_net.py", line 167 in main
File "/home/zzy/fcos/FCOS/tools/train_net.py", line 174 in

How can I solve this problem? I am using this line of code to start training:

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1

RuntimeError: cannot perform reduction function min on tensor with no elements because the operation does not have an identity

Traceback (most recent call last):
File "tools/train_net.py", line 176, in
main()
File "tools/train_net.py", line 169, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 75, in train
arguments,
File "/home/administrator/FCOS/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
loss_dict = model(images, targets)
File "/home/administrator/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/administrator/FCOS/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/administrator/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/administrator/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/fcos.py", line 134, in forward
centerness, targets
File "/home/administrator/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/fcos.py", line 144, in _forward_train
locations, box_cls, box_regression, centerness, targets
File "/home/administrator/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/loss.py", line 142, in call
labels, reg_targets = self.prepare_targets(locations, targets)
File "/home/administrator/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/loss.py", line 57, in prepare_targets
points_all_level, targets, expanded_object_sizes_of_interest
File "/home/administrator/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/loss.py", line 94, in compute_targets_for_locations
is_in_boxes = reg_targets_per_im.min(dim=2)[0] > 0
RuntimeError: cannot perform reduction function min on tensor with no elements because the operation does not have an identity

How can I solve it?

Also, I am training on my own data, which has only 20 classes.
Should I change 80 (COCO) to 20 in the network?
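
The message says `min` ran on a tensor with no elements, which would be consistent with an image reaching the loss with zero ground-truth boxes. That is a guess at the cause, not a confirmed diagnosis. If it applies, one option is to drop such samples before training. The sketch below assumes COCO-style annotations with `bbox` stored as [x, y, width, height]; the helper name and the `min_side` threshold are hypothetical.

```python
def has_valid_box(annotations, min_side=1.0):
    """True if at least one annotation has a box with usable width and height."""
    for ann in annotations:
        _, _, w, h = ann["bbox"]  # COCO-style [x, y, width, height]
        if w >= min_side and h >= min_side:
            return True
    return False


# Toy data: one good image, one without annotations, one degenerate box.
samples = {
    "img_1": [{"bbox": [10, 10, 50, 40]}],
    "img_2": [],
    "img_3": [{"bbox": [5, 5, 0, 12]}],
}
kept = [name for name, anns in samples.items() if has_valid_box(anns)]
print(kept)
```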

About the input size

Thanks for your work, it is excellent. I have some questions. Have you run experiments with smaller input sizes, such as 300x300 or 224x224? How does the input size influence the final results? Also, in post-processing the locations need to be filtered by the classification * center-ness score, and then the filtered locations are converted back to bboxes, followed by NMS. On average, how many locations are converted back to bboxes? Does this step, together with NMS, cost much time? Looking forward to your reply.
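
The filtering step described in this question can be sketched roughly as follows. The threshold of 0.05 and the cap of 1000 candidates mirror the INFERENCE_TH and PRE_NMS_TOP_N values in a config dump quoted in another issue on this page; whether the threshold is applied before or after multiplying by center-ness is an assumption here, and the function names are hypothetical.

```python
def rank_score(cls_prob, centerness):
    """Score used for ranking: classification probability times center-ness."""
    return cls_prob * centerness


def filter_candidates(candidates, score_threshold=0.05, top_n=1000):
    """candidates: iterable of (cls_prob, centerness) pairs.

    Keeps pairs whose classification probability exceeds the threshold
    (assumed ordering), ranks them by the combined score, and returns at
    most top_n entries as (score, cls_prob, centerness) tuples."""
    kept = [
        (rank_score(c, ctr), c, ctr)
        for c, ctr in candidates
        if c > score_threshold
    ]
    kept.sort(reverse=True)
    return kept[:top_n]


result = filter_candidates([(0.9, 0.5), (0.04, 1.0), (0.6, 0.9)])
print(result)
```

Only the survivors of this step would need to be decoded into boxes and passed to NMS, so the cap bounds the work per feature level.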

Improvement for small object detection

Thank you for the great work! I ran the proposed code on my custom dataset of medical images; the results are listed below:
2019-05-06 03:04:17,399 maskrcnn_benchmark.inference INFO:
OrderedDict([('bbox', OrderedDict([('AP', 0.5055817771868518),
('AP50', 0.8926599742997058), ('AP75', 0.4691991724725123),
('APs', 0.0007072135785007071), ('APm', 0.10701570554699777),
('APl', 0.5285697565878185)]))])

Compared with YOLOv3 on the same dataset, I found that the proposed algorithm does not work very well on small objects ('APs', 0.0007072135785007071). As mentioned in the paper, this is an anchor-free method. Is there any way to improve the detection performance on small objects, or any hyperparameters to fine-tune?

When I train with a ResNet-101 backbone, I encounter this error, while I can train ResNet-50 successfully with the same COCO file. What should I do?

2019-05-04 00:17:28,682 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
File "tools/train_net.py", line 189, in
main()
File "tools/train_net.py", line 182, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 87, in train
arguments,
File "/home/abc/code/FCOS/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in next
return self._process_next_batch(batch)
File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/abc/code/FCOS/maskrcnn_benchmark/data/datasets/coco.py", line 94, in getitem
img, target = self.transforms(img, target)
File "/home/abc/code/FCOS/maskrcnn_benchmark/data/transforms/transforms.py", line 15, in call
image, target = t(image, target)
File "/home/abc/code/FCOS/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in call
size = self.get_size(image.size)
File "/home/abc/code/FCOS/maskrcnn_benchmark/data/transforms/transforms.py", line 42, in get_size
if max_original_size / min_original_size * size > max_size:
TypeError: unsupported operand type(s) for *: 'float' and 'range'
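
The final TypeError indicates that `size` was still a `range` object when it was multiplied by a float, so a whole set of candidate sizes reached the arithmetic instead of one sampled value. The sketch below is modeled on the comparison shown in the traceback and samples a single integer first; the function signature and argument names are assumptions for illustration, not the repository's code.

```python
import random


def get_size(image_size, min_size_choices, max_size):
    """Return (height, width) after resizing the shorter side.

    min_size_choices may be a tuple or a range; exactly one value is
    sampled from it so that the arithmetic below operates on an int."""
    w, h = image_size
    size = random.choice(list(min_size_choices))
    short_side, long_side = float(min(w, h)), float(max(w, h))
    # Shrink the target if the longer side would exceed max_size.
    if long_side / short_side * size > max_size:
        size = int(round(max_size * short_side / long_side))
    if w < h:
        return (int(size * h / w), size)
    return (size, int(size * w / h))


print(get_size((1000, 500), (800,), 1333))
print(get_size((500, 1000), range(640, 801), 1333))
```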

Can you provide dockerfile?

The build always times out, especially for PyTorch 1.1, when I build a Docker image from the Dockerfile.
Can you provide the Dockerfile?

when test image, the score of some object is low.

I have a question: why are the scores so low? For example, the bike scores 0.248 in your picture in #5.

This also occurs with my trained model. I trained on COCO 2017 with this code, and the scores are always low, so I cannot choose a threshold. In my tests, a person's score is sometimes 0.2-0.4, and when I set the threshold to 0.2, some false positives occur.

Could you tell me how to solve the problem?

training doesn't cost time

During my training there was no error, but the total training time is 0 s. I'm sure my environment is correct. Here is some related information:

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "pre_train/mymodel.pth"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-50-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
DATASETS:
  TRAIN: ("coco_2019_train","coco_2019_val")
  TEST: ("coco_2019_test","coco_2019_val")
INPUT:
  MIN_SIZE_TRAIN: (400,)
  MAX_SIZE_TRAIN: 1200
  MIN_SIZE_TEST: 400
  MAX_SIZE_TEST: 1200
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.001
  WEIGHT_DECAY: 0.0001
  STEPS: (2400, 6000)
  MAX_ITER: 12000
  IMS_PER_BATCH: 1
  WARMUP_METHOD: "constant"

DATALOADER:
ASPECT_RATIO_GROUPING: True
NUM_WORKERS: 4
SIZE_DIVISIBILITY: 32
DATASETS:
TEST: ('coco_2019_test', 'coco_2019_val')
TRAIN: ('coco_2019_train', 'coco_2019_val')
INPUT:
MAX_SIZE_TEST: 1200
MAX_SIZE_TRAIN: 1200
MIN_SIZE_RANGE_TRAIN: (-1, -1)
MIN_SIZE_TEST: 400
MIN_SIZE_TRAIN: (400,)
PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
PIXEL_STD: [1.0, 1.0, 1.0]
TO_BGR255: True

FCOS:
FPN_STRIDES: [8, 16, 32, 64, 128]
INFERENCE_TH: 0.05
LOSS_ALPHA: 0.25
LOSS_GAMMA: 2.0
NMS_TH: 0.6
NUM_CLASSES: 3
NUM_CONVS: 4
PRE_NMS_TOP_N: 1000
PRIOR_PROB: 0.01
FCOS_ON: True

WEIGHT: pre_train/mymodel.pth
OUTPUT_DIR: ./experiments/result
PATHS_CATALOG: /home/user/cocoapi/PythonAPI/maskrcnn_FCOS/FCOS/maskrcnn_benchmark/config/paths_catalog.py
SOLVER:
BASE_LR: 0.001
BIAS_LR_FACTOR: 2
CHECKPOINT_PERIOD: 2000
GAMMA: 0.1
IMS_PER_BATCH: 1
MAX_ITER: 12000
MOMENTUM: 0.9
STEPS: (2400, 6000)
WARMUP_FACTOR: 0.3333333333333333
WARMUP_ITERS: 500
WARMUP_METHOD: constant
WEIGHT_DECAY: 0.0001
WEIGHT_DECAY_BIAS: 0
TEST:
DETECTIONS_PER_IMG: 100
EXPECTED_RESULTS: []
EXPECTED_RESULTS_SIGMA_TOL: 4
IMS_PER_BATCH: 1

2019-05-05 18:30:27,756 maskrcnn_benchmark.trainer INFO: Start training
2019-05-05 18:30:27,852 maskrcnn_benchmark.trainer INFO: Total training time: 0:00:00.095197 (0.0000 s / it)
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2019-05-05 18:30:27,883 maskrcnn_benchmark.inference INFO: Start evaluation on coco_2019_test dataset(127 images)
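
A run that finishes instantly without an error can occur when the weights file also carries saved training state, such as an iteration counter already at or beyond MAX_ITER, leaving the loop nothing to do. Whether that applies here is an assumption; the log alone does not confirm it. If it does, one option is to keep only the model weights before fine-tuning. The key names below ("optimizer", "scheduler", "iteration") are assumptions about the checkpoint layout, and the helper is hypothetical.

```python
def strip_training_state(checkpoint, keys=("optimizer", "scheduler", "iteration")):
    """Return a copy of a checkpoint dict without saved training state.

    Only top-level entries named in `keys` are removed; the weights and
    any other entries are passed through untouched."""
    return {k: v for k, v in checkpoint.items() if k not in keys}


# Toy checkpoint standing in for a loaded .pth file.
checkpoint = {
    "model": {"backbone.weight": [0.1, 0.2]},
    "optimizer": {"lr": 0.001},
    "iteration": 12000,
}
print(sorted(strip_training_state(checkpoint)))
```

In practice the dict would be loaded and saved again with the framework's own serialization functions; that part is omitted to keep the sketch self-contained.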

About the head feature sharing

Hi, are the head features for cls and box (e.g. the 4 conv layers) shared in your implementation? Have you run experiments comparing the effect on the final performance?

compile error with pytorch1.0.0 nightly

anaconda3/envs/py35torch1/lib/python3.5/site-packages/torch/include/ATen/Dispatch.h:15:17: error: switch quantity not an integer
switch (TYPE) {
^
Project/FCOS/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
^
anaconda3/envs/py35torch1/lib/python3.5/site-packages/torch/include/ATen/Dispatch.h:16:44: error: could not convert ‘Double’ from ‘c10::ScalarType’ to ‘’
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, VA_ARGS)
^
anaconda3/envs/py35torch1/lib/python3.5/site-packages/torch/include/ATen/Dispatch.h:8:8: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
case enum_type: {
^
Project/FCOS/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
^
anaconda3/envs/py35torch1/lib/python3.5/site-packages/torch/include/ATen/Dispatch.h:17:44: error: could not convert ‘Float’ from ‘c10::ScalarType’ to ‘’
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, VA_ARGS)
^
anaconda3/envs/py35torch1/lib/python3.5/site-packages/torch/include/ATen/Dispatch.h:8:8: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
case enum_type: {
^
Project/FCOS/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {

But when I copy the code to maskrcnn-benchmark, this error never happens,
so I think there may be some version problem.

a problem about ground-truth center-ness generation for inference

I have tried using the ground-truth center-ness for inference, but only got an AP of 38.8 rather than the 42.1 mentioned in the paper. I wonder if there are errors in my code for generating the ground-truth center-ness. Can you help me find the problem or share your code? Thanks.
My code is as follows:

        labels, reg_targets = self.prepare_targets(locations, targets)
        centerness_target = []
        for l in range(len(labels)):
            reg_targets_flatten = reg_targets[l].reshape(-1, 4)
            reg_targets_flatten = (labels[l]!=0)[:,None].float()*reg_targets_flatten
            reg_targets_flatten = self.compute_centerness_targets(reg_targets_flatten)
            centerness_target.append(reg_targets_flatten.reshape(centerness[l].shape))

        return centerness_target

something wrong when install

  1. How can I download the PyTorch 1.0.0 nightly? I can only download the PyTorch 1.1.0 nightly version.
  2. Is CUDA 9.0 not OK for this repo? I used CUDA 9.0, and running python setup.py build develop gives the error /usr/local/cuda/bin/nvcc: no such file or directory.

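Hello, why is centerness not used in inference_single_cv?

Hello, thank you very much for the open-source code. While using inference_single_cvimage.py, I found that the code does not use centerness: for centerness_loss there is just a conv2d(256,1), and the compute centerness function is not used. I also set a breakpoint in the compute centerness function and the program did not pause, which shows that the function is indeed never executed:

    def compute_centerness_targets(self, reg_targets):
        left_right = reg_targets[:, [0, 2]]
        top_bottom = reg_targets[:, [1, 3]]
        centerness = (left_right.min(dim=-1)[0] / left_right.max(dim=-1)[0]) * \
                     (top_bottom.min(dim=-1)[0] / top_bottom.max(dim=-1)[0])
        return torch.sqrt(centerness)

Is there a parameter that controls this? Why is centerness not computed at test time? Thank you very much!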
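For reference, the centerness target computed by the function quoted above reduces to a per-location formula, sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)). A minimal plain-Python sketch for a single location follows; the helper name `centerness` is ours and is not taken from the repository.

```python
import math

def centerness(l, t, r, b):
    """Centerness target for one location, given its distances to the
    left, top, right and bottom sides of its ground-truth box."""
    lr = min(l, r) / max(l, r)
    tb = min(t, b) / max(t, b)
    return math.sqrt(lr * tb)

center = centerness(50, 20, 50, 20)     # exact center of a 100x40 box
near_edge = centerness(10, 20, 90, 20)  # same row, close to the left side
print(center, near_edge)
```

The value is 1 at the box center and shrinks toward 0 as the location approaches a side. This only describes the training target and says nothing about which code path runs at inference time.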

Data preparation error

When I train the network on my own data, I get the following error:

TypeError: Traceback (most recent call last):
File "/home/wh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/wh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/wh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/wh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 234, in default_collate
raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'maskrcnn_benchmark.structures.bounding_box.BoxList'>

I followed the same configuration and data-preparation process. At line 71 of maskrcnn_benchmark/data/dataset/voc.py, the returned value is target, which is a BoxList. Is there a modification I missed? Thanks a lot!
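The error message itself says that PyTorch's default collate function only accepts tensors, numbers, dicts or lists. A common workaround for samples that carry custom objects is a collate function that transposes the batch instead of stacking it. The sketch below uses a stand-in `FakeBoxList` class and a hypothetical `transpose_collate` helper; it illustrates the idea only and is not the repository's own collator.

```python
class FakeBoxList:
    """Stand-in for a custom per-image annotation container."""
    def __init__(self, boxes):
        self.boxes = boxes

def transpose_collate(batch):
    """Turn [(img, target, idx), ...] into (imgs, targets, idxs)
    without trying to stack the custom target objects."""
    images, targets, indices = zip(*batch)
    return list(images), list(targets), list(indices)

samples = [
    ("image0", FakeBoxList([[0, 0, 10, 10]]), 0),
    ("image1", FakeBoxList([[5, 5, 20, 30], [1, 2, 3, 4]]), 1),
]
images, targets, indices = transpose_collate(samples)
print(len(images), [len(t.boxes) for t in targets], indices)
```

Such a function would be passed as `collate_fn=` to `torch.utils.data.DataLoader`. Images of different sizes would still need padding before they can be combined into one tensor.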

Can not download the pre-trained model

When I run train.py to train FCOS_X_101_64x4d_FPN_2x, the download of the pre-trained weights fails:
Downloading: "https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/20171220/X-101-64x4d.pkl" to /home/zhengchenbin/.torch/models/X-101-64x4d.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 175, in <module>
main()
File "tools/train_net.py", line 168, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 54, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/utils/checkpoint.py", line 65, in load
checkpoint = self._load_file(f)
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
cached_f = cache_url(f)
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/utils/model_zoo.py", line 54, in cache_url
_download_url_to_file(url, cached_file, hash_prefix, progress=progress)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/model_zoo.py", line 76, in _download_url_to_file
u = urlopen(url)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Can you provide the pre-trained model X-101-64x4d.pkl or give another download link such as baiduyun or google drive? Thank you very much!

How to handle the negative values in bbox prediction convolution layer when calculating the IOU Loss?

The outputs of the convolution layer for bbox prediction are not always positive, so a negative value makes the IOU Loss 'NAN' (because of the log). I see that the bbox prediction outputs are post-processed with a 'torch.exp' operation. Does this operation harm detection performance? Are there other ways to deal with negative values in the bbox prediction feature map?
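To make the question concrete: applying exp to the raw outputs maps any real number to a strictly positive distance, so the predicted box always has positive area and the logarithm in the IoU loss stays defined. Below is a minimal plain-Python sketch for one location, with the loss written as -log(IoU). The function name and the small `eps` are illustrative assumptions, not the repository's code.

```python
import math

def iou_loss(pred, target, eps=1e-9):
    """-log(IoU) for two boxes given as (l, t, r, b) distances
    measured from the same location."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    pred_area = (pl + pr) * (pt + pb)
    target_area = (tl + tr) * (tt + tb)
    # Both boxes contain the location, so the overlap is easy to compute.
    w = min(pl, tl) + min(pr, tr)
    h = min(pt, tt) + min(pb, tb)
    inter = w * h
    union = pred_area + target_area - inter
    return -math.log((inter + eps) / (union + eps))

raw = [-1.3, 0.2, 2.0, -0.5]        # raw conv outputs, may be negative
pred = [math.exp(v) for v in raw]   # always > 0 after exp
loss = iou_loss(pred, target=(1.0, 1.0, 6.0, 1.0))
perfect = iou_loss((1.0, 1.0, 6.0, 1.0), (1.0, 1.0, 6.0, 1.0))
print(pred, loss, perfect)
```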

How to resume training?

How do I resume training? I cannot find any part of the code that resumes training; everything appears to be trained from scratch. If the program is interrupted unexpectedly, how can I resume training?

what is the final loss?

I am training the model on my own dataset, and I have also modified some parts of the code.

Could you tell me the final value of each loss for your model, i.e. cls_loss, reg_loss and centerness_loss? This would help me check my training procedure and the related code.

Thanks.

RuntimeError?

When I was training on my own dataset, I encountered the following problem:
Traceback (most recent call last):
File "tools/train_net.py", line 198, in <module>
main()
File "tools/train_net.py", line 190, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 77, in train
arguments,
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
loss_dict = model(images, targets)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 357, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/zhengchenbin/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/fcos.py", line 134, in forward
centerness, targets
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/fcos.py", line 144, in _forward_train
locations, box_cls, box_regression, centerness, targets
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/loss.py", line 146, in __call__
labels, reg_targets = self.prepare_targets(locations, targets)
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/loss.py", line 57, in prepare_targets
points_all_level, targets, expanded_object_sizes_of_interest
File "/home/zhengchenbin/FcosNet/FCOS/maskrcnn_benchmark/modeling/rpn/fcos/loss.py", line 98, in compute_targets_for_locations
is_in_boxes = reg_targets_per_im.min(dim=2)[0] > 0
RuntimeError: cannot perform reduction function min on tensor with no elements because the operation does not have an identity.
It seems to be reading an image without any ground-truth box, but I checked my dataset and every image has a ground-truth box. I also found that the data-reading class COCODataset automatically ignores images without any box.
Any ideas? How can I solve this?
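One thing worth checking for this error is whether some images have annotations that are all unusable, for example only crowd regions or boxes with zero width or height, since "has at least one annotation" and "has at least one valid box" are different conditions. The sketch below scans a COCO-style annotation dict. The validity rule in `is_usable` is an assumption for illustration, not necessarily the filter the dataset class applies.

```python
from collections import defaultdict

def is_usable(ann):
    """Assumed rule: not a crowd region and a box with positive size."""
    _, _, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    return ann.get("iscrowd", 0) == 0 and w > 0 and h > 0

def images_without_usable_boxes(coco):
    """Return the ids of images that have no usable annotation."""
    usable = defaultdict(int)
    for ann in coco["annotations"]:
        if is_usable(ann):
            usable[ann["image_id"]] += 1
    return [img["id"] for img in coco["images"] if usable[img["id"]] == 0]

coco = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "bbox": [10, 10, 50, 40], "iscrowd": 0},
        {"image_id": 2, "bbox": [5, 5, 0, 20], "iscrowd": 0},   # zero width
        {"image_id": 3, "bbox": [0, 0, 30, 30], "iscrowd": 1},  # crowd only
    ],
}
bad = images_without_usable_boxes(coco)
print(bad)
```

In practice `coco` would come from `json.load` on the annotation file instead of the inline dict used here.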

How to inference on single image

Is there a script for testing on a single image? The code is so tightly coupled with maskrcnn-benchmark that I cannot even find where the whole FCOS pipeline lives.

【Two situations】When compiling FCOS: (1) collect2: fatal error: cannot find 'ld'. (2) unable to execute 'x86_64-conda_cos6-linux-gnu-gcc': No such file or directory.

I tried to run the code in two different environment but failed.
BTW, my linux environment is offline.

【1】The environment is:
(0) Ubuntu 16.04
(1) Anaconda python 3.6.2 (GCC 7.2.0)
(2) PyTorch 1.0.0
(3) gcc -v --> 5.4.0
(4) CUDA 9.0.176
【ERROR】:
running build
running build_py
running build_ext
/home/qinhaonan/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py:118: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (x86_64-conda_cos6-linux-gnu-c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

x86_64-conda_cos6-linux-gnu-c++ -pthread -shared -Wl,-O2,--sort-common,--as-needed,-z,relro,-z,now -Wl,-rpath,/opt/anaconda3/lib -L/opt/anaconda3/lib -Wl,-O2,--sort-common,--as-needed,-z,relro,-z,now -Wl,-rpath,/opt/anaconda3/lib -L/opt/anaconda3/lib build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/vision.o build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.o build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/cpu/nms_cpu.o build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/cuda/nms.o build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/cuda/ROIPool_cuda.o build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.o build/temp.linux-x86_64-3.6/home/qinhaonan/Algorithm/FCOS/FCOS_01/FCOS-master/maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.o -L/usr/local/cuda/lib64 -L/opt/anaconda3/lib -lcudart -lpython3.6m -o build/lib.linux-x86_64-3.6/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so
collect2: fatal error: cannot find 'ld'
compilation terminated.
error: command 'x86_64-conda_cos6-linux-gnu-c++' failed with exit status 1

【2】The environment is:
(0) Ubuntu 16.04
(1) Anaconda python 3.6.2 (GCC 7.2.0)
(2) PyTorch 0.4.0
(3) gcc -v --> 4.8.5
(4) CUDA 8.0.61
【ERROR】:
running build_ext
/apps/jhinno/users/.../.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py:80: UserWarning: Error checking compiler version: [Errno 2] No such file or directory: 'x86_64-conda_cos6-linux-gnu-c++'
warnings.warn('Error checking compiler version: {}'.format(error))
/apps/jhinno/users/.../.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py:106: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (x86_64-conda_cos6-linux-gnu-c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'maskrcnn_benchmark._C' extension
creating build
creating build/temp.linux-x86_64-3.6
...
creating build/temp.linux-x86_64-3.6/.../FCOS/FCOS-master/maskrcnn_benchmark/csrc/cpu
creating build/temp.linux-x86_64-3.6/.../FCOS/FCOS-master/maskrcnn_benchmark/csrc/cuda
x86_64-conda_cos6-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -fPIC -DWITH_CUDA -I/apps/jhinno/users/IMGLAB/4003/HeroNet/FCOS/FCOS-master/maskrcnn_benchmark/csrc -I/apps/jhinno/users/IMGLAB/4003/.local/lib/python3.6/site-packages/torch/lib/include -I/apps/jhinno/users/IMGLAB/4003/.local/lib/python3.6/site-packages/torch/lib/include/TH -I/apps/jhinno/users/IMGLAB/4003/.local/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c /apps/jhinno/users/IMGLAB/4003/HeroNet/FCOS/FCOS-master/maskrcnn_benchmark/csrc/vision.cpp -o build/temp.linux-x86_64-3.6/apps/jhinno/users/IMGLAB/4003/HeroNet/FCOS/FCOS-master/maskrcnn_benchmark/csrc/vision.o -DTORCH_EXTENSION_NAME=maskrcnn_benchmark._C -std=c++11
unable to execute 'x86_64-conda_cos6-linux-gnu-gcc': No such file or directory
error: command 'x86_64-conda_cos6-linux-gnu-gcc' failed with exit status 1

some question about loss.py in fcos file

Thanks for your work. I am reading your code and have some questions.
In loss.py in the FCOS folder, 'level' appears several times, for example in 'points_per_level', 'labels_level_first' and 'reg_targets_level_first'. What does 'level' mean? Furthermore, what do 'labels_level_first' and 'reg_targets_level_first' mean?
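As background for this question: in FPN-based detectors a "level" usually denotes one feature map of the pyramid, each with its own stride. A "level first" layout then groups targets by pyramid level across all images rather than by image. The sketch below shows that regrouping with plain lists. The sizes and the helper name `to_level_first` are made up for illustration, and we only assume that the repository's `labels_level_first` is built in the same spirit with tensors.

```python
def to_level_first(per_image):
    """per_image[i][k] holds the labels of image i at pyramid level k.
    Returns one flat list per level, concatenated over all images."""
    num_levels = len(per_image[0])
    return [
        [label for image in per_image for label in image[level]]
        for level in range(num_levels)
    ]

# Two images, three levels with 4, 2 and 1 locations respectively.
labels_per_image = [
    [[0, 1, 0, 0], [0, 2], [0]],   # image 0
    [[3, 0, 0, 3], [3, 0], [0]],   # image 1
]
labels_level_first = to_level_first(labels_per_image)
print(labels_level_first)
```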

ValueError: num_samples should be a positive integeral value, but got num_samples=0

Traceback (most recent call last):
File "tools/train_net.py", line 175, in <module>
main()
File "tools/train_net.py", line 168, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 61, in train
start_iter=arguments["iteration"],
File "/home/administrator/USPIntern/zq/FCOS/maskrcnn_benchmark/data/build.py", line 158, in make_data_loader
sampler = make_data_sampler(dataset, shuffle, is_distributed)
File "/home/administrator/USPIntern/zq/FCOS/maskrcnn_benchmark/data/build.py", line 63, in make_data_sampler
sampler = torch.utils.data.sampler.RandomSampler(dataset)
File "/home/administrator/.local/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 64, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integeral value, but got num_samples=0

I converted my data to COCO format, but this error occurred.
Can you help me? Thanks a lot!

AttributeError: 'list' object has no attribute 'resize'

When I train on the COCO dataset with a single GPU, my data path is FCOS-master/datasets/coco. The coco folder contains the annotations, train2014 and val2014 folders, and the annotations folder contains instances_train2014.json and instances_valminusminival2014.json. I ran "python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM + 10000)) tools/train_net.py --skip-test --config-file configs/fcos/fcos_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 0 OUTPUT_DIR training_dir/fcos_R_50_FPN_1x" and then hit this AttributeError:
Traceback (most recent call last):
File "tools/train_net.py", line 174, in <module>
main()
File "tools/train_net.py", line 167, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/cj/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 560, in __next__
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 560, in <listcomp>
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 85, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/cj/maskrcnn_benchmark/data/datasets/coco.py", line 67, in __getitem__
img, anno = super(COCODataset, self).__getitem__(idx)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torchvision-0.2.3a0+9077164-py3.6-linux-x86_64.egg/torchvision/datasets/coco.py", line 114, in __getitem__
File "/cj/maskrcnn_benchmark/data/transforms/transforms.py", line 15, in __call__
image, target = t(image, target)
File "/cj/maskrcnn_benchmark/data/transforms/transforms.py", line 60, in __call__
target = target.resize(image.size)
target = target.resize(image.size)
AttributeError: 'list' object has no attribute 'resize'
Traceback (most recent call last):
File "/miniconda/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/miniconda/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
main()
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/miniconda/envs/py36/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--skip-test', '--config-file', 'configs/fcos/fcos_R_50_FPN_1x.yaml', 'DATALOADER.NUM_WORKERS', '0', 'OUTPUT_DIR', 'training_dir/fcos_R_50_FPN_1x']' returned non-zero exit status 1.

What should I do?

when i use the test_net.py to eval the model, an error happen

Traceback (most recent call last):
  File "/root/models/FCOS/tools/test_net.py", line 97, in <module>
    main()
  File "/root/models/FCOS/tools/test_net.py", line 91, in main
    output_folder=output_folder,
  File "/root/models/FCOS/maskrcnn_benchmark/engine/inference.py", line 115, in inference
    **extra_args)
  File "/root/models/FCOS/maskrcnn_benchmark/data/datasets/evaluation/__init__.py", line 22, in evaluate
    return coco_evaluation(**args)
  File "/root/models/FCOS/maskrcnn_benchmark/data/datasets/evaluation/coco/__init__.py", line 20, in coco_evaluation
    expected_results_sigma_tol=expected_results_sigma_tol,
  File "/root/models/FCOS/maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py", line 31, in do_coco_evaluation
    predictions, dataset, area=area, limit=limit
  File "/root/models/FCOS/maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py", line 233, in evaluate_box_proposals
    inds = prediction.get_field("objectness").sort(descending=True)[1]
  File "/root/models/FCOS/maskrcnn_benchmark/structures/bounding_box.py", line 43, in get_field
    return self.extra_fields[field]
KeyError: 'objectness'
2019-05-02 14:26:22,496 maskrcnn_benchmark.inference INFO: Evaluating bbox proposals

The BoxList doesn't have the objectness field, but coco_eval.py uses it to produce results. How can I solve this problem?
If I just change "objectness" to "scores", is the eval result correct?

training error

Hi, @tianzhi0549, thanks for your project.
I am trying to run this project on my own dataset. I changed the corresponding settings in the config file and began training. However, it runs for several iterations and then this error appears:

File "tools/train_net.py", line 174, in <module>
main()
File "tools/train_net.py", line 167, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/home/detection/FCOS/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
return self._process_next_batch(batch)
File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 656, in _process_next_batch
self._put_indices()
File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 646, in _put_indices
indices = next(self.sample_iter, None)
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/iteration_based_batch_sampler.py", line 24, in __iter__
for batch in self.batch_sampler:
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 107, in __iter__
batches = self._prepare_batches()
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in _prepare_batches
first_element_of_batch = [t[0].item() for t in merged]
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in <listcomp>
first_element_of_batch = [t[0].item() for t in merged]
IndexError: index 0 is out of bounds for dimension 0 with size 0

I have checked the format of my dataset, could you give me some suggestions about this error?
Thanks a lot

What is scale in class FCOSHead in fcos.py?

Why is init_value in Scale set to 1.0, and why does it need 5 iterations?
The code in fcos.py is self.scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])
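One way to read this line, based on the FCOS paper: the head is shared across five feature levels (P3 to P7), and each level gets its own trainable scalar s_i so that the regression output becomes exp(s_i x). Under that reading, `range(5)` corresponds to the five levels and 1.0 is a neutral starting value. The plain-Python stand-in below mimics this. In the repository `Scale` is a learnable module, which this sketch does not reproduce, and the per-level values assigned here are invented.

```python
import math

class Scale:
    """Plain-Python stand-in for a learnable per-level scalar."""
    def __init__(self, init_value=1.0):
        self.scale = init_value

    def __call__(self, x):
        return x * self.scale

scales = [Scale(init_value=1.0) for _ in range(5)]  # one per level

# Pretend training has adjusted the scalars differently per level.
for level, s in enumerate(scales):
    s.scale = 1.0 + 0.5 * level

raw_output = 2.0  # the same raw value produced by the shared head
distances = [math.exp(s(raw_output)) for s in scales]
print(distances)
```

With identical raw outputs, the coarser levels in this toy setup decode to larger distances, which is the kind of per-level adjustment a separate scalar makes possible.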

About the dockerfile

The Dockerfile in the docker folder doesn't work well, is there something wrong with the Dockerfile?

What's the inference speed?

What is the inference speed of FCOS with the ResNet-101-FPN and ResNeXt-32x8d-101-FPN backbones, with the input image size consistent with the training setting?

inference error

Hi,
I followed the instructions to create an FCOS environment, and every package installed successfully.

After running

python tools/test_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    MODEL.WEIGHT models/FCOS_R_50_FPN_1x.pth \
    TEST.IMS_PER_BATCH 4 

error happens

Traceback (most recent call last):
  File "tools/test_net.py", line 97, in <module>
    main()
  File "tools/test_net.py", line 49, in main
    cfg.merge_from_file(args.config_file)
  File "/home/peng/anaconda3/envs/FCOS/lib/python3.6/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/peng/anaconda3/envs/FCOS/lib/python3.6/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/peng/anaconda3/envs/FCOS/lib/python3.6/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/peng/anaconda3/envs/FCOS/lib/python3.6/site-packages/yacs/config.py", line 473, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.FCOS_ON'

thank you!

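Pitfalls encountered while setting up the environment (now working)

Currently maskrcnn benchmark only supports pytorch==1.0.0 and does not support the latest pytorch 1.0.1, so you cannot install it with conda and should instead use pip install torch==1.0.0.
Another point: you need to set the full path of DATA_DIR in path_category.py, otherwise an error is raised because the json file cannot be found.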

loss nan

I am trying to train on COCO, but the loss is nan.
This is my training script:

CUDA_VISIBLE_DEVICES=1,3,4,5 python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x

this is my result

2019-04-16 09:19:17,383 maskrcnn_benchmark.trainer INFO: Start training
2019-04-16 09:19:33,719 maskrcnn_benchmark.trainer INFO: eta: 20:24:50  iter: 20  loss: 4.2079 (4.8923)  loss_centerness: 0.6670 (0.6685)  loss_cls: 0.9797 (0.9730)  loss_reg: 2.5527 (3.2508)  time: 0.6882 (0.8167)  data: 0.0219 (0.0651)  lr: 0.003333  max mem: 7051
2019-04-16 09:19:48,380 maskrcnn_benchmark.trainer INFO: eta: 19:21:49  iter: 40  loss: 3.2185 (4.0965)  loss_centerness: 0.6607 (0.6652)  loss_cls: 0.8450 (0.9074)  loss_reg: 1.6475 (2.5240)  time: 0.6947 (0.7749)  data: 0.0265 (0.0462)  lr: 0.003333  max mem: 7051
2019-04-16 09:20:02,270 maskrcnn_benchmark.trainer INFO: eta: 18:41:23  iter: 60  loss: 2.9554 (3.7219)  loss_centerness: 0.6592 (0.6634)  loss_cls: 0.7685 (0.8647)  loss_reg: 1.5265 (2.1938)  time: 0.6972 (0.7481)  data: 0.0283 (0.0399)  lr: 0.003333  max mem: 7051
2019-04-16 09:20:15,608 maskrcnn_benchmark.trainer INFO: eta: 18:10:43  iter: 80  loss: 2.8321 (nan)  loss_centerness: 0.6582 (nan)  loss_cls: 0.7013 (nan)  loss_reg: 1.4726 (nan)  time: 0.6690 (0.7278)  data: 0.0277 (0.0374)  lr: 0.003333  max mem: 7051
2019-04-16 09:20:28,939 maskrcnn_benchmark.trainer INFO: eta: 17:52:08  iter: 100  loss: nan (nan)  loss_centerness: nan (nan)  loss_cls: nan (nan)  loss_reg: nan (nan)  time: 0.6653 (0.7156)  data: 0.0262 (0.0353)  lr: 0.003333  max mem: 7051

I have tried 3 times, and the loss always becomes nan.
What am I doing wrong?

box target?

Does the network output the box offsets (l*,r*,t*,b*) normalized by image size? I tried to train FCOS on another task and found that the regression branch's output values are unstable.
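For context, the FCOS paper defines the regression targets of a location (x, y) inside a box (x0, y0, x1, y1) as the four distances l* = x - x0, t* = y - y0, r* = x1 - x and b* = y1 - y, measured in input-image pixels. The sketch below computes them for one location. Whether a given version of the code applies any further normalization is not established here, and the function name is ours.

```python
def regression_targets(x, y, box):
    """Distances from location (x, y) to the four sides of
    box = (x0, y0, x1, y1), in pixels."""
    x0, y0, x1, y1 = box
    l = x - x0
    t = y - y0
    r = x1 - x
    b = y1 - y
    return l, t, r, b

box = (100, 50, 300, 250)
inside = regression_targets(180, 90, box)
outside = regression_targets(320, 90, box)
# A location counts as inside the box only if all four distances are > 0.
print(inside, min(inside) > 0)
print(outside, min(outside) > 0)
```

Because these are raw pixel distances, their range grows with object size, which is one reason unnormalized regression outputs can vary widely between tasks.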

Hello, I ran into a problem when training on my own dataset

File "/data/ubuntu/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/fpn.py", line 62, in forward
last_inner = inner_lateral + inner_top_down
RuntimeError: The size of tensor a (51) must match the size of tensor b (52) at non-singleton dimension 2

facebookresearch/maskrcnn-benchmark#142
I have already tried this method and modified _C.DATALOADER.SIZE_DIVISIBILITY in the config, but the same error is still reported. How can I solve this?
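A mismatch of 51 vs. 52 is what one gets when a feature map with an odd side is halved with rounding up and then upsampled by 2: 51 becomes 26, and 26 becomes 52. Padding the input so that each side is a multiple of the largest stride in the top-down path avoids this. The sketch below reproduces the arithmetic in plain Python. The strides (8, 16, 32), the ceil rounding and the helper names are assumptions for illustration.

```python
import math

STRIDES = (8, 16, 32)  # assumed strides of the levels in the top-down path

def level_sizes(side):
    """Feature-map side length at each level, rounding up."""
    return [math.ceil(side / s) for s in STRIDES]

def topdown_ok(side):
    """True if every coarser level, upsampled 2x, matches the finer one."""
    sizes = level_sizes(side)
    return all(2 * coarse == fine for fine, coarse in zip(sizes, sizes[1:]))

def pad_to_multiple(side, divisor=32):
    """Smallest multiple of divisor that is >= side."""
    return math.ceil(side / divisor) * divisor

side = 408
print(level_sizes(side), topdown_ok(side))
padded = pad_to_multiple(side)
print(padded, level_sizes(padded), topdown_ok(padded))
```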
