
UPSNet: A Unified Panoptic Segmentation Network

License: Other

Languages: Python 84.38%, C++ 2.80%, CUDA 11.98%, Shell 0.84%
Topics: panoptic-segmentation, scene-parsing, instance-segmentation, cvpr2019, computer-vision, deep-learning

upsnet's Introduction

UPSNet: A Unified Panoptic Segmentation Network

Introduction

UPSNet is initially described in a CVPR 2019 oral paper.

Disclaimer

This repository is tested under Python 3.6 and PyTorch 0.4.1, and model training is done with 16 GPUs using Horovod. It should also work under Python 2.7 / PyTorch 1.0 and with 4 GPUs.

License

© Uber, 2018-2019. Licensed under the Uber Non-Commercial License.

Citing UPSNet

If you find UPSNet useful in your research, please consider citing:

@inproceedings{xiong19upsnet,
    Author = {Yuwen Xiong and Renjie Liao and Hengshuang Zhao and Rui Hu and Min Bai and Ersin Yumer and Raquel Urtasun},
    Title = {UPSNet: A Unified Panoptic Segmentation Network},
    Booktitle = {CVPR},
    Year = {2019}
}

Main Results

COCO 2017 (trained on train-2017 set)

| Model | Test split | PQ | SQ | RQ | PQ^Th | PQ^St |
| --- | --- | --- | --- | --- | --- | --- |
| UPSNet-50 | val | 42.5 | 78.0 | 52.4 | 48.5 | 33.4 |
| UPSNet-101-DCN | test-dev | 46.6 | 80.5 | 56.9 | 53.2 | 36.7 |

Cityscapes

| Model | PQ | SQ | RQ | PQ^Th | PQ^St |
| --- | --- | --- | --- | --- | --- |
| UPSNet-50 | 59.3 | 79.7 | 73.0 | 54.6 | 62.7 |
| UPSNet-101-COCO (ms test) | 61.8 | 81.3 | 74.8 | 57.6 | 64.8 |

Requirements: Software

We recommend using Anaconda3 as it already includes many common packages.

Requirements: Hardware

We recommend using 4 to 16 GPUs with at least 11 GB of memory to train our model.

Installation

Clone this repo to $UPSNet_ROOT

Run init.sh to build essential C++/CUDA modules and download the pretrained model.

For Cityscapes:

Assuming you have already downloaded the Cityscapes dataset to $CITYSCAPES_ROOT and generated the TrainIds label images, create a soft link with ln -s $CITYSCAPES_ROOT data/cityscapes under $UPSNet_ROOT, then run init_cityscapes.sh to prepare the Cityscapes dataset for UPSNet.

For COCO:

Assuming you have already downloaded the COCO dataset to $COCO_ROOT, with annotations and images folders under it, create a soft link with ln -s $COCO_ROOT data/coco under $UPSNet_ROOT, then run init_coco.sh to prepare the COCO dataset for UPSNet.

Training:

python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/$EXP.yaml

Test:

python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/$EXP.yaml

We provide several config files (16/4 GPUs for the Cityscapes/COCO datasets) under the upsnet/experiments folder.

Model Weights

The model weights that reproduce the numbers in our paper are now available. Please follow these steps to use them:

Run download_weights.sh to get the trained model weights for Cityscapes and COCO.

For Cityscapes:

python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet50_cityscapes_16gpu.yaml --weight_path ./model/upsnet_resnet_50_cityscapes_12000.pth
python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet101_cityscapes_w_coco_16gpu.yaml --weight_path ./model/upsnet_resnet_101_cityscapes_w_coco_3000.pth

For COCO:

python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet50_coco_16gpu.yaml --weight_path model/upsnet_resnet_50_coco_90000.pth
python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet101_dcn_coco_3x_16gpu.yaml --weight_path model/upsnet_resnet_101_dcn_coco_270000.pth


upsnet's Issues

Results on ADE20k?

Thanks for your great work.
I see that you have provided code for ADE20K, but I can find results neither in the paper nor here. Have you tested your model on the ADE20K dataset? And will you share the panoptic results with us someday?
Thanks.

Extension horovod.torch has not been built.

Hello, after I run:
python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml

I encountered:
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

What should I do? Thanks for answering.

Traceback (most recent call last):
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 24, in <module>
__file__, 'mpi_lib_v2')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 44, in <module>
import horovod.torch as hvd
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 27, in <module>
__file__, 'mpi_lib', '_mpi_lib')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))

About shallow copy vs. deep copy in init_coco.py

Thanks for your good project; I have a question about your code. The released line pano_json_stff = pano_json.copy() uses dict.copy, which is a shallow copy: only the top-level dict is duplicated, while nested objects are shared. This seems to make the categories repeated and wrong in panoptic_coco_categories_stff.json. Could you tell me whether the code is correct?
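A quick illustration of the difference the question is pointing at, in plain Python (toy data, not the repo's actual JSON):

```python
import copy

# dict.copy() is a shallow copy: the new dict shares the nested
# 'categories' list with the original, so appending through one
# alias mutates both.
pano_json = {'categories': [{'id': 1, 'name': 'person'}]}
pano_json_stff = pano_json.copy()
pano_json_stff['categories'].append({'id': 200, 'name': 'sky'})
print(len(pano_json['categories']))  # 2 -- the original list grew too

# copy.deepcopy() duplicates nested objects as well, so the two
# dicts can then be modified independently.
pano_json_stff = copy.deepcopy(pano_json)
pano_json_stff['categories'].append({'id': 201, 'name': 'grass'})
print(len(pano_json['categories']))  # still 2
```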

Typo in init_cityscapes.sh?

After downloading the Cityscapes dataset, I found the file names have no "Train" in
"cp gtFine/*/*/*labelTrainIds.png labels"

So, I removed "Train" as below:
"cp gtFine/*/*/*labelIds.png labels"

Am I right? Thanks.

KeyError: 'color'

When I test on the COCO dataset, the following error occurred:

File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 330, in evaluate_panoptic
gt_pans, gt_json, categories, color_gererator = get_gt()
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 247, in get_gt
color_gererator = IdGenerator(categories)
File "/UPSNet-master/upsnet/../lib/dataset_devkit/panopticapi/utils.py", line 40, in init
self.taken_colors.add(tuple(category['color']))
KeyError: 'color'

I think my annotation panoptic_val2017.json has some problems: the categories don't have a 'color' field. I downloaded the annotations from http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
Is there a problem with the annotation file I downloaded, or am I doing something wrong?
If the annotations have problems, could you please offer me a correct download link?
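A hedged workaround sketch: if the categories in the downloaded annotations really lack a 'color' field, deterministic placeholder colors could be injected before IdGenerator is constructed (the file path and layout below are assumptions based on the traceback, and the colors are only meant to get past the KeyError, not to match official palettes):

```python
import json

# Assumed location of the generated categories file.
path = './data/coco/annotations/panoptic_coco_categories_stff.json'

with open(path) as f:
    categories = json.load(f)

for cat in categories:
    if 'color' not in cat:
        # Derive a deterministic RGB value from the category id.
        # Collisions are possible; this is a debugging aid only.
        cid = cat['id']
        cat['color'] = [(cid * 37) % 256, (cid * 97) % 256, (cid * 173) % 256]

with open(path, 'w') as f:
    json.dump(categories, f)
```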

An error about logging

upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 52, in <module>
logger, final_output_path = create_logger(config.output_path, args.cfg, config.dataset.image_set)
File "upsnet/../lib/utils/logging.py", line 38, in create_logger
logging.basicConfig(filename=os.path.join(final_output_path, log_file), format=head)
AttributeError: 'module' object has no attribute 'basicConfig'

Process finished with exit code 0

ZeroDivisionError: division by zero

When doing an evaluation on the test set, I got the following error. I don't have any clue about it.

2019-04-18 03:56:17,288 | upsnet_end2end_test.py | line 307: unified pano result:
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_test.py", line 316, in <module>
    upsnet_test()
  File "upsnet/upsnet_end2end_test.py", line 308, in upsnet_test
    test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
  File "upsnet/../upsnet/dataset/base_dataset.py", line 333, in evaluate_panoptic
    results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
  File "upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
    results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
  File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
    return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero

But the panoptic segmentation results can be successfully generated, like below:
[image: lindau_000000_000019]

So how is it possible that I run into the case n = 0? Any ideas would be appreciated... Thanks.
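For anyone hitting the same thing: judging from the traceback, n counts the categories that actually appear in the ground truth for one of the 'All'/'Things'/'Stuff' groups, so an evaluation set with no matching GT categories drives n to 0. A minimal guard sketch (mirroring the shape of pq_average from the traceback, not the exact repo code):

```python
def pq_average_safe(per_class_results):
    """Average PQ/SQ/RQ over classes, tolerating the empty case.

    per_class_results: dict mapping category id -> dict with
    'pq', 'sq', 'rq' floats (hypothetical structure for this sketch).
    """
    n = len(per_class_results)
    if n == 0:
        # No categories matched the ground truth: report zeros instead
        # of raising ZeroDivisionError, and surface n so the caller can
        # tell the evaluation set was effectively empty.
        return {'pq': 0.0, 'sq': 0.0, 'rq': 0.0, 'n': 0}
    pq = sum(r['pq'] for r in per_class_results.values())
    sq = sum(r['sq'] for r in per_class_results.values())
    rq = sum(r['rq'] for r in per_class_results.values())
    return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}
```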

How to track the change of learning rate?

Sorry, but I am a little confused by the learning rate.

optimizer = SGD(params_lr, lr=1, momentum=config.train.momentum, weight_decay=config.train.wd)

lr = adjust_learning_rate(optimizer, curr_iter, config)

What is the relation between these two learning rates? And how can I get the real learning rate? If I print optimizer.param_groups[0]["lr"], I always get 1.
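A hedged sketch of the pattern this looks like (an inference from the snippet above, not confirmed repo internals): param_groups hold fixed per-group multipliers (here 1), while the scheduled value returned by adjust_learning_rate is applied inside optimizer.step(lr), so the effective rate is multiplier × scheduled lr. For example:

```python
# Sketch reproducing the pattern where param_groups keep a fixed
# multiplier (1.0) and the schedule is applied at step time.
base_lr = 0.02          # assumed config.train.lr-style value
warmup_iters = 500      # assumed warmup length

def adjust_learning_rate_sketch(curr_iter, decay_iters=(60000, 80000)):
    """Return the scheduled lr for this iteration (linear warmup + step decay)."""
    lr = base_lr
    if curr_iter < warmup_iters:                 # linear warmup phase
        lr *= curr_iter / float(warmup_iters)
    for decay_at in decay_iters:                 # 10x step decay
        if curr_iter >= decay_at:
            lr *= 0.1
    return lr

# The "real" learning rate of group g at iteration t would then be
# optimizer.param_groups[g]['lr'] * adjust_learning_rate_sketch(t);
# printing param_groups[g]['lr'] alone only shows the multiplier (1).
print(adjust_learning_rate_sketch(100))    # warmup region: 0.004
print(adjust_learning_rate_sketch(70000))  # after first decay: 0.002
```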

Cannot find reference 'deform_conv_cuda'

Thanks for your great work.
I followed the installation steps and the installation was successful, but when I begin to train the model it tells me "Cannot find reference 'deform_conv_cuda'". The file already exists, and I don't know how to fix it. Could you help me with this? Thanks.

init_coco.sh bug

It seems that there is a bug in init_coco.sh:
PYTHONPATH=$(pwd)/lib/dataset_devkit/panopticapi:$PYTHONPATH should perhaps be changed to
PYTHONPATH=$(pwd)/lib/dataset_devkit:$PYTHONPATH

KeyError

I got the following error. Wondering if I need to delete /gtInstances.json first. Thanks.

[screenshot: Screen Shot 2019-04-22 at 10 31 17 PM]

What's the performance compared with Mask R-CNN on object detection?

Hi, I find that UPSNet mainly adds a semantic segmentation head to Mask R-CNN. I wonder what the performance of UPSNet is on object detection (or instance segmentation). Can I use it to improve my object detection mAP? Sorry, but I didn't find this discussed in your paper.

I would appreciate it if you could give me some advice, thanks.

ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'

Error message:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 23, in <module>
from upsnet.models.fpn import FPN
File "upsnet/../upsnet/models/fpn.py", line 23, in <module>
from upsnet.operators.modules.distbatchnorm import BatchNorm2d
ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'

How to reproduce the result reported in the paper?

I have trained the network with the configuration file upsnet_resnet50_cityscapes_4gpu.yaml and four 2080 Ti GPUs. The configuration file is not modified at all.

After testing the model at test_iteration 48000, the results show mIoU 75.054%, AP_box 38.1%, AP_mask 32.4%, and PQ : SQ : RQ of 58.7% : 79.6% : 72.4%, whereas the results reported in the paper are 75.2%, 39.1%, 33.3%, and 59.3% : 79.7% : 73.0%, respectively.

Is there any way to achieve the performance reported in the paper?

Bug

gt_inds = np.where((roidb['gt_classes']) > 0 & (roidb['is_crowd'] == 0))[0]

I think it should be
gt_inds = np.where((roidb['gt_classes'] > 0) & (roidb['is_crowd'] == 0))[0]

But when I modify it like this, another problem occurs. I think it's due to instances that are crowd.

I think in this place the crowd instances should be added to construct the pan_gt.
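The operator-precedence part is easy to verify in isolation: in Python, & binds tighter than >, so the unparenthesized version evaluates 0 & (is_crowd == 0) first. A minimal demonstration:

```python
import numpy as np

gt_classes = np.array([0, 3, 5])
is_crowd = np.array([0, 0, 1])

# Buggy: parsed as gt_classes > (0 & (is_crowd == 0)), i.e. gt_classes > 0,
# so the crowd filter silently disappears.
buggy = np.where((gt_classes) > 0 & (is_crowd == 0))[0]

# Intended: keep foreground, non-crowd boxes only.
fixed = np.where((gt_classes > 0) & (is_crowd == 0))[0]

print(buggy)  # [1 2] -- the crowd instance slips through
print(fixed)  # [1]
```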

ConnectionResetError: [Errno 104] Connection reset by peer AND ValueError: Expected input batch_size (423) to match target batch_size (465).

When I train the model on the COCO dataset, it runs normally at the beginning and the loss also decreases. But the following problems occur in the middle:

Traceback (most recent call last):
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in del
self._shutdown_workers()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 494, in Client
deliver_challenge(c, authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 394, in
upsnet_train()
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 193, in upsnet_train
output = train_model(data, label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/rcnn.py", line 190, in forward
cls_loss = self.cls_loss(cls_score, cls_label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 862, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1405, in nll_loss
.format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (511) to match target batch_size (512).

Could you please help me solve these problems?

Question about visualization when trying to test COCO dataset (train2014)

Thanks for your great work!
I trained the model with COCO train2017 & val2017 and pulled the latest code, but when I try to test on the COCO dataset (train2014) with vis_mask=True, I still only get an instance-segmentation-like figure.
Is there anything that needs special attention?
[image: COCO_train2014_000000000025.jpg]

OSError: cannot identify image file './data/coco/annotations/panoptic_train2017_semantic_trainid_stff/000000564031.png'

Hi, I get this problem when running on the COCO dataset. Why do I get this error? I used the two annotation packages downloaded from http://cocodataset.org/#download:
2017 Panoptic Train/Val annotations [821MB]
2017 Train/Val annotations [241MB]

It seems that the image exists in the train2017 folder, but does not exist in the panoptic_train2017_semantic_trainid_stff folder.

How can I fix this problem? Thanks.
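A small diagnostic sketch that may help localize the gap, comparing the two directories (both paths are assumptions based on the error message and the usual UPSNet data layout):

```python
import os

# Hypothetical sanity check: list training images whose semantic
# trainid PNG was never generated by init_coco.sh.
img_dir = './data/coco/images/train2017'  # assumed location
seg_dir = './data/coco/annotations/panoptic_train2017_semantic_trainid_stff'

imgs = {os.path.splitext(f)[0] for f in os.listdir(img_dir)}
segs = {os.path.splitext(f)[0] for f in os.listdir(seg_dir)}
missing = sorted(imgs - segs)
print(len(missing), 'images without a semantic trainid png')
print(missing[:10])
```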

ModuleNotFoundError: No module named 'upsnet.bbox.bbox'

Did I miss any step to get this error?

~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 60, in <module>
    from upsnet.dataset import *
  File "upsnet/../upsnet/dataset/__init__.py", line 1, in <module>
    from .cityscapes import Cityscapes
  File "upsnet/../upsnet/dataset/cityscapes.py", line 32, in <module>
    from upsnet.dataset.json_dataset import JsonDataset, extend_with_flipped_entries, filter_for_training, add_bbox_regression_targets
  File "upsnet/../upsnet/dataset/json_dataset.py", line 53, in <module>
    import upsnet.bbox.bbox_transform as box_utils
  File "upsnet/../upsnet/bbox/bbox_transform.py", line 15, in <module>
    from .bbox import bbox_overlaps as bbox_overlaps_cython
ModuleNotFoundError: No module named 'upsnet.bbox.bbox'

This is what I have in upsnet/bbox:

~/UPSNet_ROOT/upsnet/bbox$ ls
bbox.c                                bbox.pyx            bbox_transform.py  __init__.py  sample_rois.py
bbox.cpython-37m-x86_64-linux-gnu.so  bbox_regression.py  build              __pycache__  setup.py

Thanks.

How could we get PQ values?

No matter whether I run the training or the testing script, the only metrics I see are AP and AR. I wonder if we should calculate PQ values ourselves, or is there anything I missed? Thanks.

one GPU

Can I use one GPU with 12 GB of memory to train? Where does the code need to change?
Thank you very much!

The resnet101 performance

We tried to reproduce the ResNet-101 performance from your paper, but the final performance is a little worse than the reported result.

I wonder whether there are any problems in our settings?
Are you sure the code can reach a PQ of about 46?

ValueError: operands could not be broadcast together with shapes (427,640) (426,640)

Thank you for fixing the problem with panoptic_val2017_stff.json. When I use the fixed json file, I get this error: ZeroDivisionError: division by zero. I think there may be a problem at line 222 of /UPSNet-master/upsnet/dataset/base_dataset.py, so I replaced files = [item['file_name'] for item in pan_gt_json['images']] with files = [item['file_name'].replace('jpg', 'png') for item in pan_gt_json['images']], and that works. But another problem has arisen:

Traceback (most recent call last):
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 337, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
pq_stat += p.get()
File "/home/xxl/anaconda3/envs/xxl_36/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: operands could not be broadcast together with shapes (427,640) (426,640)

Am I doing something wrong somewhere? Could you please help me solve the problem?

A question about panoptic gt

matched_gt[gt_masks[[i], :, :] != 0] = i + self.num_seg_classes - self.num_inst_classes

In this line, you only use "gt_masks[[i], :, :] != 0" to judge whether a pixel belongs to the instance. But the picture is padded with 255, so I think there should be another condition: "& (gt_masks[[i], :, :] != 255)". Is what I pointed out right?

Another question is about the overlap relation. There may be a big table in the picture; if the table is the last instance, the whole panoptic gt will be covered by the corresponding id. This is not the panoptic gt we want, is it?

Looking forward to your reply.
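To make the first question concrete, here is a toy comparison of the current rule and the proposed one (the 255 padding value and the class counts are taken from the issue and are assumptions about the repo's conventions):

```python
import numpy as np

num_seg_classes, num_inst_classes = 133, 80  # assumed COCO-style values

gt_masks = np.array([[[0, 1, 1, 255]]])      # toy 1x1x4 instance mask; 255 = padding
matched_gt = np.zeros((1, 4), dtype=np.int64)

i = 0
# Current rule: padding pixels (255) are also claimed by instance i.
matched_gt[gt_masks[i] != 0] = i + num_seg_classes - num_inst_classes

# Proposed rule from the issue: exclude the padded region as well.
proposed = np.zeros((1, 4), dtype=np.int64)
proposed[(gt_masks[i] != 0) & (gt_masks[i] != 255)] = i + num_seg_classes - num_inst_classes

print(matched_gt)  # [[ 0 53 53 53]] -- padding mislabeled as instance 0
print(proposed)    # [[ 0 53 53  0]]
```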

Question about undefined symbol

Below is the error message I got. Not so sure about how to fix it. Could you help me with this? Thanks.

====
UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
from upsnet.operators.modules.deform_conv import DeformConv
File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
from upsnet.operators.functions.deform_conv import DeformConvFunction
File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

Question about visualization

(1) After setting vis_mask to true, I got the result below. However, I found all cars are recognized as trains... I am wondering if there is something wrong with my training.
[screenshot: Screen Shot 2019-04-16 at 9 48 28 PM]

(2) How do we get a panoptic segmentation result like the one below, instead of the one above (which looks like instance segmentation)?
[screenshot: Screen Shot 2019-04-16 at 9 51 32 PM]

RuntimeWarning: invalid value encountered in greater_equal

Has anyone encountered this error when running with one GPU?

upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]

The whole message is:

upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 426, in <module>
upsnet_train()
File "upsnet/upsnet_end2end_train.py", line 287, in upsnet_train
output = train_model(*batch)
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/models/resnet_upsnet.py", line 151, in forward
rois, _ = self.pyramid_proposal(rpn_cls_prob, rpn_bbox_pred, data['im_info'])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/operators/modules/pyramid_proposal.py", line 58, in forward
bbox_pred[3][[i], :, :, :], bbox_pred[4][[i], :, :, :], torch.from_numpy(im_info[i, :]))
File "upsnet/../upsnet/operators/functions/pyramid_proposal.py", line 168, in forward
keep = nms(np.hstack((proposals, scores)).astype(np.float32))
File "upsnet/../upsnet/nms/nms.py", line 45, in _nms
return gpu_nms(dets, thresh, device_id)
File "gpu_nms.pyx", line 36, in gpu_nms.gpu_nms
IndexError: Out of bounds on buffer access (axis 0)

binary_op(): expected both inputs to be on same device

Hi, sorry to interrupt you. I am using your 4-GPU code, but on GPUs 4, 5, 6, and 7 of my machine. However, when I try to resume my model, there is an error that says:

Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 418, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 300, in upsnet_train
    optimizer.step(lr)
  File "upsnet/../lib/nn/optimizer.py", line 98, in step
    buf.mul_(momentum).add_(group['lr'] * lr, d_p)
RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:0 and input b is on cuda:4

Can you tell me what to do? I really appreciate your help.

Training is slow.

Hello, I am trying to reproduce the results without Horovod.
I use 4 Tesla K80 GPUs (12 GB) and train the net with "upsnet_resnet50_coco_4gpu.yaml", but I find that training may take more than 10 days.
Do you have any advice for speeding up the training?
Thanks.

A question about channel selecting in formula Z_unknown

Thanks for your good project; I have a question about a detail in your code. The released code computes void_logits = torch.max(fcn_output['fcn_score'][:, (config.dataset.num_classes - 1):, ...], dim=1, keepdim=True)[0] - torch.max(seg_inst_logits, dim=1, keepdim=True)[0], which does not seem to correspond to the paper's formula Z_unknown = max(X_thing) - max(X_mask).
Shouldn't the first max be taken over channels 52:132 of fcn_output['fcn_score'] ([:, (config.dataset.num_seg_classes - config.dataset.num_classes + 1):, ...]), which represent the things? Sorry for my English; I hope the description makes the question clear.
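For concreteness, a small sketch contrasting the two slicings, using assumed COCO-style counts (80 thing classes and 53 stuff channels, stuff first; num_classes = 81 including background, num_seg_classes = 133 — these values are assumptions, not taken from the repo config):

```python
import torch

num_classes = 81        # assumed: 80 thing classes + background
num_seg_classes = 133   # assumed: 53 stuff channels followed by 80 thing channels

fcn_score = torch.randn(1, num_seg_classes, 4, 4)  # toy semantic logits

# Released code: starts at channel num_classes - 1 = 80,
# i.e. only the last 53 channels.
released = fcn_score[:, (num_classes - 1):, ...]

# Slicing the issue proposes: starts at channel
# num_seg_classes - num_classes + 1 = 53, i.e. the 80 thing channels.
proposed = fcn_score[:, (num_seg_classes - num_classes + 1):, ...]

print(released.shape[1], proposed.shape[1])  # 53 vs 80
```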

RuntimeWarning: invalid value encountered in sqrt

Hi,

I ran into this error while running (upsnet_resnet50_coco_1gpu.yaml is just a GPU-count change based on 4gpu.yaml):

python -u upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml
  • running env: pytorch 1.0
  • version: 96b7b5172b7b76446f637f4922b7a2054e46703b and this PR change (#35)
2019-05-11 23:08:29,257 | callback.py | line 40 : Batch [1560]  Speed: 2.28 samples/sec Train-rpn_cls_loss=0.202601, rpn_bbox_loss=0.119188, rcnn_accuracy=0.918098, cls_loss=0.512433, bbox_loss=0.178919, mask_loss=0.637334, fcn_loss=3.486774, fcn_roi_loss=4.059261, panoptic_accuracy=0.267040, panoptic_loss=2.879008,
2019-05-11 23:08:38,512 | callback.py | line 40 : Batch [1580]  Speed: 2.16 samples/sec Train-rpn_cls_loss=0.204118, rpn_bbox_loss=0.121994, rcnn_accuracy=0.918320, cls_loss=0.511436, bbox_loss=0.178261, mask_loss=0.637755, fcn_loss=3.488841, fcn_roi_loss=4.060041, panoptic_accuracy=0.265982, panoptic_loss=2.881260,
2019-05-11 23:08:47,145 | callback.py | line 40 : Batch [1600]  Speed: 2.32 samples/sec Train-rpn_cls_loss=0.208637, rpn_bbox_loss=0.125312, rcnn_accuracy=0.918627, cls_loss=0.510648, bbox_loss=0.177416, mask_loss=0.637829, fcn_loss=3.493248, fcn_roi_loss=4.063709, panoptic_accuracy=0.264609, panoptic_loss=2.885051,
2019-05-11 23:08:55,470 | callback.py | line 40 : Batch [1620]  Speed: 2.40 samples/sec Train-rpn_cls_loss=0.210174, rpn_bbox_loss=0.125260, rcnn_accuracy=0.918797, cls_loss=0.510753, bbox_loss=0.176945, mask_loss=0.637923, fcn_loss=3.496151, fcn_roi_loss=4.067450, panoptic_accuracy=0.264234, panoptic_loss=2.887030,
upsnet/../upsnet/operators/modules/fpn_roi_align.py:38: RuntimeWarning: invalid value encountered in sqrt
  feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 403, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 269, in upsnet_train
    output = train_model(*batch)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
    cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../upsnet/models/rcnn.py", line 190, in forward
    cls_loss = self.cls_loss(cls_score, cls_label)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 942, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (510) to match target batch_size (512).
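As a purely defensive sketch (a workaround guess, not the repository's actual fix), the level-assignment line from fpn_roi_align.py could clamp the box area so NaN or negative widths/heights never reach np.sqrt:

```python
import numpy as np

def assign_fpn_level(w, h):
    """FPN level heuristic in the style of fpn_roi_align.py, with the
    area clamped so degenerate/NaN boxes cannot feed invalid values
    into the sqrt (they fall through to level 0 instead).
    """
    area = np.maximum(np.nan_to_num(w * h), 0.0)
    return np.clip(np.floor(2 + np.log2(np.sqrt(area) / 224 + 1e-6)), 0, 3)

print(assign_fpn_level(np.array([224.0, -5.0, np.nan]),
                       np.array([224.0, 10.0, 3.0])))  # [2. 0. 0.]
```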

Code not executing

Hi,

After following all the steps mentioned and setting up the environment with Python 3.6 and PyTorch 0.4.1, the code has many bugs, making it difficult to reproduce the results. Are there other changes that need to be made, or does following these steps directly work? I am trying to train the model on COCO using the single-GPU config file.

About your coco id and labels

I noticed that you sort the categories as 0-52: stuff, 53-132: things when creating panoptic_coco_categories_stff.json. Why did you make this change?

So when you print out the result matrix in this format:

IDX | PQ SQ RQ IoU TP FP FN

does IDX 0 mean the result for the first thing class or the first stuff class?
Similarly, how about other metrics like the mean and per-category AP? Does the first line represent the first thing class or the first stuff class?

I really appreciate your reply.
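One way to answer the IDX question empirically is to print the category order from the generated file itself; a small sketch (the file path below is an assumption based on the issue):

```python
import json

# Hypothetical check: print the category order actually stored in the
# generated file, so each IDX in the result matrix can be mapped back
# to a class name and a thing/stuff flag.
path = './data/coco/annotations/panoptic_coco_categories_stff.json'  # assumed location
with open(path) as f:
    categories = json.load(f)
for idx, cat in enumerate(categories):
    kind = 'thing' if cat.get('isthing') else 'stuff'
    print(idx, cat.get('name'), kind)
```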

ZeroDivisionError: division by zero

2019-04-10 16:16:39,900 | upsnet_end2end_test.py | line 303: unified pano result:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_test.py", line 312, in
upsnet_test()
File "upsnet/upsnet_end2end_test.py", line 304, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "upsnet/../upsnet/dataset/base_dataset.py", line 328, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "upsnet/../upsnet/dataset/base_dataset.py", line 296, in pq_compute
results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero

Hi, when I evaluated the trained model, I found this error.
