
UPSNet: A Unified Panoptic Segmentation Network

License: Other

Languages: Python 84.38%, C++ 2.80%, CUDA 11.98%, Shell 0.84%
Topics: panoptic-segmentation, scene-parsing, instance-segmentation, cvpr2019, computer-vision, deep-learning

upsnet's Introduction

UPSNet: A Unified Panoptic Segmentation Network

Introduction

UPSNet is initially described in a CVPR 2019 oral paper.

Disclaimer

This repository is tested under Python 3.6 and PyTorch 0.4.1, and model training is done with 16 GPUs using Horovod. It should also work under Python 2.7 / PyTorch 1.0 and with 4 GPUs.

License

© Uber, 2018-2019. Licensed under the Uber Non-Commercial License.

Citing UPSNet

If you find UPSNet useful in your research, please consider citing:

@inproceedings{xiong19upsnet,
    Author = {Yuwen Xiong and Renjie Liao and Hengshuang Zhao and Rui Hu and Min Bai and Ersin Yumer and Raquel Urtasun},
    Title = {UPSNet: A Unified Panoptic Segmentation Network},
    Booktitle = {CVPR},
    Year = {2019}
}

Main Results

COCO 2017 (trained on train-2017 set)

| Model | Test split | PQ | SQ | RQ | PQ^Th | PQ^St |
| --- | --- | --- | --- | --- | --- | --- |
| UPSNet-50 | val | 42.5 | 78.0 | 52.4 | 48.5 | 33.4 |
| UPSNet-101-DCN | test-dev | 46.6 | 80.5 | 56.9 | 53.2 | 36.7 |

Cityscapes

| Model | PQ | SQ | RQ | PQ^Th | PQ^St |
| --- | --- | --- | --- | --- | --- |
| UPSNet-50 | 59.3 | 79.7 | 73.0 | 54.6 | 62.7 |
| UPSNet-101-COCO (ms test) | 61.8 | 81.3 | 74.8 | 57.6 | 64.8 |

Requirements: Software

We recommend using Anaconda3 as it already includes many common packages.

Requirements: Hardware

We recommend using 4 to 16 GPUs with at least 11 GB of memory to train our model.

Installation

Clone this repo to $UPSNet_ROOT

Run init.sh to build essential C++/CUDA modules and download the pretrained model.

For Cityscapes:

Assuming you have already downloaded the Cityscapes dataset to $CITYSCAPES_ROOT and generated the TrainIds label images, create a soft link with ln -s $CITYSCAPES_ROOT data/cityscapes under $UPSNet_ROOT, then run init_cityscapes.sh to prepare the Cityscapes dataset for UPSNet.

For COCO:

Assuming you have already downloaded the COCO dataset to $COCO_ROOT, with annotations and images folders under it, create a soft link with ln -s $COCO_ROOT data/coco under $UPSNet_ROOT, then run init_coco.sh to prepare the COCO dataset for UPSNet.

Training:

python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/$EXP.yaml

Test:

python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/$EXP.yaml

We provide several config files (16/4 GPUs for the Cityscapes/COCO datasets) under the upsnet/experiments folder.

Model Weights

The model weights that reproduce the numbers in our paper are now available. Please follow these steps to use them:

Run download_weights.sh to get the trained model weights for Cityscapes and COCO.

For Cityscapes:

python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet50_cityscapes_16gpu.yaml --weight_path ./model/upsnet_resnet_50_cityscapes_12000.pth
python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet101_cityscapes_w_coco_16gpu.yaml --weight_path ./model/upsnet_resnet_101_cityscapes_w_coco_3000.pth

For COCO:

python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet50_coco_16gpu.yaml --weight_path model/upsnet_resnet_50_coco_90000.pth
python upsnet/upsnet_end2end_test.py --cfg upsnet/experiments/upsnet_resnet101_dcn_coco_3x_16gpu.yaml --weight_path model/upsnet_resnet_101_dcn_coco_270000.pth


upsnet's Issues

Results on ADE20k?

Thanks for your great work.
I see that you have provided code for ADE20K, but I can find results neither in the paper nor here. Have you tested your model on the ADE20K dataset? And will you share the panoptic results with us someday?
Thanks.

Extension horovod.torch has not been built.

Hello, after I run:
python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml

I encountered:
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

What should I do? Thanks for answering.

Traceback (most recent call last):
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 24, in <module>
__file__, 'mpi_lib_v2')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 44, in <module>
import horovod.torch as hvd
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 27, in <module>
__file__, 'mpi_lib', '_mpi_lib')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))

About shallow copy vs. deep copy in init_coco.py

Thanks for your good project; I have a question about your code. The released line pano_json_stff = pano_json.copy() uses dict.copy, which is a shallow copy: only the top-level dict is duplicated, while nested objects are shared. This seems to make the categories repeated and wrong in panoptic_coco_categories_stff.json. Could you tell me whether the code is correct?
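A quick illustration of the difference the question is pointing at, in plain Python (toy data, not the repo's actual JSON):

```python
import copy

# dict.copy() is a shallow copy: the new dict shares the nested
# 'categories' list with the original, so appending through one
# alias mutates both.
pano_json = {'categories': [{'id': 1, 'name': 'person'}]}
pano_json_stff = pano_json.copy()
pano_json_stff['categories'].append({'id': 200, 'name': 'sky'})
print(len(pano_json['categories']))  # 2 -- the original list grew too

# copy.deepcopy() duplicates nested objects as well, so the two
# dicts can then be modified independently.
pano_json_stff = copy.deepcopy(pano_json)
pano_json_stff['categories'].append({'id': 201, 'name': 'grass'})
print(len(pano_json['categories']))  # still 2
```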

Typo in init_cityscapes.sh?

After downloading the Cityscapes dataset, I found the file names have no "Train" in
"cp gtFine/*/*/*labelTrainIds.png labels"

So, I removed "Train" as below:
"cp gtFine/*/*/*labelIds.png labels"

Am I right? Thanks.

KeyError: 'color'

When I test on the COCO dataset, the following error occurred:

File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 330, in evaluate_panoptic
gt_pans, gt_json, categories, color_gererator = get_gt()
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 247, in get_gt
color_gererator = IdGenerator(categories)
File "/UPSNet-master/upsnet/../lib/dataset_devkit/panopticapi/utils.py", line 40, in init
self.taken_colors.add(tuple(category['color']))
KeyError: 'color'

I think my annotation panoptic_val2017.json has some problems: the categories don't have a 'color' field. I downloaded the annotations from http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
Is there a problem with the annotation file I downloaded, or am I doing something wrong?
If the annotations have problems, could you please offer me a correct download link?
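A hedged workaround sketch: if the categories in the downloaded annotations really lack a 'color' field, deterministic placeholder colors could be injected before IdGenerator is constructed (the file path and layout below are assumptions based on the traceback, and the colors are only meant to get past the KeyError, not to match official palettes):

```python
import json

# Assumed location of the generated categories file.
path = './data/coco/annotations/panoptic_coco_categories_stff.json'

with open(path) as f:
    categories = json.load(f)

for cat in categories:
    if 'color' not in cat:
        # Derive a deterministic RGB value from the category id.
        # Collisions are possible; this is a debugging aid only.
        cid = cat['id']
        cat['color'] = [(cid * 37) % 256, (cid * 97) % 256, (cid * 173) % 256]

with open(path, 'w') as f:
    json.dump(categories, f)
```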

An error about logging

upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 52, in <module>
logger, final_output_path = create_logger(config.output_path, args.cfg, config.dataset.image_set)
File "upsnet/../lib/utils/logging.py", line 38, in create_logger
logging.basicConfig(filename=os.path.join(final_output_path, log_file), format=head)
AttributeError: 'module' object has no attribute 'basicConfig'

Process finished with exit code 0

ZeroDivisionError: division by zero

When doing an evaluation on the test set, I got the following error. I don't have any clue about it.

2019-04-18 03:56:17,288 | upsnet_end2end_test.py | line 307: unified pano result:
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_test.py", line 316, in <module>
    upsnet_test()
  File "upsnet/upsnet_end2end_test.py", line 308, in upsnet_test
    test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
  File "upsnet/../upsnet/dataset/base_dataset.py", line 333, in evaluate_panoptic
    results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
  File "upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
    results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
  File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
    return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero

But the panoptic segmentation results can be successfully generated, like below:
[image: lindau_000000_000019]

So how is it possible that I run into the case n = 0? Any ideas would be appreciated... Thanks.
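For anyone hitting the same thing: judging from the traceback, n counts the categories that actually appear in the ground truth for one of the 'All'/'Things'/'Stuff' groups, so an evaluation set with no matching GT categories drives n to 0. A minimal guard sketch (mirroring the shape of pq_average from the traceback, not the exact repo code):

```python
def pq_average_safe(per_class_results):
    """Average PQ/SQ/RQ over classes, tolerating the empty case.

    per_class_results: dict mapping category id -> dict with
    'pq', 'sq', 'rq' floats (hypothetical structure for this sketch).
    """
    n = len(per_class_results)
    if n == 0:
        # No categories matched the ground truth: report zeros instead
        # of raising ZeroDivisionError, and surface n so the caller can
        # tell the evaluation set was effectively empty.
        return {'pq': 0.0, 'sq': 0.0, 'rq': 0.0, 'n': 0}
    pq = sum(r['pq'] for r in per_class_results.values())
    sq = sum(r['sq'] for r in per_class_results.values())
    rq = sum(r['rq'] for r in per_class_results.values())
    return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}
```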

How to track the change of learning rate?

Sorry, but I am a little confused by the learning rate.

optimizer = SGD(params_lr, lr=1, momentum=config.train.momentum, weight_decay=config.train.wd)

lr = adjust_learning_rate(optimizer, curr_iter, config)

What is the relation between these two learning rates? And how can I get the real learning rate? If I print optimizer.param_groups[0]["lr"], I always get 1.
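A hedged sketch of the pattern this looks like (an inference from the snippet above, not confirmed repo internals): param_groups hold fixed per-group multipliers (here 1), while the scheduled value returned by adjust_learning_rate is applied inside optimizer.step(lr), so the effective rate is multiplier × scheduled lr. For example:

```python
# Sketch reproducing the pattern where param_groups keep a fixed
# multiplier (1.0) and the schedule is applied at step time.
base_lr = 0.02          # assumed config.train.lr-style value
warmup_iters = 500      # assumed warmup length

def adjust_learning_rate_sketch(curr_iter, decay_iters=(60000, 80000)):
    """Return the scheduled lr for this iteration (linear warmup + step decay)."""
    lr = base_lr
    if curr_iter < warmup_iters:                 # linear warmup phase
        lr *= curr_iter / float(warmup_iters)
    for decay_at in decay_iters:                 # 10x step decay
        if curr_iter >= decay_at:
            lr *= 0.1
    return lr

# The "real" learning rate of group g at iteration t would then be
# optimizer.param_groups[g]['lr'] * adjust_learning_rate_sketch(t);
# printing param_groups[g]['lr'] alone only shows the multiplier (1).
print(adjust_learning_rate_sketch(100))    # warmup region: 0.004
print(adjust_learning_rate_sketch(70000))  # after first decay: 0.002
```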

Cannot find reference 'deform_conv_cuda'

Thanks for your great work.
I followed the installation steps and the installation was successful, but when I begin to train the model it tells me "Cannot find reference 'deform_conv_cuda'". The file already exists, and I don't know how to fix it. Could you help me with this? Thanks.

init_coco.sh bug

It seems that there is a bug in init_coco.sh:
PYTHONPATH=$(pwd)/lib/dataset_devkit/panopticapi:$PYTHONPATH should perhaps be changed to
PYTHONPATH=$(pwd)/lib/dataset_devkit:$PYTHONPATH

KeyError

I got the following error. Wondering if I need to delete /gtInstances.json first. Thanks.

[screenshot: Screen Shot 2019-04-22 at 10 31 17 PM]

What's the performance compared with Mask R-CNN on object detection?

Hi, I find that UPSNet mainly adds a semantic segmentation head to Mask R-CNN. I wonder what the performance of UPSNet is on object detection (or instance segmentation). Can I use it to improve my object detection mAP? Sorry, but I didn't find this discussed in your paper.

I would appreciate it if you could give me some advice, thanks.

ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'

Error message:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 23, in <module>
from upsnet.models.fpn import FPN
File "upsnet/../upsnet/models/fpn.py", line 23, in <module>
from upsnet.operators.modules.distbatchnorm import BatchNorm2d
ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'

How to reproduce the result reported in the paper?

I have trained the network with the configuration file upsnet_resnet50_cityscapes_4gpu.yaml and four 2080 Ti GPUs. The configuration file is not modified at all.

After testing the model at test_iteration 48000, the results show mIoU 75.054%, AP_box 38.1%, AP_mask 32.4%, and PQ : SQ : RQ of 58.7% : 79.6% : 72.4%, whereas the results reported in the paper are 75.2%, 39.1%, 33.3%, and 59.3% : 79.7% : 73.0%, respectively.

Is there any way to achieve the performance reported in the paper?

Bug

gt_inds = np.where((roidb['gt_classes']) > 0 & (roidb['is_crowd'] == 0))[0]

I think it should be
gt_inds = np.where((roidb['gt_classes'] > 0) & (roidb['is_crowd'] == 0))[0]

But when I modify it like this, another problem occurs. I think it's due to instances that are crowd.

I think in this place the crowd instances should be added to construct the pan_gt.
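The operator-precedence part is easy to verify in isolation: in Python, & binds tighter than >, so the unparenthesized version evaluates 0 & (is_crowd == 0) first. A minimal demonstration:

```python
import numpy as np

gt_classes = np.array([0, 3, 5])
is_crowd = np.array([0, 0, 1])

# Buggy: parsed as gt_classes > (0 & (is_crowd == 0)), i.e. gt_classes > 0,
# so the crowd filter silently disappears.
buggy = np.where((gt_classes) > 0 & (is_crowd == 0))[0]

# Intended: keep foreground, non-crowd boxes only.
fixed = np.where((gt_classes > 0) & (is_crowd == 0))[0]

print(buggy)  # [1 2] -- the crowd instance slips through
print(fixed)  # [1]
```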

ConnectionResetError: [Errno 104] Connection reset by peer AND ValueError: Expected input batch_size (423) to match target batch_size (465).

When I train the model on the COCO dataset, it runs normally at the beginning and the loss also decreases. But the following problems occur in the middle:

Traceback (most recent call last):
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in del
self._shutdown_workers()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 494, in Client
deliver_challenge(c, authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 394, in
upsnet_train()
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 193, in upsnet_train
output = train_model(data, label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/rcnn.py", line 190, in forward
cls_loss = self.cls_loss(cls_score, cls_label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 862, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1405, in nll_loss
.format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (511) to match target batch_size (512).

Could you please help me solve these problems?

Question about visualization when trying to test COCO dataset (train2014)

Thanks for your great work!
I trained the model with COCO train2017 & val2017 and pulled the latest code, but when I try to test on the COCO dataset (train2014) with vis_mask=True, I still only get an instance-segmentation-like figure.
Is there anything that needs special attention?
[image: COCO_train2014_000000000025.jpg]

OSError: cannot identify image file './data/coco/annotations/panoptic_train2017_semantic_trainid_stff/000000564031.png'

Hi, I get this problem when running on the COCO dataset. Why do I get this error? I used the two annotation packages downloaded from http://cocodataset.org/#download:
2017 Panoptic Train/Val annotations [821MB]
2017 Train/Val annotations [241MB]

It seems that the image exists in the train2017 folder, but does not exist in the panoptic_train2017_semantic_trainid_stff folder.

How can I fix this problem? Thanks.
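A small diagnostic sketch that may help localize the gap, comparing the two directories (both paths are assumptions based on the error message and the usual UPSNet data layout):

```python
import os

# Hypothetical sanity check: list training images whose semantic
# trainid PNG was never generated by init_coco.sh.
img_dir = './data/coco/images/train2017'  # assumed location
seg_dir = './data/coco/annotations/panoptic_train2017_semantic_trainid_stff'

imgs = {os.path.splitext(f)[0] for f in os.listdir(img_dir)}
segs = {os.path.splitext(f)[0] for f in os.listdir(seg_dir)}
missing = sorted(imgs - segs)
print(len(missing), 'images without a semantic trainid png')
print(missing[:10])
```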

ModuleNotFoundError: No module named 'upsnet.bbox.bbox'

Did I miss any step to get this error?

~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 60, in <module>
    from upsnet.dataset import *
  File "upsnet/../upsnet/dataset/__init__.py", line 1, in <module>
    from .cityscapes import Cityscapes
  File "upsnet/../upsnet/dataset/cityscapes.py", line 32, in <module>
    from upsnet.dataset.json_dataset import JsonDataset, extend_with_flipped_entries, filter_for_training, add_bbox_regression_targets
  File "upsnet/../upsnet/dataset/json_dataset.py", line 53, in <module>
    import upsnet.bbox.bbox_transform as box_utils
  File "upsnet/../upsnet/bbox/bbox_transform.py", line 15, in <module>
    from .bbox import bbox_overlaps as bbox_overlaps_cython
ModuleNotFoundError: No module named 'upsnet.bbox.bbox'

This is what I have in upsnet/bbox:

~/UPSNet_ROOT/upsnet/bbox$ ls
bbox.c                                bbox.pyx            bbox_transform.py  __init__.py  sample_rois.py
bbox.cpython-37m-x86_64-linux-gnu.so  bbox_regression.py  build              __pycache__  setup.py

Thanks.

How could we get PQ values?

No matter whether I run the training or the testing script, the only metrics I see are AP and AR. I wonder if we should calculate PQ values ourselves, or is there anything I missed? Thanks.

one GPU

Can I use one GPU with 12 GB of memory to train? Where does the code need to change?
Thank you very much!

The resnet101 performance

We tried to reproduce the ResNet-101 performance from your paper, but the final performance is a little worse than the reported result.

I wonder whether there are any problems in our settings?
Are you sure the code can reach a PQ of about 46?

ValueError: operands could not be broadcast together with shapes (427,640) (426,640)

Thank you for fixing the problem with panoptic_val2017_stff.json. When I use the fixed json file, I get this error: ZeroDivisionError: division by zero. I think there may be a problem at line 222 of /UPSNet-master/upsnet/dataset/base_dataset.py, so I replaced files = [item['file_name'] for item in pan_gt_json['images']] with files = [item['file_name'].replace('jpg', 'png') for item in pan_gt_json['images']], and that works. But another problem has arisen:

Traceback (most recent call last):
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 337, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
pq_stat += p.get()
File "/home/xxl/anaconda3/envs/xxl_36/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: operands could not be broadcast together with shapes (427,640) (426,640)

Am I doing something wrong somewhere? Could you please help me solve the problem?

A question about panoptic gt

matched_gt[gt_masks[[i], :, :] != 0] = i + self.num_seg_classes - self.num_inst_classes

In this line, you only use "gt_masks[[i], :, :] != 0" to judge whether a pixel belongs to the instance. But the picture is padded with 255, so I think there should be another condition: "& (gt_masks[[i], :, :] != 255)". Is what I pointed out right?

Another question is about the overlap relation. There may be a big table in the picture; if the table is the last instance, the whole panoptic gt will be covered by the corresponding id. This is not the panoptic gt we want, is it?

Looking forward to your reply.
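To make the first question concrete, here is a toy comparison of the current rule and the proposed one (the 255 padding value and the class counts are taken from the issue and are assumptions about the repo's conventions):

```python
import numpy as np

num_seg_classes, num_inst_classes = 133, 80  # assumed COCO-style values

gt_masks = np.array([[[0, 1, 1, 255]]])      # toy 1x1x4 instance mask; 255 = padding
matched_gt = np.zeros((1, 4), dtype=np.int64)

i = 0
# Current rule: padding pixels (255) are also claimed by instance i.
matched_gt[gt_masks[i] != 0] = i + num_seg_classes - num_inst_classes

# Proposed rule from the issue: exclude the padded region as well.
proposed = np.zeros((1, 4), dtype=np.int64)
proposed[(gt_masks[i] != 0) & (gt_masks[i] != 255)] = i + num_seg_classes - num_inst_classes

print(matched_gt)  # [[ 0 53 53 53]] -- padding mislabeled as instance 0
print(proposed)    # [[ 0 53 53  0]]
```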

Question about undefined symbol

Below is the error message I got. Not so sure about how to fix it. Could you help me with this? Thanks.

====
UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
from upsnet.operators.modules.deform_conv import DeformConv
File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
from upsnet.operators.functions.deform_conv import DeformConvFunction
File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

Question about visualization

(1) After setting vis_mask to true, I got the result below. However, I found all cars are recognized as trains... I am wondering if there is something wrong with my training.
[screenshot: Screen Shot 2019-04-16 at 9 48 28 PM]

(2) How do we get a panoptic segmentation result like the one below, instead of the one above (which looks like instance segmentation)?
[screenshot: Screen Shot 2019-04-16 at 9 51 32 PM]

RuntimeWarning: invalid value encountered in greater_equal

Has anyone encountered this error when running with one GPU?

upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]

The whole message is:

upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 426, in <module>
upsnet_train()
File "upsnet/upsnet_end2end_train.py", line 287, in upsnet_train
output = train_model(*batch)
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/models/resnet_upsnet.py", line 151, in forward
rois, _ = self.pyramid_proposal(rpn_cls_prob, rpn_bbox_pred, data['im_info'])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/operators/modules/pyramid_proposal.py", line 58, in forward
bbox_pred[3][[i], :, :, :], bbox_pred[4][[i], :, :, :], torch.from_numpy(im_info[i, :]))
File "upsnet/../upsnet/operators/functions/pyramid_proposal.py", line 168, in forward
keep = nms(np.hstack((proposals, scores)).astype(np.float32))
File "upsnet/../upsnet/nms/nms.py", line 45, in _nms
return gpu_nms(dets, thresh, device_id)
File "gpu_nms.pyx", line 36, in gpu_nms.gpu_nms
IndexError: Out of bounds on buffer access (axis 0)

binary_op(): expected both inputs to be on same device

Hi, sorry to interrupt you. I am using your 4-GPU code, but on GPUs 4, 5, 6, and 7 of my machine. However, when I try to resume my model, there is an error that says:

Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 418, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 300, in upsnet_train
    optimizer.step(lr)
  File "upsnet/../lib/nn/optimizer.py", line 98, in step
    buf.mul_(momentum).add_(group['lr'] * lr, d_p)
RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:0 and input b is on cuda:4

Can you tell me what to do? I really appreciate your help.

Training is slow.

Hello, I am trying to reproduce the results without Horovod.
I use 4 Tesla K80 GPUs (12 GB) and train the net with "upsnet_resnet50_coco_4gpu.yaml", but I find that training may take more than 10 days.
Do you have any advice for speeding up the training?
Thanks.

A question about channel selecting in formula Z_unknown

Thanks for your good project; I have a question about a detail in your code. The released code computes void_logits = torch.max(fcn_output['fcn_score'][:, (config.dataset.num_classes - 1):, ...], dim=1, keepdim=True)[0] - torch.max(seg_inst_logits, dim=1, keepdim=True)[0], which does not seem to correspond to the paper's formula Z_unknown = max(X_thing) - max(X_mask).
Shouldn't the first max be taken over channels 52:132 of fcn_output['fcn_score'] ([:, (config.dataset.num_seg_classes - config.dataset.num_classes + 1):, ...]), which represent the things? Sorry for my English; I hope the description makes the question clear.
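For concreteness, a small sketch contrasting the two slicings, using assumed COCO-style counts (80 thing classes and 53 stuff channels, stuff first; num_classes = 81 including background, num_seg_classes = 133 — these values are assumptions, not taken from the repo config):

```python
import torch

num_classes = 81        # assumed: 80 thing classes + background
num_seg_classes = 133   # assumed: 53 stuff channels followed by 80 thing channels

fcn_score = torch.randn(1, num_seg_classes, 4, 4)  # toy semantic logits

# Released code: starts at channel num_classes - 1 = 80,
# i.e. only the last 53 channels.
released = fcn_score[:, (num_classes - 1):, ...]

# Slicing the issue proposes: starts at channel
# num_seg_classes - num_classes + 1 = 53, i.e. the 80 thing channels.
proposed = fcn_score[:, (num_seg_classes - num_classes + 1):, ...]

print(released.shape[1], proposed.shape[1])  # 53 vs 80
```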

RuntimeWarning: invalid value encountered in sqrt

Hi,

I ran into this error while running (upsnet_resnet50_coco_1gpu.yaml is just a GPU-count change based on 4gpu.yaml):

python -u upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml
  • running env: pytorch 1.0
  • version: 96b7b5172b7b76446f637f4922b7a2054e46703b and this PR change (#35)
2019-05-11 23:08:29,257 | callback.py | line 40 : Batch [1560]  Speed: 2.28 samples/sec Train-rpn_cls_loss=0.202601, rpn_bbox_loss=0.119188, rcnn_accuracy=0.918098, cls_loss=0.512433, bbox_loss=0.178919, mask_loss=0.637334, fcn_loss=3.486774, fcn_roi_loss=4.059261, panoptic_accuracy=0.267040, panoptic_loss=2.879008,
2019-05-11 23:08:38,512 | callback.py | line 40 : Batch [1580]  Speed: 2.16 samples/sec Train-rpn_cls_loss=0.204118, rpn_bbox_loss=0.121994, rcnn_accuracy=0.918320, cls_loss=0.511436, bbox_loss=0.178261, mask_loss=0.637755, fcn_loss=3.488841, fcn_roi_loss=4.060041, panoptic_accuracy=0.265982, panoptic_loss=2.881260,
2019-05-11 23:08:47,145 | callback.py | line 40 : Batch [1600]  Speed: 2.32 samples/sec Train-rpn_cls_loss=0.208637, rpn_bbox_loss=0.125312, rcnn_accuracy=0.918627, cls_loss=0.510648, bbox_loss=0.177416, mask_loss=0.637829, fcn_loss=3.493248, fcn_roi_loss=4.063709, panoptic_accuracy=0.264609, panoptic_loss=2.885051,
2019-05-11 23:08:55,470 | callback.py | line 40 : Batch [1620]  Speed: 2.40 samples/sec Train-rpn_cls_loss=0.210174, rpn_bbox_loss=0.125260, rcnn_accuracy=0.918797, cls_loss=0.510753, bbox_loss=0.176945, mask_loss=0.637923, fcn_loss=3.496151, fcn_roi_loss=4.067450, panoptic_accuracy=0.264234, panoptic_loss=2.887030,
upsnet/../upsnet/operators/modules/fpn_roi_align.py:38: RuntimeWarning: invalid value encountered in sqrt
  feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 403, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 269, in upsnet_train
    output = train_model(*batch)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
    cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../upsnet/models/rcnn.py", line 190, in forward
    cls_loss = self.cls_loss(cls_score, cls_label)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 942, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (510) to match target batch_size (512).
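As a purely defensive sketch (a workaround guess, not the repository's actual fix), the level-assignment line from fpn_roi_align.py could clamp the box area so NaN or negative widths/heights never reach np.sqrt:

```python
import numpy as np

def assign_fpn_level(w, h):
    """FPN level heuristic in the style of fpn_roi_align.py, with the
    area clamped so degenerate/NaN boxes cannot feed invalid values
    into the sqrt (they fall through to level 0 instead).
    """
    area = np.maximum(np.nan_to_num(w * h), 0.0)
    return np.clip(np.floor(2 + np.log2(np.sqrt(area) / 224 + 1e-6)), 0, 3)

print(assign_fpn_level(np.array([224.0, -5.0, np.nan]),
                       np.array([224.0, 10.0, 3.0])))  # [2. 0. 0.]
```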

Code not executing

Hi,

After following all the steps mentioned and setting up the environment with Python 3.6 and PyTorch 0.4.1, the code has many bugs, making it difficult to reproduce the results. Are there other changes that need to be made, or does following these steps directly work? I am trying to train the model on COCO using the single-GPU config file.

About your coco id and labels

I noticed that you sort the categories as 0-52: stuff, 53-132: things when creating panoptic_coco_categories_stff.json. Why did you make this change?

So when you print out the result matrix in this format:

IDX | PQ SQ RQ IoU TP FP FN

does IDX 0 mean the result for the first thing class or the first stuff class?
Similarly, how about other metrics like the mean and per-category AP? Does the first line represent the first thing class or the first stuff class?

I really appreciate your reply.
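One way to answer the IDX question empirically is to print the category order from the generated file itself; a small sketch (the file path below is an assumption based on the issue):

```python
import json

# Hypothetical check: print the category order actually stored in the
# generated file, so each IDX in the result matrix can be mapped back
# to a class name and a thing/stuff flag.
path = './data/coco/annotations/panoptic_coco_categories_stff.json'  # assumed location
with open(path) as f:
    categories = json.load(f)
for idx, cat in enumerate(categories):
    kind = 'thing' if cat.get('isthing') else 'stuff'
    print(idx, cat.get('name'), kind)
```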

ZeroDivisionError: division by zero

2019-04-10 16:16:39,900 | upsnet_end2end_test.py | line 303: unified pano result:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_test.py", line 312, in
upsnet_test()
File "upsnet/upsnet_end2end_test.py", line 304, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "upsnet/../upsnet/dataset/base_dataset.py", line 328, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "upsnet/../upsnet/dataset/base_dataset.py", line 296, in pq_compute
results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero

Hi, when I evaluated the trained model, I found this error.
