uber-research / upsnet

UPSNet: A Unified Panoptic Segmentation Network

License: Other

Python 84.38% C++ 2.80% Cuda 11.98% Shell 0.84%

panoptic-segmentation scene-parsing instance-segmentation cvpr2019 computer-vision deep-learning

upsnet's Issues

ZeroDivisionError: division by zero

When doing an evaluation on the test set, I got the following error. I don't have any clue about it.

2019-04-18 03:56:17,288 | upsnet_end2end_test.py | line 307: unified pano result:
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_test.py", line 316, in <module>
    upsnet_test()
  File "upsnet/upsnet_end2end_test.py", line 308, in upsnet_test
    test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
  File "upsnet/../upsnet/dataset/base_dataset.py", line 333, in evaluate_panoptic
    results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
  File "upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
    results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
  File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
    return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero

However, the panoptic segmentation results can be generated successfully, as shown below.

So how is it possible that I run into the case n = 0? Any ideas would be appreciated. Thanks.
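One way n can stay at zero in an averaging step like pq_average is that no category accumulates any TP, FP, or FN, for example when predictions and ground truth share no matching file names or category ids. The sketch below is not the repository's code. It is a simplified stand-in for that averaging step (the stats layout and the name safe_pq_average are assumptions made for illustration) that returns zeros instead of dividing by zero, which makes the empty case visible rather than fatal.

```python
def safe_pq_average(stats):
    """Average PQ/SQ/RQ over categories, skipping categories with no samples.

    `stats` maps a category id to a dict with keys 'iou', 'tp', 'fp', 'fn'
    (a hypothetical layout used only for this sketch).
    """
    pq = sq = rq = 0.0
    n = 0
    for cat in stats.values():
        tp, fp, fn, iou = cat["tp"], cat["fp"], cat["fn"], cat["iou"]
        if tp + fp + fn == 0:
            # Category never appeared in ground truth or predictions.
            continue
        n += 1
        denom = tp + 0.5 * fp + 0.5 * fn
        pq += iou / denom
        sq += iou / tp if tp else 0.0
        rq += tp / denom
    if n == 0:
        # Nothing was matched at all: report zeros instead of dividing by zero.
        return {"pq": 0.0, "sq": 0.0, "rq": 0.0, "n": 0}
    return {"pq": pq / n, "sq": sq / n, "rq": rq / n, "n": n}


empty = safe_pq_average({1: {"iou": 0.0, "tp": 0, "fp": 0, "fn": 0}})
one = safe_pq_average({1: {"iou": 1.6, "tp": 2, "fp": 1, "fn": 1}})
```

If the returned n is 0, the inputs are worth inspecting before trusting any metric: it suggests that the evaluation never paired a prediction with a ground-truth segment.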

ConnectionResetError: [Errno 104] Connection reset by peer AND ValueError: Expected input batch_size (423) to match target batch_size (465).

When I train the model on the COCO dataset, it runs normally at the beginning and the loss decreases. But the following problems occur partway through training:

Traceback (most recent call last):
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
self._shutdown_workers()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 494, in Client
deliver_challenge(c, authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 394, in <module>
upsnet_train()
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 193, in upsnet_train
output = train_model(data, label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/rcnn.py", line 190, in forward
cls_loss = self.cls_loss(cls_score, cls_label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 862, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1405, in nll_loss
.format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (511) to match target batch_size (512).

Could you please help me solve these problems?

How could we get PQ values?

Whether I run the training or the testing script, the only metrics I see are AP and AR. Should we calculate the PQ values ourselves, or is there anything I missed? Thanks.

one GPU

Can I use one GPU with 12G memory to train? Where does the code need to change?
Thank you very much!

ValueError: operands could not be broadcast together with shapes (427,640) (426,640)

Thank you for fixing the problem with 'panoptic_val2017_stff.json'. When I use the fixed json file, I get this error: ZeroDivisionError: division by zero. I think there may be a problem in '/UPSNet-master/upsnet/dataset/base_dataset.py', line 222. After I replaced "files = [item['file_name'] for item in pan_gt_json['images']]" with "files = [item['file_name'].replace('jpg', 'png') for item in pan_gt_json['images']]", it works. But another problem has arisen:

Traceback (most recent call last):
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in <module>
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 337, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
pq_stat += p.get()
File "/home/xxl/anaconda3/envs/xxl_36/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: operands could not be broadcast together with shapes (427,640) (426,640)

Am I doing something wrong? Could you please help me solve the problem?
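A side note on the file-name workaround quoted in this issue: str.replace('jpg', 'png') rewrites every occurrence of 'jpg' in the name, not only the extension. The sketch below shows a variant that touches only the extension. The helper name to_png_name and the sample file names are made up for illustration, and this does not address the shape mismatch in the traceback.

```python
import os


def to_png_name(file_name):
    """Swap only the file extension for '.png', leaving the stem untouched."""
    stem, _ext = os.path.splitext(file_name)
    return stem + ".png"


# Hypothetical entries shaped like the 'images' list of a COCO-style json.
images = [{"file_name": "000000000139.jpg"}, {"file_name": "jpg_sample_01.jpg"}]
files = [to_png_name(item["file_name"]) for item in images]

# For comparison: plain replace also rewrites the 'jpg' inside the stem.
naive = "jpg_sample_01.jpg".replace("jpg", "png")
```

For the usual COCO numeric file names both approaches give the same result; the difference only matters for names that contain 'jpg' elsewhere.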

ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'

Error message:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 23, in <module>
from upsnet.models.fpn import FPN
File "upsnet/../upsnet/models/fpn.py", line 23, in <module>
from upsnet.operators.modules.distbatchnorm import BatchNorm2d
ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'

question about generating panoptic-gt

UPSNet/upsnet/operators/modules/mask_matching.py

Line 61 in 9191a59

class PanopticGTGenerate(nn.Module):

Where do you use this function to generate ground truth panoptic logits?

BTW, is it possible to use COCO panoptic dataset annotations to generate them?

Training is slow.

Hello, I am trying to reproduce the results without horovod.
I use 4 Tesla K80 gpus (12GB) and train the net with "upsnet_resnet50_coco_4gpu.yaml" but I find that it may take more than 10 days for training.
Do you have any advice for speeding up the training?
Thanks.

About shallow copy or deep copy in init_coco.py

Thanks for your good project. I have a question about your code. The line pano_json_stff = pano_json.copy() in the released code uses dict.copy, which is a shallow copy: only the top-level dict is copied, while nested objects are still shared with the original. This seems to make the categories repeated and wrong in panoptic_coco_categories_stff.json. Could you tell me whether the code is correct?
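The difference the question describes can be reproduced in a few lines of plain Python. This is a minimal illustration with a made-up pano_json dictionary, not the contents of init_coco.py: dict.copy creates a new outer dict but keeps pointing at the same nested list, while copy.deepcopy duplicates the nested objects as well.

```python
import copy

# Made-up stand-in for the loaded annotation json.
pano_json = {"categories": [{"id": 1, "name": "person"}]}

shallow = pano_json.copy()        # new outer dict, same inner list object
deep = copy.deepcopy(pano_json)   # fully independent copy

# Mutating the nested list through the shallow copy also changes the original.
shallow["categories"].append({"id": 2, "name": "bicycle"})

shared = shallow["categories"] is pano_json["categories"]
original_len = len(pano_json["categories"])   # grew via the shallow copy
deep_len = len(deep["categories"])            # unaffected
```

Whether this sharing actually causes the repeated categories depends on whether the script mutates the nested list in place or rebinds the key to a new list; rebinding a key on the copy does not affect the original.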

Question about undefined symbol

Below is the error message I got. Not so sure about how to fix it. Could you help me with this? Thanks.

====
UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
from upsnet.operators.modules.deform_conv import DeformConv
File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
from upsnet.operators.functions.deform_conv import DeformConvFunction
File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

An error about logging

upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 52, in <module>
logger, final_output_path = create_logger(config.output_path, args.cfg, config.dataset.image_set)
File "upsnet/../lib/utils/logging.py", line 38, in create_logger
logging.basicConfig(filename=os.path.join(final_output_path, log_file), format=head)
AttributeError: 'module' object has no attribute 'basicConfig'

Process finished with exit code 0
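An AttributeError of this kind often means that the name logging resolved to something other than the standard library module. The traceback passes through a file named lib/utils/logging.py, and the "'module' object" wording is typical of Python 2, where an import inside a package can pick up a sibling file of the same name. That is an assumption about the cause, not something the traceback proves. A quick diagnostic sketch (describe_module is a hypothetical helper) is to print where the module was loaded from:

```python
import logging


def describe_module(mod):
    """Return where a module was loaded from and whether it has basicConfig."""
    path = getattr(mod, "__file__", None)
    return path, hasattr(mod, "basicConfig")


path, has_basic_config = describe_module(logging)
print(path, has_basic_config)
```

If the printed path points into the project tree instead of the Python installation, the local file is shadowing the standard library. Under Python 2, adding "from __future__ import absolute_import" at the top of the importing file is one common way to avoid that.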

Is the bounding box regression loss different from Mask RCNN?

UPSNet/upsnet/models/rcnn.py

Line 179 in 96b7b51

 def smooth_l1_loss(self, bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, sigma=1.0): 

Your code here just uses the coordinates to calculate the loss, but the original bounding box regression in Mask R-CNN uses ground-truth targets like this: tx=(Gx−Px)/Pw, ty=(Gy−Py)/Ph, tw=log(Gw/Pw), th=log(Gh/Ph).

Does it matter?
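For reference, the target encoding quoted in the question can be written out as follows. This is a minimal sketch of that standard parameterization using (center x, center y, width, height) boxes; the function names are chosen for illustration. It says nothing about where, or whether, UPSNet applies the encoding before bbox_targets reach smooth_l1_loss, which this thread does not settle.

```python
import math


def bbox_transform(proposal, gt):
    """Encode a ground-truth box relative to a proposal box.

    Boxes are (center_x, center_y, width, height) tuples.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    tx = (gx - px) / pw
    ty = (gy - py) / ph
    tw = math.log(gw / pw)
    th = math.log(gh / ph)
    return tx, ty, tw, th


def bbox_transform_inv(proposal, deltas):
    """Decode (tx, ty, tw, th) back into a box; inverse of bbox_transform."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
gt = (55.0, 46.0, 40.0, 20.0)
targets = bbox_transform(proposal, gt)
decoded = bbox_transform_inv(proposal, targets)
```

A smooth L1 loss applied to already encoded targets and one applied to raw coordinates look identical at the call site, so checking how bbox_targets is produced upstream is the way to tell the two apart.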

Gradient w.r.t input is NoneType

I wanted to observe the gradient w.r.t inputs (that is, the input image here).
So, I tried to print data['data'].grad after loss.backward() below.
https://github.com/uber-research/UPSNet/blob/master/upsnet/upsnet_end2end_train.py#L216
But I got an error saying that data['data'].grad is NoneType.
Is there anything I misunderstood or missed? What is the correct way to dump the gradient w.r.t. the inputs? Any ideas would be appreciated. Thanks.

ImportError: cannot import name accumulate

[Python 2.7 / PyTorch 1.0]
sh init.sh
error: ImportError: cannot import name accumulate

RuntimeWarning: invalid value encountered in greater_equal

Has anyone encountered this error when running with one GPU?

upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]

The whole message is:

upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 426, in <module>
upsnet_train()
File "upsnet/upsnet_end2end_train.py", line 287, in upsnet_train
output = train_model(*batch)
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/models/resnet_upsnet.py", line 151, in forward
rois, _ = self.pyramid_proposal(rpn_cls_prob, rpn_bbox_pred, data['im_info'])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/operators/modules/pyramid_proposal.py", line 58, in forward
bbox_pred[3][[i], :, :, :], bbox_pred[4][[i], :, :, :], torch.from_numpy(im_info[i, :]))
File "upsnet/../upsnet/operators/functions/pyramid_proposal.py", line 168, in forward
keep = nms(np.hstack((proposals, scores)).astype(np.float32))
File "upsnet/../upsnet/nms/nms.py", line 45, in _nms
return gpu_nms(dets, thresh, device_id)
File "gpu_nms.pyx", line 36, in gpu_nms.gpu_nms
IndexError: Out of bounds on buffer access (axis 0)

A question about panoptic gt

UPSNet/upsnet/operators/modules/mask_matching.py

Line 52 in 2ced987

 matched_gt[gt_masks[[i], :, :] != 0] = i + self.num_seg_classes - self.num_inst_classes 

In this line, you only use "gt_masks[[i], :, :] != 0" to decide whether a pixel belongs to the instance. But the picture is padded with 255. I think there should be another condition: "& gt_masks[[i], :, :] != 255". Is what I pointed out right?

Another question is about overlaps. There may be a big table in the picture. If the table is the last instance, then the whole panoptic gt will be covered by the corresponding "id". This is not the panoptic gt we want, is it?

Looking forward to your reply.
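Both points in this question can be made concrete on a toy array. The sketch below is not the repository's mask_matching code: paint_instances, first_id (standing in for num_seg_classes - num_inst_classes), and the tiny masks are assumptions for illustration. It shows a validity test that excludes both background (0) and padding (255), and it shows that when instances are painted in order, a later instance overwrites an earlier one wherever they overlap.

```python
import numpy as np


def paint_instances(gt_masks, first_id, ignore_value=255):
    """Paint instance ids into one map, skipping background (0) and padding.

    gt_masks: integer array of shape (num_instances, H, W).
    Later instances overwrite earlier ones where they overlap.
    """
    matched = np.zeros(gt_masks.shape[1:], dtype=np.int64)
    for i, mask in enumerate(gt_masks):
        # Each comparison needs its own parentheses, because & binds
        # more tightly than != in Python.
        valid = (mask != 0) & (mask != ignore_value)
        matched[valid] = first_id + i
    return matched


gt_masks = np.array([
    [[1, 1, 0, 255]],   # instance 0; the last column is padding
    [[0, 1, 1, 255]],   # instance 1 overlaps instance 0 in column 1
])
result = paint_instances(gt_masks, first_id=53)
```

In this toy case the overlapping pixel ends up with the id of instance 1 and the padded pixel stays at 0. If overwriting by a large last instance is a concern, the painting order (for example largest area first) is the knob that decides which instance wins.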

Code not executing

Hi,

After following all the steps mentioned and setting up the environment with Python 3.6 and PyTorch 0.4.1, the code has many bugs that make it difficult to reproduce the results. Are there other changes that need to be made, or should following these steps work directly? I am trying to train the model on COCO using the single GPU config file.

A question about channel selecting in formula Z_unknown

Thanks for your good project. I have a question about a detail in your code. The released line void_logits = torch.max(fcn_output['fcn_score'][:, (config.dataset.num_classes - 1):, ...], dim=1, keepdim=True)[0] - torch.max(seg_inst_logits, dim=1, keepdim=True)[0] does not seem to correspond to the paper's formula Z_unknown = max (X_thing) - max (X_mask).
Shouldn't the first max be taken over channels 52:132 of fcn_output['fcn_score']
([:, (config.dataset.num_seg_classes-config.dataset.num_classes+1):, ...]), which represent the things? I am sorry for my English; I hope the question is clear.
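The index arithmetic behind the two slices can be checked without running the model. The numbers below are assumptions for COCO: 133 segmentation channels ordered as 53 stuff channels followed by 80 thing channels, and num_classes = 81 (80 thing classes plus a background class). The helper thing_channel_slice is hypothetical.

```python
def thing_channel_slice(num_seg_classes, num_things):
    """Return (start, stop) of the thing channels when stuff channels come first."""
    return num_seg_classes - num_things, num_seg_classes


num_seg_classes = 133   # assumption: 53 stuff + 80 thing channels
num_classes = 81        # assumption: 80 thing classes plus one background class

# Start index used in the quoted line of code.
code_start = num_classes - 1
# Start index proposed in the question.
proposed_start = num_seg_classes - num_classes + 1
# Where the thing channels begin under the assumed layout.
thing_start, thing_stop = thing_channel_slice(num_seg_classes, num_classes - 1)

print(code_start, proposed_start, (thing_start, thing_stop))
```

Under these assumptions the two start indices differ, so the two slices select different channel ranges. Whether the configuration values in the repository match the numbers assumed here needs to be checked against the actual config file.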

Bug

UPSNet/upsnet/models/resnet_upsnet.py

Line 285 in 96b7b51

gt_inds = np.where((roidb['gt_classes']) > 0 & (roidb['is_crowd'] == 0))[0]

I think it should be
gt_inds = np.where((roidb['gt_classes'] > 0) & (roidb['is_crowd'] == 0))[0]

But when I modify it like this, another problem occurs. I think it is caused by crowd instances.

I think the crowd instances should be included here when constructing the pan_gt.
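The effect of the misplaced parenthesis reported in this issue can be seen on a small example. In Python, & binds more tightly than >, so the original expression compares gt_classes against 0 & (is_crowd == 0), which is an array of zeros, and the crowd condition is silently dropped. The arrays below are made up for illustration.

```python
import numpy as np

gt_classes = np.array([3, 0, 7, 5])
is_crowd = np.array([0, 0, 1, 0])

# Parsed as gt_classes > (0 & (is_crowd == 0)), i.e. gt_classes > 0 only.
buggy = np.where((gt_classes) > 0 & (is_crowd == 0))[0]

# Parenthesizing each comparison applies both conditions.
fixed = np.where((gt_classes > 0) & (is_crowd == 0))[0]

print(buggy, fixed)
```

With the corrected expression the crowd entry (index 2 here) is filtered out, so any later code that indexes per-instance arrays must use the same filtered indices to stay aligned.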

About your coco id and labels

I noticed that you sort the categories as 0-52: stuff, 53-132: things when creating panoptic_coco_categories_stff.json. Why did you make this change?

So when you print out the result matrix in this format:

IDX | PQ SQ RQ IoU TP FP FN

Does IDX 0 mean the result for the first thing class or the first stuff class?
Similarly, what about other metrics like the mean and per-category AP? Does the first line represent the first thing class or the first stuff class?

I really appreciate your reply.

Question about visualization

(1) After setting vis_mask to true, I got the result below. However, I found that all cars are recognized as trains. I wonder if there is something wrong with my training.

(2) How do we get the result of panoptic segmentation as below, instead of the one above (like instance segmentation)?

The resnet101 performance

We tried to reproduce the ResNet-101 performance reported in your paper, but the final performance is a little worse than the reported result.

I wonder whether there are any problems in our settings.
Can you confirm that the code can reach a PQ of about 46?

Is your mask_loss the same as MaskRCNN?

UPSNet/upsnet/models/rcnn.py

Line 173 in ba524d5

def mask_loss(self, input, target, weight):

Sorry, but I don't understand why you designed your loss like this. Why doesn't the mask score need to go through a sigmoid function?
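One common reason a mask score is not passed through an explicit sigmoid is that the loss consumes raw scores (logits) and folds the sigmoid into the cross-entropy formula for numerical stability. Whether mask_loss in this repository works that way is not confirmed in this thread. The pure-Python sketch below only shows that the two formulations give the same value; the function names are chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_probability(p, target):
    """Binary cross-entropy on a probability that already went through sigmoid."""
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def bce_with_logits(logit, target):
    """Same loss computed directly from the raw score (numerically stable form)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


pairs = [(2.0, 1.0), (-1.5, 0.0), (0.3, 1.0)]
explicit = [bce_from_probability(sigmoid(x), z) for x, z in pairs]
fused = [bce_with_logits(x, z) for x, z in pairs]
```

If a loss is of the fused kind, applying a sigmoid beforehand would apply it twice, which is why the raw score is fed in directly.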

Results on ADE20k?

Thanks for your great work.
I found that you have provided code for ADE20K, but I can find results neither in the paper nor here. Have you tested your model on the ADE20K dataset? And will you share the panoptic results with us in the future?
Thanks.

KeyError

I got the following error. Wondering if I need to delete /gtInstances.json first. Thanks.

Typo in init_cityscapes.sh?

After downloading cityscapes database, I found their file names have no "Train" in
"cp gtFine///*labelTrainIds.png labels"

So, I removed "Train" as below
"cp gtFine///*labelIds.png labels"

Am I right? Thanks.

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument

Hi, I got this notification when running the command
python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_cityscapes_4gpu.yaml
But the model still runs normally. Is this a problem, and could it affect the final results?
Thanks

Bug

UPSNet/upsnet/models/resnet_upsnet.py

Line 287 in 9191a59

cls_idx = roidb['gt_classes']

Line 287 should be: cls_idx = roidb['gt_classes'][gt_inds]

Where can I download the pretrained model?

Where can I download a pretrained model like "resnet-50-caffe.pth"?

Is it OK if I use GCC 4.8.5?

Why do I get the error 'invalid argument' in deformable_col2im?

Why do I get the error 'invalid argument' in deformable_col2im? Even though the error appears, training still reports a loss. Is that OK?

What's the performance compared with Mask RCNN on object detection?

Hi, I find that UPSNet mainly adds a semantic segmentation head to Mask RCNN. I wonder what the performance of UPSNet is on object detection (or instance segmentation). Can I use it to improve my object detection mAP? Sorry, but I didn't find this discussed in your paper.

I would appreciate it if you could give me some advice. Thanks.

OSError: cannot identify image file './data/coco/annotations/panoptic_train2017_semantic_trainid_stff/000000564031.png'

Hi, I get this problem when running on the COCO dataset. Why did I get this problem? I use the two annotation files downloaded from http://cocodataset.org/#download:
2017 Panoptic Train/Val annotations [821MB]
2017 Train/Val annotations [241MB]

It seems that the image exists in the train2017 folder but does not exist in the panoptic_train2017_semantic_trainid_stff folder.

How can I fix this problem? Thanks.

Extension horovod.torch has not been built.

Hello, after I run:
python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml

I encountered that:
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

What should I do? Thanks for answering.

Traceback (most recent call last):
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 24, in <module>
__file__, 'mpi_lib_v2')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 44, in <module>
import horovod.torch as hvd
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 27, in <module>
__file__, 'mpi_lib', '_mpi_lib')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))

init_coco.sh bug

It seems that there is a bug in init_coco.sh.
maybe PYTHONPATH=$(pwd)/lib/dataset_devkit/panopticapi:$PYTHONPATH should change to
PYTHONPATH=$(pwd)/lib/dataset_devkit:$PYTHONPATH

OSError: [Errno 12] Cannot allocate memory

I got this error: OSError: [Errno 12] Cannot allocate memory as below.
I am wondering if this is because my current GPU memory is too small, which is 13 GB now.

No module named BatchNorm2d in "upsnet.operators.modules.distbatchnorm"?

I have not seen any module named BatchNorm2d in upsnet.operators.modules.distbatchnorm.

KeyError: 'color'

When I test on the COCO dataset, the following error occurred:

File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in <module>
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 330, in evaluate_panoptic
gt_pans, gt_json, categories, color_gererator = get_gt()
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 247, in get_gt
color_gererator = IdGenerator(categories)
File "/UPSNet-master/upsnet/../lib/dataset_devkit/panopticapi/utils.py", line 40, in __init__
self.taken_colors.add(tuple(category['color']))
KeyError: 'color'

I think my annotation file 'panoptic_val2017.json' has some problems. The categories don't have a color field. I downloaded the annotations from http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
Is there a problem with the annotation file I downloaded, or am I doing something wrong?
If the annotation file has problems, could you please offer me a correct download link?

How to track the change of learning rate?

Sorry but I am a little bit confused by the learning rate.

UPSNet/upsnet/upsnet_end2end_train.py

Line 119 in 96b7b51

 optimizer = SGD(params_lr, lr=1, momentum=config.train.momentum, weight_decay=config.train.wd) 

UPSNet/upsnet/upsnet_end2end_train.py

Line 200 in 96b7b51

lr = adjust_learning_rate(optimizer, curr_iter, config)

What is the relation between these two learning rates? And how can I get the real learning rate? If I print optimizer.param_groups[0]["lr"], I always get 1.
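A traceback elsewhere on this page shows the optimizer being called as optimizer.step(lr) and the update using group['lr'] * lr. Read that way, the lr stored in each parameter group acts as a per-group multiplier, and the value returned by adjust_learning_rate is the scheduled rate, so the rate actually applied is their product. The sketch below illustrates that relation only. The schedule (linear warmup followed by step decay), its parameters, and the helper names are hypothetical, not the repository's adjust_learning_rate.

```python
def scheduled_lr(curr_iter, base_lr, warmup_iters, decay_steps, gamma=0.1):
    """Hypothetical schedule: linear warmup, then step decay."""
    if curr_iter < warmup_iters:
        return base_lr * (curr_iter + 1) / warmup_iters
    passed = sum(1 for step in decay_steps if curr_iter >= step)
    return base_lr * gamma ** passed


def effective_lr(group_lr, step_lr):
    """Rate applied to a parameter group: group multiplier times scheduled rate."""
    return group_lr * step_lr


# Assumed per-group multipliers, e.g. 1x for weights and 2x for biases.
param_groups = [{"lr": 1.0}, {"lr": 2.0}]
step_lr = scheduled_lr(curr_iter=1000, base_lr=0.01, warmup_iters=500, decay_steps=[5000])
rates = [effective_lr(group["lr"], step_lr) for group in param_groups]
print(step_lr, rates)
```

Under this reading, printing optimizer.param_groups[0]["lr"] shows only the multiplier, and the value returned by adjust_learning_rate is the one worth logging.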

ModuleNotFoundError: No module named 'upsnet.bbox.bbox'

Did I miss any step to get this error?

~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 60, in <module>
    from upsnet.dataset import *
  File "upsnet/../upsnet/dataset/__init__.py", line 1, in <module>
    from .cityscapes import Cityscapes
  File "upsnet/../upsnet/dataset/cityscapes.py", line 32, in <module>
    from upsnet.dataset.json_dataset import JsonDataset, extend_with_flipped_entries, filter_for_training, add_bbox_regression_targets
  File "upsnet/../upsnet/dataset/json_dataset.py", line 53, in <module>
    import upsnet.bbox.bbox_transform as box_utils
  File "upsnet/../upsnet/bbox/bbox_transform.py", line 15, in <module>
    from .bbox import bbox_overlaps as bbox_overlaps_cython
ModuleNotFoundError: No module named 'upsnet.bbox.bbox'

This is what I have in the directory upsnet/bbox:

~/UPSNet_ROOT/upsnet/bbox$ ls
bbox.c                                bbox.pyx            bbox_transform.py  __init__.py  sample_rois.py
bbox.cpython-37m-x86_64-linux-gnu.so  bbox_regression.py  build              __pycache__  setup.py

Thanks.
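The directory listing shows bbox.cpython-37m-x86_64-linux-gnu.so, which is an extension compiled for CPython 3.7. If the training script runs under a different Python version or build, the interpreter will not accept that file name and the import fails with ModuleNotFoundError. This is an assumption about the cause, since the issue does not state which interpreter ran the script. A small diagnostic sketch (the helper names are hypothetical) lists the file names the running interpreter would accept:

```python
import importlib.machinery


def importable_filenames(module_name):
    """File names this interpreter would accept for a compiled extension module."""
    return [module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


def built_for_this_interpreter(file_name, module_name):
    """True if file_name is one of the names the interpreter would import."""
    return file_name in importable_filenames(module_name)


candidates = importable_filenames("bbox")
print(candidates)
print(built_for_this_interpreter("bbox.cpython-37m-x86_64-linux-gnu.so", "bbox"))
```

If the second line prints False when run with the same python command used for training, rebuilding the extension with that interpreter is the likely fix.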

How to reproduce the result reported in the paper?

I have trained the network with the configuration file of upsnet_resnet50_cityscapes_4gpu.yaml and 4 2080Ti gpus. The configuration file is not modified at all.

After testing the model with test_iteration of 48000, the results show that mIoU is 75.054%, AP_box is 38.1%, AP_mask is 32.4%, and PQ:SQ:RQ is 58.7% : 79.6% : 72.4%. But the results reported in the paper are 75.2%, 39.1%, 33.3%, and 59.3% : 79.7% : 73%, respectively.

Is there any way to achieve the performance reported in the paper?

Question about visualization when trying to test COCO dataset (train2014)

Thanks for your great work!
I trained the model with COCO train2017 and val2017 and pulled the latest code, but when I try to test on the COCO dataset (train2014) with vis_mask=True, I still only get an instance-segmentation-like figure.
Is there anything that needs special attention?

Why is your bounding box regression loss so small?

I found that the bounding box regression loss at the start of training with your code is really small, on the order of ~0.1. Could you give me some explanation?

Will you further apply it to resnet101?

Is deformable conv used in the setting of r50 coco?

binary_op(): expected both inputs to be on same device

Hi, sorry to bother you. I use your code with 4 GPUs, but I use GPUs 4, 5, 6, and 7 on my machine. However, when I try to resume my model, there is an error that says:

Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 418, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 300, in upsnet_train
    optimizer.step(lr)
  File "upsnet/../lib/nn/optimizer.py", line 98, in step
    buf.mul_(momentum).add_(group['lr'] * lr, d_p)
RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:0 and input b is on cuda:4

Can you tell me what to do? I really appreciate your help.

ZeroDivisionError: division by zero

2019-04-10 16:16:39,900 | upsnet_end2end_test.py | line 303: unified pano result:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_test.py", line 312, in <module>
upsnet_test()
File "upsnet/upsnet_end2end_test.py", line 304, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "upsnet/../upsnet/dataset/base_dataset.py", line 328, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "upsnet/../upsnet/dataset/base_dataset.py", line 296, in pq_compute
results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero

Hi, when I evaluated the trained model, I found this error.

Cannot find reference 'deform_conv_cuda'

Thanks for your great work.
I followed the installation steps and the installation was successful, but when I begin to train the model, it tells me "Cannot find reference 'deform_conv_cuda'", although the file already exists. I don't know how to fix it. Could you help me with this? Thanks.

RuntimeWarning: invalid value encountered in sqrt

Hi,

I ran into this error while running the command below (upsnet_resnet50_coco_1gpu.yaml only changes the number of GPUs relative to the 4gpu.yaml):

python -u upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml

running env: pytorch 1.0
version: 96b7b5172b7b76446f637f4922b7a2054e46703b and this PR change (#35)

2019-05-11 23:08:29,257 | callback.py | line 40 : Batch [1560]  Speed: 2.28 samples/sec Train-rpn_cls_loss=0.202601,    rpn_bbox_loss=0.119188, rcnn_accuracy=0.918098, cls_loss=0.512433,      bbox_loss=0.178919,     mask_loss=0.637334,     fcn_loss=3.486774,      fcn_roi_loss=4.059261,  panoptic_accuracy=0.267040,     panoptic_loss=2.879008,
2019-05-11 23:08:38,512 | callback.py | line 40 : Batch [1580]  Speed: 2.16 samples/sec Train-rpn_cls_loss=0.204118,    rpn_bbox_loss=0.121994, rcnn_accuracy=0.918320, cls_loss=0.511436,      bbox_loss=0.178261,     mask_loss=0.637755,     fcn_loss=3.488841,      fcn_roi_loss=4.060041,  panoptic_accuracy=0.265982,     panoptic_loss=2.881260,
2019-05-11 23:08:47,145 | callback.py | line 40 : Batch [1600]  Speed: 2.32 samples/sec Train-rpn_cls_loss=0.208637,    rpn_bbox_loss=0.125312, rcnn_accuracy=0.918627, cls_loss=0.510648,      bbox_loss=0.177416,     mask_loss=0.637829,     fcn_loss=3.493248,      fcn_roi_loss=4.063709,  panoptic_accuracy=0.264609,     panoptic_loss=2.885051,
2019-05-11 23:08:55,470 | callback.py | line 40 : Batch [1620]  Speed: 2.40 samples/sec Train-rpn_cls_loss=0.210174,    rpn_bbox_loss=0.125260, rcnn_accuracy=0.918797, cls_loss=0.510753,      bbox_loss=0.176945,     mask_loss=0.637923,     fcn_loss=3.496151,      fcn_roi_loss=4.067450,  panoptic_accuracy=0.264234,     panoptic_loss=2.887030,
upsnet/../upsnet/operators/modules/fpn_roi_align.py:38: RuntimeWarning: invalid value encountered in sqrt
  feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 403, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 269, in upsnet_train
    output = train_model(*batch)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
    cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "upsnet/../upsnet/models/rcnn.py", line 190, in forward
    cls_loss = self.cls_loss(cls_score, cls_label)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 942, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (510) to match target batch_size (512).

no module named upsnet.operators.modules.distbatchnorm

from upsnet.operators.modules.distbatchnorm import BatchNorm2d

but no distbatchnorm file
