uber-research / upsnet
UPSNet: A Unified Panoptic Segmentation Network
License: Other
When doing an evaluation on the test set, I got the following error. I don't have any clue about it.
2019-04-18 03:56:17,288 | upsnet_end2end_test.py | line 307: unified pano result:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_test.py", line 316, in <module>
upsnet_test()
File "upsnet/upsnet_end2end_test.py", line 308, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "upsnet/../upsnet/dataset/base_dataset.py", line 333, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero
However, the panoptic segmentation results are generated successfully.
So how is it possible that I run into the case n = 0? Any ideas would be appreciated. Thanks.
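One way n = 0 can arise, assuming pq_average follows the usual panopticapi layout in which a category is counted only if it has at least one TP, FP or FN: if no predicted and ground-truth segments were ever compared (for example because the two file lists did not line up), every category is skipped and the final division fails. Below is a minimal sketch of a guarded average. The function name safe_pq_average and the layout of stats are hypothetical and are not the repository's code.

```python
def safe_pq_average(stats):
    """Average PQ/SQ/RQ over categories that have at least one TP, FP or FN.

    `stats` maps a category id to a dict with keys 'iou', 'tp', 'fp', 'fn'
    (a hypothetical layout used only for this illustration).
    """
    pq = sq = rq = 0.0
    n = 0
    for cat_id, s in stats.items():
        tp, fp, fn = s['tp'], s['fp'], s['fn']
        if tp + fp + fn == 0:
            # Category never seen in ground truth or prediction: skip it.
            continue
        n += 1
        denom = tp + 0.5 * fp + 0.5 * fn
        pq += s['iou'] / denom
        sq += s['iou'] / tp if tp else 0.0
        rq += tp / denom
    if n == 0:
        # Nothing was matched at all; report zeros instead of dividing by zero.
        return {'pq': 0.0, 'sq': 0.0, 'rq': 0.0, 'n': 0}
    return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}
```

If such a guard reports n = 0, the inputs to pq_compute (file names, category ids) are probably a better place to look than the metric itself.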
When I train the model on the COCO dataset, it runs normally at the beginning and the loss decreases. But the following errors occur partway through training:
Traceback (most recent call last):
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
self._shutdown_workers()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 494, in Client
deliver_challenge(c, authkey)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 394, in <module>
upsnet_train()
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/upsnet_end2end_train.py", line 193, in upsnet_train
output = train_model(data, label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/media/xxl/4TB_disk/work/Panoptic_segmentation/UPSNet-fixed/upsnet/../upsnet/models/rcnn.py", line 190, in forward
cls_loss = self.cls_loss(cls_score, cls_label)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 862, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/xxl/anaconda3/envs/py36_th04/lib/python3.6/site-packages/torch/nn/functional.py", line 1405, in nll_loss
.format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (511) to match target batch_size (512).
Could you please help me solve these problems?
Whether I run the training or the testing script, the only metrics I see are AP and AR. Should we calculate the PQ values ourselves, or is there anything I missed? Thanks.
Can I train on a single GPU with 12 GB of memory? What needs to change in the code?
Thank you very much!
Thank you for fixing the problem with "panoptic_val2017_stff.json". With the fixed json file, I get this error: ZeroDivisionError: division by zero. I think the problem may be in '/UPSNet-master/upsnet/dataset/base_dataset.py', line 222. After I replaced "files = [item['file_name'] for item in pan_gt_json['images']]" with "files = [item['file_name'].replace('jpg', 'png') for item in pan_gt_json['images']]", it works. But another problem has arisen:
Traceback (most recent call last):
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in <module>
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 337, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 301, in pq_compute
pq_stat += p.get()
File "/home/xxl/anaconda3/envs/xxl_36/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: operands could not be broadcast together with shapes (427,640) (426,640)
Am I doing something wrong? Could you please help me solve the problem?
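The replacement described above swaps every occurrence of 'jpg' in the name, not only the extension. A slightly stricter variant is sketched here, under the assumption that the panoptic ground-truth images are PNG files named after the JPEG images. The helper name to_png_name and the sample record are hypothetical.

```python
import os

def to_png_name(file_name):
    """Return file_name with its extension replaced by '.png'."""
    stem, _ext = os.path.splitext(file_name)
    return stem + '.png'

# Example with a record shaped like the entries of pan_gt_json['images'].
images = [{'file_name': '000000000139.jpg'}]
files = [to_png_name(item['file_name']) for item in images]
print(files)
```

This only addresses the file-name lookup; it does not by itself explain the shape mismatch in the traceback above.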
Error message:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 23, in <module>
from upsnet.models.fpn import FPN
File "upsnet/../upsnet/models/fpn.py", line 23, in <module>
from upsnet.operators.modules.distbatchnorm import BatchNorm2d
ModuleNotFoundError: No module named 'upsnet.operators.modules.distbatchnorm'
Where do you use this function to generate the ground-truth panoptic logits?
BTW, is it possible to use the COCO panoptic dataset annotations to generate them?
Hello, I am trying to reproduce the results without horovod.
I use 4 Tesla K80 gpus (12GB) and train the net with "upsnet_resnet50_coco_4gpu.yaml" but I find that it may take more than 10 days for training.
Have you got some advice for speeding up the training?
Thanks.
Thanks for your good project. I have a question about your code. The released code uses pano_json_stff = pano_json.copy().
This is dict.copy, which is a shallow copy: only the top-level dict is copied, while nested objects are shared with the original. It seems to make the categories in panoptic_coco_categories_stff.json repeated and wrong.
Could you tell me whether the code is correct?
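The difference between the two kinds of copy can be seen in a small standalone example. The dictionary below is made-up illustrative data, not the real annotation file.

```python
import copy

# Made-up stand-in for the loaded annotation dictionary.
pano_json = {'categories': [{'id': 1, 'name': 'person'}]}

shallow = pano_json.copy()        # new outer dict, same inner list and dicts
deep = copy.deepcopy(pano_json)   # fully independent structure

shallow['categories'][0]['id'] = 99   # also changes pano_json
deep['categories'][0]['id'] = 7       # leaves pano_json untouched

print(pano_json['categories'][0]['id'])
```

Whether this sharing actually corrupts the generated categories file depends on how the copy is modified afterwards, which is not shown here.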
Below is the error message I got. Not so sure about how to fix it. Could you help me with this? Thanks.
====
UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
from upsnet.operators.modules.deform_conv import DeformConv
File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
from upsnet.operators.functions.deform_conv import DeformConvFunction
File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 52, in <module>
logger, final_output_path = create_logger(config.output_path, args.cfg, config.dataset.image_set)
File "upsnet/../lib/utils/logging.py", line 38, in create_logger
logging.basicConfig(filename=os.path.join(final_output_path, log_file), format=head)
AttributeError: 'module' object has no attribute 'basicConfig'
Process finished with exit code 0
Line 179 in 96b7b51
Does it matter?
I wanted to observe the gradient w.r.t. the inputs (that is, the input image here).
So I tried to print data['data'].grad after loss.backward() at the line below.
https://github.com/uber-research/UPSNet/blob/master/upsnet/upsnet_end2end_train.py#L216
But I got an error saying that data['data'].grad is NoneType.
Is there anything I misunderstood or missed? What is the correct way to dump the gradient w.r.t. the inputs? Any ideas would be appreciated. Thanks.
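In PyTorch, .grad is populated only for leaf tensors that have requires_grad enabled, and input tensors do not have it enabled by default. The sketch below shows the general mechanism with a stand-in input and a stand-in layer. It is not the UPSNet model, and how the repository's data pipeline moves tensors between devices is not covered.

```python
import torch

# Stand-in for the input image batch; requires_grad must be enabled on the
# leaf tensor itself, otherwise .grad stays None after backward().
image = torch.randn(1, 3, 8, 8, requires_grad=True)

# Stand-in for the network and loss (hypothetical; not the UPSNet model).
conv = torch.nn.Conv2d(3, 4, kernel_size=3, padding=1)
loss = conv(image).sum()
loss.backward()

print(image.grad.shape)
```

For an existing floating-point leaf tensor, calling requires_grad_() on it before the forward pass should have the same effect.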
[Python 2.7 / PyTorch 1.0]
sh init.sh
error: ImportError: cannot import name accumulate
Has anyone encountered this error when running with one GPU?
upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
The whole message is:
upsnet/../upsnet/operators/functions/pyramid_proposal.py:229: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 426, in <module>
upsnet_train()
File "upsnet/upsnet_end2end_train.py", line 287, in upsnet_train
output = train_model(*batch)
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/models/resnet_upsnet.py", line 151, in forward
rois, _ = self.pyramid_proposal(rpn_cls_prob, rpn_bbox_pred, data['im_info'])
File "/home/hxt189898/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/operators/modules/pyramid_proposal.py", line 58, in forward
bbox_pred[3][[i], :, :, :], bbox_pred[4][[i], :, :, :], torch.from_numpy(im_info[i, :]))
File "upsnet/../upsnet/operators/functions/pyramid_proposal.py", line 168, in forward
keep = nms(np.hstack((proposals, scores)).astype(np.float32))
File "upsnet/../upsnet/nms/nms.py", line 45, in _nms
return gpu_nms(dets, thresh, device_id)
File "gpu_nms.pyx", line 36, in gpu_nms.gpu_nms
IndexError: Out of bounds on buffer access (axis 0)
Another question concerns overlapping instances. There may be a big table in the picture. If the table is the last instance, the whole panoptic gt will be covered by the corresponding "id". This is not the panoptic gt we want, is it?
Looking forward to your reply.
Hi,
After following all the steps mentioned and setting up the environment with Python 3.6 and PyTorch 0.4.1, I found that the code has many bugs, which makes it difficult to reproduce the results. Are there other changes that need to be made, or should following these steps work directly? I am trying to train the model on COCO using the single GPU config file.
Thanks for your good project. I have a detailed question about your code. The released code computes void_logits = torch.max(fcn_output['fcn_score'][:, (config.dataset.num_classes - 1):, ...], dim=1, keepdim=True)[0] - torch.max(seg_inst_logits, dim=1, keepdim=True)[0], which does not seem to correspond to the formula in the paper, Z_unknown = max (X_thing) - max (X_mask).
Shouldn't the first max term be taken over channels 52:132 of fcn_output['fcn_score'] ([:, (config.dataset.num_seg_classes-config.dataset.num_classes+1):, ...]), which represent the things? I apologize for my English; I hope the description makes clear what I want to ask.
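To make the two slice expressions concrete, the sketch below evaluates them on a plain list of channel indices. The values num_seg_classes = 133 and num_classes = 81 are assumptions for a COCO-style setup (53 stuff plus 80 thing channels, and 80 thing classes plus background); they are not read from the repository config.

```python
# Assumed COCO-style values (not read from the repository config).
num_seg_classes = 133   # 53 stuff + 80 thing channels
num_classes = 81        # 80 thing classes + background

channels = list(range(num_seg_classes))

# Slice used in the released code.
released = channels[(num_classes - 1):]
# Slice suggested in this question.
proposed = channels[(num_seg_classes - num_classes + 1):]

print(released[0], released[-1], len(released))
print(proposed[0], proposed[-1], len(proposed))
```

Under these assumed values the released slice covers 53 channels and the suggested one covers 80, so only the latter has as many channels as there are thing classes.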
UPSNet/upsnet/models/resnet_upsnet.py
Line 285 in 96b7b51
I think it should be
gt_inds = np.where((roidb['gt_classes'] > 0) & (roidb['is_crowd'] == 0))[0]
But when I modified it like this, another problem occurred. I think it is caused by the instances that are crowd.
I think in this place, the crowd instance should be added to construct the pan_gt.
I noticed that you sort the categories as 0-52: stuff, 53-132: things when creating panoptic_coco_categories_stff.json. Why did you make this change?
When you print the result matrix in this format:
IDX | PQ SQ RQ IoU TP FP FN
does IDX 0 mean the result for the first thing class or the first stuff class?
Similarly, what about other metrics such as the mean and per-category AP? Does the first line represent the first thing class or the first stuff class?
I really appreciate your reply.
We tried to reproduce the ResNet-101 performance reported in your paper, but our final performance is a little worse than the reported result.
I wonder whether there are any problems in our settings.
Can you confirm that the code reaches a PQ of about 46?
Line 173 in ba524d5
Thanks for your great work.
I see that you have provided code for ADE20K, but I can find results neither in the paper nor here. Have you tested your model on the ADE20K dataset? And will you share the panoptic results with us in the future?
Thanks.
After downloading the Cityscapes dataset, I found that the file names have no "Train" in
"cp gtFine///*labelTrainIds.png labels"
So, I removed "Train" as below
"cp gtFine///*labelIds.png labels"
Am I right? Thanks.
Hi, I got this notification when running the command
python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_cityscapes_4gpu.yaml
But the model can still run normally, will this be a problem or may affect the final results?
Thanks
UPSNet/upsnet/models/resnet_upsnet.py
Line 287 in 9191a59
Line 287 should be: cls_idx = roidb['gt_classes'][gt_inds]
Where can I download a pretrained model like "resnet-50-caffe.pth"?
Why do I get the error "invalid argument" in deformable_col2im? Even though the error occurs, training still reports a loss. Is that OK?
Hi, I see that UPSNet mainly adds a semantic segmentation head to Mask R-CNN. I wonder what the performance of UPSNet is on object detection (or instance segmentation). Can I use it to improve my object detection mAP? Sorry, but I didn't find this discussed in your paper.
I would appreciate it if you could give me some advice, thanks.
Hi, I get this problem when running on the COCO dataset. Why does it happen? I use the two annotation files downloaded from http://cocodataset.org/#download
2017 Panoptic Train/Val annotations [821MB]
2017 Train/Val annotations [241MB]
It seems that the image exists in the train2017 folder but does not exist in the panoptic_train2017_semantic_trainid_stff folder.
How can I fix this problem? Thanks.
Hello, After I run :
python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
I encountered that:
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
What should I do? Thanks for answering.
Traceback (most recent call last):
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 24, in <module>
__file__, 'mpi_lib_v2')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 44, in <module>
import horovod.torch as hvd
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/torch/__init__.py", line 27, in <module>
__file__, 'mpi_lib', '_mpi_lib')
File "/home/anaconda3/envs/pytorch1.0_python3.5/lib/python3.5/site-packages/horovod/common/__init__.py", line 48, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
It seems that there is a bug in init_coco.sh.
Maybe PYTHONPATH=$(pwd)/lib/dataset_devkit/panopticapi:$PYTHONPATH
should be changed to
PYTHONPATH=$(pwd)/lib/dataset_devkit:$PYTHONPATH
I have not seen any module named BatchNorm2d in upsnet.operators.modules.distbatchnorm.
When I test on the COCO dataset, the following error occurs:
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 313, in <module>
upsnet_test()
File "/UPSNet-master/upsnet/upsnet_end2end_test.py", line 185, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(results['all_ssegs'], results['all_panos'], results['all_pano_cls_inds'], stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 330, in evaluate_panoptic
gt_pans, gt_json, categories, color_gererator = get_gt()
File "/UPSNet-master/upsnet/../upsnet/dataset/base_dataset.py", line 247, in get_gt
color_gererator = IdGenerator(categories)
File "/UPSNet-master/upsnet/../lib/dataset_devkit/panopticapi/utils.py", line 40, in __init__
self.taken_colors.add(tuple(category['color']))
KeyError: 'color'
I think my annotation file "panoptic_val2017.json" has some problems: the categories don't have a color. I downloaded the annotations from http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
Is there a problem with the annotation file I downloaded, or am I doing something wrong?
If the annotation file has problems, could you please give me a correct download link?
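The KeyError indicates that at least one category record passed to IdGenerator has no 'color' field. A small check run before evaluation can confirm which records are affected. The helper name categories_missing_color is hypothetical and the records below are illustrative only.

```python
def categories_missing_color(categories):
    """Return the ids of category records that lack a 'color' entry."""
    return [cat['id'] for cat in categories if 'color' not in cat]

# Illustrative records only; real files contain many more categories.
categories = [
    {'id': 1, 'name': 'person', 'isthing': 1, 'color': [220, 20, 60]},
    {'id': 2, 'name': 'bicycle', 'isthing': 1},
]
print(categories_missing_color(categories))
```

Running such a check on the 'categories' list of the loaded json would show whether every record lacks a color or only some of them.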
Sorry but I am a little bit confused by the learning rate.
UPSNet/upsnet/upsnet_end2end_train.py
Line 119 in 96b7b51
UPSNet/upsnet/upsnet_end2end_train.py
Line 200 in 96b7b51
What is the relation between these two learning rates? And how can I get the real learning rate? If I print optimizer.param_groups[0]["lr"], I always get 1.
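A traceback elsewhere on this page shows the call optimizer.step(lr) and an update that uses group['lr'] * lr. This suggests that the value stored in param_groups acts as a per-group multiplier while the scheduled rate is passed to step. Below is a toy sketch of that arrangement in plain Python, ignoring momentum. The class ToySGD is hypothetical and is not the repository's optimizer.

```python
class ToySGD:
    """Toy optimizer: param_groups hold multipliers, step() receives the rate."""

    def __init__(self, params, lr_mult=1.0):
        self.param_groups = [{'params': params, 'lr': lr_mult}]

    def step(self, lr, grads):
        for group in self.param_groups:
            # The rate actually applied is the product of the two values.
            effective_lr = group['lr'] * lr
            group['params'][:] = [p - effective_lr * g
                                  for p, g in zip(group['params'], grads)]


opt = ToySGD([1.0, 2.0])
opt.step(lr=0.01, grads=[10.0, 10.0])
print(opt.param_groups[0]['lr'], opt.param_groups[0]['params'])
```

If the real optimizer works this way, printing the value passed to step, multiplied by the group's 'lr', would give the effective learning rate.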
Did I miss any step to get this error?
~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 60, in <module>
from upsnet.dataset import *
File "upsnet/../upsnet/dataset/__init__.py", line 1, in <module>
from .cityscapes import Cityscapes
File "upsnet/../upsnet/dataset/cityscapes.py", line 32, in <module>
from upsnet.dataset.json_dataset import JsonDataset, extend_with_flipped_entries, filter_for_training, add_bbox_regression_targets
File "upsnet/../upsnet/dataset/json_dataset.py", line 53, in <module>
import upsnet.bbox.bbox_transform as box_utils
File "upsnet/../upsnet/bbox/bbox_transform.py", line 15, in <module>
from .bbox import bbox_overlaps as bbox_overlaps_cython
ModuleNotFoundError: No module named 'upsnet.bbox.bbox'
This is what I have in the directory upsnet/bbox:
~/UPSNet_ROOT/upsnet/bbox$ ls
bbox.c bbox.pyx bbox_transform.py __init__.py sample_rois.py
bbox.cpython-37m-x86_64-linux-gnu.so bbox_regression.py build __pycache__ setup.py
Thanks.
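One possible cause, offered as a guess: the compiled file in the listing is tagged cpython-37m, meaning it was built for CPython 3.7, and an interpreter of a different version will not import it under the name bbox. The sketch below checks whether a file name matches what the running interpreter would look for. The helper name importable_as is hypothetical.

```python
import importlib.machinery
import sys

def importable_as(module_name, file_name):
    """True if file_name is a name this interpreter would try for module_name."""
    return any(file_name == module_name + suffix
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)

print(sys.version_info[:2], importlib.machinery.EXTENSION_SUFFIXES)
print(importable_as('bbox', 'bbox.cpython-37m-x86_64-linux-gnu.so'))
```

If the second line prints False for the interpreter used to launch training, rebuilding the extension with that same interpreter may resolve the import error.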
I have trained the network with the configuration file upsnet_resnet50_cityscapes_4gpu.yaml on 4 2080Ti GPUs. The configuration file was not modified at all.
After testing the model with test_iteration set to 48000, the results are: mIoU 75.054%, AP_box 38.1%, AP_mask 32.4%, PQ:SQ:RQ 58.7% : 79.6% : 72.4%. But the results reported in the paper are 75.2%, 39.1%, 33.3%, 59.3% : 79.7% : 73% respectively.
Is there any way to achieve the performance reported in the paper?
I found that the bounding box regression loss at the start of training with your code is really small, on the order of 0.1. Could you give me some explanation?
Hi, sorry to bother you. I run your code with 4 GPUs, but I use GPUs 4, 5, 6 and 7 on my machine. However, when I try to resume my model, I get this error:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 418, in <module>
upsnet_train()
File "upsnet/upsnet_end2end_train.py", line 300, in upsnet_train
optimizer.step(lr)
File "upsnet/../lib/nn/optimizer.py", line 98, in step
buf.mul_(momentum).add_(group['lr'] * lr, d_p)
RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:0 and input b is on cuda:4
Can you tell me what to do? I really appreciate your help.
2019-04-10 16:16:39,900 | upsnet_end2end_test.py | line 303: unified pano result:
Traceback (most recent call last):
File "upsnet/upsnet_end2end_test.py", line 312, in <module>
upsnet_test()
File "upsnet/upsnet_end2end_test.py", line 304, in upsnet_test
test_dataset.evaluate_panoptic(test_dataset.get_unified_pan_result(all_ssegs, all_panos, all_pano_cls_inds, stuff_area_limit=config.test.panoptic_stuff_area_limit), os.path.join(final_output_path, 'results', 'pans_unified'))
File "upsnet/../upsnet/dataset/base_dataset.py", line 328, in evaluate_panoptic
results = pq_compute(gt_json, pred_json, gt_pans, pred_pans, categories)
File "upsnet/../upsnet/dataset/base_dataset.py", line 296, in pq_compute
results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
File "upsnet/../upsnet/dataset/base_dataset.py", line 97, in pq_average
return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero
Hi, when I evaluated the trained model, I got this error.
Thanks for your great work.
I followed the installation steps and the installation was successful, but when I begin to train the model, it tells me "Cannot find reference 'deform_conv_cuda'" even though the file already exists. I don't know how to fix it. Could you help me with this? Thanks.
Hi,
I ran into this error while running the command below (upsnet_resnet50_coco_1gpu.yaml is 4gpu.yaml with only the number of GPUs changed):
python -u upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml
The code is at commit 96b7b5172b7b76446f637f4922b7a2054e46703b plus the change in PR #35.
2019-05-11 23:08:29,257 | callback.py | line 40 : Batch [1560] Speed: 2.28 samples/sec Train-rpn_cls_loss=0.202601, rpn_bbox_loss=0.119188, rcnn_accuracy=0.918098, cls_loss=0.512433, bbox_loss=0.178919, mask_loss=0.637334, fcn_loss=3.486774, fcn_roi_loss=4.059261, panoptic_accuracy=0.267040, panoptic_loss=2.879008,
2019-05-11 23:08:38,512 | callback.py | line 40 : Batch [1580] Speed: 2.16 samples/sec Train-rpn_cls_loss=0.204118, rpn_bbox_loss=0.121994, rcnn_accuracy=0.918320, cls_loss=0.511436, bbox_loss=0.178261, mask_loss=0.637755, fcn_loss=3.488841, fcn_roi_loss=4.060041, panoptic_accuracy=0.265982, panoptic_loss=2.881260,
2019-05-11 23:08:47,145 | callback.py | line 40 : Batch [1600] Speed: 2.32 samples/sec Train-rpn_cls_loss=0.208637, rpn_bbox_loss=0.125312, rcnn_accuracy=0.918627, cls_loss=0.510648, bbox_loss=0.177416, mask_loss=0.637829, fcn_loss=3.493248, fcn_roi_loss=4.063709, panoptic_accuracy=0.264609, panoptic_loss=2.885051,
2019-05-11 23:08:55,470 | callback.py | line 40 : Batch [1620] Speed: 2.40 samples/sec Train-rpn_cls_loss=0.210174, rpn_bbox_loss=0.125260, rcnn_accuracy=0.918797, cls_loss=0.510753, bbox_loss=0.176945, mask_loss=0.637923, fcn_loss=3.496151, fcn_roi_loss=4.067450, panoptic_accuracy=0.264234, panoptic_loss=2.887030,
upsnet/../upsnet/operators/modules/fpn_roi_align.py:38: RuntimeWarning: invalid value encountered in sqrt
feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 403, in <module>
upsnet_train()
File "upsnet/upsnet_end2end_train.py", line 269, in upsnet_train
output = train_model(*batch)
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../lib/utils/data_parallel.py", line 110, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/models/resnet_upsnet.py", line 139, in forward
cls_label, bbox_target, bbox_inside_weight, bbox_outside_weight, mask_target)
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "upsnet/../upsnet/models/rcnn.py", line 190, in forward
cls_loss = self.cls_loss(cls_score, cls_label)
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 942, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/opt/xxx_workspace/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
.format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (510) to match target batch_size (512).
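One possible, unconfirmed explanation for the 510 vs. 512 mismatch is that a few boxes carry NaN or negative sizes. The level expression from the warning above then evaluates to NaN, and NaN never compares equal to any pyramid level, so those boxes would drop out of every per-level bucket. The sketch below reproduces the effect with made-up box sizes. The function name fpn_level is hypothetical.

```python
import numpy as np

def fpn_level(w, h):
    """Same expression as in the warning above: map box size to a level in 0..3."""
    with np.errstate(invalid='ignore'):
        return np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)

# Made-up box sizes: two valid boxes, one NaN width, one negative width.
w = np.array([224.0, 448.0, np.nan, -5.0])
h = np.array([224.0, 448.0, 10.0, 10.0])
levels = fpn_level(w, h)

# Count how many boxes land in one of the four level buckets.
assigned = sum(int((levels == k).sum()) for k in range(4))
print(levels, assigned, len(w))
```

Checking the ROIs with np.isfinite before the level assignment would show whether this is what happens in the failing iteration.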
The code contains
from upsnet.operators.modules.distbatchnorm import BatchNorm2d
but there is no distbatchnorm file.