evaluation cuda problems about second.pytorch HOT 5 OPEN

traveller59 commented on July 21, 2024

evaluation cuda problems

from second.pytorch.

Comments (5)

traveller59 commented on July 21, 2024

I have no idea what causes this problem. I will try to create a Docker image for this project to provide a reproducible environment for errors.
Multi-GPU training is currently not supported, mainly because I only have one GPU. If you want to train on multiple GPUs, you need to pad the input (or simply not slice the array in point_to_voxel) and then slice the points inside the module.

zwyzwy commented on July 21, 2024

While I was training the model, the intermediate evaluation failed with the error below:

Traceback (most recent call last):
File "vox_gluon/train_gluon.py", line 759, in
fire.Fire()
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "vox_gluon/train_gluon.py", line 504, in train
raise e
File "vox_gluon/train_gluon.py", line 486, in train
result = get_official_eval_result(gt_annos[:len(gt_annos)-1], dt_annos, class_names)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 824, in get_official_eval_result
mAPbbox, mAPbev, mAP3d, mAPaos = do_eval_v2(gt_annos, dt_annos, current_classes, min_overlaps, compute_aos, difficultys)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 701, in do_eval_v2
ret = eval_class_v3(gt_annos, dt_annos, current_classes, difficultys, 1, min_overlaps)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 574, in eval_class_v3
rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 384, in calculate_iou_partly
overlap_part = bev_box_overlap(gt_boxes, dt_boxes).astype(np.float64)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 126, in bev_box_overlap
riou = rotate_iou_gpu_eval(boxes, qboxes, criterion)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/non_max_suppression/nms_gpu.py", line 652, in rotate_iou_gpu_eval
N, K, boxes_dev, query_boxes_dev, iou_dev, criterion)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/compiler.py", line 484, in call
sharedmem=self.sharedmem)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/compiler.py", line 558, in _kernel_call
cu_func(*kernelargs)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1301, in call
self.sharedmem, streamhandle, args)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1345, in launch_kernel
None)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 288, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 323, in _check_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [400] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_HANDLE

By the way, have you trained with multiple GPUs?

zwyzwy commented on July 21, 2024

What do you mean by "slice array in point_to_voxel" and "slice points inside module"?
The code puts the points of all samples in a batch together, so how can I tell how many points belong to each sample?

traveller59 commented on July 21, 2024

The number of voxels converted from points is not fixed; you can see a slice operation in point_to_voxel. For multi-GPU training, you need to return voxel_num from point_to_voxel, use fixed-size input before nn.DataParallel, pass voxel_num as a Tensor, and gather all valid voxels inside the nn.Module wrapped by nn.DataParallel.
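
To make the padding idea concrete, here is a rough sketch in plain NumPy, not the actual second.pytorch code. The names pad_voxels, gather_valid and MAX_VOXELS are made up for illustration, and the feature size of 4 is an assumption. Each sample is padded to a fixed number of voxels so the batch can be split along the first dimension, and the per-sample voxel count travels with it so the padding can be dropped again on the other side.

```python
import numpy as np

MAX_VOXELS = 6  # hypothetical fixed upper bound on voxels per sample


def pad_voxels(voxels, max_voxels=MAX_VOXELS):
    """Pad a (num_voxels, C) array to (max_voxels, C); also return the valid count."""
    num = voxels.shape[0]
    if num > max_voxels:
        raise ValueError("sample has more voxels than max_voxels")
    padded = np.zeros((max_voxels,) + voxels.shape[1:], dtype=voxels.dtype)
    padded[:num] = voxels
    return padded, num


def gather_valid(batch_voxels, voxel_num):
    """Drop the padding of every sample and concatenate the valid voxels."""
    return np.concatenate(
        [v[:n] for v, n in zip(batch_voxels, voxel_num)], axis=0)


# Two samples with different voxel counts (3 and 5), 4 features per voxel.
samples = [np.ones((3, 4), dtype=np.float32),
           2 * np.ones((5, 4), dtype=np.float32)]
padded, counts = zip(*(pad_voxels(s) for s in samples))
batch_voxels = np.stack(padded)   # fixed-size batch, can be split per device
voxel_num = np.asarray(counts)    # one valid-voxel count per sample
valid = gather_valid(batch_voxels, voxel_num)
print(batch_voxels.shape, voxel_num.tolist(), valid.shape)
```

In a real setup the gather step would presumably live in the forward of the module wrapped by nn.DataParallel and operate on tensors, with voxel_num passed in as a Tensor next to the padded voxels. The counts in voxel_num are also what tells you how many voxels belong to each sample.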

jiangzhengkai commented on July 21, 2024

@zwyzwy did you find a solution?
