Comments (5)

traveller59 commented on July 21, 2024

I have no idea what causes this problem. I will try to create a Docker image for this project to provide a reproducible environment for errors like this.
Multi-GPU training is currently not supported, mainly because I only have one GPU. If you want to train on multiple GPUs, you need to pad the input (or simply skip the array slice in point_to_voxel) and then slice the points inside the module.

from second.pytorch.

zwyzwy commented on July 21, 2024

While I was training the model, the intermediate evaluation failed with the error below:

Traceback (most recent call last):
File "vox_gluon/train_gluon.py", line 759, in
fire.Fire()
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "vox_gluon/train_gluon.py", line 504, in train
raise e
File "vox_gluon/train_gluon.py", line 486, in train
result = get_official_eval_result(gt_annos[:len(gt_annos)-1], dt_annos, class_names)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 824, in get_official_eval_result
mAPbbox, mAPbev, mAP3d, mAPaos = do_eval_v2(gt_annos, dt_annos, current_classes, min_overlaps, compute_aos, difficultys)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 701, in do_eval_v2
ret = eval_class_v3(gt_annos, dt_annos, current_classes, difficultys, 1, min_overlaps)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 574, in eval_class_v3
rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 384, in calculate_iou_partly
overlap_part = bev_box_overlap(gt_boxes, dt_boxes).astype(np.float64)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 126, in bev_box_overlap
riou = rotate_iou_gpu_eval(boxes, qboxes, criterion)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/non_max_suppression/nms_gpu.py", line 652, in rotate_iou_gpu_eval
N, K, boxes_dev, query_boxes_dev, iou_dev, criterion)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/compiler.py", line 484, in call
sharedmem=self.sharedmem)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/compiler.py", line 558, in _kernel_call
cu_func(*kernelargs)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1301, in call
self.sharedmem, streamhandle, args)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1345, in launch_kernel
None)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 288, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 323, in _check_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [400] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_HANDLE

By the way, have you tried training on multiple GPUs?


zwyzwy commented on July 21, 2024

What do you mean by "slice array in point_to_voxel" and "slice points inside module"?
The code puts the points of all samples together in one single batch, so how can I tell how many points belong to each sample?


traveller59 commented on July 21, 2024

The number of voxels converted from points is not fixed; you can see a slice operation in point_to_voxel. For multi-GPU training, you need to return voxel_num from point_to_voxel, use fixed-size input before nn.DataParallel, pass voxel_num as a Tensor, and gather all valid voxels inside the nn.Module wrapped by nn.DataParallel.
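
For anyone trying this, here is a rough sketch of the pad-then-gather bookkeeping described above, written with plain NumPy so it runs without a GPU. It is not code from second.pytorch: pad_voxels, gather_valid, and the array shapes are hypothetical names and values chosen for illustration. Inside a real nn.Module the same slicing would presumably be done on torch Tensors, with voxel_num passed in as a Tensor as suggested.

```python
import numpy as np


def pad_voxels(voxels, max_voxels):
    """Pad a (num_voxels, max_points, num_features) array to max_voxels rows.

    Hypothetical helper: returns the fixed-size array and the number of
    valid voxels, so the caller can undo the padding later.
    """
    voxel_num = voxels.shape[0]
    if voxel_num > max_voxels:
        raise ValueError("sample has more voxels than max_voxels")
    padded = np.zeros((max_voxels,) + voxels.shape[1:], dtype=voxels.dtype)
    padded[:voxel_num] = voxels
    return padded, voxel_num


def gather_valid(padded_batch, voxel_nums):
    """Drop the padding of each sample and concatenate the valid voxels."""
    return np.concatenate(
        [sample[:n] for sample, n in zip(padded_batch, voxel_nums)], axis=0
    )


# Two samples with different voxel counts (3 and 2), 5 points per voxel,
# 4 features per point. These sizes are made up for the example.
sample_a = np.ones((3, 5, 4), dtype=np.float32)
sample_b = np.full((2, 5, 4), 2.0, dtype=np.float32)

padded_a, num_a = pad_voxels(sample_a, max_voxels=6)
padded_b, num_b = pad_voxels(sample_b, max_voxels=6)

# Fixed-size batch that can be split evenly along the first axis.
batch = np.stack([padded_a, padded_b])
voxel_nums = np.array([num_a, num_b], dtype=np.int64)

# What each replica would do with its share of the batch.
valid = gather_valid(batch, voxel_nums)
```

Because every sample now has the same first dimension, the batch can be scattered across devices along axis 0, and voxel_nums travels with it so each replica knows where the padding starts. This also answers the question of how many voxels belong to each sample.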


jiangzhengkai commented on July 21, 2024

@zwyzwy have you found a solution?
