CUDA error: device-side assert triggered (stratified-transformer issue, closed)

dvlab-research commented on June 14, 2024

CUDA error: device-side assert triggered

from stratified-transformer.

Comments (11)

X-Lai commented on June 14, 2024

Thank you for your interest in our work. May I ask whether you have set the correct data path and training GPUs?

wencc-ucas commented on June 14, 2024

Thank you for your prompt response. I have modified the data path and GPUs.

The error above occurs when I specify only one GPU. When I use multiple GPUs, the error is the following:

Traceback (most recent call last):
  File "train.py", line 547, in <module>
    main()
  File "train.py", line 84, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/mmvc/Congcong/Stratified-Transformer/train.py", line 308, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler)
  File "/home/mmvc/Congcong/Stratified-Transformer/train.py", line 380, in train
    output = model(feat, coord, offset, batch, neighbor_idx)
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmvc/Congcong/Stratified-Transformer/model/stratified_transformer.py", line 449, in forward
    feats, xyz, offset, feats_down, xyz_down, offset_down = layer(feats, xyz, offset)
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmvc/Congcong/Stratified-Transformer/model/stratified_transformer.py", line 291, in forward
    new_window_size = 2 * torch.tensor([self.window_size]*3).type_as(xyz).to(xyz.device)
RuntimeError: CUDA error: invalid device function
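`invalid device function` is the CUDA runtime's message for `cudaErrorInvalidDeviceFunction`, which the CUDA documentation describes as a device function that does not exist or was not compiled for the proper device architecture. Since it fires on a plain tensor op at line 291, one thing worth checking (a guess, not confirmed in this thread) is whether the installed PyTorch build, or the compiled pointops2 extension, contains code for this GPU. The helper below is a hypothetical sketch: it takes a compute capability, such as the tuple from `torch.cuda.get_device_capability()`, and an arch list, such as `torch.cuda.get_arch_list()`, and applies CUDA's compatibility rules (SASS built for `sm_XY` runs on the same major version with an equal or newer minor; PTX for `compute_XY` can be JIT-compiled for the same or any newer architecture).

```python
# Hypothetical helper, not part of the repository.
def arch_supported(capability, arch_list):
    """Return True if code built for `arch_list` can run on a GPU of `capability`.

    capability: (major, minor), e.g. (8, 6).
    arch_list: strings like "sm_75" (SASS) or "compute_75" (PTX).
    Suffixed entries such as "sm_90a" are skipped in this sketch.
    """
    major, minor = capability
    target = major * 10 + minor
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if not num.isdigit():
            continue
        value = int(num)
        # SASS is binary-compatible only within one major version, forward in minor.
        if kind == "sm" and value // 10 == major and value <= target:
            return True
        # PTX can be JIT-compiled for the same or any newer architecture.
        if kind == "compute" and value <= target:
            return True
    return False


# Possible usage on the training machine (requires a CUDA build of PyTorch):
#   import torch
#   cap = torch.cuda.get_device_capability()
#   print(cap, arch_supported(cap, torch.cuda.get_arch_list()))
```

If this returns False, installing a PyTorch build that targets the GPU and then rebuilding pointops2 would be a natural next step. If pointops2's setup.py uses `torch.utils.cpp_extension`, its target architectures can be chosen with the `TORCH_CUDA_ARCH_LIST` environment variable at build time.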

X-Lai commented on June 14, 2024

May I ask whether you have compiled the pointops in /lib? And can you locate which line causes this error?

wencc-ucas commented on June 14, 2024

I only compiled pointops2, following your instructions.

For one GPU:
  File "/***/Stratified-Transformer/model/stratified_transformer.py", line 357, in forward
    feats = self.kpconv(xyz, xyz, neighbor_idx, feats)

For multiple GPUs:
  File "/***/Stratified-Transformer/model/stratified_transformer.py", line 291, in forward
    new_window_size = 2 * torch.tensor([self.window_size]*3).type_as(xyz).to(xyz.device)
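The two runs report different lines (357 with one GPU, 291 with several). CUDA kernels are launched asynchronously, so the line in a traceback is often just the next operation that synchronized with the GPU rather than the one that actually failed. Setting `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, which usually moves the traceback to the real culprit at the cost of speed. A minimal sketch of re-running training that way; `train.py` comes from the traceback, while the helper names and the argument list are hypothetical placeholders:

```python
import os
import subprocess
import sys


# Hypothetical helpers, not part of the repository.
def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_train_debug(train_args):
    """Run train.py with the debug environment; train_args are your usual arguments."""
    cmd = [sys.executable, "train.py", *train_args]
    return subprocess.run(cmd, env=debug_env(), check=False)
```

The shell equivalent is simply `CUDA_LAUNCH_BLOCKING=1 python train.py <your usual arguments>`.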

X-Lai commented on June 14, 2024

The error may be caused by the KPConv provided by torch-points3d. Did you install it successfully? Can you double-check that torch-points3d works correctly?
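One quick way to double-check an installation is to import each compiled package in a fresh interpreter and print any failure. This is a sketch; the module names in the usage comment are assumptions (torch-points3d is imported as `torch_points3d`, while `pointops2_cuda` is only a guess at the name of the extension built from /lib), so substitute whatever your installation actually provides. A successful import only proves the library loads: a kernel compiled for the wrong GPU architecture can still fail when it is launched.

```python
import importlib


# Hypothetical helper, not part of the repository.
def check_imports(module_names):
    """Import each module and map its name to None (ok) or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError is typical; catch broadly to report anything
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Possible usage (module names are assumptions, see above):
#   for name, err in check_imports(["torch_points3d", "pointops2_cuda"]).items():
#       print(name, "OK" if err is None else err)
```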

wencc-ucas commented on June 14, 2024

Thanks, I have checked it. When I use multiple GPUs, this error does not appear, and the KPConv provided by torch-points3d works well.

X-Lai commented on June 14, 2024

Can you run it successfully now? If you use one GPU, remember to add CUDA_VISIBLE_DEVICES=0 before your python command.
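Why the prefix matters: with `CUDA_VISIBLE_DEVICES` set, the process only sees the listed GPUs, renumbered from 0 in the listed order. So `CUDA_VISIBLE_DEVICES=0` leaves exactly one device, `cuda:0`, and `CUDA_VISIBLE_DEVICES=2,3` exposes physical GPU 3 as `cuda:1`. The hypothetical helper below reproduces that mapping for plain integer ids (the variable also accepts GPU UUIDs, which this sketch ignores). The variable must be set before CUDA is initialized in the process, which is why it goes on the command line, e.g. `CUDA_VISIBLE_DEVICES=0 python train.py <your usual arguments>`.

```python
# Hypothetical helper, not part of the repository.
def logical_cuda_index(physical_id, visible_devices):
    """Map a physical GPU id to the cuda:N index seen by the process.

    visible_devices is the CUDA_VISIBLE_DEVICES string, e.g. "0" or "2,3".
    Returns None when the GPU is hidden from the process.
    """
    ids = [item.strip() for item in visible_devices.split(",") if item.strip()]
    try:
        return ids.index(str(physical_id))
    except ValueError:
        return None
```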

wencc-ucas commented on June 14, 2024

I have solved the single-GPU bug by modifying this.

But I still get the error at line 291 with both one GPU and multiple GPUs.

YangParky commented on June 14, 2024

> I have solved the single-GPU bug by modifying this.
>
> But I still get the error at line 291 with both one GPU and multiple GPUs.

Hi, how did you modify it?

basil-hayden commented on June 14, 2024

> I have solved the single-GPU bug by modifying this.
> But I still get the error at line 291 with both one GPU and multiple GPUs.
>
> Hi, how did you modify it?

Change model = torch.nn.DataParallel(model.cuda()) to model = model.cuda().

praj441 commented on June 14, 2024

> I have solved the single-GPU bug by modifying this.
> But I still get the error at line 291 with both one GPU and multiple GPUs.
>
> Hi, how did you modify it?
>
> Change model = torch.nn.DataParallel(model.cuda()) to model = model.cuda().

But that way we can't use multi-GPU training. I also get this error when using model = torch.nn.DataParallel(model.cuda()).

Any suggestions?
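One possible reason `torch.nn.DataParallel` misbehaves with this model (a hypothesis, not confirmed in this thread): DataParallel scatters every tensor argument independently along dim 0. The forward call in train.py line 380 is `model(feat, coord, offset, batch, neighbor_idx)`; assuming `offset` holds cumulative point counts per cloud, as in pointops-style code, splitting it separately from the point tensors leaves each replica with offsets that do not match its points, and with neighbor indices that can point past the end of its slice, which could trigger a device-side assert. The sketch below simulates this with plain Python lists (all names are hypothetical) and contrasts it with a split that keeps clouds whole.

```python
import math


# Simulation with plain lists; not the repository's code.
def chunk(seq, n):
    """Split seq into at most n contiguous pieces of ceil(len/n) items, like torch.chunk."""
    size = max(1, math.ceil(len(seq) / n))
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def naive_split(points, offsets, n):
    """Split every input independently along dim 0, roughly as DataParallel's scatter does."""
    return list(zip(chunk(points, n), chunk(offsets, n)))


def offset_aware_split(points, offsets, n):
    """Keep each point cloud whole and re-base the cumulative offsets per replica."""
    starts = [0] + offsets[:-1]
    clouds = [points[s:e] for s, e in zip(starts, offsets)]
    replicas = []
    for group in chunk(clouds, n):
        pts, offs = [], []
        for cloud in group:
            pts.extend(cloud)
            offs.append(len(pts))  # cumulative count within this replica
        replicas.append((pts, offs))
    return replicas


# Two clouds of 3 and 4 points: offsets hold cumulative counts [3, 7].
points = list(range(7))
offsets = [3, 7]
print(naive_split(points, offsets, 2))         # second replica: 3 points, offset 7
print(offset_aware_split(points, offsets, 2))  # offsets match each replica's points
```

Even the offset-aware split would still need `neighbor_idx` recomputed or re-indexed per replica. That is one reason the repository's own multi-GPU path, DistributedDataParallel launched through `mp.spawn` (train.py line 84 and `torch/nn/parallel/distributed.py` in the traceback), where each process builds its own batch, is likely the simpler route than DataParallel.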
