Comments (11)

X-Lai commented on June 14, 2024

Thank you for your interest in our work. May I ask whether you have set the correct data path and the correct training GPUs?

from stratified-transformer.

wencc-ucas commented on June 14, 2024

Thank you for your prompt response. I have modified the data path and GPUs.

The error above occurs when I specify only one GPU. When I use multiple GPUs, the error is the following:

Traceback (most recent call last):
  File "train.py", line 547, in <module>
    main()
  File "train.py", line 84, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/mmvc/Congcong/Stratified-Transformer/train.py", line 308, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler)
  File "/home/mmvc/Congcong/Stratified-Transformer/train.py", line 380, in train
    output = model(feat, coord, offset, batch, neighbor_idx)
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmvc/Congcong/Stratified-Transformer/model/stratified_transformer.py", line 449, in forward
    feats, xyz, offset, feats_down, xyz_down, offset_down = layer(feats, xyz, offset)
  File "/home/mmvc/anaconda3/envs/pytorch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmvc/Congcong/Stratified-Transformer/model/stratified_transformer.py", line 291, in forward
    new_window_size = 2 * torch.tensor([self.window_size]*3).type_as(xyz).to(xyz.device)
RuntimeError: CUDA error: invalid device function

X-Lai commented on June 14, 2024

May I ask whether you have compiled the pointops in /lib? Can you also locate which line causes this error?
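
One general way to help locate the failing line: CUDA kernels are launched asynchronously, so the Python line shown in a traceback can be later than the call that actually failed. Setting the CUDA_LAUNCH_BLOCKING=1 environment variable before CUDA is initialized makes launches synchronous, so the traceback is more likely to point at the real call. The sketch below is only an illustration; the helper name enable_sync_cuda_errors is hypothetical and is not part of the repository.

```python
import os


def enable_sync_cuda_errors(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given mapping (default: os.environ).

    Hypothetical helper for illustration. It must run before anything
    initializes CUDA, otherwise the setting has no effect on this process.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this at the very top of the entry script, before CUDA is used.
enable_sync_cuda_errors()

# import torch  # import and run the training code only after the call above
```

Synchronous launches slow training down noticeably, so this is meant for debugging runs only.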

wencc-ucas commented on June 14, 2024

I only compiled pointops2, following your instructions.

With one GPU:
  File "/***/Stratified-Transformer/model/stratified_transformer.py", line 357, in forward
    feats = self.kpconv(xyz, xyz, neighbor_idx, feats)

With multiple GPUs:
  File "/***/Stratified-Transformer/model/stratified_transformer.py", line 291, in forward
    new_window_size = 2 * torch.tensor([self.window_size]*3).type_as(xyz).to(xyz.device)

X-Lai commented on June 14, 2024

The error may be caused by the KPConv provided by torch-points3d. Did you install it successfully? Can you double-check that torch-points3d works correctly?

wencc-ucas commented on June 14, 2024

Thanks, I have checked it. The KPConv provided by torch-points3d works well, and this error does not appear when I use multiple GPUs.

X-Lai commented on June 14, 2024

Can you run it successfully now? If you use one GPU, remember to add CUDA_VISIBLE_DEVICES=0 before your python command.
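
For cases where the variable cannot be set on the command line, the same restriction can be sketched from inside Python. This assumes the variable is set before anything initializes CUDA, since the list of visible devices is read at that point; restrict_visible_gpus is a hypothetical helper, not something from the repository.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to this process.

    Hypothetical helper for illustration. Returns the string that was
    written to CUDA_VISIBLE_DEVICES, e.g. "0" or "0,1".
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Equivalent in effect to prefixing the command with CUDA_VISIBLE_DEVICES=0.
restrict_visible_gpus([0])

# import torch  # import only after the variable has been set
```

Inside the process, the single visible GPU is then addressed as device 0, whatever its physical index.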

wencc-ucas commented on June 14, 2024

I have solved the single-GPU bug by modifying this.

But now I still get the error at line 291, both with one GPU and with multiple GPUs.

YangParky commented on June 14, 2024

> I have solved the single-GPU bug by modifying this.
>
> But now I still get the error at line 291, both with one GPU and with multiple GPUs.

Hi, how did you modify it?

basil-hayden commented on June 14, 2024

>> I have solved the single-GPU bug by modifying this.
>> But now I still get the error at line 291, both with one GPU and with multiple GPUs.
>
> Hi, how did you modify it?

model = torch.nn.DataParallel(model.cuda()) ----> model = model.cuda()

praj441 commented on June 14, 2024

>>> I have solved the single-GPU bug by modifying this.
>>> But now I still get the error at line 291, both with one GPU and with multiple GPUs.
>>
>> Hi, how did you modify it?
>
> model = torch.nn.DataParallel(model.cuda()) ----> model = model.cuda()

But that way we can't use multi-GPU training. I also get this error when using model = torch.nn.DataParallel(model.cuda()).

Any suggestions?
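
One possible way to keep both code paths, sketched under assumptions: apply the model = model.cuda() workaround only when a single GPU is visible and keep the DataParallel wrapper otherwise. This does not address the error reported with DataParallel on multiple GPUs; it only avoids dropping multi-GPU support from the script. The names choose_wrapping and wrap_model are hypothetical and do not come from the repository.

```python
def choose_wrapping(num_visible_gpus):
    """Decide how to place the model, given the number of visible GPUs.

    Hypothetical helper for illustration; returns "cpu", "single_gpu",
    or "data_parallel".
    """
    if num_visible_gpus < 0:
        raise ValueError("num_visible_gpus must be non-negative")
    if num_visible_gpus == 0:
        return "cpu"
    if num_visible_gpus == 1:
        return "single_gpu"
    return "data_parallel"


def wrap_model(model):
    """Apply the single-GPU workaround only when exactly one GPU is visible."""
    import torch  # imported lazily so choose_wrapping works without torch

    mode = choose_wrapping(torch.cuda.device_count())
    if mode == "cpu":
        return model
    if mode == "single_gpu":
        # Workaround from this thread: skip the DataParallel wrapper.
        return model.cuda()
    return torch.nn.DataParallel(model.cuda())
```

With CUDA_VISIBLE_DEVICES=0 set, torch.cuda.device_count() reports one device, so wrap_model takes the single-GPU branch.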
