yukichiii / swin3d_task

The Experiment Code for Swin3D

Home Page: https://arxiv.org/abs/2304.06906

Python 99.84% Shell 0.16%

swin3d_task's Issues

Fine-tuning on XYZ,NORM only

Thanks for providing this work!

As the title says, I am curious whether it is possible to fine-tune from your provided weights using XYZ and NORM only, since my data has no RGB available. Much of the power of Swin3D seems to lie in the pretraining, so I would like to make use of it.
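One approach commonly used when the input channels of a pretrained network change is to load only the checkpoint tensors whose names and shapes still match, and leave the rest (typically the first layer) at their fresh initialization. Whether this preserves enough of the Swin3D pretraining is not confirmed anywhere in this thread. The sketch below is a minimal illustration of that filtering step; the function name, parameter names, and shapes are hypothetical, and the stand-in objects only mimic the `.shape` attribute of a tensor.

```python
from types import SimpleNamespace


def select_compatible_weights(pretrained, target):
    """Split checkpoint entries into those that fit the target model and those that do not."""
    kept, skipped = {}, []
    for name, tensor in pretrained.items():
        # Keep an entry only if the target has the same name with the same shape.
        if name in target and tuple(tensor.shape) == tuple(target[name].shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical names and shapes: the stem takes 6 input channels in the
# checkpoint but only 3 in the new model, so it is skipped.
pretrained = {
    "stem.weight": SimpleNamespace(shape=(48, 6)),
    "block.weight": SimpleNamespace(shape=(48, 48)),
}
target = {
    "stem.weight": SimpleNamespace(shape=(48, 3)),
    "block.weight": SimpleNamespace(shape=(48, 48)),
}

kept, skipped = select_compatible_weights(pretrained, target)
print(sorted(kept), skipped)
```

With real PyTorch state dicts, the `kept` dictionary could then be passed to `model.load_state_dict(kept, strict=False)`, which tolerates the missing keys.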

setup.py compile ERROR

When I execute python setup.py install, some errors occur:

ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    setup(
  File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 167, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 153, in call_command
    self.run_command(cmdname)
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/opt/conda/lib/python3.8/distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
    self.build_extension(ext)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Here is one of the outputs:

[6/6] /usr/local/cuda/bin/nvcc  -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/
site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH 
-I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include 
-I/opt/conda/include/python3.8 -c -c /opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/self_attn_aio_bwd.cu 
-o /opt/data/private/Swin3D_Task-main/Swin3D/build/temp.linux-x86_64-3.8/Swin3D/src/attn/self_attn_aio_bwd.o 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 
-D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 
-DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '
-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 
-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 
-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /opt/data/private/Swin3D_Task-main/Swin3D/build/temp.linux-x86_64-3.8/Swin3D/src/attn/self_attn_aio_bwd.o 
/usr/local/cuda/bin/nvcc  -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/
site-packages/torch/include/torch/csrc/api/include 
-I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC 
-I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c 
-c /opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/self_attn_aio_bwd.cu 
-o /opt/data/private/Swin3D_Task-main/Swin3D/build/temp.linux-x86_64-3.8/Swin3D/src/attn/self_attn_aio_bwd.o 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 
-D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 
-DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '
-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 
-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 
-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/attn_utils.cuh(95): error: no operator "+=" matches these operands
            operand types are: __half2 += const __half2

/opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/attn_utils.cuh(96): error: no operator "+=" matches these operands
            operand types are: __half2 += const __half2

/opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/attn_utils.cuh(97): error: no operator "+=" matches these operands
            operand types are: __half2 += const __half2

/opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/attn_utils.cuh(106): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__half2 *, const __half2)
... ...
54 errors detected in the compilation of "/opt/data/private/Swin3D_Task-main/Swin3D/Swin3D/src/attn/self_attn_aio_bwd.cu".

How can I fix this? Thanks for your time!
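Two things in the nvcc command above are worth checking, though neither is confirmed as the cause in this thread. First, the build targets `compute_52`, while CUDA only provides `atomicAdd` for `__half2` on compute capability 6.x and higher. Second, the `-D__CUDA_NO_HALF_OPERATORS__` and `-D__CUDA_NO_HALF2_OPERATORS__` macros, which PyTorch's extension builder passes by default, stop `cuda_fp16.h` from declaring operators such as `+=`. The helper below is a small, hypothetical triage script (not part of the repository) that scans a pasted nvcc command for both conditions.

```python
import re

# Macros that, when defined, hide the overloaded __half / __half2 operators.
HALF_OPERATOR_GUARDS = ("__CUDA_NO_HALF_OPERATORS__", "__CUDA_NO_HALF2_OPERATORS__")


def diagnose_nvcc_flags(command, min_arch=60):
    """Report target architectures below min_arch and half-operator guard macros."""
    # Collect every virtual architecture named in a -gencode option.
    archs = sorted({int(a) for a in re.findall(r"arch=compute_(\d+)", command)})
    # Collect the names of all -D preprocessor definitions.
    defines = set(re.findall(r"-D(\w+)", command))
    return {
        "archs_below_min": [a for a in archs if a < min_arch],
        "half_operator_guards": [m for m in HALF_OPERATOR_GUARDS if m in defines],
    }


# Shortened version of the command from the log above.
command = (
    "nvcc -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF2_OPERATORS__ "
    "-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 "
    "-gencode=arch=compute_86,code=sm_86 -std=c++14"
)
result = diagnose_nvcc_flags(command)
print(result)
```

If the old architecture turns out to be the problem, PyTorch reads the `TORCH_CUDA_ARCH_LIST` environment variable when building extensions, so restricting it to the architecture of the installed GPU (for example `TORCH_CUDA_ARCH_LIST="8.6"`) before running the build may be worth trying.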

code and model for 3D detection task

Hi, thanks for your great work!
I am building on your work for some experiments on indoor-scene 3D detection, so I'm wondering when you will release the example code and models for the 3D detection task based on Swin3D.
I really appreciate your help and look forward to your reply.

ddp problem

It gets stuck here. How can I solve it?

NaN values produced during training

Hi, thank you for sharing the code!

During training on the ScanNet dataset, I noticed that NaN values can be produced in the forward call of the Swin3DUnet module: model(sp, coords_sp). A similar issue has been reported on this page. I tried setting fp16_mode=0 and use_amp=False, but the NaN values persist.

Upon further investigation, I have identified a potential source of these NaN values, which appears to be the self_attn_cuda_forward_device() function in the self_attn_aio_fwd.cu file. I tried to learn how to debug CUDA + Python code by referring to some documents, but it did not work out as intended.

I'd greatly appreciate it if you could shed some light on why these 'NaN' values are occurring and provide guidance on effectively debugging CUDA files.

Thank you and best regards.
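Before stepping into the CUDA source, it can help to narrow down on the Python side which submodule first emits a non-finite value. The following is a generic PyTorch sketch, not code from this repository: it registers forward hooks on every submodule and records the name of the first one whose output contains NaN or Inf. The `DivByZero` module and the toy model are made up for the demonstration, and the hook only inspects plain tensors, so outputs wrapped in other objects (sparse tensors, for instance) would need extra handling.

```python
import torch
import torch.nn as nn


def add_nan_hooks(model):
    """Record the first submodule whose forward output contains NaN or Inf."""
    report = {"first_bad_module": None}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            if report["first_bad_module"] is not None:
                return  # an earlier module was already flagged
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and t.is_floating_point() and not torch.isfinite(t).all():
                    report["first_bad_module"] = name
                    return
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    return report, handles


class DivByZero(nn.Module):
    """Toy module that deliberately produces non-finite values."""

    def forward(self, x):
        return x / torch.zeros_like(x)


model = nn.Sequential(nn.Linear(4, 4), DivByZero(), nn.Linear(4, 4))
report, handles = add_nan_hooks(model)
model(torch.randn(2, 4))
print(report["first_bad_module"])

for h in handles:
    h.remove()  # detach the hooks once the culprit is known
```

Once the offending module is known, running with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so any CUDA error is reported at the call that caused it rather than at a later one.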

Got error when training with S3DIS: "RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmEx`"

Thank you for sharing a promising model. I'm now testing this repo with S3DIS following the instructions in the repo, but I got the following error. It seems to be caused by a wrong input dimension. Do you have any idea how to solve this?

[09/09 09:28:13 main-logger]: #Model parameters: 26495773
[09/09 09:28:14 main-logger]: => no checkpoint found at 'runs/s3dis_Swin3D_RGB_S_123/model/model_last.pth'
[09/09 09:28:14 main-logger]: augmentation all
[09/09 09:28:14 main-logger]: jitter_sigma: 0.005, jitter_clip: 0.02
Totally 204 samples in train set.
Total repeated 204 samples in train set.
[09/09 09:28:14 main-logger]: train_data samples: '1224'
/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/utils/data/dataloader.py:563: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Totally 67 samples in val set.
Total repeated 67 samples in val set.
[09/09 09:28:14 main-logger]: scheduler: MultiStep. scheduler_update: epoch. milestones: [60, 80], gamma: 0.1
[09/09 09:28:14 main-logger]: lr: [0.0006, 5.9999999999999995e-05]
feat: torch.Size([80000, 3])
/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484683044/work/aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/modules/swin3d_layers.py:783: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
w_w_id // window_size // window_size,
/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/modules/swin3d_layers.py:784: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
w_w_id // window_size % window_size,
feats.shape: torch.Size([80000, 48])
Traceback (most recent call last):
File "SemanticSeg/train.py", line 916, in <module>
main()
File "SemanticSeg/train.py", line 114, in main
main_worker(args.train_gpu, args.ngpus_per_node, args)
File "SemanticSeg/train.py", line 510, in main_worker
loss_train, mIoU_train, mAcc_train, allAcc_train = train(
File "SemanticSeg/train.py", line 606, in train
output = model(feat, coord, batch)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Swin3D/SemanticSeg/model/Swin3D_RGB.py", line 70, in forward
return self.backbone(sp, coords_sp)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/models/Swin3D.py", line 140, in forward
sp, sp_down, coords_sp = layer(sp, coords_sp)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/modules/swin3d_layers.py", line 866, in forward
feats = blk(feats, attn_args_blk) # [N, C]
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/modules/swin3d_layers.py", line 622, in forward
feats = self.attn(feats, attn_args) # [N, c]
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/modules/swin3d_layers.py", line 504, in forward
qkv = self.qkv(feats)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

As far as I can tell, the window attention in the first block seems to expect 48-dimensional data, but it receives 80000-dimensional data.
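A note on reading the shapes: the traceback comments mark the features as `[N, C]`, and the log prints `feats.shape: torch.Size([80000, 48])`. Under that convention 80000 is the number of points and 48 is the channel width, and a dense layer such as `torch.nn.Linear` only constrains the last axis. The sketch below mirrors that shape rule in plain Python to show that the logged shape would be accepted; the function is illustrative only, and the output width of 144 (3 x 48 for a fused q/k/v projection) is an assumption.

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Return the output shape of a dense layer that treats the last axis as features."""
    if not input_shape or input_shape[-1] != in_features:
        raise ValueError(
            f"expected last dimension {in_features}, got shape {tuple(input_shape)}"
        )
    # All leading axes are batch-like and pass through unchanged.
    return tuple(input_shape[:-1]) + (out_features,)


# feats.shape comes from the log; out_features=144 is an assumed q/k/v width.
print(linear_output_shape((80000, 48), in_features=48, out_features=144))
# prints (80000, 144)
```

If the shapes are in fact consistent, the failure may lie elsewhere. The error message shows a half-precision GEMM (`CUDA_R_16F`), so comparing a run with fp16 disabled, or checking that the installed PyTorch and CUDA versions match, could be reasonable next steps.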
