swin3d's Introduction

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

Updates

27/04/2023

Initial commits:

  1. Pretrained models on Structured3D are provided.
  2. The code supporting semantic segmentation on ScanNet and S3DIS is provided.

Introduction

We present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin Transformer and is carefully designed to perform self-attention on sparse voxels efficiently, with linear memory complexity, and to capture the irregularity of point signals via a generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset, which is 10 times larger than ScanNet, and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks.

(teaser figure)

Overview

Data Preparation

We pretrained Swin3D on Structured3D; please refer to this link to prepare the data.

Pretrained Models

The models pretrained on Structured3D with different cRSE are provided here.

|          | Pretrain     | #params | cRSE         | mIoU(val) | Model | Log |
| -------- | ------------ | ------- | ------------ | --------- | ----- | --- |
| Swin3D-S | Structured3D | 23.57M  | XYZ,RGB      | 77.69     | model | log |
| Swin3D-S | Structured3D | 23.57M  | XYZ,RGB,NORM | 79.15     | model | log |
| Swin3D-L | Structured3D | 60.75M  | XYZ,RGB      | 79.79     | model | log |
| Swin3D-L | Structured3D | 60.75M  | XYZ,RGB,NORM | 81.04     | model | log |

Quick Start

Install the package with:

```bash
pip install -r requirements.txt
python setup.py install
```

Build the model and load our pretrained weights; then you can fine-tune the model on various tasks.

```python
import torch
from Swin3D.models import Swin3DUNet

model = Swin3DUNet(
    depths, channels, num_heads,
    window_sizes, quant_size, up_k=up_k,
    drop_path_rate=drop_path_rate, num_classes=num_classes,
    num_layers=num_layers, stem_transformer=stem_transformer,
    upsample=upsample, first_down_stride=down_stride,
    knn_down=knn_down, in_channels=in_channels,
    cRSE='XYZ_RGB_NORM', fp16_mode=1,
)
model.load_pretrained_model(ckpt_path)
```
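
The constructor arguments above come from the config files of the downstream task repo. For orientation only, here is a sketch with purely illustrative placeholder values (every value below is a hypothetical stand-in, not a released config):

```python
# All values are hypothetical placeholders -- take the real ones from the
# task repo's config matching the checkpoint you load.
depths = [2, 4, 9, 4, 4]            # transformer blocks per stage
channels = [48, 96, 192, 384, 384]  # feature width per stage
num_heads = [6, 6, 12, 24, 24]      # attention heads per stage
window_sizes = [5, 7, 7, 7, 7]      # sparse window size per stage
quant_size = 4                      # quantization granularity for cRSE
up_k = 3                            # k for KNN-based upsampling
drop_path_rate = 0.3                # stochastic depth rate
num_classes = 13                    # e.g. S3DIS has 13 classes
num_layers = 5                      # number of stages
stem_transformer = True             # use a transformer stem
upsample = "linear"                 # upsampling mode (hypothetical name)
down_stride = 2                     # stride of the first downsampling
knn_down = True                     # GridKNNDownsample vs. GridDownsample
in_channels = 9                     # e.g. XYZ + RGB + normals
ckpt_path = "Swin3D-L.pth"          # path to the downloaded checkpoint
```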

Results and Models

To reproduce our results on downstream tasks, please follow the code in this repo. The results are provided here.

ScanNet Segmentation

|          | Pretrained | mIoU(Val)  | mIoU(Test) |
| -------- | ---------- | ---------- | ---------- |
| Swin3D-S | ✗          | 75.2       | -          |
| Swin3D-S | ✓          | 75.6(76.8) | -          |
| Swin3D-L | ✓          | 76.2(77.5) | 77.9       |

S3DIS Segmentation

|          | Pretrained | Area 5 mIoU | 6-fold mIoU |
| -------- | ---------- | ----------- | ----------- |
| Swin3D-S | ✗          | 72.5        | 76.9        |
| Swin3D-S | ✓          | 73.0        | 78.2        |
| Swin3D-L | ✓          | 74.5        | 79.8        |

ScanNet 3D Detection

|                    | Pretrained | mAP@0.25 | mAP@0.50 |
| ------------------ | ---------- | -------- | -------- |
| Swin3D-S+FCAF3D    | ✓          | 74.2     | 59.5     |
| Swin3D-L+FCAF3D    | ✓          | 74.2     | 58.6     |
| Swin3D-S+CAGroup3D | ✓          | 76.4     | 62.7     |
| Swin3D-L+CAGroup3D | ✓          | 76.4     | 63.2     |

S3DIS 3D Detection

|                 | Pretrained | mAP@0.25 | mAP@0.50 |
| --------------- | ---------- | -------- | -------- |
| Swin3D-S+FCAF3D | ✓          | 69.9     | 50.2     |
| Swin3D-L+FCAF3D | ✓          | 72.1     | 54.0     |

Citation

If you find Swin3D useful to your research, please cite our work:

```bibtex
@misc{yang2023swin3d,
      title={Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding},
      author={Yu-Qi Yang and Yu-Xiao Guo and Jian-Yu Xiong and Yang Liu and Hao Pan and Peng-Shuai Wang and Xin Tong and Baining Guo},
      year={2023},
      eprint={2304.06906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```


swin3d's Issues

How do I run this network with my own data?

Dear all, thank you for publishing this work.
I am curious how to use this network on my own data. For example, say I have a point cloud of chairs with no ground truth; I just want to run an inference script that gives me a list of segmented objects.
What would be the best way to proceed with your network? Any help would be very much appreciated.
Thank you for the contribution!

Low Result without pretrained pth

@Yukichiii I applied epochs = 100 // loop = 30 and eval_freq = 2. On Area 5 I get:

  • Swin3D-S: 69.76
  • Swin3D-L: 69.79

Is it possible to provide the log? For the tests, do you use num_vote = 12? For training, did you change any parameters relative to the code at https://github.com/Yukichiii/Swin3D_Task?
Is the result in your paper, Swin3D-S | ✗ | 72.5 (without pretrained weights), obtained with RGB or RGB + normals?
Thanks.

Originally posted by @hpc100 in #16 (comment)

S3DIS results

Thanks for sharing your work! Is it possible to provide the Swin3D-L (without pretraining) results and the log for Swin3D-S (without pretraining)?
I ran these two experiments with 4 GPUs and 100 epochs, and I get:

  • for Swin3D-S (without pretraining): mIoU = 68.0 on Area 5 validation (4 points lower than your result of 72.5)
  • for Swin3D-L (without pretraining): mIoU = 69.4 on Area 5 validation

For both models pretrained on Structured3D, I reproduce your performance: 73.0 for Swin3D-S and 74.5 for Swin3D-L.
In the paper you announce 3000 epochs to train S3DIS from scratch. Is the number of epochs 3000 or 100 when training without the weights pretrained on Structured3D?
If the number is different from 100, which hyperparameters did you use (eval_freq = 2 or higher to save time, ...)?
Thanks.

ScanNet v2 - Normals

Thanks for providing this work!

Why use Swin3D_RGB_N.py when PointGroup's preprocessing code does not provide normals?
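
If normals are needed but the preprocessing does not produce them, one workable approach (an assumption, not necessarily the authors' pipeline) is to estimate them from the point cloud, e.g. with Open3D:

```python
# Hedged sketch: estimate per-point normals with Open3D when the dataset
# preprocessing does not provide them. The file name is hypothetical.
import numpy as np
import open3d as o3d

xyz = np.load("scene_xyz.npy")                 # (N, 3) point positions
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(xyz)
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamKNN(knn=30)  # plane fit on 30-NN
)
pcd.orient_normals_consistent_tangent_plane(k=30)  # make orientations coherent
normals = np.asarray(pcd.normals)              # (N, 3) unit normals
```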

S3DIS 6-fold

Can you provide the 6-fold evaluation code for S3DIS? I would appreciate it very much!

Feature extraction for point clouds

Hi, thanks for your great work. I am trying to use your Swin3D as a frozen feature extractor for new 3D point clouds, like how people use CLIP/DINOv2 for images.

Could you share some simple scripts or ideas about how we can achieve this with your codebase? That would be very helpful.

model.eval() causing nan values

Thanks for sharing your work!
@Yukichiii @yuxiaoguo I tried to test your code on Semantic3D. In the validation step, I get "nan" values in the output.

  • I checked the point cloud data, and there are no "nan" values in the npy files.
  • I checked the parameters with print('Nan values?', [k for k, v in model.named_parameters() if torch.isnan(v).any()]). There are no nan values in either train (model.train()) or validation (model.eval()) mode.
  • torch.where(torch.isnan(coord)), torch.where(torch.isnan(feat)) and torch.where(torch.isnan(batch)) all return empty results.
  • So: there are nan values neither in the data nor in the weights/biases, yet the output is filled with nan.

Do you have any idea where the problem could come from (layer norm, ...)?

Details about Using Swin3D Backbone with CAGroup in 3D Detection

Hi! I am curious about the training details for the Swin3D encoder with CAGroup3D. Since CAGroup3D uses a repeated ScanNet dataset during training, I am wondering whether your dataset implementation differs. Additionally, when using the 48-dimensional upsampling output as the CAGroup3D input, CUDA memory overflows even on a 24GB 4090 GPU, despite setting the batch size to 1. Could you provide some insights into the training environment for the 3D detection task? Thank you!

Code for fine-tuning on downstream tasks

According to Sec. 5.2 of the paper, it seems that for fine-tuning on the 3D detection task I just need to replace the backbone in the FCAF3D and CAGroup3D code repositories with Swin3D and load the weights. Is that correct?

As for fine-tuning on the semantic segmentation task on S3DIS and ScanNet, the paper doesn't mention which code repository it was developed on. Could you please consider releasing the fine-tuning code?

error

```
Traceback (most recent call last):
  File "***/sparse_dl/attn/attn_coff.py", line 12, in <module>
    import sparse_dl.attn_cuda as attn_module
ModuleNotFoundError: No module named 'sparse_dl.attn_cuda'
```

How to prepare custom data?

Hi, thanks for sharing your great work! I was trying to use the pretrained model to segment my own point cloud data, with only the position and color of each point provided. I expect the model to output a segmentation label for each point. How should I prepare my data? Can I just input the coords and colors of the points into the model?
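
In the meantime, here is a hedged sketch of one plausible way to shape raw points into the (feat, coord, batch) tensors consumed by the task repo's segmentation wrapper (the call output = model(feat, coord, batch) appears in a traceback further down this page; the color scaling, feature layout, and file name here are assumptions that must be matched to the training config):

```python
# Hedged sketch, not the official preprocessing. Every constant below is an
# assumption -- align it with the config used to train the checkpoint.
import numpy as np
import torch

points = np.load("my_scene.npy")                 # (N, 6): xyz + rgb, hypothetical file
coord = torch.from_numpy(points[:, :3]).float()  # continuous point coordinates
rgb = torch.from_numpy(points[:, 3:6]).float() / 127.5 - 1.0  # assumed [-1, 1] scaling
feat = torch.cat([coord, rgb], dim=1)            # per-point input features
batch = torch.zeros(coord.shape[0], dtype=torch.long)  # one scene -> batch id 0

# logits = model(feat, coord, batch)   # (N, num_classes); labels = logits.argmax(1)
```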

Question about Memory-efficient self-attention

Hi, this is nice work on applying the Swin Transformer to point clouds. However, I really don't understand the memory-efficient self-attention:

$$f_{i,h}^{*}=\frac{\sum_{j=1}^{N}\exp(e_{ij,h})\,f_{j}W_{V,h}}{\sum_{j=1}^{N}\exp(e_{ij,h})} \qquad (3)$$

How should I understand the idea of postponing the SoftMax normalization so that the weights $\{\alpha_{ij,h}\}$ never need to be constructed and stored explicitly? Computing the denominator and numerator of Eq. (3) simultaneously is also hard for me to fully understand.

Could you please give me some tips on how to grasp the idea of memory-efficient self-attention?
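
For intuition, here is a minimal dense PyTorch sketch of what Eq. (3) enables: the numerator and denominator are accumulated together in one streaming pass over the keys, so the normalized weights $\{\alpha_{ij,h}\}$ are never materialized. This is only the math, not the repository's sparse CUDA kernel, and it omits the usual max-subtraction for numerical stability:

```python
import torch

def streaming_attention(e, v):
    """Eq. (3) without storing the (N, N) attention matrix.

    e: (N, N) unnormalized logits e_{ij} for one head
    v: (N, C) value vectors f_j @ W_{V,h}
    """
    num = torch.zeros_like(v)             # running numerator,   (N, C)
    den = torch.zeros(v.shape[0], 1)      # running denominator, (N, 1)
    for j in range(v.shape[0]):           # stream over keys/values
        w = torch.exp(e[:, j : j + 1])    # exp(e_{ij}) for every query i
        num += w * v[j : j + 1]           # += exp(e_{ij}) * f_j W_{V,h}
        den += w                          # += exp(e_{ij})
    return num / den                      # SoftMax normalization, postponed to the end

# Sanity check against the standard formulation:
e, v = torch.randn(8, 8), torch.randn(8, 4)
assert torch.allclose(streaming_attention(e, v), torch.softmax(e, dim=1) @ v, atol=1e-5)
```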

MemoryError: std::bad_alloc: cudaErrorMemoryAllocation: out of memory

When I use 2 GPUs to train the S3DIS segmentation, it fails with:
```
Traceback (most recent call last):
  File "train.py", line 919, in <module>
    main()
  File "train.py", line 114, in main
    mp.spawn(
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/data/zhangqi/HZ/Swin3D_Task-main/SemanticSeg/train.py", line 514, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(
  File "/data/zhangqi/HZ/Swin3D_Task-main/SemanticSeg/train.py", line 609, in train
    output = model(feat, coord, batch)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/zhangqi/HZ/Swin3D_Task-main/SemanticSeg/model/Swin3D_RGB.py", line 70, in forward
    return self.backbone(sp, coords_sp)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/models/Swin3D.py", line 132, in forward
    sp = self.stem_layer(sp)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/Swin3D-0.0.0-py3.8-linux-x86_64.egg/Swin3D/modules/mink_layers.py", line 77, in forward
    x = self.conv_layers(x)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiConvolution.py", line 314, in forward
    outfeat = self.conv.apply(
  File "/home/lthpc/.conda/envs/Swin3d/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiConvolution.py", line 72, in forward
    return fw_fn(
MemoryError: std::bad_alloc: cudaErrorMemoryAllocation: out of memory
```

Does anyone know how to fix it?
I can train on a single GPU by setting train_gpu to [0] in Swin3D_Task-main/SemanticSeg/config/s3dis/swin3D_RGB_L.yaml.

How to apply Swin3D as backbone to detect 3d objects?

Hey! I want to know how to apply Swin3D for 3D object detection. The provided code seems to be designed for semantic segmentation; can it be directly used as a backbone for object detection?

Cannot access a tensor

Hello, I'm facing a pretty weird problem with the Swin3DUNet. I can run my training code for around 20 epochs on a V100 GPU, but eventually I get the error below (I set export CUDA_LAUNCH_BLOCKING=1 when running the code):

  File "/home/chenz0f/Swin-3D/decoder/backbone/swin3dunet.py", line 212, in forward
    sp, sp_down, coords_sp = layer(sp, coords_sp)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenz0f/Swin-3D/decoder/backbone/modules/swin3d_layers.py", line 866, in forward
    sp_down, coords_sp = self.downsample(sp, coords_sp)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenz0f/Swin-3D/decoder/backbone/modules/swin3d_layers.py", line 302, in forward
    feats = query_knn_feature(self.k, xyz, n_xyz, sp.F, offset, n_offset)
  File "/home/chenz0f/Swin-3D/decoder/backbone/modules/swin3d_layers.py", line 45, in query_knn_feature
    grouped_feat = src_feat[idx.view(-1).long(), :].view(m, K, c)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

This error would occur together with several lines of CUDA assertion error message:

```
/opt/conda/conda-bld/pytorch_1678402411778/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2318,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1678402411778/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2318,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
... (the same assertion repeats for threads [2,0,0] through [9,0,0])
```

However, when I tried to check what was wrong, I was only able to inspect the shapes of idx and src_feat. If I try to print their values, the CUDA assert error is also triggered:

  File "/home/chenz0f/Swin-3D/decoder/backbone/modules/swin3d_layers.py", line 308, in forward
    feats = query_knn_feature(self.k, xyz, n_xyz, sp.F, offset, n_offset)
  File "/home/chenz0f/Swin-3D/decoder/backbone/modules/swin3d_layers.py", line 48, in query_knn_feature
    print("idx:", idx)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor.py", line 426, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor_str.py", line 636, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor_str.py", line 567, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor_str.py", line 327, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor_str.py", line 361, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor_str.py", line 361, in <listcomp>
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/chenz0f/anaconda3/envs/v100/lib/python3.10/site-packages/torch/_tensor_str.py", line 353, in get_summarized_data
    return torch.cat(
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I've also tried using detach() and cpu(), but they all give the same error. May I ask if you've had similar problems before, or do you know what might have caused it?
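
(For anyone else hitting this, a hedged debugging sketch: once a device-side assert fires, tensors on that GPU can no longer be read, so check the indexing on CPU before the failing line; variable names follow the query_knn_feature frame above.)

```python
# Hedged debugging sketch: validate idx on CPU before
# `src_feat[idx.view(-1).long(), :]` runs on the GPU.
idx_cpu = idx.detach().cpu().long()
n = src_feat.shape[0]
bad = (idx_cpu < 0) | (idx_cpu >= n)   # indices the CUDA kernel would reject
if bad.any():
    print(f"{int(bad.sum())} KNN indices out of range [0, {n})")
    print("offending values:", idx_cpu[bad].unique())
```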

ModuleNotFoundError: No module named 'Swin3D.sparse_dl.attn_cuda'

Thanks for your work!
But when I run:

```python
import torch
from Swin3D.models import Swin3DUNet

model = Swin3DUNet(
    depths, channels, num_heads,
    window_sizes, quant_size, up_k=up_k,
    drop_path_rate=drop_path_rate, num_classes=num_classes,
    num_layers=num_layers, stem_transformer=stem_transformer,
    upsample=upsample, first_down_stride=down_stride,
    knn_down=knn_down, in_channels=in_channels,
    cRSE='XYZ_RGB_NORM', fp16_mode=1,
)
model.load_pretrained_model(ckpt_path)
```

I get this error: ModuleNotFoundError: No module named 'Swin3D.sparse_dl.attn_cuda'
What can I do?

Problem when predicting sample data input.npz

Hi, I am working with your source code and ran into a problem. With your sample data "input.npz", I got the results below, which look like a mess. I don't know where I went wrong, so could you please share your code for making predictions on one's own point cloud? Thank you very much.

Note:
image1 is the point cloud of "input.npz"
image2 is the prediction
image3 is the points with label "wall"


GPU memory and training cost on ScanNet_v2

Thanks for your inspiring work! I noticed that in the main paper the pretraining stage on Structured3D costs 488 and 703 GPU hours for Swin3D-S and Swin3D-L, respectively. May I also know the approximate GPU hours and memory cost when training from scratch on ScanNet v2, which leads to 75.2 (Swin3D-S*) / 74.2 (Swin3D-L*) val mIoU?

Request for input of inference

Hello, I'm trying to run segmentation.py and found that the input file examples/input.npz is not in this repo. Could you please upload it?

vote?

Great work! I am now following your work for my own research, but I don't understand a sentence from the paper: "On the ScanNet benchmark (test dataset), we ensembled the results of three trained models by voting the prediction on over-segmented meshes."
You set vote_num=12, but here it is 3?

My understanding, which may not be correct: vote_num means the same trained weights with different inputs, while "three trained models" means the same model with three different sets of trained weights; every trained model votes with vote_num=12, therefore 3 × 12 = 36?
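
If that reading is right, the two-level voting might look roughly like this toy sketch (an interpretation of the quoted sentence, not the authors' script; the over-segmented-mesh projection step is omitted):

```python
# Toy sketch: 3 independently trained models x vote_num=12 augmented passes
# each = 36 summed votes per point before the final argmax.
import torch

def ensemble_predict(models, augment, points, vote_num=12):
    logits = None
    for model in models:                  # three sets of trained weights
        for _ in range(vote_num):         # twelve augmented inputs per model
            out = model(augment(points))  # (N, num_classes) per-point logits
            logits = out if logits is None else logits + out
    return logits.argmax(dim=-1)          # vote by summed logits
```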

Does the pretrained model apply for both GridKNNDownsample and GridDownsample?

Hello, I have a question about the pretrained model.

When I set knn_down to False for Swin3DUNet, the IoU was pretty low even after loading the pretrained model. Does the pretrained model apply to both GridKNNDownsample and GridDownsample, or only GridKNNDownsample?

If it's the latter, may I ask for a GridDownsample version? Thanks.
