
This project forked from microsoft/swin3d

A shift-window based transformer for 3D sparse tasks

License: MIT License

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

Updates

27/04/2023

Initial commits:

  1. Pretrained models on Structured3D are provided.
  2. Supporting code for semantic segmentation on ScanNet and S3DIS is provided.

Introduction

We present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin transformer and is carefully designed to conduct self-attention efficiently on sparse voxels with linear memory complexity and to capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset, which is 10 times larger than the ScanNet dataset, and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks.
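As a rough illustration of the window idea behind a Swin-style backbone on sparse voxels, the sketch below groups only the occupied voxels into cubic windows, once on a regular grid and once on a shifted grid. This is not the implementation in this repository: the function name `partition_into_windows`, its parameters, and the toy coordinates are all hypothetical, and the shift is modelled as a simple coordinate offset.

```python
from collections import defaultdict


def partition_into_windows(voxel_coords, window_size, shift=0):
    """Group occupied voxel coordinates by the cubic window they fall into.

    voxel_coords: iterable of integer (x, y, z) tuples for non-empty voxels.
    window_size:  edge length of each window, in voxels.
    shift:        offset added before bucketing (0 for the regular grid,
                  e.g. window_size // 2 for the shifted grid).
    """
    windows = defaultdict(list)
    for x, y, z in voxel_coords:
        key = (
            (x + shift) // window_size,
            (y + shift) // window_size,
            (z + shift) // window_size,
        )
        windows[key].append((x, y, z))
    return dict(windows)


# Toy input: five occupied voxels in an 8 x 8 x 8 grid (hypothetical data).
voxels = [(0, 0, 0), (1, 3, 2), (4, 0, 0), (7, 7, 7), (3, 3, 3)]
regular = partition_into_windows(voxels, window_size=4)
shifted = partition_into_windows(voxels, window_size=4, shift=2)

for key, members in sorted(regular.items()):
    print("regular window", key, "->", members)
for key, members in sorted(shifted.items()):
    print("shifted window", key, "->", members)
```

Attention would then be computed among the voxels inside each window, so empty space costs nothing; alternating between the regular and shifted grids lets information cross window boundaries.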

Overview

Data Preparation

We pretrained Swin3D on Structured3D; please refer to this link to prepare the data.

Pretrained Models

The models pretrained on Structured3D with different cRSE are provided here.

| | Pretrain | #params | cRSE | mIoU (val) | Model | Log |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB | 77.69 | model | log |
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB,NORM | 79.15 | model | log |
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB | 79.79 | model | log |
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB,NORM | 81.04 | model | log |

Quick Start

Install the package using

pip install -r requirements.txt
python setup.py install

Build the model and load our pretrained weights; you can then fine-tune the model on various tasks.

import torch
from Swin3D.models import Swin3DUNet

# depths, channels, num_heads, etc. are placeholders: define them to match
# the configuration of the checkpoint you load (Swin3D-S or Swin3D-L).
model = Swin3DUNet(
    depths, channels, num_heads,
    window_sizes, quant_size, up_k=up_k,
    drop_path_rate=drop_path_rate, num_classes=num_classes,
    num_layers=num_layers, stem_transformer=stem_transformer,
    upsample=upsample, first_down_stride=down_stride,
    knn_down=knn_down, in_channels=in_channels,
    cRSE='XYZ_RGB_NORM', fp16_mode=1,
)
# Load the pretrained weights from the checkpoint file at ckpt_path.
model.load_pretrained_model(ckpt_path)

Results and models

To reproduce our results on downstream tasks, please follow the code in this repo. The results are provided here.

ScanNet Segmentation

| | Pretrained | mIoU (Val) | mIoU (Test) |
| :-- | :-- | :-- | :-- |
| Swin3D-S | | 75.2 | - |
| Swin3D-S | | 75.6 (76.8) | - |
| Swin3D-L | | 76.2 (77.5) | 77.9 |
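For readers unfamiliar with the metric, mIoU is the per-class intersection over union averaged across classes. The sketch below computes it from flat lists of predicted and ground-truth labels. It is a minimal illustration, not the evaluation code used for these results: the function name `mean_iou` and the toy labels are hypothetical, and benchmark details such as ignored labels are left out.

```python
def mean_iou(pred_labels, true_labels, num_classes):
    """Mean intersection-over-union across classes that occur at least once."""
    intersections = [0] * num_classes
    unions = [0] * num_classes
    for pred, true in zip(pred_labels, true_labels):
        if pred == true:
            # A correct point belongs to both the intersection and the union.
            intersections[true] += 1
            unions[true] += 1
        else:
            # A wrong point enlarges the union of both classes involved.
            unions[pred] += 1
            unions[true] += 1
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
true = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, true, num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

The tables report this value as a percentage, which is why the example multiplies by 100 when printing.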

S3DIS Segmentation

| | Pretrained | Area 5 mIoU | 6-fold mIoU |
| :-- | :-- | :-- | :-- |
| Swin3D-S | | 72.5 | 76.9 |
| Swin3D-S | | 73.0 | 78.2 |
| Swin3D-L | | 74.5 | 79.8 |

ScanNet 3D Detection

| | Pretrained | mAP@0.25 | mAP@0.5 |
| :-- | :-- | :-- | :-- |
| Swin3D-S+FCAF3D | | 74.2 | 59.5 |
| Swin3D-L+FCAF3D | | 74.2 | 58.6 |
| Swin3D-S+CAGroup3D | | 76.4 | 62.7 |
| Swin3D-L+CAGroup3D | | 76.4 | 63.2 |

S3DIS 3D Detection

| | Pretrained | mAP@0.25 | mAP@0.5 |
| :-- | :-- | :-- | :-- |
| Swin3D-S+FCAF3D | | 69.9 | 50.2 |
| Swin3D-L+FCAF3D | | 72.1 | 54.0 |
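3D detection results of this kind are conventionally reported as mean average precision at two 3D IoU thresholds, 0.25 and 0.5: a predicted box counts as a match only if its IoU with a ground-truth box reaches the threshold. The sketch below shows that IoU test under the assumption of axis-aligned boxes. The function name `box_iou_3d`, the box layout, and the example boxes are hypothetical and are not taken from the detection code referenced here.

```python
def box_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis, so no overlap at all
        intersection *= high - low

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


# Two 2 x 2 x 2 boxes offset by one unit along x (hypothetical example).
prediction = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
ground_truth = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
iou = box_iou_3d(prediction, ground_truth)
print(f"IoU = {iou:.3f}")
print("match at 0.25:", iou >= 0.25)
print("match at 0.5: ", iou >= 0.5)
```

In this example the prediction would count as correct at the looser threshold but not at the stricter one, which is why the second metric column is consistently lower than the first.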

Citation

If you find Swin3D useful to your research, please cite our work:

@misc{yang2023swin3d,
      title={Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding}, 
      author={Yu-Qi Yang and Yu-Xiao Guo and Jian-Yu Xiong and Yang Liu and Hao Pan and Peng-Shuai Wang and Xin Tong and Baining Guo},
      year={2023},
      eprint={2304.06906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
