This project forked from sosppxo/3d-stmn

The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation"

License: Apache License 2.0

3D-STMN

🔥This branch is for end-to-end training (about 31G of GPU RAM is needed). To save GPU RAM by preprocessing features before training, please switch to the feat branch (only 7G of GPU RAM is needed for training).🔥

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation

Changli Wu, Yiwei Ma, Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji*, Xiaoshuai Sun

Introduction

In 3D Referring Expression Segmentation (3D-RES), earlier approaches adopt a two-stage paradigm: they extract segmentation proposals and then match them with referring expressions. This paradigm faces significant challenges, most notably low-quality initial proposals and slow inference. To address these limitations, we introduce an end-to-end Superpoint-Text Matching Network (3D-STMN) enriched by dependency-driven insights. A keystone of our model is the Superpoint-Text Matching (STM) mechanism. Unlike traditional methods that work through instance proposals, STM directly correlates linguistic indications with their respective superpoints, which are clusters of semantically related points. This design lets the model efficiently harness cross-modal semantic relationships by leveraging densely annotated superpoint-text pairs rather than the sparser instance-text pairs. To strengthen the role of text in guiding segmentation, we further incorporate the Dependency-Driven Interaction (DDI) module, which deepens the network's semantic comprehension of referring expressions. Guided by dependency trees, this module discerns the relationships between primary terms and their associated descriptors in expressions, improving both the localization and segmentation capacities of the model. Comprehensive experiments on the ScanRefer benchmark show that our model not only sets new performance standards, with an mIoU gain of 11.7 points, but also achieves a substantial improvement in inference speed, surpassing traditional methods by 95.7 times.
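As a rough illustration of the superpoint-text matching idea described above, the sketch below averages per-point features within each superpoint, scores every superpoint against a single text feature with a dot product, and expands the selected superpoints back to a per-point mask. This is not the network from the paper: the function names, the dot-product score, and the fixed threshold are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def pool_superpoints(point_feats: Sequence[Sequence[float]],
                     superpoint_ids: Sequence[int]) -> Dict[int, List[float]]:
    """Average the features of all points that share a superpoint id."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for feat, sp in zip(point_feats, superpoint_ids):
        if sp not in sums:
            sums[sp] = [0.0] * len(feat)
        sums[sp] = [s + f for s, f in zip(sums[sp], feat)]
        counts[sp] += 1
    return {sp: [s / counts[sp] for s in total] for sp, total in sums.items()}


def match_superpoints(sp_feats: Dict[int, List[float]],
                      text_feat: Sequence[float],
                      threshold: float = 0.0) -> Set[int]:
    """Select superpoints whose dot product with the text feature exceeds
    the threshold (hypothetical scoring rule, for illustration only)."""
    selected = set()
    for sp, feat in sp_feats.items():
        score = sum(a * b for a, b in zip(feat, text_feat))
        if score > threshold:
            selected.add(sp)
    return selected


def superpoints_to_point_mask(superpoint_ids: Sequence[int],
                              selected: Set[int]) -> List[bool]:
    """Expand a set of selected superpoints into a per-point boolean mask."""
    return [sp in selected for sp in superpoint_ids]


# Toy data: four points in three superpoints, 2-D features.
point_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
superpoint_ids = [0, 0, 1, 2]
text_feat = [1.0, 0.0]

sp_feats = pool_superpoints(point_feats, superpoint_ids)
selected = match_superpoints(sp_feats, text_feat)
mask = superpoints_to_point_mask(superpoint_ids, selected)
```

Working at the superpoint level means the matching step scores a few clusters rather than every point, and the final mask is recovered by a simple lookup.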

Installation

Requirements

  • Python 3.7 or higher
  • PyTorch 1.12
  • CUDA 11.3 or higher

The following installation steps assume python=3.8, pytorch=1.12.1, and cuda=11.3.

  • Create a conda virtual environment

    conda create -n 3d-stmn python=3.8
    conda activate 3d-stmn
    
  • Clone the repository

    git clone https://github.com/sosppxo/3D-STMN.git
    
  • Install the dependencies

    Install PyTorch 1.12.1

    pip install spconv-cu113
    conda install pytorch-scatter -c pyg
    pip install -r requirements.txt
    

    Install segmentator from this repo (We wrap the segmentator in ScanNet).

    Install the Stanford CoreNLP toolkit from the official website.

  • Set up and install stmn and pointgroup_ops.

    sudo apt-get install libsparsehash-dev
    python setup.py develop
    cd stmn/lib/
    python setup.py develop
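
After installation, a quick check of what ended up in the environment can save a failed training run. The following is a minimal sketch that reports the Python version and the installed versions of the main dependencies. The distribution names (`torch`, `spconv-cu113`, `torch-scatter`) are assumptions based on the install commands above and may differ in your environment; `importlib.metadata` requires Python 3.8 or newer.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Sequence

# Assumed distribution names, derived from the install commands above.
EXPECTED_DISTRIBUTIONS = ("torch", "spconv-cu113", "torch-scatter")


def python_ok(minimum=(3, 7)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= tuple(minimum)


def installed_versions(
    names: Sequence[str] = EXPECTED_DISTRIBUTIONS,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The check reads package metadata only, so it does not import PyTorch or touch the GPU.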
    

Data Preparation

ScanNet v2 dataset

Download the ScanNet v2 dataset.

Place the downloaded scans folder as follows.

3D-STMN
├── data
│   ├── scannetv2
│   │   ├── scans

Split and preprocess point cloud data

cd data/scannetv2
bash prepare_data.sh

The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like this.

3D-STMN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   ├── val

ScanRefer dataset

Download ScanRefer annotations following the instructions.

Place the downloaded ScanRefer folder as follows.

3D-STMN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train.json
│   │   ├── ScanRefer_filtered_val.json
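
Before moving on, it can help to confirm that the folders and annotation files are where the layouts above expect them. The sketch below is one way to do that, assuming it is run from the repository root. The helper name `missing_paths` is hypothetical and not part of the repo.

```python
from pathlib import Path
from typing import List, Sequence

# Paths taken from the directory layouts shown above, relative to the repo root.
EXPECTED_PATHS = (
    "data/scannetv2/scans",
    "data/scannetv2/train",
    "data/scannetv2/val",
    "data/ScanRefer/ScanRefer_filtered_train.json",
    "data/ScanRefer/ScanRefer_filtered_val.json",
)


def missing_paths(repo_root: str,
                  expected: Sequence[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```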

Preprocess textual data

python data/features/save_graph.py --split train --data_root data/ --max_len 78
python data/features/save_graph.py --split val --data_root data/ --max_len 78
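
The internals of save_graph.py are not described here. Purely as an illustration of what a dependency graph padded to a fixed length (such as `--max_len 78`) can look like, the sketch below turns a list of head indices into a symmetric adjacency matrix. The input format, the self-loops, and the function name are assumptions, not the script's actual output format.

```python
from typing import List, Sequence


def dependency_adjacency(heads: Sequence[int], max_len: int = 78) -> List[List[int]]:
    """Build a padded, symmetric adjacency matrix from dependency heads.

    heads[i] is the index of token i's head, or -1 if token i is the root.
    Tokens beyond max_len are dropped; unused rows and columns stay zero.
    """
    adj = [[0] * max_len for _ in range(max_len)]
    n = min(len(heads), max_len)
    for child in range(n):
        adj[child][child] = 1  # self-loop (a design choice in this sketch)
        head = heads[child]
        if 0 <= head < n:
            adj[child][head] = 1
            adj[head][child] = 1
    return adj


# "the brown chair": both "the" and "brown" depend on "chair", the root.
heads = [2, 2, -1]
adj = dependency_adjacency(heads, max_len=4)
```

Padding every expression to the same size lets graphs for different sentences be stacked into one batch.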

Pretrained Backbone

Download the SPFormer pretrained model (we only use the Sparse 3D U-Net backbone for training).

Move the pretrained model to backbones.

mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/

Training

For single GPU (32G):

bash scripts/train.sh

For multi-GPU (11G * 4 or 24G * 2):

bash scripts/train_multi_gpu.sh
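
The choice between the two scripts follows from the GPU configurations listed above. As a small sketch, the hypothetical helper below encodes those rules using nominal card sizes in GB. Real devices usually report slightly less usable memory than their nominal size, so treat the thresholds as assumptions.

```python
from typing import Optional, Sequence


def pick_train_script(gpu_mem_gb: Sequence[float]) -> Optional[str]:
    """Suggest a training script from nominal per-GPU memory sizes (GB).

    Returns None when no listed configuration fits; in that case the
    feat branch, which needs far less GPU RAM, is the alternative.
    """
    at_least_24 = sum(1 for m in gpu_mem_gb if m >= 24)
    at_least_11 = sum(1 for m in gpu_mem_gb if m >= 11)
    if at_least_24 >= 2 or at_least_11 >= 4:
        return "scripts/train_multi_gpu.sh"
    if any(m >= 32 for m in gpu_mem_gb):
        return "scripts/train.sh"
    return None


print(pick_train_script([32]))
print(pick_train_script([24, 24]))
```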

Inference

Download the pretrained 3D-STMN model and move it to checkpoints.

bash scripts/test.sh
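
The Introduction reports results as mIoU. For readers unfamiliar with the metric, here is a minimal sketch of intersection-over-union between a predicted and a ground-truth point mask, averaged over samples. It is the standard definition written in plain Python, not the repository's evaluation code, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def mask_iou(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """Intersection over union of two per-point boolean masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of points")
    intersection = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty masks have no union; this sketch scores that case as 0.0.
    return intersection / union if union else 0.0


def mean_iou(pairs: Iterable[Tuple[Sequence[bool], Sequence[bool]]]) -> float:
    """Average IoU over (prediction, ground truth) pairs."""
    ious = [mask_iou(pred, gt) for pred, gt in pairs]
    return sum(ious) / len(ious) if ious else 0.0


samples = [
    ([True, True, False, False], [True, False, False, False]),  # IoU 0.5
    ([False, True, True, False], [False, True, True, False]),   # IoU 1.0
]
miou = mean_iou(samples)
```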

Citation

If you find this work useful in your research, please cite:

@misc{2308.16632,
Author = {Changli Wu and Yiwei Ma and Qi Chen and Haowei Wang and Gen Luo and Jiayi Ji and Xiaoshuai Sun},
Title = {3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation},
Year = {2023},
Eprint = {arXiv:2308.16632},
}

Acknowledgement

Sincere thanks to the SoftGroup, SSTNet, and SPFormer repos. This repo is built upon them.
