rozdavid / unscene3d

Unsupervised 3D Instance Segmentation

License: BSD 3-Clause "New" or "Revised" License

Python 75.12% Shell 1.45% CMake 0.01% C++ 9.27% Cuda 9.84% C 4.32%

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

Implementation for our CVPR 2024 paper

UnScene3D is a fully unsupervised 3D instance segmentation method that generates pseudo masks from self-supervised color and geometry features and refines them via self-training. Ultimately, we achieve a 300% improvement over existing unsupervised methods, even in complex and cluttered 3D scenes, and provide a powerful pretraining method.

teaser

For any code-related or other questions, open an issue here or contact David Rozenberszki. If you found this work helpful for your research, please consider citing our paper:

@inproceedings{rozenberszki2024unscene3d,
    title={UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes},
    author={Rozenberszki, David and Litany, Or and Dai, Angela},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}

README structure

  • Installation - setting up a conda environment and building/installing custom cpp tools
  • Data Preprocessing - we primarily use the ScanNet dataset, which has to be preprocessed to get aligned point clouds and 2D images
  • Pseudo Mask Generation - we generate pseudo masks using self-supervised features and extract them for self-training
  • Self-Training - we mostly follow the training procedure of Mask3D, but we use the pseudo masks, noise-robust losses, self-training iterations, and a class-agnostic evaluation
  • Available Resources - we provide the pseudo datasets, pretrained models for evaluation and inference, and loads of visualized scenes.

Roadmap

  • Pseudo Mask Generation
  • Self-Training
  • Evaluation
  • Upload pretrained models, datasets, visualizations and training resources

Installation

The codebase was developed and tested on Ubuntu 20.04 with various GPUs [RTX_2080, RTX_3060, RTX_3090, RTX_A6000] and NVCC 11.6. We provide an Anaconda environment with the dependencies; to install it, run

conda env create -f conf/unscene3d_env.yml
conda activate unscene3d

Additionally, MinkowskiEngine has to be installed manually with a specified CUDA version. E.g. for CUDA 11.6 run

export CUDA_HOME=/usr/local/cuda-11.6
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps

Finally, for building the custom cpp/cuda tools, run

cd utils/cpp_utils && python setup.py install
cd ../cuda_utils && python setup.py install
cd ../../third_party/pointnet2 && python setup.py install
cd ../..

Data download and preprocessing

After installing the dependencies, we preprocess the datasets.

We provide example training and inference scripts for the ScanNet dataset. For downloading the raw data, please refer to the instructions on the official GitHub page. The pseudo mask generation does not need any specific data preprocessing, but the ScanNet images have to be extracted from their .sens files. For this, please refer to the SensReader in the ScanNet repository, where you can use the Python script to extract every Nth frame from the .sens files. In our experiments we use every 20th frame.
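If the frames have already been extracted densely, the same subsampling can be applied afterwards. The following is a minimal sketch, assuming frames are stored as files named by their integer frame index (for example 0.jpg, 1.jpg, ...); the function name `select_every_nth` and the file naming are assumptions for illustration and are not part of the SensReader or this repository.

```python
from pathlib import Path


def select_every_nth(frame_paths, step=20):
    """Return every `step`-th frame, starting from the first one."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Sort numerically by file stem so that 100.jpg does not precede 20.jpg.
    ordered = sorted(frame_paths, key=lambda p: int(Path(p).stem))
    return ordered[::step]


# Hypothetical frame names 0.jpg ... 99.jpg (assumed naming scheme).
frames = [f"{i}.jpg" for i in range(100)]
kept = select_every_nth(frames, step=20)
print(kept)
```

Sorting by the integer stem rather than by string matters here, since a plain lexicographic sort would interleave frame indices of different lengths.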

Pseudo Mask Generation

Our module combines self-supervised pretrained features from 2D/3D domains with a geometric oversegmentation of scenes, allowing for efficient representation of scene properties. We use a greedy approach with the Normalized Cut algorithm to iteratively separate foreground-background segments. After each iteration, we mask out predicted segment features and continue this process until there are no segments remaining. This method is demonstrated in the included animation.

pseudo
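The greedy procedure described above can be sketched as follows. This is an illustrative NumPy version under several assumptions, not the repository's implementation: per-segment features are compared by cosine similarity, the affinity is binarized with a threshold `tau`, and the side of the cut that contains the largest absolute eigenvector entry is treated as foreground. The function names, the default values, and the `max_masks` stopping rule are hypothetical.

```python
import numpy as np


def ncut_bipartition(features, tau=0.5, eps=1e-5):
    """Split segments into foreground/background with one Normalized Cut.

    features: (n, d) array with one feature vector per segment.
    Returns a boolean array of length n marking the foreground side.
    """
    # Cosine similarity between all pairs of segments.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    # Binarize the affinity with threshold tau (eps keeps the graph connected).
    w = np.where(sim > tau, 1.0, eps)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    # Second-smallest eigenvector, mapped back to the generalized problem.
    fiedler = d_inv_sqrt * vecs[:, 1]
    side = fiedler > fiedler.mean()
    # Assumption: the side holding the largest |value| is the foreground.
    if not side[np.argmax(np.abs(fiedler))]:
        side = ~side
    return side


def greedy_ncut(features, tau=0.5, max_masks=10):
    """Repeatedly cut out a foreground group and mask it from later rounds."""
    remaining = np.arange(len(features))
    masks = []
    while len(remaining) > 1 and len(masks) < max_masks:
        fg = ncut_bipartition(features[remaining], tau)
        if fg.all() or not fg.any():
            break
        masks.append(remaining[fg])
        remaining = remaining[~fg]  # mask out segments that are already assigned
    return masks


# Toy input: two groups of two segments with clearly different features.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
masks = greedy_ncut(feats, tau=0.5)
print([m.tolist() for m in masks])
```

Note that a single Normalized Cut always produces a split, even on a homogeneous group, so a practical pipeline needs additional filtering of small or low-quality masks; the sketch leaves that out.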

To extract pseudo masks using both modalities and DINO features, run

cd pseudo_masks
. scripts/unscene3d_dino_2d3d.sh

The most important parameters are the following:

  • freemask.modality - the modality to use for the pseudo mask generation: geom, color or both
  • freemask.affinity_tau - the threshold for the Normalized Cut algorithm
  • data.segments_min_vert_nums - the minimum number of vertices in a segment after oversegmentation
  • freemask.min_segment_size - the minimum number of segments in a pseudo mask
  • freemask.separation_mode - the mode of the Normalized Cut algorithm: max, avg, all or largest

For ablation studies on affinity_tau, segments_min_vert_nums, min_segment_size and separation_mode, check our supplementary material.
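When scripting sweeps over the parameters listed above, it can help to validate the categorical options before launching a run. The sketch below is a hypothetical helper, not part of the codebase: the class name `FreemaskParams` is invented, only the allowed values for the modality and separation mode come from the list above, and the numeric values in the usage lines are placeholders rather than recommended settings.

```python
from dataclasses import dataclass

# Allowed values as listed in this README.
MODALITIES = {"geom", "color", "both"}
SEPARATION_MODES = {"max", "avg", "all", "largest"}


@dataclass(frozen=True)
class FreemaskParams:
    """Hypothetical container mirroring the freemask.* options."""

    modality: str
    affinity_tau: float
    min_segment_size: int
    separation_mode: str

    def __post_init__(self):
        if self.modality not in MODALITIES:
            raise ValueError(f"modality must be one of {sorted(MODALITIES)}")
        if self.separation_mode not in SEPARATION_MODES:
            raise ValueError(
                f"separation_mode must be one of {sorted(SEPARATION_MODES)}"
            )
        if self.min_segment_size < 1:
            raise ValueError("min_segment_size must be at least 1")


# Placeholder numbers, chosen only to show the call.
params = FreemaskParams(
    modality="both", affinity_tau=0.5, min_segment_size=2, separation_mode="max"
)
print(params)
```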

Self-Training

self-training

We use the pseudo masks generated in the previous step to train the model in a self-training manner. We use the pseudo masks as ground truth and train the model with a noise-robust loss, which is a combination of the standard cross-entropy loss and the Dice loss, with low-quality matches filtered by a 3D version of DropLoss. First we have to format the data for the self-training cycles. In this part of the code we rely for the vast majority on the Mask3D codebase, with some minor modifications, and also follow its training logic.
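To make the loss description concrete, here is a minimal NumPy sketch of a cross-entropy plus Dice mask loss in which matched pairs with too little overlap are dropped. It is written under assumptions and is not the training code: predictions are per-point probabilities already matched one-to-one with pseudo masks, the two terms are weighted equally, and the function names and the `iou_min` threshold are hypothetical.

```python
import numpy as np


def dice_loss(prob, target, smooth=1.0):
    """Soft Dice loss between a probability mask and a binary mask."""
    inter = (prob * target).sum()
    return float(1.0 - (2.0 * inter + smooth) / (prob.sum() + target.sum() + smooth))


def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy averaged over points."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())


def mask_iou(prob, target, thresh=0.5):
    """IoU between the thresholded prediction and the binary target."""
    pred = prob > thresh
    gt = target > 0.5
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 0.0


def drop_filtered_loss(probs, targets, iou_min=0.01):
    """Mean BCE + Dice over matched pairs whose IoU exceeds iou_min.

    Pairs below the threshold contribute no loss, so the model is not
    penalized for predictions that the noisy pseudo masks do not cover.
    """
    losses = [
        bce_loss(p, t) + dice_loss(p, t)
        for p, t in zip(probs, targets)
        if mask_iou(p, t) > iou_min
    ]
    return float(np.mean(losses)) if losses else 0.0


# Two matched pairs over four points: the second pair does not overlap at all.
probs = np.array([[0.9, 0.9, 0.1, 0.1], [0.9, 0.9, 0.1, 0.1]])
targets = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
loss = drop_filtered_loss(probs, targets)
print(loss)
```

In this toy input only the first pair contributes to the loss, because the second prediction has zero overlap with its pseudo mask.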

To save the datasets for self-training, run

python datasets/preprocessing/freemask_preprocessing.py preprocess 

The most important parameters are the following:

  • --data_dir - the path to the raw ScanNet dataset
  • --save_dir - the path to save the processed data
  • --modes - default is ("train", "validation"), but either one can be selected on its own
  • --git_repo - the path to the ScanNet git repository, needed for the ScanNet label mapping
  • --oracle - if selected, the original annotation is used for pseudo masks in a class-agnostic manner. Needed to create a version for evaluation
  • --freemask_dir - the path to the pseudo masks generated in the previous step
  • --n_jobs - makes you wait less if you use more cores :)

Finally, to train the model with the pseudo masks over multiple stages of self-training iterations, run

. scripts/mask3d_DINO_CSC_self_train.sh

Available Resources

We provide the pretrained weights for the CSC model, which is used for self-supervised feature extraction. This was trained on the training scenes of ScanNet, with default parameters.

Preprocessed Datasets

We preprocessed a set of pseudo datasets in different variations, which can be used for self-training. We provide the following datasets:

Dataset Name | Description
scannet_freemask_oracle | The oracle dataset using the GT ScanNet mask annotation, mostly used for evaluation only.
unscene3d_dino | Our proposed pseudo mask dataset, using projected 2D features from DINO for the NCut stage.
unscene3d_dino_csc | Our proposed pseudo mask dataset, using both 3D CSC and projected 2D features from DINO for the NCut stage.
unscene3d_arkit | Using 3D features on the ARKitScenes dataset for the NCut stage.

Pretrained Models

We also provide the trained checkpoints for the self-training iterations, which can be used for evaluation or inference purposes. The checkpoints and the corresponding training logs are available for the following setups:

Setup Name | Description
unscene3d_CSC_self_train_3 | The model trained with pseudo masks using 3D features only, after 3 self-training iterations.
unscene3d_DINO_self_train_3 | The model trained with pseudo masks using 2D features only, after 3 self-training iterations.
unscene3d_DINO_CSC_self_train_3 | The model trained with pseudo masks using both 2D and 3D features, after 3 self-training iterations.
unscene3d_arkit_self_train_2 | The model trained on ARKitScenes with pseudo masks extracted from 3D features only, after 2 self-training iterations.

Visualizations

Finally, we show some qualitative results of the pseudo mask generation and the self-training iterations for the different setups. You can download the visualizations for 3D only and both 2D/3D pseudo masks. To open the visualizations in your browser, please use PyViz3D.

unscene3d's Issues

AttributeError: module 'models' has no attribute 'Mask3D'

Hello!
When I run '. scripts/mask3d_DINO_CSC_self_train.sh', I get an error:

Error locating target 'models.Mask3D', set env var HYDRA_FULL_ERROR=1 to see chained exception.
full_key: model
AttributeError: module 'models' has no attribute 'Mask3D'
During handling of the above exception, another exception occurred:
ModuleNotFoundError: No module named 'models.Mask3D'

Could you help me fix this error?

ModuleNotFoundError: No module named 'lib'

Hello!
When I run '. scripts/unscene3d_dino_2d3d.sh', I get the following error:

Traceback (most recent call last):
  File "unscene3d_pseudo_main.py", line 15, in <module>
    from datasets import load_dataset
  File "/UnScene3D/pseudo_masks/datasets/__init__.py", line 1, in <module>
    from lib.datasets import scannet, scannet_solo, scannet_free, arkit, s3dis
ModuleNotFoundError: No module named 'lib'

Can you help me?
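One way to start narrowing down import errors like the two above is to check what Python resolves for the top-level names involved from the directory where the script is launched. This is only a generic diagnostic sketch using the standard library, not a confirmed fix for either issue; the helper name `locate_module` is made up for illustration.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a top-level module resolves to, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Regular modules and packages have an origin; namespace packages only
    # expose their search locations.
    return spec.origin or list(spec.submodule_search_locations or [])


for name in ("lib", "models"):
    print(name, "->", locate_module(name))
print("search path starts with:", sys.path[:3])
```

If a name resolves to None, or to a location outside the repository, the working directory or PYTHONPATH used to launch the script would be the first thing to check.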

About pseudo mask generation

Hello!
When I run '. scripts/unscene3d_dino_2d3d.sh', I get a new error:

hydra.errors.MissingConfigException: Could not load override hydra/launcher/submitit_slurm

Can you help me?
