3dlg-hcvc / m3dref-clip

[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects

Home Page: https://3dlg-hcvc.github.io/multi3drefer/

License: MIT License

Languages: Python 82.81%, C++ 7.16%, Cuda 7.88%, C 2.15%
Topics: 3d, computer-vision, deep-learning, visual-grounding, clip, cuda, localization, pytorch, pytorch-lightning, transformer

m3dref-clip's Introduction

M3DRef-CLIP

PyTorch Lightning WandB

This is the official implementation for Multi3DRefer: Grounding Text Description to Multiple 3D Objects.

Model Architecture

Requirement

This repo contains CUDA implementations; please make sure your GPU has compute capability 3.0 or above.

We report the maximum GPU memory usage with batch size 4:

                 Training    Inference
GPU mem usage    15.2 GB     11.3 GB

Setup

Conda (recommended)

We recommend the use of miniconda to manage system dependencies.

# create and activate the conda environment
conda create -n m3drefclip python=3.10
conda activate m3drefclip

# install PyTorch 2.0.1
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia

# install PyTorch3D with dependencies
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d

# install MinkowskiEngine with dependencies
conda install -c anaconda openblas
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps \
--install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"

# install Python libraries
pip install .

# install CUDA extensions
cd m3drefclip/common_ops
pip install .
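
After installation, a quick sanity check like the following can confirm that the CUDA-enabled dependencies import and see the GPU. This is a minimal sketch (not part of the repo), using only the packages installed above:

# sanity_check.py -- minimal environment check, not part of the repo
import torch
import MinkowskiEngine as ME
import pytorch3d

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)
print("PyTorch3D:", pytorch3d.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))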

Pip

Note: Setting up with pip (without conda) requires OpenBLAS to be pre-installed on your system.

# create and activate the virtual environment
virtualenv env
source env/bin/activate

# install PyTorch 2.0.1
pip install torch torchvision

# install PyTorch3D
pip install pytorch3d

# install MinkowskiEngine
pip install MinkowskiEngine

# install Python libraries
pip install .

# install CUDA extensions
cd m3drefclip/common_ops
pip install .

Data Preparation

Note: Both the ScanRefer and Nr3D datasets require the ScanNet v2 dataset. Please preprocess it first.

ScanNet v2 dataset

  1. Download the ScanNet v2 dataset (train/val/test); refer to ScanNet's instructions for more details. The raw dataset files should be organized as follows:

    m3drefclip # project root
    ├── dataset
    │   ├── scannetv2
    │   │   ├── scans
    │   │   │   ├── [scene_id]
    │   │   │   │   ├── [scene_id]_vh_clean_2.ply
    │   │   │   │   ├── [scene_id]_vh_clean_2.0.010000.segs.json
    │   │   │   │   ├── [scene_id].aggregation.json
    │   │   │   │   ├── [scene_id].txt
  2. Pre-process the data; this converts the original meshes and annotations to .pth data:

    python dataset/scannetv2/preprocess_all_data.py data=scannetv2 +workers={cpu_count}
  3. Pre-process the multiview features from ENet: please refer to the instructions in ScanRefer's repo, with one modification:

    • comment out lines 51 to 56 in batch_load_scannet_data.py since we follow D3Net's setting that doesn't do point downsampling here.

    Then put the generated enet_feats_maxpool.hdf5 (116 GB) under m3drefclip/dataset/scannetv2 (a quick sanity-check sketch follows this list).
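
A minimal sketch for sanity-checking the preprocessed outputs. The .pth output location and the HDF5 layout (one dataset per scene id) are assumptions, so adjust paths and keys as needed:

# inspect_preprocessed.py -- minimal sketch, not part of the repo
import glob
import torch
import h5py

# pick any preprocessed .pth scene produced by preprocess_all_data.py (output location is an assumption)
pth_files = glob.glob("dataset/scannetv2/**/*.pth", recursive=True)
if pth_files:
    scene = torch.load(pth_files[0])
    print(pth_files[0], type(scene))

# the precomputed ENet multiview features (layout assumed: one dataset per scene id)
with h5py.File("dataset/scannetv2/enet_feats_maxpool.hdf5", "r") as f:
    scene_id = next(iter(f.keys()))
    print(scene_id, f[scene_id].shape)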

ScanRefer dataset

  1. Download the ScanRefer dataset (train/val). Also, download the test set. The raw dataset files should be organized as follows:

    m3drefclip # project root
    ├── dataset
    │   ├── scanrefer
    │   │   ├── metadata
    │   │   │   ├── ScanRefer_filtered_train.json
    │   │   │   ├── ScanRefer_filtered_val.json
    │   │   │   ├── ScanRefer_filtered_test.json
  2. Pre-process the data, "unique/multiple" labels will be added to raw .json files for evaluation purpose:

    python dataset/scanrefer/add_evaluation_labels.py data=scanrefer
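
For reference, "unique" in ScanRefer means the target's semantic class occurs only once in the scene, otherwise "multiple". Below is a rough illustration of that rule; the JSON field names (scene_id, object_id, object_name) follow the public ScanRefer release, but the added key name and the counting over annotated objects only are assumptions:

# label_sketch.py -- illustration only; the repo's actual logic lives in add_evaluation_labels.py
import json

with open("dataset/scanrefer/metadata/ScanRefer_filtered_val.json") as f:
    anns = json.load(f)

# collect the distinct annotated objects of each class per scene
objects_per_class = {}
for a in anns:
    objects_per_class.setdefault((a["scene_id"], a["object_name"]), set()).add(a["object_id"])

# "unique" if the target's class occurs once in the scene, otherwise "multiple"
# (approximation: only annotated objects are counted; the real script consults the full ScanNet scene)
for a in anns:
    n = len(objects_per_class[(a["scene_id"], a["object_name"])])
    a["eval_type"] = "unique" if n == 1 else "multiple"   # the "eval_type" key name is an assumption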

Nr3D dataset

  1. Download the Nr3D dataset (train/test). The raw dataset files should be organized as follows:

    m3drefclip # project root
    ├── dataset
    │   ├── nr3d
    │   │   ├── metadata
    │   │   │   ├── nr3d_train.csv
    │   │   │   ├── nr3d_test.csv
  2. Pre-process the data, "easy/hard/view-dep/view-indep" labels will be added to raw .csv files for evaluation purpose:

    python dataset/nr3d/add_evaluation_labels.py data=nr3d
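
For context, ReferIt3D's convention is roughly that "easy"/"hard" depends on how many same-class distractors the scene contains, and "view-dep" marks utterances that use view-dependent language (e.g. "left", "facing"). A rough sketch of the view-dependence heuristic only; the keyword list and the "utterance" column name are assumptions, not the repo's exact rule:

# view_dep_sketch.py -- illustration only; the actual logic lives in add_evaluation_labels.py
import csv

# rough keyword heuristic; the real keyword list may differ
VIEW_DEP_WORDS = {"left", "right", "front", "behind", "back", "facing", "leftmost", "rightmost", "looking"}

with open("dataset/nr3d/metadata/nr3d_train.csv") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    tokens = set(row["utterance"].lower().split())   # "utterance" column name is an assumption
    row["view_dep"] = bool(tokens & VIEW_DEP_WORDS)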

Multi3DRefer dataset

  1. Download the Multi3DRefer dataset (train/val). The raw dataset files should be organized as follows (a small inspection sketch follows):
    m3drefclip # project root
    ├── dataset
    │   ├── multi3drefer
    │   │   ├── metadata
    │   │   │   ├── multi3drefer_train.json
    │   │   │   ├── multi3drefer_val.json
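
Each Multi3DRefer description can ground zero, one, or multiple objects. A minimal sketch for checking the distribution of these cases in a split; the "object_ids" field name is an assumption about the released JSON:

# multi3drefer_stats.py -- minimal sketch, not part of the repo
import json
from collections import Counter

with open("dataset/multi3drefer/metadata/multi3drefer_val.json") as f:
    anns = json.load(f)

def bucket(num_targets):
    # group descriptions by how many objects they refer to
    if num_targets == 0:
        return "zero-target"
    return "single-target" if num_targets == 1 else "multi-target"

counts = Counter(bucket(len(a["object_ids"])) for a in anns)  # "object_ids" field name is assumed
print(counts)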

Pre-trained detector

We pre-trained PointGroup, as implemented in MINSU3D, on ScanNet v2 and use it as the detector, with coordinates + colors + multi-view features as inputs.

  1. Download the pre-trained detector. The detector checkpoint file should be organized as follows (a loading sanity check follows):
    m3drefclip # project root
    ├── checkpoints
    │   ├── PointGroup_ScanNet.ckpt
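
To confirm the checkpoint downloaded correctly before training, a minimal sketch, assuming a standard PyTorch Lightning checkpoint (a plain dict saved with torch.save that contains a state_dict entry):

# check_detector_ckpt.py -- sanity-check sketch, not part of the repo
import torch

ckpt = torch.load("checkpoints/PointGroup_ScanNet.ckpt", map_location="cpu")
print("top-level keys:", list(ckpt.keys()))
print("parameters in state_dict:", len(ckpt["state_dict"]))  # assumes a standard Lightning checkpoint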

Training, Inference and Evaluation

Note: Configuration files are managed by Hydra; you can add or override any configuration attribute by passing it as a command-line argument (a leading + adds a new key, ++ adds or overrides one).

# log in to WandB
wandb login

# train a model with the pre-trained detector, using predicted object proposals
python train.py data={scanrefer/nr3d/multi3drefer} experiment_name={any_string} +detector_path=checkpoints/PointGroup_ScanNet.ckpt

# train a model with the pretrained detector, using GT object proposals
python train.py data={scanrefer/nr3d/multi3drefer} experiment_name={any_string} +detector_path=checkpoints/PointGroup_ScanNet.ckpt model.network.detector.use_gt_proposal=True

# train a model from a checkpoint, it restores all hyperparameters in the .ckpt file
python train.py data={scanrefer/nr3d/multi3drefer} experiment_name={checkpoint_experiment_name} ckpt_path={ckpt_file_path}

# test a model from a checkpoint and save its predictions
python test.py data={scanrefer/nr3d/multi3drefer} data.inference.split={train/val/test} ckpt_path={ckpt_file_path} pred_path={predictions_path}

# evaluate predictions
python evaluate.py data={scanrefer/nr3d/multi3drefer} pred_path={predictions_path} data.evaluation.split={train/val/test}
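
For Multi3DRefer, the paper evaluates with an F1 score over predicted and ground-truth boxes at IoU thresholds of 0.25 and 0.5. Below is a minimal sketch of that metric for a single description using greedy IoU matching, with the zero-target case counted as correct when nothing is predicted; this is an illustration under those assumptions, not the repo's evaluate.py:

# f1_sketch.py -- illustrative F1@IoU for one description (axis-aligned 3D boxes)
import numpy as np

def box_iou(a, b):
    # IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter + 1e-9)

def f1_at_iou(pred_boxes, gt_boxes, thresh=0.5):
    if not pred_boxes and not gt_boxes:      # zero-target: predicting nothing is correct
        return 1.0
    if not pred_boxes or not gt_boxes:
        return 0.0
    matched_gt = set()
    tp = 0
    for p in pred_boxes:                     # greedy matching, each GT box used at most once
        ious = [box_iou(np.array(p), np.array(g)) if i not in matched_gt else 0.0
                for i, g in enumerate(gt_boxes)]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:
            matched_gt.add(best)
            tp += 1
    precision = tp / len(pred_boxes)
    recall = tp / len(gt_boxes)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)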

Checkpoints

ScanRefer dataset

M3DRef-CLIP_ScanRefer.ckpt

Performance:

Split   IoU    Unique   Multiple   Overall
Val     0.25   85.3     43.8       51.9
Val     0.5    77.2     36.8       44.7
Test    0.25   79.8     46.9       54.3
Test    0.5    70.9     38.1       45.5

Nr3D dataset

M3DRef-CLIP_Nr3d.ckpt

Performance:

Split   Easy   Hard   View-dep   View-indep   Overall
Test    55.6   43.4   42.3       52.9         49.4

Multi3DRefer dataset

M3DRef-CLIP_Multi3DRefer.ckpt

Performance:

Split   IoU    ZT w/ D   ZT w/o D   ST w/ D   ST w/o D   MT     Overall
Val     0.25   39.4      81.8       34.6      53.5       43.6   42.8
Val     0.5    39.4      81.8       30.6      47.8       37.9   38.4

(ZT = zero target, ST = single target, MT = multiple targets; w/ D and w/o D = with and without distractors.)

Benchmark

ScanRefer

Convert M3DRef-CLIP predictions to ScanRefer benchmark format:

python dataset/scanrefer/convert_output_to_benchmark_format.py data=scanrefer pred_path={predictions_path} +output_path={output_file_path}

Nr3D

Please refer to the ReferIt3D benchmark to report results.

m3dref-clip's People

Contributors

eamonn-zh


m3dref-clip's Issues

Reproducing Nr3D results in Table 6.

Hi,

I am trying to train from scratch to reproduce the 49.4 overall accuracy on Nr3D, as reported in Table 6. However, I could only get around 46.5 under the settings of the provided config (changing data to nr3d and also setting the use_gt_proposal flag). Could you provide more details on the training settings to reproduce your result on Nr3D? Thanks!

RuntimeError: CUDA error: an illegal memory access was encountered

I encountered this error; everything is freshly cloned from this repo:

“RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. ”

RuntimeError when setting "model.network.detector.use_gt_proposal=True"

Dear author:

Thanks for your interesting work.

When I run the following command:

# train a model with the pretrained detector, using GT object proposals
python train.py data={scanrefer/nr3d/multi3drefer} experiment_name={any_string} +detector_path=checkpoints/PointGroup_ScanNet.ckpt model.network.detector.use_gt_proposal=True

an error has occurred:

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.

It seems like the model returns some variables that are not used in the loss calculation, and I wonder how to solve this.

Best!
Xiaolong
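
The error message itself names the workaround: enable unused-parameter detection in the DDP strategy. A minimal sketch of where that flag would go if you construct the Trainer yourself; whether train.py exposes this through its Hydra config is not shown here, so treat the example as an assumption:

# ddp_sketch.py -- illustration of the strategy flag named in the error message
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_find_unused_parameters_true",  # string form suggested by the error message
)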

Crashes when run under the debugger, but works normally outside VSCode

Hi! Thanks for the work!

I constantly get an error when using the debugger to run the project:
"Error executing job with overrides: ['data=scanrefer', 'experiment_name=test', 'data.inference.split=test', 'ckpt_path=/path/to/my/M3DRef-CLIP_ScanRefer.ckpt']"

or other train/eval/test commands, but things work when I don't use the debugger. Do you have any idea which part causes this?

Could not append to config. An item is already at 'ckpt_path'. Either remove + prefix: 'ckpt_path=M3DRef-CLIP_ScanRefer.ckpt' Or add a second + to add or override 'ckpt_path': '++ckpt_path=M3DRef-CLIP_ScanRefer.ckpt'

Hello, I followed the readme and did all of it. When I ran the command:
python test.py data=scanrefer data.inference.split=val +ckpt_path=M3DRef-CLIP_ScanRefer.ckpt pred_path=output
The error is:
Could not append to config. An item is already at 'ckpt_path'. Either remove + prefix: 'ckpt_path=M3DRef-CLIP_ScanRefer.ckpt' Or add a second + to add or override 'ckpt_path': '++ckpt_path=M3DRef-CLIP_ScanRefer.ckpt'

How to test on Scanrefer benchmark?

This is my first time trying to test on the ScanRefer benchmark, but I encountered some difficulties. When I ran the test command according to the instructions in the readme, some errors occurred in the DataLoader (caused by the fact that the test set does not have instance ids, semantic labels, etc.), and the test set does not seem to be handled in test_epoch.

How do you conduct benchmark testing? Do I need to write additional code?

Hope to get your reply! Thanks!

Memory cost went too high

I have increased the number of views, and the memory cost grew beyond 24 GB; I'm using a single 3090.

Is there any way to reduce the memory cost without the performance dropping drastically?

Many thanks.

Proposal filtering for PointGroup

Hi,

I noticed that you chose not to filter the PointGroup proposals by NMS or proposal scores. Have you tried that before? I wonder about the effects of proposal filtering, e.g. whether the final performance goes down while early-stage performance is higher.

Best,
Tony

The training speed becomes so slow after a few epochs

Great work, and the code is clearly written!
However, when I was training with the default configuration on a single NVIDIA 3090 GPU, I noticed something strange.

  1. When using only 3D features, training and inference are relatively fast, but after 20 hours (the 24th epoch) they become very slow (tens of times slower), and GPU utilization, power consumption, etc. drop significantly.
  2. When using 2D+3D features, validation for one epoch takes more than 5 hours (I'm not sure if this is normal), and training becomes particularly slow after the first validation epoch (again tens of times slower), which confuses me.

Have you ever encountered these problems? Looking forward to your reply very much, thanks!

Visualization Script

Hi,

Thanks for the nice work! I am wondering whether you could provide the visualization scripts for your model. For example, the script to generate the sub-figures in your Figure 6, Figure 10 and Figure 12. Thanks!

Questions about the predictions on ScanRefer with the given ckpt

Dear author:

Thanks for your interesting work.

I have completed the entire process of training and inference following the README.md, but when I run the following commands with the given ckpt:

# get the predictions
python test.py data=scanrefer data.inference.split=val +ckpt_path={M3DRef-CLIP_ScanRefer.ckpt} pred_path={predictions_path}

# evaluate predictions
python evaluate.py data=scanrefer pred_path={M3DRef-CLIP_ScanRefer.ckpt} pred_path={predictions_path} data.evaluation.split=val

I get unsatisfactory performance, far lower than your results in readme.md:

===========================================
IoU         unique      multiple    overall     
-------------------------------------------
0.25        45.3        28.6        31.8        
0.50        33.1        21.9        24.1        
===========================================

I wonder if this is correct, and how I can achieve the same results as those in the README?

Thanks!!

M3DRef-CLIP on Scanrefer Test Benchmark

Hi,

Thank you for your awesome work!

For the ScanRefer benchmark submission, do you train on a combination of the train+val splits or only the train split? I am asking because training on train+val is common practice for ScanNet benchmarks, but I am not sure what people do for referential grounding benchmarks. Also, are there any other details/tricks for the test-set submission, or do you simply export the predictions with the provided checkpoint trained on the train set?

Thank you!

Access the Multi3DRefer test set

Hi,

Thanks for the nice work!
I am wondering whether you will host an online benchmark so we can evaluate on the test set?

CLIP text model output. How/Why two outputs word_features & sentence features?

I was wondering why you expect two outputs when calling word_features, sentence_features = self.clip_model.encode_text(clip_tokens) here.

As far as I understand, you are using a vanilla CLIP model, whose clip_model.encode_text() outputs only one embedding.
Evidently that can't be the case here, since you expect two different embeddings. So where did you implement the custom functionality to get two embeddings from encode_text()?
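
For context: OpenAI's stock CLIP encode_text returns only the pooled sentence embedding (the EOT token projected through text_projection), while the per-token transformer outputs exist just before that pooling step. Below is a hedged sketch of how a wrapper could expose both, using the standard openai/CLIP module attributes; this is not necessarily how this repo implements it:

# clip_text_sketch.py -- returning word-level and sentence-level features from a CLIP text encoder
import clip
import torch

model, _ = clip.load("ViT-B/32", device="cpu")  # CPU for portability of the sketch

def encode_text_two_outputs(model, text_tokens):
    x = model.token_embedding(text_tokens).type(model.dtype)       # (B, L, D) token embeddings
    x = x + model.positional_embedding.type(model.dtype)
    x = x.permute(1, 0, 2)                                         # (L, B, D) for the transformer
    x = model.transformer(x)
    x = x.permute(1, 0, 2)
    word_features = model.ln_final(x).type(model.dtype)            # per-token (word-level) features
    # pooled sentence feature: take the EOT token (highest token id) and project it
    eot = text_tokens.argmax(dim=-1)
    sentence_features = word_features[torch.arange(word_features.shape[0]), eot] @ model.text_projection
    return word_features, sentence_features

tokens = clip.tokenize(["the brown chair next to the table"])
words, sentence = encode_text_two_outputs(model, tokens)
print(words.shape, sentence.shape)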
