CHORUS [ICCV 2023, Oral]

Open In Colab   Project Page  

CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images,
Sookwan Han, Hanbyul Joo
International Conference on Computer Vision (ICCV), 2023


teaser.gif

News

Sep 2023

Initial code release, including code for training and evaluation.

Installation

To set up the necessary environments for running CHORUS, please refer to the instructions provided here.

Demo

demo.png

A demo Colab notebook is coming soon! (ETA: October 2023)

Training

overview.gif

NOTE: The current version only supports COCO & LVIS categories.

Dataset Generation

CHORUS is trained on a generated dataset of human-object interaction images. Here, we provide an example of running the entire dataset generation pipeline for the surfboard category. Before generating the image dataset, activate the appropriate environment with the following command:

conda activate chorus_gen

Prompt Generation

CHORUS first generates multiple human-object interaction (HOI) prompts for the given category (surfboard) using ChatGPT. You can find example prompts for the surfboard category under the prompts/demo directory. If you wish to generate prompts for other categories or create your own, follow the steps outlined below.

  1. The OpenAI API relies on API keys for authentication, so you need your own API key to generate prompts. If you don't have one already, please refer to this link.

  2. After successfully configuring your API keys, execute the following command:

    python scripts/generation/generate_prompts.py --categories 'surfboard'

    to generate plausible HOI prompts for the specified surfboard category. By default, the results will be saved under the prompts/demo directory.

Please note that the OpenAI API does not support random seeding, as mentioned here; hence, the results of prompt generation are not reproducible. For reference, we provide the exact prompts used in our paper under the prompts/chorus directory.
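
For intuition, the snippet below sketches the kind of ChatGPT query such a script makes. It is only a sketch: it assumes the legacy (pre-1.0) openai Python package and an OPENAI_API_KEY environment variable, and the model choice, messages, and helper name are illustrative rather than the repo's actual code.

    import os

    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]  # authenticate with your own key

    def generate_hoi_prompts(category, n=10):
        """Ask ChatGPT for short human-object interaction (HOI) prompts."""
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "You write short captions of a person interacting with an object."},
                {"role": "user",
                 "content": f"List {n} diverse captions of a person using a {category}, one per line."},
            ],
        )
        text = response["choices"][0]["message"]["content"]
        # keep non-empty lines, stripping any bullets or numbering the model added
        return [line.strip("-*0123456789. ").strip() for line in text.splitlines() if line.strip()]

    for prompt in generate_hoi_prompts("surfboard"):
        print(prompt)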


Image Generation

To generate the dataset of images from HOI prompts for the surfboard category, execute the following command:

sh scripts/demo_surfboard_gen.sh $NUM_BATCH_PER_AUGPROMPT $BATCH_SIZE   # Default: 20, 6

Please note the following details (an illustrative sketch of the generation loop follows these notes):

  • The generation process typically takes around 9-10 hours on a single RTX 3090 GPU.
  • To reduce the generation time (i.e., the number of generated images), lower the $NUM_BATCH_PER_AUGPROMPT argument (default: 20).
  • If you encounter a CUDA out-of-memory (OOM) error, reduce the batch size by adjusting the $BATCH_SIZE argument (default: 6).
  • To resume the generation process, simply rerun the command; the program automatically skips existing samples.
  • The generated images are saved under the results/images directory by default.
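
As a rough illustration of what the script drives, here is a minimal batched text-to-image loop with resume-by-skipping, assuming a Stable Diffusion backend via the Hugging Face diffusers library. The model ID, prompt, batch count, and output layout are assumptions for illustration; the actual pipeline is the one invoked by scripts/demo_surfboard_gen.sh.

    import os

    import torch
    from diffusers import StableDiffusionPipeline

    # model ID is an assumption; any Stable Diffusion checkpoint works the same way
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a person riding a surfboard on a wave"   # one augmented HOI prompt
    batch_size = 6                                     # lower this on CUDA OOM
    out_dir = "results/images/surfboard"
    os.makedirs(out_dir, exist_ok=True)

    for batch_idx in range(20):                        # cf. $NUM_BATCH_PER_AUGPROMPT
        paths = [os.path.join(out_dir, f"{batch_idx:03d}_{i}.png") for i in range(batch_size)]
        if all(os.path.exists(p) for p in paths):
            continue                                   # resume: skip completed batches
        images = pipe([prompt] * batch_size).images    # one diffusion batch
        for path, image in zip(paths, images):
            image.save(path)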

Aggregation (Learning)

Once dataset generation is complete, CHORUS aggregates information across the generated images for 3D HOI reasoning. Before running the aggregation pipeline, activate the appropriate environment with the following command:

conda activate chorus_aggr

With the generated dataset in place, you can execute the complete aggregation pipeline for the surfboard category by running the following command:

sh scripts/demo_surfboard_aggr.sh

Please note the following details (a simplified sketch of the aggregation idea follows these notes):

  • To resume the aggregation process, simply rerun the command; the program automatically skips existing samples.
  • After the command completes successfully, you can find the visualizations under the results_demo directory!
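
To convey the core idea only (this is not the repo's implementation), the sketch below accumulates object evidence from many images into a voxel grid expressed in a canonical, human-centric frame. The grid resolution, extent, and function names are all hypothetical.

    import numpy as np

    GRID = 64      # voxels per axis (hypothetical resolution)
    EXTENT = 2.0   # grid spans [-EXTENT, EXTENT] metres around the canonical human
    occupancy = np.zeros((GRID, GRID, GRID))

    def accumulate(points_world, world_to_canonical):
        """Transform object evidence into the canonical human frame and vote into the grid."""
        homo = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
        canonical = (world_to_canonical @ homo.T).T[:, :3]          # 4x4 rigid transform
        idx = np.floor((canonical + EXTENT) / (2 * EXTENT) * GRID).astype(int)
        inside = np.all((idx >= 0) & (idx < GRID), axis=1)          # drop out-of-grid points
        np.add.at(occupancy, tuple(idx[inside].T), 1.0)             # accumulate votes

    # after accumulating over all images, normalize into a spatial distribution:
    # distribution = occupancy / occupancy.sum()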

Evaluation

pap.png

Preparing the Test Dataset

For quantitative evaluation, we use an extended version of the COCO-EFT dataset as our test set. To set up the test dataset, please follow the steps below.

  1. Download the COCO dataset and the COCO-EFT dataset by running the following command:

    sh scripts/download_coco_eft.sh

    By default, the COCO dataset will be downloaded to the imports/COCO directory, and the COCO-EFT dataset will be downloaded to the imports/eft directory.

  2. After downloading the datasets, preprocess and extend the dataset by executing the following command:

    python scripts/evaluation/extend_eft.py

    This script prepares the dataset for evaluation; a conceptual sketch of the pairing it performs follows below.
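
Conceptually, the extension pairs each EFT pseudo-ground-truth human fit with the object annotations from the same COCO image. The sketch below shows this idea using pycocotools; the file paths and JSON field names are hypothetical, so consult scripts/evaluation/extend_eft.py for the actual logic.

    import json
    from collections import defaultdict

    from pycocotools.coco import COCO

    # hypothetical paths; the download script places the data under imports/
    coco = COCO("imports/COCO/annotations/instances_train2017.json")
    with open("imports/eft/eft_fits.json") as f:
        eft_fits = json.load(f)   # one pseudo-ground-truth SMPL fit per person

    pairs = defaultdict(list)
    for fit in eft_fits:
        image_id = fit["image_id"]                      # assumed field name
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=image_id)):
            category = coco.loadCats(ann["category_id"])[0]["name"]
            if category != "person":
                pairs[category].append((fit, ann))      # human fit + object annotation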


Reproduce Results

To replicate the results in the paper, you have two options:

Option 1: Download Pretrained Results

You can download the pretrained results using the following script:

sh scripts/download_pretrained_chorus.sh

This option is recommended if you have limited storage space or want to quickly access the pretrained results.

Option 2: Full Reproduction

Note: This process requires at least 5TB of storage to save the generated results.

To fully reproduce the results, follow the steps below.

  1. Activate the chorus_gen environment:

    conda activate chorus_gen
  2. Generate the dataset for all categories used in quantitative evaluation by running:

    sh scripts/run_quant_gen.sh

    Please note that this step may require significant storage space.

  3. Activate the chorus_aggr environment:

    conda activate chorus_aggr
  4. Aggregate the information from images by running:

    sh scripts/run_quant_aggr.sh    

    Please note that this step may require significant storage space.


Evaluate PAP (Projective Average Precision)

We perform quantitative evaluations for COCO categories using the proposed Projective Average Precision (PAP) metrics. To compute PAP for the reproduced results, run the following command:

python scripts/evaluation/evaluate_pap.py --aggr_setting_names 'quant:full'

This command will calculate PAP for each category based on the reproduced results. To report the mean PAP (mPAP) averaged over all categories, execute the following command:

python scripts/evaluation/report_pap.py
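
PAP scores projected predictions the same way standard average precision scores ranked detections. For reference, here is a generic average-precision routine over (score, label) pairs; it sketches only the final AP computation, not the projective matching that produces the scores and labels.

    import numpy as np

    def average_precision(scores, labels):
        """AP = area under the precision-recall curve for ranked binary predictions."""
        order = np.argsort(-scores)                     # rank by descending confidence
        labels = np.asarray(labels, dtype=float)[order]
        tp = np.cumsum(labels)                          # true positives at each rank
        precision = tp / np.arange(1, len(labels) + 1)
        recall = tp / max(labels.sum(), 1.0)
        ap, prev_recall = 0.0, 0.0
        for p, r in zip(precision, recall):             # integrate precision over recall
            ap += p * (r - prev_recall)
            prev_recall = r
        return ap

    # toy example: three ranked predictions, two of which are correct
    print(average_precision(np.array([0.9, 0.8, 0.3]), np.array([1, 0, 1])))  # ~0.83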

Citation

If you find our work helpful or use our code, please consider citing:

@inproceedings{han2023chorus,
  title = {Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images},
  author = {Han, Sookwan and Joo, Hanbyul},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}

Acknowledgements

  1. Our codebase builds heavily on several open-source projects. Thanks for open-sourcing!

  2. We thank Byungjun Kim for valuable insights & comments!

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. However, please note that our code depends on other libraries (e.g., SMPL), which each have their own respective licenses that must also be followed.
