
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Home Page: https://yannqi.github.io/AVS-COMBO/

License: Apache License 2.0


COMBO-AVS

Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and Shiming Xiang

This repository provides the PyTorch implementation for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation" accepted by CVPR 2024 (Highlight).

🔥 What's New

  • (2024.04.06) Our paper (COMBO) is marked as a Highlight paper! 😮
  • (2024.03.19) Our checkpoints are publicly available at YannQi/COMBO-AVS-checkpoints · Hugging Face!
  • (2024.03.14) Our code was made public on $\pi$ day!
  • (2024.03.12) Our code is ready to be shared with the public 🌲🌲🌲!
  • (2024.02.27) Our paper (COMBO) is accepted by CVPR 2024!
  • (2023.11.17) We completed the implementation of COMBO and pushed the code.

🪵 TODO List

📚 Method

Overview of the proposed COMBO.

🛠️ Getting Started

1. Environments

  • Linux or macOS with Python ≥ 3.6
# recommended
pip install -r requirements.txt
pip install soundfile
# build MSDeformAttention
cd model/modeling/pixel_decoder/ops
sh make.sh
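
After make.sh finishes, a quick sanity check can confirm that the custom CUDA op is importable. This is a small sketch, assuming the extension is registered as MultiScaleDeformableAttention, as in the Deformable-DETR / Mask2Former ops this directory is based on:

# Quick sanity check that the CUDA op built by make.sh is importable.
# Assumes the extension is registered as MultiScaleDeformableAttention,
# as in the Deformable-DETR / Mask2Former ops this directory is based on.
import torch

try:
    import MultiScaleDeformableAttention  # noqa: F401  (installed by make.sh)
    print("MSDeformAttention op found; CUDA available:", torch.cuda.is_available())
except ImportError as err:
    print("MSDeformAttention is not built yet:", err)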
  • Preprocessing for detectron2

    To use the Siam-Encoder Module (SEM), one line of detectron2 needs to be modified.

    The file that requires attention is located at:

    conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py (replace xxx with your own environment)

    Commenting out the following line (around L287) allows the code to run without errors:

# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")  
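
If you prefer not to edit the file by hand, the sketch below automates the same one-line change. It is a convenience script, not part of the official COMBO code, and assumes the quoted line appears verbatim in your installed detectron2:

"""Convenience sketch (not official COMBO code): automate the one-line
detectron2 edit described above by commenting out the strict key-matching
check in the installed c2_model_loading.py."""
from pathlib import Path

import detectron2.checkpoint.c2_model_loading as c2_loading

target = Path(c2_loading.__file__)
lines = target.read_text().splitlines(keepends=True)
needle = 'raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")'

patched = False
for i, line in enumerate(lines):
    if line.lstrip().startswith(needle):  # present and not yet commented out
        indent = line[: len(line) - len(line.lstrip())]
        lines[i] = f"{indent}# {line.lstrip()}"
        patched = True

if patched:
    target.write_text("".join(lines))
    print(f"Patched {target}")
else:
    print("Nothing to patch (already commented out or a different detectron2 version).")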
  • Install Semantic-SAM (Optional)
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt

Find out more at Semantic-SAM

2. Datasets

Please refer to AVSBenchmark to download the datasets. You can put the data under the data folder or use a folder of your own; remember to modify the corresponding paths in the config files. The data directory is as below:

|--AVS_dataset
   |--AVSBench_semantic/
   |--AVSBench_object/Multi-sources/
   |--AVSBench_object/Single-source/
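
Before editing the config files, a small sanity check can confirm the folders are in place. This is a sketch; the dataset root below is an assumption, so adjust it to wherever you placed AVS_dataset:

# Sanity check for the AVSBench layout shown above. The dataset root below
# is an assumption -- point it at wherever you placed AVS_dataset.
from pathlib import Path

dataset_root = Path("data/AVS_dataset")  # adjust to your own path
expected_subdirs = [
    "AVSBench_semantic",
    "AVSBench_object/Multi-sources",
    "AVSBench_object/Single-source",
]

for rel in expected_subdirs:
    path = dataset_root / rel
    status = "ok" if path.is_dir() else "MISSING"
    print(f"{status:>7}  {path}")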

3. Download Pre-Trained Models

|--pretrained
   |--detectron2/R-50.pkl
   |--detectron2/d2_pvt_v2_b5.pkl
   |--vggish-10086976.pth
   |--vggish_pca_params-970ea276.pth

4. Maskiges Pre-generation

  • Generate class-agnostic masks (Optional)
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val 
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
  • Generate Maskiges (Optional)
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
The pre-generated masks are stored under the following directories:

|--AVS_dataset
    |--AVSBench_semantic/pre_SAM_mask/
    |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
    |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
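
For intuition: a Maskige is the set of class-agnostic masks rendered as an ordinary RGB image so it can be consumed by the visual branch. The sketch below is only a conceptual illustration of that mask-to-RGB step, not the actual avs_tools/pre_mask2rgb/mask_precess_s4.py implementation:

# Conceptual sketch of Maskige generation (not the actual
# avs_tools/pre_mask2rgb/mask_precess_s4.py implementation): a class-agnostic
# mask map (H, W) with one integer ID per segment is rendered as an RGB image.
import numpy as np

def masks_to_maskige(mask_ids: np.ndarray, seed: int = 0) -> np.ndarray:
    """Map each segment ID in `mask_ids` to a fixed pseudo-random RGB color."""
    rng = np.random.default_rng(seed)
    num_ids = int(mask_ids.max()) + 1
    palette = rng.integers(0, 256, size=(num_ids, 3), dtype=np.uint8)
    palette[0] = 0  # keep ID 0 (background / no mask) black
    return palette[mask_ids]  # (H, W, 3) uint8 image

# Toy example: a 4x4 map containing three segments.
toy = np.array([[0, 0, 1, 1],
                [0, 2, 2, 1],
                [0, 2, 2, 1],
                [0, 0, 0, 1]])
print(masks_to_maskige(toy).shape)  # (4, 4, 3)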

5. Train

# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss

6. Test

# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss

7. Results and Download Links

We provide the checkpoints of the S4 Subset at YannQi/COMBO-AVS-checkpoints · Hugging Face.

Method Backbone Subset Config mIoU F-score
COMBO-R50 ResNet-50 S4 config 81.7 90.1
COMBO-PVTv2 PVTv2-B5 S4 config 84.7 91.9
COMBO-R50 ResNet-50 MS3 config 54.5 66.6
COMBO-PVTv2 PVTv2-B5 MS3 config 59.2 71.2
COMBO-R50 ResNet-50 AVSS config 33.3 37.3
COMBO-PVTv2 PVTv2-B5 AVSS config 42.1 46.1
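
To fetch the released checkpoints programmatically, a minimal sketch using huggingface_hub is shown below; snapshot_download mirrors the whole repository, which avoids guessing individual checkpoint filenames:

# Download the released COMBO-AVS checkpoints from the Hugging Face Hub.
# snapshot_download mirrors the whole repository locally; pick the checkpoint
# you need from the returned directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="YannQi/COMBO-AVS-checkpoints")
print("Checkpoints downloaded to:", local_dir)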

🤝 Citing COMBO

@misc{yang2023cooperation,
      title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
      author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
      year={2023},
      eprint={2312.06462},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

