[CVPR2024] The official implementation of "MoCha-Stereo: Motif Channel Attention Network for Stereo Matching".
Demo video: `Demo.mp4`
# MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen†, Wei Long†, He Yao†, Yongjun Zhang✱, Bingshu Wang, Yongbin Qin, Jia Wu

CVPR 2024
Correspondence: [email protected]; [email protected]✱
```bibtex
@inproceedings{chen2024mocha,
  title={MoCha-Stereo: Motif Channel Attention Network for Stereo Matching},
  author={Chen, Ziyang and Long, Wei and Yao, He and Zhang, Yongjun and Wang, Bingshu and Qin, Yongbin and Wu, Jia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27768--27777},
  year={2024}
}
```
Python = 3.8
CUDA = 11.3
```shell
conda create -n mocha python=3.8
conda activate mocha
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
```
The following libraries are also required:
```
tqdm
tensorboard
opt_einsum
einops
scipy
imageio
opencv-python-headless
scikit-image
timm
six
```
To evaluate/train MoCha-Stereo, you will need to download the required datasets.
- Sceneflow (Includes FlyingThings3D, Driving, Monkaa)
- Middlebury
- ETH3D
- KITTI
By default, `stereo_datasets.py` will search for the datasets in the locations below. You can create symbolic links in the `datasets` folder pointing to wherever the datasets were downloaded.
```
├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2015
            ├── testing
            ├── training
        ├── KITTI_2012
            ├── testing
            ├── training
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
        ├── two_view_testing
```
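If you keep the downloaded data elsewhere, the symbolic links matching the layout above can be created with a short script. This is a minimal sketch; `DOWNLOAD_ROOT` is a hypothetical path that you should replace with your own download location.

```python
import os

# Hypothetical download location -- replace with your own path.
DOWNLOAD_ROOT = "/data/stereo_downloads"

DATASETS = ["FlyingThings3D", "Monkaa", "Driving", "KITTI", "Middlebury", "ETH3D"]

os.makedirs("datasets", exist_ok=True)
for name in DATASETS:
    src = os.path.join(DOWNLOAD_ROOT, name)
    dst = os.path.join("datasets", name)
    # Only link datasets that were actually downloaded, and skip existing links.
    if os.path.isdir(src) and not os.path.lexists(dst):
        os.symlink(src, dst)
```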
To train MoCha-Stereo, run
```shell
python train_stereo.py --batch_size 8 --mixed_precision
```
To evaluate a trained model on a validation set (e.g. Middlebury full resolution), run
```shell
python evaluate_stereo.py --restore_ckpt models/mocha-stereo.pth --dataset middlebury_F
```
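Stereo evaluation conventionally reports the end-point error (EPE), i.e. the mean absolute disparity difference over valid pixels. As a reference, here is a minimal NumPy sketch of the metric itself (not the repository's implementation):

```python
import numpy as np

def epe(pred, gt, valid=None):
    """End-point error: mean absolute disparity difference over valid pixels."""
    err = np.abs(pred - gt)
    if valid is None:
        # A common convention: pixels with non-positive ground truth are invalid.
        valid = gt > 0
    return err[valid].mean()

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
gt = np.array([[1.0, 2.5], [3.0, 5.0]])
print(epe(pred, gt))  # -> 0.375
```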
The pretrained weight is available here.
Q1. Weight for "tf_efficientnetv2_l"?
A1: Please refer to issue #6 ("Question about the tf_efficientnetv2_l checkpoint"), #8 ("Pretrained weights"), and #9 ("code error").
- [CVPR2024] V1 version
  - Paper
  - Code of MoCha-Stereo
- V2 version
  - Preprint manuscript
  - Code of MoCha-V2
- This project borrows code from IGEV, RAFT-Stereo, and GwcNet. We thank the original authors for their excellent work!
- We are grateful to Prof. Wenting Li, Prof. Huamin Qu, Dr. Junda Cheng, "DLUTTengYH", and the anonymous reviewers for their comments on "MoCha-Stereo: Motif Channel Attention Network for Stereo Matching" (the V1 version of MoCha-Stereo).
- This project is supported by the Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (Project No. [2023]159).
- This project is supported by the Natural Science Research Project of the Guizhou Provincial Department of Education, China (QianJiaoJi[2022]029, QianJiaoHeKY[2021]022).