
Temporally Consistent Online Depth Estimation in Dynamic Scenes

This is the official repo for our work Temporally Consistent Online Depth Estimation in Dynamic Scenes, accepted at WACV 2023.

If you find CODD relevant, please cite:

@inproceedings{li2023temporally,
  title={Temporally consistent online depth estimation in dynamic scenes},
  author={Li, Zhaoshuo and Ye, Wei and Wang, Dilin and Creighton, Francis X and Taylor, Russell H and Venkatesh, Ganesh and Unberath, Mathias},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={3018--3027},
  year={2023}
}

Environment Setup

CODD builds on several excellent open-source libraries.

Example setup commands (tested on Ubuntu 20.04 and 22.04):

conda create --name codd python=3.8 -y
conda activate codd
pip install scipy pyyaml terminaltables natsort
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113 # pytorch
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1121/download.html # pytorch3d
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html # mmcv
pip install mmsegmentation # mmseg
pip install git+https://github.com/princeton-vl/lietorch.git # lietorch -- this will take a while
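
A quick, optional sanity check (not part of the repo's instructions) is to import the key packages and confirm the pinned versions resolved correctly:

# Optional sanity check: confirm the pinned builds import and report their versions.
import torch
import torchvision
import mmcv
import mmseg
import pytorch3d
import lietorch  # importing is the check; no version attribute assumed

print("torch:", torch.__version__, "cuda:", torch.version.cuda, "available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("mmcv:", mmcv.__version__, "mmseg:", mmseg.__version__)
print("pytorch3d:", pytorch3d.__version__)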

Pretrained Weights

Dataset Used

Configuration

For more details and examples, please see the configs folder.

Network

In CODD, you can configure your model in a modular manner. The network is specified in the following way:

model = dict(
    type='ConsistentOnlineDynamicDepth',
    stereo=dict(
        type='HITNetMF',  # enter your choice of stereo network
        ...  # model specific configs
    ),
    motion=dict(
        type="Motion",  # enter your choice of motion network
        ...  # model specific configs
    ),
    fusion=dict(
        type="Fusion",  # enter your choice of fusion network
        ...  # model specific configs
    )
)

If only the stereo network is needed, you can simply comment out the motion and fusion networks, as in the sketch below. You can also swap out the individual networks with your own implementations.
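
For example, a stereo-only configuration could look like the following sketch (illustrative only; HITNetMF is the stereo type shown above, and the model-specific fields are left as placeholders, as in the block above):

# Stereo-only sketch: motion and fusion are commented out, so only disparity is estimated.
model = dict(
    type='ConsistentOnlineDynamicDepth',
    stereo=dict(
        type='HITNetMF',
        ...  # model specific configs
    ),
    # motion=dict(type="Motion", ...),  # disabled
    # fusion=dict(type="Fusion", ...),  # disabled
)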

Dataset

In each dataset config, several fields need to be specified:

  • data_root: path to the stereo data.
    • For the FlyingThings3D dataset, each modality is downloaded separately, so data_root is the path to the RGB images. Additionally specify disp_root, the path to the disparity data; flow_root, the path to the optical flow data; and disp_change_root, the path to the disparity change data.
  • train_split, val_split, test_split: paths to the split files. Please see the section Others - Split Files below for more details.

The remaining variables are already set, but feel free to adjust them if you want to customize (see the example sketch after this list).

  • batch_size: training batch size.
  • crop_size: training crop size.
  • num_frames: the number of frames to run. For training, CODD uses 2 frames. For inference, CODD runs on the entire sequence (num_frames=-1).
  • calib: focal length * baseline.
  • disp_range: range of disparity.
  • intrinsics: fx, fy, cx, cy.
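
As an illustration only (all paths and values below are hypothetical, not taken from the repository), a FlyingThings3D-style dataset config combining the fields above might look like:

# Hypothetical values for illustration; use the actual files in the configs folder as the reference.
data_root = '/data/FlyingThings3D/frames_cleanpass'         # RGB images
disp_root = '/data/FlyingThings3D/disparity'                # disparity maps
flow_root = '/data/FlyingThings3D/optical_flow'             # optical flow
disp_change_root = '/data/FlyingThings3D/disparity_change'  # disparity change
train_split = 'splits/ft3d_train.txt'
val_split = 'splits/ft3d_val.txt'
test_split = 'splits/ft3d_test.txt'

batch_size = 4           # training batch size
crop_size = (384, 768)   # training crop size
num_frames = 2           # 2 for training; -1 runs the entire sequence at inference
calib = 1050.0 * 1.0     # focal length * baseline, so depth = calib / disparity
disp_range = (1, 210)    # valid disparity range
intrinsics = (1050.0, 1050.0, 479.5, 269.5)  # fx, fy, cx, cy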

Train/Inference

The training config has the following format:

_base_ = [
    'PATH_TO_MODEL_CONFIG', 'PATH_TO_DATA_CONFIG',
    'default_runtime.py', 'PATH_TO_SCHEDULE_CONFIG'
]

Modify configs/train_config.py for the desired model and dataset configs.
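
For instance, a filled-in training config following the _base_ pattern could look like this (configs/models/codd.py is referenced elsewhere in this README; the dataset and schedule file names below are assumed placeholders, so substitute the actual files under configs/):

# configs/train_config.py (illustrative sketch)
_base_ = [
    './models/codd.py',              # model config
    './datasets/flyingthings3d.py',  # dataset config (placeholder name)
    './default_runtime.py',
    './schedules/schedule.py',       # schedule config (placeholder name)
]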

The inference config has the following format:

_base_ = [
    'PATH_TO_MODEL_CONFIG', 'PATH_TO_DATA_CONFIG',
    'default_runtime.py'
]

Modify configs/inference_config.py for the desired model and dataset configs.

Training

  • CODD uses a three-stage training strategy on FlyingThings3D:
    • Training stereo
    • Training motion
    • Training fusion
  • The pretrained model is then fine-tuned on other datasets.
  • Modify configs/train_config.py for the desired model and dataset configs.
  • Run the following command:
    • Distributed
      ./scripts/train.sh configs/train_config.py NUM_GPUS --work-dir PATH_TO_LOG
      
    • Single GPU
      python train.py configs/train_config.py --work-dir PATH_TO_LOG
      

Inference

There are two inference modes:

  • Evaluate --eval: compute metrics and save results
  • Show --show: save disparity estimates
    • When running with custom_data, provide the paths to the left and right images using --img-dir and --r-img-dir.

To run inference

  • Modify configs/inference_config.py for the desired model and dataset configs.
  • Run the following command:
    • Distributed
      ./scripts/inference.sh configs/inference_config.py CHECKPOINT_PATH NUM_GPUS [optional arguments]
      
    • Single GPU
      python inference.py configs/inference_config.py CHECKPOINT_PATH [optional arguments]
      

Optional arguments:

  • --work-dir: logging directory
  • --num-frames: number of frames to run inference on; use -1 for all frames.

Others

Split Files

The split file is stored in the following format:

LEFT_IMAGE RIGHT_IMAGE DISPARITY_IMAGE OPTICAL_FLOW DISPARITY_CHANGE OPTICAL_FLOW_OCCLUSION DISPARITY_FRAME2_in_FRAME1 DISPARITY_OCCLUSION

The split files can be generated using utils/generate_split_files.py.

  • For datasets (TartanAir and Sintel) without ground truth disparity change, I use OPTICAL_FLOW to warp the next-frame disparity into the current frame and compute the change myself. However, not all regions are valid due to flow occlusion, so OPTICAL_FLOW_OCCLUSION must be provided for this computation.
  • For datasets (KITTI Depth) without ground truth optical flow, I used RAFT to estimate the optical flow. The disparity of the next frame is stored as DISPARITY_FRAME2_in_FRAME1, following the KITTI convention.
  • To generate disparity from the ground truth lidar point cloud, please refer to pykitti.
  • When a specific type of data is not provided, None is used to skip reading it. Please see datasets/custom_stereo_mf.py for more details on how the data is parsed; a small parsing sketch follows below.
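
For reference, the snippet below sketches how one split-file record could be parsed into named fields, with None entries skipped (the paths and the helper itself are illustrative; the repository's actual parsing lives in datasets/custom_stereo_mf.py):

# Illustrative parser for a single split-file record; not the repository's implementation.
FIELDS = [
    'left_image', 'right_image', 'disparity_image', 'optical_flow',
    'disparity_change', 'optical_flow_occlusion',
    'disparity_frame2_in_frame1', 'disparity_occlusion',
]

def parse_split_line(line):
    tokens = line.strip().split()
    return {name: (tok if tok != 'None' else None) for name, tok in zip(FIELDS, tokens)}

# Hypothetical record: a dataset with images, disparity, and flow but no other modalities.
sample = 'seq/left/0001.png seq/right/0001.png seq/disp/0001.pfm seq/flow/0001.pfm None None None None'
print(parse_split_line(sample))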

Visualize Point Cloud

To visualize the 3D point cloud generated from a depth map, the script utils/vis_point_cloud.py can be used.
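
Conceptually, the visualization back-projects each pixel to 3D using the intrinsics and calib values from the dataset config; a minimal sketch of that back-projection (assuming a pinhole camera model, not the script's exact code) is:

import numpy as np

def disparity_to_point_cloud(disparity, fx, fy, cx, cy, calib):
    """Back-project a disparity map (H x W) to an (N, 3) point cloud; pinhole-model sketch."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0
    z = calib / disparity[valid]         # depth = (focal length * baseline) / disparity
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)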

Benchmark Speed

To benchmark speed, run the following command:

python benchmark.py configs/models/codd.py

Disclaimer

The majority of CODD is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: https://github.com/princeton-vl/RAFT-3D is licensed under the BSD-3-Clause license.
