
MonoFormer

This repo is the reference PyTorch implementation for training and testing depth estimation models using the method described in

Deep Digging into the Generalization of Self-supervised Monocular Depth Estimation
Jinwoo Bae, Sungho Moon and Sunghoon Im
AAAI 2023 (arXiv)

Our code is based on TRI's PackNet-SfM.
If you find our work useful in your research, please consider citing our paper:

@inproceedings{bae2022monoformer,
  title={Deep Digging into the Generalization of Self-supervised Monocular Depth Estimation},
  author={Bae, Jinwoo and Moon, Sungho and Im, Sunghoon},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023}
}

Setup

conda create -n monoformer python=3.7
conda activate monoformer
git clone https://github.com/sjg02122/MonoFormer.git
cd MonoFormer
pip install -r requirements.txt

We ran our experiments with PyTorch 1.10.0+cu113, Python 3.7, an A6000 GPU, and Ubuntu 20.04.
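
As a quick sanity check that your environment matches ours, you can verify the PyTorch build and CUDA visibility (a minimal snippet, not part of the repo):

import torch

print(torch.__version__)          # expected: 1.10.0+cu113
print(torch.cuda.is_available())  # should be True on a working CUDA setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g., an A6000 in our setup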

Pretrained model weights

We experimented extensively with modern backbone architectures (e.g., ConvNeXt, RegionViT). MF stands for MonoFormer. Lower is better for Abs Rel, Sq Rel, and RMSE; higher is better for a1.

| Model        | Abs Rel | Sq Rel | RMSE  | a1    |
|--------------|---------|--------|-------|-------|
| MF-hybrid    | 0.104   | 0.846  | 4.580 | 0.891 |
| MF-ViT       | 0.118   | 0.942  | 4.840 | 0.873 |
| MF-Twins     | 0.125   | 1.309  | 4.973 | 0.866 |
| MF-RegionViT | 0.113   | 0.893  | 4.756 | 0.875 |
| MF-ConvNeXt  | 0.111   | 0.760  | 4.533 | 0.878 |
| MF-SLaK      | 0.117   | 0.866  | 4.811 | 0.878 |
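
Once you have a checkpoint, a quick way to inspect it before running inference (the filename is a placeholder, and the key layout is an assumption based on PackNet-SfM-style checkpoints):

import torch

# Load on CPU so no GPU is required just to inspect the file.
ckpt = torch.load("MF-hybrid.ckpt", map_location="cpu")  # hypothetical filename
print(ckpt.keys())  # PackNet-SfM-style checkpoints typically include 'config' and 'state_dict'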

Datasets

You can configure your datasets in config.py or in other config YAML files (DATA_PATH is your data root path). In our experiments, we use only the KITTI dataset for training; the other datasets (e.g., ETH3D and DeMoN) are used for testing.
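
As a sketch of how the data root enters a config (the file name and key names below are hypothetical; check the repo's own YAML files for the actual layout):

import yaml  # PyYAML

# Load a training config and point it at your data root (DATA_PATH).
with open("configs/train_kitti.yaml") as f:  # hypothetical config file
    cfg = yaml.safe_load(f)
cfg["datasets"]["train"]["path"] = "/data/KITTI_raw"  # assumed key names
with open("configs/train_kitti_local.yaml", "w") as f:
    yaml.safe_dump(cfg, f)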

KITTI (for Training)

The KITTI (raw) dataset can be downloaded from the KITTI website. If you want to download it from the command line, use the download commands provided by PackNet-SfM.

You can also download the texture-shifted datasets (Water, Pencil, and Style-transferred).

Other datasets (for Evaluation)

In our experiments, we use ETH3D, DeMoN (i.e., MVS, SUN3D, RGBD, Scenes11), and our generated texture-shifted datasets.
This section will be updated soon.

Inference

You can directly run inference on a single image or folder:

python3 scripts/infer.py --checkpoint <checkpoint.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
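
For example, a hypothetical invocation on a single image (checkpoint and paths are placeholders; --image_shape takes height and width, as in PackNet-SfM's infer.py):

python3 scripts/infer.py --checkpoint MF-hybrid.ckpt --input assets/sample.png --output outputs/ --image_shape 192 640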

You can also evaluate the model using:

python3 scripts/eval.py --checkpoint <checkpoint.ckpt> [--config <config.yaml>]
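
For example (checkpoint and config names are placeholders):

python3 scripts/eval.py --checkpoint MF-hybrid.ckpt --config configs/eval_kitti.yaml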

Training

Our training procedure is similar to PackNet-SfM's. Any training, including fine-tuning, can be done by passing either a .yaml config file or a .ckpt model checkpoint to scripts/train.py:

python3 scripts/train.py <config.yaml or checkpoint.ckpt>
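
For example, to train from a config or to fine-tune from a checkpoint (file names are placeholders):

python3 scripts/train.py configs/train_kitti.yaml
python3 scripts/train.py MF-hybrid.ckpt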
