rajeev921 / depth-anything

This project forked from liheyoung/depth-anything

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Home Page: https://depth-anything.github.io

License: Apache License 2.0

Shell 0.05% Python 99.95%

Lihe Yang1 · Bingyi Kang2+ · Zilong Huang2 · Xiaogang Xu3,4 · Jiashi Feng2 · Hengshuang Zhao1+

1The University of Hong Kong · 2TikTok · 3Zhejiang Lab · 4Zhejiang University

+corresponding authors

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and 62M+ unlabeled images.

News

  • 2024-01-22: Paper, project page, code, models, and demo are released.

Features of Depth Anything

  • Relative depth estimation

    Our foundation models listed here can provide relative depth estimation for any given image robustly. Please refer here for details.

  • Metric depth estimation

    We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. The fine-tuned models offer strong in-domain and zero-shot metric depth estimation. Please refer here for details.

  • Better depth-conditioned ControlNet

    We re-train a better depth-conditioned ControlNet based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet. Please refer here for details.

  • Downstream high-level scene understanding

    The Depth Anything encoder can be fine-tuned for downstream high-level perception tasks. On semantic segmentation, for example, it reaches 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K. Please refer here for details.

Performance

Here we compare our Depth Anything with the previously best MiDaS v3.1 BEiTL-512 model.

Please note that the latest MiDaS is also trained on KITTI and NYUv2, whereas our models are not.

Method  Params    KITTI               NYUv2               Sintel              DDAD                ETH3D               DIODE
                  AbsRel  $\delta_1$  AbsRel  $\delta_1$  AbsRel  $\delta_1$  AbsRel  $\delta_1$  AbsRel  $\delta_1$  AbsRel  $\delta_1$
MiDaS   345.0M    0.127   0.850       0.048   0.980       0.587   0.699       0.251   0.766       0.139   0.867       0.075   0.942
Ours-S  24.8M     0.080   0.936       0.053   0.972       0.464   0.739       0.247   0.768       0.127   0.885       0.076   0.939
Ours-B  97.5M     0.080   0.939       0.046   0.979       0.432   0.756       0.232   0.786       0.126   0.884       0.069   0.946
Ours-L  335.3M    0.076   0.947       0.043   0.981       0.458   0.760       0.230   0.789       0.127   0.882       0.066   0.952

We highlight the best and second best results in bold and italic respectively (better results: AbsRel $\downarrow$ , $\delta_1 \uparrow$).
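For readers unfamiliar with the two metrics, the following is a minimal sketch of how AbsRel and $\delta_1$ are commonly defined, assuming NumPy arrays of positive predicted and ground-truth depths. The function names are illustrative and not part of this repository, and the sketch omits the scale-and-shift alignment that relative-depth evaluations usually apply before computing the metrics.

```python
import numpy as np


def _valid(pred, gt):
    """Convert to float arrays and keep only pixels with positive ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0
    return pred[mask], gt[mask]


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt). Lower is better."""
    pred, gt = _valid(pred, gt)
    return float(np.mean(np.abs(pred - gt) / gt))


def delta1(pred, gt, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < threshold. Higher is better."""
    pred, gt = _valid(pred, gt)
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < threshold))


if __name__ == "__main__":
    pred = [1.0, 2.0, 4.0]
    gt = [1.0, 2.2, 2.0]
    print("AbsRel:", abs_rel(pred, gt))
    print("delta_1:", delta1(pred, gt))
```

A perfect prediction gives AbsRel of 0 and $\delta_1$ of 1, which matches the arrow directions noted above.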

Pre-trained models

We provide three models of varying scales for robust relative depth estimation:

Model                 Params   Inference time (ms)
                               V100       A100       RTX4090 (TensorRT)
Depth-Anything-Small  24.8M    12         8          3
Depth-Anything-Base   97.5M    13         9          6
Depth-Anything-Large  335.3M   20         13         12

Note that the V100 and A100 inference time (without TensorRT) is computed by excluding the pre-processing and post-processing stages, whereas the last column RTX4090 (with TensorRT) is computed by including these two stages. See here for details.
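The README does not describe how these timings were taken. As a rough illustration of timing only the forward pass, excluding pre-processing and post-processing, here is a generic latency helper. The name `mean_latency_ms` and its parameters are assumptions made for this sketch, not part of the repository.

```python
import time


def mean_latency_ms(fn, warmup=3, runs=10, sync=None):
    """Average wall-clock time of fn() in milliseconds.

    fn   -- callable that runs only the stage being measured
    sync -- optional callable that blocks until queued device work is done
    """
    # Warm-up calls are not timed (they absorb one-off setup costs).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()  # wait for asynchronous work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / runs


if __name__ == "__main__":
    print(mean_latency_ms(lambda: sum(range(10000))))
```

On a CUDA device one would typically pass `sync=torch.cuda.synchronize` and let `fn` wrap only the model call on an already pre-processed tensor, since GPU kernels run asynchronously.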

You can easily load our pre-trained models by:

from depth_anything.dpt import DepthAnything

encoder = 'vits' # can also be 'vitb' or 'vitl'
depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{:}14'.format(encoder))

If the models cannot be loaded because there is no network connection, load them from a local folder instead:

# suppose the config and checkpoint files are stored under the folder checkpoints/depth_anything_vitb14
depth_anything = DepthAnything.from_pretrained('checkpoints/depth_anything_vitb14', local_files_only=True)

Usage

Installation

git clone https://github.com/LiheYoung/Depth-Anything
cd Depth-Anything
pip install -r requirements.txt

Running

python run.py --encoder <vits | vitb | vitl> --img-path <img-directory | single-img | txt-file> --outdir <outdir>

For the img-path, you can point it to 1) a directory containing all the images of interest, 2) a single image, or 3) a text file listing all image paths.

For example:

python run.py --encoder vitl --img-path assets/examples --outdir depth_visualization
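The three accepted forms of img-path can be resolved into a flat list of files along the following lines. This is a sketch based only on the description above, not a copy of run.py; the helper name `collect_image_paths` is hypothetical, and it assumes the text file holds one path per line.

```python
import os


def collect_image_paths(img_path):
    """Return a list of image paths for a directory, a single image, or a .txt list."""
    if os.path.isfile(img_path):
        if img_path.endswith('.txt'):
            # Text file: one image path per line, blank lines ignored.
            with open(img_path, 'r') as f:
                return [line.strip() for line in f if line.strip()]
        # Single image file.
        return [img_path]
    # Directory: every non-hidden entry, sorted for a stable order.
    return sorted(
        os.path.join(img_path, name)
        for name in os.listdir(img_path)
        if not name.startswith('.')
    )


if __name__ == "__main__":
    for path in collect_image_paths('assets/examples'):
        print(path)
```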

Gradio demo

To use our gradio demo locally:

python app.py

You can also try our online demo.

Import Depth Anything to your project

If you want to use Depth Anything in your own project, you can simply follow run.py to load our models and define data pre-processing.

Code snippet (note the difference between our data pre-processing and that of MiDaS)
import cv2
import torch
from torchvision.transforms import Compose  # required by the transform below

from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet

encoder = 'vits' # can also be 'vitb' or 'vitl'
depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{:}14'.format(encoder))

transform = Compose([
    Resize(
        width=518,
        height=518,
        resize_target=False,
        keep_aspect_ratio=True,
        ensure_multiple_of=14,
        resize_method='lower_bound',
        image_interpolation_method=cv2.INTER_CUBIC,
    ),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet(),
])

image = cv2.cvtColor(cv2.imread('your image path'), cv2.COLOR_BGR2RGB) / 255.0
image = transform({'image': image})['image']
image = torch.from_numpy(image).unsqueeze(0)

# depth shape: 1xHxW
depth = depth_anything(image)
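To make the Resize settings above more concrete, the sketch below estimates the network input size they imply: the aspect ratio is kept, both sides are scaled to at least 518 ('lower_bound'), and each side is snapped to a multiple of 14. It is a simplified approximation under those assumptions, not the repository's Resize implementation, and `lower_bound_size` is a hypothetical name.

```python
import math


def lower_bound_size(width, height, target_w=518, target_h=518, multiple=14):
    """Approximate (width, height) after an aspect-preserving lower-bound resize."""
    # One shared scale keeps the aspect ratio; taking the larger factor
    # makes both sides reach at least their target.
    scale = max(target_w / width, target_h / height)

    def snap(value, minimum):
        # Round to the nearest multiple, but never drop below the minimum.
        snapped = round(value / multiple) * multiple
        if snapped < minimum:
            snapped = math.ceil(value / multiple) * multiple
        return int(snapped)

    return snap(scale * width, target_w), snap(scale * height, target_h)


if __name__ == "__main__":
    print(lower_bound_size(640, 480))    # (686, 518)
    print(lower_bound_size(1920, 1080))  # (924, 518)
```

Under these assumptions the shorter side becomes 518 and the longer side grows proportionally, so the depth output has the resized resolution rather than the original image size.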

Community Support

We sincerely appreciate all the extensions built on Depth Anything by the community. Thank you very much!

Here we list the extensions we have found:

If you have a project that supports or improves Depth Anything (e.g., its speed), please feel free to open an issue, and we will add it here.

Citation

If you find this project useful, please consider citing:

@article{depthanything,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
      journal={arXiv:2401.10891},
      year={2024}
}
