jspenmar / slowtv_monodepth

Official repository for the ICCV2023 paper "Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV"

Home Page: https://arxiv.org/abs/2307.10713

License: Other

Languages: Python 98.95%, Shell 0.87%, Dockerfile 0.12%, Makefile 0.07%

Topics: datasets, deep-learning, iccv2023, monocular-depth-estimation, self-supervised-learning

slowtv_monodepth's Introduction

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV


This repository contains the code associated with the following publications:

Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV

Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden

ArXiv (ArXiv 2024)

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV

Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden

ArXiv (ICCV 2023)

Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter

Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden

ArXiv (TMLR 2022)

We have organized several monocular depth prediction challenges around the proposed SYNS-Patches dataset. Check the MDEC website for details on previous editions!

[Figure: example input frames (image_0026, image_0254, image_0698) and their corresponding depth predictions (depth_0026, depth_0254, depth_0698).]


Project Structure

  • .git-hooks: Dir containing a pre-commit hook for ignoring Jupyter Notebook outputs.
  • api: Dir containing main scripts for training, evaluating and data preparation.
  • assets: Dir containing images used in README.
  • cfg: Dir containing config files for training/evaluating.
  • docker: Dir containing Dockerfile and Anaconda package requirements.
  • data*: (Optional) Dir containing datasets.
  • hpc: (Optional) Dir containing submission files to HPC clusters.
  • models*: (Optional) Dir containing trained model checkpoints.
  • results*: Dir containing the precomputed results used in the paper.
  • src: Dir containing source code.
  • .gitignore: File containing patterns ignored by Git.
  • PATHS.yaml*: File containing additional data & model roots.
  • README.md: This file!

* Not tracked by Git!


Pretrained Checkpoints

You can download the pretrained full models from the following Dropbox link:

We also provide a minimum-requirements script to load a pretrained model and compute predictions on a directory of images. This is probably what you want if you just want to try out the model, as opposed to training it yourself. Code illustrating how to align the predictions to a ground-truth depth map can be found here.
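In essence, that alignment is a least-squares fit of a scale and shift between prediction and ground truth. A minimal sketch, assuming affine-invariant predictions and a depth > 0 validity mask (the linked code is the authoritative reference):

import numpy as np

def align_depth(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Least-squares scale & shift alignment of a prediction to ground-truth depth.

    Solves argmin_{s, t} ||(s * pred + t) - target||^2 over valid pixels.
    """
    mask = target > 0  # Fit only against pixels with valid ground truth.
    x, y = pred[mask], target[mask]
    A = np.stack([x, np.ones_like(x)], axis=1)  # Design matrix [pred, 1].
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred + t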

The only requirements for running the model are: timm, torch and numpy.
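For orientation, the core of such a script is just a resize-normalize-forward loop over the directory. A minimal sketch, assuming the model has already been loaded via the provided script (the PIL-based image loading and the predict_dir helper are illustrative, not the script's actual code):

from pathlib import Path

import numpy as np
import torch
from PIL import Image  # Assumption: any image loader works here.

@torch.no_grad()
def predict_dir(model: torch.nn.Module, img_dir: str, size=(640, 384)) -> dict:
    """Run a depth network on every image in `img_dir`. Hypothetical helper."""
    model.eval()
    preds = {}
    for path in sorted(Path(img_dir).iterdir()):
        if path.suffix.lower() not in {'.png', '.jpg', '.jpeg'}: continue
        img = np.asarray(Image.open(path).convert('RGB').resize(size), dtype=np.float32) / 255.
        x = torch.from_numpy(img).permute(2, 0, 1)[None]  # HWC -> 1x3xHxW.
        preds[path.name] = model(x).squeeze().cpu().numpy()
    return preds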


MapFreeReloc

You can download the val/test MapFreeReloc predictions for our public models from:

These can be used in your own MapFreeReloc submission to replace the baseline DPT+KITTI. Please remember to cite us if doing so!


Getting Started

Each section of the code has its own README file with more detailed instructions. Follow them only after having carried out the remaining steps in this section.

PYTHONPATH

Remember to add the path to the repo to the PYTHONPATH in order to run the code.

# Example for `bash`. Can be added to `~/.bashrc`.
export PYTHONPATH=/path/to/slowtv_monodepth:$PYTHONPATH

Git Hooks

First, set up a Git pre-commit hook that stops us from committing Jupyter Notebooks with outputs, since they may contain large images.

./.git-hooks/setup.sh
chmod +x .git/hooks/pre-commit  # File sometimes isn't copied as executable. This should fix it. 

Anaconda

If using Miniconda, create the environment and run commands as follows:

ENV_NAME=slowtv
conda env create -n $ENV_NAME --file docker/environment.yml  # -n overrides any name set in the yml.
conda activate $ENV_NAME
python api/train/train.py ...

Docker

To instead build and run the Docker image:

docker build -t $ENV_NAME ./docker
docker run -it \
    --shm-size=24gb \
    --gpus all \
    -v $(pwd -P):$(pwd -P) \
    -v /path/to/dataroot1:/path/to/dataroot1 \
    --user $(id -u):$(id -g) \
    $ENV_NAME:latest \
    /bin/bash

python api/train/train.py ...

Paths

The default locations for datasets and model checkpoints are ./data & ./models, respectively. If you want to store them somewhere else, you can either create symlinks to them, or add additional roots. This is done by creating the ./PATHS.yaml file with the following contents:

# -----------------------------------------------------------------------------
MODEL_ROOTS: 
  - /path/to/modelroot1

DATA_ROOTS:
  - /path/to/dataroot1
  - /path/to/dataroot2
  - /path/to/dataroot3
# -----------------------------------------------------------------------------

NOTE: This file should not be tracked by Git, as it may contain sensitive information about your machine.

Multiple roots may be useful when training on an HPC cluster where data has to be copied locally. Roots should be listed in order of preference, i.e. dataroot1/kitti_raw_syns will be given preference over dataroot2/kitti_raw_syns, as illustrated by the sketch below.
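A minimal sketch of how such ordered root resolution might work (find_data_dir is a hypothetical helper, not the repository's actual API):

from pathlib import Path

# Roots listed in order of preference, mirroring PATHS.yaml.
DATA_ROOTS = ['/path/to/dataroot1', '/path/to/dataroot2', '/path/to/dataroot3']

def find_data_dir(name: str) -> Path:
    """Return dataset `name` from the first root that contains it."""
    for root in DATA_ROOTS:
        path = Path(root, name)
        if path.is_dir():
            return path  # Earlier roots take preference.
    raise FileNotFoundError(f"Dataset '{name}' not found in any root.")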

Results

We provide the YAML files containing the precomputed results used in the paper. These should be copied over to the ./models directory (or any desired root) in order to follow the structure required by the evaluation and table-generating scripts.

cp -r ./results/* ./models

Citation

If you used the code in this repository or found the papers interesting, please cite them as

@inproceedings{spencer2024cribstv,
  title={Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV},
  author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
  booktitle={ArXiv Preprint},
  year={2024}
}

@inproceedings{spencer2023slowtv,
  title={Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV},
  author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

@article{spencer2022deconstructing,
  title={Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter},
  author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2022},
  url={https://openreview.net/forum?id=GFK1FheE7F},
  note={Reproducibility Certification}
}

References

We would also like to thank the authors of the papers below for their contributions and for releasing their code. Please consider citing them in your own work.

| Tag | Title | Author | Conf | ArXiv | GitHub |
| --- | --- | --- | --- | --- | --- |
| Garg | Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue | Garg et al. | ECCV 2016 | ArXiv | GitHub |
| Monodepth | Unsupervised Monocular Depth Estimation with Left-Right Consistency | Godard et al. | CVPR 2017 | ArXiv | GitHub |
| Kuznietsov | Semi-Supervised Deep Learning for Monocular Depth Map Prediction | Kuznietsov et al. | CVPR 2017 | ArXiv | GitHub |
| SfM-Learner | Unsupervised Learning of Depth and Ego-Motion from Video | Zhou et al. | CVPR 2017 | ArXiv | GitHub |
| Depth-VO-Feat | Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction | Zhan et al. | CVPR 2018 | ArXiv | GitHub |
| DVSO | Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry | Yang et al. | ECCV 2018 | ArXiv | |
| Klodt | Supervising the new with the old: learning SFM from SFM | Klodt & Vedaldi | ECCV 2018 | CVF | |
| MonoResMatch | Learning monocular depth estimation infusing traditional stereo knowledge | Tosi et al. | CVPR 2019 | ArXiv | GitHub |
| DepthHints | Self-Supervised Monocular Depth Hints | Watson et al. | ICCV 2019 | ArXiv | GitHub |
| Monodepth2 | Digging Into Self-Supervised Monocular Depth Estimation | Godard et al. | ICCV 2019 | ArXiv | GitHub |
| SuperDepth | SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation | Pillai et al. | ICRA 2019 | ArXiv | GitHub |
| Johnston | Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume | Johnston & Carneiro | CVPR 2020 | ArXiv | |
| FeatDepth | Feature-metric Loss for Self-supervised Learning of Depth and Egomotion | Shu et al. | ECCV 2020 | ArXiv | GitHub |
| CADepth | Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Yan et al. | 3DV 2021 | ArXiv | GitHub |
| DiffNet | Self-Supervised Monocular Depth Estimation with Internal Feature Fusion | Zhou et al. | BMVC 2021 | ArXiv | GitHub |
| HR-Depth | HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation | Lyu et al. | AAAI 2021 | ArXiv | GitHub |
| MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | Ranftl et al. | PAMI 2020 | ArXiv | GitHub |
| DPT | Vision Transformers for Dense Prediction | Ranftl et al. | ICCV 2021 | ArXiv | GitHub |
| NeWCRFs | NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation | Yuan et al. | CVPR 2022 | ArXiv | GitHub |

Licence

This project is licensed under the Commons Clause and GNU GPL licences. For commercial use, please contact the authors.


slowtv_monodepth's People

Contributors

jspenmar


slowtv_monodepth's Issues

KBR++ requires BEiT download, which doesn't work

Hi, thanks for open-sourcing your work. I get the following error when trying to load the KBR++ checkpoint:

urllib.error.HTTPError: HTTP Error 409: Public access is not permitted on this storage account.

The issue is that it is trying to download BEiT weights from:

Downloading: "https://conversationhub.blob.core.windows.net/beit-share-public/beit/beit_large_patch16_384_pt22k_ft22kto1k.pth"

However, even pasting the link into a browser gives the following error:

<Error> <Code>PublicAccessNotPermitted</Code> <Message>Public access is not permitted on this storage account. RequestId:43bed8aa-001e-00bf-4b14-7935e1000000 Time:2024-03-18T09:16:30.7260011Z</Message> </Error>

Any workaround?
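One possible workaround, assuming the backbone is instantiated through timm.create_model: download the BEiT weights manually and point timm at the local file via pretrained_cfg_overlay. The model name below is hypothetical and depends on the repository's config and the installed timm version.

import timm

# Load BEiT with locally downloaded weights instead of the dead Azure URL.
# Requires a timm version recent enough to support `pretrained_cfg_overlay`.
backbone = timm.create_model(
    'beit_large_patch16_384',  # Hypothetical name; check the repo's config.
    pretrained=True,
    pretrained_cfg_overlay=dict(file='/path/to/beit_large_patch16_384_pt22k_ft22kto1k.pth'),
)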

Typo in get_mask in evaluator.py

Hi Jaime,

Great job done!

I believe I've identified a typo in evaluator.py. It appears that the Eigen crop is applying the NYU mask, and vice versa, the NYU crop is applying the Eigen mask.

    def get_mask(self, target: ty.A) -> ty.A:
        """Helper to mask ground-truth depth based on the selected range and Eigen crop."""
        mask = target > self.min
        if self.max: mask &= target < self.max
        if self.use_eigen_crop: mask &= self._get_nyud_mask(target.shape)
        if self.use_nyud_crop: mask &= self._get_eigen_mask(target.shape)
        return mask
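If so, the fix would presumably be to swap the two lookups so each crop applies its own mask:

    def get_mask(self, target: ty.A) -> ty.A:
        """Helper to mask ground-truth depth based on the selected range and crop."""
        mask = target > self.min
        if self.max: mask &= target < self.max
        if self.use_eigen_crop: mask &= self._get_eigen_mask(target.shape)  # Eigen crop -> Eigen mask.
        if self.use_nyud_crop: mask &= self._get_nyud_mask(target.shape)    # NYUD crop -> NYUD mask.
        return mask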

Thank you!

Dockerfile does not build

Hello there,

thank you for this interesting project!
I am not sure what the role of the Dockerfile is in the project.
The requirements are huge, and the image does not build from nvcr.io/nvidia/nvhpc:23.5-devel-cuda_multi-ubuntu22.04, which I assume is the base image.
After several hours of building, conda reports conflicts among the requirements.

Does the file work for you?

Thank you and kind regards

Generating Gray Scale Images and Normal Maps

I am not an expert in depth estimation, but I would really like to try your code with the many applications in the 3D area.
However, these applications expect depth maps in a grayscale format.
I found the following comment in the code for MiDaS depth estimation:

    parser.add_argument('--grayscale',
                        action='store_true',
                        help='Use a grayscale colormap instead of the inferno one. Although the inferno colormap, '
                             'which is used by default, is better for visibility, it does not allow storing 16-bit '
                             'depth values in PNGs but only 8-bit ones due to the precision limitation of this '
                             'colormap.'
                        )

I guess your code also renders predictions with the inferno colormap, does it not?
Could it also feature such a parameter and output the result in grayscale?
And would it be possible to create a normal map from the result?
Would you consider adding these options?
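For reference, writing a 16-bit grayscale depth PNG is straightforward with numpy and PIL. A minimal sketch (the min-max normalization convention is an assumption, not the repository's code):

import numpy as np
from PIL import Image

def save_depth_png16(depth: np.ndarray, path: str) -> None:
    """Save a float depth map as a 16-bit grayscale PNG."""
    d = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-8)  # Normalize to [0, 1].
    Image.fromarray((d * 65535).astype(np.uint16), mode='I;16').save(path)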

Exporting estimated COLMAP intrinsics fails

When I use python api/data/preprocess/export_slow_tv.py --n-proc 16 to export the COLMAP intrinsics, I encounter the following error. How should I resolve this?

RuntimeError: -> Tried [42, 195, 335, 558, 724] and they all failed!!

Simple evaluation on monodepth2's code

Hi Jaime,

Again, thank you for the work!

I am trying to evaluate your KBR weights with Monodepth2's evaluation code, using the Eigen split and median scaling.

See the code: evaluate_depth_kbr

However, the results I'm obtaining are worse than anticipated. Please see the details below:

Loading weights with prefix 'nets.depth.encoder.':
        Total number of keys: 340
        Number of missing keys: 0
        Number of unexpected keys: 0
Loading weights with prefix 'nets.depth.decoders.disp.':
        Total number of keys: 28
        Number of missing keys: 0
        Number of unexpected keys: 0
-> Computing predictions with size 640x192
-> Evaluating
   Mono evaluation - using median scaling
 Scaling ratios | med: 1.755 | std: 0.170

   abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 |
&   0.137  &   1.731  &   5.461  &   0.215  &   0.851  &   0.944  &   0.974  \\

-> Done!

Scaling disparities does not significantly influence the outcomes:
pred_disp, _ = disp_to_depth(pred_disp, opt.min_depth, opt.max_depth)
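For reference, per-image median scaling in the standard evaluation amounts to the following (a minimal sketch, assuming a gt > 0 validity mask):

import numpy as np

def median_scale(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Scale the prediction so its median matches the ground-truth median."""
    mask = gt > 0  # Valid ground-truth pixels only.
    return pred * (np.median(gt[mask]) / np.median(pred[mask]))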

Could I be overlooking something?

Thank you for your assistance!
