
facebookresearch / omni3d


Code release for "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild"

License: Other

Python 99.81% Shell 0.19%

omni3d's Introduction

Omni3D & Cube R-CNN

Support Ukraine

Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari

[Project Page] [arXiv] [BibTeX]

Zero-shot (+ tracking) on Project Aria data [Aria demo video]

Predictions on COCO [COCO demo]

Table of Contents:

  1. Installation
  2. Demo
  3. Omni3D Data
  4. Cube R-CNN Training
  5. Cube R-CNN Inference
  6. Results
  7. License
  8. Citing

Installation

# setup new environment
conda create -n cubercnn python=3.8
source activate cubercnn

# main dependencies
conda install -c fvcore -c iopath -c conda-forge -c pytorch3d -c pytorch fvcore iopath pytorch3d pytorch=1.8 torchvision=0.9.1 cudatoolkit=10.1

# OpenCV, COCO, detectron2
pip install cython opencv-python
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

# other dependencies
conda install -c conda-forge scipy seaborn

For reference, we used cuda/10.1 and cudnn/v7.6.5.32 for our experiments. We expect that slight variations in versions are also compatible.
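
A quick way to sanity-check the environment (a minimal sketch; it only verifies that the main dependencies import and that CUDA is visible):

# check_env.py -- hypothetical helper, not part of the repository
import torch
import detectron2
import pytorch3d
import cv2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
print("pytorch3d:", pytorch3d.__version__)
print("opencv:", cv2.__version__)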

Demo

To run the Cube R-CNN demo on a folder of input images using our DLA34 model trained on the full Omni3D dataset,

# Download example COCO images
sh demo/download_demo_COCO_images.sh

# Run an example demo
python demo/demo.py \
--config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/coco_examples" \
--threshold 0.25 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo 

See demo.py for more details. For example, if you know the camera intrinsics, you may pass them as arguments using the convention --focal-length <float> and --principal-point <float> <float>. See our MODEL_ZOO.md for more model checkpoints.
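
For instance, assuming the intrinsics are known, the demo command above could be extended as follows (the focal length and principal point values here are purely illustrative):

python demo/demo.py \
--config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/coco_examples" \
--focal-length 721.5 --principal-point 320.0 240.0 \
--threshold 0.25 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo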

Omni3D Data

See DATA.md for instructions on how to download and set up images and annotations of our Omni3D benchmark for training and evaluating Cube R-CNN.

Training Cube R-CNN on Omni3D

We provide config files for training Cube R-CNN on Omni3D in the configs/ directory (e.g. configs/Base_Omni3D.yaml, used below).

We train on 48 GPUs using submitit, which wraps the following training command,

python tools/train_net.py \
  --config-file configs/Base_Omni3D.yaml \
  OUTPUT_DIR output/omni3d_example_run

Note that our provided configs specify hyperparameters tuned for 48 GPUs. You could train on 1 GPU (though with no guarantee of reaching the final performance) as follows,

python tools/train_net.py \
  --config-file configs/Base_Omni3D.yaml --num-gpus 1 \
  SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.0025 \
  SOLVER.MAX_ITER 5568000 SOLVER.STEPS "(3340800, 4454400)" \
  SOLVER.WARMUP_ITERS 174000 TEST.EVAL_PERIOD 1392000 \
  VIS_PERIOD 111360 OUTPUT_DIR output/omni3d_example_run

Tips for Tuning Hyperparameters

Our Omni3D configs are designed for multi-node training.

We follow a simple scaling rule for adjusting to different system configurations. We find that 16GB GPUs (e.g. V100s) can hold 4 images per batch when training with a DLA34 backbone. If $g$ is the number of GPUs, then the number of images per batch is $b = 4g$. Let's define $r$ to be the ratio between the recommended batch size $b_0$ and the actual batch size $b$, namely $r = b_0 / b$. The values for $b_0$ can be found in the configs; for instance, for the full Omni3D training $b_0 = 196$ as shown here. We scale the hyperparameters as follows (see the sketch after this list):

  • SOLVER.IMS_PER_BATCH $=b$
  • SOLVER.BASE_LR $/=r$
  • SOLVER.MAX_ITER $*=r$
  • SOLVER.STEPS $*=r$
  • SOLVER.WARMUP_ITERS $*=r$
  • TEST.EVAL_PERIOD $*=r$
  • VIS_PERIOD $*=r$
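
A minimal sketch of this scaling rule (the base schedule values below are placeholders for illustration only; read $b_0$ and the base schedule from the config you actually train with):

# Hedged sketch: compute scaled solver hyperparameters from the number of GPUs.
def scale_hyperparams(num_gpus, imgs_per_gpu=4, b0=196,
                      base_lr=0.02, max_iter=116000, steps=(69600, 92800),
                      warmup_iters=3625, eval_period=29000, vis_period=2320):
    b = imgs_per_gpu * num_gpus          # actual batch size
    r = b0 / b                           # ratio of recommended to actual batch size
    return {
        "SOLVER.IMS_PER_BATCH": b,
        "SOLVER.BASE_LR": base_lr / r,
        "SOLVER.MAX_ITER": round(max_iter * r),
        "SOLVER.STEPS": tuple(round(s * r) for s in steps),
        "SOLVER.WARMUP_ITERS": round(warmup_iters * r),
        "TEST.EVAL_PERIOD": round(eval_period * r),
        "VIS_PERIOD": round(vis_period * r),
    }

# e.g. a single-GPU run with 4 images per batch
print(scale_hyperparams(num_gpus=1))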

We tune the number of GPUs $g$ such that SOLVER.MAX_ITER falls in a range of about 90k-120k iterations. We cannot guarantee that all GPU configurations perform the same. We expect noticeable performance differences at extreme ends of resources (e.g. when using 1 GPU).

Inference on Omni3D

To evaluate trained models from Cube R-CNN's MODEL_ZOO.md, run

python tools/train_net.py \
  --eval-only --config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
  MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
  OUTPUT_DIR output/evaluation

Our evaluation is similar to COCO evaluation and uses $IoU_{3D}$ (from PyTorch3D) as a metric. We compute the aggregate 3D performance averaged across categories.
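
As a reference for the metric itself, here is a minimal sketch of computing $IoU_{3D}$ between two boxes with PyTorch3D's box3d_overlap (the corner coordinates are illustrative; corners must follow PyTorch3D's box-corner ordering, bottom face first):

import torch
from pytorch3d.ops import box3d_overlap

def unit_box(offset=(0.0, 0.0, 0.0)):
    # 8x3 corners of an axis-aligned unit cube (bottom face, then top face)
    corners = torch.tensor([
        [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
        [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1],
    ], dtype=torch.float32)
    return corners + torch.tensor(offset, dtype=torch.float32)

boxes_pred = unit_box((0.5, 0.0, 0.0)).unsqueeze(0)  # (1, 8, 3)
boxes_gt   = unit_box().unsqueeze(0)                 # (1, 8, 3)

# box3d_overlap returns the pairwise intersection volume and IoU, both of shape (N, M)
vol, iou_3d = box3d_overlap(boxes_pred, boxes_gt)
print(iou_3d)  # half-overlapping unit cubes -> IoU3D = 0.5 / 1.5, about 0.33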

To run the evaluation on your own models outside of the Cube R-CNN evaluation loop, we recommend using the Omni3DEvaluationHelper class from our evaluation, similar to how it is used here.

The evaluator relies on the detectron2 MetadataCatalog for keeping track of category names and contiguous IDs. Hence, it is important to set these variables appropriately.

# (list[str]) the category names in their contiguous order
MetadataCatalog.get('omni3d_model').thing_classes = ... 

# (dict[int: int]) the mapping from Omni3D category IDs to the contiguous order
MetadataCatalog.get('omni3d_model').thing_dataset_id_to_contiguous_id = ...

In summary, the evaluator expects a list of image-level predictions in the format of:

{
    "image_id": <int> the unique image identifier from Omni3D,
    "K": <np.array> 3x3 intrinsics matrix for the image,
    "width": <int> image width,
    "height": <int> image height,
    "instances": [
        {
            "image_id":  <int> the unique image identifier from Omni3D,
            "category_id": <int> the contiguous category prediction IDs, 
                which can be mapped from Omni3D's category IDs using
                MetadataCatalog.get('omni3d_model').thing_dataset_id_to_contiguous_id
            "bbox": [float] 2D box as [x1, y1, x2, y2] used for IoU2D,
            "score": <float> the confidence score for the object,
            "depth": <float> the depth of the center of the object,
            "bbox3D": list[list[float]] 8x3 corner vertices used for IoU3D,
        }
        ...
    ]
}
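
For reference, a minimal sketch of assembling one image-level prediction in this format (the category names, IDs, intrinsics, and box values below are purely illustrative, not taken from Omni3D):

import numpy as np
from detectron2.data import MetadataCatalog

# Hypothetical metadata registration; replace with your model's actual categories.
meta = MetadataCatalog.get('omni3d_model')
meta.thing_classes = ['chair', 'table']
meta.thing_dataset_id_to_contiguous_id = {37: 0, 48: 1}

prediction = {
    "image_id": 12345,
    "K": np.array([[532.0, 0.0, 320.0],
                   [0.0, 531.0, 240.0],
                   [0.0, 0.0, 1.0]]),
    "width": 640,
    "height": 480,
    "instances": [
        {
            "image_id": 12345,
            "category_id": meta.thing_dataset_id_to_contiguous_id[37],
            "bbox": [100.0, 150.0, 220.0, 300.0],  # [x1, y1, x2, y2]
            "score": 0.87,
            "depth": 2.4,
            # 8x3 corner vertices of the predicted cuboid (placeholder values)
            "bbox3D": [[0.0, 0.0, 2.4]] * 8,
        }
    ],
}

predictions = [prediction]  # the list of image-level predictions passed to the evaluator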

Results

See RESULTS.md for detailed Cube R-CNN performance and comparison with other methods.

License

Cube R-CNN is released under CC-BY-NC 4.0.

Citing

Please use the following BibTeX entry if you use Omni3D and/or Cube R-CNN in your research or refer to our results.

@inproceedings{brazil2023omni3d,
  author =       {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
  title =        {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
  booktitle =    {CVPR},
  address =      {Vancouver, Canada},
  month =        {June},
  year =         {2023},
  organization = {IEEE},
}

If you use the Omni3D benchmark, we kindly ask you to additionally cite all of its datasets. BibTeX entries are provided below.

Dataset BibTeX
@inproceedings{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {CVPR},
  year = {2012}
}
@inproceedings{caesar2020nuscenes,
  title={nuscenes: A multimodal dataset for autonomous driving},
  author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
  booktitle={CVPR},
  year={2020}
}
@inproceedings{song2015sun,
  title={Sun rgb-d: A rgb-d scene understanding benchmark suite},
  author={Song, Shuran and Lichtenberg, Samuel P and Xiao, Jianxiong},
  booktitle={CVPR},
  year={2015}
}
@inproceedings{dehghan2021arkitscenes,
  title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
  author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
  booktitle={NeurIPS Datasets and Benchmarks Track (Round 1)},
  year={2021},
}
@inproceedings{hypersim,
  author    = {Mike Roberts AND Jason Ramapuram AND Anurag Ranjan AND Atulit Kumar AND
                 Miguel Angel Bautista AND Nathan Paczan AND Russ Webb AND Joshua M. Susskind},
  title     = {{Hypersim}: {A} Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
  booktitle = {ICCV},
  year      = {2021},
}
@article{objectron2021,
  title={Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations},
  author={Ahmadyan, Adel and Zhang, Liangkai and Ablavatski, Artsiom and Wei, Jianing and Grundmann, Matthias},
  journal={CVPR},
  year={2021},
}

omni3d's People

Contributors

garrickbrazil, gkioxari


omni3d's Issues

Installation gets stuck on Google Colab

Hi,

I was installing Omni3D on Google Colab; however, the installation got stuck at this step:

[screenshot]

That particular cell has been running for more than 15 minutes. Is this a normal execution time?

(I am using Python 3.8 with Miniconda)

Maybe bug in evaluation

Hi, I first evaluated the provided checkpoint and got the expected results.

[screenshot]

The detection results comprise the 2D detection and 3D detection parts. The problem is that when I delete all 3D detection results and submit only the 2D detection results, the evaluation code still returns very promising 3D detection scores. What causes this?

[screenshot]

Code for converting the raw annotations into Omni3D format

Dear Authors,

Thanks for your wonderful work in contributing the Omni3D benchmark. I am wondering if it is possible to share the code for converting the raw annotations (e.g. SUN RGB-D, ARKitScenes, etc.) into the Omni3D format, so that we can add more features to the benchmark.

Best,
Qing

Loss explosion with outdoor data.

Hi, I ran your code with the Base_Omni3D_out config and encountered

cubercnn WARNING: Skipping gradient update due to higher than normal loss 44.94 vs. rolling mean 3.55, Dict-> {'BoxHead/loss_box_reg': 1.9145429134368896, 'BoxHead/loss_cls': 19.28242301940918, 'Cube/loss_dims': 0.0018776070792227983, 'Cube/loss_joint': 0.06404059380292892, 'Cube/loss_pose': 0.018857350572943687, 'Cube/loss_xy': 0.0025855381973087788, 'Cube/loss_z': 0.012942994013428688, 'Cube/uncert': 21.71675682067871, 'rpn/cls': 0.5578740239143372, 'rpn/loc': 1.372727870941162}

after iteration 43k.

I also found that scaling up the batch size to 160 made the model even more likely to hit "Skipping gradient update due to higher than normal loss".

Is this a normal phenomenon? I ran the code on 8 A100 GPUs.
My environment is:

sys.platform            linux
Python                  3.8.15 (default, Nov  4 2022, 20:59:55) [GCC 11.2.0]
numpy                   1.23.4
detectron2              0.6 
Compiler                GCC 7.3
CUDA compiler           CUDA 11.3
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.10.1 
PyTorch debug build     False
GPU available           Yes
GPU 0,1,2,3,4,5,6,7     NVIDIA A100-SXM4-80GB (arch=8.0)
Driver version          470.129.06
CUDA_HOME               cuda-11.1
Pillow                  8.3.2
torchvision             0.11.2 
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5.post20210915
iopath                  0.1.9
cv2                     4.6.0
----------------------  -------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

Thank you.

Funky 3D box predictions

We are fine-tuning the cubercnn_DLA34_FPN.pth model on our lab data.
We seem to notice some unorthodox 3D boxes when visualizing the validation results.

[visualization: 000170]

The image size is (640, 480), and the calibration matrix k is

k = [[380.07025146484375, 0.0, 324.4729919433594], 
     [0.0, 379.66119384765625, 237.78517150878906], 
     [0.0, 0.0, 1.0]]

The predictions from the evaluation json are attached. The cabinet is category 9 for us.
predictions.txt

Is this caused by the predicted vertices not being in the proper order for the visualization?
Or do you think it's a training issue?

Details of obtaining object pose from the raw dataset

Dear authors,

Thanks for your great work. I am wondering if you can provide the code for obtaining the object pose from the raw dataset (i.e. SUN-RGBD, Objectron).

From the visualization, it seems that the object pose is not very accurate: the corresponding 3D box is not fully aligned with the objects in most of the indoor datasets. I want to check whether it is a labeling problem.
Like this:
[screenshot]
Image: SUNRGBD/kv2/kinect2data/000065_2014-05-16_20-14-38_260595134347_rgbf000121-resize/image/0000121.jpg

Best,
Jason

Unstable training with keypoint loss

First of all, thank you very much for your insightful paper! You combined multiple interesting tricks into a single powerful approach.

I tried applying the proposed keypoint loss for monocular 3D object detection on a custom dataset and a custom detector. I have already run dozens of experiments, but I always observe the same phenomenon: training becomes unstable after a few epochs. As proposed by you, I applied:

  • All keypoint losses including dimension, rotation, amodel_center, depth, all and uncertainty. I also performed experiments, where I excluded depth (also in the all loss)
  • I clipped the uncertainty to have values of at least 0.01. I also tried runs without clipping and some without uncertainty.
  • I applied your loss scales, but also experimented with lower and higher ones.
  • I applied the chamfer loss for rotation. But I also tried applying the l1 loss on it.
  • I tried runs with and runs without data augmentation.

The custom dataset is comparable to KITTI except that it requires predicting all rotation angles.
I also performed trainings on KITTI with the custom detector and your keypoint loss. There, training was more stable, but there was no performance gain.
By training instability I mean that the network learns for a few epochs, but then suddenly all losses start to increase (including the 2D detection losses). Finally, the detector is unable to detect anything. For classical losses (e.g. amodel offset, L1 on rotation, etc.) training works very well. The computed keypoints are fine; we confirmed this by plotting them.

Did you also observe something similar, or training instabilities in general, with the proposed keypoint loss? Do you have recommendations on what to check or what might increase training stability?

How to know whether an image is flipped

Hi, I notice that a random horizontal flip is applied during training. Although the image is flipped, the camera intrinsics returned by the data loader are not changed accordingly, so I need to transform the intrinsics myself. However, I cannot find where the random horizontal flip is performed, so I cannot tell whether an image has been flipped. Can you point me to where I can find this out?

Prediction of 3D boxes format

Hi!
First of all, thank you very much for this awesome work :)
I have been trying to understand the prediction format of the boxes in detail.
In the paper it is said:

[screenshot of the paper excerpt]

  1. Does [u,v] represent the pixel location of the projected box center, i.e. simply K x center_cam?
  2. The virtual depth params are set to f_v = H_v = 512, which gives z_v = H / f? Is f given?
  3. Are the per-category w0, h0, l0 given? I could not find where...
  4. About the allocentric rotation: when I convert p to an egocentric rotation, does this correspond to 'R_cam'?

Thank you very much for your time, and sorry for all the questions.

Failed to download https://dl.fbaipublicfiles.com/cubercnn/omni3d\category_meta.json

When running the demo.py script I get the following error:
Failed to download https://dl.fbaipublicfiles.com/cubercnn/omni3d\category_meta.json
Traceback (most recent call last):
  File "demo/demo.py", line 196, in <module>
    launch(
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\site-packages\detectron2-0.6-py3.8-win-amd64.egg\detectron2\engine\launch.py", line 84, in launch
    main_func(*args)
  File "demo/demo.py", line 157, in main
    do_test(args, cfg, model)
  File "demo/demo.py", line 49, in do_test
    category_path = util.CubeRCNNHandler._get_local_path(util.CubeRCNNHandler, category_path)
  File "C:\Users\user\Downloads\TFMRuben\omni3d-bak\omni3d\cubercnn\util\model_zoo.py", line 19, in _get_local_path
    return PathManager.get_local_path(self.CUBERCNN_PREFIX + name)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\site-packages\iopath\common\file_io.py", line 1197, in get_local_path
    bret = handler._get_local_path(path, force=force, **kwargs)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\site-packages\iopath\common\file_io.py", line 797, in _get_local_path
    cached = download(path, dirname, filename=filename)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\site-packages\iopath\common\download.py", line 58, in download
    tmp, _ = request.urlretrieve(url, filename=tmp, reporthook=hook(t))
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\user\anaconda3\envs\cubercnn\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

demo.py config files

Thank you so much for opensourcing such an amazing benchmark and model!

I managed to easily set up the environment and run inference on Omni3D. As far as I understand, the cubercnn_DLA34_FPN.yaml configs for the omni3d, indoor, and outdoor categories are hosted on https://dl.fbaipublicfiles.com/cubercnn/ (as well as the corresponding category_meta.json files).

However, the cubercnn_ResNet34_FPN.yaml configs are not hosted (or cannot be accessed). I understand that it is possible to specify a local path to the config file (in the configs directory), download the corresponding category_meta.json files, and run the demo with ResNet34 that way (note that if the path is not prefixed with cubercnn://, then category_meta.json is searched for locally, and there are no such files by default).

So, to make a long story short, it is possible to make demo.py work with ResNet34 with some code changes. But wouldn't it be more consistent if you also hosted cubercnn_ResNet34_FPN.yaml, so that anyone could run the demo with ResNet34 just by changing cubercnn_DLA34_FPN.yaml to cubercnn_ResNet34_FPN.yaml?

Do I understand correctly that category_meta.json files for each category are the same for different types of models?

P.S.
The Support Ukraine badge in the README.md is ❤️. Thank you!

KITTI Annotations

I see that your KITTI data differs from the original KITTI data. This means, center_cam is different and the intrinsics K are different (3x3 instead of 3x4). How did you compute these two?

ResNet34 yaml not working

Hi, I was trying to run the demo by passing the ResNet34 yaml file and the ResNet34 pth file for the model weights as arguments, but I was prompted with the error urllib.error.HTTPError: HTTP Error 403: Forbidden. Can you please help me resolve this issue?
Thank you.

Expanding detected classes

I was wondering if you also trained on the "boat" class from COCO. If so, would it be possible to expand the demo to also predict this class without retraining?

interfacing with ros slam node and stereo camera

Hi folks,

First of all, kudos to all of you; you have done great work developing Omni3D. I have a question about how to interface it with stereo-depth and SLAM-based algorithms.

If possible, how do I configure it to use depth or point-cloud data as input?

Regards,

Venkata prasad s

Question regarding the consistency of object orientation (object frame) of common classes across datasets

Hi, thanks for this awesome collection of datasets! I was wondering whether the object orientations of a particular class are consistent across the datasets. For example, take the 'bed' class: in SUN RGB-D the orientation of a bed (in the object-centric frame) might be defined with, say, the front face pointing up (i.e. the normal direction of the mattress). In another dataset such as Hypersim, is the front face of a bed defined in the same way? If not, then even if we convert all the annotations into a common coordinate frame, the orientations would still be inconsistent. Could you please shed some light on this? Thanks :)

How to do tracking on a video

Hi, I'm trying to do object tracking on an mp4 video file, as you did on the main page. Can you, or anybody who knows, tell me how to do that?
Thanks in advance

Conda install environment

I have some issues installing the packages.

What is the base environment to make it work?

I use Python 3.8 and Ubuntu 22.04.2 LTS, and I get several conflict error messages like this one:

Package liblapack conflicts for:
2213.5 pytorch=1.8 -> numpy[version='>=1.19.5,<2.0a0'] -> liblapack[version='3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.8.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|3.9.0|>=3.8.0,<4.0.0a0|>=3.8.0,<4.0a0|>=3.9.0,<4.0a0',build='4_mkl|7_mkl|9_mkl|10_mkl|11_mkl|20_mkl|5_mkl|7_mkl|9_mkl|12_linux64_mkl|19_linux64_mkl|16_linux64_mkl|15_linux64_mkl|14_linux64_mkl|13_linux64_mkl|11_linux64_mkl|10_mkl|8_mkl|6_mkl|21_mkl|19_mkl|18_mkl|16_mkl|15_mkl|14_mkl|13_mkl|12_mkl|8_mkl|6_mkl|5_mkl']
2213.5 fvcore -> numpy -> liblapack[version='>=3.8.0,<4.0.0a0|>=3.8.0,<4.0a0|>=3.9.0,<4.0a0']
2213.5 torchvision=0.9.1 -> numpy[version='>=1.18.5,<2.0a0'] -> liblapack[version='>=3.8.0,<4.0.0a0|>=3.8.0,<4.0a0|>=3.9.0,<4.0a0']

...

ERROR: failed to solve: process "/bin/sh -c conda install -c fvcore -c iopath -c conda-forge -c pytorch fvcore iopath pytorch=1.8 torchvision=0.9.1 cudatoolkit=10.1" did not complete successfully: exit code: 1

configs/category_meta.json

Hi, I'm trying to run the demo on Omni3D, but I get:
No such file or directory: 'configs/category_meta.json'.
May I ask where I can get "category_meta.json"?

Thank you!

Category Conversion Code from SUN RGB-D to Omni3D Dataset

Dear Authors,

Thank you for your exceptional contribution to the Omni3D benchmark. I've observed a reduction in the number of classes when transitioning from the SUN RGB-D dataset to Omni3D. Specifically, certain categories like 'ottoman' in SUN RGB-D have been reclassified as 'chair' in Omni3D. Could you please share the code used for this category conversion, so that we can add more information, like semantic segmentation, to the dataset?

Best,
Jin

How to obtain the depth sensor point cloud of Objectron images in Omni3D

Hi, thanks for your great work. I want to obtain the depth-sensor point cloud for the Objectron images in Omni3D, so I need to know which Objectron images in Omni3D correspond to which images in the original Objectron dataset, and how the coordinates are transformed. Can you share how the processed Objectron part of Omni3D was obtained? The data in Objectron seems to be videos rather than images, so did you sample frames based on some rules?

More details about default camera intrinsics in demo.py file

Can you explain why the default focal length and principal point are set in demo.py as shown below? Also, what does "focal_length_ndc" mean?

if focal_length == 0:
    focal_length_ndc = 4.0
    focal_length = focal_length_ndc * h / 2

if len(principal_point) == 0:
    px, py = w/2, h/2
else:
    px, py = principal_point

Zero-shot + tracking

Hi, is there any way to run the Zero-shot + tracking demo shown on the main page?

Why some class label is -1

I noticed that some class labels are -1, and the GTs whose class labels are -1 have been removed from training. I am wondering what the class label -1 refers to. Does it mean the GT box exists in the dataset but does not belong to the target classes of interest?

Training log?

Hi, thank you for your great work.
Can you provide a training log of Omni3D so I can check whether I set up my machine and ran the code correctly?

BTW

  1. The URL link in the README "Tips for tuning hyperparameters" is not working.
  2. DATA.md has some minor typos:
datasets/nuScenes/samples
└── samples # two "samples" here
    ├── CAM_FRONT

and

./Omni3D/datasets/SUNRGBD
├── kv1
├── kv2
├── realsense
# "xtion" is missing here

Thanks.

demo.py not working properly

I'm having trouble while running the demo.py script.

I run it using the command provided on GitHub:
python demo/demo.py --config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml --input-folder "datasets/coco_examples" --threshold 0.25 --display MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth OUTPUT_DIR output/demo

When I try to run it this way, I get the following message:
Traceback (most recent call last):
  File "demo/demo.py", line 196, in <module>
    launch(
  File "c:\users\marti\source\repos\omni3d\detectron2\detectron2\engine\launch.py", line 84, in launch
    main_func(*args)
  File "demo/demo.py", line 149, in main
    model = build_model(cfg)
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\meta_arch\rcnn3d.py", line 253, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg, priors=priors)
  File "c:\users\marti\source\repos\omni3d\detectron2\detectron2\config\config.py", line 189, in wrapped
    explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
  File "c:\users\marti\source\repos\omni3d\detectron2\detectron2\config\config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\meta_arch\rcnn3d.py", line 30, in from_config
    backbone = build_backbone(cfg, priors=priors)
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\meta_arch\rcnn3d.py", line 270, in build_backbone
    backbone = BACKBONE_REGISTRY.get(backbone_name)(cfg, input_shape, priors)
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\backbone\dla.py", line 493, in build_dla_from_vision_fpn_backbone
    bottom_up = DLABackbone(cfg, input_shape)
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\backbone\dla.py", line 421, in __init__
    base = dla34(pretrained=True, tricks=cfg.MODEL.DLA.TRICKS)
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\backbone\dla.py", line 319, in dla34
    model.load_pretrained_model(data='imagenet', name='dla34', hash='ba72cf86')
  File "C:\Users\marti\source\repos\omni3d\cubercnn\modeling\backbone\dla.py", line 304, in load_pretrained_model
    model_weights = model_zoo.load_url(model_url)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\site-packages\torch\hub.py", line 746, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\site-packages\torch\hub.py", line 611, in download_url_to_file
    u = urlopen(req)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\marti\anaconda3\envs\cubercnn\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Installation issue

Hello,
I followed the installation instructions line by line, and they all went through without any error. However, when trying out the demo example, I get the following error:

ImportError: /home/matteo/.local/lib/python3.8/site-packages/detectron2/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7reshapeEN3c108ArrayRefIlEE

Here, the full traceback:

Traceback (most recent call last):
  File "demo/demo.py", line 12, in <module>
    from detectron2.engine import default_argument_parser, default_setup, launch
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/engine/__init__.py", line 11, in <module>
    from .hooks import *
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 18, in <module>
    from detectron2.evaluation.testing import flatten_results_dict
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/evaluation/__init__.py", line 2, in <module>
    from .cityscapes_evaluation import CityscapesInstanceEvaluator, CityscapesSemSegEvaluator
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/evaluation/cityscapes_evaluation.py", line 11, in <module>
    from detectron2.data import MetadataCatalog
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/data/__init__.py", line 4, in <module>
    from .build import (
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/data/build.py", line 12, in <module>
    from detectron2.structures import BoxMode
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/structures/__init__.py", line 7, in <module>
    from .masks import BitMasks, PolygonMasks, polygons_to_bitmask
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/structures/masks.py", line 9, in <module>
    from detectron2.layers.roi_align import ROIAlign
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/layers/__init__.py", line 3, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/home/matteo/.local/lib/python3.8/site-packages/detectron2/layers/deform_conv.py", line 11, in <module>
    from detectron2 import _C
ImportError: /home/matteo/.local/lib/python3.8/site-packages/detectron2/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7reshapeEN3c108ArrayRefIlEE

Any suggestion?

Camera Pose Information

Hi!

I wanted to see if it is possible to get the camera pose in such a way that the coordinate frame is consistent between images of a single scene. If it's not available directly, would there be a way to get it by mapping from the original dataset that has this information (e.g., Hypersim) to your format?

Arbitrary Translation and Rotation

Dear Authors
Thank you very much for your fantastic work.

Can the network return an arbitrary rotation and translation (6 degrees of freedom) for an object located in an indoor scene, or does it only return the rotation about the y-axis and the translation along the x- and z-axes, similar to outdoor scenes?

Best

questions about dataset format

Hi, I have several questions regarding the dataset format when converting Objectron to COCO format.

  1. I am wondering what "bbox2D_trunc" means. Does it mean the minimum enclosing box?
  2. For "R_cam", should I use the rotation matrix from camera to world or from world to camera?
  3. For "bbox2D_proj", I think we get 8 vertices when projecting a 3D bbox to image space, but how can I get the form [x1, y1, x2, y2]?

Thanks!

where is the file named category_meta.json

Hi, thanks for your great work. I'm trying to run the demo script demo.py. When everything was prepared, an error occurred saying that the file category_meta.json is missing. Where can I get this file?

How to make a new dataset for training?

Hi!
Thank you very much for this fantastic work :)
I wanted to try running it in a live environment. How could I make a new dataset? Should the dataset be in KITTI format?
I am looking forward to your prompt reply.

NVIDIA RTX A4000 with CUDA capability sm_86 is not compatible with the current PyTorch installation

I ran into the following warning when trying to run train_net.py (with --eval-only) on 2 GPUs, at which point my terminal simply hangs.

This error doesn't occur when --num-gpus is 1, but then inference takes more than several hours. I'm wondering if this is expected?

UserWarning: 
NVIDIA RTX A4000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA RTX A4000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

training kitti only

Hello, and thank you very much for this great work. I am trying to train on KITTI only because I want to replace it with my custom dataset in KITTI format.
I changed the config Base_Omni3D_out.yaml as follows:
BASE: "Base.yaml"
SOLVER:
TYPE: "sgd"
IMS_PER_BATCH: 32
BASE_LR: 0.02
STEPS: (69600, 92800)
MAX_ITER: 116000
WARMUP_ITERS: 3625
TEST:
EVAL_PERIOD: 29000
VIS_PERIOD: 2320
DATASETS:
#TRAIN: ('nuScenes_train', 'nuScenes_val', 'KITTI_train', 'KITTI_val')
#TRAIN: ('KITTI_train')
#TEST: ('nuScenes_test', 'KITTI_test')
TRAIN: ('KITTI_train', 'KITTI_val',)
TEST: ('KITTI_val',)
CATEGORY_NAMES: ('cyclist', 'pedestrian', 'trailer', 'bus', 'motorcycle', 'car', 'barrier', 'truck', 'van', 'traffic cone', 'bicycle')
MODEL:
ROI_HEADS:
NUM_CLASSES: 11

but when I run the training command (python tools/train_net.py --config-file configs/Base_Omni3D_out.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 5568000 SOLVER.STEPS "(3340800, 4454400)"), I get this error:

File "tools/train_net.py", line 508, in
launch(
File "/home/sun/anaconda3/envs/cubercnn/lib/python3.8/site-packages/detectron2/engine/launch.py", line 84, in launch
main_func(*args)
File "tools/train_net.py", line 462, in main
if do_train(cfg, model, dataset_id_to_unknown_cats, dataset_id_to_src, resume=args.resume):
File "tools/train_net.py", line 137, in do_train
data_loader = build_detection_train_loader(cfg, mapper=data_mapper, dataset_id_to_src=dataset_id_to_src)
File "/home/sun/anaconda3/envs/cubercnn/lib/python3.8/site-packages/detectron2/config/config.py", line 208, in wrapped
return orig_func(**explicit_args)
File "/home/sun/Downloads/omni3d-main/cubercnn/data/build.py", line 189, in build_detection_train_loader
return build_batch_data_loader(
File "/home/sun/anaconda3/envs/cubercnn/lib/python3.8/site-packages/detectron2/data/build.py", line 339, in build_batch_data_loader
data_loader = torchdata.DataLoader(
File "/home/sun/anaconda3/envs/cubercnn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 243, in init
assert prefetch_factor > 0
TypeError: '>' not supported between instances of 'NoneType' and 'int'

could you help , thank you very much

Incorrect 2D Bbox Labels in Objectron Dataset

I found some Objectron labels are incorrect. For example (the upper and lower row are the same, please ignore the upper row):
[screenshot: datasets_objectron_train_book_batch_24_49_0000100]

The 2D bbox does not stick to the projected 3D bbox. There are many more samples with this problem. The example above is titled datasets_objectron_train_book_batch_24_49_0000100. I did not see similar issues in the other datasets.

The issues of adapting the other models to the omni3d datasets

Hi, thanks for your great work. When adapting some monocular 3D detectors to the Omni3D dataset, I find that the 3D AP remains below 1 after about 20000 iterations. Therefore, I wonder whether it is possible to release the code for adapting the other models, like SMOKE, to the dataset, as shown in the paper. Meanwhile, could you give me some clues about which parts might lead to such results, e.g. the angle loss, the training post-processing (virtual depth), or the data augmentation? Thanks very much.

Using actual camera intrinsics

Suppose we have access to the actual camera intrinsics. Will it improve performance if we simply replace K with our recorded camera intrinsics, for example the K in the demo.py code?

Running Cube-RCNN on a different dataset

Dear Authors,

I like your approach a lot. Thanks also for providing such extensive documentation, especially for explaining how to set the learning parameters for a different batch size.

Let's assume you have a different joint training dataset with about half the size. What would you change?
I would assume that you also linearly scale:

  • SOLVER.STEPS
  • SOLVER.WARMUP_ITERS
  • SOLVER.MAX_ITER
  • TEST.EVAL_PERIOD
  • VIS_PERIOD

Anything else?

Best wishes

Johannes

Script for converting objectron to omni3d format

Hey, I noticed that Omni3D only uses a small fraction of the original Objectron dataset, and I am trying to cover more of Objectron's data. Can you provide the script for converting the Objectron data format to the Omni3D format? Just converting the Objectron coordinates to Omni3D coordinates would be enough.
