
News: the 7K dataset is ready for download.

Home Page: https://dl3dv-10k.github.io/DL3DV-10K/

License: Other



DL3DV-10K Dataset

DL3DV-10K is a dataset of real-world scene-level videos with scene annotations.

This repo helps you get ready to download the full DL3DV-10K dataset.


Dataset Download | Website | NVS Benchmark Training Results | Data Preparation | License | Issues | BibTeX

Abstract

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset, featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes, with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation.

Key Features

  • 10,510 multi-view scenes covering 51.2 million frames at 4K resolution.
  • 140 videos serve as the novel view synthesis (NVS) benchmark.
  • All videos are annotated by scene environment (indoor vs. outdoor) and by levels of reflection, transparency, and lighting.
  • Released samples include COLMAP-computed camera poses.
  • Benchmark videos come with trained parameters from SOTA NVS methods, including 3D Gaussian Splatting, Zip-NeRF, Mip-NeRF 360, Instant-NGP, and Nerfacto.

NVS Benchmark Training Results

We report the performance of the main SOTA methods (Fall 2023) on our large-scale NVS benchmark. The quantitative results are shown below; please refer to our paper for more details (e.g., additional quantitative and qualitative results).

[Image: benchmark results table]

Performance on the benchmark. Each metric is averaged over the 140 benchmark scenes at a downscale factor of 4. Zip-NeRF uses its default batch size (65536), while Zip-NeRF* uses the same batch size as the other methods (4096). Note that training time and memory usage may differ depending on the configuration.

[Image: PSNR/SSIM density plots (A) and performance by scene complexity (B)]

Panel A presents density plots of PSNR and SSIM, and their relationship, on the benchmark for each method. Panel B compares performance by scene complexity; the text above each bar plot is the mean value of the methods on that attribute.

Data Preparation

Data Scale

[Image: dataset quantity comparison]

DL3DV-10K has more than 10K high-quality videos that cover diverse real-world scenes for 3D vision tasks.

Data Collection

We have formulated the following requirements as guidelines for recording high-quality scene-level videos:

[Image: data collection guidelines illustration]
  • The scene coverage is a circle or half-circle with a 30-45 second walking diameter and contains at least five object instances in a natural arrangement.
  • The default focal length of the camera corresponds to the 0.5x ultra-wide mode to capture a wide range of background information.
  • Each video covers a horizontal view of at least 180° or 360° from different heights, including overhead and waist level, offering high-density views of objects within the coverage area.
  • The video resolution should be 4K at 60 fps (or 30 fps).
  • The video should be at least 60 seconds long for mobile phone capture and 45 seconds for drone recording.
  • We recommend limiting the duration of moving objects in the video to under 3 seconds, with a maximum allowance of 10 seconds.
  • Frames should not be motion-blurred or overexposed, and the captured objects should be stereoscopic.

Data Statistics

Visit DL3DV-10K Website

Dataset Download

Dataset Preview

We provide a preview page here. It shows a snapshot of each scene along with its hash code and labels. Some missing labels will be updated soon.

Download Instructions

  • Free download sample videos (11 scenes)

  • Benchmark dataset release (140 scenes)

    • Raw videos
    • Benchmark images and camera poses (ready for download)
      • Users can request access here.
      • We provide both Nerfstudio and 3D Gaussian Splatting formats for benchmark scenes.
    • Benchmark trained weights for 3D Gaussian Splatting, Zip-NeRF, Mip-NeRF 360, Instant-NGP, and Nerfacto (coming soon)
  • 10K Full Dataset Release: the whole dataset is extremely large, so we provide different versions for different needs.

Please go to the relevant Hugging Face dataset page and request access. By requesting access, you automatically agree to our terms of use and license and can then access the dataset. Note that the latest license is open regarding usage of the dataset, but it is the user's responsibility to use it appropriately. The DL3DV organization disclaims any responsibility for the misuse, inappropriate use, or unethical application of the dataset by individuals or entities who download or access it. More details can be found in our license.

If you have enough space, you can use git to download a dataset from Hugging Face; see this link. The 480P/960P versions should satisfy most needs.
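
For reference, a git-based download of one resolution might look like the sketch below. This is not an official command from the maintainers; it assumes the 960P data lives in the DL3DV/DL3DV-ALL-960P dataset repository (the repository name that appears in the download URLs quoted in the issues on this page), that access has already been granted, and that git-lfs is installed:

# A minimal sketch of a git-based download (assumes granted access and git-lfs)
git lfs install
git clone https://huggingface.co/datasets/DL3DV/DL3DV-ALL-960P

Keep in mind that cloning fetches an entire resolution repository, so make sure you have the space.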

If you do not have enough space, we also provide a download script here to download a subset. First, make sure you have applied for access (see above). To set up the environment for the script, run this in your Python virtual environment:

pip install huggingface_hub tqdm pandas
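
Depending on how the gated repositories are configured, programmatic downloads may also require a Hugging Face access token. If the script reports an authentication error, logging in once beforehand should help (an assumption on our part, not an official step):

# Authenticate with your Hugging Face account so huggingface_hub can access gated files
huggingface-cli login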

Usage of download.py:

usage: download.py [-h] --odir ODIR --subset {1K,2K,3K,4K,5K,6K,7K,8K,9K,10K} --resolution {4K,2K,960P,480P} --file_type {images+poses,video,colmap_cache} [--hash HASH]
                  [--clean_cache]

optional arguments:
  -h, --help            show this help message and exit
  --odir ODIR           output directory
  --subset {1K,2K,3K,4K,5K,6K,7K,8K,9K,10K}
                        The subset of the benchmark to download
  --resolution {4K,2K,960P,480P}
                        The resolution to download
  --file_type {images+poses,video,colmap_cache}
                        The file type to download
  --hash HASH           If set subset=hash, this is the hash code of the scene to download
  --clean_cache         If set, will clean the huggingface cache to save space

Here are some examples:

# Make sure you have applied for the access.
# Use this to download the download.py script 
wget https://raw.githubusercontent.com/DL3DV-10K/Dataset/main/scripts/download.py 

# Download 480P resolution images and poses, 0~1K subset, output to DL3DV-10K directory   
python download.py --odir DL3DV-10K --subset 1K --resolution 480P --file_type images+poses --clean_cache


# Download 960P resolution images and poses, 0~1K subset, output to DL3DV-10K directory   
python download.py --odir DL3DV-10K --subset 1K --resolution 960P --file_type images+poses --clean_cache


# Download 2K resolution images and poses, 0~1K subset, output to DL3DV-10K directory   
python download.py --odir DL3DV-10K --subset 1K --resolution 2K --file_type images+poses --clean_cache


# Download 4K resolution images and poses, 0~1K subset, output to DL3DV-10K directory   
python download.py --odir DL3DV-10K --subset 1K --resolution 4K --file_type images+poses --clean_cache


# Download 4K resolution videos, 0~1K subset, output to DL3DV-10K directory   
python download.py --odir DL3DV-10K --subset 1K --resolution 4K --file_type video --clean_cache


# Download 480P resolution images and poses, 1K~2K subset, output to DL3DV-10K directory   
python download.py --odir DL3DV-10K --subset 2K --resolution 480P --file_type images+poses --clean_cache
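
The --hash option documented above suggests that a single scene can be fetched instead of a whole subset. A hedged example follows; the scene hash is purely illustrative (it is a benchmark scene mentioned in the issues on this page), and since the usage text only lists 1K-10K as --subset choices, verify the exact --subset value to pair with --hash against download.py --help in your copy of the script:

# Download a single scene by its hash code (illustrative; verify --subset handling in your script version)
python download.py --odir DL3DV-10K --subset hash --hash 07d9f9724ca854fae07cb4c57d7ea22bf667d5decd4058f547728922f909956b --resolution 960P --file_type images+poses --clean_cache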

License

DL3DV-10K is released under the DL3DV-10K Terms of Use. The DL3DV-10K Terms of Use, disclaimer, and the copy of the license are available in this repository.

Copyright (c) 2024

Issues

Despite our best efforts to anonymize the data, there may be instances where sensitive details are inadvertently included. If you identify any such issues within the dataset (scenes), please get in touch with us via an issue. We will manually redact any sensitive information to ensure the privacy and integrity of the dataset.

Want to contribute to the DL3DV-10K dataset? Upload your video here.

About

The DL3DV-10K team is a non-profit organization whose members include the authors of the DL3DV-10K paper and volunteers who contribute to the dataset. Our mission is to make large-scale deep learning models and datasets available to the general public.

BibTeX

If you find this dataset useful, please cite our paper.

@article{ling2023dl3dv,
  title={DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision},
  author={Ling, Lu and Sheng, Yichen and Tu, Zhi and Zhao, Wentian and Xin, Cheng and Wan, Kun and Yu, Lantao and Guo, Qianyu and Yu, Zixun and Lu, Yawen and others},
  journal={arXiv preprint arXiv:2312.16256},
  year={2023}
}


Open Issues

How was this dataset made?

Thank you very much for your dataset. I still need to make a similar dataset myself for my work. May I ask how your dataset was made, and in particular how the transforms.json file was obtained?

The Release of Trained Weights

Thank you for the amazing work you have done!

Just curious: when will you release the pretrained weights of DL3DV-10K? They would be quite impactful for my current work.

Some benchmark videos not in DL3DV-valid yet?

Hi, I found that among the 140 benchmark videos, 98 of them are in DL3DV-valid.csv.
Are the remaining benchmark videos yet to be added, and will they be added?

I'm not really interested in downloading the nerf/gsplats, so it would be nice to be able to download everything via the regular download.

Undistorted Images

Hello,
Thanks for this awesome dataset. Are the 960x540 images already undistorted or do we need to undistort them ourselves using the OPENCV camera model provided in the colmap files?

License/clarifications on usage

I'm really excited for this dataset but I'm wondering about the non-commercial license.
It would be great if the license could clarify if weights of non-generative models inherit the non-commercial license.

For example, I'm interested in training models for image matching, which does not aim to replicate the original data.
I feel that downstream commercial usage of this type of model shouldn't be problematic, but in its current state it's very difficult to know whether the license prohibits such usage.

A further example could be training monodepth networks, or other scene agnostic types of models that are conditioned on images.

Resolution error in some samples

Thanks for the great work!

I noticed that the resolution of some samples is not correct. For example, in the benchmark samples, the 960P images of 07d9f9724ca854fae07cb4c57d7ea22bf667d5decd4058f547728922f909956b (under nerfstudio/images_4) are 480P instead. It seems that the full resolution of the images is 2K instead of 4K, and thus the downsampled ones do not match the target size.

A question about dataset setup

Sorry to bother you. The nerf-h training method I referenced uses the Cambridge dataset to train NeRF, and in addition to the poses it provides a setup_world.json file to manually align the scales, as shown in the code below. Do I need to set up such a file to train with DL3DV as well? If yes, how do I set it up?

if rescale_coord:
    sc = train_set.pose_scale  # manually tuned factor, aligned with the COLMAP scale
    all_poses[:, :3, 3] *= sc

    ### Quite ugly ###
    ## move centre of camera pose
    if train_set.move_all_cam_vec != [0., 0., 0.]:
        all_poses[:, :3, 3] += train_set.move_all_cam_vec

    if train_set.pose_scale2 != 1.0:
        all_poses[:, :3, 3] *= train_set.pose_scale2

Question about downloading outdoor scenes for nerf training

Thanks for the great work! I want to download some outdoor scene data for NeRF training. Can I just run it directly with this command? python download.py --odir DL3DV-10K --subset 1K --resolution 960P --file_type images+poses --clean_cache

Request Full Dataset.

Hi! When do you plan to release all the datasets including poses and nerf depths?

How to generate depth image

Thanks for your excellent work! I would like to ask how I can generate a depth image for each RGB image from the provided COLMAP files.

Some colmap caches seem incomplete

Hi Team,

It seems that some COLMAP caches are broken for certain scenes. For example, in 4K/7f5223dfae59ed4cc2be125420b71f2b1e93556dd086cd091df55d9f30e51b99, we only have "database.db" instead of "cameras.bin database.db images.bin models points3D.bin". Could you confirm whether this means the scene should be filtered out?

Best,
Jianyuan

Can only 1K-7K be downloaded for 960P?

Traceback (most recent call last):
File "/group/40046/public_datasets/3d_datasets/DL3DV-10K/download.py", line 229, in
assert params.subset in ['1K', '2K', '3K', '4K', '5K', '6K', '7K'], 'Only support subset 1K-7K so far'
AssertionError: Only support subset 1K-7K so far

Some files seem missing

Hi,

Thanks for sharing this great dataset! I have successfully downloaded 1K-4K. When trying to download 5K-7K, it works for the colmap cache, but some images+poses files are missing. For example:

Downloading:  98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊   | 979/1000 [1:18:30<01:41,  4.81s/it]
Traceback (most recent call last):
  File "/data/home/jianyuan/miniconda3/envs/vggsfm/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/data/home/jianyuan/miniconda3/envs/vggsfm/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/datasets/DL3DV/DL3DV-ALL-960P/resolve/main/5K/bcb0f8befe19e48eb395f40fdde9a0e8b1d2ed300d951bf041f6ae88fa9410e0.zip


Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 998/1000 [1:04:36<00:07,  3.88s/it]
Traceback (most recent call last):
  File "/data/home/jianyuan/miniconda3/envs/vggsfm/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/data/home/jianyuan/miniconda3/envs/vggsfm/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/datasets/DL3DV/DL3DV-ALL-960P/resolve/main/6K/c6626ee2d9f843422628367e008052e9e0bf52f7c2db041ffa7401c808a9f4a7.zip


Retry 1
Downloading:  99%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 989/1000 [1:18:43<00:52,  4.78s/it]
Traceback (most recent call last):
  File "/data/home/jianyuan/miniconda3/envs/vggsfm/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/data/home/jianyuan/miniconda3/envs/vggsfm/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/datasets/DL3DV/DL3DV-ALL-960P/resolve/main/7K/cb6f596cb5fcfea042bc5cd5ec37baf472cc4600034160eb04029a9d16444999.zip

All of them happen at the last few scenes.
