
hope-dataset's Introduction

Household Objects for Pose Estimation (HOPE)

The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. Further, we provide 3D textured meshes for generating synthetic training data.

The HOPE-Image dataset shows the objects in 50 scenes from 10 household/office environments. Up to 5 lighting variations are captured for each scene, including backlighting and angled direct lighting with cast shadows. Scenes are cluttered with varying levels of occlusion.

The HOPE-Video dataset consists of ten sequences captured by a camera mounted to a robotic arm. The camera is moved to survey a set of objects placed in the robot's workspace. Accompanying each video sequence is a point cloud scene reconstruction computed by CascadeStereo.

Download

To download the dataset, install the Python package gdown via pip install gdown, then run python setup.py to download the dataset from Google Drive and unpack the zip archives.

By default, this tool will download the HOPE-Image validation and test sets (hope_image/valid, 50MB; hope_image/test, 179MB), the HOPE-Video set (hope_video/, 2.9GB), and low-resolution (meshes/eval/, 33MB) and high-resolution (meshes/full/, 98MB) meshes. Use the command-line options to download specific subsets.

Note that the HOPE-Image dataset is also part of the BOP challenge and can be downloaded there.

HOPE-Image

The HOPE-Image dataset contains 188 test images taken in 8 environments, with a total of 40 scenes (unique camera and object poses). An additional 50 validation images are included from 2 environments in 10 scene arrangements.

Within each scene, up to 5 lighting variations are captured with the same camera and object poses. For example, the captures in valid/scene_0000/*.json all depict the same camera pose and arrangement of objects, but each individual capture (0000.json, 0001.json, ...) has a different lighting condition. For this reason, each image should be treated independently for purposes of pose prediction. The most favorable lighting condition for each scene is found in image 0000.json.
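
As a minimal sketch for loading one capture (the directory layout follows the description above, but any structure beyond the file path is an assumption; print the parsed result to confirm the actual schema):

import json

# Load a single capture from a validation scene. Sibling files
# (0000.json, 0001.json, ...) share camera and object poses but
# differ in lighting.
with open('valid/scene_0000/0000.json') as f:
    annots = json.load(f)

# Inspect the top-level structure before relying on specific keys.
print(annots.keys() if isinstance(annots, dict) else type(annots))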

Images were captured using a RealSense D415 RGBD camera. We observed systematic errors in the depth values relative to the estimated distance of a calibration grid. To correct for this, we scaled depth frames by a factor of 0.98042517 before registering to RGB. Annotations were made manually using these corrected RGBD frames.
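
The correction is a single multiplicative scale on the raw depth values. A minimal sketch, assuming a 16-bit depth PNG (the file name is illustrative; note that the released frames were annotated against already-corrected depth, so the scale matters mainly for your own raw D415 captures):

import numpy as np
from PIL import Image

DEPTH_SCALE = 0.98042517  # systematic D415 depth correction noted above

# Load a raw depth frame and apply the scale before registering to RGB.
depth = np.asarray(Image.open('0000_depth.png'), dtype=np.float32)
depth_corrected = depth * DEPTH_SCALE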

NOTE: Only validation set annotations are included. Test annotations are managed by the BOP challenge.

If you use HOPE-Image in your own experiments, please cite the following paper (arXiv):

@inproceedings{tyree2022hope,
  author={Tyree, Stephen and Tremblay, Jonathan and To, Thang and Cheng, Jia and Mosier, Terry and Smith, Jeffrey and Birchfield, Stan},
  title={6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark},
  booktitle={International Conference on Intelligent Robots and Systems (IROS)},
  year={2022}
}

HOPE-Video

The HOPE-Video dataset contains 10 video sequences (2,038 frames) of tabletop scenes with 5-20 objects, captured by a RealSense D415 RGBD camera mounted on a robot arm. In each sequence, the camera is moved to capture multiple views of a set of objects in the robotic workspace. We first applied COLMAP to refine the camera poses (keyframes at 6 fps) obtained from forward kinematics and the RGB calibration between the RealSense and Baxter's wrist camera. A dense 3D point cloud was then generated with CascadeStereo (included for each sequence as scene.ply). Ground-truth poses of the HOPE object models in the world coordinate system were annotated manually using the CascadeStereo point clouds. The following are provided for each frame (a minimal sketch for inspecting the scene reconstruction follows the list):

  • Camera intrinsics/extrinsics
  • 640x480 RGB images
  • 640x480 depth images
  • 3D scene reconstruction from CascadeStereo
  • Object pose annotation in the camera frame
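
As a starting point, the accompanying reconstruction can be inspected with Open3D (the sequence directory name below is an assumption; adjust it to your download layout):

import open3d as o3d

# Load the CascadeStereo reconstruction that accompanies a sequence.
pcd = o3d.io.read_point_cloud('hope_video/scene_0000/scene.ply')
print(pcd)  # reports the number of points
o3d.visualization.draw_geometries([pcd])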

NOTE! It was brought to our attention that the camera extrinsic matrices in HOPE-Video have a mistake in units and are unclearly labeled. Object poses are expressed in cm, but we mistakenly expressed camera extrinsics in m. In addition, the extrinsics are written as world-to-camera matrices. To transform a pose from camera coordinates to world coordinates, use this correction for now:

import numpy as np

extrinsics_w2c = np.array(annots['camera']['extrinsics'])
extrinsics_w2c[:3, -1] *= 100  # correct the translation units from m to cm
extrinsics_c2w = np.linalg.inv(extrinsics_w2c)  # world-to-camera -> camera-to-world
pose_world = extrinsics_c2w @ pose_camera
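
The inverse is needed because the stored matrices map world coordinates into the camera frame, while moving a pose out of the camera frame requires the camera-to-world direction; the factor of 100 simply brings the translation into the same centimeter units used for the object poses.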

If you use HOPE-Video in your own experiments, please cite the following paper (website, arXiv):

@inproceedings{lin2021fusion,
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A. and Birchfield, Stan},  
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},   
  title={Multi-view Fusion for Multi-level Robotic Scene Understanding},   
  year={2021},
  pages={6817-6824},
  doi={10.1109/IROS51168.2021.9635994}
}

Objects

The objects are a set of 28 toy grocery items selected for their compatibility with robot manipulation and their widespread availability. Textured models were generated with an EinScan-SE 3D scanner, units were converted to centimeters, and the mesh centers and rotations were aligned to a canonical pose.
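
A quick sanity check after downloading is to load a mesh and print its bounding-box extents, which should come out in centimeters (the file name below is illustrative; check meshes/eval/ for the actual names):

import trimesh

# Load a low-resolution evaluation mesh and report its size.
mesh = trimesh.load('meshes/eval/BBQSauce.obj')
print(mesh.extents)  # axis-aligned bounding-box dimensions, in cm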

Full-size textured models may be downloaded here.

As of September 2021, all objects could be obtained from online retailers for about 60 USD.

Preview tool

Use the included visualization tool preview.py to view annotated images in the validation set. The tool requires the following Python packages: numpy, open3d, trimesh, networkx, pyglet, and PIL, which can be installed with pip install numpy open3d trimesh networkx pyglet Pillow. There is a known issue with recent versions of Open3D on Ubuntu 16.04; if you run into it, you may need to use Python 3.7 or earlier and revert to an older version of Open3D: pip install open3d==0.9.0.

Usage: `preview.py [-h] [--showrgb] [--rgbpath PATH] [--depthpath PATH] [--pcpath PATH] [--meshdir PATH] PATH`

Display a scene from the HOPE-Image or HOPE-Video datasets. By default, object
annotations are overlaid on either the reconstructed scene point cloud or RGBD.
An overlay on the RGB image can be shown with `--showrgb`. File paths are
automatically attempted relative to the annotation file path. Press O to toggle
objects, D to toggle RGBD (if present), P to toggle scene point cloud (if
present), and Q to quit.

positional arguments:
  PATH              Path to scene annotation file

optional arguments:
  -h, --help        show this help message and exit
  --showrgb         Show RGB image instead of RGBD and/or point cloud
                     
  --rgbpath PATH    Path to RGB image
                    (optional, default: annotspath.replace(".json","_rgb.jpg"))
  --depthpath PATH  Path to depth image
                    (optional, default: annotspath.replace(".json","_depth.png"))
  --pcpath PATH     Path to scene point cloud
                    (optional, default: dirname(annotspath)+"/scene.ply")
  --meshdir PATH    Path to object meshes
                    (optional, default: meshes/eval/)
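
For example, to overlay annotations on the RGB image of a validation capture (the scene path below is illustrative):

python preview.py --showrgb valid/scene_0000/0000.json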

License

Copyright (C) 2021 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license.


hope-dataset's Issues

Object pose of HOPE-Video

Hi @swtyree @Uio96 @sbirchfield,

Thanks for sharing the HOPE cad models and dataset!

My question is: when I project object poses from different scenes back into the world frame, I find that the poses in the world frame are not the same, which means the poses have some error. Is this error acceptable?

Thanks.

Follow your paper to make a new dataset

I want to follow your paper (or a similar one) to make a new dataset, from scanning objects to taking pictures and annotating poses. Could you point me to some tools or code? Thank you!

Low quality of the visible mask images in BOP format

Hi all, it's me again.

Thanks for your previous help!

Recently I converted the annotations of the HOPE-Video dataset to generate a BOP-format dataset for our research.
However, after using the bop_toolkit to obtain the mask and visible-mask images for each view, I found that the quality of the visible-mask images was low, while the full mask images looked fine. Below is an example of an RGB image, a visible mask image, and a mask image.

[Images: RGB image (000000), visible mask image (000000_000005), and mask image (000000_000005) of the orange juice box]
There is clearly some erosion along the side of the orange juice box in its visible mask image.
Have you encountered this issue before?

clarification on object-se

Could you please clarify what "TWO" means here, since there is only one barbecue sauce object:

[{"label": "barbecue-sauce", "TWO": [[0.5436763454963882, 0.622191933760276, -0.4349827125343375, 0.3578873048305161], [0.1073753759264946, 0.07318738847970963, 0.4580722153186798]]}]

Also,

If there are two detected objects, why does one have an array of size 4 and the other an array of size 3?

Thanks
megapose6d/megapose6d#19 (comment)


HOPE-Video in BOP format

Hi Stephen @swtyree,

For BOP'24, we would like to add more testing instances for HOPE dataset. One simple approach is to include instances from HOPE-Video, which could be useful for multi-view or tracking settings in the future.

Do you have HOPE-Video in BOP format so that we can proceed with this? (I downloaded and found that HOPE-Video is not in BOP format for now.)

cc @thodan, @TontonTremblay

Thanks,
Nguyen

Citation?

Hi,

I'm using the dataset for evaluation in a paper I will submit soon. Is there any associated paper with the dataset or something I can cite?
