
facebookresearch / contactpose


Large dataset of hand-object contact, hand- and object-pose, and 2.9 M RGB-D grasp images.

Home Page: http://contactpose.cc.gatech.edu/

License: MIT License

Languages: Jupyter Notebook 99.07%, Python 0.92%, Shell 0.01%
Topics: hand-object-interaction, grasps, dataset, mano, rgb, computer-vision, robotics, contact, rgbd

contactpose's Introduction


Download and pre-processing utilities + Python dataloader for the ContactPose dataset. The dataset was introduced in the following ECCV 2020 paper: ContactPose: A Dataset of Grasps with Object Contact and Hand Pose -

Samarth Brahmbhatt, Chengcheng Tang, Christopher D. Twigg, Charles C. Kemp, and James Hays

Example ContactPose data: Contact Maps, 3D hand pose, and RGB-D grasp images for functional grasps.

Companion Repositories/Websites:

Citation

@InProceedings{Brahmbhatt_2020_ECCV,
author = {Brahmbhatt, Samarth and Tang, Chengcheng and Twigg, Christopher D. and Kemp, Charles C. and Hays, James},
title = {{ContactPose}: A Dataset of Grasps with Object Contact and Hand Pose},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {August},
year = {2020}
}

We have made some data and annotation corrections. The link above mentions the correction date and the exact data that was corrected. If you got that data before the correction date, please re-download it.

Licensing

Updates

contactpose's People

Contributors

akarshkumar0101, samarth-robo, tangchengcheng


contactpose's Issues

Why is the extrinsic camera matrix not explicit?

The ContactPose class gives access to the properties and information we need. However, the API gives us the projection matrix directly. Even after reading the code, it is not clear to me what the extrinsic matrix is. I am marking my questions as comments in the code below:

    def K(self, camera_name):
        """
        Camera intrinsics 3x3
        You will almost never need this. Use self.P() for projection
        """
        return self._K[camera_name]

    def A(self, camera_name):
        """
        Affine transform to be applied to 2D points after projection
        Included in self.P
        """
        # Can you comment on why we need to apply affine transform over the projection ?
        return mutils.get_A(camera_name, 960, 540)

    def P(self, camera_name, frame_idx):
        """
        3x4 3D -> 2D projection matrix
        Use this for all projection operations, not self.K
       
        """
        # --- >  the familiar notation to me is : K * [R|t] - for reference.
        # --- > can you explain the code ? I would like to retrieve the R, t or extrinsic matrix. 
        P = self.K(camera_name) @ self.object_pose(camera_name, frame_idx)[:3]
        P = self.A(camera_name) @ P
        return P

    def object_pose(self, camera_name, frame_idx):
        """
        Pose of obj_name w.r.t. camera at frame frame_idx
        4x4 homogeneous matrix
        """
        return self._cto[camera_name][frame_idx]
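
For reference, the snippet above composes P = A @ K @ cTo[:3], so the 4x4 matrix returned by object_pose() already plays the role of the extrinsics. A minimal sketch of recovering the familiar K * [R|t] pieces (assuming cp is a ContactPose instance, as in the other issues on this page):

    import numpy as np

    cTo = cp.object_pose(camera_name, frame_idx)  # 4x4 homogeneous, object -> camera
    R, t = cTo[:3, :3], cTo[:3, 3]                # extrinsic rotation and translation
    # familiar K @ [R|t] form, with the dataset's extra 2D affine applied on top
    P = cp.A(camera_name) @ cp.K(camera_name) @ np.hstack((R, t[:, None]))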

dataset google drive

It's hard to download the whole dataset; could you upload it to Google Drive or Baidu drive? Thank you very much.

Frame transformation related problem

Thanks a lot for the dataset!
I have a question about the transformation from the object frame to the camera frame. I tried to use the following transformation to get the 3D joint coordinates w.r.t. the camera, but the results do not seem to lie in the camera frame.

(np.linalg.inv(self._cTo[camera_name][frame_idx]) @ np.vstack((self._oX[frame_idx][hand_idx].T, np.ones(len(self._oX[frame_idx][hand_idx]))))).T[:,:3]

Here is the output:

kinect2_left
(array([[0.59586708, 0.57183709, 0.27451977],
       [0.57346818, 0.55475388, 0.25108291],
       [0.55723432, 0.54825791, 0.22123297],
       [0.54819433, 0.56737579, 0.19576846],
       [0.53468998, 0.58279761, 0.18603394],
       [0.56498216, 0.59200997, 0.20027606],
       [0.53212793, 0.60429002, 0.17582905],
       [0.50474777, 0.60782087, 0.16985572],
       [0.48186989, 0.61010318, 0.17019179],
       [0.57017289, 0.6114515 , 0.20931277],
       [0.5364321 , 0.62565416, 0.18285898],
       [0.50638421, 0.63366412, 0.17872098],
       [0.48237124, 0.63816475, 0.18142182],
       [0.57398631, 0.62594793, 0.22155912],
       [0.54427922, 0.6428173 , 0.19770902],
       [0.51716611, 0.64726235, 0.19323866],
       [0.49550774, 0.6499683 , 0.19336389],
       [0.57512528, 0.63672196, 0.23706421],
       [0.55397417, 0.65150359, 0.21775499],
       [0.53727593, 0.65383735, 0.21031583],
       [0.52000723, 0.65770182, 0.20670111]]), 
array([[0.56802887, 0.60250985, 0.34944253],
       [0.53132207, 0.59137744, 0.34292538],
       [0.49572316, 0.60188097, 0.33573266],
       [0.47328403, 0.62413895, 0.32591359],
       [0.46454028, 0.64074109, 0.30549669],
       [0.49526903, 0.64764002, 0.3199572 ],
       [0.46599222, 0.66376217, 0.29363614],
       [0.45486717, 0.66508935, 0.2700846 ],
       [0.44823189, 0.663057  , 0.24861838],
       [0.51577966, 0.66108361, 0.31025374],
       [0.49215595, 0.67836392, 0.27566575],
       [0.48336023, 0.67504445, 0.24672853],
       [0.47885343, 0.67136749, 0.22352431],
       [0.53498583, 0.66422097, 0.30099225],
       [0.51555078, 0.68000147, 0.27003876],
       [0.50632494, 0.67517748, 0.24434139],
       [0.49955107, 0.67154132, 0.22512216],
       [0.55252829, 0.65772819, 0.29191584],
       [0.53725348, 0.66866654, 0.26480445],
       [0.52858907, 0.66907577, 0.24578504],
       [0.52081835, 0.66914294, 0.22914286]]))

kinect2_middle
(array([[ 0.25805524, -0.14330683, -0.70323838],
       [ 0.23755131, -0.15640598, -0.67583703],
       [ 0.21632717, -0.17810507, -0.65924064],
       [ 0.19305602, -0.19927904, -0.66952522],
       [ 0.17236613, -0.20404604, -0.67753551],
       [ 0.19974594, -0.2020707 , -0.69879029],
       [ 0.15872292, -0.21343358, -0.69480874],
       [ 0.13170234, -0.20923564, -0.68773154],
       [ 0.11109021, -0.20073987, -0.68210294],
       [ 0.199793  , -0.19625675, -0.7200688 ],
       [ 0.15662736, -0.20924095, -0.71722308],
       [ 0.1261851 , -0.20253697, -0.71368982],
       [ 0.10450785, -0.19151102, -0.71012645],
       [ 0.20154913, -0.1867693 , -0.73684871],
       [ 0.16171137, -0.19888559, -0.73828475],
       [ 0.13505946, -0.19341821, -0.7323986 ],
       [ 0.11527754, -0.1855755 , -0.72754161],
       [ 0.20344128, -0.17313965, -0.74982706],
       [ 0.17327953, -0.18403933, -0.75301739],
       [ 0.15551203, -0.18502268, -0.74822493],
       [ 0.13789903, -0.18229422, -0.74530153]]), 
array([[ 0.24636096, -0.06459424, -0.73481013],
       [ 0.21662407, -0.05698227, -0.71090262],
       [ 0.17943783, -0.05121156, -0.70726615],
       [ 0.14837729, -0.05308462, -0.71853917],
       [ 0.12793847, -0.06956868, -0.72745343],
       [ 0.1566435 , -0.06746089, -0.74675463],
       [ 0.1166679 , -0.08200674, -0.74722766],
       [ 0.09893953, -0.09997769, -0.74067378],
       [ 0.08705045, -0.11750248, -0.73289643],
       [ 0.16623452, -0.08441807, -0.76453212],
       [ 0.12806941, -0.10875236, -0.766602  ],
       [ 0.11239126, -0.13240845, -0.75563243],
       [ 0.10241367, -0.15226003, -0.74676509],
       [ 0.17873854, -0.10010551, -0.77240924],
       [ 0.14594503, -0.12251127, -0.77513529],
       [ 0.13150476, -0.14293707, -0.76317673],
       [ 0.12082502, -0.15825822, -0.75424971],
       [ 0.19350514, -0.11465806, -0.77084525],
       [ 0.16739107, -0.13480626, -0.77116211],
       [ 0.15360272, -0.14940971, -0.7653657 ],
       [ 0.14148368, -0.16210854, -0.75996026]]))

kinect2_right
(array([[ 0.22794149, -0.34004319,  0.55556067],
       [ 0.22134314, -0.30439049,  0.5608691 ],
       [ 0.21544726, -0.27515871,  0.57840468],
       [ 0.20176543, -0.26881977,  0.6078713 ],
       [ 0.18387391, -0.26669128,  0.62166709],
       [ 0.20256197, -0.29594279,  0.62101284],
       [ 0.17215723, -0.27376277,  0.6412972 ],
       [ 0.1481243 , -0.25900016,  0.64281529],
       [ 0.1275925 , -0.2492314 ,  0.63938954],
       [ 0.19557289, -0.31640372,  0.62538112],
       [ 0.16372857, -0.29391005,  0.64818318],
       [ 0.13486196, -0.28178748,  0.65016326],
       [ 0.11187605, -0.2742406 ,  0.64581987],
       [ 0.18949139, -0.33473868,  0.62415392],
       [ 0.15928782, -0.31748448,  0.64708754],
       [ 0.13478475, -0.3043036 ,  0.64793996],
       [ 0.11509513, -0.29530475,  0.64515381],
       [ 0.18258538, -0.35112857,  0.61771471],
       [ 0.16000008, -0.33934967,  0.63745995],
       [ 0.14581676, -0.3283236 ,  0.64156818],
       [ 0.12979234, -0.32017404,  0.64330348]]), 
array([[ 0.17745971, -0.38850173,  0.50623965],
       [ 0.15319552, -0.3591347 ,  0.49832638],
       [ 0.11881537, -0.34414638,  0.50309058],
       [ 0.08987343, -0.34198408,  0.51899765],
       [ 0.07698495, -0.33698899,  0.54303658],
       [ 0.0972012 , -0.36506344,  0.54101507],
       [ 0.06811461, -0.34614034,  0.56562686],
       [ 0.06155757, -0.32815589,  0.58334037],
       [ 0.06020955, -0.31139042,  0.59837558],
       [ 0.1090279 , -0.37868767,  0.5602526 ],
       [ 0.08534843, -0.35869282,  0.59330569],
       [ 0.0839557 , -0.33578626,  0.61328308],
       [ 0.08551082, -0.31802657,  0.62923432],
       [ 0.12499995, -0.38514145,  0.57320395],
       [ 0.10508567, -0.36830961,  0.60328674],
       [ 0.10362195, -0.34602734,  0.61972146],
       [ 0.10264829, -0.32940694,  0.63202187],
       [ 0.14450407, -0.38454387,  0.58038083],
       [ 0.13002273, -0.3687835 ,  0.60547856],
       [ 0.12533885, -0.35398503,  0.61948015],
       [ 0.12123062, -0.34075045,  0.63153497]]))

The L2 norms of the coordinates look reasonable, but the distances are distributed mainly along different axes for the three cameras, i.e. the x,y axes in kinect2_left, the z axis in kinect2_middle, and the y,z axes in kinect2_right. I thought it might be because the coordinates are in the world frame or the object frame. How can I rotate the coordinates so that they lie in the camera frame?
Thanks in advance
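
My reading of the docstrings (a sketch, not an authoritative answer): _cTo is the pose of the object w.r.t. the camera, so it already maps object-frame points into the camera frame and should be applied directly, without the inverse:

    import numpy as np

    oX = self._oX[frame_idx][hand_idx]                    # Nx3 joints, object frame
    oX_hom = np.vstack((oX.T, np.ones(len(oX))))          # 4xN homogeneous
    cX = (self._cTo[camera_name][frame_idx] @ oX_hom).T[:, :3]  # Nx3, camera frame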

Masks for object margin issue

We can generate masks for objects and hands. (This issue does not concern the hand masks, since they use MANO and I can't use them due to license issues.)

When I try to generate masks for objects, there is a margin of blue left around the object. Can you suggest what might be going wrong?

result_1

Using grabcut to optimize gives result :
image

My logic :

      # create renderer
      object_renderer = rutils.DepthRenderer(object_name, cp.K(camera_name), camera_name, mesh_scale=1e-3)

      # render object
      object_pose = cp.object_pose(camera_name, frame_idx)
      object_rendering = object_renderer.render(object_pose)

      # create mask from rendering 
      object_mask = object_rendering > 0 

      # color img area of interest
      color_im_bgr_interest = color_im_bgr.copy()
      color_im_bgr_interest[np.logical_not(object_mask)] = 0 

      # due to hand object touch there are parts of hands that are included in 
      # mask.
      # since the objects are of blue color
      # based on mask refine the channel. 
      b, g, r = cv2.split(color_im_bgr_interest)
      color_mask = np.logical_and(b > g, b > r)

      # # dilate the color_mask to cover boundries better. 
      # color_mask = np.uint8(color_mask)
      # kernel = np.ones((5, 5), np.uint8)
      # color_mask = cv2.dilate(color_mask, kernel, iterations=1)

      # # convert to logical bool
      # color_mask = np.array(color_mask, dtype=np.bool)
      
      # object_mask = np.logical_and(object_mask, color_mask)
      object_mask = np.logical_and(object_mask, color_mask)

      object_mask = mutils.grabcut_mask(color_im_bgr, object_mask, n_iters=10)

      mask = np.logical_not(object_mask)
     

Mano fits question

Thanks a lot for creating the dataset! One question about the MANO fitting: I notice that there is only one fitting result in the json file (mano_fits_15.json) for each sequence. Is this the fitting result for the first frame? Do you have per-frame MANO fitting results?

fix annotation errors for participants 31-35

We have noticed that some grasps for p_num in [31, 35] have annotation errors. We are working on fixing them.

This issue will be updated to reflect progress; we anticipate a fix within 1 week from today.

Meanwhile, please ignore data from these participants.

data License?

Hi,

Does the MIT license also apply to the data?
Thanks 👍

-Thibault

Affine transformation for each cameras

Dear @samarth-robo ,

I understand the purpose of the affine transform (basically, to map points from camera space to pixel space correctly), but I do not understand why each camera has a different affine, and why the affines were designed in the first place.

Is it because of the way the cameras were mounted while capturing the data?

Also, the affines don't just swap axes, they also shift pixels. Could you please provide a little more insight into this design?

Thank you very much!

Missing frames in sequence

Hi @samarth-robo ,

In the provided sequences, a couple of frames are sometimes missing in between. Is there a reason why they are missing, and is there a way to find out which frames are missing for all the sequences?

E.g. seq: full15_use/bowl/images_full/kinect2_left

This information would be very useful to know :)

Looking forward to hearing from you!

Thanks in Advance!

projecting 3D points to images

[from an email query]

Basically, I am trying to project the 3D vertices of objects onto the provided 2D images.
I use the following transformation:

projected2d_verts = (cam_intrinsics @ (cp.obj_poses[:3, :3] @ obj3d_verts.T) + cp.obj_poses[:3, 3].reshape(3,1))

But I am getting the wrong projection shown below (the object projection is at the top left, whereas it should overlap the mug held in the hand).

wrong_projection

I am not able to find what's wrong here. Would you have any suggestions to rectify this?

Thanks a lot in advance for your time on this issue.
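
A minimal sketch of what I believe the projection should look like (assuming cp is a ContactPose instance): the translation has to be applied before the intrinsics, cp.P() already bundles K, the object pose and the per-camera affine A, and a perspective divide is still needed:

    import numpy as np

    P = cp.P(camera_name, frame_idx)                                   # 3x4
    verts_hom = np.vstack((obj3d_verts.T, np.ones(len(obj3d_verts))))  # 4xN
    x = P @ verts_hom                                                  # 3xN
    projected2d_verts = (x[:2] / x[2]).T                               # Nx2 pixel coords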

Parse whole dataset

How can I access the whole dataset in loops?

We can access a single instance using the ContactPose class. However, how can I get an exhaustive list of p_num, intent, object_name etc., which are the parameters of ContactPose?

This is needed to convert the dataset to desired usage.
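
There is no official enumeration API that I know of, but a minimal sketch that walks the downloaded data directory should work (the import path, directory location and layout below are assumptions based on the download scripts' defaults):

    import os
    from utilities.dataset import ContactPose

    data_dir = os.path.join('data', 'contactpose_data')    # assumed download location
    for p_dir in sorted(os.listdir(data_dir)):              # e.g. 'full28_use'
        p_num = int(p_dir.split('_')[0].replace('full', ''))
        intent = p_dir.split('_')[1]                         # 'use' or 'handoff'
        for object_name in sorted(os.listdir(os.path.join(data_dir, p_dir))):
            cp = ContactPose(p_num, intent, object_name)
            # ... use cp here ...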

Missing sequences for some cameras

Hi @samarth-robo

I downloaded the dataset, but for some subjects with certain objects the sequence is missing for one of the 3 cameras.
For example, in the full3_handoff, scissors, kinect2_left case the color images directory does not exist. Is this intentional, or am I missing something in the download?

Thanks in advance for the response!

Generate active area

Hi Everyone,

I really appreciate @samarth-robo's work on the ContactPose dataset.

I am quite interested in the Data Analysis part of the paper. I am planning to generate active areas for each object as described in that part. The goal is to generate active areas for specific hand parts, but I can only find the source code for the hand contact probability. Could you please give me some ideas about how to realize automatic active area discovery?

Many thanks!

Create Point Cloud from depth image.

Hi @samarth-robo:

I tried to create a point cloud from the depth image to match the 3D joint coordinates obtained from _cTo @ oX (which has been discussed in #23), but I ran into a problem creating the point cloud.
Following is my code:

import numpy as np
import open3d.io as o3dio
import open3d.camera as o3dc
import open3d.geometry as o3dg
import open3d.visualization as o3dv

point_image = o3dio.read_image(depth_dir)
intrinsics = o3dc.PinholeCameraIntrinsic()
intrinsics.intrinsic_matrix = np.dot(cp.A(camera_name), cp.K(camera_name))
extrinsics = cp.object_pose(camera_name, frame_idx)
pc = o3dg.PointCloud.create_from_depth_image(depth=point_image, intrinsic=intrinsics, extrinsic=extrinsics, depth_scale=1000.0)
print(np.asarray(pc.points))
o3dv.draw_geometries([pc])

The output in terminal is:

[[nan nan nan]
 [nan nan nan]
 [nan nan nan]
 ...
 [nan nan nan]
 [nan nan nan]
 [nan nan nan]]

And nothing is shown in the Open3D window. Do you have any idea what the reason for this might be?

Thanks in advance!
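
One guess (not a confirmed answer): A(camera_name) can flip or swap axes, so A @ K is not a valid pinhole intrinsic matrix for Open3D, and object_pose() is not the camera extrinsic Open3D expects, which could produce the NaNs. A minimal numpy back-projection sketch that sidesteps the Open3D intrinsic object, assuming the depth images store millimetres (consistent with depth_scale=1000.0 above):

    import cv2
    import numpy as np

    depth = cv2.imread(depth_dir, cv2.IMREAD_ANYDEPTH).astype(np.float32) / 1000.0  # metres (assumed)
    K = cp.K(camera_name)
    v, u = np.nonzero(depth > 0)               # pixel rows/cols with valid depth
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    points_cam = np.stack((x, y, z), axis=1)   # Nx3 in the camera frame
    # If the stored images are in the affine-transformed (A) pixel space, the
    # (u, v) coordinates may first need to be mapped back through the inverse of A.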

Joint Angles

Hello everyone,

Thank you for your great work! As far as I have seen, the coordinates of the joints of the simple hand model are given as Cartesian coordinates. Is there also a way to get them as joint angles?

Thank you and best,
Kyra
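
As far as I know the dataset provides Cartesian joint positions (plus MANO fits), so angles have to be derived. A generic sketch, not part of the dataset API (the joint indices in the usage comment are an assumption; check the dataset's joint ordering):

    import numpy as np

    def joint_angle(parent, joint, child):
        """Angle (radians) at `joint` between the bones joint->parent and joint->child."""
        v1, v2 = parent - joint, child - joint
        c = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.arccos(np.clip(c, -1.0, 1.0))

    # e.g. flexion at one finger joint, given a 21x3 array of 3D joints:
    # angle = joint_angle(joints[5], joints[6], joints[7])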

Background only images

Hi @samarth-robo

Is it possible to release the background-only images for the ho3d sequences? I need them to extract only the foreground of each sequence.

Thanks in advance!

Could you provide the URL of the dataset?

First, thanks for such great work!
I found it's not very convenient to download the dataset using the script.
Personally, it would be better if you also provided direct download links.
Thanks!

hand and object Masks

Dear @samarth-robo ,

Is it possible to download the hand and object masks readily, i.e. without running the rendering scripts for all the sequences?
Is it possible to recover the masks from the provided depth maps?

Thank you!

depth image -> point cloud

(from anonymous user via email)

Recently I have been working with your ContactPose dataset (I appreciate how thorough the analyses are in the paper!). Specifically, I have been trying to convert the provided depth images to pointclouds, but I think I am missing some information about the depth imagers.

The depth images are provided in integer format. Usually when converting images to pointclouds, we multiply values of the depth image by a scalar, which converts the integers to real-world units (meters). Looking through the annotations, I cannot find this scalar.

Do you know what the scale of the depth image is, or how I can find this out myself if I don't have access to a Kinect v2 or the .bag recordings?
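
For what it's worth, the depth_scale=1000.0 used in the Open3D snippet earlier on this page, and the Kinect v2 convention of 16-bit depth in millimetres, both point to a scale of 1/1000. A minimal sketch under that assumption (depth_filename is a placeholder):

    import cv2

    depth_mm = cv2.imread(depth_filename, cv2.IMREAD_ANYDEPTH)  # uint16, assumed millimetres
    depth_m = depth_mm.astype('float32') / 1000.0                # metres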

speed up image downloads

Transition the API from downloading zip files of images to downloading videos of images. Video compression algorithms can greatly reduce the file sizes and speed up dataset download.

About obtaining the key point coordinates of the hand in the camera coordinate system

First, thank you for your great work! The following is my calculation of the hand keypoint coordinates in the camera coordinate system. Is it correct? Do other transformations need to be applied, for example the affine transformation?
    # X: Nx3 hand keypoints in the object coordinate frame
    P = _cTo[camera_name][frame_idx][:3]
    X = np.vstack((X.T, np.ones(len(X))))
    x = P @ X
    x = x.T
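
My understanding (a sketch, not an authoritative answer): the snippet above already yields 3D points in the camera frame, so no affine is needed there. The affine A only matters when projecting down to 2D pixels, and ContactPose.P() already includes it (assuming cp is a ContactPose instance, and reusing the 4xN homogeneous X from above):

    x2d_hom = cp.P(camera_name, frame_idx) @ X   # 3xN
    x2d = (x2d_hom[:2] / x2d_hom[2]).T           # Nx2 pixel coordinates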

Mano parameters

@samarth-robo issue continuing from an email conversation

Hi Samarth,

I have one more question; it would be a great help if you could clarify it as well. Thanks in advance!
1. I see there are only 6 mano_fits_**.json files given for each sequence. How can we obtain the MANO parameters of the hand in each frame? More precisely, is there a way to get MANO axis-angle parameters for each frame in the ContactPose dataset?

Looking forward to hearing from you!

Hi Anil,
 
Those 6 json files represent 6 different MANO parameter sizes. The MANO model allows you to represent the hand pose with different parameter sizes through PCA.
 
They don't correspond to frame numbers.
 
Regardless of which one you choose from those 6, you can get that hand model for each frame in the sequence. Please see the [mano_meshes()](https://github.com/facebookresearch/ContactPose/blob/main/utilities/dataset.py#L300) function of the ContactPose dataset.
 
These meshes are in the object coordinate frame. Then you can use [ContactPose.object_pose()](https://github.com/facebookresearch/ContactPose/blob/main/utilities/dataset.py#L278) to transform them into the camera coordinate frame.
 
You can see a working example of all this in [this demo notebook](https://github.com/facebookresearch/ContactPose/blob/main/rendering.ipynb), where the posed MANO meshes are used to render hand masks in the image.
 
Please use GitHub issues for these questions, so the answers are publicly documented and others can see them later if they have the same questions.
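
A minimal sketch of the workflow described above (the argument lists and dictionary keys are assumptions; see the repo's rendering.ipynb for the authoritative version):

    import numpy as np

    cp = ContactPose(p_num, intent, object_name)
    hands = cp.mano_meshes()                        # per-hand meshes in the object frame
    cTo = cp.object_pose(camera_name, frame_idx)    # 4x4, object -> camera
    for hand in hands:
        if hand is None:                            # that hand is not present in this grasp
            continue
        v = hand['vertices']                        # Nx3, object coordinate frame
        v_hom = np.vstack((v.T, np.ones(len(v))))
        v_cam = (cTo @ v_hom).T[:, :3]              # Nx3, camera coordinate frame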

