
facebookresearch / contactpose


Large dataset of hand-object contact, hand- and object-pose, and 2.9 M RGB-D grasp images.

Home Page: http://contactpose.cc.gatech.edu/

License: MIT License

Languages: Jupyter Notebook 99.07%, Python 0.92%, Shell 0.01%
Topics: hand-object-interaction, grasps, dataset, mano, rgb, computer-vision, robotics, contact, rgbd

contactpose's Introduction


Download and pre-processing utilities + Python dataloader for the ContactPose dataset. The dataset was introduced in the following ECCV 2020 paper: ContactPose: A Dataset of Grasps with Object Contact and Hand Pose -

Samarth Brahmbhatt, Chengcheng Tang, Christopher D. Twigg, Charles C. Kemp, and James Hays

Example ContactPose data: Contact Maps, 3D hand pose, and RGB-D grasp images for functional grasps.

Companion Repositories/Websites:

Citation

@InProceedings{Brahmbhatt_2020_ECCV,
author = {Brahmbhatt, Samarth and Tang, Chengcheng and Twigg, Christopher D. and Kemp, Charles C. and Hays, James},
title = {{ContactPose}: A Dataset of Grasps with Object Contact and Hand Pose},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {August},
year = {2020}
}

We have made some data and annotation corrections. The link above mentions the correction date and the exact data that was corrected. If you got that data before the correction date, please re-download it.

Licensing

Updates

contactpose's People

Contributors

akarshkumar0101, samarth-robo, tangchengcheng


contactpose's Issues

Why is the extrinsic camera matrix not explicit?

The ContactPose class gives access to the properties and information we need. However, the API gives us the projection matrix directly. Even after reading the code, it is not clear to me what the extrinsic matrix is. I am marking my questions as comments in the code below:

    def K(self, camera_name):
        """
        Camera intrinsics 3x3
        You will almost never need this. Use self.P() for projection
        """
        return self._K[camera_name]

    def A(self, camera_name):
        """
        Affine transform to be applied to 2D points after projection
        Included in self.P
        """
        # Can you comment on why we need to apply affine transform over the projection ?
        return mutils.get_A(camera_name, 960, 540)

    def P(self, camera_name, frame_idx):
        """
        3x4 3D -> 2D projection matrix
        Use this for all projection operations, not self.K
       
        """
        # --- >  the familiar notation to me is : K * [R|t] - for reference.
        # --- > can you explain the code ? I would like to retrieve the R, t or extrinsic matrix. 
        P = self.K(camera_name) @ self.object_pose(camera_name, frame_idx)[:3]
        P = self.A(camera_name) @ P
        return P

    def object_pose(self, camera_name, frame_idx):
        """
        Pose of obj_name w.r.t. camera at frame frame_idx
        4x4 homogeneous matrix
        """
        return self._cto[camera_name][frame_idx]
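
For reference, the snippet above composes P = A @ K @ cTo[:3], so the 4x4 matrix returned by object_pose() already plays the role of the extrinsics. A minimal sketch of recovering the familiar K * [R|t] pieces (assuming cp is a ContactPose instance, as in the other issues on this page):

    import numpy as np

    cTo = cp.object_pose(camera_name, frame_idx)  # 4x4 homogeneous, object -> camera
    R, t = cTo[:3, :3], cTo[:3, 3]                # extrinsic rotation and translation
    # familiar K @ [R|t] form, with the dataset's extra 2D affine applied on top
    P = cp.A(camera_name) @ cp.K(camera_name) @ np.hstack((R, t[:, None]))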

dataset google drive

It's hard to download the whole dataset; could you upload it to Google Drive or Baidu drive? Thank you very much.

Frame transformation related problem

Thanks a lot for the dataset!
I have a question about the transformation from the object frame to the camera frame. I tried to use the following transformation to get the 3D joint coordinates w.r.t. the camera, but the results do not seem to lie in the camera frame.

(np.linalg.inv(self._cTo[camera_name][frame_idx]) @ np.vstack((self._oX[frame_idx][hand_idx].T, np.ones(len(self._oX[frame_idx][hand_idx]))))).T[:,:3]

Here is the output:

kinect2_left
(array([[0.59586708, 0.57183709, 0.27451977],
       [0.57346818, 0.55475388, 0.25108291],
       [0.55723432, 0.54825791, 0.22123297],
       [0.54819433, 0.56737579, 0.19576846],
       [0.53468998, 0.58279761, 0.18603394],
       [0.56498216, 0.59200997, 0.20027606],
       [0.53212793, 0.60429002, 0.17582905],
       [0.50474777, 0.60782087, 0.16985572],
       [0.48186989, 0.61010318, 0.17019179],
       [0.57017289, 0.6114515 , 0.20931277],
       [0.5364321 , 0.62565416, 0.18285898],
       [0.50638421, 0.63366412, 0.17872098],
       [0.48237124, 0.63816475, 0.18142182],
       [0.57398631, 0.62594793, 0.22155912],
       [0.54427922, 0.6428173 , 0.19770902],
       [0.51716611, 0.64726235, 0.19323866],
       [0.49550774, 0.6499683 , 0.19336389],
       [0.57512528, 0.63672196, 0.23706421],
       [0.55397417, 0.65150359, 0.21775499],
       [0.53727593, 0.65383735, 0.21031583],
       [0.52000723, 0.65770182, 0.20670111]]), 
array([[0.56802887, 0.60250985, 0.34944253],
       [0.53132207, 0.59137744, 0.34292538],
       [0.49572316, 0.60188097, 0.33573266],
       [0.47328403, 0.62413895, 0.32591359],
       [0.46454028, 0.64074109, 0.30549669],
       [0.49526903, 0.64764002, 0.3199572 ],
       [0.46599222, 0.66376217, 0.29363614],
       [0.45486717, 0.66508935, 0.2700846 ],
       [0.44823189, 0.663057  , 0.24861838],
       [0.51577966, 0.66108361, 0.31025374],
       [0.49215595, 0.67836392, 0.27566575],
       [0.48336023, 0.67504445, 0.24672853],
       [0.47885343, 0.67136749, 0.22352431],
       [0.53498583, 0.66422097, 0.30099225],
       [0.51555078, 0.68000147, 0.27003876],
       [0.50632494, 0.67517748, 0.24434139],
       [0.49955107, 0.67154132, 0.22512216],
       [0.55252829, 0.65772819, 0.29191584],
       [0.53725348, 0.66866654, 0.26480445],
       [0.52858907, 0.66907577, 0.24578504],
       [0.52081835, 0.66914294, 0.22914286]]))

kinect2_middle
(array([[ 0.25805524, -0.14330683, -0.70323838],
       [ 0.23755131, -0.15640598, -0.67583703],
       [ 0.21632717, -0.17810507, -0.65924064],
       [ 0.19305602, -0.19927904, -0.66952522],
       [ 0.17236613, -0.20404604, -0.67753551],
       [ 0.19974594, -0.2020707 , -0.69879029],
       [ 0.15872292, -0.21343358, -0.69480874],
       [ 0.13170234, -0.20923564, -0.68773154],
       [ 0.11109021, -0.20073987, -0.68210294],
       [ 0.199793  , -0.19625675, -0.7200688 ],
       [ 0.15662736, -0.20924095, -0.71722308],
       [ 0.1261851 , -0.20253697, -0.71368982],
       [ 0.10450785, -0.19151102, -0.71012645],
       [ 0.20154913, -0.1867693 , -0.73684871],
       [ 0.16171137, -0.19888559, -0.73828475],
       [ 0.13505946, -0.19341821, -0.7323986 ],
       [ 0.11527754, -0.1855755 , -0.72754161],
       [ 0.20344128, -0.17313965, -0.74982706],
       [ 0.17327953, -0.18403933, -0.75301739],
       [ 0.15551203, -0.18502268, -0.74822493],
       [ 0.13789903, -0.18229422, -0.74530153]]), 
array([[ 0.24636096, -0.06459424, -0.73481013],
       [ 0.21662407, -0.05698227, -0.71090262],
       [ 0.17943783, -0.05121156, -0.70726615],
       [ 0.14837729, -0.05308462, -0.71853917],
       [ 0.12793847, -0.06956868, -0.72745343],
       [ 0.1566435 , -0.06746089, -0.74675463],
       [ 0.1166679 , -0.08200674, -0.74722766],
       [ 0.09893953, -0.09997769, -0.74067378],
       [ 0.08705045, -0.11750248, -0.73289643],
       [ 0.16623452, -0.08441807, -0.76453212],
       [ 0.12806941, -0.10875236, -0.766602  ],
       [ 0.11239126, -0.13240845, -0.75563243],
       [ 0.10241367, -0.15226003, -0.74676509],
       [ 0.17873854, -0.10010551, -0.77240924],
       [ 0.14594503, -0.12251127, -0.77513529],
       [ 0.13150476, -0.14293707, -0.76317673],
       [ 0.12082502, -0.15825822, -0.75424971],
       [ 0.19350514, -0.11465806, -0.77084525],
       [ 0.16739107, -0.13480626, -0.77116211],
       [ 0.15360272, -0.14940971, -0.7653657 ],
       [ 0.14148368, -0.16210854, -0.75996026]]))

kinect2_right
(array([[ 0.22794149, -0.34004319,  0.55556067],
       [ 0.22134314, -0.30439049,  0.5608691 ],
       [ 0.21544726, -0.27515871,  0.57840468],
       [ 0.20176543, -0.26881977,  0.6078713 ],
       [ 0.18387391, -0.26669128,  0.62166709],
       [ 0.20256197, -0.29594279,  0.62101284],
       [ 0.17215723, -0.27376277,  0.6412972 ],
       [ 0.1481243 , -0.25900016,  0.64281529],
       [ 0.1275925 , -0.2492314 ,  0.63938954],
       [ 0.19557289, -0.31640372,  0.62538112],
       [ 0.16372857, -0.29391005,  0.64818318],
       [ 0.13486196, -0.28178748,  0.65016326],
       [ 0.11187605, -0.2742406 ,  0.64581987],
       [ 0.18949139, -0.33473868,  0.62415392],
       [ 0.15928782, -0.31748448,  0.64708754],
       [ 0.13478475, -0.3043036 ,  0.64793996],
       [ 0.11509513, -0.29530475,  0.64515381],
       [ 0.18258538, -0.35112857,  0.61771471],
       [ 0.16000008, -0.33934967,  0.63745995],
       [ 0.14581676, -0.3283236 ,  0.64156818],
       [ 0.12979234, -0.32017404,  0.64330348]]), 
array([[ 0.17745971, -0.38850173,  0.50623965],
       [ 0.15319552, -0.3591347 ,  0.49832638],
       [ 0.11881537, -0.34414638,  0.50309058],
       [ 0.08987343, -0.34198408,  0.51899765],
       [ 0.07698495, -0.33698899,  0.54303658],
       [ 0.0972012 , -0.36506344,  0.54101507],
       [ 0.06811461, -0.34614034,  0.56562686],
       [ 0.06155757, -0.32815589,  0.58334037],
       [ 0.06020955, -0.31139042,  0.59837558],
       [ 0.1090279 , -0.37868767,  0.5602526 ],
       [ 0.08534843, -0.35869282,  0.59330569],
       [ 0.0839557 , -0.33578626,  0.61328308],
       [ 0.08551082, -0.31802657,  0.62923432],
       [ 0.12499995, -0.38514145,  0.57320395],
       [ 0.10508567, -0.36830961,  0.60328674],
       [ 0.10362195, -0.34602734,  0.61972146],
       [ 0.10264829, -0.32940694,  0.63202187],
       [ 0.14450407, -0.38454387,  0.58038083],
       [ 0.13002273, -0.3687835 ,  0.60547856],
       [ 0.12533885, -0.35398503,  0.61948015],
       [ 0.12123062, -0.34075045,  0.63153497]]))

The L2 norms of the coordinates look reasonable, but the distances are distributed mainly along different axes for the three cameras, i.e. the x,y axes in kinect2_left, the z axis in kinect2_middle, and the y,z axes in kinect2_right. I thought it might be because the coordinates are in the world frame or the object frame. How can I rotate the coordinates so that they lie in the camera frame?
Thanks in advance
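
My reading of the docstrings (a sketch, not an authoritative answer): _cTo is the pose of the object w.r.t. the camera, so it already maps object-frame points into the camera frame and should be applied directly, without the inverse:

    import numpy as np

    oX = self._oX[frame_idx][hand_idx]                    # Nx3 joints, object frame
    oX_hom = np.vstack((oX.T, np.ones(len(oX))))          # 4xN homogeneous
    cX = (self._cTo[camera_name][frame_idx] @ oX_hom).T[:, :3]  # Nx3, camera frame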

Masks for object margin issue

We can generate masks for objects and hands. (This issue does not concern the hand masks, since they use MANO and I can't use them due to license issues.)

When I try to generate masks for objects, there is a margin of blue left around the object. Can you suggest what might be going wrong?

result_1

Using grabcut to optimize gives result :
image

My logic :

      # create renderer
      object_renderer = rutils.DepthRenderer(object_name, cp.K(camera_name), camera_name, mesh_scale=1e-3)

      # render object
      object_pose = cp.object_pose(camera_name, frame_idx)
      object_rendering = object_renderer.render(object_pose)

      # create mask from rendering 
      object_mask = object_rendering > 0 

      # color img area of interest
      color_im_bgr_interest = color_im_bgr.copy()
      color_im_bgr_interest[np.logical_not(object_mask)] = 0 

      # due to hand object touch there are parts of hands that are included in 
      # mask.
      # since the objects are of blue color
      # based on mask refine the channel. 
      b, g, r = cv2.split(color_im_bgr_interest)
      color_mask = np.logical_and(b > g, b > r)

      # # dilate the color_mask to cover boundries better. 
      # color_mask = np.uint8(color_mask)
      # kernel = np.ones((5, 5), np.uint8)
      # color_mask = cv2.dilate(color_mask, kernel, iterations=1)

      # # convert to logical bool
      # color_mask = np.array(color_mask, dtype=np.bool)
      
      # object_mask = np.logical_and(object_mask, color_mask)
      object_mask = np.logical_and(object_mask, color_mask)

      object_mask = mutils.grabcut_mask(color_im_bgr, object_mask, n_iters=10)

      mask = np.logical_not(object_mask)
     

Mano fits question

Thanks a lot for creating the dataset! One question about the MANO fitting: I notice that there is only one fitting result in the json file (mano_fits_15.json) for each sequence. Is this the fitting result for the first frame? Do you have per-frame MANO fitting results?

fix annotation errors for participants 31-35

We have noticed that some grasps for p_num in [31, 35] have annotation errors. We are working on fixing them.

This issue will be updated to reflect progress; we anticipate a fix within 1 week from today.

Meanwhile, please ignore data from these participants.

data License?

Hi,

Does the MIT license also apply to the data?
Thanks 👍

-Thibault

Affine transformation for each cameras

Dear @samarth-robo ,

I understand the purpose of the affine transform (basically, to map points from camera space to pixel space correctly), but I do not understand why each camera has a different affine, and why the affines were designed in the first place.

Is it because of the way the cameras were mounted while capturing the data?

Also, the affines don't just swap axes, they also shift pixels. Could you please provide a little more insight into this design?

Thank you very much!

Missing frames in sequence

Hi @samarth-robo ,

In the provided sequences, a couple of frames are sometimes missing in between. Is there a reason why they are missing, and is there a way to find out which frames are missing for all the sequences?

E.g. seq: full15_use/bowl/images_full/kinect2_left

This information would be very useful to know :)

Looking forward to hearing from you!

Thanks in Advance!

projecting 3D points to images

[from an email query]

Basically, I am trying to project the 3D vertices of objects onto the provided 2D images.
I use the following transformation:

projected2d_verts = (cam_intrinsics @ (cp.obj_poses[:3, :3] @ obj3d_verts.T) + cp.obj_poses[:3, 3].reshape(3,1))

But I am getting the wrong projection shown below (the object projection is at the top left, whereas it should overlap the mug held in the hand).

wrong_projection

I am not able to find what's wrong here. Would you have any suggestions to rectify this?

Thanks a lot in advance for your time on this issue.
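
A minimal sketch of what I believe the projection should look like (assuming cp is a ContactPose instance): the translation has to be applied before the intrinsics, cp.P() already bundles K, the object pose and the per-camera affine A, and a perspective divide is still needed:

    import numpy as np

    P = cp.P(camera_name, frame_idx)                                   # 3x4
    verts_hom = np.vstack((obj3d_verts.T, np.ones(len(obj3d_verts))))  # 4xN
    x = P @ verts_hom                                                  # 3xN
    projected2d_verts = (x[:2] / x[2]).T                               # Nx2 pixel coords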

Parse whole dataset

How can I access the whole dataset in loops?

We can access a single instance using the ContactPose class. However, how can I get an exhaustive list of p_num, intent, object_name etc., which are the parameters of ContactPose?

This is needed to convert the dataset to desired usage.
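
There is no official enumeration API that I know of, but a minimal sketch that walks the downloaded data directory should work (the import path, directory location and layout below are assumptions based on the download scripts' defaults):

    import os
    from utilities.dataset import ContactPose

    data_dir = os.path.join('data', 'contactpose_data')    # assumed download location
    for p_dir in sorted(os.listdir(data_dir)):              # e.g. 'full28_use'
        p_num = int(p_dir.split('_')[0].replace('full', ''))
        intent = p_dir.split('_')[1]                         # 'use' or 'handoff'
        for object_name in sorted(os.listdir(os.path.join(data_dir, p_dir))):
            cp = ContactPose(p_num, intent, object_name)
            # ... use cp here ...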

Missing sequences for some cameras

Hi @samarth-robo

I downloaded the dataset, but for some subjects with certain objects the sequence is missing for one of the 3 cameras.
For example, in the full3_handoff, scissors, kinect2_left case the color images directory does not exist. Is this intentional, or am I missing something in the download?

Thanks in advance for the response!

Generate active area

Hi Everyone,

I really appreciate @samarth-robo's work on the ContactPose dataset.

I am quite interested in the Data Analysis part of the paper. I am planning to generate active areas for each object as described in that part. The goal is to generate active areas for specific hand parts, but I can only find the source code for the hand contact probability. Could you please give me some ideas about how to realize automatic active area discovery?

Many thanks!

Create Point Cloud from depth image.

Hi @samarth-robo:

I tried to create a point cloud from the depth image to match the 3D joint coordinates obtained from _cTo @ oX (which has been discussed in #23), but I ran into a problem creating the point cloud.
Following is my code:

import numpy as np
import open3d.io as o3dio
import open3d.camera as o3dc
import open3d.geometry as o3dg
import open3d.visualization as o3dv

point_image = o3dio.read_image(depth_dir)
intrinsics = o3dc.PinholeCameraIntrinsic()
intrinsics.intrinsic_matrix = np.dot(cp.A(camera_name), cp.K(camera_name))
extrinsics = cp.object_pose(camera_name, frame_idx)
pc = o3dg.PointCloud.create_from_depth_image(depth=point_image, intrinsic=intrinsics, extrinsic=extrinsics, depth_scale=1000.0)
print(np.asarray(pc.points))
o3dv.draw_geometries([pc])

The output in terminal is:

[[nan nan nan]
 [nan nan nan]
 [nan nan nan]
 ...
 [nan nan nan]
 [nan nan nan]
 [nan nan nan]]

And nothing is shown in the Open3D window. Do you have any idea what the reason for this might be?

Thanks in advance!
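
One guess (not a confirmed answer): A(camera_name) can flip or swap axes, so A @ K is not a valid pinhole intrinsic matrix for Open3D, and object_pose() is not the camera extrinsic Open3D expects, which could produce the NaNs. A minimal numpy back-projection sketch that sidesteps the Open3D intrinsic object, assuming the depth images store millimetres (consistent with depth_scale=1000.0 above):

    import cv2
    import numpy as np

    depth = cv2.imread(depth_dir, cv2.IMREAD_ANYDEPTH).astype(np.float32) / 1000.0  # metres (assumed)
    K = cp.K(camera_name)
    v, u = np.nonzero(depth > 0)               # pixel rows/cols with valid depth
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    points_cam = np.stack((x, y, z), axis=1)   # Nx3 in the camera frame
    # If the stored images are in the affine-transformed (A) pixel space, the
    # (u, v) coordinates may first need to be mapped back through the inverse of A.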

Joint Angles

Hello everyone,

Thank you for your great work! As far as I have seen, the coordinates of the joints of the simple hand model are given as Cartesian coordinates. Is there also a way to get them as joint angles?

Thank you and best,
Kyra
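
As far as I know the dataset provides Cartesian joint positions (plus MANO fits), so angles have to be derived. A generic sketch, not part of the dataset API (the joint indices in the usage comment are an assumption; check the dataset's joint ordering):

    import numpy as np

    def joint_angle(parent, joint, child):
        """Angle (radians) at `joint` between the bones joint->parent and joint->child."""
        v1, v2 = parent - joint, child - joint
        c = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.arccos(np.clip(c, -1.0, 1.0))

    # e.g. flexion at one finger joint, given a 21x3 array of 3D joints:
    # angle = joint_angle(joints[5], joints[6], joints[7])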

Background only images

Hi @samarth-robo

Is it possible to release the background-only images for the ho3d sequences? I need them to extract only the foreground of each sequence.

Thanks in advance!

Could you provide the URL of the dataset?

First, thanks for such great work!
I found it's not very convenient to download the dataset using the script.
Personally, it would be better if you also provided direct download links.
Thanks!

hand and object Masks

Dear @samarth-robo ,

Is it possible to download the hand and object masks readily, i.e. without running the rendering scripts for all the sequences?
Is it possible to recover the masks from the provided depth maps?

Thank you!

depth image -> point cloud

(from anonymous user via email)

Recently I have been working with your ContactPose dataset (I appreciate how thorough the analyses are in the paper!). Specifically, I have been trying to convert the provided depth images to pointclouds, but I think I am missing some information about the depth imagers.

The depth images are provided in integer format. Usually when converting images to pointclouds, we multiply values of the depth image by a scalar, which converts the integers to real-world units (meters). Looking through the annotations, I cannot find this scalar.

Do you know what the scale of the depth image is, or how I can find this out myself if I don't have access to a Kinect v2 or the .bag recordings?
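
For what it's worth, the depth_scale=1000.0 used in the Open3D snippet earlier on this page, and the Kinect v2 convention of 16-bit depth in millimetres, both point to a scale of 1/1000. A minimal sketch under that assumption (depth_filename is a placeholder):

    import cv2

    depth_mm = cv2.imread(depth_filename, cv2.IMREAD_ANYDEPTH)  # uint16, assumed millimetres
    depth_m = depth_mm.astype('float32') / 1000.0                # metres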

speed up image downloads

Transition the API from downloading zip files of images to downloading videos of images. Video compression algorithms can greatly reduce the file sizes and speed up dataset download.

About obtaining the key point coordinates of the hand in the camera coordinate system

First, thank you for your great work! The following is my calculation of the hand keypoint coordinates in the camera coordinate system. Is it correct? Do other transformations need to be applied, for example the affine transformation?
    # X: Nx3 hand keypoints in the object coordinate frame
    P = _cTo[camera_name][frame_idx][:3]
    X = np.vstack((X.T, np.ones(len(X))))
    x = P @ X
    x = x.T
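
My understanding (a sketch, not an authoritative answer): the snippet above already yields 3D points in the camera frame, so no affine is needed there. The affine A only matters when projecting down to 2D pixels, and ContactPose.P() already includes it (assuming cp is a ContactPose instance, and reusing the 4xN homogeneous X from above):

    x2d_hom = cp.P(camera_name, frame_idx) @ X   # 3xN
    x2d = (x2d_hom[:2] / x2d_hom[2]).T           # Nx2 pixel coordinates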

Mano parameters

@samarth-robo issue continuing from an email conversation

Hi Samarth,

I have one more question; it would be a great help if you could clarify it as well. Thanks in advance!
1. I see there are only 6 mano_fits_**.json files given for each sequence. How can we obtain the MANO parameters of the hand in each frame? More precisely, is there a way to get MANO axis-angle parameters for each frame in the ContactPose dataset?

Looking forward to hearing from you!

Hi Anil,
 
Those 6 json files represent 6 different MANO parameter sizes. The MANO model allows you to represent the hand pose with different parameter sizes through PCA.
 
They don't correspond to frame numbers.
 
Regardless of which one you choose from those 6, you can get that hand model for each frame in the sequence. Please see the [mano_meshes()](https://github.com/facebookresearch/ContactPose/blob/main/utilities/dataset.py#L300) function of the ContactPose dataset.
 
These meshes are in the object coordinate frame. Then you can use [ContactPose.object_pose()](https://github.com/facebookresearch/ContactPose/blob/main/utilities/dataset.py#L278) to transform them into the camera coordinate frame.
 
You can see a working example of all this in [this demo notebook](https://github.com/facebookresearch/ContactPose/blob/main/rendering.ipynb), where the posed MANO meshes are used to render hand masks in the image.
 
Please use GitHub issues for these questions, so the answers are publicly documented and others can see them later if they have the same questions.
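
A minimal sketch of the workflow described above (the argument lists and dictionary keys are assumptions; see the repo's rendering.ipynb for the authoritative version):

    import numpy as np

    cp = ContactPose(p_num, intent, object_name)
    hands = cp.mano_meshes()                        # per-hand meshes in the object frame
    cTo = cp.object_pose(camera_name, frame_idx)    # 4x4, object -> camera
    for hand in hands:
        if hand is None:                            # that hand is not present in this grasp
            continue
        v = hand['vertices']                        # Nx3, object coordinate frame
        v_hom = np.vstack((v.T, np.ones(len(v))))
        v_cam = (cTo @ v_hom).T[:, :3]              # Nx3, camera coordinate frame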

