xiexh20 / behave-dataset
Code to access the BEHAVE dataset, CVPR'22
Home Page: https://virtualhumans.mpi-inf.mpg.de/behave/
License: Other
What is the depth scale for each of the 4 cameras?
I downloaded the frame timestamps.
How should I extract the frames from the raw color videos? Since it has been captured at 30fps, should I extract it at the same rate?
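For anyone with the same question, one way to decide which frames to keep when resampling a 30 fps video is a small index-selection helper; the function name and defaults below are illustrative, not part of the dataset tooling. The selected indices can then be fed to cv2.VideoCapture or ffmpeg's select filter.

```python
def frame_indices(n_frames, src_fps=30.0, dst_fps=30.0):
    """Indices of frames to keep when resampling src_fps -> dst_fps.

    With dst_fps == src_fps (the BEHAVE videos are 30 fps) every frame
    is kept; with dst_fps=1.0 roughly one frame per second survives.
    """
    step = src_fps / dst_fps
    return sorted({int(round(i * step)) for i in range(int(n_frames / step))})
```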
Hi, I noticed that for some sequences, the number of frames in object_fit_all.pkl and smpl_fit_all.pkl is not the same.
I downloaded the SMPL and object parameters here: https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/behave-30fps-params-v1.tar
Is this intended? The number of object frames is always less than the number of human frames, so should I only be processing up to the number of object frames?
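If truncating to the object frame count turns out to be the intended behaviour, a defensive clip to the common length is a one-liner; this is a pragmatic workaround, not an official fix, and it assumes the shared frames are aligned from index 0:

```python
def truncate_to_common_length(smpl_params, obj_params):
    """Clip two per-frame parameter sequences to the shorter one.

    Workaround for the human/object frame-count mismatch described
    above; assumes the shared frames are aligned from index 0.
    """
    n_common = min(len(smpl_params), len(obj_params))
    return smpl_params[:n_common], obj_params[:n_common]
```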
I am trying to transfer the 15th joint from Kinect1 (behave\sequences\Date01_Sub01_backpack_back\t0005.000\k1.color.json) to Kinect0 (behave\sequences\Date01_Sub01_backpack_back\t0005.000\k0.color.json), because the 15th joint isn't annotated correctly for Kinect0 in this particular example.
To begin with, I take the Kinect1 15th-joint XY coordinates from Date01_Sub01_backpack_back\t0005.000\k1.color.json, back-project them to get camera-relative coordinates, and multiply by depth to get 3D coordinates.
image_coordinates = body_joints # [1072.214111328125, 586.692626953125]
homogeneous_coordinates = [image_coordinates[0], image_coordinates[1], 1]
# [1072.214111328125, 586.692626953125, 1]
homogeneous_coordinates = np.atleast_2d(homogeneous_coordinates).T
# [[1072.214111328125],
# [586.692626953125],
# [1]]
_, inv_K = load_intrinsics(camera="1")
# [[0.0010206326776383019, 0.0, -1.0399760465284908],
#  [0.0, 0.0010205747597485165, -0.7955244457990647],
#  [0.0, 0.0, 1.0]]
inv_K_homogeneous_coordinates = np.matmul(inv_K, homogeneous_coordinates)
# [[0.05436071291790556],
# [-0.1967607590001531],
# [1.0]]
_3D_camera_relative_coordinates = inv_K_homogeneous_coordinates * depth
# [[133.78171449096558],
# [-484.2282278993768],
# [2461.0]]
Now transfer the 3D coordinate to Kinect0:
homogeneous_camera_relative_coordinates = np.append(_3D_camera_relative_coordinates, [[1]], axis=0)
# [[133.78171449096558],
# [-484.2282278993768],
# [2461.0],
# [1]]
# inverse of Kinect0 extrinsics
transformation_matrix = [[0.21706500345078444, -0.004850319291858198, 0.9761451012424724, -2.4012728998616777],
                         [0.0011704507569507935, 0.9999882298750086, 0.004708519562866193, -0.043870251217095806],
                         [-0.9761564497158927, 0.0001204749574813712, 0.21706812562844988, 1.7749694834160328]]
transformed = np.matmul(transformation_matrix, homogeneous_camera_relative_coordinates)
# [[2431.27981109], [-472.52214717], [ 405.32940583]]
K, _ = load_intrinsics(camera="0")
# [[976.2120971679688, 0, 1017.9580078125], [0, 976.0467529296875, 787.3128662109375], [0, 0, 1]]
temp = np.matmul(K, transformed)
# [[ 2.78605308e+06], [-1.42082651e+05], [ 4.05329406e+02]]
x = temp[0][0] / temp[2][0] # 6873.552813113347
y = temp[1][0] / temp[2][0] # -350.5362530603442
These x and y values are wrong, because the 15th joint should project to somewhere inside the Kinect0 image.
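One thing worth double-checking in a pipeline like this is unit consistency: the depth value (2461.0) looks like millimetres, while the extrinsic translation components (about -2.4 and 1.77) look like metres, and mixing the two throws the projection far outside the image. This is only a guess about the conventions, not something documented; a sketch of the same pipeline with the units made explicit:

```python
import numpy as np

def transfer_point(px, depth_mm, inv_K_src, T_src_to_dst, K_dst):
    """Back-project a pixel in the source view and reproject into the target view.

    Assumes depth is given in millimetres while the 3x4 extrinsic matrix
    uses metres (an assumption about this data, not a documented
    convention); mixing the two units is one common way to get
    projections far outside the image.
    """
    ray = inv_K_src @ np.array([px[0], px[1], 1.0])   # normalized camera ray
    p_src = ray * (depth_mm / 1000.0)                 # 3D point in metres
    p_dst = T_src_to_dst[:, :3] @ p_src + T_src_to_dst[:, 3]  # rigid transform
    uvw = K_dst @ p_dst                               # project into target image
    return uvw[:2] / uvw[2]
```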
Hello,
I hope this message finds you well. I have been working with the 30 fps version of the dataset and have come across a discrepancy in the object pose information. As mentioned in the documentation [1], the object registration parameters are saved as axis-angle and translation in the file [obj_name]_fit.pkl. These parameters are used to transform the centered canonical templates to the Kinect camera coordinate system.
However, while examining the dataset, I noticed that for the same sequence and timestamp, the object poses stored in the [obj_name]_fit.pkl and object_fit_all.npz files of the Date01_Sub01_backpack_back sequence do not match. This has raised some confusion regarding the coordinate system used for the object pose in the object_fit_all.npz file.
Could you please clarify the coordinate system in which the object poses are defined in the object_fit_all.npz file of the 30 fps dataset? I would greatly appreciate any insights or additional information you can provide on this matter.
Thank you for your assistance.
Best regards,
kamzero
[1] Dataset documentation: https://github.com/xiexh20/behave-dataset#parse-object-pose-parameters
This issue references PR #25, which fixes a bug that caused a crash when extracting depth in tools/video2image.py.
Hi, thanks for the great work! When I open person_fit.ply, I found that the hands are the same in all files. I also checked person_fit.pkl: the pose parameter size is 156, so I guess it contains the hand pose parameters. Is it 156 = 21x3 + 15x3 + 15x3 + 1x3 (21 body joints, 15 joints per hand, and 1 root)? Besides, the readme says to use the V1.2 MANO release, but the SMPLH_male(female).pkl in it does not seem to fit the hand pose. Could you please clarify what the 156 parameters represent and how to fit the mesh with the hand poses?
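The split guessed above does sum to 156 (3 + 63 + 45 + 45, i.e. 52 joints times 3 axis-angle values). Under the usual SMPL-H layout, which I believe this dataset follows but have not verified against the official model docs, the vector slices as:

```python
import numpy as np

def split_smplh_pose(pose):
    """Split a 156-dim SMPL-H pose vector into its axis-angle parts.

    Layout (3 values per joint): 1 global/root + 21 body + 15 left-hand
    + 15 right-hand joints = 52 joints * 3 = 156 parameters.
    """
    pose = np.asarray(pose)
    assert pose.shape[-1] == 156
    return {
        "global_orient": pose[..., 0:3],
        "body_pose": pose[..., 3:66],         # 21 body joints
        "left_hand_pose": pose[..., 66:111],  # 15 left-hand joints
        "right_hand_pose": pose[..., 111:156],
    }
```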
I would like to combine the person and object point clouds into a single point cloud.
I tried using the open3d library in the behave_demo.py file:
import open3d as o3d
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(fit_meshes_local) #where fit_meshes_local is at line 67 in behave_demo.py
o3d.io.write_point_cloud("/combined.ply", pcd)
But I get an error:
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
since open3d expects an array of shape (N, 3), which I don't know how to produce from fit_meshes_local.
I also tried the concatenate function from psbody.mesh; that did not work either.
Alternatively, is it possible to load the individual point clouds into software like Maya or Blender such that the interaction between the two point clouds is retained?
My end goal is to be able to edit the animation in Maya/Blender.
Hi,
could you explain a bit why depth back-projection is not performed directly with the pixel coordinates and the intrinsic matrix? (i.e. would it be a problem if I did a normal back-projection as in KinectFusion?)
e.g.
import json
import torch

fp = '/behave_dataset/calibs/intrinsics/0/calibration.json'
with open(fp, 'r') as fin:
    x = json.load(fin)
fx = x['color']['fx']
fy = x['color']['fy']
cx = x['color']['cx']
cy = x['color']['cy']
H = x['color']['height']
W = x['color']['width']
ix, iy = torch.meshgrid(torch.linspace(0, W - 1, W), torch.linspace(0, H - 1, H))
xx = (ix - cx) / fx
yy = (iy - cy) / fy
zz = torch.ones_like(ix)
...
I saw an example here using a pre-computed table: behave-dataset/data/kinect_calib.py, line 78 (commit 953a0d9).
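For comparison, the plain pinhole back-projection sketched above can be completed into a point cloud as below (numpy version). The dataset's pre-computed table presumably also accounts for lens distortion, which this simple model ignores; that is my reading, not something confirmed by the authors.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    Plain pinhole model, no distortion; the dataset's own table-based
    code may differ slightly if the images are not undistorted.
    """
    H, W = depth.shape
    ix, iy = np.meshgrid(np.arange(W), np.arange(H))  # pixel grids, shape (H, W)
    X = (ix - cx) / fx * depth
    Y = (iy - cy) / fy * depth
    return np.stack([X, Y, depth], axis=-1).reshape(-1, 3)
```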
Hello,
I'm currently working with the behave-dataset and I'm wondering whether the camera calibration parameters provided with the dataset have been used to undistort the RGB and depth images, i.e. whether the RGB and depth images in the sequences are already undistorted.
I noticed that your calibration.json file provides the extrinsic parameters for color_to_depth. Could you please clarify whether the RGB and depth images are aligned in this case?
Thank you for your help.
Thanks for releasing this dataset!
One question: the sequences in the dataset seem to be at 1 fps. Is there a way to download the video sequences, and perhaps the registrations, at the original recording rate (I guess something like 30 fps)?
Cheers,
Hi, Thanks for your dataset.
May I ask how to get the 3D joints in the dataset?
Why are there no annotations? The content of this file is as follows:
{
    "body_joints": [0.0, 0.0, ...],        (75 values, all 0.0)
    "face_joints": [0.0, 0.0, ...],        (210 values, all 0.0)
    "left_hand_joints": [0.0, 0.0, ...],   (63 values, all 0.0)
    "right_hand_joints": [0.0, 0.0, ...]   (63 values, all 0.0)
}
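All-zero arrays like this most likely mean the joint detector produced no valid annotation for that frame/view; that is an interpretation, not something stated in the docs. A small predicate for filtering such frames out:

```python
def joints_are_empty(data):
    """True if every joint array in a k*.color.json-style dict is all zeros,
    i.e. the frame likely carries no valid annotation (an interpretation,
    not something documented by the dataset authors)."""
    return all(v == 0.0
               for vals in data.values() if isinstance(vals, list)
               for v in vals)
```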
Hi, thank you for your excellent work!
I found that some sequences, like Date02_Sub02_monitor_move and Date04_Sub05_monitor, have multiple registration results in behave-30fps-params-v1/ (e.g. behave-30fps-params-v1/Date02_Sub02_monitor_move and behave-30fps-params-v1/Date02_Sub02_monitor_move2). I wonder what the differences between them are and which one should be treated as the ground truth.
Thanks a lot!
Hi,
I found that the registration results in behave-30fps-params-v1/ for the following sequences are misaligned in time with the corresponding RGB-D data (e.g. the sequence lengths differ), and no information about how to align them has been released (e.g. at which RGB-D frame the registration result starts and ends).
I hope these sequences can be checked and that information on how to align the registration results with the RGB-D data over time can be provided. Thank you!
Hi @xiexh20
Thanks for your wonderful work. I have used your dataset for several weeks, and I found something that I don't understand.
I took the data out of person_fit.ply and found there are 6890 vertices, but the readme says person_fit.ply is stored in SMPL+H format. 6890 vertices corresponds to the SMPL format; the SMPL+H format should have more vertices.
And if it is the SMPL format, is the order of the 6890 vertices the same as in SMPL?
When I use the SMPLX repository to transform SMPL-H to SMPL, I found that person_fit.ply doesn't have a 'face' key, only a 'vertex' key, and the code fails because it needs the 'face' key in the .ply file.
Maybe I'm missing something; thanks for any help!
Hello Maintainers,
I hope this message finds you well. I've been using your dataset for a while now and have found it extremely helpful for my research. However, I've run into a snag with some specific aspects. Specifically, I would like to understand how SMPL v2v (cm) and obj. v2v (cm) are computed: could you elaborate on the underlying preprocessing (e.g. PA, i.e. Procrustes alignment) or postprocessing (e.g. averaging) steps involved? What is the metric used?
Thank you in advance for your time and help.
Best Regards,
Lorenzo
I have pytorch3d version 0.3. I tried it for the following sequences: Date01_Sub01_backpack_back, Date03_Sub03_stool_lift, Date03_Sub04_backpack_hand, Date03_Sub05_basketball, etc. For all of them, I was not able to see the mesh in either kinect1 or kinect2.
I am using conda env. This is my environment.yml file.
https://drive.google.com/file/d/1dwj81iFLQdIVc2m77KVS3WV0N38M38Zh/view?usp=sharing
@xiexh20 Can you please look into this?
The supplementary paper states that:
"We use checkerboard to calibrate the relative poses between different kinects in a pairwise manner. Specifically, we capture 20 pairs of RGB-D images from two kinects and then register each color image with corresponding depth image such that they have the same resolution. We then use OpenCV to extract the checkerboard corners in the color images and obtain their 3D camera coordinates utilizing the registered depth map. Finally, we perform a Procrustes registration on these ordered 3D checkerboard corners to obtain the relative transformation between two kinects. We obtain 3 pairs of relative transformation for 4 kinects and combine them to compute the transformation under a common world coordinate."
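The pairwise Procrustes step described in the quote can be sketched as a standard Kabsch alignment of ordered 3D correspondences. This is a generic implementation, not the authors' code:

```python
import numpy as np

def procrustes_rigid(A, B):
    """Rigid Procrustes (Kabsch): find R, t with B_i ~= R @ A_i + t.

    A, B: (N, 3) ordered corresponding points, e.g. checkerboard corners
    lifted to 3D in two kinect frames. Rotation and translation only,
    no scale.
    """
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t
```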
I was hoping that the extrinsic parameters were the position of each camera with respect to the world coordinates of the checkerboard, but it looks like I got that wrong. Reading the supplementary statement about how the extrinsic params were obtained confused me even more.
Q1: Can you please explain what the extrinsics for each camera represent in this paper? Are they relative to cam1? (The rotation of cam1 is the identity matrix and its translation is a zero vector.) And what does "We obtain 3 pairs of relative transformation for 4 kinects and combine them to compute the transformation under a common world coordinate" mean?
{
    "rotation": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    "translation": [0.0, 0.0, 0.0]
}
Q2: Are depth and color images already aligned or do I need to transform coordinates of depth image to color camera coordinate system?
Hi, thanks for making this great dataset public. I'm trying to align the poses of this dataset with the AMASS dataset and found that the human pose is not in the world coordinate frame (by world I mean an upright coordinate frame).
From this reply, I realized that the world coordinate frame is set to the coordinate frame of camera 1.
Have you calibrated the extrinsics of camera 1 with respect to the surrounding environment? If not, do you have any suggestions on how to acquire this transformation from camera 1 to any upright coordinate?
Thank you in advance!
Thank you very much for your BEHAVE dataset.
In BEHAVE, each frame contains a person and an object mask. However, there appear to be few keyboard and basketball masks.
https://virtualhumans.mpi-inf.mpg.de/behave/license.html
Human and object segmentation masks:
part1 (~7GB)
part2 (~10GB)
part3 (~10GB)
part4 (~7GB)
I downloaded the human and object segmentation masks from the URL above, but found only a few segmentation masks for the keyboard and basketball.
Could you upload all keyboard and basketball masks for the video data?
I wonder if there is any text description paired with each sequence in this dataset, since I didn't find any.
What is the difference between person_fit.pkl in fit_02 and k[0-3].mocap.json? Also, within k[0-3].mocap.json, why do the shape parameters change so much? I assumed that fit_02 had the main SMPL-H fits, whereas the individual fits would just be a function of the original fit and the known kinect extrinsics?
(/s/red/a/nobackup/vision/anju/behave/cvenv) carnap:/s/red/a/nobackup/vision/anju/behave/behave-dataset$ behave_demo.py -s /s/red/a/nobackup/vision/anju/datasets/behave/sequences/Date04_Sub05_boxlong -v /s/red/a/nobackup/vision/anju/behave/behave-dataset/visualize_path -vc
./behave_demo.py: line 5: $'\na simple demo script to show how to load different data given a sequence path\nAuthor: Xianghui\nCite: BEHAVE: Dataset and Method for Tracking Human Object Interaction\n': command not found
^C./behave_demo.py: line 7: syntax error near unexpected token `os.getcwd'
./behave_demo.py: line 7: `sys.path.append(os.getcwd())'
I am using a very simple camera matrix
[[fx, 0, cx],
[0, fy, cy],
[0, 0, 1]]
Should any other distortion parameters be incorporated, such as tangential and radial distortion?
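Whether that is necessary depends on whether the released images are already undistorted, which only the maintainers can confirm. For reference, the standard OpenCV (Brown-Conrady) radial/tangential model applied on top of the pinhole matrix above looks like this; the coefficient names follow OpenCV's convention:

```python
def distort(xn, yn, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply Brown-Conrady radial/tangential distortion to normalized coords.

    (xn, yn) = ((u - cx)/fx, (v - cy)/fy). With all coefficients zero this
    reduces to the plain pinhole model used in the question above.
    """
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn
    return xd, yd
```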
Hi,
I found that the following sequences have one more frame of registration results than the corresponding RGB-D data (e.g. both start at frame t0003.000, but the registration results end at frame t0050.433 while the RGB-D data ends at frame t0050.400). Should I just ignore the registration result at t0050.433, or does this need special handling?
And I wonder how the extra registration frame was fitted without corresponding RGB-D data.
Thanks a lot!