xiexh20 / behave-dataset

Code to access BEHAVE dataset, CVPR'22

Home Page: https://virtualhumans.mpi-inf.mpg.de/behave/

License: Other

Python 100.00%
3d-reconstruction computer-vision dataset human-object-interaction

behave-dataset's People

Contributors

ale-burzio, bharat-b7, ptrvilya, xiexh20


behave-dataset's Issues

Transfer coordinate from Kinect1 to Kinect0

I am trying to transfer the 15th joint from Kinect1 (behave\sequences\Date01_Sub01_backpack_back\t0005.000\k1.color.json) to Kinect0 (behave\sequences\Date01_Sub01_backpack_back\t0005.000\k0.color.json), because the 15th joint isn't annotated correctly for Kinect0 in this particular example.

To begin with, I take the 15th joint's XY pixel coordinates from Date01_Sub01_backpack_back\t0005.000\k1.color.json, back-project them with the inverse intrinsics to get camera-relative coordinates, and multiply by the depth to obtain 3D coordinates.

import numpy as np

# 15th joint's pixel coordinates, read from k1.color.json
image_coordinates = body_joints  # [1072.214111328125, 586.692626953125]
homogeneous_coordinates = [image_coordinates[0], image_coordinates[1], 1]
# [1072.214111328125, 586.692626953125, 1]

homogeneous_coordinates = np.atleast_2d(homogeneous_coordinates).T
# [[1072.214111328125],
#  [586.692626953125],
#  [1]]

_, inv_K = load_intrinsics(camera="1")
# [[0.0010206326776383019, 0.0,                   -1.0399760465284908],
#  [0.0,                   0.0010205747597485165, -0.7955244457990647],
#  [0.0,                   0.0,                    1.0]]

inv_K_homogeneous_coordinates = np.matmul(inv_K, homogeneous_coordinates)
# [[0.05436071291790556],
#  [-0.1967607590001531],
#  [1.0]]

# depth is the depth value at this pixel (2461.0 here)
_3D_camera_relative_coordinates = inv_K_homogeneous_coordinates * depth
# [[133.78171449096558],
#  [-484.2282278993768],
#  [2461.0]]

Now I transfer the 3D coordinate to Kinect0:

homogeneous_camera_relative_coordinates = np.append(_3D_camera_relative_coordinates, [[1]], axis=0)
# [[133.78171449096558],
#  [-484.2282278993768],
#  [2461.0],
#  [1]]

# inverse of Kinect0 extrinsics
transformation_matrix = [[0.21706500345078444, -0.004850319291858198, 0.9761451012424724, -2.4012728998616777],
                         [0.0011704507569507935, 0.9999882298750086, 0.004708519562866193, -0.043870251217095806],
                         [-0.9761564497158927, 0.0001204749574813712, 0.21706812562844988, 1.7749694834160328]]

transformed = np.matmul(transformation_matrix, homogeneous_camera_relative_coordinates)
# [[2431.27981109], [-472.52214717], [405.32940583]]

K, _ = load_intrinsics(camera="0")
# [[976.2120971679688, 0, 1017.9580078125], [0, 976.0467529296875, 787.3128662109375], [0, 0, 1]]

temp = np.matmul(K, transformed)
# [[2.78605308e+06], [-1.42082651e+05], [4.05329406e+02]]

x = temp[0][0] / temp[2][0]  # 6873.552813113347
y = temp[1][0] / temp[2][0]  # -350.5362530603442

These x and y values are wrong, because the 15th joint should project inside the Kinect0 image.
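
For comparison, here is a minimal sketch of the chain I would expect to need, under two assumptions I am not sure of: (a) each camera's extrinsics map its camera coordinates into a shared world frame, so the Kinect1 point must first go through Kinect1's extrinsics before applying the inverse of Kinect0's, and (b) the depth/translation units are consistent (the extrinsic translations above look like metres while my depth is in millimetres):

import numpy as np

def cam_to_world(p_cam, R, t):
    # map a 3x1 camera-frame point to the world frame (assuming R, t are camera-to-world)
    return R @ p_cam + t.reshape(3, 1)

def world_to_cam(p_world, R, t):
    # inverse of cam_to_world
    return R.T @ (p_world - t.reshape(3, 1))

# Placeholder calibration: in practice R1, t1 (Kinect1) and R0, t0 (Kinect0) come from
# the dataset's calibration files; identity/zero here only to keep the sketch self-contained.
R1, t1 = np.eye(3), np.zeros(3)
R0, t0 = np.eye(3), np.zeros(3)
K0 = np.array([[976.2120971679688, 0, 1017.9580078125],
               [0, 976.0467529296875, 787.3128662109375],
               [0, 0, 1]])                      # Kinect0 color intrinsics (from above)

p1 = np.array([[0.1338], [-0.4842], [2.461]])   # 3D joint in Kinect1 coordinates, metres
p_world = cam_to_world(p1, R1, t1)              # Kinect1 -> world
p0 = world_to_cam(p_world, R0, t0)              # world  -> Kinect0
uvw = K0 @ p0
u, v = uvw[0, 0] / uvw[2, 0], uvw[1, 0] / uvw[2, 0]  # expected pixel location in k0.color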

Object Pose Coordinate System in 30 fps Dataset

Hello,

I hope this message finds you well. I have been working with the 30 fps version of the dataset and have come across a discrepancy in the object pose information. As mentioned in the documentation [1], the object registration parameters are saved as axis-angle and translation in the file [obj_name]_fit.pkl. These parameters are used to transform the centered canonical templates to the Kinect camera coordinate system.

However, while examining the dataset, I noticed that for the same sequence and timestamp, the object poses stored in [obj_name]_fit.pkl and in object_fit_all.npz for the sequence Date01_Sub01_backpack_back do not match. This has caused some confusion regarding the coordinate system used for the object pose in the object_fit_all.npz file.

Could you please clarify the coordinate system in which the object poses are defined in the object_fit_all.npz file of the 30 fps dataset? I would greatly appreciate any insights or additional information you can provide on this matter.

Thank you for your assistance.

Best regards,
kamzero

[1] Dataset documentation: https://github.com/xiexh20/behave-dataset#parse-object-pose-parameters
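
For reference, this is how I currently apply the [obj_name]_fit.pkl parameters to the canonical template (a minimal sketch; the key names 'angle' and 'trans' and the file path are assumptions on my side, so please correct me if the actual format differs):

import pickle
import numpy as np
from scipy.spatial.transform import Rotation

# Assumed key names and a placeholder path; adjust to the actual fit file.
with open("path/to/backpack_fit.pkl", "rb") as f:
    fit = pickle.load(f)

R = Rotation.from_rotvec(np.asarray(fit["angle"]).reshape(3)).as_matrix()  # axis-angle -> rotation matrix
t = np.asarray(fit["trans"]).reshape(3)

canonical_verts = np.zeros((1, 3))        # placeholder for the centered canonical template vertices
posed_verts = canonical_verts @ R.T + t   # canonical template -> Kinect camera coordinate system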

Depth end_time bugfix

This issue references PR #25, which fixes a bug that caused a crash when extracting depth frames in tools/video2image.py.

About the person_fit.pkl and person_fit.ply

Hi, thanks for the great work! When I open person_fit.ply, I find that the hands are identical across all files. Checking person_fit.pkl, the pose parameter size is 156, so I guess it contains the hand pose parameters: is it 156 = 21x3 + 15x3 + 15x3 + 1x3 (21 body joints, 15 joints per hand, and 1 root/global orientation)? Also, following the readme, you use MANO V1.2, but the SMPLH_male(female).pkl inside it does not seem to fit the hand poses. Could you please explain what the 156 parameters represent and how to fit the mesh with hand poses?
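
For what it's worth, this is the split I am assuming for the 156 parameters (1x3 global orientation + 21x3 body + 15x3 per hand, in that order; please correct me if the ordering differs):

import numpy as np

def split_smplh_pose(pose):
    # assumed layout: 3 global orientation + 63 body + 45 left hand + 45 right hand = 156
    pose = np.asarray(pose).reshape(156)
    return {
        "global_orient": pose[0:3],
        "body_pose": pose[3:66],           # 21 body joints x 3
        "left_hand_pose": pose[66:111],    # 15 joints x 3
        "right_hand_pose": pose[111:156],  # 15 joints x 3
    }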

Editing animation of person+object interaction

I would like to combine the person and object point clouds into a single point cloud.

I tried using the open3d library in the behave_demo.py file:

import open3d as o3d
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(fit_meshes_local) #where fit_meshes_local is at line 67 in behave_demo.py
o3d.io.write_point_cloud("/combined.ply", pcd)

But I get an error:
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
since open3d expects dimensions of (N, 3), and I don't know how to convert to that.

I tried using the concatenate function from psbody.mesh. That too did not work.

Alternatively, is it possible to load the individual point clouds into software like Maya or Blender such that the interaction between the two point clouds is retained?

My end goal is to be able to edit the animation in Maya/Blender.
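
For reference, the workaround I am currently trying is to stack the vertex arrays of the individual meshes into a single (N, 3) array before passing it to open3d (assuming fit_meshes_local is the list of meshes from line 67 of behave_demo.py and that each mesh exposes its vertices as an (Ni, 3) array in .v, as psbody.mesh meshes do):

import numpy as np
import open3d as o3d

# fit_meshes_local: list of meshes from behave_demo.py (assumption: each has a .v vertex array)
all_points = np.concatenate([m.v for m in fit_meshes_local], axis=0)  # stack into one (N, 3) array

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(all_points)
o3d.io.write_point_cloud("combined.ply", pcd)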

intrinsic and depth back projection

Hi
Could you explain why depth back-projection is not performed directly with the pixel coordinates and the intrinsic matrix? (I.e. would there be a problem if I did a normal back-projection as in KinectFusion?) For example:

import json
import torch

fp = '/behave_dataset/calibs/intrinsics/0/calibration.json'
with open(fp, 'r') as fin:
    x = json.load(fin)
fx = x['color']['fx']
fy = x['color']['fy']
cx = x['color']['cx']
cy = x['color']['cy']
H = x['color']['height']
W = x['color']['width']

# pixel grid -> normalized camera rays
ix, iy = torch.meshgrid(torch.linspace(0, W - 1, W), torch.linspace(0, H - 1, H))
xx = (ix - cx) / fx
yy = (iy - cy) / fy
zz = torch.ones_like(ix)
...

I saw an example here using a precomputed table ("use precomputed table to convert depth map to point cloud").
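
For completeness, the rest of the naive back-projection I have in mind, continuing the snippet above (a sketch, assuming the depth map is already aligned to the color image and scaled to metres):

# depth: (H, W) depth map aligned to the color image (assumption), converted to metres
depth = torch.zeros(H, W)                  # placeholder; load the real depth frame here

d = depth.T                                # transpose to (W, H) to match ix/iy from meshgrid
points = torch.stack([xx * d, yy * d, zz * d], dim=-1).reshape(-1, 3)   # per-pixel 3D points
points = points[d.reshape(-1) > 0]         # drop pixels with no depth measurement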

Inquiry about Undistortion of RGB and Depth Images

Hello,

I'm currently working with the behave-dataset and I'm wondering whether the camera calibration parameters provided with the dataset have already been used to undistort the RGB and depth images. Have the RGB and depth images in the sequences been undistorted?

I noticed that your calibration.json file provides the extrinsic parameters for color_to_depth. Could you please clarify whether the RGB and depth images are aligned in this case?

Thank you for your help.

Complete sequences

Thanks for releasing this dataset!

One question: the frame rate of the sequences in the dataset seems to be 1 FPS. Is there a way to download the video sequences, and perhaps the registrations, at the original recording rate (I guess something like 30 FPS)?

Cheers,

No annotations at behave\sequences\Date01_Sub01_backpack_back\t0009.000\k2.color.json

Why are there no annotations? The content of this file is as follows (arrays abbreviated; every value is 0.0):

{
  "body_joints": [75 values, all 0.0],
  "face_joints": [210 values, all 0.0],
  "left_hand_joints": [63 values, all 0.0],
  "right_hand_joints": [63 values, all 0.0]
}

Multiple registration results for one sequence

Hi, thank you for your excellent work!

I found that some sequences, such as Date02_Sub02_monitor_move and Date04_Sub05_monitor, have multiple registration results in behave-30fps-params-v1/ (e.g. behave-30fps-params-v1/Date02_Sub02_monitor_move and behave-30fps-params-v1/Date02_Sub02_monitor_move2). I wonder what the differences between them are and which one should be treated as the ground truth.

Thanks a lot!

30fps registration results of several sequences are misaligned with the RGB-D data

Hi,

I found that the registration results in behave-30fps-params-v1/ for the following sequences are temporally misaligned with the corresponding RGB-D data (e.g. the sequence lengths differ), and no information about how to align them has been released (e.g. at which RGB-D frame the registration results start and end).
[screenshot listing the affected sequences]

I hope these sequences can be checked and that some information on how to temporally align the registration results with the RGB-D data can be provided. Thank you!

person_fit.ply has 6890 points but the readme says it is stored in SMPL+H format

Hi @xiexh20
Thanks for your wonderful work. I have been using your dataset for several weeks, and I found something that I don't understand.

  • I extracted the data from person_fit.ply and found that there are 6890 vertices, but the readme says person_fit.ply is stored in SMPL+H format. 6890 vertices is the SMPL vertex count, and I expected the SMPL+H format to have more vertices.

  • If it is the SMPL format, is the order of the 6890 vertices the same as in SMPL?

  • When I use the SMPL-X repository to convert SMPL-H to SMPL, I found that person_fit.ply has no 'face' element, only a 'vertex' element, and the code fails because it needs the face information from the .ply file.

Maybe I am missing something; thanks for any help!
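
For reference, this is roughly how I inspect the file (a quick sketch with trimesh; psbody.mesh or open3d should work equally well):

import trimesh

m = trimesh.load("person_fit.ply", process=False)
print(type(m))              # Trimesh if faces are stored, PointCloud if only vertices
print(m.vertices.shape)     # (6890, 3) for these files
if hasattr(m, "faces"):
    print(m.faces.shape)    # face connectivity, if present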

Question: Detailed Computation of SMPL v2v (cm) and obj. v2v (cm)

Hello Maintainers,

I hope this message finds you well. I've been using your dataset for a while now and I've found it to be extremely helpful for my research. However, I've run into a snag with some specific aspects. Specifically, I would like to understand how SMPL v2v (cm) and obj. v2v (cm) are computed: could you elaborate on the underlying preprocessing (e.g. PA) or postprocessing (e.g. averaging) steps involved? What is the metric used?
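
For context, my current understanding of a vertex-to-vertex (v2v) error is the mean per-vertex Euclidean distance between predicted and ground-truth vertices, optionally after Procrustes alignment (PA). A sketch of that assumption, which may well differ from your exact protocol:

import numpy as np

def v2v_cm(pred, gt, procrustes=False):
    # mean per-vertex Euclidean distance in cm between two (N, 3) vertex sets given in metres
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    if procrustes:
        # rigidly align pred to gt (rotation + translation, no scale) via Kabsch
        mu_p, mu_g = pred.mean(0), gt.mean(0)
        U, _, Vt = np.linalg.svd((pred - mu_p).T @ (gt - mu_g))
        if np.linalg.det(Vt.T @ U.T) < 0:   # avoid a reflection
            Vt[-1] *= -1
        R = Vt.T @ U.T
        pred = (pred - mu_p) @ R.T + mu_g
    return np.linalg.norm(pred - gt, axis=1).mean() * 100.0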

Thank you in advance for your time and help.

Best Regards,
Lorenzo

I can't see the mesh in the video

Hi,

I have run your code, but the result in the video looks like this: I can't see the meshes for kinect1 and kinect2. What could be the problem?
[screenshot of the rendered video frame]

Confusion regarding extrinsic params

The supplementary paper states that:

"We use checkerboard to calibrate the relative poses between different kinects in a pairwise manner. Specifically, we capture 20 pairs of RGB-D images from two kinects and then register each color image with corresponding depth image such that they have the same resolution. We then use OpenCV to extract the checkerboard corners in the color images and obtain their 3D camera coordinates utilizing the registered depth map. Finally, we perform a Procrustes registration on these ordered 3D checkerboard corners to obtain the relative transformation between two kinects. We obtain 3 pairs of relative transformation for 4 kinects and combine them to compute the transformation under a common world coordinate."

I expected the extrinsic parameters to be the pose of each camera with respect to the checkerboard's world coordinates, but it looks like I got that wrong. Reading the supplementary statement about how the extrinsics were obtained confused me even more.

Q1: Can you please explain what the extrinsics for each camera represent in this paper? Are they relative to cam1? I ask because cam1's rotation is the identity matrix and its translation is a zero vector. And what does "We obtain 3 pairs of relative transformation for 4 kinects and combine them to compute the transformation under a common world coordinate" mean?

{
  "rotation": [
    1.0,
    0.0,
    0.0,
    0.0,
    1.0,
    0.0,
    0.0,
    0.0,
    1.0
  ],
  "translation": [
    0.0,
    0.0,
    0.0
  ]
}

Q2: Are the depth and color images already aligned, or do I need to transform the depth image coordinates into the color camera's coordinate system?
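
Regarding Q1, my current mental model of "combine them to compute the transformation under a common world coordinate" is simply chaining the pairwise 4x4 transforms so that every camera is expressed in one reference camera's frame. A sketch of that interpretation (the pair ordering and the choice of cam1 as the world frame are my assumptions, and the matrices are identity placeholders):

import numpy as np

def make_T(R, t):
    # assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Hypothetical pairwise results from the checkerboard registration:
# T_ab maps points from camera a's frame to camera b's frame.
T_01 = make_T(np.eye(3), np.zeros(3))   # cam0 -> cam1
T_12 = make_T(np.eye(3), np.zeros(3))   # cam1 -> cam2
T_23 = make_T(np.eye(3), np.zeros(3))   # cam2 -> cam3

# Choosing cam1 as the common "world" frame (its extrinsics are then identity, as above):
T_0w = T_01                                       # cam0 -> world
T_1w = np.eye(4)                                  # cam1 -> world
T_2w = np.linalg.inv(T_12)                        # cam2 -> world
T_3w = np.linalg.inv(T_12) @ np.linalg.inv(T_23)  # cam3 -> world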

Any suggestions on getting the extrinsics of camera 1?

Hi, thanks for making this great dataset public. I'm trying to align the poses of this dataset with the AMASS dataset and found that the human poses are not in a world coordinate frame (by world I mean an upright, gravity-aligned frame).

From this reply, I realized that the world coordinate frame is set to the coordinate frame of camera 1.

Have you calibrated the extrinsics of camera 1 with respect to the surrounding environment? If not, do you have any suggestions on how to obtain the transformation from camera 1 to an upright coordinate frame?

Thank you in advance!
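
In case it helps others with the same problem, the workaround I am considering is to estimate the gravity ("up") direction in camera-1 coordinates, e.g. from a plane fit to floor points, and then build the rotation that maps it onto the world up axis. A sketch of the rotation construction only; the up direction used here is a pure placeholder assumption:

import numpy as np

def rotation_aligning(a, b):
    # return R such that R @ a is parallel to b (a, b are 3-vectors)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):                      # opposite vectors: rotate 180 deg about any orthogonal axis
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)  # Rodrigues-based alignment

# Placeholder: the estimated "up" direction expressed in camera-1 coordinates
up_in_cam1 = np.array([0.0, -1.0, 0.0])          # assumption: camera y points roughly downwards
R_cam1_to_upright = rotation_aligning(up_in_cam1, np.array([0.0, 0.0, 1.0]))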

The keyboard and basketball masks seem to be missing.

Thank you very much for your BEHAVE dataset.

As stated for BEHAVE, each frame contains a person mask and an object mask. However, there appear to be very few keyboard and basketball masks.

https://virtualhumans.mpi-inf.mpg.de/behave/license.html

Human and object segmentation masks:
part1 (~7GB)
part2 (~10GB)
part3 (~10GB)
part4 (~7GB)

I downloaded the human and object segmentation masks from the URLs above, but found only a few segmentation masks for the keyboard and basketball.

Could you upload all keyboard and basketball masks for the video data?

text description

I wonder whether there is a text description paired with each sequence in this dataset, since I didn't find any.

Different pose fit?

What is the difference between person_fit.pkl in fit_02 and k[0-3].mocap.json? Also, within k[0-3].mocap.json, why do the shape parameters change so much? I assumed that fit_02 contained the main SMPLH fits, whereas the individual per-camera fits would just be a function of the original fit and the extrinsics known from the Kinect.

Error when executing behave_demo.py code

(/s/red/a/nobackup/vision/anju/behave/cvenv) carnap:/s/red/a/nobackup/vision/anju/behave/behave-dataset$ behave_demo.py -s /s/red/a/nobackup/vision/anju/datasets/behave/sequences/Date04_Sub05_boxlong -v /s/red/a/nobackup/vision/anju/behave/behave-dataset/visualize_path -vc
./behave_demo.py: line 5: $'\na simple demo script to show how to load different data given a sequence path\nAuthor: Xianghui\nCite: BEHAVE: Dataset and Method for Tracking Human Object Interaction\n': command not found

^C./behave_demo.py: line 7: syntax error near unexpected token os.getcwd' ./behave_demo.py: line 7: sys.path.append(os.getcwd())'
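
A note on what seems to be happening here (my reading, not a confirmed fix): the "command not found" and "syntax error near unexpected token" messages look like bash executing the Python file directly, so the module docstring and the sys.path line are being parsed as shell commands. Invoking the script through the interpreter instead, e.g. python behave_demo.py -s <sequence path> -v <visualize path> -vc, should avoid this, assuming the conda environment is active.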

One more registration frame than RGB-D data

Hi,

I found that the following sequences have one more registration frame than the corresponding RGB-D data (e.g. both start at frame t0003.000, but the registration results end at frame t0050.433 while the RGB-D data ends at frame t0050.400). Should I just ignore the registration result at t0050.433, or does it need special handling?
[screenshot listing the affected sequences]

I also wonder how the extra registration frame was fitted without corresponding RGB-D data.

Thanks a lot!
