xiexh20 / behave-dataset
Code to access the BEHAVE dataset, CVPR'22
Home Page: https://virtualhumans.mpi-inf.mpg.de/behave/
License: Other
What is the depth scale for each of the 4 cameras?
I downloaded the frame timestamps.
How should I extract the frames from the raw color videos? Since it has been captured at 30fps, should I extract it at the same rate?
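For anyone with the same question, one way to decide which frames to keep when resampling a 30 fps video is a small index-selection helper; the function name and defaults below are illustrative, not part of the dataset tooling. The selected indices can then be fed to cv2.VideoCapture or ffmpeg's select filter.

```python
def frame_indices(n_frames, src_fps=30.0, dst_fps=30.0):
    """Indices of frames to keep when resampling src_fps -> dst_fps.

    With dst_fps == src_fps (the BEHAVE videos are 30 fps) every frame
    is kept; with dst_fps=1.0 roughly one frame per second survives.
    """
    step = src_fps / dst_fps
    return sorted({int(round(i * step)) for i in range(int(n_frames / step))})
```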
Hi, I noticed that for some sequences, the number of frames in object_fit_all.pkl and smpl_fit_all.pkl is not the same.
I downloaded the SMPL and object parameters here: https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/behave-30fps-params-v1.tar
Is this intended? The number of object frames is always less than the number of human frames, so should I only be processing up to the number of object frames?
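If truncating to the object frame count turns out to be the intended behaviour, a defensive clip to the common length is a one-liner; this is a pragmatic workaround, not an official fix, and it assumes the shared frames are aligned from index 0:

```python
def truncate_to_common_length(smpl_params, obj_params):
    """Clip two per-frame parameter sequences to the shorter one.

    Workaround for the human/object frame-count mismatch described
    above; assumes the shared frames are aligned from index 0.
    """
    n_common = min(len(smpl_params), len(obj_params))
    return smpl_params[:n_common], obj_params[:n_common]
```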
I am trying to transfer the 15th joint from Kinect1 (behave\sequences\Date01_Sub01_backpack_back\t0005.000\k1.color.json) to Kinect0 (behave\sequences\Date01_Sub01_backpack_back\t0005.000\k0.color.json), because the 15th joint isn't annotated correctly for Kinect0 in this particular example.
To begin with, I take the Kinect1 15th-joint XY coordinates from Date01_Sub01_backpack_back\t0005.000\k1.color.json, back-project them to get camera-relative coordinates, and multiply by depth to get 3D coordinates.
image_coordinates = body_joints # [1072.214111328125, 586.692626953125]
homogeneous_coordinates = [image_coordinates[0], image_coordinates[1], 1]
# [1072.214111328125, 586.692626953125, 1]
homogeneous_coordinates = np.atleast_2d(homogeneous_coordinates).T
# [[1072.214111328125],
# [586.692626953125],
# [1]]
_, inv_K = load_intrinsics(camera="1")
# [[0.0010206326776383019, 0.0, -1.0399760465284908],
#  [0.0, 0.0010205747597485165, -0.7955244457990647],
#  [0.0, 0.0, 1.0]]
inv_K_homogeneous_coordinates = np.matmul(inv_K, homogeneous_coordinates)
# [[0.05436071291790556],
# [-0.1967607590001531],
# [1.0]]
_3D_camera_relative_coordinates = inv_K_homogeneous_coordinates * depth
# [[133.78171449096558],
# [-484.2282278993768],
# [2461.0]]
Now transfer the 3D coordinate to Kinect0:
homogeneous_camera_relative_coordinates = np.append(_3D_camera_relative_coordinates, [[1]], axis=0)
# [[133.78171449096558],
# [-484.2282278993768],
# [2461.0],
# [1]]
# inverse of Kinect0 extrinsics
transformation_matrix = [[0.21706500345078444, -0.004850319291858198, 0.9761451012424724, -2.4012728998616777],
                         [0.0011704507569507935, 0.9999882298750086, 0.004708519562866193, -0.043870251217095806],
                         [-0.9761564497158927, 0.0001204749574813712, 0.21706812562844988, 1.7749694834160328]]
transformed = np.matmul(transformation_matrix, homogeneous_camera_relative_coordinates)
# [[2431.27981109], [-472.52214717], [ 405.32940583]]
K, _ = load_intrinsics(camera="0")
# [[976.2120971679688, 0, 1017.9580078125], [0, 976.0467529296875, 787.3128662109375], [0, 0, 1]]
temp = np.matmul(K, transformed)
# [[ 2.78605308e+06], [-1.42082651e+05], [ 4.05329406e+02]]
x = temp[0][0] / temp[2][0] # 6873.552813113347
y = temp[1][0] / temp[2][0] # -350.5362530603442
These x and y values are wrong, because the 15th joint should project to somewhere inside the Kinect0 image.
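One thing worth double-checking in a pipeline like this is unit consistency: the depth value (2461.0) looks like millimetres, while the extrinsic translation components (about -2.4 and 1.77) look like metres, and mixing the two throws the projection far outside the image. This is only a guess about the conventions, not something documented; a sketch of the same pipeline with the units made explicit:

```python
import numpy as np

def transfer_point(px, depth_mm, inv_K_src, T_src_to_dst, K_dst):
    """Back-project a pixel in the source view and reproject into the target view.

    Assumes depth is given in millimetres while the 3x4 extrinsic matrix
    uses metres (an assumption about this data, not a documented
    convention); mixing the two units is one common way to get
    projections far outside the image.
    """
    ray = inv_K_src @ np.array([px[0], px[1], 1.0])   # normalized camera ray
    p_src = ray * (depth_mm / 1000.0)                 # 3D point in metres
    p_dst = T_src_to_dst[:, :3] @ p_src + T_src_to_dst[:, 3]  # rigid transform
    uvw = K_dst @ p_dst                               # project into target image
    return uvw[:2] / uvw[2]
```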
Hello,
I hope this message finds you well. I have been working with the 30 fps version of the dataset and have come across a discrepancy in the object pose information. As mentioned in the documentation [1], the object registration parameters are saved as axis-angle and translation in the file [obj_name]_fit.pkl. These parameters are used to transform the centered canonical templates to the Kinect camera coordinate system.
However, while examining the dataset, I noticed that for the same sequence and timestamp, the object poses stored in the [obj_name]_fit.pkl and object_fit_all.npz files of the Date01_Sub01_backpack_back sequence do not match. This has raised some confusion regarding the coordinate system used for the object pose in the object_fit_all.npz file.
Could you please clarify the coordinate system in which the object poses are defined in the object_fit_all.npz file of the 30 fps dataset? I would greatly appreciate any insights or additional information you can provide on this matter.
Thank you for your assistance.
Best regards,
kamzero
[1] Dataset documentation: https://github.com/xiexh20/behave-dataset#parse-object-pose-parameters
This issue references PR #25, which fixes a bug that caused a crash when extracting depth in tools/video2image.py.
Hi, thanks for the great work! When I open person_fit.ply, I found that the hands are the same in all files. I also checked person_fit.pkl: the pose parameter size is 156, so I guess it contains the hand pose parameters. Is it 156 = 21x3 + 15x3 + 15x3 + 1x3 (21 body joints, 15 joints per hand, and 1 root)? Besides, the readme says to use the V1.2 MANO release, but the SMPLH_male(female).pkl in it does not seem to fit the hand pose. Could you please clarify what the 156 parameters represent and how to fit the mesh with the hand poses?
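The split guessed above does sum to 156 (3 + 63 + 45 + 45, i.e. 52 joints times 3 axis-angle values). Under the usual SMPL-H layout, which I believe this dataset follows but have not verified against the official model docs, the vector slices as:

```python
import numpy as np

def split_smplh_pose(pose):
    """Split a 156-dim SMPL-H pose vector into its axis-angle parts.

    Layout (3 values per joint): 1 global/root + 21 body + 15 left-hand
    + 15 right-hand joints = 52 joints * 3 = 156 parameters.
    """
    pose = np.asarray(pose)
    assert pose.shape[-1] == 156
    return {
        "global_orient": pose[..., 0:3],
        "body_pose": pose[..., 3:66],         # 21 body joints
        "left_hand_pose": pose[..., 66:111],  # 15 left-hand joints
        "right_hand_pose": pose[..., 111:156],
    }
```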
I would like to combine the person and object point clouds into a single point cloud.
I tried using the open3d library in the behave_demo.py file:
import open3d as o3d
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(fit_meshes_local) #where fit_meshes_local is at line 67 in behave_demo.py
o3d.io.write_point_cloud("/combined.ply", pcd)
But I get an error:
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
since open3d expects an array of shape (N, 3), which I don't know how to produce from fit_meshes_local.
I also tried the concatenate function from psbody.mesh; that did not work either.
Alternatively, is it possible to load the individual point clouds into software like Maya or Blender such that the interaction between the two point clouds is retained?
My end goal is to be able to edit the animation in Maya/Blender.
Hi,
could you explain a bit why depth back-projection is not performed directly with the pixel coordinates and the intrinsic matrix? (i.e. would it be a problem if I did a normal back-projection as in KinectFusion?)
e.g.
import json
import torch

fp = '/behave_dataset/calibs/intrinsics/0/calibration.json'
with open(fp, 'r') as fin:
    x = json.load(fin)
fx = x['color']['fx']
fy = x['color']['fy']
cx = x['color']['cx']
cy = x['color']['cy']
H = x['color']['height']
W = x['color']['width']
ix, iy = torch.meshgrid(torch.linspace(0, W - 1, W), torch.linspace(0, H - 1, H))
xx = (ix - cx) / fx
yy = (iy - cy) / fy
zz = torch.ones_like(ix)
...
I saw an example here using a pre-computed table: behave-dataset/data/kinect_calib.py, line 78 (commit 953a0d9).
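For comparison, the plain pinhole back-projection sketched above can be completed into a point cloud as below (numpy version). The dataset's pre-computed table presumably also accounts for lens distortion, which this simple model ignores; that is my reading, not something confirmed by the authors.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    Plain pinhole model, no distortion; the dataset's own table-based
    code may differ slightly if the images are not undistorted.
    """
    H, W = depth.shape
    ix, iy = np.meshgrid(np.arange(W), np.arange(H))  # pixel grids, shape (H, W)
    X = (ix - cx) / fx * depth
    Y = (iy - cy) / fy * depth
    return np.stack([X, Y, depth], axis=-1).reshape(-1, 3)
```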
Hello,
I'm currently working with the behave-dataset and I'm wondering whether the camera calibration parameters provided with the dataset have been used to undistort the RGB and depth images, i.e. whether the RGB and depth images in the sequences are already undistorted.
I noticed that your calibration.json file provides the extrinsic parameters for color_to_depth. Could you please clarify whether the RGB and depth images are aligned in this case?
Thank you for your help.
Thanks for releasing this dataset!
One question: the sequences in the dataset seem to be at 1 fps. Is there a way to download the video sequences, and perhaps the registrations, at the original recording rate (I guess something like 30 fps)?
Cheers,
Hi, Thanks for your dataset.
May I ask how to get the 3D joints in the dataset?
Why are there no annotations? The content of this file is as follows:
{
    "body_joints": [0.0, 0.0, ...],        (75 values, all 0.0)
    "face_joints": [0.0, 0.0, ...],        (210 values, all 0.0)
    "left_hand_joints": [0.0, 0.0, ...],   (63 values, all 0.0)
    "right_hand_joints": [0.0, 0.0, ...]   (63 values, all 0.0)
}
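All-zero arrays like this most likely mean the joint detector produced no valid annotation for that frame/view; that is an interpretation, not something stated in the docs. A small predicate for filtering such frames out:

```python
def joints_are_empty(data):
    """True if every joint array in a k*.color.json-style dict is all zeros,
    i.e. the frame likely carries no valid annotation (an interpretation,
    not something documented by the dataset authors)."""
    return all(v == 0.0
               for vals in data.values() if isinstance(vals, list)
               for v in vals)
```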
Hi, thank you for your excellent work!
I found that some sequences, like Date02_Sub02_monitor_move and Date04_Sub05_monitor, have multiple registration results in behave-30fps-params-v1/ (e.g. behave-30fps-params-v1/Date02_Sub02_monitor_move and behave-30fps-params-v1/Date02_Sub02_monitor_move2). I wonder what the differences between them are and which one should be treated as the ground truth.
Thanks a lot!
Hi,
I found that the registration results in behave-30fps-params-v1/ for the following sequences are misaligned in time with the corresponding RGB-D data (e.g. the sequence lengths differ), and no information about how to align them has been released (e.g. at which RGB-D frame the registration result starts and ends).
I hope these sequences can be checked and that information on how to align the registration results with the RGB-D data over time can be provided. Thank you!
Hi @xiexh20
Thanks for your wonderful work. I have used your dataset for several weeks, and I found something that I don't understand.
I took the data out of person_fit.ply and found there are 6890 vertices, but the readme says person_fit.ply is stored in SMPL+H format. 6890 vertices corresponds to the SMPL format; the SMPL+H format should have more vertices.
And if it is the SMPL format, is the order of the 6890 vertices the same as in SMPL?
When I use the SMPLX repository to transform SMPL-H to SMPL, I found that person_fit.ply doesn't have a 'face' key, only a 'vertex' key, and the code fails because it needs the 'face' key in the .ply file.
Maybe I'm missing something; thanks for any help!
Hello Maintainers,
I hope this message finds you well. I've been using your dataset for a while now and have found it extremely helpful for my research. However, I've run into a snag with some specific aspects. Specifically, I would like to understand how SMPL v2v (cm) and obj. v2v (cm) are computed: could you elaborate on the underlying preprocessing (e.g. PA, i.e. Procrustes alignment) or postprocessing (e.g. averaging) steps involved? What is the metric used?
Thank you in advance for your time and help.
Best Regards,
Lorenzo
I have pytorch3d version 0.3. I tried it for the following sequences: Date01_Sub01_backpack_back, Date03_Sub03_stool_lift, Date03_Sub04_backpack_hand, Date03_Sub05_basketball, etc. For all of them, I was not able to see the mesh in either kinect1 or kinect2.
I am using conda env. This is my environment.yml file.
https://drive.google.com/file/d/1dwj81iFLQdIVc2m77KVS3WV0N38M38Zh/view?usp=sharing
@xiexh20 Can you please look into this?
The supplementary paper states that:
"We use checkerboard to calibrate the relative poses between different kinects in a pairwise manner. Specifically, we capture 20 pairs of RGB-D images from two kinects and then register each color image with corresponding depth image such that they have the same resolution. We then use OpenCV to extract the checkerboard corners in the color images and obtain their 3D camera coordinates utilizing the registered depth map. Finally, we perform a Procrustes registration on these ordered 3D checkerboard corners to obtain the relative transformation between two kinects. We obtain 3 pairs of relative transformation for 4 kinects and combine them to compute the transformation under a common world coordinate."
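The pairwise Procrustes step described in the quote can be sketched as a standard Kabsch alignment of ordered 3D correspondences. This is a generic implementation, not the authors' code:

```python
import numpy as np

def procrustes_rigid(A, B):
    """Rigid Procrustes (Kabsch): find R, t with B_i ~= R @ A_i + t.

    A, B: (N, 3) ordered corresponding points, e.g. checkerboard corners
    lifted to 3D in two kinect frames. Rotation and translation only,
    no scale.
    """
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t
```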
I was hoping that the extrinsic parameters were the position of each camera with respect to the world coordinates of the checkerboard, but it looks like I got that wrong. Reading the supplementary statement about how the extrinsic params were obtained confused me even more.
Q1: Can you please explain what the extrinsics for each camera represent in this paper? Are they relative to cam1? (The rotation of cam1 is the identity matrix and its translation is a zero vector.) And what does "We obtain 3 pairs of relative transformation for 4 kinects and combine them to compute the transformation under a common world coordinate" mean?
{
    "rotation": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    "translation": [0.0, 0.0, 0.0]
}
Q2: Are depth and color images already aligned or do I need to transform coordinates of depth image to color camera coordinate system?
Hi, thanks for making this great dataset public. I'm trying to align the poses of this dataset with the AMASS dataset and found that the human pose is not in the world coordinate frame (by world I mean an upright coordinate frame).
From this reply, I realized that the world coordinate frame is set to the coordinate frame of camera 1.
Have you calibrated the extrinsics of camera 1 with respect to the surrounding environment? If not, do you have any suggestions on how to acquire this transformation from camera 1 to any upright coordinate?
Thank you in advance!
Thank you very much for your BEHAVE dataset.
In BEHAVE, each frame contains a person and an object mask. However, there appear to be few keyboard and basketball masks.
https://virtualhumans.mpi-inf.mpg.de/behave/license.html
Human and object segmentation masks:
part1 (~7GB)
part2 (~10GB)
part3 (~10GB)
part4 (~7GB)
I downloaded the human and object segmentation masks from the URL above, but found only a few segmentation masks for the keyboard and basketball.
Could you upload all keyboard and basketball masks for the video data?
I wonder if there is any text description paired with each sequence in this dataset, since I didn't find any.
What is the difference between person_fit.pkl in fit_02 and k[0-3].mocap.json? Also, within k[0-3].mocap.json, why do the shape parameters change so much? I assumed that fit_02 had the main SMPL-H fits, whereas the individual fits would just be a function of the original fit and the known kinect extrinsics?
(/s/red/a/nobackup/vision/anju/behave/cvenv) carnap:/s/red/a/nobackup/vision/anju/behave/behave-dataset$ behave_demo.py -s /s/red/a/nobackup/vision/anju/datasets/behave/sequences/Date04_Sub05_boxlong -v /s/red/a/nobackup/vision/anju/behave/behave-dataset/visualize_path -vc
./behave_demo.py: line 5: $'\na simple demo script to show how to load different data given a sequence path\nAuthor: Xianghui\nCite: BEHAVE: Dataset and Method for Tracking Human Object Interaction\n': command not found
^C./behave_demo.py: line 7: syntax error near unexpected token `os.getcwd'
./behave_demo.py: line 7: `sys.path.append(os.getcwd())'
I am using a very simple camera matrix
[[fx, 0, cx],
[0, fy, cy],
[0, 0, 1]]
Should any other distortion parameters be incorporated, such as tangential and radial distortion?
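Whether that is necessary depends on whether the released images are already undistorted, which only the maintainers can confirm. For reference, the standard OpenCV (Brown-Conrady) radial/tangential model applied on top of the pinhole matrix above looks like this; the coefficient names follow OpenCV's convention:

```python
def distort(xn, yn, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply Brown-Conrady radial/tangential distortion to normalized coords.

    (xn, yn) = ((u - cx)/fx, (v - cy)/fy). With all coefficients zero this
    reduces to the plain pinhole model used in the question above.
    """
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn
    return xd, yd
```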
Hi,
I found that the following sequences have one more frame of registration results than the corresponding RGB-D data (e.g. both start at frame t0003.000, but the registration results end at frame t0050.433 while the RGB-D data ends at frame t0050.400). Should I just ignore the registration result at t0050.433, or does this need special handling?
And I wonder how the extra registration frame was fitted without corresponding RGB-D data.
Thanks a lot!