shahroudy / nturgb-d
Info and sample codes for "NTU RGB+D Action Recognition Dataset"
Hi,
Thank you very much for your wonderful work.
I am curious to know what NTU stands for.
Hi Respected community of GitHub,
I am a research student working on human action recognition and need help with the following:
Hi, could you provide “FloorClipPlane” data for each camera in each setup so that we can transform body joints from camera coordinates to world coordinates? Thanks.
Hello,
Thanks for your valuable work.
I cannot send a request for access to the dataset on your website,
because it shows '500 - Internal server error'.
Could you help me get the Action Recognition Dataset (3D skeletons / body joints), which is 5.8 GB?
I am an undergraduate student in the UK and would like to use it for research.
My email address is [email protected]
Some of the joint coordinates are negative. Can you please explain how these coordinates were calculated/ normalized?
I want to use my own data with a model pre-trained on NTU RGB+D, so I need to get it into the same format.
It would be great to know how to recover the original projectable coordinates from the normalized x, y coordinates. In other words, it would be helpful if you could provide the code used to normalize the x and y coordinates, respectively.
Hello, I ran into a problem using the four orientation values (orientationW, orientationX, orientationY, orientationZ) in the skeleton data. I want to compute the Euler rotation of each joint.
Am I right that these four values form each joint's orientation quaternion?
Or are they the relative quaternion between the current joint and the previous joint in the chain?
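For what it's worth, the Kinect v2 SDK reports joint orientations as absolute quaternions in camera space rather than deltas relative to the previous joint, though it is worth sanity-checking this against the data. Assuming unit quaternions in (W, X, Y, Z) order, a minimal conversion to Euler angles could look like:

```python
import math

def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to (roll, pitch, yaw) in radians."""
    # roll: rotation about the x-axis
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # pitch: rotation about the y-axis (clamped to avoid asin domain errors)
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    # yaw: rotation about the z-axis
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw
```

Note that joints whose four orientation values are all zero (as for some leaf joints in the skeleton files) carry no orientation and should be skipped.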
In the referenced paper, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", a baseline model called P-LSTM is proposed. How does the proposed model handle the data when two persons are present in a single frame?
I couldn't find this information in the paper.
Thanks in advance.
I would like the following intrinsic parameters for each camera used.
If they are not available, how can I get the default intrinsic parameters?
{
'id': '',
'center': [],
'focal_length': [],
'radial_distortion': [],
'tangential_distortion': [],
'res_w': 0,
'res_h': 0,
'azimuth': 0, # Only used for visualization
}
When I applied to acquire the dataset through the website, I got the following error. Does anyone know how to solve it?
Microsoft OLE DB Provider for ODBC Drivers error '80040e14'
[Microsoft][ODBC Microsoft Access Driver] Syntax error in INSERT INTO statement.
/datasets/requesterAddProc.asp, line 134
Hello, I'm a graduate student collecting a dataset for skeleton and pose estimation. I want to label the true 3D joint positions in world coordinates, and I would like to know what method your team used to label the 3D joint positions. Did your team use the Kinect SDK? Thanks.
Hi, thanks for your great work. I want to visualize the NTU 3D joints, but I don't know the format of the dataset's 3D joints. Are they in world coordinates or camera coordinates?
I sincerely hope you can reply. Thank you in advance.
I want to know where I can download the code for the APSR framework mentioned in the NTU RGB+D 120 paper. Thank you very much.
Hi shahroudy, I want to know how to set the average RGB value when preprocessing the training data before feeding it to a network, and how to split the training and testing data. Thank you; I look forward to your reply.
Dear Amir,
Great work on the NTU RGB+D dataset! Your papers do not mention how the skeletons were extracted from the RGB-D data. Could you please shed some light on that?
Thanks and best regards,
Rahul
Hi,
Great dataset! It helped me a lot.
My current research direction is keypoint detection from IR images and depth maps. Could you provide the original 16-bit IR maps?
Sample: "S001C001P001R001A027"
Action: Jump Up
In the RGB video, a single person performs the action, but the skeleton file (S001C001P001R001A027.skeleton) contains joint values for two bodies. This is also the case for a few other samples, such as S001C001P001R001A028, S001C001P001R001A024, S001C001P001R001A026, and S001C001P001R001A029.
P.S. These are the cases I have found so far while examining the data.
Is this an error? If yes, is there any way to eliminate it?
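A follow-up note: the Kinect tracker is known to emit phantom skeletons (from furniture, reflections, and so on), which is most likely what these extra bodies are. A heuristic many preprocessing pipelines use, though it is not an official fix, is to keep only the body whose joints move the most across the clip. A sketch, assuming the skeletons have been parsed into per-body (T, 25, 3) arrays:

```python
import numpy as np

def select_main_body(bodies):
    """Given a dict {tracking_id: (T, 25, 3) joint array}, return the
    tracking ID of the body whose joints move the most, a common
    heuristic for dropping phantom skeletons in single-person actions."""
    def motion(joints):
        # total variance of every joint coordinate over time
        return joints.var(axis=0).sum()
    return max(bodies, key=lambda tid: motion(bodies[tid]))
```

For two-person actions (A050 onwards) you would keep the two bodies with the highest motion instead.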
Dear all,
a sample *.skeleton file contains:
52 1 72057594037932358 0 0 0 0 0 0 0.33916 -0.5899565 2 25 0.2410551 -0.171228 4.294691 283.0309 224.2974 1027.245 581.2726 -0.2574076 0.123495 0.9539005 -0.09254229 2 0.2948349 0.1012649 4.225985 288.0122 200.99 1041.686 513.5756 -0.2665294 0.1243754 0.9513921 -0.09135558 2 0.3484591 0.3697312 4.144117 293.2654 177.1354 1056.9 444.3341 -0.2775609 0.1362276 0.9421903 -0.1291483 2 0.3180007 0.4567953 4.108379 290.831 169.0964 1049.855 421.0554 0 0 0 0 2 0.1822655 0.321388 4.28747 278.0624 182.3637 1012.409 459.6171 -0.281578 -0.6474488 0.683909 0.1838272 2 0.07589981 0.1205466 4.396867 268.8376 199.7327 985.4812 510.1192 0.06034974 -0.5202456 0.3134047 0.7921364 2 -0.05407938 0.00915651 4.244675 257.8868 208.9526 954.2135 537.0081 -0.2676778 0.8216244 -0.430342 0.2609363 1 -0.07004753 -0.036596 4.183693 256.4259 212.9325 950.2018 548.5809 -0.34924 0.8069586 -0.2904053 0.3775105 1 0.466928 0.2580718 4.100758 304.1543 186.7378 1088.714 472.0902 -0.09214124 0.7651293 0.5880391 -0.2455546 2 0.5106769 0.02274908 4.083334 308.2432 207.7038 1100.851 532.8464 -0.06122642 0.9532829 0.07727568 0.2855372 2 0.431269 -0.1834366 3.992133 302.0117 226.5302 1083.29 587.5076 0.1165461 0.2440035 -0.23478 0.9336796 2 0.3857652 -0.2527936 4.01325 297.6592 232.7558 1070.653 605.611 0.3048452 0.2731479 0.0439797 0.9113317 2 0.1892584 -0.1588427 4.28756 278.6519 223.2652 1014.547 578.3298 -0.05109064 -0.6076694 0.7827132 -0.1244497 2 -0.08986153 -0.09259083 4.157754 254.648 217.8683 945.1703 562.9314 0.7761323 -0.2254343 -0.05312065 0.5864948 2 -0.02291749 -0.4356488 4.294297 260.5873 246.7992 962.2934 646.7827 -0.1665024 -0.450032 0.1231817 0.8686624 2 -0.08422279 -0.5146305 4.21392 255.2325 254.3715 947.0784 668.7665 0 0 0 0 2 0.287944 -0.1807353 4.230647 287.3916 225.3406 1040.106 584.2456 -0.3058956 0.6702745 0.5890362 -0.3319583 2 0.1274008 -0.2041289 3.941398 274.338 228.6485 1003.222 594.0005 0.3023287 -0.6019208 0.6110057 -0.4158854 2 0.1858304 -0.4719123 4.131247 278.9752 
251.4849 1016.218 660.1216 0.2207259 0.8369746 0.222939 0.4483881 2 0.1228096 -0.5504664 4.050951 273.6197 259.4163 1001.016 683.1406 0 0 0 0 2 0.3351774 0.303002 4.167017 291.9237 183.174 1053.013 461.8513 -0.2765343 0.1310133 0.9459615 -0.1073365 2 -0.1119047 -0.07854194 4.138204 252.6663 216.6677 939.4634 559.4663 0 0 0 0 2 -0.04821506 -0.02453206 4.158201 258.3048 211.8932 955.7338 545.5428 0 0 0 0 2 0.336823 -0.3180895 4.01796 293.1675 238.667 1057.662 622.8026 0 0 0 0 2 0.4231775 -0.249493 4.055667 300.6671 232.2204 1079.216 604.0177 0 0 0 0 2 1
I would appreciate it if you could add a description of each field.
I do not know whether the data represents joint position coordinates or orientations (or both).
Best Regards
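For reference, the numbers above follow the layout decoded by the repository's read_skeleton_file.m: the first value is the frame count; each frame starts with a body count; each body starts with a header (tracking ID, clipedEdges, hand confidences/states, isRestricted, leanX/leanY, trackingState), followed by a joint count (normally 25) and one 12-value record per joint: 3D camera coordinates (x, y, z), depth-map pixel coordinates (depthX, depthY), RGB-frame pixel coordinates (colorX, colorY), an orientation quaternion (orientationW/X/Y/Z), and a joint tracking state. So the data contains both positions and orientations. A minimal Python reader along these lines (field names taken from the Matlab reader):

```python
def parse_skeleton(path):
    """Parse an NTU RGB+D *.skeleton file into a list of frames,
    each frame being a list of body dicts. Field order follows the
    repository's read_skeleton_file.m."""
    with open(path) as f:
        tokens = iter(f.read().split())
    nxt = lambda: next(tokens)
    frames = []
    for _ in range(int(nxt())):                    # frame count
        bodies = []
        for _ in range(int(nxt())):                # body count in this frame
            body = {
                'bodyID': nxt(),                   # keep as string (huge int)
                'clipedEdges': int(nxt()),
                'handLeftConfidence': int(nxt()),
                'handLeftState': int(nxt()),
                'handRightConfidence': int(nxt()),
                'handRightState': int(nxt()),
                'isRestricted': int(nxt()),
                'leanX': float(nxt()),
                'leanY': float(nxt()),
                'trackingState': int(nxt()),
                'joints': [],
            }
            for _ in range(int(nxt())):            # joint count, normally 25
                vals = [float(nxt()) for _ in range(11)] + [int(nxt())]
                keys = ('x', 'y', 'z',             # 3D camera coordinates
                        'depthX', 'depthY',        # pixel position in depth map
                        'colorX', 'colorY',        # pixel position in RGB frame
                        'orientationW', 'orientationX',
                        'orientationY', 'orientationZ',
                        'trackingState')
                body['joints'].append(dict(zip(keys, vals)))
            bodies.append(body)
        frames.append(bodies)
    return frames
```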
Is there any information at all on when the action begins and ends? Or has anyone come up with a heuristic to determine which part of the clip contains the action?
I ask because, from what I have observed, in most clips the action does not begin immediately, but rather about 1/3 to 1/2 of the way into the clip (these numbers come from about 10 clips, so I cannot say anything about their significance). Counting the idle frames towards the action might add noise to the data.
Does the dataset provide any information on when the action happens, or do I need to resort to heuristics along the lines of what I mentioned above?
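To my knowledge the dataset does not ship frame-level timing, so a heuristic is indeed the usual route. One simple sketch along the lines described above trims leading and trailing low-motion frames by thresholding the mean per-frame joint displacement (the threshold here is a made-up starting value to tune, not anything official):

```python
import numpy as np

def trim_idle_frames(joints, threshold=0.002):
    """joints: (T, 25, 3) skeleton sequence. Returns (start, end) frame
    indices bounding the frames whose inter-frame joint motion exceeds
    the threshold; a simple energy heuristic, not ground-truth timing."""
    # mean Euclidean displacement of the joints between consecutive frames
    motion = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean(axis=1)
    active = np.where(motion > threshold)[0]
    if active.size == 0:
        return 0, len(joints)          # nothing above threshold: keep all
    # +1 to convert diff index to frame index, +1 to make end exclusive
    return int(active[0]), int(active[-1]) + 2
```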
I have seen similar queries but will try to be more specific.
I want to align the RGB and depth images for the purposes of RGB-D training.
At first look, the depth image seems to correspond to a rectangular region of the RGB image that is consistent across an entire setup, i.e. the RGB image can simply be cropped (and resized) to match the region for which depth is shared.
Can you provide the parameters of such a crop?
If the RGB-to-depth correspondence is known, the dataset becomes very useful for testing depth fusion, single-view reconstruction, tracking, etc.
I understand that sharing full-frame depth maps is not memory efficient, but the authors could instead crop the RGB images to align with the depth maps and share an even smaller dataset.
R
Hi, thank you for providing this awesome dataset!
I want to re-project the depth image to the 3D point cloud.
How do I get the information of intrinsic parameters of each Kinect?
Thanks in advance for considering!
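Since no official per-camera calibration was released with the dataset, a common workaround is to use factory-default Kinect v2 depth intrinsics, e.g. the values shipped with libfreenect2 (fx = fy ≈ 365.456, cx ≈ 254.878, cy ≈ 205.395 for the 512×424 depth camera). Each physical device deviates slightly from these, so treat the result as approximate. A pinhole back-projection sketch:

```python
import numpy as np

# Approximate Kinect v2 depth-camera intrinsics (libfreenect2 defaults).
# These are a stand-in, NOT calibration values released with the dataset.
FX, FY, CX, CY = 365.456, 365.456, 254.878, 205.395

def depth_to_pointcloud(depth_mm):
    """Back-project a (424, 512) depth map in millimetres to an (N, 3)
    point cloud in metres using the pinhole camera model."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm / 1000.0
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels
```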
I only care about a subset of this dataset (e.g. falling down). The full dataset is too big and most of it is not useful to me, so to save disk space and download time, is there any way to download only a subset of the dataset?
Dear friend, I can't get the dataset from the given website because I can't submit the information; I get a '500 - Internal server error'. Could you give me a login ID and password to download the dataset? My email is [email protected]. I can send you my information to prove that I will use the dataset for academic research. Thank you very much!
Dear friend, I have a question about preprocessing the RGB data, which has size (1920, 1080, 3). I want to get a (224, 224, 3) input, but for frames containing two people I don't know how to handle the crop.
If you have used the RGB data directly, how did you preprocess it?
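The authors' exact RGB preprocessing is not documented here, but one common recipe (not an official one) is to crop a square region around the union of all bodies' joint pixel coordinates (colorX/colorY from the skeleton files) so that both performers stay in frame, then resize:

```python
import numpy as np

def crop_around_bodies(frame, color_xy, margin=50):
    """frame: (H, W, 3) RGB image; color_xy: (N, 2) colorX/colorY pixel
    coordinates of all joints of all bodies in this frame. Returns a
    square crop around the joints (covering both people when two are
    present); a common recipe, not the authors' official preprocessing."""
    h, w = frame.shape[:2]
    x0, y0 = np.maximum(color_xy.min(axis=0) - margin, 0).astype(int)
    x1, y1 = np.minimum(color_xy.max(axis=0) + margin, [w, h]).astype(int)
    # make the crop square so the later resize does not distort the bodies
    side = max(x1 - x0, y1 - y0)
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    x0 = max(0, min(cx - side // 2, w - side))
    y0 = max(0, min(cy - side // 2, h - side))
    return frame[y0:y0 + side, x0:x0 + side]
```

The returned square crop can then be resized to (224, 224) with e.g. cv2.resize or PIL before being fed to the network.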
(2) I want to align the RGB and depth frames. Was any camera calibration data recorded?
Unfortunately, no camera calibration info was recorded. However, one applicable solution is to use the skeletal data. For each video sample, the skeletal data includes a large number of body joints and their precise locations in both the RGB and depth frames, so each sample gives you a large number of mappings. Keep in mind that the cameras were fixed during each setup (Sxxx in the file names means the sample is from setup xxx). So for each camera at each setup you have a huge number of correspondences between the RGB and depth cameras (and also between the three sensors!). Finding a transformation between the cameras is then as easy as solving a linear system with a lot of known points!
Is the transformation a linear transformation?
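To the linearity question: the true depth-to-color mapping also depends on scene depth, so it is not exactly linear in pixel coordinates alone, but since subjects stay within a limited depth range, a least-squares affine fit over the paired (depthX, depthY) / (colorX, colorY) joint coordinates is a workable first approximation of the linear-system idea above. A sketch:

```python
import numpy as np

def fit_affine(depth_xy, color_xy):
    """Least-squares 2D affine map (colorX, colorY) ~ A @ (depthX, depthY) + b,
    fitted from the paired joint pixel coordinates in the skeleton files.
    The exact mapping also depends on depth, so this is an approximation."""
    X = np.hstack([depth_xy, np.ones((len(depth_xy), 1))])   # homogeneous coords
    params, *_ = np.linalg.lstsq(X, color_xy, rcond=None)
    A, b = params[:2].T, params[2]
    return A, b

def apply_affine(A, b, depth_xy):
    """Map (N, 2) depth pixel coordinates into the color frame."""
    return depth_xy @ A.T + b
```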
Hi Shahroudy
Could you please provide me with the Action-Part Semantic Relevance-aware (APSR) framework code for experimentation?
Thanks
Rama
Hi, thanks for the converter to the .npy extension. It looks like the logical check on line 123 should be checking whether each + '.npy' is in alread_exist_dict, rather than each + '.skeleton.npy'. For me it was always seeing (e.g.) 'S001C001P001R001A003.skeleton.skeleton.npy' and was always overwriting the numpy files.
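For readers hitting the same issue, the gist of the fix is to strip the '.skeleton' extension before building or checking the .npy name. A hypothetical helper illustrating the idea (not the converter's actual code):

```python
import os

def npy_name(skeleton_filename):
    """Build the .npy output name from a .skeleton file name, dropping the
    '.skeleton' extension first so the result is 'Sxxx...Axxx.npy' rather
    than 'Sxxx...Axxx.skeleton.npy' (illustrative helper, not the
    converter's real code)."""
    base, _ = os.path.splitext(skeleton_filename)   # drop '.skeleton'
    return base + '.npy'
```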
I want to use data from only 5 classes, but I don't know how to do it.
For example, I need the data for classes 0, 1, 2, 3, and 4. How could I implement that? Thanks.
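Assuming your class indices 0-4 map to action labels A001-A005 (the Axxx field in each file name is 1-indexed), you can select the subset simply by parsing the file names, e.g.:

```python
import re

def action_class(filename):
    """Extract the zero-based action label from an NTU sample name,
    e.g. 'S001C001P001R001A005.skeleton' -> 4 (Axxx is 1-indexed)."""
    return int(re.search(r'A(\d{3})', filename).group(1)) - 1

def filter_classes(filenames, wanted=frozenset({0, 1, 2, 3, 4})):
    """Keep only the samples whose action class is in `wanted`."""
    return [f for f in filenames if action_class(f) in wanted]
```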
I appreciate your work very much. There are some questions that puzzle me.
In the raw data and in read_skeleton_file.m, there are some parameters whose meaning I can't really understand, as follows:
body.clipedEdges
body.handLeftConfidence
body.handLeftState
body.handRightConfidence
body.handRightState
body.isResticted
body.leanX
body.leanY
joint.orientationW
joint.orientationX
joint.orientationY
joint.orientationZ
What does each parameter mean? And how can we obtain these parameters from the sensor?
Looking forward to your reply!
Hello, the design of your dataset is very clever.
I appreciate your work very much; I would like to ask two questions.
Looking forward to your answer. Thank you.
Hi, I'm interested in human action recognition using skeleton data.
When I look into the data I see some misleading joint positions due to the camera's side view.
As in the image below, the right hand is occluded, but the Kinect estimates a woman drinking water with two hands.
So I'm curious whether it is okay to feed this kind of misleading joint data into an LSTM model.
Hi, I really appreciate your work, but I would like to know: is there any way I can download the skeleton data only?
Hi, is there any code to register the RGB and depth images? They have different aspect ratios. Is there any code to extract the common area of both images?
Thanks for your amazing work.
I am now working on NTU RGB+D, but I don't know its baseline.
As far as I know, the baseline is about 85.5% (CVPR 2018), but I am not quite sure.
Does anyone know?
Hi, thank you for providing this awesome dataset!
I want to re-project the 3D skeletons to 2D.
How do I get the information of intrinsic parameters of each Kinect?
Thanks in advance for considering!
Hello, I have a confusion about the tracking ID of the skeleton.
For a subject in a video, is the tracking ID of that subject's skeleton constant for the entire video?
E.g., in the file S010C003P025R001A060.skeleton, there is initially one subject whose tracking ID is "72057594037930227". When I jump to line 1000, I see skeleton coordinates for two subjects, whose tracking IDs are "72057594037930243" and "72057594037930244".
What does this mean?
Are there three subjects in the video, or was tracking stopped in between and restarted, giving the subjects new tracking IDs?
If the latter is the case, which new ID corresponds to the subject who was initially in the video?
Please help me out with this, as it is difficult to choose which subject should be considered for the action if I can't match each set of coordinates to its subject.
Dear friend, I submitted a request for the NTU RGB+D dataset online on February 1st, 2019, but I have not received any email as of today (February 7th, 2019). My requester ID is A1941. Please let me know if I did not pass the verification; maybe the information I filled in was not complete enough. I want to reapply online, but
I can't resubmit the information due to a '500 - Internal server error'. Could you give me a login ID and password to download the dataset? My email is [email protected].
I can send you my information to prove that I will use the dataset for academic research. Thank you very much!
Thank you for your great work on these two datasets. I have a question: how do you ensure the accuracy of the 25 key points? Does the performer need to wear markers while performing the action? If possible, could you please share some details of capturing the dataset, or a simple walkthrough of how to use Kinect v2 to capture data the same way as NTU RGB+D?
Hi,
After downloading the new extended dataset, I found that the subjects from the previous 60 classes also performed the new 60 action classes, but the new subjects did not perform the previous 60 classes. So when the dataset is split (e.g., for cross-setup evaluation), is there any gap when making predictions for the new subjects?
Hi,
Is it possible to add new data (a new activity) to the NTU dataset? If yes, is there any sample code to generate new skeleton data from videos?
How can I get the camera extrinsic parameters for each subject, in a format similar to the following?
As you can see, there is an entry for each performer, covering the three cameras:
{
'P001': [
{
'orientation': [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088],
'translation': [1841.1070556640625, 4955.28466796875, 1563.4454345703125],
},
{
'orientation': [0.6157187819480896, -0.764836311340332, -0.14833825826644897, 0.11794740706682205],
'translation': [1761.278564453125, -5078.0068359375, 1606.2650146484375],
},
{
'orientation': [0.14651472866535187, -0.14647851884365082, 0.7653023600578308, -0.6094175577163696],
'translation': [-1846.7777099609375, 5215.04638671875, 1491.972412109375],
},
],
'P002': [
{
'orientation': [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088],
'translation': [1841.1070556640625, 4955.28466796875, 1563.4454345703125],
},
{
'orientation': [0.6157187819480896, -0.764836311340332, -0.14833825826644897, 0.11794740706682205],
'translation': [1761.278564453125, -5078.0068359375, 1606.2650146484375],
},
{
'orientation': [0.14651472866535187, -0.14647851884365082, 0.7653023600578308, -0.6094175577163696],
'translation': [-1846.7777099609375, 5215.04638671875, 1491.972412109375],
},
],
...
}
Hey Amir, are there any plans to upload the code used to get the baseline or the state-of-the-art results from either "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis" or "Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition"?
Hello, I have a question about the relation between the 3D joint data and color_x, color_y. I want to use the inference results of a simple baseline model, which only outputs color_x and color_y, to replace the ground-truth NTU 3D joints. However, I don't know how to transform color_x, color_y into joint.x, joint.y.