jingweij / actiongenome
A video database bridging human actions and human-object relationships
License: MIT License
Dear author,
I am reimplementing methods on this dataset. According to my statistics, there are at most 5 edges between a subject and an object when the graph is treated as directional: the dataset always contains <object, spatial relationship, person> triplets and <person, attention/contacting relationship, object> triplets. This causes a problem for evaluation. Could you explain the evaluation metrics in more detail? Thank you so much!
The original scene graph evaluation is defined over triplets. For Action Genome, should I convert each multi-label relationship into several single-relation triplets for evaluation?
Thank you so much!
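For reference, here is a minimal sketch (not the authors' official evaluation script) of one way to expand the multi-label object annotations into single-relation triplets and compute Recall@K. It follows the direction convention described above (<object, spatial relationship, person> and <person, attention/contacting relationship, object>); the names frame_objs and pred_triplets are hypothetical.

# Sketch only: expand Action Genome's multi-label object annotations into
# directed <subject, predicate, object> triplets, then compute Recall@K.
def gt_triplets(frame_objs):
    """Expand each annotated object into directed single-relation triplets."""
    triplets = set()
    for obj in frame_objs:
        if not obj.get('visible'):
            continue
        cls = obj['class']
        for rel in (obj.get('attention_relationship') or []):
            triplets.add(('person', rel, cls))
        for rel in (obj.get('contacting_relationship') or []):
            triplets.add(('person', rel, cls))
        for rel in (obj.get('spatial_relationship') or []):
            triplets.add((cls, rel, 'person'))   # spatial relations point object -> person
    return triplets

def recall_at_k(pred_triplets, frame_objs, k=20):
    """pred_triplets: list of (subject, predicate, object), sorted by score, best first."""
    gt = gt_triplets(frame_objs)
    if not gt:
        return None                              # skip frames with no visible ground truth
    hits = gt & set(pred_triplets[:k])
    return len(hits) / len(gt)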
Hi Jingwei,
Thanks for your work! I have a question about the direction of relations in the scene graph. In other scene graph datasets such as VG, relations are annotated as subject-relation-object triplets, whereas in Action Genome the relation labels are attached to objects. How did you define the direction when evaluating Recall@K?
I also find that opposite relations sometimes appear together, e.g. {'class': value, 'bbox': value, 'spatial_relationship': ['in front of', 'behind']}. I guess this means human-in front of-object and object-behind-human (or the opposite). How can we recover this information? Waiting for your reply!
Thanks a lot
Hi, could you please provide the script you used to uniformly sample 5 frames from each Charades action interval? I am trying to sample uniformly myself from the Charades action intervals, but my extracted frame indices do not match yours at all.
Thank you very much!
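Not the authors' exact script, but a hedged sketch of uniformly sampling 5 frames from a Charades action interval, assuming frames were dumped at a fixed rate fps and are 1-indexed; small rounding differences could easily explain mismatched indices.

import numpy as np

# Sketch: pick 5 roughly evenly spaced frame indices from an action interval
# given in seconds. `fps` is whatever rate the frames were extracted at.
def sample_frames(start_sec, end_sec, fps, num=5):
    start_f = int(round(start_sec * fps)) + 1    # assume 1-indexed frame files
    end_f = int(round(end_sec * fps))
    idx = np.linspace(start_f, end_f, num)       # includes both endpoints
    return [int(round(i)) for i in idx]

# e.g. sample_frames(2.5, 9.0, fps=24) -> five indices spread over the interval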
Hi, thanks for your wonderful work. I'd like to ask whether any other operation, such as resizing, was applied to the raw images to match the bbox annotations. I find the boxes shift a lot when I try to visualize the annotations.
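One guess, not a confirmed answer: the person annotations carry a 'bbox_size' field (e.g. (270, 480) in the example quoted further down this page), which looks like the (width, height) the boxes were annotated at. If your decoded frames have a different size, scaling the boxes accordingly is one thing to try.

# Hedged sketch: rescale a box from the annotated frame size to the decoded frame size.
def rescale_box(box, anno_size, frame_size):
    """box: (x1, y1, x2, y2); anno_size and frame_size: (width, height)."""
    sx = frame_size[0] / anno_size[0]
    sy = frame_size[1] / anno_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)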
Hi, I have a small doubt: for an annotated frame, how were the objects determined? Did you first extract the list of objects involved in all the actions of a video, and then have annotators label each frame to decide which objects from that list appear in it? Or did you simply use the objects occurring in the actions whose intervals contain the frame?
Hi, this is nice work!
Can you share the code of "Detecting Human-Object Relationships in Videos" ?
Thanks!
After training on AG, I found that the mAP of Faster R-CNN with ResNet-101 is quite low. Is this my own problem or the dataset's?
Hi! Thanks for your outstanding work. Is it inherently challenging to obtain high mAP on object detection with this dataset? Using Faster R-CNN I can only achieve 11-12 AP on the validation set. Thanks!
Thank you for providing the annotations!
There are quite a lot of None object annotations, e.g.
{'attention_relationship': None,
'bbox': None,
'class': 'sofa/couch',
'contacting_relationship': None,
'metadata': {'set': 'test', 'tag': 'BLLCM.mp4/sofa_couch/000394'},
'spatial_relationship': None,
'visible': False}
Are these annotations to be ignored?
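A minimal sketch of skipping such entries when loading the object annotations; the pickle file name below is how the Action Genome annotations are commonly distributed, so adjust the path and key structure to your own copy.

import pickle

# Sketch: drop objects marked visible=False (their bbox and relationships are None).
with open('annotations/object_bbox_and_relationship.pkl', 'rb') as f:
    obj_anno = pickle.load(f)                     # assumed: dict of frame tag -> list of object dicts

visible_anno = {
    frame: [o for o in objs if o.get('visible')]
    for frame, objs in obj_anno.items()
}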
Thank you for your work. When I load the annotations, I have some doubts about the 'visible' attribute. For example, the relation annotations of 50N4E.mp4/000682.png look like this:
[
{'class': 'light', 'bbox': None, 'attention_relationship': None, 'spatial_relationship': None, 'contacting_relationship': None, 'metadata': {'tag': '50N4E.mp4/light/000682', 'set': 'train'}, 'visible': False},
{'class': 'dish', 'bbox': None, 'attention_relationship': None, 'spatial_relationship': None, 'contacting_relationship': None, 'metadata': {'tag': '50N4E.mp4/dish/000682', 'set': 'train'}, 'visible': False}
]
Does this mean there are no bboxes or relations in 50N4E.mp4/000682.png? So if this frame is used for testing, can we just ignore it?
Thanks for your great work!
In the dataset, I don't know the meaning of keypoints_logits.
Hello, happy new year! I have a request: could you make a pretrained model for Action Genome available, so that I can use it to predict scene graphs from videos in my further research?
It'd be great to have the evaluation code, i.e. the Recall@K metric for the SGG tasks.
I've noticed that all labelled objects have some relationship with the person. Will objects that have no relationship with the person be labelled?
Hi @JingweiJ, thanks for the wonderful work. Do you plan to release the baseline models for the proposed tasks, i.e. (few-shot) action recognition and spatio-temporal scene graph prediction? That would greatly help researchers experiment on this dataset.
Is the bbox format for objects (x, y, w, h)? And are (x, y) center coordinates here?
e.g. annotation for an object:
{'class': 'food',
'bbox': (324.82430069930064,
193.98318348318338,
6.590909090909065,
8.636363636363626),
'attention_relationship': ['looking_at'],
'spatial_relationship': ['in_front_of'],
'contacting_relationship': ['holding'],
'metadata': {'tag': '924QD.mp4/food/000067', 'set': 'train'},
'visible': True}
while the annotation for a person is in (x1, y1, x2, y2) format:
{'bbox': array([[ 75.57577, 78.03209, 212.58168, 467.56796]], dtype=float32),
'bbox_score': array([0.95631087], dtype=float32),
'bbox_size': (270, 480),
'bbox_mode': 'xyxy',
'keypoints': array([[[168.54407 , 169.3401 , 1. ],
[173.26842 , 170.01521 , 1. ],
[ 85.193184, 96.091156, 1. ],
[180.01747 , 183.17976 , 1. ],
[194.19049 , 201.40762 , 1. ],
[168.54407 , 188.91817 , 1. ],
[183.05455 , 212.54686 , 1. ],
[ 98.016396, 198.03209 , 1. ],
[ 99.36621 , 198.36964 , 1. ],
[111.51451 , 114.65656 , 1. ],
[109.4898 , 150.43715 , 1. ],
[129.39952 , 376.5975 , 1. ],
[164.15718 , 368.83377 , 1. ],
[153.69614 , 181.82956 , 1. ],
[153.02124 , 466.38654 , 1. ],
[115.226494, 126.47091 , 1. ],
[114.889046, 126.80846 , 1. ]]], dtype=float32),
'keypoints_logits': array([[ 0.3934058 , 1.2183307 , 0.36741984, 1.7435464 , 2.248969 ,
3.1777701 , 1.09344 , 2.236632 , 3.1861217 , 2.8617258 ,
1.0008469 , 3.27955 , 3.3649373 , -1.9560733 , -2.4075575 ,
-0.4515944 , -1.1781657 ]], dtype=float32)}
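For what it's worth, a small hedged helper for putting both formats on a common footing, assuming the object boxes are (x, y, w, h); whether (x, y) is the top-left corner or the center is exactly the question above, so verify by visualizing a few frames first (the person boxes are explicitly 'xyxy').

# Sketch: convert an (x, y, w, h) box to (x1, y1, x2, y2).
def xywh_to_xyxy(box, xy_is_center=False):
    x, y, w, h = box
    if xy_is_center:
        x, y = x - w / 2.0, y - h / 2.0           # shift to top-left if (x, y) is the center
    return (x, y, x + w, y + h)

# e.g. xywh_to_xyxy((324.82, 193.98, 6.59, 8.64)) -> (324.82, 193.98, 331.41, 202.62)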
Can you please provide details regarding the few-shot experiments on Charades, or point me to where I can find them?
Please let us know if you can provide the exact split files or share how to design the experimental setup.
Thanks
Thanks again for the wonderful work. Regarding the person & object annotations, I would appreciate it if you could clarify my questions below:
In 7H7PN.mp4 (frame 7H7PN.mp4/000048.png), the upper-right command-line screenshot shows the person box annotation and the lower-right one shows the objects & relationships annotations. If the annotated person is the one on the left-hand side who is taking some things out of a bag, how can he simultaneously be sitting on the chair (from the bbox coordinates we know it is the one on the right-hand side) and sitting on the floor?
When I reproduce the results, I find that PredCls is really high even though I use random predicate scores. Is this my own problem or the dataset's?