actiongenome's People

Contributors

jingweij

actiongenome's Issues

How to evaluate Recall@K with multi-label relationships in SGG?

Dear author,
I am reimplementing a model on this dataset. In my statistics there are at most 5 edges between a subject and an object, when I treat the graph as directed and note that the dataset always contains <object, spatial relationship, person> pairs and <person, attention/contacting relationship, object> pairs. This is causing me a problem during evaluation. Could you explain the evaluation metrics in more detail?
In the original scene graph evaluation, single-predicate triplets are evaluated. For Action Genome, should I convert each multi-label relationship into several single-predicate triplets for evaluation?
Thank you so much!
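
For reference, a minimal sketch of what such a conversion and a frame-level Recall@K could look like, assuming the direction convention described in this question (spatial relations as <object, relation, person>, attention/contacting as <person, relation, object>); this is not an official evaluation script.

# Hypothetical sketch: expand the per-object multi-label annotations into
# single-predicate triplets, then compute a frame-level Recall@K.
# The direction convention follows the question above and is an assumption.

def expand_triplets(frame_objects):
    gt = set()
    for obj in frame_objects:
        if not obj.get('visible', True) or obj.get('bbox') is None:
            continue
        for rel in (obj.get('attention_relationship') or []):
            gt.add(('person', rel, obj['class']))
        for rel in (obj.get('contacting_relationship') or []):
            gt.add(('person', rel, obj['class']))
        for rel in (obj.get('spatial_relationship') or []):
            gt.add((obj['class'], rel, 'person'))
    return gt

def recall_at_k(scored_predictions, gt_triplets, k=20):
    # scored_predictions: list of ((subject, relation, object), score)
    topk = {t for t, _ in sorted(scored_predictions, key=lambda p: -p[1])[:k]}
    return len(gt_triplets & topk) / max(len(gt_triplets), 1)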

Question about the directional information

Hi Jingwei,

thanks for your work! I have a question about direction in the scene graph. In other scene graph datasets such as VG, relations are annotated as subject-relation-object triplets. In Action Genome the relation labels are attached to objects, so how did you define the direction when evaluating R@X?
I also find opposite relations appearing at the same time, e.g. {'class': value, 'bbox': value, 'spatial_relationship': ['in front of', 'behind']}. I guess this means human-in front of-object and object-behind-human (or the opposite). How can we recover this information? Looking forward to your reply!

Thanks a lot
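
For reference, a small sketch of the reading guessed above, in which each spatial label attached to an object is read as a person → object edge and a hypothetical inverse table gives the object → person edge. Both the direction convention and the inverse table are assumptions, not something stated by the dataset.

# Hypothetical sketch of the guessed reading above. Both the direction
# convention and the inverse table are assumptions, not dataset facts.
SPATIAL_INVERSE = {'in front of': 'behind', 'behind': 'in front of',
                   'above': 'beneath', 'beneath': 'above'}

def spatial_edges(obj):
    # Yield directed (subject, relation, object) edges for one annotation.
    edges = []
    for rel in (obj.get('spatial_relationship') or []):
        edges.append(('person', rel, obj['class']))
        if rel in SPATIAL_INVERSE:
            edges.append((obj['class'], SPATIAL_INVERSE[rel], 'person'))
    return edges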

Frame Sampling

Hi, could you please provide the script used to uniformly sample 5 frames from a Charades action interval? I am trying to sample uniformly myself, but my extracted frame indices do not match yours.

Thank you very much!
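
For reference, one plausible uniform-sampling scheme is sketched below; the convention actually used for the released annotations (endpoint handling, rounding, the fps used to convert seconds to frame indices) may well differ.

# Hypothetical sketch: uniformly sample 5 frames from an action interval.
# The exact convention used for the released annotations may differ.
import numpy as np

def sample_frames(start_sec, end_sec, fps, num=5):
    start_f = int(round(start_sec * fps))
    end_f = int(round(end_sec * fps))
    return np.linspace(start_f, end_f, num=num).round().astype(int).tolist()

# e.g. sample_frames(11.9, 21.2, fps=24) -> [286, 342, 398, 453, 509]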

Question about bbox annotation

Hi, thanks for your wonderful work. Is any operation such as resizing applied to the raw images to match the bbox annotations? I find the boxes shift a lot when I try to visualize the annotations.
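
For reference, one common cause of such shifts is drawing boxes on frames extracted at a different resolution than the one the boxes refer to. The person annotations carry a 'bbox_size' field, e.g. (270, 480); the sketch below assumes that field is the (width, height) the boxes were produced at and that it also applies to the object boxes, both of which are assumptions.

# Hypothetical sketch: rescale annotated boxes to the resolution of the
# frame being drawn on. Assumes 'bbox_size' is the (width, height) the
# boxes refer to and applies to object boxes too -- both are assumptions.

def rescale_box(box, annotated_size, image_size):
    # Works for both (x, y, w, h) and (x1, y1, x2, y2) boxes.
    sx = image_size[0] / annotated_size[0]
    sy = image_size[1] / annotated_size[1]
    return (box[0] * sx, box[1] * sy, box[2] * sx, box[3] * sy)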

Question about the dataset

Hi, I have a small question: for an annotated frame, how were the objects determined? Did you first extract the list of objects involved in all the actions of a video, and then have annotators label each frame to decide which objects from that list appear in it? Or did you simply use the objects occurring in the actions whose intervals contain the frame?

faster rcnn on ActionGenome

After training on Action Genome, I found that the mAP of Faster R-CNN with ResNet-101 is quite low. Is this my own problem or a property of the dataset?

About object detection in this dataset

Hi! Thanks for your outstanding work. Is it particularly challenging to obtain high mAP on object detection with this dataset? Faster R-CNN only achieves 11-12 AP on the validation set. Thanks!

Annotation with None

Thank you for providing the annotations!
There are quite a lot of object annotations whose fields are None, e.g.:

{'attention_relationship': None,
 'bbox': None,
 'class': 'sofa/couch',
 'contacting_relationship': None,
 'metadata': {'set': 'test', 'tag': 'BLLCM.mp4/sofa_couch/000394'},
 'spatial_relationship': None,
 'visible': False}

Are these annotations to be ignored?
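
For reference, a minimal filtering sketch that skips such entries. Whether they can safely be ignored is exactly what this question asks, so treat that as an assumption.

# Hypothetical sketch: keep only object annotations that have a box.
# Whether invisible/None entries should be ignored is an assumption here.
def visible_objects(frame_objects):
    return [o for o in frame_objects
            if o.get('visible', True) and o.get('bbox') is not None]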

Question about the 'visible' in annotations

Thank you for your work. When loading the annotations, I have some doubts about the 'visible' attribute. For example, the relation annotations of 50N4E.mp4/000682.png look like this:

[
{'class': 'light', 'bbox': None, 'attention_relationship': None, 'spatial_relationship': None, 'contacting_relationship': None, 'metadata': {'tag': '50N4E.mp4/light/000682', 'set': 'train'}, 'visible': False}, 
{'class': 'dish', 'bbox': None, 'attention_relationship': None, 'spatial_relationship': None, 'contacting_relationship': None, 'metadata': {'tag': '50N4E.mp4/dish/000682', 'set': 'train'}, 'visible': False}
]

Does this mean there are no bboxes or relations in 50N4E.mp4/000682.png? If this frame is part of the test set, can we simply ignore it?

Please make a pretrained model available

Hello, happy new year! Could you please make a pretrained model for Action Genome available, so that I can use it to predict scene graphs from videos in my further research?

Releasing baseline models

Hi @JingweiJ, thanks for the wonderful work. Do you plan to release the baseline models for the proposed tasks, i.e. (few-shot) action recognition and spatio-temporal scene graph prediction? That would greatly help researchers experiment on this dataset.

bbox format for persons and objects

Is the bbox format for objects (x, y, w, h)? And is (x, y) the center coordinate here?

e.g. annotation for an object:

{'class': 'food',
  'bbox': (324.82430069930064,
   193.98318348318338,
   6.590909090909065,
   8.636363636363626),
  'attention_relationship': ['looking_at'],
  'spatial_relationship': ['in_front_of'],
  'contacting_relationship': ['holding'],
  'metadata': {'tag': '924QD.mp4/food/000067', 'set': 'train'},
  'visible': True}

while the annotation for a person is in (x1, y1, x2, y2):

{'bbox': array([[ 75.57577,  78.03209, 212.58168, 467.56796]], dtype=float32),
 'bbox_score': array([0.95631087], dtype=float32),
 'bbox_size': (270, 480),
 'bbox_mode': 'xyxy',
 'keypoints': array([[[168.54407 , 169.3401  ,   1.      ],
         [173.26842 , 170.01521 ,   1.      ],
         [ 85.193184,  96.091156,   1.      ],
         [180.01747 , 183.17976 ,   1.      ],
         [194.19049 , 201.40762 ,   1.      ],
         [168.54407 , 188.91817 ,   1.      ],
         [183.05455 , 212.54686 ,   1.      ],
         [ 98.016396, 198.03209 ,   1.      ],
         [ 99.36621 , 198.36964 ,   1.      ],
         [111.51451 , 114.65656 ,   1.      ],
         [109.4898  , 150.43715 ,   1.      ],
         [129.39952 , 376.5975  ,   1.      ],
         [164.15718 , 368.83377 ,   1.      ],
         [153.69614 , 181.82956 ,   1.      ],
         [153.02124 , 466.38654 ,   1.      ],
         [115.226494, 126.47091 ,   1.      ],
         [114.889046, 126.80846 ,   1.      ]]], dtype=float32),
 'keypoints_logits': array([[ 0.3934058 ,  1.2183307 ,  0.36741984,  1.7435464 ,  2.248969  ,
          3.1777701 ,  1.09344   ,  2.236632  ,  3.1861217 ,  2.8617258 ,
          1.0008469 ,  3.27955   ,  3.3649373 , -1.9560733 , -2.4075575 ,
         -0.4515944 , -1.1781657 ]], dtype=float32)}
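
For reference, a small conversion sketch that brings both formats to (x1, y1, x2, y2). It assumes the object box is (x, y, w, h) with (x, y) at the top-left corner; whether (x, y) is the top-left or the center is exactly what this question asks, so treat that choice as an assumption.

# Hypothetical sketch: convert object boxes (assumed to be x, y, w, h with
# a top-left origin -- that origin is an assumption) and person boxes
# (declared 'xyxy') into a common (x1, y1, x2, y2) format.
import numpy as np

def object_box_xyxy(obj):
    x, y, w, h = obj['bbox']
    return (x, y, x + w, y + h)

def person_box_xyxy(person):
    assert person['bbox_mode'] == 'xyxy'
    return np.asarray(person['bbox'], dtype=np.float32)[0]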

Question about Charades Fewshot Split

Could you please provide, or point me to, details regarding the few-shot experiments on Charades?

  1. How is the 137/20 action split determined?
  2. How do we sample the k=[1,5,10] instances?

Could you provide the exact split files, or describe how the experimental setup was designed? (One guess at the sampling step in question 2 is sketched below.)

Thanks
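
For reference, a hypothetical sketch of the sampling step in question 2, with a fixed seed so the drawn instances are reproducible; the actual split and sampling protocol used for the paper are not documented here.

# Hypothetical sketch: draw k training instances per novel action class.
# The real few-shot split/sampling used for the paper may differ.
import random

def sample_k_shot(instances_by_class, k, seed=0):
    # instances_by_class: dict mapping action class -> list of clip ids.
    rng = random.Random(seed)
    return {cls: rng.sample(clips, k) if len(clips) >= k else list(clips)
            for cls, clips in instances_by_class.items()}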

Question about the annotations

Thanks again for the wonderful work. Regarding the person & object annotations, I would appreciate it if you could clarify the questions below:

  • Is only a single person (the actor who is supposed to perform the action(s)) annotated in each clip, even if multiple people appear?
  • Is only a single object of each object class annotated, even if there are multiple instances of that class?
  • Are there any problems with the annotation? For example, the image below is the first annotated frame from 7H7PN.mp4 (7H7PN.mp4/000048.png); the upper-right command-line screenshot is the person box annotation and the lower-right one is the object & relationship annotations. If the annotated person is the one on the left who is taking some items out of a bag, how can he simultaneously be sitting on the chair (from the bbox coordinates we know that is the one on the right) and sitting on the floor?

[image: annotated frame 7H7PN.mp4/000048.png with the person box and object & relationship annotation screenshots]

Question about reproduction

When reproducing the results, I find that PredCls is really high even when I use random predicate scores. Is this my own problem or a property of the dataset?
