(60, 256, 456, 3) is the shape of the video segment: the video consists of 60 frames, each 256 pixels high and 456 pixels wide, with 3 (RGB) channels.


How to convert a stack of RGB images into correct input format? (epic-kitchens-55-action-models, closed)

epic-kitchens commented on May 15, 2024

How to convert a stack of RGB images into correct input format?


Comments (5)

willprice commented on May 15, 2024

In the README there is a code snippet that shows how to feed in data to the networks:

import torch

repo = 'epic-kitchens/action-models'  # assumed torch.hub repo string; check the project README
class_counts = (125, 352)  # (verb classes, noun classes) in EPIC-Kitchens-55
segment_count = 8  # the network is built for exactly this many segments

mtrn = torch.hub.load(repo, 'MTRN', class_counts, segment_count, 'RGB',
                      base_model='resnet50',
                      pretrained='epic-kitchens')
batch_size = 1
snippet_length = 1  # Number of frames composing the snippet, 1 for RGB, 5 for optical flow
snippet_channels = 3  # Number of channels in a frame, 3 for RGB, 2 for optical flow
height, width = 224, 224

inputs = torch.randn(
    [batch_size, segment_count, snippet_length, snippet_channels, height, width]
)
# The segment, snippet-length and channel dimensions are collapsed into the channel
# dimension
# Input shape: N x TC x H x W
inputs = inputs.reshape((batch_size, -1, height, width))
# You can get features out of the models
features = mtrn.features(inputs)
# and then classify those features
verb_logits, noun_logits = mtrn.logits(features)

# or just call the object to classify inputs in a single forward pass
verb_logits, noun_logits = mtrn(inputs)
print(verb_logits.shape, noun_logits.shape)

Note that the data is fed in N x TC x H x W format: your 180 x 224 x 224 input is actually (60 x 3) x 224 x 224. To propagate a single example through the network, you also need to introduce a batch dimension (e.g. through unsqueeze(0)).
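A minimal sketch of that conversion, assuming the frames have already been resized to 224 x 224 and are stored channel-last as (T, H, W, C), like the (60, 256, 456, 3) video in this thread. The channel axis has to be moved next to the time axis (via permute) before the two are merged; reshaping the channel-last tensor directly would mix values from different pixels into each channel. `frames_to_input` is a hypothetical helper, and any pixel normalisation the pretrained models expect is not applied here.

```python
import torch


def frames_to_input(frames: torch.Tensor) -> torch.Tensor:
    """Convert a (T, H, W, C) stack of frames into a 1 x TC x H x W batch.

    Hypothetical helper for illustration; no normalisation is applied.
    """
    t, h, w, c = frames.shape
    # (T, H, W, C) -> (T, C, H, W): put channels next to time before merging
    x = frames.permute(0, 3, 1, 2).float()
    # Collapse time and channels: (T, C, H, W) -> (TC, H, W)
    x = x.reshape(t * c, h, w)
    # Add the batch dimension: (TC, H, W) -> (1, TC, H, W)
    return x.unsqueeze(0)


# Dummy clip of 60 RGB frames, already resized to 224 x 224
frames = torch.randint(0, 256, (60, 224, 224, 3), dtype=torch.uint8)
inputs = frames_to_input(frames)
print(inputs.shape)  # torch.Size([1, 180, 224, 224])
```

Channels 3t to 3t + 2 of the result hold the R, G and B planes of frame t, which is the T-major ordering the reshape in the README snippet produces.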


arnavc1712 commented on May 15, 2024

Thank you. I did that, but I encountered the error "RuntimeError: shape '[-1, 8, 125]' is invalid for input of size 7500".
The error is raised at the following line:

if self.reshape:
    logits_verb = logits_verb.view(
        (-1, self.num_segments) + logits_verb.size()[1:]  # line 414: error raised here
    )

I suspect this is because the number of input frames is not divisible by num_segments. If so, do we have to repeat or drop input frames until the count is divisible by num_segments?
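The numbers in the error message fit that suspicion. Assuming the network scores each of the 60 frames separately, the verb logits hold 60 x 125 = 7500 values, while view(-1, 8, 125) needs a multiple of 8 x 125 = 1000; 7500 / 1000 = 7.5, so the reshape cannot succeed. A minimal reproduction with a dummy tensor standing in for the logits (its shape is inferred from the error message, not taken from the model code):

```python
import torch

num_segments = 8        # segments the network was initialised with
num_verb_classes = 125  # verb classes in EPIC-Kitchens-55
num_frames = 180 // 3   # 180 input channels / 3 RGB channels = 60 frames

# Dummy stand-in for the per-frame verb logits (assumed shape: frames x classes)
logits_verb = torch.zeros(num_frames, num_verb_classes)
print(logits_verb.numel())  # 7500

error_message = None
try:
    # The same reshape the model performs: group per-frame rows into clips
    logits_verb.view((-1, num_segments) + logits_verb.size()[1:])
except RuntimeError as err:
    error_message = str(err)

print(error_message)
print(num_frames % num_segments)  # 4 frames left over
```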


willprice commented on May 15, 2024

That's right. Depending on the model, you may not be able to feed more or fewer frames than it was trained with: TRN/MTRN have a fixed input size, and TSM works best with the same number of frames it was trained with. TSN can take a variable number of frames, but you need to initialise the network with the same number of segments you are going to feed into it.


arnavc1712 commented on May 15, 2024

This might be a stupid question, but in my case I would have to perform segment-based sampling on the (60, 256, 456, 3) video to reduce it to (8, 256, 456, 3) before passing it to the dataloader, right? The dataloader does not seem to perform this sampling.


willprice commented on May 15, 2024

Yes, that's correct.
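The segment-based sampling discussed here can be sketched as follows, assuming TSN-style test-time sampling: split the clip into num_segments equal-length segments and keep the centre frame of each (during training, TSN instead picks a random frame inside each segment). `sample_segment_indices` is a hypothetical helper, not part of the repository's dataloader, and the video is random dummy data.

```python
import torch


def sample_segment_indices(num_frames: int, num_segments: int) -> list[int]:
    """Return the centre frame index of each of num_segments equal segments.

    Hypothetical TSN-style test-time sampler, for illustration only.
    """
    segment_length = num_frames / num_segments
    return [int(segment_length * i + segment_length / 2) for i in range(num_segments)]


# Dummy 60-frame clip in (T, H, W, C) layout, as in the question
video = torch.randint(0, 256, (60, 256, 456, 3), dtype=torch.uint8)
indices = sample_segment_indices(video.shape[0], num_segments=8)
print(indices)  # [3, 11, 18, 26, 33, 41, 48, 56]

sampled = video[indices]  # keep only the selected frames
print(sampled.shape)  # torch.Size([8, 256, 456, 3])
```

The same rounding handles clips shorter than the segment count by repeating frames: sampling 4 frames into 8 segments gives indices [0, 0, 1, 1, 2, 2, 3, 3], so no separate padding step is needed.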
