Since the training used fixed length of the feature's temporal dimension (~100), is th

about the length of temporal dimension? about bmn-boundary-matching-network HOT 10 CLOSED

jjboy commented on June 2, 2024

about the length of temporal dimension?

from bmn-boundary-matching-network.

Comments (10)

vhvkhoa commented on June 2, 2024 1

After reading through the code for sample_mask generation, I think it is rather inefficient: we only care about 32 (selectively chosen) points per anchor, yet we have to generate a whole weighting mask over every feature along the video's temporal dimension.
I think there should be a better way to do this with some of PyTorch's selection functions, and I hope this part gets an implementation that makes it more efficient.

For now I have dropped the PEM part of the code to test on my own problem using only the start and end scores, which saves me a lot of time, memory, and effort in editing the code. According to the paper, dropping the PEM part doesn't reduce the results too much, so I hope it will work fine for me.
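To make the trade-off concrete, here is a minimal sketch of the two ways of sampling 32 points from one proposal. It is a simplification, not the repository's code: it assumes a single-channel feature, plain linear interpolation between neighbouring time steps, and hypothetical helper names (`dense_mask`, `sample_with_mask`, `sample_with_gather`). In PyTorch the gather variant would presumably rely on indexing or `torch.gather`.

```python
import math

def sample_positions(start, end, num_sample):
    """Evenly spaced (fractional) positions between start and end, inclusive."""
    step = (end - start) / (num_sample - 1)
    return [start + step * i for i in range(num_sample)]

def dense_mask(start, end, num_sample, tscale):
    """Build a tscale x num_sample weight matrix (the 'whole mask' approach)."""
    mask = [[0.0] * num_sample for _ in range(tscale)]
    for n, pos in enumerate(sample_positions(start, end, num_sample)):
        lo = int(math.floor(pos))
        frac = pos - lo
        if 0 <= lo < tscale:
            mask[lo][n] += 1.0 - frac
        if 0 <= lo + 1 < tscale:
            mask[lo + 1][n] += frac
    return mask

def sample_with_mask(feature, mask):
    """sampled[n] = sum over t of feature[t] * mask[t][n]; touches every time step."""
    num_sample = len(mask[0])
    return [sum(feature[t] * mask[t][n] for t in range(len(feature)))
            for n in range(num_sample)]

def sample_with_gather(feature, start, end, num_sample):
    """Read only the two neighbours of each sample point."""
    tscale = len(feature)
    out = []
    for pos in sample_positions(start, end, num_sample):
        lo = int(math.floor(pos))
        frac = pos - lo
        value = 0.0
        if 0 <= lo < tscale:
            value += (1.0 - frac) * feature[lo]
        if 0 <= lo + 1 < tscale:
            value += frac * feature[lo + 1]
        out.append(value)
    return out

# Toy single-channel feature with T = 100 and one proposal spanning [10, 40].
feature = [float(t * t % 7) for t in range(100)]
mask = dense_mask(10.0, 40.0, 32, len(feature))
via_mask = sample_with_mask(feature, mask)
via_gather = sample_with_gather(feature, 10.0, 40.0, 32)
max_diff = max(abs(a - b) for a, b in zip(via_mask, via_gather))
```

Both routes give the same 32 values. The mask costs T multiplications per sample point but can be precomputed once and reused, while the gather reads two values per point but has to recompute the indices.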


vhvkhoa commented on June 2, 2024

I tried to modify the code for arbitrary-length videos but failed: the code has to build the ground-truth confidence map, which has size B x T x T, and a sample mask for generating the output confidence map, which is even bigger (T x (N*T*T)), and this quickly breaks the whole training process.
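As a rough back-of-the-envelope check of why this blows up, the sketch below computes the size of a dense T x (N*T*T) mask. It assumes float32 storage and N = 32 sample points; the function names are hypothetical.

```python
def mask_bytes(tscale, num_sample=32, bytes_per_float=4):
    """Bytes for a dense sample mask of shape T x (N*T*T)."""
    return tscale * (num_sample * tscale * tscale) * bytes_per_float

def confidence_map_bytes(batch_size, tscale, bytes_per_float=4):
    """Bytes for a ground-truth confidence map of shape B x T x T."""
    return batch_size * tscale * tscale * bytes_per_float

for tscale in (100, 200, 400):
    # The mask grows with T cubed, so doubling T multiplies its size by 8.
    print(tscale, round(mask_bytes(tscale) / 2**20, 1), "MiB")
```

Under these assumptions the mask takes 128,000,000 bytes at T = 100 and eight times that at T = 200, so the mask, not the B x T x T map, dominates memory.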


semchan commented on June 2, 2024

> I tried to modify the code for arbitrary-length videos but failed: the code has to build the ground-truth confidence map, which has size B x T x T, and a sample mask for generating the output confidence map, which is even bigger (T x (N*T*T)), and this quickly breaks the whole training process.

You are right. One solution is to rescale the feature's temporal dimension to 100 for inference, but I don't think it is the best way.
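For reference, a minimal sketch of such a rescaling using plain linear interpolation along the temporal axis. This is an assumption about how the rescaling could be done, not necessarily what the repository does; `rescale_temporal` is a hypothetical helper. In PyTorch, `torch.nn.functional.interpolate` would be the natural tool.

```python
def rescale_temporal(features, target_len=100):
    """Linearly resample a list of T frames (each a list of C floats) to target_len frames."""
    src_len = len(features)
    if src_len == 0:
        raise ValueError("features must contain at least one frame")
    if src_len == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(features[0]) for _ in range(target_len)]
    scale = (src_len - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale                      # fractional index into the source
        lo = min(int(pos), src_len - 2)      # left neighbour, kept in range
        frac = pos - lo
        out.append([(1.0 - frac) * a + frac * b
                    for a, b in zip(features[lo], features[lo + 1])])
    return out

# Toy input: 250 frames with 2 channels each, rescaled to 100 frames.
features = [[float(t), 2.0 * t] for t in range(250)]
rescaled = rescale_temporal(features)
```

Predicted proposal boundaries then live on the 100-step grid and have to be mapped back to the original duration.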


JJBOY commented on June 2, 2024

It's not convenient to train with a variable-length temporal dimension because the features of all videos in one batch must have the same temporal length. You can try setting the batch size to 1.
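A batch size of 1 is the simplest way to satisfy that constraint. One possible alternative, sketched here as an assumption and not something the repository provides, is to group samples by temporal length so that every batch is homogeneous. `batches_by_length` is a hypothetical helper.

```python
from collections import defaultdict

def batches_by_length(lengths, batch_size):
    """Yield lists of sample indices whose temporal lengths are identical."""
    buckets = defaultdict(list)
    for idx, length in enumerate(lengths):
        buckets[length].append(idx)
    for length in sorted(buckets):
        indices = buckets[length]
        for i in range(0, len(indices), batch_size):
            yield indices[i:i + batch_size]

# Temporal lengths of six toy videos.
lengths = [100, 120, 100, 100, 120, 90]
batches = list(batches_by_length(lengths, batch_size=2))
# batches == [[5], [0, 2], [3], [1, 4]]
```

Each distinct length still needs its own mask and confidence-map shape, so this only helps when few distinct lengths occur.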


semchan commented on June 2, 2024

> It's not convenient to train with a variable-length temporal dimension because the features of all videos in one batch must have the same temporal length. You can try setting the batch size to 1.

I see. It is indeed an issue when training with a variable length. But can the length vary at inference? That still seems inconvenient, since the "mask" is generated in the "BMN" init; if the "mask" were generated in "def forward()", computing it would cost a lot of time.


JJBOY commented on June 2, 2024

Yes, you are right. It's possible to use a variable length, but it will be inconvenient: you need to generate a mask for every length.
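One way to limit that cost, sketched here as an assumption, is to cache the mask per temporal length so it is built once per distinct length instead of on every forward pass. The builder below is a simplified stand-in that stores only sample positions per (start, end) pair; `get_sample_positions` and `build_calls` are hypothetical names.

```python
from functools import lru_cache

build_calls = 0  # counts how often the expensive builder actually runs

@lru_cache(maxsize=4)  # keep tables for at most 4 distinct lengths in memory
def get_sample_positions(tscale, num_sample=32):
    """Build, once per tscale, the sample positions for every (start, end) pair."""
    global build_calls
    build_calls += 1
    table = {}
    for start in range(tscale):
        for end in range(start + 1, tscale):
            step = (end - start) / (num_sample - 1)
            table[(start, end)] = tuple(start + step * i for i in range(num_sample))
    return table  # shared between callers: treat as read-only

# Four requests, but only two distinct lengths, so the builder runs twice.
for length in (100, 100, 150, 100):
    positions = get_sample_positions(length)
```

The cache bounds memory through `maxsize`, at the price of rebuilding a table once its length has been evicted.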


JJBOY commented on June 2, 2024

Actually, without PEM, the recall will drop a lot. As for the mask, we indeed only need 32 points, but if we generate a whole weighting mask we can reuse it rather than generating it on the fly.


vhvkhoa commented on June 2, 2024

> Actually, without PEM, the recall will drop a lot. As for the mask, we indeed only need 32 points, but if we generate a whole weighting mask we can reuse it rather than generating it on the fly.

Sorry, is that a result you got from your own experiments? Could you tell me more about it? In Table 4 of the original paper, the result dropped by only 2% in AR@100 on the validation set.


Niclaus233 commented on June 2, 2024

Is each video sampled at a different interval?
How does the author turn videos of unequal length into equal length? Is it by "rescaled the feature length of all videos to same length 100"?


JJBOY commented on June 2, 2024

> > Actually, without PEM, the recall will drop a lot. As for the mask, we indeed only need 32 points, but if we generate a whole weighting mask we can reuse it rather than generating it on the fly.

> Sorry, is that a result you got from your own experiments? Could you tell me more about it? In Table 4 of the original paper, the result dropped by only 2% in AR@100 on the validation set.

In my experiments, using only TEM gives an AR@100 of 72.29, and using only PEM gives an AR@100 of 75.08. It seems PEM is more important than TEM.
