
Comments (10)

vhvkhoa commented on June 2, 2024

After reading through the code for sample_mask generation, I think it is rather inefficient: we only care about 32 (selectively chosen) points in each anchor, yet we have to generate a whole weighting mask over every feature along the temporal dimension of the video.
I think there should be a better way to do this with PyTorch's selection functions, and I hope this part gets a more efficient implementation.
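
To make the idea concrete, here is a minimal sketch of sampling a proposal's points directly instead of through a dense weighting mask. It is not the repository's code: the function names, the uniform spacing of the sample points, and the clamping at the sequence edges are all assumptions made for illustration, and it is written in plain Python so it runs without dependencies. Each sample point is stored as two neighbouring indices plus one interpolation weight, so the cost per proposal is proportional to the number of points rather than to T.

```python
import math


def sparse_sample_weights(t_start, t_end, num_samples=32):
    """Return (lo_index, hi_index, hi_weight) for evenly spaced points in [t_start, t_end].

    Hypothetical helper: a sparse stand-in for one row block of a dense mask.
    """
    step = (t_end - t_start) / max(num_samples - 1, 1)
    triples = []
    for i in range(num_samples):
        pos = t_start + i * step
        lo = math.floor(pos)
        triples.append((lo, lo + 1, pos - lo))  # weight of the upper neighbour
    return triples


def sample_features(features, t_start, t_end, num_samples=32):
    """Linearly interpolate `features` (a list of T vectors) at the sample points.

    Assumption: positions outside [0, T-1] are clamped to the nearest edge.
    """
    last = len(features) - 1
    sampled = []
    for lo, hi, w in sparse_sample_weights(t_start, t_end, num_samples):
        lo = min(max(lo, 0), last)
        hi = min(max(hi, 0), last)
        sampled.append([(1.0 - w) * a + w * b
                        for a, b in zip(features[lo], features[hi])])
    return sampled
```

With tensors, the same index pairs and weights could presumably drive an indexing or gather operation over the temporal axis, but that variant is untested here.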

For now I have dropped the PEM part of the code to test on my own problem, using only the start and end scores. It saves me a lot of time, memory, and effort in editing the code. According to the paper, dropping PEM doesn't hurt the results too much, so I hope it will work fine for me.

from bmn-boundary-matching-network.

vhvkhoa commented on June 2, 2024

I tried to modify the code for arbitrary-length videos but failed: the code has to build the ground-truth confidence map, of size B x T x T, and a sample mask for generating the output confidence map, which is even bigger, T x (N*T*T), and that quickly breaks the whole training process.
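
As a rough back-of-the-envelope check of those sizes, the sketch below computes the memory a dense mask of shape T x (N*T*T) would need. The float32 storage (4 bytes per value) and N = 32 sample points are assumptions, and the function names are made up for this example.

```python
def mask_bytes(temporal_len, num_samples=32, bytes_per_value=4):
    """Bytes for a dense sample mask of shape T x (N*T*T)."""
    return temporal_len * (num_samples * temporal_len * temporal_len) * bytes_per_value


def gt_map_bytes(batch_size, temporal_len, bytes_per_value=4):
    """Bytes for a ground-truth confidence map of shape B x T x T."""
    return batch_size * temporal_len * temporal_len * bytes_per_value


if __name__ == "__main__":
    for t in (100, 200, 400):
        print(f"T={t}: mask needs about {mask_bytes(t) / 2**30:.2f} GiB")
```

Under these assumptions the mask takes 128,000,000 bytes at T = 100 and grows with the cube of T, so doubling the temporal length multiplies the footprint by 8.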


semchan commented on June 2, 2024

> I tried to modify the code for arbitrary-length videos but failed: the code has to build the ground-truth confidence map, of size B x T x T, and a sample mask for generating the output confidence map, which is even bigger, T x (N*T*T), and that quickly breaks the whole training process.

You are right. One solution is to rescale the feature's temporal dimension to 100 for inference, but I don't think it is the best way.
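
For reference, rescaling a feature sequence to a fixed temporal length could look like the sketch below. It uses plain linear interpolation with the end points aligned, which is an assumption; the repository may resample differently. The function name `rescale_features` is hypothetical, and the code is plain Python so it runs without dependencies.

```python
def rescale_features(features, target_len=100):
    """Resample a list of T feature vectors to `target_len` vectors.

    Linear interpolation along time; the first and last vectors are kept.
    """
    src_len = len(features)
    if src_len == 0 or target_len < 1:
        raise ValueError("need at least one source vector and target_len >= 1")
    if src_len == 1 or target_len == 1:
        return [list(features[0]) for _ in range(target_len)]

    scale = (src_len - 1) / (target_len - 1)
    rescaled = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), src_len - 2)  # keep lo + 1 inside the sequence
        w = pos - lo                     # weight of the later neighbour
        rescaled.append([(1.0 - w) * a + w * b
                         for a, b in zip(features[lo], features[lo + 1])])
    return rescaled
```

On tensors, `torch.nn.functional.interpolate` with `mode='linear'` should give a comparable result on a (batch, channels, length) input, though its `align_corners` setting would have to be chosen to match.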


JJBOY commented on June 2, 2024

Training with a variable-length temporal dimension is inconvenient because all videos in a batch must have the same temporal dimension. You can try setting the batch size to 1.


semchan commented on June 2, 2024

> Training with a variable-length temporal dimension is inconvenient because all videos in a batch must have the same temporal dimension. You can try setting the batch size to 1.

I see. It is indeed an issue when training with variable lengths. But can the length be variable at inference? That still seems inconvenient, since the "mask" is generated in the "BMN" init; if the "mask" were generated in "def forward()" instead, it would cost a lot of computation time.


JJBOY commented on June 2, 2024

Yes, you are right. It's possible to use variable lengths, but it will be inconvenient: you need to generate a mask for every length.
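
One way to soften that cost, sketched under assumptions, is to build each mask once per distinct length and cache it, so lengths that recur reuse an existing mask. `build_sample_mask` below is a hypothetical placeholder for the real (expensive) mask construction, not the repository's function.

```python
from functools import lru_cache


def build_sample_mask(temporal_len):
    """Hypothetical placeholder for the expensive mask construction."""
    return [[0.0] * temporal_len for _ in range(temporal_len)]


@lru_cache(maxsize=4)  # small bound: every cached mask is large
def get_sample_mask(temporal_len):
    """Return the mask for this length, building it only on a cache miss."""
    return build_sample_mask(temporal_len)
```

The cached object is shared between callers, so it should be treated as read-only. The cache bound is kept small because each mask grows quickly with the temporal length.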


JJBOY commented on June 2, 2024

Actually, without PEM the recall drops a lot. As for the mask, we do only need 32 points, but if we generate the whole weighting mask once we can reuse it rather than generating it on the fly.


vhvkhoa commented on June 2, 2024

> Actually, without PEM the recall drops a lot. As for the mask, we do only need 32 points, but if we generate the whole weighting mask once we can reuse it rather than generating it on the fly.

Sorry, is that a result you got from your own experiments? Could you tell me more about it? In Table 4 of the original paper, the result dropped by only 2% in AR@100 on the validation set.


Niclaus233 commented on June 2, 2024

Is each video sampled at a different interval?
How does the author turn videos of unequal length into equal length? Is it by rescaling the feature length of all videos to the same length, 100?


JJBOY commented on June 2, 2024

> > Actually, without PEM the recall drops a lot. As for the mask, we do only need 32 points, but if we generate the whole weighting mask once we can reuse it rather than generating it on the fly.

> Sorry, is that a result you got from your own experiments? Could you tell me more about it? In Table 4 of the original paper, the result dropped by only 2% in AR@100 on the validation set.

In my experiments, using only TEM gives an AR@100 of 72.29, and using only PEM gives an AR@100 of 75.08. It seems PEM is more important than TEM.

