
Comments (3)

gabeur commented on August 11, 2024

Thanks for your interest in our work.

  • features_t encodes the time in the video at which each feature was extracted. As described in section 3.1 of the paper, we use temporal embeddings to provide this temporal information. The time of each feature is first discretized and then embedded.
  • features_ind encodes whether each feature is valid (1) or padding (0). It is then used to set the attention mask so that padding tokens are not attended to.
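
As a rough illustration of the two inputs above, here is a minimal, framework-free sketch. The bucket width, the bucket count, the mask value, and the helper names `discretize_times` and `build_attention_mask` are assumptions made for this example; they are not the functions or values used in the MMT code.

```python
def discretize_times(features_t, bucket_seconds=1.0, max_buckets=32):
    """Map extraction times (in seconds) to integer bucket ids.

    Hypothetical scheme: floor each time to a bucket of width
    `bucket_seconds` and clip to the last available bucket.
    """
    ids = []
    for t in features_t:
        bucket = int(t // bucket_seconds)
        ids.append(min(bucket, max_buckets - 1))
    return ids


def build_attention_mask(features_ind, masked_value=-10000.0):
    """Turn validity flags (1 = valid, 0 = padding) into an additive mask.

    Valid positions get 0.0; padding positions get a large negative value,
    so a softmax over the attention scores gives them almost no weight.
    """
    return [0.0 if flag == 1 else masked_value for flag in features_ind]


# Three real features followed by one padding slot.
features_t = [0.0, 1.5, 3.2, 0.0]
features_ind = [1, 1, 1, 0]

time_ids = discretize_times(features_t)    # indices into a temporal embedding table
mask = build_attention_mask(features_ind)  # added to attention scores before softmax
```

The integer ids would then index a learned embedding table, which is the "embedded" step mentioned above.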

from mmt.

code10086web commented on August 11, 2024

Thank you very much for your reply. Could you also explain the meaning of the parameters of the vid_bert function on line 577 of model/model.py, especially the input_ids parameter?



gabeur commented on August 11, 2024

  • input_ids was initially used to encode the different token classes, but it became redundant once the temporal embeddings were introduced. In the forward function, it is only used to infer the shape of token_type_ids when that argument is missing.
  • attention_mask encodes which tokens should be attended to (valid tokens), as opposed to padding tokens, which should not.
  • token_type_ids encodes the expert used to extract each feature (called "Expert embeddings" in section 3.1 of the paper).
  • position_ids encodes the time in the video at which each feature was extracted (called "Temporal embeddings" in section 3.1 of the paper).
  • features are the expert features (called "Features" in section 3.1 of the paper).
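
To show how these parameters could fit together, here is a minimal sketch in plain Python. It assumes the three components are summed element-wise per token, as is usual for BERT-style inputs; the function `embed_tokens` and the tiny lookup tables are hypothetical and only stand in for learned embedding layers, so this is not the actual vid_bert implementation.

```python
def embed_tokens(features, token_type_ids, position_ids,
                 expert_table, temporal_table):
    """Combine expert features with expert and temporal embeddings.

    Assumption: each token embedding is the element-wise sum of
    its feature vector, the embedding of the expert that produced it
    (looked up with token_type_ids), and the embedding of its time
    bucket (looked up with position_ids).
    """
    embedded = []
    for feat, expert_id, time_id in zip(features, token_type_ids, position_ids):
        expert_emb = expert_table[expert_id]
        temporal_emb = temporal_table[time_id]
        embedded.append([f + e + t for f, e, t in zip(feat, expert_emb, temporal_emb)])
    return embedded


# Two tokens with 2-dimensional features (toy values).
features = [[1.0, 2.0], [3.0, 4.0]]
token_type_ids = [0, 1]   # which expert produced each feature
position_ids = [1, 0]     # discretized extraction time of each feature

# Hypothetical lookup tables standing in for learned embeddings.
expert_table = [[0.5, 0.5], [0.25, 0.25]]
temporal_table = [[0.0, 0.0], [1.0, 1.0]]

tokens = embed_tokens(features, token_type_ids, position_ids,
                      expert_table, temporal_table)
```

The attention_mask is not part of this sum; it would be applied later, inside the attention layers, to keep padding tokens from being attended to.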

