Comments (3)
Thanks for your interest in our work.
- features_t encodes the time in the video at which each feature was extracted. As described in Section 3.1 of the paper, we use temporal embeddings to tell the model when in the video each feature was extracted. The timestamp of each feature is first discretized and then embedded.
- features_ind indicates whether a feature is valid (1) or padding (0). It is then used to set the attention mask so that padding tokens are not attended to.
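To make the two points above concrete, here is a minimal sketch in plain Python of how timestamps could be discretized into embedding indices and how the validity flags could become an attention mask. The function names, the `max_positions` parameter, and the choice to reserve index 0 for padding are assumptions made for illustration; the repository's actual discretization scheme may differ.

```python
import math

def discretize_times(features_t, features_ind, max_positions=32):
    """Map per-feature timestamps (in seconds) to integer temporal-embedding ids.

    Assumptions of this sketch (not taken from the repository):
    - id 0 is reserved for padding features,
    - a valid timestamp t maps to 1 + floor(t), clamped to max_positions - 1.
    """
    ids = []
    for t, valid in zip(features_t, features_ind):
        if not valid:
            ids.append(0)  # padding gets the reserved index
        else:
            ids.append(min(1 + math.floor(t), max_positions - 1))
    return ids

def build_attention_mask(features_ind):
    """Return 1 for valid features and 0 for padding, so padding is not attended to."""
    return [1 if valid else 0 for valid in features_ind]

# Hypothetical example: three valid features and one padding slot.
times = [0.5, 1.2, 7.9, 0.0]
valid = [1, 1, 1, 0]
temporal_ids = discretize_times(times, valid, max_positions=8)
mask = build_attention_mask(valid)
```

The resulting integer ids would then index a learned embedding table, which is what "embedded" refers to above.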
from mmt.
Thank you very much for your reply. Could you also explain the meaning of the parameters of the vid_bert function on line 577 of model/model.py, especially input_ids?
from mmt.
- input_ids was initially used to encode the different token classes, but it became redundant once the temporal embeddings were introduced. In the forward function, you will find that it is only used to infer the shape of token_type_ids when that argument is missing.
- attention_mask marks the tokens that should be attended to (valid tokens), as opposed to the padding tokens, which should not.
- token_type_ids encodes the expert used to extract each feature (called "Expert embeddings" in Section 3.1 of the paper).
- position_ids encodes the time in the video at which each feature was extracted (called "Temporal embeddings" in Section 3.1 of the paper).
- features are the expert features (called "Features" in Section 3.1 of the paper).
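As a rough illustration of how these arguments fit together, the sketch below combines features, expert embeddings, and temporal embeddings per token, and turns attention_mask into an additive mask. It assumes the three components are summed element-wise, as is usual in BERT-style models. The helper names and the lookup tables `expert_table` and `temporal_table` are hypothetical, standing in for learned embedding layers; this is not the repository's vid_bert implementation.

```python
def combine_inputs(features, token_type_ids, position_ids,
                   expert_table, temporal_table):
    """Sum each feature vector with its expert and temporal embeddings.

    expert_table and temporal_table are hypothetical lookup tables
    (lists of vectors) standing in for learned embedding layers.
    """
    combined = []
    for feat, expert_id, time_id in zip(features, token_type_ids, position_ids):
        expert_vec = expert_table[expert_id]
        temporal_vec = temporal_table[time_id]
        combined.append([f + e + t
                         for f, e, t in zip(feat, expert_vec, temporal_vec)])
    return combined

def to_additive_mask(attention_mask, negative=-10000.0):
    """Convert a 0/1 mask into values added to attention scores before softmax.

    Valid tokens get 0.0; padding tokens get a large negative value so that
    their attention weight is effectively zero.
    """
    return [0.0 if m == 1 else negative for m in attention_mask]

# Hypothetical two-token example: one valid feature and one padding slot.
features = [[1.0, 2.0], [0.0, 0.0]]
token_type_ids = [1, 0]
position_ids = [2, 0]
expert_table = [[0.0, 0.0], [0.5, 0.5]]
temporal_table = [[0.0, 0.0], [0.1, 0.1], [0.25, 0.25]]

inputs = combine_inputs(features, token_type_ids, position_ids,
                        expert_table, temporal_table)
additive = to_additive_mask([1, 0])
```

Note that input_ids does not appear in the sketch, which is consistent with the explanation that it is no longer needed once temporal embeddings carry the token information.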
from mmt.