Comments (6)
This is how we precompute the S3D features:
Each segment is 1 second long with no overlap, and the FPS is kept at 30.
So each segment has 30 frames; the input size is 30x224x224x3, and the output of S3D is averaged to 1x1x1x1024.
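A minimal sketch of that segmentation, assuming the frames are already decoded at 30 fps and resized to 224x224 (the `s3d` callable here is a hypothetical stand-in for the actual model, not the authors' code):

```python
import numpy as np

def extract_s3d_features(frames, s3d, fps=30):
    """Split a video into non-overlapping 1-second segments and run
    each through S3D, producing one 1024-d vector per segment.

    frames: array of shape (T, 224, 224, 3), decoded at `fps`.
    s3d:    callable mapping a (1, fps, 224, 224, 3) clip to a
            (1, 1024) averaged feature (hypothetical interface).
    """
    n_segments = len(frames) // fps           # e.g. an 11 s video -> 11 segments
    feats = []
    for i in range(n_segments):
        clip = frames[i * fps:(i + 1) * fps]  # (30, 224, 224, 3), no overlap
        feats.append(s3d(clip[None]))         # (1, 1024)
    return np.concatenate(feats, axis=0)      # (n_segments, 1024)
```

This matches the (duration_in_seconds, 1024) feature shape observed in the h5 files.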
Your results look pretty close. I think it is important to report average results over several experiments before drawing conclusions, because there is significant variation with respect to the random seed.
from mmt.
In the h5 files you provided under the folder vid_feat_files/mult_h5, the data has the keys features.vggish and features.audio. Is there any difference between those two features? Are they both used by the model?
features.audio are the audio features extracted by the authors of CE.
features.vggish are the audio features extracted by us.
We only use the features.vggish audio features for the results reported in the paper.
Did you use the same default way of extracting VGGish features as mentioned in the CE paper?
For obtaining features.vggish, we used the same approach as the authors of CE, except that our window size is 1.0 s.
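As a rough sketch, windowing the waveform into non-overlapping 1.0 s chunks (instead of VGGish's default 0.96 s) could look like this; `frame_audio` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def frame_audio(waveform, sample_rate, window_sec=1.0):
    """Split a mono waveform into non-overlapping windows of
    `window_sec` seconds (1.0 s here, vs VGGish's default 0.96 s).
    Each window would then be fed to VGGish for one 128-d feature.
    Trailing samples that don't fill a full window are dropped."""
    win = int(round(window_sec * sample_rate))
    n = len(waveform) // win
    return waveform[:n * win].reshape(n, win)
```

With this windowing, an 11 s clip yields 11 audio windows, lining up one-to-one with the 1 s video segments.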
Sorry, we cannot share the feature extraction code.
The checkpoint used to extract the S3D features is available here.
Cool! That's enough. Thanks for your quick reply.
Hi Gabeur,
May I ask how you precompute the S3D features?
According to the pentathlon challenge: "Frames are extracted at 10fps and processed in clips of 32 frames with a stride of 25 frames." pentathlon
But I don't think you used this approach, because the number of S3D features (each of dimension 1024) you compute for a video is similar to the video duration in seconds (for example, an 11-second video has S3D features of dimension (11, 1024) in MMT).
I'm wondering how you sample and extract the S3D features. I tried two ways to extract them. Here are the results.
The S3D I used is from model. (The S3D model you provided earlier is somehow corrupted; I cannot load the pretrained weights, so I switched to this S3D version.)
As you can see, there still remains a gap. It could be a problem with the S3D model I used, or my way of extracting the S3D features could differ from yours. Could you give some advice? Thanks!
Hi Gabeur, we used the approach you suggested, and the performance of the S3D features is similar now. Thanks a lot!
But we ran into some problems with the audio (VGGish) features. There are two questions we hope you could help with.
- In the h5 files you provided under the folder vid_feat_files/mult_h5, the data has the keys features.vggish and features.audio. Is there any difference between those two features? Are they both used by the model?
- Did you use the same default way of extracting VGGish features as mentioned in the CE paper?
I noticed that, according to the CE paper and the VGGish TensorFlow repo, the audio should be parsed into non-overlapping 0.96 s collections of frames. But in MMT's expert_timings.py, the expert timing for vggish has a feat_width of 1.0. It looks like you parse the audio features with 1.0 s per collection of frames.
Since there is a 0.04 s difference, did you resample the data or align the VGGish features? If so, may I know how the VGGish features were calculated? Please correct me if my understanding is wrong.
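For concreteness, simple arithmetic shows how the per-clip window counts diverge between the two settings (an illustrative calculation, not the authors' code):

```python
def n_windows(duration_sec, window_sec):
    """Number of non-overlapping windows that fit in a clip."""
    return int(duration_sec // window_sec)

# VGGish default (0.96 s) vs MMT's feat_width (1.0 s):
# for short clips the counts often agree (11 s -> 11 windows either way),
# but they drift apart for longer clips (60 s -> 62 vs 60 windows),
# which is why the alignment question above matters.
```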
Many thanks for your help!