antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Home Page: https://arxiv.org/abs/2012.00451
License: Apache License 2.0
Is it possible to use the tool on our own videos and datasets? If so, in addition to the videos, what features are required for pre-training or fine-tuning?
From your README, I assume that the features are extracted with the HowTo100M feature extractor (mixture of experts) based on that repository, in addition to the speech-to-text transcripts. Please correct me if I am wrong.
I want to test this system on my own videos to see how well it handles them, and I would like to know how to train it on my own data.
Please guide me.
Hi, I have read your paper "FrozenBiLM" and have several questions about the preprocessing of the LSMDC-FiB dataset. I noticed that some blanks cover only part of a word. For example, in "I went to the place w___e I live." the answer would be "her". In such cases, the semantic meaning of the question is destroyed. How do you treat this type of question?
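One way this could be handled during preprocessing is to detect such partial-word blanks and filter or special-case them. A minimal sketch, assuming blanks are rendered as runs of underscores; the helper name and regex below are hypothetical and not from the FrozenBiLM code:

```python
import re

# A blank fused to a letter on either side (e.g. "w___e") indicates that
# only part of a word was masked; a whole-word blank (" ___ ") is not matched.
PARTIAL_BLANK = re.compile(r"[A-Za-z]_+|_+[A-Za-z]")

def is_partial_word_blank(sentence: str) -> bool:
    """Return True if a run of underscores replaces only part of a word."""
    return bool(PARTIAL_BLANK.search(sentence))
```

Sentences flagged this way could then be dropped or have the surrounding word fragments merged into the answer candidate.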
Thanks.
Thanks a lot for sharing great work!
I've run into a problem when downloading the pretrained checkpoints and the pre-processed data and features via the commands below:
bash download/download_checkpoints.sh <DEFAULT_CKPT_DIR>
bash download/download_downstream.sh <DEFAULT_DATASET_DIR>
They require a verification code that I cannot access.
Could you share the files another way?
Thank you again for sharing your work.
Hello,
Thank you for your great work! I wanted to double-check whether this file contains the features for the iVQA dataset. I am attempting to fine-tune the cross-modal-trained FrozenBiLM on iVQA, and when I try to load the features there appears to be a corruption issue. Could you please let me know if I am processing the data correctly?
Hi,
I am running the script on a remote cluster that has no gshell module, so I am using gdrive instead. Could you send me the verification code for downloading the checkpoints and data from the Google Drive address?
email: [email protected]
Hi,
After fine-tuning on the downstream VideoQA datasets, how is the model evaluated on the test set?
I'm a little confused about this point.
Thanks
Hi, I trained the VQA-T model from scratch using the command
python main_videoqa.py --checkpoint_dir=ft<dataset> --dataset=<dataset> --lr=0.00001 \
--pretrain_path=<CKPT_PATH>
On MSRVTT-QA, MSVD-QA, ActivityNet-QA, How2QA and iVQA, I got the following results: 40.2, 41.5, 33.8, 71.4 and 15.7, while the paper reports 39.6, 41.2, 36.8, 80.8 and 23.0. Do ActivityNet-QA, How2QA and iVQA use different hyperparameters?
Hi, thank you for sharing the wonderful work.
I ran into an issue while running your repo. When I use the .sh files in the download folder, I cannot download through gdrive.
The error reported is as follows:
Failed getting oauth client: Failed to exchange auth code for token: Post https://accounts.google.com/o/oauth2/token: dial tcp 142.251.43.13:443: i/o timeout (command: bash download/download_checkpoints.sh)
Hope to hear from you. Thank you, and good luck.
Hi, thanks for the interesting work. Can you please provide the VQA-T checkpoint pretrained on HowTo100M?
gshell download --with-id 1bMfT9WjBiNWgfdVl2dej4mUaXvICGGRH --recursive
Hi, when I run the command above I get this error. Do you know what happened here?
Hello,
I am trying to use your pretrained model and reproduce the results on MSVD-QA. I'm following the same hyperparameters you mention in the paper and using the ckpt_pt_howtovqa69m file to initialize the model. However, I observed overfitting starting from the early epochs (73.97% accuracy on the training set versus 41.79% on the validation set). I also tried retraining the fine-tuned MSVD-QA model on the same dataset, and performance decreased (30% after 20 epochs, then it saturates)!
I searched for your loss and accuracy curves but could not find them. Would it be possible to share them here? Did you obtain similar results, and if so, do you know the origin of this problem? Thank you for your response.
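When validation accuracy plateaus early while training accuracy keeps climbing, a common mitigation is to stop fine-tuning once the validation metric has not improved for a few epochs. A minimal sketch of such a monitor; this helper is hypothetical and not part of the Just Ask codebase:

```python
class EarlyStopping:
    """Stop fine-tuning when validation accuracy has not improved
    for `patience` consecutive epochs."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True to stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Checkpointing at the best validation epoch (rather than the last) would then report the least-overfit model on the test set.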
Hi, where can I find the iVQA dataset (videos)?