danieljf24 / dual_encoding
[CVPR2019] Dual Encoding for Zero-Example Video Retrieval
License: Apache License 2.0
Hi.
I read your paper "Dual Encoding for Video Retrieval by Text", which was accepted by TPAMI 2021. But the code repository (https://github.com/danieljf24/hybrid_space) mentioned in the paper does not exist, and I only found this repository for the CVPR 2019 version.
Could you please share the source code for the TPAMI 2021 version?
I would appreciate your reply.
Hello,
While trying to re-implement the model on other datasets, we got stuck generating the feature.bin file. Your team mentioned that we could use txt2bin.py to convert the feature files from txt into binary format, but I'm not sure what the feature files should look like in .txt form.
Can you provide a few lines of an example txt feature file? It would be great if there were some example files for reference.
Thank you for your help!
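For what it's worth, while the exact layout expected by txt2bin.py isn't documented in this thread, text feature files in this ecosystem commonly put one vector per line as "<id> <v1> ... <vD>", whitespace-separated. A hypothetical sketch of that assumed format and its float32 packing (all ids and values are made up):

```python
import struct

# Assumed plain-text feature format: one vector per line,
# "<id> <v1> <v2> ... <vD>", whitespace-separated.
lines = [
    "video7768 0.12 0.05 0.93 0.44",
    "video7769 0.01 0.87 0.33 0.52",
]

def parse_line(line):
    """Split one text line into (id, list-of-floats)."""
    parts = line.split()
    return parts[0], [float(x) for x in parts[1:]]

ids, vectors = [], []
for line in lines:
    name, vec = parse_line(line)
    ids.append(name)
    vectors.append(vec)

# Pack each vector as little-endian float32, a typical binary layout.
binary = b"".join(struct.pack("<%df" % len(v), *v) for v in vectors)
print(ids)          # ['video7768', 'video7769']
print(len(binary))  # 2 vectors * 4 dims * 4 bytes = 32
```

Checking a couple of lines in this shape against what txt2bin.py parses should confirm or refute the assumption quickly.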
Hello. I'm trying to evaluate some models on the pre-computed MSR-VTT features that you provided.
But the results were on par with random selection.
While analyzing the cause, I suspect there is a difference in the visual feature extraction step.
Can you tell me which framework (TF, Keras, PyTorch, ...) and which weights you used in the visual feature extraction stage? Then I can evaluate your work under the same conditions as other models.
Thank you in advance!
Has anyone run this code on Python 3 successfully, with some modifications?
Thanks!
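For reference, the usual Python-2-to-3 changes such a codebase needs are small and mechanical. A non-exhaustive sketch of the common idioms (these are generic fixes, not verified against this repo's files):

```python
d = {"a": 1, "b": 2}

# Python 2: `for k, v in d.iteritems():` -- .iteritems() is gone in Python 3.
for k, v in d.items():
    pass

# Python 2: `print "hello"` -- print is a function in Python 3.
print("hello")

# Python 2: 5 / 2 == 2 -- use // for floor division in Python 3.
half = 5 // 2

# Python 2 pickles often need an encoding hint when read under Python 3:
# pickle.load(f, encoding='latin1')
```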
This is great work: given a sentence, it finds the best-matching video among several videos.
But could this method also do the video-to-text task, i.e., given a video, find the best-matching text among several texts?
Looking forward to your reply.
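Since dual encoding maps videos and sentences into a common space, video-to-text retrieval should in principle just be ranking along the other axis of the same similarity matrix. A toy sketch with made-up embeddings (nothing here uses the repo's actual API):

```python
import numpy as np

# Toy embeddings in a shared space: 3 videos, 4 captions (dims are invented).
rng = np.random.default_rng(0)
video_embs = rng.standard_normal((3, 8))
cap_embs = rng.standard_normal((4, 8))

# L2-normalize so the dot product equals cosine similarity.
video_embs /= np.linalg.norm(video_embs, axis=1, keepdims=True)
cap_embs /= np.linalg.norm(cap_embs, axis=1, keepdims=True)

sim = cap_embs @ video_embs.T        # shape (n_captions, n_videos)

# Text-to-video: for each caption, rank videos by similarity.
t2v_ranking = np.argsort(-sim, axis=1)
# Video-to-text: rank captions for each video -- just use the transpose.
v2t_ranking = np.argsort(-sim.T, axis=1)

print(v2t_ranking.shape)   # (3, 4): for each video, captions best-to-worst
```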
I trained the model without changing the parameters and found that the sum of recalls only reaches 135.3. How can I reach 148.6 as described in the paper? Please give me a hand.
We trained the model on 4 GPUs with different batch sizes. The validation results change greatly with batch size: batch=128 gives all_recall=286; batch=256 gives all_recall=268; batch=1280 gives all_recall=240.
We trained the model on 1 GPU with different batch sizes: batch=128, all_recall=295; batch=256, all_recall=285.
We tried different learning rates, but that seems to have no effect on the decreasing results.
Did you observe similar results?
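The "all_recall" sums above are presumably R@1 + R@5 + R@10 over both retrieval directions. A minimal sketch of that computation on made-up ranks (the function and numbers are illustrative, not taken from the repo):

```python
import numpy as np

def recall_at_k(ranks, k):
    """Percentage of queries whose ground truth ranks in the top k (0-based ranks)."""
    ranks = np.asarray(ranks)
    return 100.0 * np.mean(ranks < k)

# Invented 0-based ranks of the ground-truth item for 10 queries, per direction.
t2v_ranks = [0, 2, 7, 1, 0, 15, 3, 0, 9, 4]
v2t_ranks = [1, 0, 4, 12, 0, 2, 6, 0, 3, 8]

all_recall = sum(
    recall_at_k(r, k) for r in (t2v_ranks, v2t_ranks) for k in (1, 5, 10)
)
print(round(all_recall, 1))   # 380.0
```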
Hi,
First of all, great work. I have managed to reproduce your results. However, could you please provide additional information on the model you used for video feature extraction? I tried experimenting with features extracted using the torchvision ResNet-152 model (pre-trained weights), but they didn't perform particularly well with the trained Dual Encoding model you provided.
I assume that since you trained your model on features from a particular ResNet model, the dual encoding is biased towards it; to achieve a good result with your trained model, the same CNN must be used for feature extraction.
So could you please give more information about the particular variant of the ResNet-152 model you used?
Thanks
Hello, I am trying to download the TGIF and IACC.3 features (ResNeXt-101, ResNet-152) from the following link:
wget http://39.104.114.128/avs/tgif_ResNext-101.tar.gz
But it does not seem to be active.
Kindly guide me on this issue.
Thanks and Regards
Varsha DEVI
Hello author, I ran into an error while testing:
IOError: [Errno 20] Not a directory: '/home/VisualSearch/testCollection/runs/model_best.pth.tar/pred_errors_matrix.pth.tar'
I also cannot open the archive model_best.pth.tar. Could something have gone wrong during my training?
Thank you very much!
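Two things may be conflated in this error. Despite the .tar suffix, model_best.pth.tar is typically a torch.save() pickle, not a real tar archive, so archive tools cannot open it; and the IOError suggests a script joined a further filename onto the checkpoint file as if it were a directory. A minimal, hypothetical loading sketch (the path is copied from the error message for illustration only):

```python
import os
import torch

def load_checkpoint(path):
    """Return the checkpoint dict via torch.load, or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    return torch.load(path, map_location="cpu")

ckpt = load_checkpoint("/home/VisualSearch/testCollection/runs/model_best.pth.tar")
print(ckpt is None)   # True unless that checkpoint actually exists on this machine
```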
In the paper, does "So we assess the models previously trained on MSR-VTT using the MSVD test set" mean training on the entire MSR-VTT dataset and testing the model on the MSVD test set (670 videos)? Or is the entire MSVD dataset (1,970 videos) used as the test set?
How would you evaluate the performance of the models on each video?
I want to take a look at some relatively good and bad matches between videos and captions, but I don't understand how the video ids and caption ids relate to the label matrix in evaluation.py.
In evaluation.py's i2t_varied(error_matrix), the error matrix is converted to a label matrix of size #caption_embs × #video_embs. I assumed the order of the label matrix represents the caption ids, but the caption ids are of the format videos_xxxx#captions_nn.
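Under the stated id format, the part of a caption id before '#' names its owning video, so a label matrix can be rebuilt from the two id lists. A toy sketch (ids invented for illustration):

```python
import numpy as np

# Caption ids encode their video before the '#' (format as described above);
# a caption is a positive match for exactly that video.
cap_ids = ["video_0001#0", "video_0001#1", "video_0002#0"]
video_ids = ["video_0001", "video_0002"]

def caption_to_video(cap_id):
    """The part before '#' is the owning video's id."""
    return cap_id.split("#")[0]

labels = np.zeros((len(cap_ids), len(video_ids)), dtype=int)
for i, cid in enumerate(cap_ids):
    j = video_ids.index(caption_to_video(cid))
    labels[i, j] = 1

print(labels)
# [[1 0]
#  [1 0]
#  [0 1]]
```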
Hi, I have run into a problem.
I downloaded the provided data and the model, then directly ran inference and found that the result was Recall@1 = 0.0. Is anything wrong?
When I run the command ./do_all.sh msrvtt10ktrain msrvtt10kval msrvtt10ktest full, it reports the error "./do_all.sh: line 16: ./do_test_dual_encoding_msrvtt10ktest.sh: No such file or directory". Help me, thanks!
Hello, can I get your help? When I run ./do_all.sh msrvtt10ktrain msrvtt10kval msrvtt10ktest full, it reports that a folder does not exist.
The LSMDC dataset contains MPII-MD. Does your experiment use LSMDC (more than 100,000 videos) or MPII-MD (68,375 videos)? Can you provide the MPII-MD features? (The features you provide do not include MPII-MD.)
Hello
I'm trying to train the model and get the following error:
[02 Feb 17:03:43 - text2vec.py:line 13] /data/home/ameen.ali/dual_encoding/util/text2vec.py.Bow2Vec initializing ...
Traceback (most recent call last):
  File "trainer.py", line 426, in <module>
    main()
  File "trainer.py", line 161, in main
    opt.we_parameter = get_we_parameter(rnn_vocab, w2v_data_path)
  File "/data/home/ameen.ali/dual_encoding/model.py", line 18, in get_we_parameter
    w2v_reader = BigFile(w2v_file)
  File "/data/home/ameen.ali/dual_encoding/basic/bigfile.py", line 10, in __init__
    assert(len(self.names) == self.nr_of_images)
AssertionError
Any idea why this happens?
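The failing assertion compares the number of ids in id.txt against the vector count recorded in shape.txt, so a mismatch between those two files is the likely cause. A quick, hypothetical consistency check (file names follow the bigfile layout this repo appears to use; verify against your copy):

```python
import os
import tempfile

def check_bigfile_dir(feat_dir):
    """Compare the id count in id.txt with the vector count declared in shape.txt."""
    with open(os.path.join(feat_dir, "shape.txt")) as f:
        nr_of_images, ndims = map(int, f.readline().split())
    with open(os.path.join(feat_dir, "id.txt")) as f:
        names = f.read().strip().split()
    return len(names) == nr_of_images, len(names), nr_of_images

# Demo on a throwaway directory with a deliberate mismatch.
d = tempfile.mkdtemp()
with open(os.path.join(d, "shape.txt"), "w") as f:
    f.write("3 4\n")
with open(os.path.join(d, "id.txt"), "w") as f:
    f.write("vid1 vid2\n")
print(check_bigfile_dir(d))   # (False, 2, 3): two ids but shape.txt declares three
```

Running this check on the w2v feature directory would show whether the download was truncated or the files are out of sync.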
I am very interested in your work, and thank you for the code. But while running the code I hit a problem I could not solve; I hope you can give me some help:
Traceback (most recent call last):
  File "/data/projects/zero-simple2/trainer.py", line 422, in <module>
    main()
  File "/data/projects/zero-simple2/trainer.py", line 215, in main
    train(opt, data_loaders['train'], model, epoch)
  File "/data/projects/zero-simple2/trainer.py", line 290, in train
    for i, train_data in enumerate(train_loader):
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
KeyError (raised in the worker process):
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/projects/zero-simple2/util/data_provider.py", line 107, in __getitem__
    caption.append(self.vocab(''))
  File "/data/projects/zero-simple2/util/vocab.py", line 34, in __call__
    return self.word2idx[word]
KeyError: ''
thank you very much!
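The inner KeyError: '' shows an empty token being looked up in the vocabulary (util/vocab.py). One defensive fix, sketched here with assumed class and method names, is to fall back to an `<unk>` index for any out-of-vocabulary or empty token:

```python
class Vocabulary(object):
    """Minimal word-to-index vocabulary (names modeled on util/vocab.py, assumed)."""

    def __init__(self):
        self.word2idx = {}
        self.idx = 0

    def add_word(self, word):
        if word not in self.word2idx:
            self.word2idx[word] = self.idx
            self.idx += 1

    def __call__(self, word):
        # Fall back to <unk> instead of raising KeyError on unseen or empty tokens.
        if word not in self.word2idx:
            return self.word2idx['<unk>']
        return self.word2idx[word]

vocab = Vocabulary()
for w in ['<pad>', '<start>', '<end>', '<unk>', 'video']:
    vocab.add_word(w)

print(vocab('video'), vocab(''))   # 4 3  -- '' maps to the <unk> index
```

The root cause is likely an empty string slipping through caption tokenization, so filtering empty tokens in data_provider.py would also work.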
The performance of zero-example video retrieval, measured by mAP on MSVD, is 0.232. This result only reflects video-to-text retrieval, not text-to-video retrieval, doesn't it?
Thanks for your good work, but when I run ./do_all.sh msrvtt10ktrain msrvtt10kval msrvtt10ktest full, I get this error.
Hello,
This request is actually for the data for the hybrid space experiment, sorry for the confusion.
Thanks for the great work and for sharing the code and the data.
Unfortunately, it is not possible for non-Chinese people (at least for French ones) to download from Baidu. At some point, one has to enter a Chinese phone number to create an account, which we do not have. There seem to be some unofficial alternatives, but they do not look trustworthy (they require installing unknown software) and are not even guaranteed to work.
Could you please consider sharing these data on a more open and easy-to-use platform?
In case you have them readily available, we would also be interested in the frame features in the same format for the Vimeo V3C1 and V3C2 collections and/or in the procedure that you used for creating them on the IACC.3 collection.
Best regards,
Georges Quénot.