
danieljf24 / dual_encoding


[CVPR2019] Dual Encoding for Zero-Example Video Retrieval

License: Apache License 2.0

Shell 3.72% Python 86.22% Perl 10.06%

dual_encoding's People

Contributors: danieljf24, xuchaoxi

dual_encoding's Issues

Issue about dataset format

Hello,
While trying to re-implement the model on other datasets, we got stuck at generating the feature.bin file. Your team mentioned that we could use txt2bin.py to convert the feature files from txt into binary format, but I'm not sure what the feature files should look like in .txt form.
Could you provide a few example lines of the txt feature files? It would be great if there were some example files for reference.
Thank you for your help!
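For readers with the same question, a minimal sketch of the plain-text layout that txt2bin.py-style converters commonly expect: one item per line, an id followed by space-separated floats. Whether this repo's txt2bin.py uses exactly this layout (and the "video7768_0"-style frame ids below) is an assumption.

```python
# Hypothetical writer for the assumed "<id> <f1> <f2> ... <fn>" format,
# one feature vector per line.
def write_txt_features(path, features):
    """features: dict mapping a video/frame id to a list of floats."""
    with open(path, "w") as f:
        for name, vec in features.items():
            f.write(name + " " + " ".join("%g" % x for x in vec) + "\n")

# toy example with made-up ids and 3-d features
write_txt_features("feature.txt", {
    "video7768_0": [0.12, 0.0, 1.5],
    "video7768_1": [0.0, 2.3, 0.7],
})
```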

Question about feature extraction method

Hello. I'm trying to evaluate some models on the pre-computed MSR-VTT dataset that you provided,
but the results were on par with random selection.
While analyzing the cause, I suspect there is a difference in the visual feature extraction step.

Can you tell me which framework (TF, Keras, PyTorch, ...) and which weights you used in the visual feature extraction stage? Then I can analyze and reproduce your research under the same conditions as other models.

Thank you in advance!

Could this work do video to text task?

This is great work: given a sentence, it finds the best-matching video from several candidates.
But could this work do the video-to-text task, i.e., given a video, find the best-matching text from several texts?

Looking forward to your reply.
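As an aside for anyone wondering the same thing: once a model scores every caption-video pair, the two retrieval directions are just the two axes of the same similarity matrix. A toy sketch (hand-made scores, not the model's actual output):

```python
import numpy as np

# Toy similarity matrix S of shape (num_captions, num_videos).
# Text-to-video ranks videos per caption (rows of S);
# video-to-text ranks captions per video (rows of S transposed).
S = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.4, 0.6]])  # 3 captions x 2 videos

text_to_video = S.argsort(axis=1)[:, ::-1]    # ranked video ids per caption
video_to_text = S.T.argsort(axis=1)[:, ::-1]  # ranked caption ids per video

print(video_to_text[:, 0])  # top-1 caption index for each video -> [0 1]
```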

about the testing result

I trained the model without changing the parameters and found that the sum of recalls only reaches 135.3. How can I reach 148.6 as reported in the paper? Please give me a hand.

The model is very sensitive to the batch size

We trained the model on 4 GPUs with different batch sizes. The validation results change greatly with batch size: batch=128 gives all_recall=286; batch=256 gives all_recall=268; batch=1280 gives all_recall=240.
We also trained the model on 1 GPU with different batch sizes: batch=128, all_recall=295; batch=256, all_recall=285.
We tried different learning rates, but that seems to have no effect on the degraded results.
Did you observe similar results?

Seeking information for CNN model used on MSRVTT

Hi,

First of all, great work. I have managed to reproduce your results. However, could you please provide additional information on the model you utilized for video feature extraction? I tried experimenting with features extracted using the torchvision ResNet-152 model (pre-trained weights). However, they didn't perform particularly well with the trained Dual Encoding model you have provided.

I assume that, since you trained your model on features from a particular ResNet model, the dual encoding is biased towards it. To achieve a good result with your trained model, the same CNN model needs to be used for feature extraction.

So could you please give more information about the particular variant of the ResNet-152 model you utilized?

Thanks

The archive file produced by training is corrupted

Hello, I encountered an error when testing the data:
IOError: [Errno 20] Not a directory: '/home/VisualSearch/testCollection/runs/model_best.pth.tar/pred_errors_matrix.pth.tar'
Also, the archive file model_best.pth.tar cannot be opened. Could something have gone wrong during my training?
Thank you very much!

More details about MSVD on zero-example video retrieval

In the paper, does "So we assess the models previously trained on MSR-VTT using the MSVD test set" mean training on the entire MSR-VTT dataset and testing on the MSVD test set (670 videos)? Or is the entire MSVD dataset (1,970 videos) used as the test set?

Evaluating performance on a single video

How would you evaluate the performance of the models on each video?

I want to take a look at some relatively good and bad matches of the video and caption, but I don't understand how the video ids and the caption ids are related to the label matrix in evaluation.py.

[screenshot of the label-matrix code in evaluation.py]

In evaluation.py:i2t_varied(error_matrix), it converts the error matrix to a label matrix of size #caption_embs * #video_embs. I was assuming the order of the label matrix represents the caption ids, but the caption ids are of the format videos_xxxx#captions_nn.
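One way to reconcile the two id spaces, assuming the '#' separator described above: the video id is the part of the caption id before '#', so row i of the label matrix matches column j whenever the prefix of caption i equals video id j. A sketch (the exact id strings are illustrative):

```python
# Caption ids in the "videos_xxxx#captions_nn" style from the issue above.
caption_ids = ["videos_0001#captions_00",
               "videos_0001#captions_01",
               "videos_0002#captions_00"]
video_ids = ["videos_0001", "videos_0002"]

# label[i][j] == 1 iff caption i belongs to video j.
label = [[int(c.split('#')[0] == v) for v in video_ids]
         for c in caption_ids]
print(label)  # [[1, 0], [1, 0], [0, 1]]
```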

about the provided model

Hi, I have run into some problems.

I downloaded the provided data and model, then directly ran the inference, and found that the result is Recall@1 = 0.0. Is there anything wrong?

An error when running the script

When I run the command "./do_all.sh msrvtt10ktrain msrvtt10kval msrvtt10ktest full", it reports an error: "./do_all.sh: line 16: ./do_test_dual_encoding_msrvtt10ktest.sh: No such file or directory". Help me, thanks!

A question about MPII-MD

The LSMDC dataset contains MPII-MD. Does your experiment use LSMDC (more than 100,000 videos) or MPII-MD (68,375 videos)? Can you provide MPII-MD features? (The features you provide do not include MPII-MD.)

Error while training

Hello

I'm trying to train the model and get the following error:

[02 Feb 17:03:43 - text2vec.py:line 13] /data/home/ameen.ali/dual_encoding/util/text2vec.py.Bow2Vec initializing ...
Traceback (most recent call last):
  File "trainer.py", line 426, in <module>
    main()
  File "trainer.py", line 161, in main
    opt.we_parameter = get_we_parameter(rnn_vocab, w2v_data_path)
  File "/data/home/ameen.ali/dual_encoding/model.py", line 18, in get_we_parameter
    w2v_reader = BigFile(w2v_file)
  File "/data/home/ameen.ali/dual_encoding/basic/bigfile.py", line 10, in __init__
    assert(len(self.names) == self.nr_of_images)
AssertionError


Any idea why this happens?
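A diagnostic sketch, assuming the BigFile directory layout suggested by the traceback (basic/bigfile.py reading shape.txt and id.txt; the exact file names are an assumption): the AssertionError fires when the number of ids disagrees with the count in shape.txt, which often indicates a truncated or corrupted data download.

```python
import os
import tempfile

def check_bigfile(datadir):
    """Return (consistent?, count from shape.txt, number of ids)."""
    with open(os.path.join(datadir, 'shape.txt')) as f:
        nr, ndims = map(int, f.readline().split())
    with open(os.path.join(datadir, 'id.txt')) as f:
        names = f.read().strip().split()
    return nr == len(names), nr, len(names)

# toy demo: a consistent directory (2 items, 3 dims) passes the check
demo = tempfile.mkdtemp()
with open(os.path.join(demo, 'shape.txt'), 'w') as f:
    f.write("2 3\n")
with open(os.path.join(demo, 'id.txt'), 'w') as f:
    f.write("w1 w2")
print(check_bigfile(demo))  # (True, 2, 2)
```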

KeyError: 'Traceback (most recent call last):\n

I am very interested in your work and thank you for the code. But while running the code I encountered a problem that I could not solve; I hope you can give me some help:

Traceback (most recent call last):
  File "/data/projects/zero-simple2/trainer.py", line 422, in <module>
    main()
  File "/data/projects/zero-simple2/trainer.py", line 215, in main
    train(opt, data_loaders['train'], model, epoch)
  File "/data/projects/zero-simple2/trainer.py", line 290, in train
    for i, train_data in enumerate(train_loader):
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):\n  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/data/projects/zero-simple2/util/data_provider.py", line 107, in __getitem__\n    caption.append(self.vocab(\'\'))\n  File "/data/projects/zero-simple2/util/vocab.py", line 34, in __call__\n    return self.word2idx[word]\nKeyError: \'\'\n'

thank you very much!

Experiments on MSVD and MPII-MD

The performance of zero-example video retrieval, measured by mAP on MSVD, is 0.232. This result only reflects video-to-text retrieval, not text-to-video retrieval, doesn't it?

Alternative to baidu for downloading the frame features.

Hello,
This request is actually for the data for the hybrid space experiment, sorry for the confusion.
Thanks for the great work and for sharing the code and the data.
Unfortunately, it is not possible for non-Chinese people (at least for French ones) to download from Baidu. At some point, one has to enter a Chinese phone number to create an account, which we do not have. There seem to be some unofficial alternatives, but they do not look very trustworthy (requiring the installation of unknown software) and are not even guaranteed to work.
Could you please consider sharing these data on a more open and easy-to-use platform?
In case you have them readily available, we would also be interested in the frame features in the same format for the Vimeo V3C1 and V3C2 collections and/or in the procedure that you used for creating them on the IACC.3 collection.
Best regards,
Georges Quénot.
