
danieljf24 / dual_encoding


[CVPR2019] Dual Encoding for Zero-Example Video Retrieval

License: Apache License 2.0

Shell 3.72% Python 86.22% Perl 10.06%

dual_encoding's People

Contributors: danieljf24, xuchaoxi

dual_encoding's Issues

Issue about dataset format

Hello,
While trying to re-implement the model on other datasets, we got stuck at generating the feature.bin file. Your team mentioned that we could use txt2bin.py to convert the feature files from txt into binary format, but I'm not sure what the feature files should look like in .txt form.
Could you provide a few example lines of the txt feature files? It would be great if there were some example files for reference.
Thank you for your help!
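For readers with the same question, a minimal sketch of the plain-text layout that txt2bin.py-style converters commonly expect: one item per line, an id followed by space-separated floats. Whether this repo's txt2bin.py uses exactly this layout (and the "video7768_0"-style frame ids below) is an assumption.

```python
# Hypothetical writer for the assumed "<id> <f1> <f2> ... <fn>" format,
# one feature vector per line.
def write_txt_features(path, features):
    """features: dict mapping a video/frame id to a list of floats."""
    with open(path, "w") as f:
        for name, vec in features.items():
            f.write(name + " " + " ".join("%g" % x for x in vec) + "\n")

# toy example with made-up ids and 3-d features
write_txt_features("feature.txt", {
    "video7768_0": [0.12, 0.0, 1.5],
    "video7768_1": [0.0, 2.3, 0.7],
})
```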

Question about feature extraction method

Hello. I'm trying to evaluate some models on the pre-computed MSR-VTT dataset that you provided,
but the results were on par with random selection.
While analyzing the cause, I suspect there is a difference in the visual feature extraction step.

Can you tell me which framework (TF, Keras, PyTorch, ...) and which weights you used in the visual feature extraction stage? Then I can analyze and reproduce your research under the same conditions as other models.

Thank you in advance!

Could this work do video to text task?

This is great work: given a sentence, it finds the best-matching video from several candidates.
But could this work do the video-to-text task, i.e., given a video, find the best-matching text from several texts?

Looking forward to your reply.
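As an aside for anyone wondering the same thing: once a model scores every caption-video pair, the two retrieval directions are just the two axes of the same similarity matrix. A toy sketch (hand-made scores, not the model's actual output):

```python
import numpy as np

# Toy similarity matrix S of shape (num_captions, num_videos).
# Text-to-video ranks videos per caption (rows of S);
# video-to-text ranks captions per video (rows of S transposed).
S = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.4, 0.6]])  # 3 captions x 2 videos

text_to_video = S.argsort(axis=1)[:, ::-1]    # ranked video ids per caption
video_to_text = S.T.argsort(axis=1)[:, ::-1]  # ranked caption ids per video

print(video_to_text[:, 0])  # top-1 caption index for each video -> [0 1]
```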

about the testing result

I trained the model without changing the parameters and found that the sum of recalls only reaches 135.3. How can I reach 148.6 as reported in the paper? Please give me a hand.

The model is very sensitive to the batch size

We trained the model on 4 GPUs with different batch sizes. The validation results change greatly with batch size: batch=128 gives all_recall=286; batch=256 gives all_recall=268; batch=1280 gives all_recall=240.
We also trained the model on 1 GPU with different batch sizes: batch=128, all_recall=295; batch=256, all_recall=285.
We tried different learning rates, but that seems to have no effect on the degraded results.
Did you observe similar results?

Seeking information for CNN model used on MSRVTT

Hi,

First of all, great work. I have managed to reproduce your results. However, could you please provide additional information on the model you utilized for video feature extraction? I tried experimenting with features extracted using the torchvision ResNet-152 model (pre-trained weights). However, they didn't perform particularly well with the trained Dual Encoding model you have provided.

I assume that, since you trained your model on features from a particular ResNet model, the dual encoding is biased towards it. To achieve a good result with your trained model, the same CNN model needs to be used for feature extraction.

So could you please give more information about the particular variant of the ResNet-152 model you utilized?

Thanks

The archive file produced by training is corrupted

Hello, I encountered an error when testing the data:
IOError: [Errno 20] Not a directory: '/home/VisualSearch/testCollection/runs/model_best.pth.tar/pred_errors_matrix.pth.tar'
Also, the archive file model_best.pth.tar cannot be opened. Could something have gone wrong during my training?
Thank you very much!

More details about MSVD on zero-example video retrieval

In the paper, does "So we assess the models previously trained on MSR-VTT using the MSVD test set" mean training on the entire MSR-VTT dataset and testing on the MSVD test set (670 videos)? Or is the entire MSVD dataset (1,970 videos) used as the test set?

Evaluating performance on a single video

How would you evaluate the performance of the models on each video?

I want to take a look at some relatively good and bad matches of the video and caption, but I don't understand how the video ids and the caption ids are related to the label matrix in evaluation.py.

[screenshot of the label-matrix code in evaluation.py]

In evaluation.py:i2t_varied(error_matrix), it converts the error matrix to a label matrix of size #caption_embs * #video_embs. I was assuming the order of the label matrix represents the caption ids, but the caption ids are of the format videos_xxxx#captions_nn.
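One way to reconcile the two id spaces, assuming the '#' separator described above: the video id is the part of the caption id before '#', so row i of the label matrix matches column j whenever the prefix of caption i equals video id j. A sketch (the exact id strings are illustrative):

```python
# Caption ids in the "videos_xxxx#captions_nn" style from the issue above.
caption_ids = ["videos_0001#captions_00",
               "videos_0001#captions_01",
               "videos_0002#captions_00"]
video_ids = ["videos_0001", "videos_0002"]

# label[i][j] == 1 iff caption i belongs to video j.
label = [[int(c.split('#')[0] == v) for v in video_ids]
         for c in caption_ids]
print(label)  # [[1, 0], [1, 0], [0, 1]]
```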

about the provided model

Hi, I have run into some problems.

I downloaded the provided data and model, then directly ran the inference, and found that the result is Recall@1 = 0.0. Is there anything wrong?

An error when running the script

When I run the command "./do_all.sh msrvtt10ktrain msrvtt10kval msrvtt10ktest full", it reports an error: "./do_all.sh: line 16: ./do_test_dual_encoding_msrvtt10ktest.sh: No such file or directory". Help me, thanks!

A question about MPII-MD

The LSMDC dataset contains MPII-MD. Does your experiment use LSMDC (more than 100,000 videos) or MPII-MD (68,375 videos)? Can you provide MPII-MD features? (The features you provide do not include MPII-MD.)

Error while training

Hello

I'm trying to train the model and get the following error:

[02 Feb 17:03:43 - text2vec.py:line 13] /data/home/ameen.ali/dual_encoding/util/text2vec.py.Bow2Vec initializing ...
Traceback (most recent call last):
  File "trainer.py", line 426, in <module>
    main()
  File "trainer.py", line 161, in main
    opt.we_parameter = get_we_parameter(rnn_vocab, w2v_data_path)
  File "/data/home/ameen.ali/dual_encoding/model.py", line 18, in get_we_parameter
    w2v_reader = BigFile(w2v_file)
  File "/data/home/ameen.ali/dual_encoding/basic/bigfile.py", line 10, in __init__
    assert(len(self.names) == self.nr_of_images)
AssertionError


Any idea why this happens?
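A diagnostic sketch, assuming the BigFile directory layout suggested by the traceback (basic/bigfile.py reading shape.txt and id.txt; the exact file names are an assumption): the AssertionError fires when the number of ids disagrees with the count in shape.txt, which often indicates a truncated or corrupted data download.

```python
import os
import tempfile

def check_bigfile(datadir):
    """Return (consistent?, count from shape.txt, number of ids)."""
    with open(os.path.join(datadir, 'shape.txt')) as f:
        nr, ndims = map(int, f.readline().split())
    with open(os.path.join(datadir, 'id.txt')) as f:
        names = f.read().strip().split()
    return nr == len(names), nr, len(names)

# toy demo: a consistent directory (2 items, 3 dims) passes the check
demo = tempfile.mkdtemp()
with open(os.path.join(demo, 'shape.txt'), 'w') as f:
    f.write("2 3\n")
with open(os.path.join(demo, 'id.txt'), 'w') as f:
    f.write("w1 w2")
print(check_bigfile(demo))  # (True, 2, 2)
```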

KeyError: 'Traceback (most recent call last):\n

I am very interested in your work and thank you for the code. But while running the code I encountered a problem that I could not solve; I hope you can give me some help:

Traceback (most recent call last):
  File "/data/projects/zero-simple2/trainer.py", line 422, in <module>
    main()
  File "/data/projects/zero-simple2/trainer.py", line 215, in main
    train(opt, data_loaders['train'], model, epoch)
  File "/data/projects/zero-simple2/trainer.py", line 290, in train
    for i, train_data in enumerate(train_loader):
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):\n  File "/data/anaconda3/envs/pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/data/projects/zero-simple2/util/data_provider.py", line 107, in __getitem__\n    caption.append(self.vocab(\'\'))\n  File "/data/projects/zero-simple2/util/vocab.py", line 34, in __call__\n    return self.word2idx[word]\nKeyError: \'\'\n'

thank you very much!

Experiments on MSVD and MPII-MD

The performance of zero-example video retrieval, measured by mAP on MSVD, is 0.232. This result only reflects video-to-text retrieval, not text-to-video retrieval, doesn't it?

Alternative to baidu for downloading the frame features.

Hello,
This request is actually for the data for the hybrid space experiment, sorry for the confusion.
Thanks for the great work and for sharing the code and the data.
Unfortunately, it is not possible for non-Chinese people (at least for French ones) to download from Baidu. At some point, one has to enter a Chinese phone number to create an account, which we do not have. There seem to be some unofficial alternatives, but they do not look very trustworthy (requiring the installation of unknown software) and are not even guaranteed to work.
Could you please consider sharing these data on a more open and easy-to-use platform?
In case you have them readily available, we would also be interested in the frame features in the same format for the Vimeo V3C1 and V3C2 collections and/or in the procedure that you used for creating them on the IACC.3 collection.
Best regards,
Georges Quénot.
