shivangibithel / cross-modal-retrieval-using-cmfh

This project forked from sachjbp/cross-modal-retrieval-using-cmfh

Cross-Modal-Retrieval-using-CMFH

This is my implementation of cross-modal retrieval using Collective Matrix Factorization Hashing (CMFH), originally described in the linked reference. CMFH generates unified embeddings for different modalities of data such that semantically similar items lie close together (for example, a video and its corresponding text are near each other in the common embedding space). In this repo, we demonstrate training and testing for video-to-text and text-to-video retrieval on the MSR-VTT-10K dataset.
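
For orientation, the objective that CMFH is usually described as minimizing can be written as below. This is a sketch of the standard formulation as I understand it, not something taken from this repo's code: X_1, X_2 are the feature matrices and P_1, P_2 the projection matrices mentioned in this README, while U_1, U_2 (per-modality bases), V (the shared latent representation), the weights λ, μ, γ, and the regularizer R are notation assumed here.

```latex
\min_{U_1, U_2, P_1, P_2, V} \;
  \lambda \,\lVert X_1 - U_1 V \rVert_F^2
  + (1 - \lambda) \,\lVert X_2 - U_2 V \rVert_F^2
  + \mu \sum_{t=1}^{2} \lVert V - P_t X_t \rVert_F^2
  + \gamma \, R(V, P_1, P_2, U_1, U_2)
```

The first two terms ask both modalities to factorize through the same V, and the third term learns linear projections P_t that map a new video or text into that shared space, which is what makes paired items end up close together.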

To train CMFH from scratch, follow these steps:

  1. Since generating the feature matrices is time-consuming, you may want to download the precomputed feature matrices (X1 for the training videos, X2 for the corresponding annotated texts, X1_test for the test videos, and X2_test for the corresponding annotated texts) and put them in the feature_matrices folder. Each feature matrix has dimension (d * n), where d is the length of the embedding of a single video or text and n is the number of samples in the set (the training set for X1 and X2).

     If you prefer to generate the feature matrices yourself, you can download the training videos from here and put them in the respective folder.

  2. Run the following command:

python train.py
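
To make the training step more concrete, here is a minimal NumPy sketch of CMFH-style alternating updates on (d * n) feature matrices. It is not the code in train.py: the function name `train_cmfh`, the hyperparameter names and defaults (`k`, `lam`, `mu`, `gamma`, `n_iter`), and the use of a plain squared-norm regularizer are all assumptions made for illustration. Each update is the closed-form minimizer of the regularized least-squares objective stated in the docstring with the other variables held fixed.

```python
import numpy as np


def train_cmfh(X1, X2, k=16, lam=0.5, mu=100.0, gamma=0.01, n_iter=20, seed=0):
    """Illustrative CMFH-style training (hypothetical helper, not train.py).

    Minimizes, by alternating closed-form updates,
        lam*||X1 - U1 V||^2 + (1-lam)*||X2 - U2 V||^2
        + mu*(||V - P1 X1||^2 + ||V - P2 X2||^2)
        + gamma*(||V||^2 + ||U1||^2 + ||U2||^2 + ||P1||^2 + ||P2||^2)

    X1: (d1, n) and X2: (d2, n); column i of each describes the same sample.
    Returns P1 (k, d1), P2 (k, d2), and the shared representation V (k, n).
    """
    if X1.shape[1] != X2.shape[1]:
        raise ValueError("X1 and X2 must have the same number of columns (samples)")

    rng = np.random.default_rng(seed)
    n = X1.shape[1]
    Xs = (X1, X2)
    lams = (lam, 1.0 - lam)
    I_k = np.eye(k)

    V = rng.standard_normal((k, n))
    Us = [None, None]
    Ps = [None, None]

    for _ in range(n_iter):
        for t, X in enumerate(Xs):
            d = X.shape[0]
            # U_t = X V^T (V V^T + (gamma/lam_t) I)^-1, computed via a linear solve
            Us[t] = np.linalg.solve(V @ V.T + (gamma / lams[t]) * I_k, V @ X.T).T
            # P_t = V X^T (X X^T + (gamma/mu) I)^-1
            Ps[t] = np.linalg.solve(X @ X.T + (gamma / mu) * np.eye(d), X @ V.T).T

        # V solves (sum_t lam_t U_t^T U_t + (2 mu + gamma) I) V
        #          = sum_t lam_t U_t^T X_t + mu * sum_t P_t X_t
        A = sum(l * U.T @ U for l, U in zip(lams, Us)) + (2.0 * mu + gamma) * I_k
        B = sum(l * U.T @ X for l, U, X in zip(lams, Us, Xs))
        B = B + mu * sum(P @ X for P, X in zip(Ps, Xs))
        V = np.linalg.solve(A, B)

    return Ps[0], Ps[1], V
```

With matrices shaped as described above, `P1, P2, V = train_cmfh(X1, X2, k=32)` would return projection matrices that could then be saved with `np.save`. The code length `k` and the weights need tuning for real data.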

If you are only interested in testing the joint embeddings for video-to-text and text-to-video retrieval, follow these steps:

  1. Download the pre-trained projection matrices P1 from here and P2 from here, and save them as P1.npy and P2.npy respectively in the projection_matrices folder.
  2. Run the cells in the test notebook as instructed to start the webapp. You can then enter the YouTube ID of a short video (< 1 min) to get the matching texts from the MSR-VTT-10K training texts, or enter a sentence to get the YouTube URLs of the top 10 relevant videos from the MSR-VTT-10K training videos.
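
The retrieval step behind the webapp can be sketched roughly as follows, assuming the projection matrices have shape (k, d) and that codes are binarized by thresholding the projection at zero and compared by Hamming distance. Both helpers (`hash_codes`, `top_k`) and the thresholding rule are hypothetical and chosen for illustration; the test notebook may binarize or rank differently.

```python
import numpy as np


def hash_codes(P, X):
    """Project (d, n) features with P of shape (k, d) and binarize.

    Returns an (n, k) array of 0/1 bits, one row per sample.
    Thresholding at zero is an assumption made for this sketch.
    """
    return (P @ X > 0).T.astype(np.uint8)


def top_k(query_code, db_codes, k=10):
    """Return indices and Hamming distances of the k nearest database codes.

    query_code: (bits,) array; db_codes: (n, bits) array.
    A stable sort keeps the earlier database entry first on ties.
    """
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


# Usage with placeholder data. In practice P1 and P2 would come from
# np.load("projection_matrices/P1.npy") and np.load("projection_matrices/P2.npy"),
# and the features from the repo's own video/text feature extraction.
rng = np.random.default_rng(0)
P1 = rng.standard_normal((16, 32))            # video projection, k=16, d1=32
P2 = rng.standard_normal((16, 24))            # text projection,  k=16, d2=24
video_db = hash_codes(P1, rng.standard_normal((32, 100)))   # 100 database videos
text_query = hash_codes(P2, rng.standard_normal((24, 1)))[0]
indices, distances = top_k(text_query, video_db, k=10)      # text-to-video retrieval
```

Swapping the roles of P1 and P2 (hashing a video query and ranking text codes) gives the video-to-text direction.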

Contributors: sachjbp
