

Deep Partial Canonical Correlation Analysis ( DPCCA )

This code implements the algorithms described in the paper: "Bridging Languages through Images with Deep Partial Canonical Correlation Analysis" by Guy Rotman, Ivan Vulić and Roi Reichart.
Please cite the paper if you are using this code.

Prerequisites

The code was implemented in Python 3.6.3 in an Anaconda environment. All requirements are listed in the requirements.txt file and can be installed by running the following command from the command line: pip install -r requirements.txt

Data

.h5 files containing all samples of the WIW dataset (including textual and visual features) are available at the following link: WIW Feature Set.

After downloading the "wiw_data.zip" file, please unzip it into the "data" directory.
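The unzip step can also be done from Python with the standard-library zipfile module. This is a minimal sketch: the helper name extract_wiw is hypothetical, and because the internal layout of wiw_data.zip is not documented here, it simply searches the extracted tree for .h5 files.

```python
import zipfile
from pathlib import Path


def extract_wiw(zip_path="wiw_data.zip", out_dir="data"):
    """Unzip the downloaded archive into out_dir and return the .h5 files found.

    extract_wiw is a hypothetical helper, not part of the repository.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    # The archive layout is an unknown, so search recursively for feature files.
    return sorted(out.rglob("*.h5"))
```

Calling extract_wiw() from the repository root extracts into "data" and returns the paths of the feature files, which is a quick way to confirm the download is complete.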

Finally, before running the models, split the WIW dataset into train/val/test sets by running the following command from the command line: python split_wiw.py
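This README does not say how split_wiw.py divides the data, so the following is only an illustration of a reproducible train/val/test split. The function name split_indices, the 80/10/10 ratios and the seed are hypothetical assumptions, not the script's actual settings.

```python
import random


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices reproducibly and cut them into train/val/test.

    Hypothetical sketch: the ratios and seed are assumptions, not the values
    used by split_wiw.py.
    """
    if val_frac + test_frac >= 1.0:
        raise ValueError("val_frac + test_frac must be below 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> same split every run
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test
```

Using a private random.Random instance with a fixed seed keeps the split identical across runs without touching the global random state, which matters when several models are compared on the same test set.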

Optional

The full dataset (including the set of images) can be downloaded from the following link: WIW Dataset

Models

The directory contains the following models:

  1. dpcca_a.py - An implementation of Deep Partial Canonical Correlation Analysis, variant A, optimized with the NOI algorithm.
  2. dpcca_b.py - An implementation of Deep Partial Canonical Correlation Analysis, variant B, optimized with the NOI algorithm.
  • Each model can be executed (training + evaluation) by running the following command from the command line: python model_name.py (e.g. python dpcca_a.py).

  • The architectures of the models are implemented in the model_architecture.py file.
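For orientation, the classical linear partial CCA objective that DPCCA builds on is sketched below; this is background, not the paper's derivation. Given two views X and Y (here presumably the word vectors of the two languages) and a conditioning view Z (presumably the visual features), the covariances are first made partial with respect to Z, and CCA is then solved on those partial covariances. The deep networks of variants A and B and the NOI optimization are described in the paper and are not reproduced here.

```latex
\begin{aligned}
\Sigma_{XX|Z} &= \Sigma_{XX} - \Sigma_{XZ}\,\Sigma_{ZZ}^{-1}\,\Sigma_{ZX}, \\
\Sigma_{YY|Z} &= \Sigma_{YY} - \Sigma_{YZ}\,\Sigma_{ZZ}^{-1}\,\Sigma_{ZY}, \\
\Sigma_{XY|Z} &= \Sigma_{XY} - \Sigma_{XZ}\,\Sigma_{ZZ}^{-1}\,\Sigma_{ZY}, \\
\rho^{*} &= \max_{w_x,\, w_y}\;
  \frac{w_x^{\top}\,\Sigma_{XY|Z}\,w_y}
       {\sqrt{w_x^{\top}\,\Sigma_{XX|Z}\,w_x}\;\sqrt{w_y^{\top}\,\Sigma_{YY|Z}\,w_y}}
\end{aligned}
```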

Hyperparameters and Default Settings

All hyperparameters and default settings are defined in the cfg.py file, which also contains a detailed explanation of each of them.

Task and Evaluation

Cross-Lingual Word Retrieval (also known as Bilingual Lexicon Induction)

The cross-lingual word retrieval task can be described as follows: given a word in one language, the goal is to retrieve its correct translation from a lexicon of a second language.
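As an illustration of the retrieval step, the sketch below ranks the words of the target lexicon by cosine similarity to a query vector. It assumes that both languages have already been mapped into a shared space by a trained model; cosine similarity as the ranking score, the helper names and the toy two-dimensional vectors are all assumptions for illustration, not taken from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_translations(query_vec, target_lexicon):
    """Return target words sorted from most to least similar to query_vec.

    target_lexicon maps each word of the second language to its vector in the
    shared space (assumed to come from a trained model).
    """
    return sorted(target_lexicon,
                  key=lambda word: cosine(query_vec, target_lexicon[word]),
                  reverse=True)


# Toy example with made-up vectors: the query "dog" should retrieve "hund" first.
toy_lexicon = {"hund": [0.9, 0.1], "katze": [0.1, 0.9]}
ranking = rank_translations([1.0, 0.0], toy_lexicon)  # ['hund', 'katze']
```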

  • To train and test on the EN-DE version of WIW, set the following in the cfg.py file:
    self.feats = ['eng','ger','vis']
  • To train and test on the EN-IT version of WIW, set the following in the cfg.py file:
    self.feats = ['eng','it','vis']
  • To train and test on the EN-RU version of WIW, set the following in the cfg.py file:
    self.feats = ['eng','ru','vis']
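Rather than editing the list by hand for each language pair, the three settings above can be kept in one table. This is a hypothetical stand-in for cfg.py: the class name Config, the FEATS_BY_PAIR table and the pair argument are assumptions; only the self.feats values come from this README.

```python
# Hypothetical stand-in for the config class in cfg.py; only `feats` is documented.
FEATS_BY_PAIR = {
    "EN-DE": ["eng", "ger", "vis"],
    "EN-IT": ["eng", "it", "vis"],
    "EN-RU": ["eng", "ru", "vis"],
}


class Config:
    def __init__(self, pair="EN-DE"):
        if pair not in FEATS_BY_PAIR:
            raise ValueError(f"unknown language pair: {pair!r}")
        # Copy so that editing one config never mutates the shared table.
        self.feats = list(FEATS_BY_PAIR[pair])
```

Config("EN-IT").feats then yields ['eng', 'it', 'vis'], and a typo in the pair name fails immediately instead of silently loading the wrong features.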

Evaluation

Evaluation of R@K (Recall at K) for the task is implemented in the retrieval_eval.py file.
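R@K is the fraction of query words whose correct translation appears among the top K retrieved candidates. The sketch below is not the code in retrieval_eval.py, whose exact details (for example, which K values are reported or how multiple gold translations are handled) are not given here; it assumes a query counts as a hit when any acceptable translation is in its top K, and the function name recall_at_k is hypothetical.

```python
def recall_at_k(ranked_lists, gold, k):
    """Fraction of queries whose gold translation appears in the top-k results.

    ranked_lists: dict mapping each query word to candidate words, best first.
    gold: dict mapping each query word to a set of acceptable translations.
    """
    if not ranked_lists:
        raise ValueError("no queries to evaluate")
    hits = sum(1 for query, ranking in ranked_lists.items()
               if gold[query] & set(ranking[:k]))
    return hits / len(ranked_lists)


# Toy example: "dog" is correct at rank 1, "cat" only at rank 2.
toy_ranked = {"dog": ["hund", "katze"], "cat": ["hund", "katze"]}
toy_gold = {"dog": {"hund"}, "cat": {"katze"}}
r1 = recall_at_k(toy_ranked, toy_gold, 1)  # 0.5
r2 = recall_at_k(toy_ranked, toy_gold, 2)  # 1.0
```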

License

This project is licensed under the MIT License. See the LICENSE.txt file for details.

References

  • English vectors were taken from: "Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of EMNLP."
  • German vectors were taken from: "Ivan Vulić and Anna Korhonen. 2014. Is 'universal syntax' universally useful for learning distributed representations? In Proceedings of ACL, pages 518–524."
  • Italian vectors were taken from: "Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. 2015. Improving zero-shot learning by mitigating the hubness problem. In Proceedings of ICLR: Workshop Papers."
  • Russian vectors were taken from: "Andrey Kutuzov and Igor Andreev. 2015. Texts in, meaning out: neural language models in semantic similarity task for Russian. In Proceedings of DIALOG."
  • Visual vectors were taken from: "Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556."
