
This project forked from huyt16/twitter100k


Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval

Yuting Hu, Liang Zheng, Yi Yang, and Yongfeng Huang

Introduction

This paper contributes a new large-scale dataset for weakly supervised cross-media retrieval, named Twitter100k. It has two distinguishing characteristics: 1) it contains 100,000 image-text pairs randomly crawled from Twitter and therefore places no constraint on the image categories; 2) the text in Twitter100k is written in informal language by the users.

Since strongly supervised methods rely on class labels that may be missing in practice, this paper focuses on weakly supervised learning for cross-media retrieval, in which only text-image pairs are exploited during training. We extensively benchmark the performance of four subspace learning methods and three variants of the Correspondence AutoEncoder, along with various text features, on Wikipedia, Flickr30k and Twitter100k.

As a minor contribution, inspired by the characteristics of Twitter100k, we propose an OCR-based cross-media retrieval method. In our experiments, we show that the proposed OCR-based method improves on the baseline performance.

A detailed description is provided in our paper.

Requirements

  • This software has been tested on both Windows 10 and CentOS Linux release 7.3.1611.
  • MATLAB (tested with R2016a on both Windows and CentOS).
  • Python (tested with 2.7.5 on both Windows and CentOS).
  • Deepnet and its dependencies. (Copyright is held by the author.)

How to use the code

For subspace learning methods (CCA, PLS, BLM, GMMFA)

  1. Download the data for the three benchmark datasets from my homepage and put them into feature/ or another folder convenient for you.
  2. Modify the dataset name and the data path variables in the script file run_baseline.m in code/GMA-CVPR2012/.
  3. Run the MATLAB script file run_baseline.m.
  4. Run retrieve.py for a specific dataset.
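
For orientation, the retrieval step that follows subspace learning can be pictured as ranking the items of one modality by their similarity to a query from the other modality, once both have been projected into the shared subspace. The sketch below is only an illustration under stated assumptions: the function names, the toy feature values, and the choice of cosine similarity are hypothetical and are not taken from retrieve.py, which may use a different measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy features, assumed to be already projected into the shared subspace
# (hypothetical values, not taken from any of the three datasets).
text_query = [1.0, 0.0]
image_gallery = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

ranking = rank_gallery(text_query, image_gallery)
print(ranking)  # indices of gallery images, best match first
```

The position of the ground-truth image in such a ranking is what a rank-based evaluation would then consume.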

For Corr-AE methods

  1. Download the data for the three benchmark datasets from my homepage and put them into feature/ or another folder convenient for you.
  2. Run the Python script file genNPYdata.py in code/deepnet-master/deepnet/examples/yutinghu/ to generate the input data for the Corr-AE methods.
  3. Install deepnet and its dependencies by following the instructions in INSTALL.TXT in code/deepnet-master/. This step may take some patience.
  4. Run runall_all.sh in code/deepnet-master/deepnet/examples/yutinghu/wikipedia/, or in the corresponding flickr30k/ or twitter100k/ folder.
  5. Run retrieve_corr_ae.py for a specific dataset.

Result Files

You can download the CMC results, saved in MAT-file format, for direct comparison.
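
To make the comparison concrete, here is a minimal sketch of how a CMC curve can be computed from per-query ranks. It assumes that CMC stands for the Cumulative Match Characteristic and that each query has exactly one ground-truth match in the gallery, which fits the image-text pair setting. The function name and the example ranks are hypothetical and do not reproduce the evaluation code or the variable layout of the MAT-files.

```python
def cmc_curve(match_ranks, max_rank):
    """Cumulative Match Characteristic.

    match_ranks: 1-based rank of the ground-truth item for each query.
    Returns a list whose entry k-1 is the fraction of queries whose
    ground-truth item appears within the top k results.
    """
    n = float(len(match_ranks))
    return [sum(1 for r in match_ranks if r <= k) / n
            for k in range(1, max_rank + 1)]


# Hypothetical ranks of the true match for four queries.
match_ranks = [1, 3, 2, 5]
cmc = cmc_curve(match_ranks, 5)
print(cmc)  # [0.25, 0.5, 0.75, 0.75, 1.0]
```

A curve computed this way on your own method's rankings can be plotted against the downloaded curves, provided both use the same gallery and the same queries.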
