
This project forked from makarandtapaswi/movieqa_benchmark


Benchmark data and code for Question-Answering on Movie stories


MovieQA

MovieQA: Understanding Stories in Movies through Question-Answering
Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, June 2016.
Project page | arXiv preprint | Read the paper | Explore the data


Benchmark Data

The data is available as simple JSON and text files for easy access in any environment. We also provide simple Python data loaders to help you get started.

To obtain access to the stories, and evaluate new approaches on the test data, please register at our benchmark website.

Python data loader

import MovieQA
mqa = MovieQA.DataLoader()

Explore

Movies are indexed by their IMDb keys. For example:
mqa.pprint_movie(mqa.movies_map['tt0133093'])

QAs are stored as a standard Python list:
mqa.pprint_qa(mqa.qa_list[0])

Use

To get the list of movies in a particular split, use
movie_list = mqa.get_split_movies(split='train')

To get the QAs of a particular split along with a particular story form, use
story, qa = mqa.get_story_qa_data('train', 'plot')

Supported splits are: train, val, test, full. Supported story forms are: plot, subtitle, dvs, script.

Video lists can be obtained per QA or per movie using
vl_qa, _ = mqa.get_video_list('train', 'qa_clips')  # per QA
vl_movie, _ = mqa.get_video_list('train', 'all_clips')  # per movie

Build your own data/story loaders

We provide a simple interface to load all the data (QAs, movies) and stories through the code above. If you wish to modify something, you are welcome to use your own data loaders and access the raw data directly. The evaluation server submissions are simple text files (explained after login) and are independent of any data loaders.


qa.json

  • qid: A unique id for every question. Also indicates the set (train|val|test) the question belongs to
  • imdb_key: The movie this question belongs to
  • question: The question string
  • answers: The five answer options
  • correct_index: Index of the correct answer option (0-indexed)
  • plot_alignment: Line numbers in the split_plot file to which this question corresponds
  • video_clips: Clips that are aligned with the question, to be used for answering
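
As a rough illustration of how the fields above fit together, the sketch below parses a few hand-written records shaped like qa.json entries, groups them by split, and scores predictions against correct_index. The sample records and the helper names (split_of, group_by_split, accuracy) are hypothetical and not part of the provided loader. It also assumes that the split name is the part of qid before a colon (for example train:0), which should be checked against the actual file.

```python
import json
from collections import defaultdict

# Hand-written records shaped like qa.json entries (not real benchmark data).
SAMPLE_QA_JSON = """
[
  {"qid": "train:0", "imdb_key": "tt0133093", "question": "Sample question 1?",
   "answers": ["a", "b", "c", "d", "e"], "correct_index": 2,
   "plot_alignment": [3], "video_clips": []},
  {"qid": "train:1", "imdb_key": "tt0133093", "question": "Sample question 2?",
   "answers": ["a", "b", "c", "d", "e"], "correct_index": 4,
   "plot_alignment": [7, 8], "video_clips": []},
  {"qid": "val:0", "imdb_key": "tt0133093", "question": "Sample question 3?",
   "answers": ["a", "b", "c", "d", "e"], "correct_index": 0,
   "plot_alignment": [1], "video_clips": []}
]
"""


def split_of(qid):
    """Return the split name, assuming qids look like '<split>:<number>'."""
    return qid.split(":", 1)[0]


def group_by_split(qa_list):
    """Group QA records into {split_name: [records]}."""
    groups = defaultdict(list)
    for qa in qa_list:
        groups[split_of(qa["qid"])].append(qa)
    return dict(groups)


def accuracy(qa_list, predictions):
    """Fraction of QAs whose predicted option index equals correct_index.

    `predictions` maps qid -> predicted option index; missing qids count as wrong.
    """
    if not qa_list:
        return 0.0
    correct = sum(
        1 for qa in qa_list if predictions.get(qa["qid"]) == qa["correct_index"]
    )
    return correct / len(qa_list)


qa_list = json.loads(SAMPLE_QA_JSON)
by_split = group_by_split(qa_list)

# One right (train:0) and one wrong (train:1) prediction on the sample data.
train_acc = accuracy(by_split["train"], {"train:0": 2, "train:1": 0})
print(sorted(by_split), train_acc)
```

Scoring this way only makes sense on splits where correct_index is available to you. Test-set evaluation goes through the evaluation server.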

movies.json

  • imdb_key: A unique id for every movie. Corresponds to that used by IMDb
  • name: Movie title
  • year: Movie release year
  • genre: Movie genre classification
  • text: Text sources that are available for that movie
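
If you read movies.json directly, a lookup similar to movies_map can be approximated by keying the entries by imdb_key. The sketch below is a minimal stand-in under stated assumptions: the sample entries are hand-written (the value shapes for year, genre, and text are guesses), and build_movies_map and questions_per_movie are hypothetical helpers, not functions from the provided loader.

```python
import json
from collections import Counter

# Hand-written entries shaped like movies.json records (value shapes are assumed).
SAMPLE_MOVIES_JSON = """
[
  {"imdb_key": "tt0133093", "name": "The Matrix", "year": 1999,
   "genre": "Action, Sci-Fi", "text": {}},
  {"imdb_key": "tt0110912", "name": "Pulp Fiction", "year": 1994,
   "genre": "Crime, Drama", "text": {}}
]
"""


def build_movies_map(movies):
    """Index movie records by imdb_key, rejecting duplicate keys."""
    movies_map = {}
    for movie in movies:
        key = movie["imdb_key"]
        if key in movies_map:
            raise ValueError("duplicate imdb_key: " + key)
        movies_map[key] = movie
    return movies_map


def questions_per_movie(qa_list):
    """Count QA records per movie using their imdb_key field."""
    return Counter(qa["imdb_key"] for qa in qa_list)


movies_map = build_movies_map(json.loads(SAMPLE_MOVIES_JSON))

# Minimal QA stubs; only the fields needed for the join are included.
sample_qas = [
    {"qid": "train:0", "imdb_key": "tt0133093"},
    {"qid": "train:1", "imdb_key": "tt0133093"},
    {"qid": "val:0", "imdb_key": "tt0110912"},
]
counts = questions_per_movie(sample_qas)

for key, n in sorted(counts.items()):
    print(movies_map[key]["name"], n)
```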

Data Release Log

  • 2017.01.14: Alignments between question and plot sentence, plot sentence and video clips
  • 2016.11.08: Patch for 65 missing video clips
  • 2016.09.10: Video meta-data released: Shot boundaries, frame-timestamp correspondence
  • 2016.04.06: Removed missing video clips from qa.json
  • 2016.03.30: v1.0 data release

Requirements

  • numpy
  • pysrt
