GithubHelp home page GithubHelp logo

intuitionmachine / counterfactual-storyrw Goto Github PK

View Code? Open in Web Editor NEW

This project forked from atcbosselut/counterfactual-storyrw

0.0 2.0 0.0 872 KB

code and data for EMNLP-19 paper "Counterfactual Story Reasoning and Generation" https://arxiv.org/abs/1909.04076

License: MIT License

Python 93.46% Shell 6.54%

counterfactual-storyrw's Introduction

Dataset and code for "Counterfactual Story Reasoning and Generation"

This repo contains the dataset and code for the following paper:

Counterfactual Story Reasoning and Generation
Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark and Yejin Choi
EMNLP 2019

Dataset: TimeTravel

The dataset can be downloaded from here.

Data files includes (see examples below):

  1. train_supervised_small.json: supervised training set (the training set used in the paper experiments)
  2. train_supervised_large.json: supervised training set (a larger supervised training set as we annotated more)
  3. train_unsupervised.json: unsupervised training set
  4. dev_data.json: Dev set
  5. test_data.json: Test set

Data format in each file:

  • Supervised training data example
{
  "story_id": "4fd7d150-b080-4fb1-a592-8c27fa6e1fc8",
  "premise": "Andrea wanted a picture of her jumping.",
  "initial": "She set the camera up.",
  "counterfactual": "She asked her friend to draw one.",
  "original_ending": "Then, she jumped in the air. The picture kept coming out wrong. It took twenty tries to get it right.",
  "edited_ending": [
    "Then, she jumped in the air to demonstrate how she wanted it to look.",
    "The picture kept coming out wrong.",
    "It took drawing it several times to get it right."
  ]
}
  • Unsupervised training data example
{
  "story_id": "da0e85f1-c586-4236-a8a3-ee6421c8e71d",
  "premise": "Charles' mother taught her son to carry a pre-paid cell phone.",
  "initial": "As a job seeker, Charles put his cell phone number on applications.",
  "counterfactual": "As a job seeker, Charles used his cell phone to keep his information out of employers hands.",
  "original_ending": "He needed a real cell phone, but kept up with his pre-paid cell phone. One afternoon he was in a phone interview with Apple Computers. He ran out of minutes and never reached Apple's hiring manager again."
}
  • Dev / test data example
{
  "story_id": "048f5a77-7c17-4071-8b0b-b8e43087132d",
  "premise": "Neil was visiting Limerick in Ireland.",
  "initial": "There, he saw a beautiful sight.",
  "counterfactual": "It was the ugliest city he's ever seen.",
  "original_ending": "He saw the large and lovely River Shannon! After a few minutes, he agreed with the locals. The River Shannon was beautiful.",
  "edited_endings": [
    [
      "He saw the small and lonely River Shannon!",
      "After a few minutes, he agreed with the locals.",
      "The River Shannon was lonely."
    ],
    [
      "However, he saw the large and lovely River Shannon!",
      "After a few minutes, he agreed with the locals.",
      "The River Shannon was beautiful."
    ],
    [
      "However, he did think the large River Shannon was lovely!",
      "After a few minutes, he agreed with the locals that Limerick wasn't as ugly as he though.",
      "The River Shannon was beautiful."
    ]
  ]
}

Code

(The code is still under cleanup. More details of code usage will be added soon.)

  • The code depends on Texar. Please install the version under third_party/texar. Follow the installation instructions in the README there.
  • Use prepare_data_rewriting.py to preprocess the raw text data and transform into TFRecord format. An example command is (please see the code for more config options).
python prepare_data_rewriting.py --data_dir=raw_data_dir
  • Run run_[X].sh for training/testing model [X].
  • Use evaluate.py for evaluation. An example command is
python evaluate.py --all-preds-dir data/100_output_proced --gold-file data/dev.jsonl &> 100_output_proced_metrics.log
  • The WMS and W+SMS metrics in the paper (Table.7) use the code here.

Citation

@inproceedings{qin-counterfactual,
    title = "Counterfactual Story Reasoning and Generation",
    author = "Qin, Lianhui and Bosselut, Antoine and Holtzman, Ari and  Bhagavatula, Chandra and  Clark, Elizabeth and Choi, Yejin",
    booktitle = "2019 Conference on Empirical Methods in Natural Language Processing.",
    month = "nov",
    year = "2019",
    address = "Hongkong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/pdf/1909.04076.pdf",
}

counterfactual-storyrw's People

Contributors

bryant1410 avatar qkaren avatar zhitinghu avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.