fallcat / openpi-dataset

This project forked from allenai/openpi-dataset

OpenPI dataset for tracking entities in open domain procedural text

License: MIT License

Python 97.92% Shell 2.08%

Openpi-dataset

OpenPI dataset for tracking entities in open domain procedural text (EMNLP 2020)

Paper: https://www.aclweb.org/anthology/2020.emnlp-main.520.pdf

Project page: https://allenai.org/data/openpi

Dataset

OpenPI dataset files are available in JSON Lines format under openpi-dataset/data/gold/. There are four files:

  • id_question.jsonl: each line is a JSON object with an id, plus the input sentence and the sentences preceding it, i.e., "x".
  • id_question_metadata.jsonl: the metadata corresponding to the question, such as the topic. Each line is a JSON object with an id and the metadata.
  • id_answers.jsonl: each line is a JSON object with an id and a list of answers, i.e., "y".
  • id_answers_metadata.jsonl: the metadata corresponding to the answer. Each line is a JSON object with an id and the metadata (such as entity, attribute, before value, after value).
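Because all four files share an id, one split can be loaded into a single dictionary keyed by that id. The following is a minimal sketch, not code from this repository: the helper names read_jsonl and load_split are hypothetical, and it assumes the identifier is stored under the key "id" in every record.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_split(split_dir, id_key="id"):
    """Join the four files of one split on their shared id.

    Assumption: every record stores its identifier under `id_key`.
    """
    file_names = {
        "question": "id_question.jsonl",
        "question_metadata": "id_question_metadata.jsonl",
        "answers": "id_answers.jsonl",
        "answers_metadata": "id_answers_metadata.jsonl",
    }
    examples = {}
    for role, file_name in file_names.items():
        for record in read_jsonl(Path(split_dir) / file_name):
            # Group the record for this file under its id.
            examples.setdefault(record[id_key], {})[role] = record
    return examples
```

For example, load_split("data/gold/test") would return one entry per id, each holding up to four records named after the file they came from.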

Training

To train the model, run the script below. Its hyperparameters can be modified in the script itself.

sh scripts/training_bash.sh

Run Prediction

To run predictions on a single file (replace the placeholder paths; --max_len takes an integer, e.g., 200):

python src/model/generation.py \
      --model_path /path/to/trained_model \
      --test_input_file /path/to/input_file \
      --unformatted_outpath /path/to/store/unformatted_predictions \
      --formatted_outpath /path/to/store/formatted/predictions \
      --max_len 200

To run predictions on multiple files, you can use this bash script:

sh scripts/predictions_bash.sh
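
As an alternative to the bash script, the single-file command can be repeated from Python. This is only a sketch and does not reproduce what scripts/predictions_bash.sh does. The flags are the ones shown for src/model/generation.py, while the function names, the output file naming scheme, and the default max_len of 200 are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_prediction_command(model_path, input_file, out_dir, max_len=200):
    """Return the argument list for one call to src/model/generation.py."""
    stem = Path(input_file).stem
    out_dir = Path(out_dir)
    return [
        sys.executable, "src/model/generation.py",
        "--model_path", str(model_path),
        "--test_input_file", str(input_file),
        # Hypothetical naming scheme: one pair of output files per input file.
        "--unformatted_outpath", str(out_dir / f"{stem}.unformatted.jsonl"),
        "--formatted_outpath", str(out_dir / f"{stem}.formatted.jsonl"),
        "--max_len", str(max_len),
    ]


def predict_all(model_path, input_files, out_dir, dry_run=True):
    """Build one command per input file and run them unless dry_run is set."""
    commands = [
        build_prediction_command(model_path, input_file, out_dir)
        for input_file in input_files
    ]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing prediction run.
            subprocess.run(command, check=True)
    return commands
```

With dry_run=True the function only returns the commands, which makes it easy to inspect them before launching long prediction runs from the repository root.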

Run Evaluation

python eval/simple_eval.py \
    -g data/gold/test/id_answers.jsonl \
    -p /path/to/formatted/predictions \
    --quiet

(With --quiet, no diagnostics file is generated.)

To run evaluation on multiple files, you can use this bash script:

sh scripts/evaluations_bash.sh
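
Before running the evaluation, it can help to confirm that the predictions cover the same ids as the gold file. The sketch below is not part of eval/simple_eval.py. It assumes the formatted predictions are a JSON Lines file whose records carry an "id" key like the gold answers, and the function names are hypothetical.

```python
import json


def read_ids(path, id_key="id"):
    """Collect the ids found in a JSON Lines file (blank lines are skipped)."""
    with open(path, encoding="utf-8") as handle:
        return {json.loads(line)[id_key] for line in handle if line.strip()}


def compare_ids(gold_path, prediction_path, id_key="id"):
    """Return (ids missing from the predictions, ids not present in gold).

    Assumption: both files are JSON Lines with the id stored under `id_key`.
    """
    gold_ids = read_ids(gold_path, id_key)
    predicted_ids = read_ids(prediction_path, id_key)
    return gold_ids - predicted_ids, predicted_ids - gold_ids
```

Two empty sets mean that every gold id has a prediction and that no prediction refers to an unknown id.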

Hyperparameters

To match the results published in the paper, use the hyperparameters listed at https://github.com/allenai/openpi-dataset/blob/main/hyperparams.md

Leaderboard

coming soon... (we are now working on openpi v2 which clusters

Citation

If you use this dataset in your work, please cite:

@inproceedings{tandon-etal-2020-dataset,
    title = "A Dataset for Tracking Entities in Open Domain Procedural Text",
    author = "Tandon, Niket  and
      Sakaguchi, Keisuke  and
      Dalvi, Bhavana  and
      Rajagopal, Dheeraj  and
      Clark, Peter  and
      Guerquin, Michal  and
      Richardson, Kyle  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.520",
    doi = "10.18653/v1/2020.emnlp-main.520",
    pages = "6408--6417"
}
