GithubHelp home page GithubHelp logo

2021_nli_datasets's Introduction

NLI_datasets: Datasets on inference

Dataset format:

premise | hypothesis| label

File format:

csv

FRACAS

  • paper: @MISC{Fracas96, author = {{The Fracas Consortium} and Robin Cooper and Dick Crouch and Jan Van Eijck and Chris Fox and Josef Van Genabith and Jan Jaspars and Hans Kamp and David Milward and Manfred Pinkal and Massimo Poesio and Steve Pulman and Ted Briscoe and Holger Maier and Karsten Konrad}, title = {Using the Framework}, year = {1996} }

  • year: 1996

  • type: NLI dataset

  • adapted from https://nlp.stanford.edu/~wcmac/downloads/fracas.xml We took P1, ..Pn as premise H as hypothesis label = {'yes': "entailment", 'no': 'contradiction', 'undef': "neutral", 'unknown': "neutral"}

    And we randomly split 80/20 for train/dev.

  • link: https://drive.google.com/open?id=13-gDw3lnxqnqYwXQUdH0bPsPPZXdD5n7

RTE

  • paper: @inproceedings{wang-etal-2018-glue, title = "{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding", author = "Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel", booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}", month = nov, year = "2018", address = "Brussels, Belgium", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/W18-5446", pages = "353--355", abstract = "For natural language understanding (NLU) technology to be maximally useful, it must be able to process language in a way that is not exclusively tailored to a specific task, genre, or dataset.", }
  • year: 2006, 2007, 2009 -> (created from RTE1, RTE2, RTE3, RTE5)
  • type: RTE dataset
  • link: https://drive.google.com/open?id=1-7xH81M__XsKF7Uog5nemoNppabiKqZy

COPA

  • paper: @inproceedings{wang-etal-2019-superglue, title = "{SuperGLUE}: A Stickier Benchmark for General-Purpose Language Understanding Systems", author = "Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel", year = "2019", url = "https://w4ngatang.github.io/static/papers/superglue.pdf"}

  • year: 21 March 2011 (constructed from http://people.ict.usc.edu/~gordon/copa.html)

  • type: RTE dataset

  • note: I have modified the original dataset as follows: original:

      premise                                |       choice1       |          choice2            |        label
      My body cast a shadow over the grass   |  The sun was rising |     The grass was cut       |          0
    
     transformed:
    
       premise                              |    hypothesis     |   label
       My body cast a shadow over the grass | The sun was rising| entailment
       My body cast a shadow over the grass | The grass was cut | not_entailment
    
  • link: https://drive.google.com/open?id=1UYXo9OQOnu51yjunHGBwSa6Jj_YFoCLR

WNLI

  • paper: @inproceedings{wang-etal-2018-glue, title = "{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding", author = "Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel", booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}", month = nov, year = "2018", address = "Brussels, Belgium", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/W18-5446", pages = "353--355", abstract = "For natural language understanding (NLU) technology to be maximally useful, it must be able to process language in a way that is not exclusively tailored to a specific task, genre, or dataset.", }
  • year: January 2012 -> (created from The Winograd Schema Challenge)
  • type: RTE dataset
  • link: https://drive.google.com/open?id=1R1aR18dC4ke1TUSzDl8Pwr4VviDgpX_q

SICK

SNLI

  • paper: @inproceedings{snli:emnlp2015, Author = {Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher, and Manning, Christopher D.}, Booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, Publisher = {Association for Computational Linguistics}, Title = {A large annotated corpus for learning natural language inference}, Year = {2015}}
  • year: 2015
  • type: NLI dataset
  • link: https://drive.google.com/open?id=1k1Mj0-vVdGuPkx3BaP_BGCUyRb-97Mx2

Add-one RTE

QNLI

  • paper: @inproceedings{wang-etal-2018-glue, title = "{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding", author = "Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel", booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}", month = nov, year = "2018", address = "Brussels, Belgium", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/W18-5446", pages = "353--355", abstract = "For natural language understanding (NLU) technology to be maximally useful, it must be able to process language in a way that is not exclusively tailored to a specific task, genre, or dataset.", }
  • year: 11 october 2016 -> (created from The Standford Question Answering Dataset)
  • type: RTE dataset
  • link: https://drive.google.com/open?id=1_5BK3fWBzQouS8XtLGx57TFQhRlAjRTN

MNLI

  • paper: @InProceedings{williams2018broad, author = {Williams, Adina and Nangia, Nikita and Bowman, Samuel R.}, title = {A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference}, booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, year = {2018}, publisher = {Association for Computational Linguistics}}
  • year: 18 Apr 2017
  • type: NLI dataset
  • link: https://drive.google.com/open?id=1zW3D9E6uVcKRvAEsS_inI31J4QO1J3po

JOCI

IIE

  • paper: @inproceedings{white-etal-2017-inference, title = "Inference is Everything: Recasting Semantic Resources into a Unified Evaluation Framework", author = "White, Aaron Steven and Rastogi, Pushpendre and Duh, Kevin and Van Durme, Benjamin", booktitle = "Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)", month = nov, year = "2017", address = "Taipei, Taiwan", publisher = "Asian Federation of Natural Language Processing", url = "https://www.aclweb.org/anthology/I17-1100", pages = "996--1005", abstract = "We propose to unify a variety of existing semantic classification tasks, such as semantic role labeling, anaphora resolution, and paraphrase detection, under the heading of Recognizing Textual Entailment (RTE). We present a general strategy to automatically generate one or more sentential hypotheses based on an input sentence and pre-existing manual semantic annotations. The resulting suite of datasets enables us to probe a statistical RTE model{'}s performance on different aspects of semantics. We demonstrate the value of this approach by investigating the behavior of a popular neural network RTE model.", }
  • year: 27 NOV 2017
  • type: RTE dataset
  • collect from: http://decomp.io/projects/diverse-natural-language-inference/
  • link: https://drive.google.com/open?id=1vYE0lAMf_G0iLKCV9oE31xxnxQPZbLTv

MPE

  • paper: @inproceedings{lai-etal-2017-natural, title = "Natural Language Inference from Multiple Premises", author = "Lai, Alice and Bisk, Yonatan and Hockenmaier, Julia", booktitle = "Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)", month = nov, year = "2017", address = "Taipei, Taiwan", publisher = "Asian Federation of Natural Language Processing", url = "https://www.aclweb.org/anthology/I17-1011", pages = "100--109", abstract = "We define a novel textual entailment task that requires inference over multiplepremise sentences. We present a new dataset for this task that minimizestrivial lexical inferences, emphasizes knowledge of everyday events, andpresents a more challenging setting for textual entailment. We evaluate severalstrong neural baselines and analyze how the multiple premise task differs fromstandard textual entailment.", }
  • year: 27 NOV 2017
  • type: NLI dataset
  • link: https://drive.google.com/open?id=1oH5YpPg1gkddrrV6TRA5tA7zxnVkjqiX

Scitail

  • paper: @inproceedings{scitail, Author = {Tushar Khot and Ashish Sabharwal and Peter Clark}, Booktitle = {AAAI} Title = {SciTail: A Textual Entailment Dataset from Science Question Answering}, Year = {2018} }
  • year: 27 Apr 2018
  • type: RTE dataset
  • alteration: I have changed the labels: {"neutral": "not_entailment", "entailment": "entailment"}
  • link: https://drive.google.com/open?id=11VMy9l6RNBXfvgD9_GfMHHj8e95gokel

Commitment Bank

2021_nli_datasets's People

Contributors

felipessalvatore avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.