GithubHelp home page GithubHelp logo

ahoho / dart Goto Github PK

View Code? Open in Web Editor NEW

This project forked from yale-lily/dart

0.0 2.0 0.0 267.38 MB

License: MIT License

Shell 0.16% Python 1.39% Perl 1.27% Java 14.20% TeX 0.03% Lex 81.73% Jupyter Notebook 1.21%

dart's Introduction

DART: Open-Domain Structured Data Record to Text Generation

DART is a large and open-domain structured DAta Record to Text generation corpus with high-quality sentence annotations with each input being a set of entity-relation triples following a tree-structured ontology. It consists of 82191 examples across different domains with each input being a semantic RDF triple set derived from data records in tables and the tree ontology of table schema, annotated with sentence description that covers all facts in the triple set.

DART is released in the following paper where you can find more details and baseline results.

Citation

@article{radev2020dart,
  title={DART: Open-Domain Structured Data Record to Text Generation},
  author={Dragomir Radev and Rui Zhang and Amrit Rau and Abhinand Sivaprasad and Chiachun Hsieh and Nazneen Fatema Rajani and Xiangru Tang and Aadit Vyas and Neha Verma and Pranav Krishna and Yangxiaokang Liu and Nadia Irwanto and Jessica Pan and Faiaz Rahman and Ahmad Zaidi and Murori Mutuma and Yasin Tarabar and Ankit Gupta and Tao Yu and Yi Chern Tan and Xi Victoria Lin and Caiming Xiong and Richard Socher},
  journal={arXiv preprint arXiv:2007.02871},
  year={2020}

Data Content and Format

The DART dataset is available in the data/ directory. The dataset consists of JSON files in data/. Each JSON file contains a list of tripleset-annotation pairs of the form:

  {
    "tripleset": [
      [
        "Ben Mauk",
        "High school",
        "Kenton"
      ],
      [
        "Ben Mauk",
        "College",
        "Wake Forest Cincinnati"
      ]
    ],
    "subtree_was_extended": false,
    "annotations": [
      {
        "source": "WikiTableQuestions_lily",
        "text": "Ben Mauk, who attended Kenton High School, attended Wake Forest Cincinnati for college."
      }
    ]
  }

We also provide delexicalization dictionaries in data/**/delex/ that map string entities to entity categories.

Leaderboard

We maintain a leaderboard on our test set.

Model BLEU METEOR TER MoverScore BERTScore BLEURT
BART (Lewis et al., 2020) 37.06 0.36 0.57 0.44 0.92 0.22
Seq2Seq-Att (MELBOURNE) 29.60 0.28 0.62 0.32 0.90 -0.11
End-to-End Transformer (Castro Ferreira et al., 2019) 19.87 0.26 0.65 0.28 0.87 -0.20

dart's People

Contributors

amritrau avatar tangxiangru avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.