
yale-lily / dart


Dataset for NAACL 2021 paper: "DART: Open-Domain Structured Data Record to Text Generation"

License: MIT License

Shell 0.16% Python 1.52% Perl 1.26% Java 14.16% TeX 0.03% Lex 81.65% Jupyter Notebook 1.21%

dart's People

Contributors: amritrau, linyongnan, tangxiangru


dart's Issues

Missing annotations in test set

Hi! 😃

Great data augmentation you've done here!

I have noticed missing lexicalizations/annotations for some entries in your test set (for example, eid=Id1392 in your dart-v1.1.1-full-test.xml file).
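For reference, here is a quick way to list such entries (a minimal sketch, assuming the WebNLG-style XML layout where each <entry> carries an eid attribute and its annotations appear as <lex> children):

```python
# Minimal sketch: print the eid of every entry with no <lex> annotations.
# Assumes the WebNLG-style layout of the full-*.xml files.
import xml.etree.ElementTree as ET

tree = ET.parse("dart-v1.1.1-full-test.xml")
for entry in tree.getroot().iter("entry"):
    if not entry.findall("lex"):
        print(entry.get("eid"))  # e.g. Id1392
```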

Do you plan to add those?

Thanks!

About the size of the DART dataset and its performance

Recently, I used GPT to do generation on the DART dataset. However, I found that the test set may differ from the one used in other works. In fact, I can only get 5,097 samples for testing, while the GEM website says their test set has 12,552. The data provided in (Li et al., 2021) (https://github.com/XiangLi1999/PrefixTuning) also has 12,552 samples, but they do not provide gold references.
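One likely explanation is that the 12,552 figure counts flattened (tripleset, reference) pairs rather than unique triplesets. A quick check (a minimal sketch, assuming the released JSON format where each entry holds one tripleset and a list of annotations):

```python
# Minimal sketch: unique inputs vs. flattened (input, reference) pairs.
# Assumes the DART JSON format: a list of entries, each with a "tripleset"
# and an "annotations" list.
import json

with open("dart-v1.1.1-full-test.json") as f:
    data = json.load(f)

n_inputs = len(data)                                # unique triplesets
n_pairs = sum(len(e["annotations"]) for e in data)  # one row per reference
print(n_inputs, n_pairs)  # 5,097 vs. 12,552 would mean the two figures count different things
```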

With the official evaluation scripts and test set, I obtain about 37-38 BLEU, which is much lower than the results (46-47 BLEU) reported by (Li et al., 2021) and other works (like the leaderboard in this repo: https://github.com/Yale-LILY/dart). So I am confused about which one is correct.

Could you please answer these questions if possible? I would appreciate it.

Reference

  1. Li, X. L., & Liang, P. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv preprint arXiv:2101.00190.

Number of references used for training and testing

I noticed that generate_input_dart.py uses only 3 references for evaluation, but some examples have many more. I was wondering if you could provide more details about the results in the paper; I can't seem to replicate them. Also, if you could share the T5 and BART models fine-tuned on DART, that would be very helpful.
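To illustrate how much the reference cap can matter, here is a toy comparison (a minimal sketch with placeholder data, not the official evaluation script; NLTK's corpus_bleu accepts a different number of references per hypothesis):

```python
# Minimal sketch: corpus BLEU with all references vs. only the first 3.
from nltk.translate.bleu_score import corpus_bleu

hyps = ["the cat sat on the mat", "he read the book quickly"]
refs = [
    ["a cat was sitting on the mat",
     "there is a cat on the mat",
     "a cat sat upon the mat",
     "the cat sat on the mat"],    # 4 references; only the last one matches
    ["he read the book quickly"],  # 1 reference
]

hyp_tok = [h.split() for h in hyps]
all_refs = [[r.split() for r in rs] for rs in refs]    # keep every reference
capped = [[r.split() for r in rs[:3]] for rs in refs]  # first 3 only

print(corpus_bleu(all_refs, hyp_tok))  # higher: the matching 4th reference counts
print(corpus_bleu(capped, hyp_tok))    # lower: the matching reference was dropped
```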

Update evaluation references used for v1.1.1

The references in /evaluation/dart_reference are not for the current version. Could you replace them with the new references and share the tokenization script that is applied to the predictions?

I am getting very different BLEU scores depending on tokenization and on how many references I use, as a few examples have up to ~30 references.
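To show the tokenization part concretely (a minimal sketch on toy data, not tied to the DART scripts), scoring the same output under two sacrebleu tokenizers already moves the number:

```python
# Minimal sketch: identical hypothesis, two sacrebleu tokenizers.
import sacrebleu

hyps = ["The quick brown fox jumps over the lazy dog."]
refs = [["The quick brown fox jumps over the lazy dog ."]]  # one reference stream

for tok in ("13a", "none"):
    # "13a" splits off punctuation; "none" scores raw whitespace tokens.
    print(tok, sacrebleu.corpus_bleu(hyps, refs, tokenize=tok).score)
```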

I would like to directly compare against the README leaderboard.
