RankME: Reliable Human Ratings for NLG
Authors: Jekaterina Novikova, Ondrej Dusek and Verena Rieser
This repository contains the dataset and code released with the submission of our NAACL 2018 paper "RankME: Reliable Human Ratings for Natural Language Generation".
Contents
crowdflower:
This folder contains instructions, CML, CSS and JS code used in CrowdFlower tasks.
data:
This folder contains data files with human evaluation ratings collected via CrowdFlower.
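A quick way to get an overview of the rating files is to list each file with its columns and row count. The sketch below is illustrative only: it assumes the files under data are comma-separated with a header row, which this README does not state, and the function name summarize_csv_files is hypothetical rather than part of the released code.

```python
import csv
from pathlib import Path


def summarize_csv_files(data_dir):
    """Map each CSV file under data_dir to (column names, number of data rows).

    Assumption: files are comma-separated with a header row. The actual
    format of the released files is not specified in this README.
    """
    root = Path(data_dir)
    summary = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # an empty file yields no columns
            row_count = sum(1 for _ in reader)  # remaining rows are data
        summary[path.relative_to(root).as_posix()] = (header, row_count)
    return summary


if __name__ == "__main__":
    for name, (columns, rows) in summarize_csv_files("data").items():
        print(f"{name}: {rows} rows, columns = {columns}")
```

Run from the repository root, this prints one line per file it finds. If the files use another delimiter or extension, adjust the glob pattern and the csv.reader arguments accordingly.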
Description
Setup 1 is the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected together. The folder crowdflower/setup_1 contains the three versions of the Setup 1 code described in the paper: Likert, PlainME and RankME. Screenshots of the corresponding CrowdFlower tasks are shown in Fig.1:
Fig.1. Screenshots of the three methods used with Setup 1 to collect human evaluation data. Left to right: Likert, PlainME and RankME.
Setup 2 is the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected separately. The folder crowdflower/setup_2 provides CrowdFlower code for the three collection methods (Likert, PlainME and RankME) for each rating criterion (informativeness, naturalness and quality). Screenshots of the RankME method in Setup 2 for informativeness, naturalness and quality are shown in Fig.2:
Fig.2. Screenshots of the RankME method in Setup 2 used to collect human evaluation data. Left to right: informativeness, naturalness, quality.
Citing
If you use this code or data in your work, please cite the following paper:
@inproceedings{novikova2018rankME,
title={Rank{ME}: Reliable Human Ratings for Natural Language Generation},
author={Novikova, Jekaterina and Du{\v{s}}ek, Ondrej and Rieser, Verena},
booktitle={Proceedings of the 16th Annual Conference of the North American Chapter
of the Association for Computational Linguistics},
address={New Orleans, Louisiana},
pages={72--78},
year={2018},
url={http://aclweb.org/anthology/N18-2012},
}
License
Distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
Acknowledgements
This research received funding from the EPSRC projects DILiGENt (EP/M005429/1) and MaDrIgAL (EP/N017536/1).