This project forked from keisks/jfleg

JFLEG (JHU FLuency-Extended GUG) corpus for Grammatical Error Correction Evaluation

JFLEG (JHU FLuency-Extended GUG) corpus

Last updated: December 7th, 2018

(Make sure to download and use the latest version.)

link to the paper


Data

.
├── EACL_exp      # experiments in the EACL paper
│   ├── m2converter # script to create m2 format from plain texts
│   ├── mturk     # mechanical turk experiments
│   │   ├── sample.csv
│   │   ├── pairwise.csv
│   │   └── template.html
│   └── manual_eval # manual analysis of 100 sentences
│       ├── README.md
│       └── coded_sentences.csv
├── README.md     # This file
├── EACLshort037.pdf
├── dev           # dev set (754 sentences originally from the GUG **test** set)
│   ├── dev.ref0
│   ├── dev.ref1
│   ├── dev.ref2
│   ├── dev.ref3
│   ├── dev.spellchecked.src  # source spellchecked with enchant
│   └── dev.src   # source (This should be the input for your system.)
├── eval
│   └── gleu.py   # evaluation script (sentence-level GLEU score)
└── test          # test set (747 sentences originally from the GUG **dev** set)
    ├── test.ref0
    ├── test.ref1
    ├── test.ref2
    ├── test.ref3
    ├── test.spellchecked.src  # source spellchecked with enchant
    └── test.src   # source (This should be the input for your system.)
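
The layout above pairs each source file with four reference files. As a minimal sketch of reading one split, assuming the files are UTF-8, contain one sentence per line, and are line-aligned with each other (none of which is stated explicitly here), a loader might look like this. The function name `load_split` and its parameters are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List, Tuple


def read_lines(path: Path) -> List[str]:
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:  # UTF-8 is an assumption
        return [line.rstrip("\n") for line in f]


def load_split(root: str, split: str, n_refs: int = 4) -> Tuple[List[str], List[List[str]]]:
    """Load <root>/<split>/<split>.src and <split>.ref0 .. ref{n_refs-1}.

    Returns (sources, references), where references[k] lists the
    corrections for sources[k], one per reference file.
    """
    base = Path(root) / split
    sources = read_lines(base / f"{split}.src")

    per_file = []
    for i in range(n_refs):
        ref_lines = read_lines(base / f"{split}.ref{i}")
        # Assumed invariant: every reference file is line-aligned with the source.
        if len(ref_lines) != len(sources):
            raise ValueError(
                f"{split}.ref{i} has {len(ref_lines)} lines, expected {len(sources)}"
            )
        per_file.append(ref_lines)

    # Transpose from per-file lists to per-sentence lists.
    references = [list(refs) for refs in zip(*per_file)]
    return sources, references
```

Calling `load_split(".", "dev")` from the repository root would then return the dev sources together with their four references per sentence.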

Evaluation

e.g. python ./eval/gleu.py -r ./dev/dev.ref[0-3] -s ./dev/dev.src --hyp YOUR_SYSTEM_OUTPUT
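
If you prefer to call the script from Python rather than a shell, the same command can be assembled as an argument list. This is only a sketch: the flags `-r`, `-s`, and `--hyp` are taken from the example above, while the helper names `build_gleu_command` and `run_gleu`, the default interpreter name, and the choice to return raw standard output are assumptions. The shell pattern `dev.ref[0-3]` is expanded here into the four explicit file names.

```python
import subprocess
from typing import List


def build_gleu_command(
    hyp_path: str,
    split: str = "dev",
    n_refs: int = 4,
    script: str = "./eval/gleu.py",
    interpreter: str = "python",  # assumption: same interpreter as in the example
) -> List[str]:
    """Build the argument list equivalent to the shell command above."""
    refs = [f"./{split}/{split}.ref{i}" for i in range(n_refs)]
    return [
        interpreter, script,
        "-r", *refs,
        "-s", f"./{split}/{split}.src",
        "--hyp", hyp_path,
    ]


def run_gleu(hyp_path: str, split: str = "dev") -> str:
    """Run the evaluation script and return whatever it prints to stdout.

    The output format of gleu.py is not described here, so it is
    returned unparsed.
    """
    result = subprocess.run(
        build_gleu_command(hyp_path, split),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Run it from the repository root so that the relative paths resolve, for example `print(run_gleu("YOUR_SYSTEM_OUTPUT"))`.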

Leader Board (published results)

N.B. Systems marked with an asterisk (*) are tuned on different data.

System                                   GLEU (dev)  GLEU (test)
Ge et al. (2018)                         N/A         62.42
Grundkiewicz and Junczys-Dowmunt (2018)  N/A         61.50
Junczys-Dowmunt et al. (2018)            N/A         59.90
Chollampatt and Ng (2018)                52.48       57.47
Chollampatt and Ng (2017)                51.01       56.78
Xie et al. (2018)*                       N/A         56.20
Sakaguchi et al. (2017)                  49.82       53.98
Ji et al. (2017)*                        48.93       53.41
Yuan and Briscoe (2016)*                 47.20       52.05
Junczys-Dowmunt and Grundkiewicz (2016)  49.74       51.46
Chollampatt et al. (2016)*               46.27       50.13
Felice et al. (2014)*                    42.81       46.04
=======================================  ==========  ===========
SOURCE                                   38.21       40.54
REFERENCE                                55.26       62.37
  • To add your score, please e-mail keisuke[at]cs.jhu.edu with a link to your paper and your system outputs.
  • The REFERENCE scores are computed by averaging the scores of the individual references.

Reference

The following papers should be cited in any publications that use this dataset:

Courtney Napoles, Keisuke Sakaguchi and Joel Tetreault. (EACL 2017): JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain. April 03-07, 2017.

Michael Heilman, Aoife Cahill, Nitin Madnani, Melissa Lopez, Matthew Mulholland, and Joel Tetreault. (ACL 2014): Predicting Grammaticality on an Ordinal Scale. In Proceedings of the Association for Computational Linguistics. Baltimore, MD, USA. June 23-25, 2014.

bibtex information:

@InProceedings{napoles-sakaguchi-tetreault:2017:EACLshort,
  author    = {Napoles, Courtney  and  Sakaguchi, Keisuke  and  Tetreault, Joel},
  title     = {JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction},
  booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
  month     = {April},
  year      = {2017},
  address   = {Valencia, Spain},
  publisher = {Association for Computational Linguistics},
  pages     = {229--234},
  url       = {http://www.aclweb.org/anthology/E17-2037}
}

@InProceedings{heilman-EtAl:2014:P14-2,
  author    = {Heilman, Michael  and  Cahill, Aoife  and  Madnani, Nitin  and  Lopez, Melissa  and  Mulholland, Matthew  and  Tetreault, Joel},
  title     = {Predicting Grammaticality on an Ordinal Scale},
  booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  month     = {June},
  year      = {2014},
  address   = {Baltimore, Maryland},
  publisher = {Association for Computational Linguistics},
  pages     = {174--180},
  url       = {http://www.aclweb.org/anthology/P14-2029}
}

Questions

  • Please e-mail Courtney Napoles (napoles[at]cs.jhu.edu) and Keisuke Sakaguchi (keisuke[at]cs.jhu.edu).

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
