ukplab / emnlp2015-crowdsourcing

License: Apache License 2.0

Java 75.08% Groovy 24.92%


Leveraging crowdsource annotation item agreement for natural language tasks

This project runs experiments comparing soft labeling and agreement-based filtering with standard label aggregation for learning classification models on natural language tasks. It contains the experiment code described in the paper "Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks" (Jamison and Gurevych, 2015).

Please use the following citation:

@inproceedings{TUD-CS-2015-1179,
	author = {Emily Jamison and Iryna Gurevych},
	title = {Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks},
	month = sep,
	year = {2015},
	publisher = {Association for Computational Linguistics},
	booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
	pages = {291--297},
	address = {Lisbon, Portugal},
	pubkey = {TUD-CS-2015-1179},
	research_area = {Ubiquitous Knowledge Processing},
	research_sub_area = {UKP_reviewed},
	url = {https://aclweb.org/anthology/D/D15/D15-1035.pdf}
}

Abstract: In order to reduce noise in training data, most natural language crowdsourcing annotation tasks gather redundant labels and aggregate them into an integrated label, which is provided to the classifier. However, aggregation discards potentially useful information from linguistically ambiguous instances. For five natural language tasks, we pass item agreement on to the task classifier via soft labeling and low-agreement filtering of the training dataset. We find a statistically significant benefit from low item agreement training filtering in four of our five tasks, and no systematic benefit from soft labeling.
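The two treatments of item agreement described in the abstract can be illustrated with a small sketch. This is not the project's code; the label names and the 0.8 threshold are hypothetical examples. Given an item's redundant crowd votes, it derives a soft label (the vote distribution), an item-agreement score (the majority label's share), and a low-agreement filter decision:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch (not the project's code): deriving a soft label,
 * an item-agreement score, and a low-agreement filter from the
 * redundant crowd annotations of a single item.
 */
public class AgreementSketch {

    /** Soft label: the fraction of annotators voting for each label. */
    static Map<String, Double> softLabel(List<String> votes) {
        Map<String, Double> dist = new HashMap<>();
        for (String v : votes) {
            dist.merge(v, 1.0 / votes.size(), Double::sum);
        }
        return dist;
    }

    /** Item agreement: the share of votes held by the majority label. */
    static double agreement(List<String> votes) {
        return Collections.max(softLabel(votes).values());
    }

    /** Low-agreement filtering: keep only items at or above a threshold. */
    static boolean keep(List<String> votes, double threshold) {
        return agreement(votes) >= threshold;
    }

    public static void main(String[] args) {
        // Three crowd votes on one hypothetical RTE item.
        List<String> votes = Arrays.asList("entail", "entail", "nonentail");
        System.out.println(softLabel(votes)); // distribution over both labels
        System.out.println(agreement(votes)); // majority share, roughly 0.67
        System.out.println(keep(votes, 0.8)); // false: filtered out at 0.8
    }
}
```

Aggregation would keep only the majority label "entail"; soft labeling passes the whole distribution to the classifier, while filtering drops the item entirely when agreement falls below the threshold.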

Contact person: Emily Jamison, EmilyKJamison {at} gmail {dot} com

http://www.ukp.tu-darmstadt.de/

http://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Project structure

  • src/main/groovy/de/tudarmstadt/ukp/experiments/ej/repeatwithcrowdsource -- this folder contains the experiment code for the five natural language tasks
  • src/main/resources/scripts -- this folder contains the Groovy files where experiment parameters may be set
  • Please note: 3rd-party datasets must be downloaded from elsewhere

Requirements

  • Java 1.7 or higher
  • Maven
  • 2 GB RAM (-Xmx2g); tested on 64-bit Linux

Installation

  • Follow the DKPro Core instructions to set your DKPro Home environment variable.

  • All dependencies are available on Maven Central; no 3rd-party projects need to be installed.

  • You will need to obtain the necessary corpora for the respective experiment you plan to run. Corpora and locations are described in (Jamison & Gurevych 2015), cited above.

  • For all experiments except Affective Text, prepare your corpus for our experiment architecture by dividing instances into cross-validation rounds of training and test data. We created "dev" and "final" batches of "train" and "test" datasets, resulting in (for RTE):

rte_orig.r0.devtest.txt
rte_orig.r0.devtrain.txt
rte_orig.r0.finaltest.txt
rte_orig.r0.finaltrain.txt
rte_orig.r1.devtest.txt
rte_orig.r1.devtrain.txt
rte_orig.r1.finaltest.txt
rte_orig.r1.finaltrain.txt
etc.
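The naming scheme above can be enumerated programmatically. This is only a sketch of the pattern; the "rte_orig" prefix and the number of cross-validation rounds are example values, not fixed by the project:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the cross-validation file-naming scheme shown above.
 * The corpus prefix and round count are illustrative assumptions.
 */
public class CvFileNames {

    static List<String> cvFileNames(String corpusPrefix, int rounds) {
        List<String> names = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            // "dev" and "final" batches, each with "test" and "train" splits.
            for (String batch : new String[] {"dev", "final"}) {
                for (String split : new String[] {"test", "train"}) {
                    names.add(corpusPrefix + ".r" + r + "." + batch + split + ".txt");
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the eight file names for rounds r0 and r1.
        cvFileNames("rte_orig", 2).forEach(System.out::println);
    }
}
```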

For each experiment, update file locations in the Groovy file in src/main/resources/scripts (such as method runManualCVExperiment()).

Running the experiments

To run an experiment, first set the experiment parameters in the respective Groovy file in src/main/resources/scripts; in particular, you may wish to change the path to your corpus, the parameter instanceModeTrain, the feature set, or the feature parameters.

Then run the respective "RunXXXExperiment" class in src/main/groovy/EXPERIMENTTORUN/. For example, to run the Biased Language experiment, run the class src/main/groovy/biasedlanguage/RunBiasedLangExperiment.

Affective Text experiments run in a few seconds, while POS Tagging experiments may take several hours.

Expected results

After running the experiments, results should be printed to stdout. They can also be found in your DKPro home folder, under de.tudarmstadt.ukp.dkpro.lab/repository. You can change which results get printed from src/main/groovy/util/CombineTestResultsRegression or CombineTestResultsClassification, as appropriate. The Biased Language and Affective Text tasks use regression, while Stemming, RTE, and POS Tagging use classification.

Contributors

betoboullosa, emilykjamison
