
BiasSum

Data and code for "Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization" by Taehee Jung*, Dongyeop Kang*, Lucas Mentch and Eduard Hovy (*equal contribution), EMNLP 2019. If you have any questions, please contact Dongyeop Kang ([email protected]).

We provide a platform (BiasSum.com) for analyzing the biases of your summarization system across different summarization corpora. Please evaluate your system on datasets from different domains and with multiple metrics, and measure its overall robustness against these biases.

Citation

@inproceedings{jungkang19emnlp_biassum,
    title = {Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization},
    author = {Taehee Jung and Dongyeop Kang and Lucas Mentch and Eduard Hovy},
    booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    address = {Hong Kong},
    month = {November},
    url = {https://arxiv.org/abs/1908.11723},
    year = {2019}
}

Note

  • Some of the code is still under development and will be refactored soon.
  • If you would like to add a new dataset or a new evaluation metric, please contact Dongyeop.

Installation

Please download the nine pre-processed summarization corpora listed under the [task] tab. Every corpus uses the same dataset format:

Dataset format:
[source sentences] \t [target sentences]
Example:
<s> I was at home .. </s> <s> It was rainy day ..</s> ... \t <s> Sleeping at home rainy day </s> ..
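To make the format concrete, here is a minimal Python sketch of parsing one line. It assumes each line holds exactly one tab-separated source/target pair and that sentences are wrapped in `<s> ... </s>` tags as in the example above. The function names `parse_line` and `split_sentences` are hypothetical and are not taken from example/data_load.py.

```python
import re
from typing import List, Tuple

# Matches one sentence wrapped in <s> ... </s>; the non-greedy group keeps
# adjacent sentences separate instead of swallowing everything up to the last </s>.
SENT_RE = re.compile(r"<s>\s*(.*?)\s*</s>", re.DOTALL)


def split_sentences(field: str) -> List[str]:
    """Return the sentences enclosed in <s> ... </s> tags.

    Assumption: a field with no tags is treated as a single sentence,
    so plain-text lines still yield something usable.
    """
    sents = [s for s in SENT_RE.findall(field) if s]
    if not sents and field.strip():
        return [field.strip()]
    return sents


def parse_line(line: str) -> Tuple[List[str], List[str]]:
    """Split one '[source] \\t [target]' line into source and target sentence lists.

    Raises ValueError if the line contains no tab separator.
    """
    source, target = line.rstrip("\r\n").split("\t", 1)
    return split_sentences(source), split_sentences(target)
```

Usage, on a line shaped like the example:

```python
src, tgt = parse_line("<s> I was at home .. </s> <s> It was rainy day ..</s>\t<s> Sleeping at home rainy day </s>\n")
# src == ["I was at home ..", "It was rainy day .."]
# tgt == ["Sleeping at home rainy day"]
```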

An example Python script for loading a dataset is provided in example/data_load.py:

python example/data_load.py --dataset AMI

Summarization Corpora

Please check the [task] tab (BiasSum.com/task) for more details. If you would like to download all the pre-processed datasets at once, please download them here.

NOTE: the links are currently unavailable. Please download the pre-processed datasets here.

Type       Name         Preprocessed Dataset   Original
News       CNNDM        link                   link
News       NewsRoom     link                   link
News       XSum         link                   link
Papers     PeerRead     link                   link
Papers     PubMed       link                   link
Books      BookSum      -                      link
Dialogues  AMI          link                   link
Posts      Reddit       link                   link
Script     MovieScript  link                   link

Evaluation Metrics

  • Averaged ROUGE with reference abstractive summaries (R)
  • Sentence overlap score with oracle extractive summaries (SO)
  • Volume overlap score with reference abstractive summaries (VO)
  • Balance across the three sub-aspects: position, diversity, and importance (P/D/I)

Leaderboard

  • Please contact Dongyeop if you would like to add your system to the leaderboard, along with its R/SO/VO/PDI scores across corpora.
