

This project forked from hellisotherpeople/debatesum


Corresponding code repo for the paper at COLING 2020 - ARGMIN 2020: "DebateSum: A large-scale argument mining and summarization dataset"


DebateSum

Corresponding code repo for the upcoming paper at ARGMIN 2020: "DebateSum: A large-scale argument mining and summarization dataset"

Arxiv pre-print available here: https://arxiv.org/abs/2011.07251

Check out the presentation date and time here: https://argmining2020.i3s.unice.fr/node/9

Full paper as presented by the ACL is here: https://www.aclweb.org/anthology/2020.argmining-1.1/

The dataset is distributed as csv files.

A search engine over DebateSum (along with some additional evidence not included in DebateSum) is available at debate.cards. It lets you view the evidence in the format that debaters use.

Data

DebateSum consists of 187328 debate documents, arguments (which can also be thought of as abstractive summaries or queries), word-level extractive summaries, citations, and associated metadata, all organized by topic-year. This data is ready for analysis by NLP systems.

Download

All data is accessible in a parsed format, organized by topic year, here

Additionally, the trained word vectors for debate2vec are also found in that folder.
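
Since the dataset is distributed as csv files, a quick first look at a downloaded file might go like the sketch below. It uses only the Python standard library and assumes nothing about the column names; the file name `debatesum_2019.csv` is a placeholder for illustration, not a name taken from the download folder.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for one CSV file."""
    # Debate documents can be long, so raise the per-field size limit.
    csv.field_size_limit(10**9)
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row holds the column names
        row_count = sum(1 for _ in reader)  # remaining rows are records
    return header, row_count


if __name__ == "__main__":
    # Placeholder file name (an assumption); point this at a real download.
    example = Path("debatesum_2019.csv")
    if example.exists():
        columns, rows = summarize_csv(example)
        print(f"{rows} rows; columns: {columns}")
```

Printing the header first is a cheap way to learn which columns hold the documents, arguments, and extractive summaries before loading everything into a heavier tool.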

Regenerating it yourself

Regenerating the dataset is useful because the debaters who produce the evidence release new work every year. Soon enough I will update to include the 2020-2021 topic.

Step 1: Download all open evidence files from Open Evidence and unzip them into a directory. The links are as follows:

  • 2019 - Resolved: The United States federal government should substantially reduce Direct Commercial Sales and/or Foreign Military Sales of arms from the United States.
  • 2018 - Resolved: The United States federal government should substantially reduce its restrictions on legal immigration to the United States.
  • 2017 - Resolved: The United States federal government should substantially increase its funding and/or regulation of elementary and/or secondary education in the United States.
  • 2016 - Resolved: The United States federal government should substantially increase its economic and/or diplomatic engagement with the People’s Republic of China.
  • 2015 - Resolved: The United States federal government should substantially curtail its domestic surveillance.
  • 2014 - Resolved: The United States federal government should substantially increase its non-military exploration and/or development of the Earth’s oceans.
  • 2013 - Resolved: The United States federal government should substantially increase its economic engagement toward Cuba, Mexico or Venezuela.
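
The unzip part of Step 1 could be scripted roughly as follows. This is a minimal sketch that assumes the archives were saved as `.zip` files in a single download folder; the function name and both directory arguments are hypothetical, chosen here for illustration.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, evidence_dir):
    """Extract every .zip directly inside download_dir into evidence_dir.

    Returns the number of archives extracted.
    """
    target = Path(evidence_dir)
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as bundle:
            # Everything lands in one directory, as Step 1 describes.
            # Files with identical names in different archives overwrite each other.
            bundle.extractall(target)
        count += 1
    return count
```

For example, `extract_archives("downloads", "evidence")` would unpack every archive in `downloads` into `evidence`. If files from different topic years share names, extracting each archive into its own subfolder is the safer choice.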

Step 2: Convert all evidence from docx files to html5 files using pandoc with this command:

for f in *.docx; do pandoc "$f" -s -o "${f%.docx}.html5"; done
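
If the evidence is spread across nested folders, or a shell loop is not available, the same conversion can be driven from Python. The sketch below builds the same `pandoc` invocation as the command above (`-s` and `-o` only) and assumes `pandoc` is on the PATH. The function names and the `dry_run` parameter are illustrative additions, and the recursive search goes beyond what the shell loop does.

```python
import subprocess
from pathlib import Path


def pandoc_command(docx_path):
    """Build the Step 2 pandoc invocation for one .docx file."""
    source = Path(docx_path)
    # Mirrors: pandoc "$f" -s -o "${f%.docx}.html5"
    return ["pandoc", str(source), "-s", "-o", str(source.with_suffix(".html5"))]


def convert_directory(evidence_dir, dry_run=False):
    """Convert every .docx under evidence_dir (including subfolders).

    Returns the list of commands; with dry_run=True nothing is executed.
    """
    commands = [pandoc_command(p) for p in sorted(Path(evidence_dir).rglob("*.docx"))]
    if not dry_run:
        for command in commands:
            # check=True stops at the first file pandoc fails to convert.
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_directory("evidence", dry_run=True)` first lets you inspect the commands before any files are written.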

Step 3: Install the dependencies for make_debate_dataset.py.

pip install -r requirements.txt

Step 4: Modify the folder and file locations as needed for your system, and run make_debate_dataset.py

python3 make_debate_dataset.py
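
Before running the script, it may be worth confirming that Step 2 converted every file. The helper below is a hypothetical check, not part of the repository. It assumes the converted `.html5` files sit next to their `.docx` sources, as the pandoc command in Step 2 produces them.

```python
from pathlib import Path


def unconverted_documents(evidence_dir):
    """Return .docx files under evidence_dir that have no sibling .html5 file."""
    return [
        docx
        for docx in sorted(Path(evidence_dir).rglob("*.docx"))
        if not docx.with_suffix(".html5").exists()
    ]


if __name__ == "__main__":
    # "evidence" is a placeholder directory name (an assumption).
    for path in unconverted_documents("evidence"):
        print(f"not converted yet: {path}")
```

An empty result means every document has an html5 counterpart available for the dataset script to read.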

Credits

Huge thanks to Arvind Balaji for making debate.cards and being second author on this paper!

