wadeyin9712 / qmsum

This project forked from yale-lily/qmsum

Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization"

QMSum

Overview

This repository maintains the dataset for the NAACL 2021 paper "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization".

QMSum is a new human-annotated benchmark for the query-based multi-domain meeting summarization task. It consists of 1,808 query-summary pairs over 232 meetings in multiple domains.

If you use our dataset, please limit it to research purposes and cite our paper.

Dataset

The train/valid/test sets of QMSum are available in the data/ALL folder. QMSum also covers three domains: data/Academic, data/Product and data/Committee each contain the data for a single domain.

Files in each folder:

  • jsonl: data in .jsonl format.
  • all: all data in .json format.
  • train: training data.
  • val: validation data.
  • test: test data.
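As a minimal sketch of reading the .jsonl files, the snippet below assumes that each non-empty line holds one meeting as a JSON object with the fields shown in the format section. The helper names `load_jsonl` and `count_query_pairs` are hypothetical and not part of this repository, and the path in the usage note is inferred from the folder listing above rather than confirmed.

```python
import json
from typing import Any, Dict, List


def load_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read a .jsonl file, assuming one JSON object (one meeting) per line."""
    meetings = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                meetings.append(json.loads(line))
    return meetings


def count_query_pairs(meetings: List[Dict[str, Any]]) -> int:
    """Count general plus specific query-summary pairs over all meetings."""
    total = 0
    for meeting in meetings:
        total += len(meeting.get("general_query_list", []))
        total += len(meeting.get("specific_query_list", []))
    return total
```

For example, `count_query_pairs(load_jsonl("data/ALL/jsonl/train.jsonl"))` would report the number of query-summary pairs in the training split, if the file sits at that assumed location.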

The JSON data has the following format:

{
    "topic_list": [
        {
            "topic": "Introduction of petitions and prioritization of governmental matters",
            "relevant_text_span": [["0","19"]]
        },
        {
            "topic": "Financial assistance for vulnerable Canadians during the pandemic and beyond",
            "relevant_text_span": [["21","57"], ["113","119"], ["191","217"]]
        },
        ...
    ],
    "general_query_list": [
        {
            "query": "Summarize the whole meeting.",
            "answer": "The meeting of the standing committee took place to discuss matters pertinent to the Coronavirus pandemic. The main issue at stake was to ..."
        },
        ...
    ],
    "specific_query_list": [
        {
            "query": "Summarize the discussion about introduction of petitions and prioritization of government matters.",
            "answer": "The Chair brought the meeting to order, announcing that the purpose of the meeting was to discuss COVID-19 's impact on Canada. Five petitions were presented ...",
            "relevant_text_span": [["0","19"]]
        },
        {
            "query": "What did Paul-Hus think about the introduction of petitions and prioritization of government matters?",
            "answer": "Mr. Paul-Hus thought that the government should not take firearms away from law-abiding Canadian citizens. He inquired into ...",
            "relevant_text_span": [["9","18"]]
        },
        ...
    ],
    "meeting_transcripts": [
        {
            "speaker": "The Chair (Hon. Anthony Rota (NipissingTimiskaming, Lib.))",
            "content": "I call the meeting to order.  Welcome to the third meeting of the House of Commons Special Committee on the COVID-19 Pandemic ..."
        },
        {
            "speaker": "Mr. Garnett Genuis (Sherwood ParkFort Saskatchewan, CPC)",
            "content": "Mr. Chair, I'm pleased to be presenting two petitions today. The first petition is with respect to government Bill C-7 ..."
        },
        ...
        {
            "speaker": "Hon. Seamus O'Regan",
            "content": "Mr. Chair, we have been working with our provincial partners. We have been working with businesses of all sizes in the oil and gas industry ...."
        },
        {
            "speaker": "The Chair",
            "content": "That's all the time we have for questions today. I want to thank all the members for taking part. The committee stands adjourned until tomorrow at noon.  The committee stands adjourned until tomorrow at noon. Thank you."
        }
    ]
}

Please note that there may be multiple relevant text spans for a topic or a specific query. The general query has no corresponding text spans because it corresponds to the entire meeting transcript.
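The sketch below shows one way to pair each query with the transcript turns it refers to. It assumes that every span is a pair of string indices into meeting_transcripts and that both ends are inclusive; the format above does not state this explicitly, so check it against the data. The function names and the toy meeting are hypothetical and only serve as illustration.

```python
from typing import Any, Dict, List


def extract_span_turns(meeting: Dict[str, Any], spans: List[List[str]]) -> List[Dict[str, str]]:
    """Collect the transcript turns covered by a list of [start, end] spans.

    Assumption: indices are strings and both ends are inclusive.
    """
    turns = meeting["meeting_transcripts"]
    selected = []
    for start, end in spans:
        selected.extend(turns[int(start): int(end) + 1])
    return selected


def build_examples(meeting: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair every query with its summary and the turns it refers to."""
    examples = []
    # General queries have no spans: they cover the entire transcript.
    for item in meeting.get("general_query_list", []):
        examples.append({
            "query": item["query"],
            "summary": item["answer"],
            "turns": meeting["meeting_transcripts"],
        })
    # Specific queries may list several spans.
    for item in meeting.get("specific_query_list", []):
        examples.append({
            "query": item["query"],
            "summary": item["answer"],
            "turns": extract_span_turns(meeting, item["relevant_text_span"]),
        })
    return examples


# Toy meeting with made-up content, for illustration only.
TOY_MEETING = {
    "general_query_list": [{"query": "Summarize the whole meeting.", "answer": "..."}],
    "specific_query_list": [
        {"query": "What was said about X?", "answer": "...",
         "relevant_text_span": [["0", "1"], ["3", "3"]]},
    ],
    "meeting_transcripts": [
        {"speaker": "A", "content": "first turn"},
        {"speaker": "B", "content": "second turn"},
        {"speaker": "C", "content": "third turn"},
        {"speaker": "D", "content": "fourth turn"},
    ],
}

examples = build_examples(TOY_MEETING)
```

Here the general query receives all four turns, while the specific query receives the turns at indices 0, 1 and 3 under the inclusive-span assumption.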

Statistics

[Image: dataset statistics]

Experimental Results

[Image: experimental results]
