This project forked from chauff/conversationalir

Overview of datasets and venues relevant for conversational search.

Conversational IR

Publication venues, datasets and other things.

I made a brief slide deck with a summary of the workshops in January 2020.

Relevant workshops and conferences

(Semi-)Relevant datasets and benchmarks

  • CAsT-19: A Dataset for Conversational Information Seeking
    • "The corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. Eighty information seeking dialogues (30 train, 50 test) are on average 9 to 10 questions long. A dialogue may explore a topic broadly or drill down into subtopics." (source)
  • FIRE 2020 task: Retrieval From Conversational Dialogues (RCD-2020)
    • details are not yet known
  • MIMICS
    • "A Large-Scale Data Collection for Search Clarification"
    • "MIMICS-Click includes over 400k unique queries, their associated clarification panes, and the corresponding aggregated user interaction signals (i.e., clicks)."
    • "MIMICS-ClickExplore is an exploration dataset that includes aggregated user interaction signals for over 60k unique queries, each with multiple clarification panes."
    • "MIMICS-Manual includes over 2k unique real search queries. Each query-clarification pair in this dataset has been manually labeled by at least three trained annotators."
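As a rough illustration, a MIMICS-Click record pairs a query with a clarification pane and aggregated click signals. The sketch below uses made-up field names and values, so consult the released files for the real schema:

```python
# Hypothetical sketch of a MIMICS-Click-style record. Field names and
# values are illustrative, not the actual released schema.
record = {
    "query": "headaches",
    "clarifying_question": "What do you want to know about this condition?",
    "candidate_answers": ["symptoms", "treatments", "causes"],
    "clicks": [120, 310, 95],  # aggregated engagement per candidate answer
}

def most_engaged_answer(rec):
    """Return the candidate answer with the highest aggregated clicks,
    one simple signal for ranking clarification options."""
    best = max(range(len(rec["clicks"])), key=lambda i: rec["clicks"][i])
    return rec["candidate_answers"][best]
```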
  • ClariQ challenge
    • "we have crowdsourced a new dataset to study clarifying questions"
    • "We have extended the Qulac dataset and base the competition mostly on the training data that Qulac provides. In addition, we have added some new topics, questions, and answers in the training set."
  • PolyAI
    • "This repository provides tools to create reproducible datasets for training and evaluating models of conversational response. This includes: Reddit (3.7 billion comments), OpenSubtitles (400 million lines from movie and television subtitles) and Amazon QA (3.6 million question-response pairs in the context of Amazon products)"
  • Natural Questions: Google's latest question answering dataset.
    • "Natural Questions contains 307K training examples, 8K examples for development, and a further 8K examples for testing."
    • "NQ is the first dataset to use naturally occurring queries and focus on finding answers by reading an entire page, rather than extracting answers from a short paragraph. To create NQ, we started with real, anonymized, aggregated queries that users have posed to Google's search engine. We then ask annotators to find answers by reading through an entire Wikipedia page as they would if the question had been theirs. Annotators look for both long answers that cover all of the information required to infer the answer, and short answers that answer the question succinctly with the names of one or more entities. The quality of the annotations in the NQ corpus has been measured at 90% accuracy."
    • Paper
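The long/short answer distinction described above can be sketched as follows; the field names and the example page text are hypothetical, not the released NQ format:

```python
# Hypothetical sketch of an NQ-style annotation: a long answer (a whole
# passage from the Wikipedia page) plus a short entity-span answer inside
# it. Names and text are illustrative, not the released data format.
long_answer = ("The Mozilla Foundation is a non-profit organization "
               "that was founded in July 2003.")
start = long_answer.index("July 2003")

example = {
    "question": "when was the mozilla foundation founded",
    "long_answer": long_answer,
    "short_answer": {"start": start, "end": start + len("July 2003")},
}

def short_answer_text(ex):
    """Recover the succinct entity answer from its span inside the long answer."""
    s = ex["short_answer"]
    return ex["long_answer"][s["start"]:s["end"]]
```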
  • ๐Ÿฅ QuAC: Question Answering in Context
    • "A dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text."
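A QuAC instance can be pictured as a student/teacher exchange in which every teacher answer is a verbatim span of the hidden section. The sketch below is illustrative; the section text, questions, and field names are made up, not the released JSON format:

```python
# Hypothetical sketch of a QuAC-style exchange. The section text, the
# question, and the field names are illustrative, not the actual data.
hidden_section = ("Daffy Duck first appeared on screen in 1937 and quickly "
                  "became one of the studio's most popular characters.")

def teacher_answer(section, start, end):
    """The teacher may only answer with a span (excerpt) of the hidden text."""
    return section[start:end]

start = hidden_section.index("1937")
dialog = [
    {"student": "When did he first appear?",
     "teacher": teacher_answer(hidden_section, start, start + len("1937"))},
]
```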
  • ๐Ÿƒ CoQA: A Conversational Question Answering Challenge
    • "CoQA contains 127,000+ questions with answers collected from 8000+ conversations. Each conversation is collected by pairing two crowdworkers to chat about a passage in the form of questions and answers."
  • HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
    • "HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts."
  • QANTA: Question Answering is Not a Trivial Activity
    • "A question answering dataset composed of questions from Quizbowl - a trivia game that is challenging for both humans and machines. Each question contains 4-5 pyramidally arranged clues: obscure ones at the beginning and obvious ones at the end."
  • MSDialog
    • "The MSDialog dataset is a labeled dialog dataset of question answering (QA) interactions between information seekers and answer providers from an online forum on Microsoft products."
    • "The annotated dataset contains 2,199 multi-turn dialogs with 10,020 utterances."
  • ๐Ÿ‹ ShARC: Shaping Answers with Rules through Conversation
    • "Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader's background knowledge. We formalise this task and develop a crowd-sourcing strategy to collect 37k task instances."
  • Training Millions of Personalized Dialogue Agents
    • 5 million personas and 700 million persona-based dialogues
  • Ubuntu Dialogue Corpus
    • "A dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words."
  • MultiWOZ: Multi-domain Wizard-of-Oz dataset
    • "Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics."
  • Frames: Complex conversations and decision-making
    • "The 1369 dialogues in Frames were collected in a Wizard-of-Oz fashion. Two humans talked to each other via a chat interface. One was playing the role of the user and the other one was playing the role of the conversational agent. We call the latter a wizard as a reference to the Wizard of Oz, the man behind the curtain. The wizards had access to a database of 250+ packages, each composed of a hotel and round-trip flights. We gave users a few constraints for each dialogue and we asked them to find the best deal."
  • Wizard of Wikipedia: 22,311 conversations grounded with Wikipedia knowledge
    • "We consider the following general open-domain dialogue setting: two participants engage in chit-chat, with one of the participants selecting a beginning topic, and during the conversation the topic is allowed to naturally change. The two participants, however, are not quite symmetric: one will play the role of a knowledgeable expert (which we refer to as the wizard) while the other is a curious learner (the apprentice)."
    • "... first crowd-sourcing 1307 diverse discussion topics and then conversations involving 201,999 utterances about them"
    • Available at http://www.parl.ai/.
  • Qulac: 10,277 single-turn conversations consisting of clarifying questions and their answers on multi-faceted and ambiguous queries from the TREC Web track 2009-2012.
    • "Qulac presents the first dataset and offline evaluation framework for studying clarifying questions in open-domain information-seeking conversational search systems."
    • "... we collected Qulac following a four-step strategy. In the first step, we define the topics and their corresponding subtopics. In the second step, we collected several candidate clarifying questions for each query through crowdsourcing. Then, in the third step, we assessed the relevance of the questions to each facet and collected new questions for those facets that require more specific questions. Finally, in the last step, we collected the answers for every query-facet-question triplet."
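One way to picture the resulting query-facet-question-answer records, together with a simple use of them, is sketched below. The field names and values are illustrative, and the query-expansion step is an assumption for illustration, not necessarily the strategy the paper evaluates:

```python
# Hypothetical sketch of a Qulac-style query-facet-question-answer record.
# Field names and values are illustrative, not the released format.
record = {
    "query": "dinosaurs",
    "facet": "where dinosaur fossils have been found",
    "clarifying_question": "are you interested in dinosaur fossils?",
    "answer": "yes, where have dinosaur fossils been found",
}

def expanded_query(rec):
    """One simple offline-evaluation strategy (an assumption here): append
    the clarifying exchange to the original query before retrieval."""
    return " ".join([rec["query"], rec["clarifying_question"], rec["answer"]])
```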
  • TREC 2019 Conversational Search task

Non-relevant but interesting datasets

  • CoSQL: a corpus for building cross-domain conversational text-to-SQL systems.
  • Spider: Yale Semantic Parsing and Text-to-SQL Challenge
    • "Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains."
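A text-to-SQL pair of the kind Spider contains can be sketched end-to-end with an in-memory SQLite database. The schema, question, and query below are made up for illustration, not drawn from the actual dataset:

```python
# Hypothetical Spider-style example: a natural-language question paired
# with a SQL query over a multi-table schema. Schema and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
INSERT INTO singer VALUES (1, 'A', 'France'), (2, 'B', 'Japan');
INSERT INTO concert VALUES (10, 1, 2014), (11, 1, 2015), (12, 2, 2015);
""")

question = "How many concerts did each singer give in 2015?"
sql = """
SELECT s.name, COUNT(*) AS n
FROM singer s JOIN concert c ON s.singer_id = c.singer_id
WHERE c.year = 2015
GROUP BY s.name
ORDER BY s.name
"""
rows = conn.execute(sql).fetchall()
```

The task is to map `question` to `sql` given only the database schema, across databases unseen at training time.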
  • Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
  • MS Marco: question answering and passage re-ranking

Tooling

Other sites

Contributors

chauff, aliannejadi
