fendaq / task-oriented-dialogue-dataset-survey

This project forked from atmahou/task-oriented-dialogue-research-progress-survey

A dataset survey about task-oriented dialogue, including recent datasets.

Task-Oriented Dialogue Dataset Survey

A dataset survey about task-oriented dialogue, including information about recent datasets.

See Survey Here or in Excel File

| Name | Introduction | Multi/Single Turn | Task | Task Detail | Publicly Accessible | Links | Size & Stats | Included Labels | Missing Labels |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dialog bAbI tasks data | 1. Facebook's task-oriented dialogue dataset, consisting of 6 different tasks. 2. The data for tasks 1-5 is constructed automatically from bot-to-bot chat (Bot2Bot); the data for task 6 is simply the reformatted DSTC2 dataset. 3. A shared database is included. 4. This is the only task-oriented dataset among the bAbI tasks. 5. Its goal is to evaluate end-to-end tasks, so there are no intents or slots. | M | Task Oriented | Book a table at a restaurant | Yes | Download: https://research.fb.com/downloads/babi/ Paper: http://arxiv.org/abs/1605.07683 | For each task: training 1000, development 1000, test 1000. For tasks 1-5, a second test set (with suffix -OOV.txt) that contains dialogs including entities not present. | API call; Full database | Slot; Intent; User Act; Agent Act |
| Stanford Dialog Dataset | 1. Stanford NLP group's data for a car autopilot agent. 2. Human2Human 3. A quick intro: http://m.sohu.com/n/499803391/ | M | Task Oriented | Car autopilot agent: schedule, weather, navigation | Yes | Download: http://nlp.stanford.edu/projects/kvret/kvret_dataset_public.zip Paper: https://arxiv.org/abs/1705.05414 | Training dialogues 2,425; Validation dialogues 302; Test dialogues 304; Avg. # of utterances per dialogue 5.25 | Dialogue-level database; User Act (inform, request slots); Agent Act (inform, request slots) | API call; Intent; Slot |
| Stanford Dialog Dataset Labeled | 1. Stanford data relabeled by us with slots and intents. 2. Human2Human 3. A quick intro to the Stanford data: http://m.sohu.com/n/499803391/ 4. Annotation handbook: https://docs.google.com/document/d/1ROARKf8AJNnG2_nPINe1Xm5Rza7V0jPnQV8io09hcFY/edit | M | Task Oriented | Car autopilot agent: schedule, weather, navigation | No | N/A | Training dialogues 2,425; Validation dialogues 302; Test dialogues 304; Avg. # of utterances per dialogue 5.25 | Slot; Intent | API call. Sample alignment is needed to obtain the following: Dialogue-level database; User Act (inform, request slots); Agent Act (inform, request slots); Agent reply |
| Lingxi data (灵犀数据) | 1. All data is single-turn user input that has already been word-segmented; it is fairly noisy. 2. Part-of-speech tagging and slot labeling have been completed. 3. Language: Chinese | S | Task Oriented | Conversational robot service user log | No | N/A | Utterances: 5132 | Slot; POS | Agent reply; Intent; API call; Database |
| DSTC-2 | 1. Human2Bot restaurant booking dataset. 2. For usage, refer to: http://camdial.org/~mh521/dstc/downloads/handbook.pdf 3. Each dialogue is stored in a separate folder, which contains the log and the label. | M | Task Oriented | Booking a restaurant | Yes | http://camdial.org/~mh521/dstc/ | Train 1612 calls; Dev 506 calls; Test 1117 dialogs | Slot; User Act (inform, request slots); Agent Act (inform, request slots) | Intent; API call; Database |
| CamRest676 | The CamRest676 Human2Human dataset contains the following three JSON files: 1. CamRest676.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn. 2. CamRestDB.json: the Cambridge restaurant database file, containing restaurants in the Cambridge, UK area and a set of attributes. 3. The ontology file, specifying all the values the three informable slots can take. | M | Task Oriented | Booking a restaurant | Yes | Download: https://www.repository.cam.ac.uk/handle/1810/260970 Paper: https://arxiv.org/abs/1604.04562 | Total 676 dialogues; Total 1500 turns; Train:Dev:Test 3:1:1 (test set not given) | Slot; User Act (inform, request slots); Agent Act (inform, request slots) | Intent; API call; Database |
| Human-human goal oriented dataset | 1. Maluuba released a travel booking dataset. 2. Designed for a new task: frame tracking (allows comparison between entities in the dialogue history). 3. Homepage: https://datasets.maluuba.com/Frames 4. Human2Human | M | Task Oriented | Travel booking | Yes | Download: https://datasets.maluuba.com/Frames/dl Paper: https://arxiv.org/abs/1706.01690 https://1drv.ms/b/s!Aqj1OvgfsHB7dsg42yp2BzDUK6U | Dialogues 1369; Turns 19986; Average user satisfaction (from 1-5) 4.58 | Frame; User agenda; User Act (inform, request slots); Agent Act (inform, request slots); API call; User's satisfaction; Task successful; Database; Entity reference | Intent |
| DSTC4 | 1. The data, named TourSG, consists of 35 dialog sessions on touristic information for Singapore, collected from Skype calls between three tour guides and 35 tourists. 2. All the recorded dialogs, with a total length of 21 hours, have been manually transcribed and annotated with speech act and semantic labels at the turn level. 3. Homepage: http://www.colips.org/workshop/dstc4/data.html 4. Human2Human | M | Task Oriented | Query touristic information | No | N/A | Train 20 dialogs; Test 15 dialogs | Speech act (User & Agent); Semantic labels (Intent? User & Agent); Topic for turn (Intent?) | N/A |
| Movie Booking Dataset | 1. (Microsoft) Raw conversational data collected via Amazon Mechanical Turk, with annotations provided by domain experts. 2. Human2Human | M | Task Oriented | Booking a movie | Yes | Download: https://github.com/MiuLab/TC-Bot#data Paper: TC-bot | 280 dialogues; approximately 11 turns per dialogue | User Act (inform, request slots); Agent Act (inform, request slots); Intent; Slots | Database; API call |
| Microsoft Dialogue Challenge | Human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes. | M | Task Oriented | Movie-ticket booking; Restaurant reservation; Taxi ordering | Yes | Paper: https://arxiv.org/pdf/1807.11125.pdf | Movie-ticket booking: 11 intents, 29 slots, 2890 dialogues; Restaurant reservation: 11 intents, 30 slots, 4103 dialogues; Taxi ordering: 11 intents, 29 slots, 3094 dialogues | Intent; Slots | Database; API call |
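
The "Included Label" and "Public Accessible" columns are mostly useful for answering questions such as "which public datasets come with both slot and intent annotations?". The Python sketch below is one hypothetical way to encode those two columns and query them; it is not part of the original survey. The names `DatasetEntry`, `SURVEY`, and `datasets_with` are invented for illustration, and the label spellings are normalized by assumption (for example, "Full Database" and "Dialogue level database" both become `database`, and "Slots" becomes `slot`). The uncertain DSTC4 labels are kept under their own names and are not mapped to `intent`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    public: bool
    included: frozenset  # normalized, lowercase label names (an assumption)


def entry(name, public, *labels):
    """Small helper to build one survey row."""
    return DatasetEntry(name, public, frozenset(labels))


# Transcribed from the "Public Accessible" and "Included Label" columns.
SURVEY = [
    entry("Dialog bAbI tasks data", True, "api call", "database"),
    entry("Stanford Dialog Dataset", True, "database", "user act", "agent act"),
    entry("Stanford Dialog Dataset Labeled", False, "slot", "intent"),
    entry("Lingxi data", False, "slot", "pos"),
    entry("DSTC-2", True, "slot", "user act", "agent act"),
    entry("CamRest676", True, "slot", "user act", "agent act"),
    entry("Human-human goal oriented dataset", True, "frame", "user agenda",
          "user act", "agent act", "api call", "user satisfaction",
          "task success", "database", "entity reference"),
    entry("DSTC4", False, "speech act", "semantic labels", "topic"),
    entry("Movie Booking Dataset", True, "user act", "agent act", "intent", "slot"),
    entry("Microsoft Dialogue Challenge", True, "intent", "slot"),
]


def datasets_with(required, public_only=False):
    """Return names of datasets whose included labels cover `required`."""
    wanted = {label.lower() for label in required}
    return [
        d.name
        for d in SURVEY
        if wanted <= d.included and (d.public or not public_only)
    ]


if __name__ == "__main__":
    # Public datasets that ship both slot and intent labels.
    print(datasets_with({"slot", "intent"}, public_only=True))
```

With the rows as transcribed here, the query for public datasets carrying both slot and intent labels returns the Movie Booking Dataset and the Microsoft Dialogue Challenge; dropping `public_only` also returns the relabeled Stanford data.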

Contributors: atmahou
