budzianowski / multiwoz
Source code for the end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)
License: MIT License
Hi, what does pre_invalid mean in the book field? How is it related to invalid? Thanks!
Natural Language Generation
Here the Baseline (Budzianowski et al. 2018, https://pdfs.semanticscholar.org/47d0/1eb59cd37d16201fcae964bd1d2b49cfb55e.pdf) model got a BLEU score of 0.632.
However, when I read the experiment records in the paper, I can find such a BLEU score nowhere.
Some conversations have no dialog_act annotation in MultiWOZ 2.1:
PMUL4707.json
PMUL2245.json
PMUL4776.json
PMUL3872.json
PMUL4859.json
Hi, thanks a lot for sharing the data!
I found that some slot labels in MultiWOZ 2.2 are incomplete, as follows.
In PMUL0698.json, turn 6, the user says "I am leaving from Cambridge and going to Norwich." to book a train. This turn carries train-departure and train-destination values, but there are no labels in the slots field. Is the data not yet completely labeled?
The following is the part of MultiWOZ2.2/dev/dialogues_001.json
{
"frames": [
{
"actions": [],
"service": "restaurant",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {
"restaurant-area": [
"centre"
],
"restaurant-food": [
"chinese"
]
}
}
},
{
"actions": [],
"service": "train",
"slots": [],
"state": {
"active_intent": "find_train",
"requested_slots": [],
"slot_values": {
"train-day": [
"sunday"
],
"train-departure": [
"cambridge"
],
"train-destination": [
"norwich" # in slot-values but not in slots, and no slots fields contains it before this turn.
],
"train-leaveat": [
"16:15"
]
}
}
},
{
"actions": [],
"service": "taxi",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {}
}
},
{
"actions": [],
"service": "bus",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {}
}
},
{
"actions": [],
"service": "police",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {}
}
},
{
"actions": [],
"service": "hotel",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {}
}
},
{
"actions": [],
"service": "attraction",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {}
}
},
{
"actions": [],
"service": "hospital",
"slots": [],
"state": {
"active_intent": "NONE",
"requested_slots": [],
"slot_values": {}
}
}
],
"speaker": "USER",
"turn_id": "6",
"utterance": "I am leaving from Cambridge and going to Norwich."
}
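A quick way to scan a 2.2 frame for such cases (a minimal sketch; `find_unlabeled_slots` is a hypothetical helper, and some of these gaps may be intentional, since the frame-level slots list only carries spans for the current utterance):

```python
def find_unlabeled_slots(frame):
    """Slot names present in state.slot_values but absent from the
    frame-level 'slots' span annotations."""
    annotated = {s.get("slot") for s in frame.get("slots", [])}
    slot_values = frame.get("state", {}).get("slot_values", {})
    return sorted(name for name in slot_values if name not in annotated)

# Toy frame mirroring the train frame above:
frame = {
    "service": "train",
    "slots": [],
    "state": {
        "active_intent": "find_train",
        "requested_slots": [],
        "slot_values": {"train-destination": ["norwich"]},
    },
}
print(find_unlabeled_slots(frame))  # ['train-destination']
```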
Hi!
Your preprocessing script create_delex_data.py is a good starting point for working with MultiWOZ, but it was implemented in Python 2, which is deprecated. I refactored this part and the dependent code to be compatible with Python 3 for a research project, and I think it could help others. May I push these changes and open a pull request?
Thanks!
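For context, the kind of mechanical change such a port usually involves (an illustrative sketch, not the actual diff):

```python
# Python 2: print "processing", name / d.iteritems() / unicode(s)
# Python 3: print("processing", name) / d.items()     / str(s)

def lowercase_keys(mapping):
    # dict.items() is valid in both versions; iteritems() is Python-2-only
    return {key.lower(): value for key, value in mapping.items()}

print(lowercase_keys({"Hotel-Inform": 1}))  # {'hotel-inform': 1}
```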
The 2.2 dataset doesn't appear to have any system action annotations at all. The json format for it is convenient, but it isn't useful to me without the action annotations. Will they be added soon?
Hi,
May I ask whether the three MultiWOZ datasets in your data/ folder are the same as those downloaded from https://www.repository.cam.ac.uk/handle/1810/294507
Thanks for your feedback.
I want to understand the storage structure of this dataset, so I ran create_delex_data.py and read all of the source code in that script, but I am still confused about the "db" and "bs" variables. Could you explain what these two variables are? Thanks in advance!
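If I read the baseline correctly, "bs" is a binary encoding of the belief state and "db" is a per-domain vector encoding how many database entities match the current constraints. A sketch of the bucketing idea (the exact bucket boundaries in dbPointer.py may differ):

```python
def db_pointer(n_matches, n_buckets=6):
    """One-hot bucket for a database match count: 0, 1, 2, 3, 4, >=5."""
    vec = [0] * n_buckets
    vec[min(n_matches, n_buckets - 1)] = 1
    return vec

print(db_pointer(0))   # [1, 0, 0, 0, 0, 0]
print(db_pointer(17))  # [0, 0, 0, 0, 0, 1]
```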
Hi Paweł,
Do you know if there are mirrors of the dataset anywhere? The Cambridge website has been reporting that it's under maintenance for a couple of weeks now.
Cheers,
Stephen
Line 267 in 9fd409f
In the Multi-WOZ 2.2 "data.json", line 7419586, dialog_id : MUL1382.json:
"text": "We've narrowed it down to 3. kihinoor, the gandhi, and mahal of cambridge. Would you like me to make a reservation for you?"
"text": "Yes please make a reservation for 3 people at 16:00 on Saturday at any of those choices."
"text": "I was able to book at Kohinoor for 16:00 on Saturday for 3 people. Your reference number is NTJ52ASI. The table will be held for 15 minutes."
Actually, in the database for restaurant domain, there is no restaurant named "kihinoor" but there is one restaurant named "kohinoor". And based on the next two utterances, I believe the first restaurant name in the first utterance should be "kohinoor".
Firstly, thanks for the launch of the MultiWOZ 2.2 dataset. I really appreciate the contribution and the corrections.
I found 15 errors in the system act annotations in MultiWOZ 2.2; please find the details below. For every annotation error, I show the dialogue_id, turn_id, span_info, and the corresponding system response. Hopefully it helps. Thanks a lot.
MUL0963.json
13
['Taxi-Inform', 'arriveby', '9:15', 19, 19]
Ok, a white audi will pick you up at cafe jello gallery and bring you to Ali baba by 19:15. You can contact the driver at 07646811518. Anything else?
MUL1382.json
3
['Restaurant-Inform', 'name', 'kihinoor', 29, 37]
We've narrowed it down to 3. kohinoor, the gandhi, and mahal of cambridge. Would you like me to make a reservation for you?
PMUL0363.json
9
['Restaurant-Inform', 'food', 'French', 35, 41]
Restaurant Restaurant Two Two is an expensive French restaurant in the north with wonderful food. Would you like to book a table?
PMUL0363.json
9
['Restaurant-Inform', 'area', 'north', 60, 65]
Restaurant Restaurant Two Two is an expensive French restaurant in the north with wonderful food. Would you like to book a table?
PMUL0363.json
9
['Restaurant-Inform', 'pricerange', 'expensive', 25, 34]
Restaurant Restaurant Two Two is an expensive French restaurant in the north with wonderful food. Would you like to book a table?
PMUL0363.json
9
['Restaurant-Inform', 'name', 'Two Two', 11, 18]
Restaurant Restaurant Two Two is an expensive French restaurant in the north with wonderful food. Would you like to book a table?
PMUL2368.json
11
['Booking-Book', 'ref', '9Z58HWE1,general-reqmore:', 10, 10]
I have you booked at Charlie Chan on Saturday at 20:00 for 5 people. Your reference number is 9Z58HWE1. They hold the table for 15 minutes. Is there anything else?
PMUL2584.json
11
['Taxi-Inform', 'leaveat', '19:00,general-reqmore:', 11, 11]
A grey skoda will pick you up at the hotel by 19:00 to take you to the Castle Galleries. Your contact number is 07375156908. Will there be anything else today? : 07375156908
PMUL3093.json
5
['Train-Inform', 'arriveby', '1:54', 3, 3]
TR8659 leaves at 10:09 and arrives at 11:54, will that work for you?
PMUL3382.json
11
['Train-Inform', 'leaveat', '11:50', 2, 2]
TR0767 leaves at11:50 on Friday morning, arriving 12:07. Price is 4.40 pounds. Would you like me to book a seat?
PMUL4077.json
13
['Taxi-Inform', 'arriveby', '5:15', 5, 5]
Ok you will arrive at 15:15 in a yellow skoda Contact number :07710839987
PMUL4115.json
5
['Train-Inform', 'leaveat', '19:39', 2, 2]
TR3197 leaves atb19:39 and costs 13:39 pounds. is that fine with you?
PMUL4385.json
3
['Train-Inform', 'leaveat', '9:29', 20, 20]
You have a few options available if you're traveling from bishops stortford to cambridge. There is a train leaving at 09:29, Does that work for you?
SNG01733.json
5
['Train-Inform', 'leaveat', '5:40', 6, 6]
Train TR7213 departing from cambridge at 05:40 and arriving at stansted airport at 06:08 will be the best option for you.
SNG1041.json
9
['Hotel-Inform', 'type', 'guesthouse,general-reqmore:', 10, 11]
I remind you that you can check-in in this guesthouse after 3:00 pm. You can leave your suitcases anytime.
NLTK breaks when trying to run create_delex_data.py, resulting in syntax errors. The solution would be to either pin the exact versions of the packages in requirements.txt or update the code to Python 3.
I want to annotate similar data for a different language, so I want to build a web-based annotation tool. Can you share the code of your annotation tool, or any suggestions?
Thanks.
Hi, I have a question about requestable slots for success rate.
Line 210 in d5f0a56
Hi,
I don't seem to understand the inform metric very well. What exactly do you mean by providing the right entity, and why is the inform rate not 100% even with the oracle belief state? Does this mean that dialogue state prediction systems must do better than the oracle in order to improve the inform rate?
I noticed that some of the results listed in the Benchmarks differ from those claimed in the original articles. For example, the SimpleTOD article reports a Joint Accuracy of 56.45, while your list shows 55.72.
How did you get these results? Is there a shared script for everyone, or did you rerun their models and report the results you got?
from nlp import normalize appears in dbPointer.py. What is this nlp library? Is there a link to install it?
I just ran convert_to_multiwoz_format.py in MultiWOZ 2.2 and got the following error:
File "convert_to_multiwoz_format.py", line 85, in main
clean_dialogue = clean_data[dialogue_id]
KeyError: 'SNG01862.json'
It seems there is no 'SNG01862.json' entry in the dialogue_acts.json file. How can I fix it? @XiaoxueZang
Thanks in advance.
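Until the file is fixed, one workaround is to skip (and log) dialogues that have no dialogue_acts entry instead of crashing; a sketch with hypothetical names:

```python
def split_by_acts(dialogues, acts):
    """Partition dialogue ids into (covered, missing) depending on whether
    dialogue_acts.json has an entry for them."""
    covered = [d for d in dialogues if d in acts]
    missing = [d for d in dialogues if d not in acts]
    return covered, missing

dialogues = {"SNG01862.json": {}, "MUL1382.json": {}}
acts = {"MUL1382.json": {}}
print(split_by_acts(dialogues, acts))
# (['MUL1382.json'], ['SNG01862.json'])
```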
Line 335 in 42a1ff2
Hello,
We have just published our work, which reaches a new SoTA on the policy optimization and end-to-end generation tasks.
https://www.aclweb.org/anthology/2020.coling-main.41.pdf
Policy optimization
Match/Success/BLEU
MW 2.0: 97.50/94.80/0.12
MW 2.1: 96.39/83.57/0.14
End-to-End generation
Match/Success/BLEU
MW 2.0: 91.80/81.80/0.12
Would it be possible to include this work on the page? Thank you!
Hey
Could you please upload the data for MultiWOZ 2.2 here as well?
Hi, the hospital-dbase.db and tax-dbase.db files in the db folder are empty.
Hi, I'd like to know whether there is an official evaluation tool for dialog belief tracking. The evaluation code built into TRADE is confusing...
Is it okay to use dialog_acts.json that was used in version 2.0?
Or does version 2.1 not require dialog_act.json?
Dear All,
we recently submitted a pre-print of our soon-to-be-published paper on a new model for DST on MultiWOZ 2.1, where we achieve a JGA of 55.3%.
https://arxiv.org/abs/2005.02877
Thank you & Best regards
Hi Budzianowski,
Have you run the code on GPUs? I cannot get it to work. May I know which version of PyTorch you are using?
Can you add the license for the baseline in case people want to use it? Thanks!
What do these two fields mean in the book domain in MultiWOZ 2.0?
Hello,
I'm conducting some experiments with TRADE on MultiWOZ 2.1. I simply replaced the MultiWOZ 2.0 dataset that TRADE experiments on with 2.1, keeping the hyperparameters unchanged. However, this only reached an accuracy of approximately 35%, which is much lower than the result the paper reported; I guess it may be an issue with the TRADE model's hyperparameters.
However, I'm not able to find any reference on this; I don't even know whether the hyperparameters of TRADE, or other models, change between these two datasets. Would it be possible to get the specific hyperparameter values of these models on MultiWOZ 2.1, so that I can reproduce the results? Thanks in advance.
I have started to collect a new dataset for a new domain, but I don't know how to annotate the dataset.
Should I annotate them manually? Or is there a helpful tool to do it?
Hi Paweł ,
We just released a paper last week: SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model. SOLOIST is a pretraining-finetuning approach to building task-oriented dialog systems at scale with limited training examples and annotation effort. Details can be found at https://arxiv.org/pdf/2005.05298.pdf, and a project website is also available.
We have updated numbers for the context-to-response and end-to-end evaluation settings. @budzianowski could you please help update the leaderboard?
Context-to-response using MultiWOZ 2.0
Inform: 89.60
Success: 79.30
BLEU: 18.03
End-to-end Evaluation using MultiWOZ 2.0:
Inform: 85.50
Success: 72.90
BLEU : 16.54
We have released our new results on arxiv
A Simple Language Model for Task-Oriented Dialogue
TL;DR: SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem.
https://arxiv.org/abs/2005.00796
Belief Tracking:
version joint acc
2.1 55.72
Policy Optimization:
version Inform Success Bleu
2.0 84.4 70.1 15.01
2.1 85 70.5 15.23
@budzianowski will you update the leaderboard, or should we open a PR?
cc @bmccann
Hi,
I found an error in the "span_info" for the text "The city centre north b and b has parking and wifi. It is in the north area. Would you like to book this hotel?". I think the index of the value 'north' should point to the second 'north'; the first 'north' is part of the name 'city centre north b and b'. Could you fix that?
Details in the following:
"text": "The city centre north b and b has parking and wifi. It is in the north area. Would you like to book this hotel?",
"metadata": {
"taxi": {
"book": {
"booked": []
},
"semi": {
"leaveAt": "",
"destination": "",
"departure": "",
"arriveBy": ""
}
},
"police": {
"book": {
"booked": []
},
"semi": {}
},
"restaurant": {
"book": {
"booked": [
{
"name": "nandos city centre",
"reference": "LYIENP77"
}
],
"people": "4",
"day": "wednesday",
"time": "15:00"
},
"semi": {
"food": "not mentioned",
"pricerange": "not mentioned",
"name": "nandos city centre",
"area": "not mentioned"
}
},
"hospital": {
"book": {
"booked": []
},
"semi": {
"department": ""
}
},
"hotel": {
"book": {
"booked": [],
"people": "",
"day": "",
"stay": ""
},
"semi": {
"name": "not mentioned",
"area": "not mentioned",
"parking": "yes",
"pricerange": "not mentioned",
"stars": "0",
"internet": "yes",
"type": "guesthouse"
}
},
"attraction": {
"book": {
"booked": []
},
"semi": {
"type": "",
"name": "",
"area": ""
}
},
"train": {
"book": {
"booked": [],
"people": ""
},
"semi": {
"leaveAt": "",
"destination": "",
"day": "",
"arriveBy": "",
"departure": ""
}
}
},
"dialog_act": {
"Booking-Inform": [
[
"none",
"none"
]
],
"Hotel-Inform": [
[
"Name",
"city centre north b and b"
],
[
"Area",
"north"
],
[
"Internet",
"none"
],
[
"Parking",
"none"
]
]
},
"span_info": [
[
"Hotel-Inform",
"Name",
"city centre north b and b",
1,
6
],
[
"Hotel-Inform",
"Area",
"north",
3,
3
]
]
},
The training data contains goals asking for hospital name, postcode and address, but the hospital database only contains departments and phone numbers. Any idea where the complete hospital database could be found? It must exist for the training data to have been created. For example: "Addenbrookes Hospital on Hills Rd".
The db pointer vector contains information on whether booking is available.
Is there a way, using predicted belief states, to compute booking availability for the retrieved entities?
Example:
For the attached example with the following gold belief for the restaurant domain, the retrieved entity has booking=available in the db pointer vector, but there is no booking information in the restaurant db:
belief : {'pricerange': 'cheap', 'area': 'centre', 'name': 'dojo noodle bar'}
retrieved restaurant: ('19225', '40210 Millers Yard City Centre', 'centre', 'asian oriental', 'dojo noodle bar serves a variety of japanese chinese vietnamese korean and malaysian dishes to eat in or take away sister restaurant to touzai', 'dojo noodle bar', '01223363471', 'cb21rq', 'cheap', 'NULL', 'restaurant')
Hi. When I use test_dials to test my DST model, I found that some slot values for train-arriveby are not updated in the ground truth.
For example, MUL2294:
"transcript": "i need to travel on saturday from cambridge to london kings cross and need to leave after 18:30",
"system_transcript": "train tr0427 leaves at 19:00 on saturday and will get you there by 19:51. the cost is 18.88 pounds. want me to book it?",
"transcript": "yes please book the train for 1 person and provide the reference number"
From the above we can see that the slot value train-arriveby changes during the dialog, but why is this slot value not changed in the ground truth?
This confuses me a lot; I hope you can have a look. Thank you!
Hi~ I'm Tianbao from Harbin Institute of Technology. I recently loaded the db from your JSON files to carry out some research on dialogue systems, and I noticed db/taxi_db.json couldn't be converted to a list of dicts by the Python json package, since
Line 3 in e87b0a3
Hello, I am trying to figure out whether there is a way to convert a value that appears in a slot to a canonical form suitable for querying the database.
For example, according to the database, there exists an attraction named sheep's green and lammas land park fen causeway, which appears in various forms in the annotation, namely:
sheep's green and lammas land park
sheeps green and lammas land park fen
sheep's green
sheeps green
lammas land park
I believe that I need to convert all those names to the original form to successfully query the database. Is that true? And is there code that can do this for me (or a mapping, a normalization script, etc.)?
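As far as I know the repository ships no official mapping, but fuzzy matching against the database names recovers many of these variants. A heuristic sketch (the CANONICAL list and the cutoff value are my own assumptions):

```python
import difflib

CANONICAL = ["sheep's green and lammas land park fen causeway",
             "kohinoor", "dojo noodle bar"]

def canonicalise(name, cutoff=0.5):
    """Map a surface form to the closest database name, ignoring
    apostrophes and case; returns None when nothing is close enough."""
    query = name.lower().replace("'", "")
    keys = {c.lower().replace("'", ""): c for c in CANONICAL}
    hits = difflib.get_close_matches(query, keys, n=1, cutoff=cutoff)
    return keys[hits[0]] if hits else None

print(canonicalise("sheeps green and lammas land park fen"))
# sheep's green and lammas land park fen causeway
```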
Hello,
In these two lines,
https://github.com/budzianowski/multiwoz/blob/master/model/evaluator.py#L212
https://github.com/budzianowski/multiwoz/blob/master/model/evaluator.py#L362
"venue_offered[domain]" is a string, so [0] will give you just the token "[". I think "venue_offered[domain][0] in goal_venues" does not do the logic you intend.
Please check whether this causes any mistakes in your evaluation. Thanks.
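To illustrate the report: if venue_offered[domain] is indeed a string, indexing it yields a single character, so the membership test can never match a venue name:

```python
venue_offered = {"restaurant": "[restaurant_name]"}
goal_venues = ["kohinoor", "the gandhi"]

first = venue_offered["restaurant"][0]
print(first)                 # '[' -- a single character, not a venue
print(first in goal_venues)  # False, for any realistic goal list

# The intended check is presumably on the whole string:
print(venue_offered["restaurant"] in goal_venues)
```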
Can the model run on gpu? I got errors when I set no_cuda to False.
There is no instruction in the README on how to run preprocessing for 2.2. Instead, it implies that delexicalization is only compatible with earlier versions. Can someone confirm whether that's the case? Thanks.
The line I refer to means that the final informed values (from dialog actions) are checked against the ground-truth values. See the code: the informed_value will be set to none if the value from the dialog action is not equal to the ground-truth value.
Originally posted by @HuangLK in #64 (comment)
Hi.
Line 133 in a24d299
Line 73 in a24d299
But the time values for trains (leaveAt, arriveBy) are also normalized to [value_time].
In this case, I think we cannot check whether the values meet the user's goal, right?
Is this intended?
Hello, I have a hard time evaluating my model.
First, the score for DAMD in the end-to-end modeling table should be 16.6 (as described in their paper) and not 18.6.
Second, I found out that the way I tokenize my responses highly affects the resulting BLEU score. I checked the systems from the end-to-end modeling table that have an open implementation, and I am afraid that the numbers are not comparable: they separate the tokens . , ! ? : and 's with spaces, split on whitespace, and use the resulting tokens for the BLEU score. I evaluated my own outputs using different tokenization approaches; with . , ! ? : and 's separated by spaces, for example, I get 16.9. I think this shows that the evaluation script in this repository should be modified so that it first normalizes the input strings (for example using tokenization and immediate detokenization with the Moses tokenizer), somehow resolves the delexicalized spans (removes spaces, removes [ and ]) and does the tokenization on its own. I would really appreciate a standalone script that would be able to output the score from the delexicalized responses with corresponding dialogue and turn ids (provided in a file in a predefined format).
Or at least a guide to the preferred tokenization would be highly appreciated (for future generations).
Similarly, it would also be very nice to have a standalone script for computing inform and success rates that would accept just a file with delexicalized responses (taking into account that domain names do not have to be present in the spans) and the corresponding dialogue states in .json.
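As one concrete possibility (my own assumption, not the repository's official scheme), a normaliser could strip the delexicalisation brackets and force a single punctuation convention before scoring:

```python
import re

def normalise(text):
    """One possible canonicalisation before BLEU: strip [ ] from
    delexicalised spans and put spaces around . , ! ? : and 's.
    Note: this also splits times such as 16:00 -- a real script
    would need to special-case them."""
    text = re.sub(r"\[([^\]]+)\]", r"\1", text)        # [value_time] -> value_time
    text = re.sub(r"\s*([.,!?:])\s*", r" \1 ", text)   # space out punctuation
    text = re.sub(r"'s\b", " 's", text)                # split possessive 's
    return text.split()

print(normalise("The [restaurant_name]'s table is booked."))
# ['The', 'restaurant_name', "'s", 'table', 'is', 'booked', '.']
```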
I spotted some randomness in evaluation code. For example,
Line 142 in e4922d6
Wouldn't this make the match and success rates differ even if we evaluate the same model twice?
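If the randomness comes from a random choice among equally matching venues (an assumption about the line in question), pinning a seed via a dedicated random.Random instance makes repeated evaluations select the same entity:

```python
import random

def pick_venue(candidates, seed=None):
    """Choose a venue reproducibly: with a fixed seed, repeated
    evaluation runs make the same choice (a sketch, not the repo's code)."""
    return random.Random(seed).choice(candidates)

venues = ["kohinoor", "the gandhi", "mahal of cambridge"]
print(pick_venue(venues, seed=0) == pick_venue(venues, seed=0))  # True
```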
In the example PMUL1848.json, the fail_book field for the hotel domain is missing. The prompt message, on the other hand, does include instructions on what to do if the booking fails:
If the booking fails how about <span class='emphasis'>friday</span>
Am I right that this is an inconsistency in the data, or am I missing something?
Here is the complete JSON goal:
'police': {},
'hospital': {},
'hotel': {'info': {'area': 'east',
'internet': 'yes',
'type': 'guesthouse',
'parking': 'yes'},
'fail_info': {},
'book': {'people': '5', 'day': 'thursday', 'invalid': True, 'stay': '5'}},
'topic': {'taxi': False,
'police': False,
'restaurant': False,
'hospital': False,
'hotel': False,
'general': False,
'attraction': False,
'train': False,
'booking': False},
'attraction': {},
'train': {'info': {'destination': 'cambridge',
'day': 'friday',
'arriveBy': '14:00',
'departure': 'stansted airport'},
'fail_info': {},
'book': {'invalid': True, 'people': '5'},
'fail_book': {}},
'message': ['You are planning your trip in Cambridge',
"You are looking for a <span class='emphasis'>place to stay</span>. The hotel should <span class='emphasis'>include free parking</span> and should <span class='emphasis'>include free wifi</span>",
"The hotel should be in the type of <span class='emphasis'>guesthouse</span> and should be in the <span class='emphasis'>east</span>",
"Once you find the <span class='emphasis'>hotel</span> you want to book it for <span class='emphasis'>5 people</span> and <span class='emphasis'>5 nights</span> starting from <span class='emphasis'>thursday</span>",
"If the booking fails how about <span class='emphasis'>friday</span>",
"Make sure you get the <span class='emphasis'>reference number</span>",
"You are also looking for a <span class='emphasis'>train</span>. The train should <span class='emphasis'>arrive by 14:00</span> and should be on <span class='emphasis'>the same day as the hotel booking</span>",
"The train should depart from <span class='emphasis'>stansted airport</span> and should go to <span class='emphasis'>cambridge</span>",
"Once you find the train you want to make a booking for <span class='emphasis'>the same group of people</span>",
"Make sure you get the <span class='emphasis'>reference number</span>"],
'restaurant': {}}
In the hotel domain, I saw some parking=none annotations; what does that mean?
Is it possible to generate a dataset for the SLU task, i.e. a sequence labeling dataset? I need to obtain all possible slot values in each turn (whether or not the values appear in the DST results).
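For the span-annotated values at least, the span_info entries can be projected onto BIO labels for sequence labelling. A sketch assuming the [act, slot, value, start, end] layout with inclusive token indices (worth double-checking against the data); note it only covers values actually mentioned in the utterance:

```python
def bio_tags(tokens, span_info):
    """Convert MultiWOZ span_info entries into BIO labels, one per token."""
    tags = ["O"] * len(tokens)
    for act, slot, value, start, end in span_info:
        tags[start] = f"B-{slot}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{slot}"
    return tags

tokens = "It is in the north area".split()
spans = [["Hotel-Inform", "Area", "north", 4, 4]]
print(bio_tags(tokens, spans))  # ['O', 'O', 'O', 'O', 'B-Area', 'O']
```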