facebookresearch / simmc

With the aim of building next-generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduce two new datasets (both in the virtual shopping domain), the annotation schema, the core technical tasks, and the baseline models. The code for the baselines and the datasets is open-sourced in this repository.

License: Other

Python 93.25% Shell 6.75%

simmc's Introduction

Situated Interactive MultiModal Conversations (SIMMC) Challenge 2020

Welcome to the Situated Interactive Multimodal Conversations (SIMMC) Track for DSTC9 2020.

The SIMMC challenge aims to lay the foundations for real-world assistant agents that can handle multimodal inputs and perform multimodal actions. We thus focus on task-oriented dialogs that encompass a situated multimodal user context in the form of a co-observed image or a virtual reality (VR) environment. The context is dynamically updated on each turn based on the user input and the assistant action. The challenge centers on our two SIMMC datasets, both in the shopping domain: (a) furniture, grounded in a shared virtual environment, and (b) fashion, grounded in an evolving set of images.

Organizers: Ahmad Beirami, Eunjoon Cho, Paul A. Crook, Ankita De, Alborz Geramifard, Satwik Kottur, Seungwhan Moon, Shivani Poddar, Rajen Subba

Example from SIMMC

(Figure: an example dialog from the SIMMC-Furniture dataset.)

Latest News

  • [Apr 15, 2021] Released screenshots for SIMMC-Furniture (part 0, part 1, part 2). Also released improved API calls with newer heuristics as SIMMC v1.2 (PR).
  • [Dec 29, 2020] Fixed the errors in text spans for both SIMMC-Furniture and SIMMC-Fashion, released new JSON files as SIMMC v1.1 (PR).
  • [Sept 28, 2020] Test-Std data released, End of Challenge Phase 1.
  • [July 8, 2020] Evaluation scripts and code to train baselines for Sub-Task #1, Sub-Task #2 released.
  • [June 22, 2020] Challenge announcement. Training / development datasets (SIMMC v1.0) are released.

Note: The DSTC9 SIMMC Challenge was conducted on SIMMC v1.0, so all results and baseline performances reported here are on SIMMC v1.0.


Timeline

Date Milestone
June 22, 2020 Training & development data released
Sept 28, 2020 Test-Std data released, End of Challenge Phase 1
Oct 5, 2020 Entry submission deadline, End of Challenge Phase 2
Oct 12, 2020 Final results announced

Track Description

Tasks and Metrics

We present three sub-tasks primarily aimed at replicating human-assistant actions in order to enable rich and interactive shopping scenarios.

Sub-Task #1: Multimodal Action Prediction
  Goal: Predict the correct Assistant API action(s) (classification)
  Input: Current user utterance, dialog context, multimodal context
  Output: Structured API call (action & arguments)
  Metrics: Action Accuracy, Attribute Accuracy, Action Perplexity

Sub-Task #2: Multimodal Dialog Response Generation & Retrieval
  Goal: Generate Assistant responses or retrieve them from a candidate pool
  Input: Current user utterance, dialog context, multimodal context, (ground-truth API calls)
  Output: Assistant response utterance
  Metrics: Generation: BLEU-4; Retrieval: MRR, R@1, R@5, R@10, Mean Rank

Sub-Task #3: Multimodal Dialog State Tracking (MM-DST)
  Goal: Track user belief states across multiple turns
  Input: Current user utterance, dialog context, multimodal context
  Output: Belief state for the current user utterance
  Metrics: Slot F1, Intent F1

Please check the task input file for a full description of inputs for each subtask.
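
For concreteness, a per-turn Sub-Task #1 prediction reported to the evaluation script might look like the sketch below. This mirrors the example format discussed in the repository issues further down; the authoritative schema is in mm_action_prediction/README.md, and the attribute values here are purely illustrative.

# Hypothetical Sub-Task #1 output for a single turn: the predicted action, the
# log-probability of each candidate action, and the predicted API attributes.
turn_prediction = {
    "action": "SearchDatabase",
    "action_log_prob": {
        "None": -2.09,
        "SearchDatabase": -0.18,
        "SearchMemory": -3.53,
        "SpecifyInfo": -4.89,
        "AddToCart": -7.14,
    },
    "attributes": {"color": ["red"], "price": ["cheap"]},  # illustrative values only
}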

Evaluation

For the DSTC9 SIMMC Track, we will run a two-phase evaluation as follows.

Challenge Period 1: Participants will evaluate the model performance on the provided devtest set. At the end of Challenge Period 1 (Sept 28), we ask participants to submit their model prediction results and a link to their code repository.

Challenge Period 2: A test-std set will be released on Sept 28 for the participants who submitted the results for the Challenge Period 1. We ask participants to submit their model predictions on the test-std set by Oct 5. We will announce the final results and the winners on Oct 12.

Challenge Instructions

(1) Challenge Registration

  • Fill out this form to register at DSTC9. Check “Track 4: Visually Grounded Dialog Track” along with other tracks you are participating in.

(2) Download Datasets and Code

  • Irrespective of participation in the challenge, we'd like to encourage those interested in this dataset to complete this optional survey. This will also help us communicate any future updates on the codebase, the datasets, and the challenge track.

  • Git clone our repository to download the datasets and the code. You may use the provided baselines as a starting point to develop your models.

$ git lfs install
$ git clone https://github.com/facebookresearch/simmc.git

(3) Reporting Results for Challenge Phase 1

  • Submit your model prediction results on the devtest set, following the submission instructions.
  • We will release the test-std set (with ground-truth labels hidden) on Sept 28.

(4) Reporting Results for Challenge Phase 2

  • Submit your model prediction results on the test-std set, following the submission instructions.
  • We will evaluate the participants' model predictions using the same evaluation script as in Phase 1, and announce the results.

Contact

Questions related to SIMMC Track, Data, and Baselines

Please contact [email protected], or leave comments in the GitHub repository.

DSTC Mailing List

If you want to get the latest updates about DSTC9, join the DSTC mailing list.

Citations

If you want to publish experimental results with our datasets or use the baseline models, please cite the following articles:

@article{moon2020situated,
  title={Situated and Interactive Multimodal Conversations},
  author={Moon, Seungwhan and Kottur, Satwik and Crook, Paul A and De, Ankita and Poddar, Shivani and Levin, Theodore and Whitney, David and Difranco, Daniel and Beirami, Ahmad and Cho, Eunjoon and Subba, Rajen and Geramifard, Alborz},
  journal={arXiv preprint arXiv:2006.01460},
  year={2020}
}

@article{crook2019simmc,
  title={SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform},
  author={Crook, Paul A and Poddar, Shivani and De, Ankita and Shafi, Semir and Whitney, David and Geramifard, Alborz and Subba, Rajen},
  journal={arXiv preprint arXiv:1911.02690},
  year={2019}
}

NOTE: The first paper above (Moon et al., 2020) describes in detail the datasets, the NLU/NLG/coreference annotations, and some of the baselines we provide in this challenge. It reports results from an earlier version of the dataset with different train-dev-test splits, hence the baseline performances on the challenge resources differ slightly.

License

SIMMC is released under CC-BY-NC-SA-4.0, see LICENSE for details.


simmc's Issues

Bug when running scripts/preprocess_simmc.sh

When I run scripts/preprocess_simmc.sh, I get some errors about missing files.
I ran sh scripts/preprocess_simmc.sh and got the following output:

Traceback (most recent call last):
  File "tools/extract_actions.py", line 1210, in <module>
    app.run(main)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tools/extract_actions.py", line 1198, in main
    furniture_db = data_support.FurnitureDatabase(FLAGS.metadata_path)
  File "/home/chenrj/simmc/mm_action_prediction/tools/data_support.py", line 169, in __init__
    row.append(row[headers.index('obj')].split('/')[-1].split('.zip')[0])
ValueError: 'obj' is not in list
Reading: ../data/simmc_furniture/furniture_train_dials.json
Traceback (most recent call last):
  File "tools/extract_vocabulary.py", line 72, in <module>
    main(parsed_args)
  File "tools/extract_vocabulary.py", line 17, in main
    train_data = json.load(file_id)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Reading: ../data/simmc_furniture/furniture_metadata.csv
Traceback (most recent call last):
  File "tools/embed_furniture_assets.py", line 94, in <module>
    main(parsed_args)
  File "tools/embed_furniture_assets.py", line 23, in main
    assets = data_support.read_furniture_metadata(args["input_csv_file"])
  File "/home/chenrj/simmc/mm_action_prediction/tools/data_support.py", line 132, in read_furniture_metadata
    new_asset["id"] = int(new_asset["obj"].split("/")[-1].split(".")[0])
KeyError: 'obj'
Reading: ../data/simmc_furniture/furniture_train_dials.json
Traceback (most recent call last):
  File "tools/build_multimodal_inputs.py", line 370, in <module>
    app.run(main)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tools/build_multimodal_inputs.py", line 364, in main
    mm_inputs_split = build_multimodal_inputs(input_json_file)
  File "tools/build_multimodal_inputs.py", line 58, in build_multimodal_inputs
    data = json.load(file_id)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Reading: ../data/simmc_furniture/furniture_dev_dials.json
Traceback (most recent call last):
  File "tools/build_multimodal_inputs.py", line 370, in <module>
    app.run(main)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tools/build_multimodal_inputs.py", line 364, in main
    mm_inputs_split = build_multimodal_inputs(input_json_file)
  File "tools/build_multimodal_inputs.py", line 58, in build_multimodal_inputs
    data = json.load(file_id)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Reading: ../data/simmc_furniture/furniture_devtest_dials.json
Traceback (most recent call last):
  File "tools/build_multimodal_inputs.py", line 370, in <module>
    app.run(main)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tools/build_multimodal_inputs.py", line 364, in main
    mm_inputs_split = build_multimodal_inputs(input_json_file)
  File "tools/build_multimodal_inputs.py", line 58, in build_multimodal_inputs
    data = json.load(file_id)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "tools/extract_attribute_vocabulary.py", line 149, in <module>
    extract_action_attributes(parsed_args)
  File "tools/extract_attribute_vocabulary.py", line 41, in extract_action_attributes
    data = np.load(args["train_npy_path"], allow_pickle=True)[()]
  File "/home/chenrj/anaconda3/envs/mul/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '../data/simmc_furniture/furniture_train_dials_mm_inputs.npy'

Question about the new evaluation method for Task 1&2

Hi. I've noticed that attribute accuracy for action prediction is very low for the fashion baseline model.
I know that a new parameter, single_round_eval, was added to the updated evaluation script for Tasks 1 and 2 (mm_action_prediction):

if single_round_eval and round_id != num_gt_rounds - 1: continue

When single_round_eval is True, only the last round of each dialog is evaluated. However, the last round of most dialogs has a "None" or "AddToCart" API, which does not have any attributes, so supervision is left as None most of the time:
supervision = gt_datum["action_supervision"]
if supervision is not None and "args" in supervision:
    supervision = supervision["args"]
if supervision is None:
    skipped += 1
    continue

I counted the number of times the evaluation skips because supervision is None: 973 times for the fashion domain on devtest. Hence, for the fashion domain, only 982 - 973 = 9 rounds are actually evaluated. I believe this is why attribute accuracy is very low with the updated evaluation script. I want to check whether this is how it is supposed to be or whether it needs to be fixed.
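
For reference, a minimal counting sketch along the lines described above, assuming the ground-truth API-call JSON is a list of dialogs whose "actions" entries carry an "action_supervision" field (as in the snippet quoted here):

import json

def count_skipped_last_rounds(gt_json_path):
    """Count dialogs whose last round has no action supervision (assumed field layout)."""
    with open(gt_json_path) as file_id:
        gt_actions = json.load(file_id)
    skipped, total = 0, 0
    for dialog in gt_actions:
        last_round = dialog["actions"][-1]  # only the last round matters with single_round_eval
        supervision = last_round.get("action_supervision")
        if supervision is not None and "args" in supervision:
            supervision = supervision["args"]
        total += 1
        if supervision is None:
            skipped += 1
    return skipped, total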

Bug in mm_dst baseline

slot_regex = re.compile(r'([A-Za-z0-9_.-:]*) *= ([^,]*)')

Here .-: defines a character range (from . to :), which does not include -.

>>> slot_regex = re.compile(r'([A-Za-z0-9_.-:]*)  *= ([^,]*)')
>>> slot_regex.findall('furniture-O = OBJECT_0')
[('O', 'OBJECT_0')]

Here is a fix.

>>> slot_regex = re.compile(r'([A-Za-z0-9_.:-]*)  *= ([^,]*)')
>>> slot_regex.findall('furniture-O = OBJECT_0')
[('furniture-O', 'OBJECT_0')]

This seemed fine when both the target and the prediction were parsed with this function, but it will cause issues if someone uses this function to format output into JSON and then evaluates it with the new evaluation script.

scripts/preprocess_simmc.sh

First of all, thank you for releasing these datasets and baselines.
But when I run scripts/preprocess_simmc.sh, I get some module-import errors:

Reading: ../data/simmc_furniture/furniture_train_dials.json
Saving: ../data/simmc_furniture/furniture_train_dials_api_calls.json
Reading: ../data/simmc_furniture/furniture_dev_dials.json
Saving: ../data/simmc_furniture/furniture_dev_dials_api_calls.json
Reading: ../data/simmc_furniture/furniture_devtest_dials.json
Saving: ../data/simmc_furniture/furniture_devtest_dials_api_calls.json
Reading: ../data/simmc_furniture/furniture_train_dials.json

Identified 2473 words..
Saving dictionary: ../data/simmc_furniture/furniture_vocabulary.json

Traceback (most recent call last):
  File "tools/embed_furniture_assets.py", line 12, in <module>
    from tools import data_support
ModuleNotFoundError: No module named 'tools'
Traceback (most recent call last):
  File "tools/build_multimodal_inputs.py", line 18, in <module>
    from tools import support
ModuleNotFoundError: No module named 'tools'
Traceback (most recent call last):
  File "tools/build_multimodal_inputs.py", line 18, in <module>
    from tools import support
ModuleNotFoundError: No module named 'tools'
Traceback (most recent call last):
  File "tools/build_multimodal_inputs.py", line 18, in <module>
    from tools import support
ModuleNotFoundError: No module named 'tools'
Traceback (most recent call last):
  File "tools/extract_attribute_vocabulary.py", line 149, in <module>
    extract_action_attributes(parsed_args)
  File "tools/extract_attribute_vocabulary.py", line 41, in extract_action_attributes
    data = np.load(args["train_npy_path"], allow_pickle=True)[()]
FileNotFoundError: [Errno 2] No such file or directory: '../data/simmc_furniture/furniture_train_dials_mm_inputs.npy'

scripts/train_simmc_model.sh

Hi!
I tried to run scripts/train_simmc_model.sh and got the following errors:

(simmc) telele77:~/simmc/mm_action_prediction$ scripts/train_simmc_model.sh
Traceback (most recent call last):
  File "eval_simmc_agent.py", line 15, in <module>
    import models
  File "/simmc/mm_action_prediction/models/__init__.py", line 4, in <module>
    from .assistant import Assistant
  File "/simmc/mm_action_prediction/models/assistant.py", line 12, in <module>
    import models.encoders as encoders
  File "/simmc/mm_action_prediction/models/encoders/__init__.py", line 21, in <module>
    from .history_agnostic import HistoryAgnosticEncoder
  File "/simmc/mm_action_prediction/models/encoders/history_agnostic.py", line 15, in <module>
    import models.encoders as encoders
AttributeError: module 'models' has no attribute 'encoders'

I think "models" is just a directory, How can I import it right away?

Missing dialogue_task_id in training

By analyzing the file fashion_train_dials.json, I have found that the dialogue_task_id key is missing in the following dialogues (a minimal check is sketched after the list):

id: 3406 ; is dialogue_task_id missing: True
id: 3969 ; is dialogue_task_id missing: True
id: 4847 ; is dialogue_task_id missing: True
id: 321 ; is dialogue_task_id missing: True
id: 3455 ; is dialogue_task_id missing: True
id: 3414 ; is dialogue_task_id missing: True
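
A minimal sketch of such a check, assuming the SIMMC v1.0 JSON layout with a top-level "dialogue_data" list whose entries carry "dialogue_idx" and (usually) "dialogue_task_id":

import json

with open("fashion_train_dials.json") as file_id:
    data = json.load(file_id)

# Assumed layout: {"dialogue_data": [{"dialogue_idx": ..., "dialogue_task_id": ..., ...}, ...]}
for dialog in data["dialogue_data"]:
    if "dialogue_task_id" not in dialog:
        print("id:", dialog["dialogue_idx"], "; dialogue_task_id missing")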

Incorrect evaluation script provided for MM-DST baseline

There is a bug located in the parse_flattened_result function in the "gpt2_dst/utils/convert.py" file. Please look at the following code:

def parse_flattened_result(to_parse):
    ....
    d = {}
    for dialog_act in dialog_act_regex.finditer(to_parse):
        d['act'] = dialog_act.group(1)
        d['slots'] = []
        ....

        if d != {}:
            belief.append(d)  # Not re-initialized during the for-loop.

The belief list appends a reference to the dictionary d rather than a copy, which causes belief to keep accumulating the same action and slots. The fix is to put d = {} inside the for-loop. This impacts the baseline performance for Sub-Task #3 (the actual performance is lower after fixing the script).
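
For reference, a minimal sketch of the fix described above: re-initialize d inside the loop so each dialog act gets its own dictionary. The regexes here are simplified stand-ins, not the exact patterns from gpt2_dst/utils/convert.py.

import re

# Simplified stand-ins for the patterns in convert.py (illustrative only).
dialog_act_regex = re.compile(r'([\w:.]+)\s*\[([^\]]*)\]')
slot_regex = re.compile(r'([A-Za-z0-9_.:-]+)\s*=\s*([^,\]]+)')

def parse_flattened_result(to_parse):
    belief = []
    for dialog_act in dialog_act_regex.finditer(to_parse):
        d = {}  # fix: a fresh dict for every dialog act
        d['act'] = dialog_act.group(1)
        d['slots'] = [[name, value.strip()]
                      for name, value in slot_regex.findall(dialog_act.group(2))]
        if d != {}:
            belief.append(d)
    return belief

# Example: two acts now yield two distinct entries instead of repeated references.
print(parse_flattened_result("DA:INFORM:GET:CLOTHING [ color = red ] DA:ASK:GET:DRESS [  ]"))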

Arguments values for Action Prediction task

Hi!
I used mm_action_prediction/tools/extract_actions_fashion.py and found the following values that the API arguments can assume:

{'DA:INFORM:GET:SWEATER', 'hemStyle', 'warmthRating', 'size', 'ageRange', 'info', 'material', 'clothingCategory', 'brand', 'DA:ASK:GET:JACKET', 'skirtLength', 'dressStyle', 'waistStyle', 'waterResistance', 'DA:ASK:CHECK:SWEATER', 'color', 'customerRating', 'DA:INFORM:GET:CLOTHING', 'hasPart', 'sequential', 'forGender', 'price', 'availableSizes', 'DA:INFORM:GET:JACKET', 'DA:ASK:GET:CLOTHING', 'skirtStyle', 'sleeveLength', 'pattern', 'sleeveStyle', 'DA:ASK:GET:SKIRT', 'DA:ASK:GET:SWEATER', 'necklineStyle', 'jacketStyle', 'soldBy', 'embellishment', 'madeIn', 'sweaterStyle', 'clothingStyle', 'hemLength', 'DA:ASK:CHECK:CLOTHING', 'DA:ASK:GET:DRESS', 'DA:INFORM:GET:SKIRT', 'amountInStock', 'forOccasion', 'DA:INFORM:GET:DRESS'}

There are two questions I want to ask you:

  1. Is it OK to have DA:INFORM:GET:SWEATER (and all the other DIALOGUE_ACT:ACTIVITY:OBJECT values) as an argument value for an API, or is there an error during the generation?

  2. Do we have a list of the possible values that the arguments can assume for each API? Table 4 of the paper gives a list of possible values, but it terminates with "etc.".

Thank you

Baselines results for API call prediction

Hi,
I was checking the tables showing the baseline results on Sub-Task #1 (mm_action_prediction/README.md) and noticed that both tables report results for the furniture dataset only. I think there is an error and one of the tables should refer to the fashion dataset instead.

How can I get images of fashion items?

I have successfully downloaded the screenshots for SIMMC-Furniture, but I cannot find any clues about the fashion items. Could you please tell me how I can get the fashion images? Thank you very much!

Baselines results

The baseline results in mm_action_prediction/README.md do not match the results in the paper. Which are the official baseline results we should refer to?

Question about submission models

Hi,
For the final submission to DSTC9 Track 4, are we allowed to submit multiple models (for ensembling) for each subtask?
Thanks.

Question about mm_action_prediction/scripts/train_simmc_model.sh

When I run train_simmc_model.sh, I run into some issues. After training finishes and evaluation begins, the following error happens:
Traceback (most recent call last):
  File "eval_simmc_agent.py", line 199, in <module>
    main(args)
  File "eval_simmc_agent.py", line 24, in main
    checkpoint = torch.load(args["checkpoint"], map_location=torch.device("cpu"))
  File "/home/chenrj/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/chenrj/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/chenrj/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/hae/epoch_20.tar'

I found that the results are saved in mm_action_prediction/checkpoints, and the last file is epoch_100.tar.
The script scripts/train_simmc_model.sh seems to ignore the definition of the variable CHECKPOINT_ROOT, which leads to the error:

# Evaluate a trained model checkpoint.
CHECKPOINT_PATH="${CHECKPOINT_ROOT}/hae/epoch_20.tar"
python -u eval_simmc_agent.py \
    --eval_data_path=${DEVTEST_JSON_FILE/.json/_mm_inputs.npy} \
    --checkpoint="$CHECKPOINT_PATH" --gpu_id=${GPU_ID} --batch_size=50 \
    --domain="$DOMAIN"

Should I change the path to 'checkpoints/epoch_100.tar' or to 'checkpoints/epoch_20.tar'?

SubTask #3 evaluation lower case issue

Hi,

I just found that the pre-processed data in the fashion and furniture datasets used by the baseline have inconsistent casing for words like "that" and "That":

[screenshot]

Is it really meaningful for the evaluation to differentiate the two? The current evaluation method doesn't normalize between them. Could you lowercase the slot values before comparing them? Thanks.
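
A minimal sketch of the normalization being requested here, lowercasing both sides before the slot-value comparison (the (name, value) pair format is assumed for illustration):

def normalize_slot(name, value):
    # Lowercase and strip so that e.g. "That brown couch" and "that brown couch" compare equal.
    return name.strip().lower(), str(value).strip().lower()

def slots_match(predicted_slots, ground_truth_slots):
    # predicted_slots / ground_truth_slots: iterables of (name, value) pairs (assumed format).
    return ({normalize_slot(n, v) for n, v in predicted_slots}
            == {normalize_slot(n, v) for n, v in ground_truth_slots})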

Possible bugs in evaluation script in SubTask #1

Hi, I think there is a bug in the evaluation script for Sub-Task #1, in the evaluate_action_prediction function in action_evaluation.py. Please have a look at the following code:

def evaluate_action_prediction(gt_actions, model_actions):
    ....
    # Case 1: Action mismatch -- record False for all attributes.
    if not action_match:
        for _ in supervision.keys():
            matches["attributes"].append(False)
    # Case 2: Action matches -- use model predictions for attributes.
    else:
        for key in supervision.keys():
            if key in IGNORE_ATTRIBUTES:
                continue
            gt_key_vals = supervision[key]
            model_key_vals = round_datum["attributes"][key]
    .....

When evaluating on the furniture dataset, if the action mismatches, the loop iterates over all gold attribute keys, including the ignored attributes, and appends False to matches["attributes"]. Since those attribute keys are skipped when the action matches, I think you should also continue when key in IGNORE_ATTRIBUTES is satisfied in the action-mismatch loop.

Another thing I want to point out: is the attribute accuracy in Sub-Task #1, measured per attribute rather than per conversation, really meaningful? For example, take two conversations: one has 1 action and 7 attributes, the other has 1 action and 1 attribute. Assume the model predicts the action and one attribute correctly in the first conversation, and the action and the attribute correctly in the second. With the current evaluation, attribute accuracy is 2 / 8 = 0.25. With a per-conversation evaluation it would be ((1/7) + 1) / 2 ≈ 0.57, which makes more sense, because the current method favors conversations with more attributes. May I ask whether you would add a second metric computed at the per-conversation level? Thanks.
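
For reference, a small sketch contrasting the two averaging schemes discussed above (per attribute vs. per conversation), using hypothetical (num_correct, num_attributes) counts per dialog:

def attribute_accuracies(per_dialog_counts):
    """per_dialog_counts: list of (num_correct, num_attributes) per conversation."""
    correct = sum(c for c, _ in per_dialog_counts)
    total = sum(n for _, n in per_dialog_counts)
    per_attribute = correct / total                                              # current metric
    per_conversation = sum(c / n for c, n in per_dialog_counts) / len(per_dialog_counts)
    return per_attribute, per_conversation

# The example from this issue: (1 of 7 correct) and (1 of 1 correct).
print(attribute_accuracies([(1, 7), (1, 1)]))  # -> (0.25, 0.5714...)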

Actions do not always match the dialog

I was checking the validity of the actions generated with mm_action_prediction/tools/extract_actions_fashion.py and found that the extracted API calls do not always match the dialogue. Sometimes there are more actions than turns (and vice versa) for a dialogue.

Dialogues {321, 3969, 3406, 4847, 3414} have different numbers of turns and actions. For instance, dialogue 3969 has 6 turns but only 2 actions. It seems to me that dialogue 3969 in fashion_train_dials.json is badly annotated (everything is inside the belief_state field).

action_evaluation expected file format

Hi,
I am trying to evaluate my model results on the fashion dataset using the script mm_action_prediction/tools/action_evaluation.py, but I have not understood the file format reported in mm_action_prediction/README.md.

  1. <action_token>: <action_log_prob>. By action_token, do you mean the name of the action?

  2. <attribute_label>: <attribute_val>. I have a vague idea of what attribute_label is by looking at the IGNORE_ATTRIBUTES list, but it is not clear what to insert as attribute_val. For each dialogue turn I predict a series of arguments as a multi-label problem, but I have not understood what you mean by label versus value. Can you please explain, with an example, what a label versus a value for an attribute is in the fashion dataset?

  3. I have filled the dict in this way for a particular turn of a dialogue:

{'action': 'SearchDatabase', 'action_log_prob': {'None': -2.091989517211914, 'SearchDatabase': -0.17580336332321167, 'SearchMemory': -3.525150775909424, 'SpecifyInfo': -4.88762903213501, 'AddToCart': -7.144548416137695}, 'attributes': {}}

Anyway, I have issues with the script because the attributes field is empty and the script does round_datum["attributes"][key].
How should I fill the output JSON when no arguments have been found for a particular dialogue turn?

Question about retrieval evaluation

Hi, I have some questions about the retrieval evaluation for response generation.
I'm trying to write retrieval evaluation scripts for mm_dst to evaluate response text, and I hope to clarify some details.
I understand this part of the code:

def evaluate_response_retrieval(gt_responses, model_scores):
    """Evaluates response retrieval using the raw data and model predictions.
    """
    # NOTE: Update this later to include gt_index for candidates.
    gt_ranks = []
    for model_datum in model_scores:
        for _round_id, round_datum in enumerate(model_datum["candidate_scores"]):
            gt_score = round_datum[0]
            gt_ranks.append(np.sum(np.array(round_datum) > gt_score) + 1)
            # Best: all other < gt, like -40 < -20
    gt_ranks = np.array(gt_ranks)
    return {
        "r1": np.mean(gt_ranks <= 1),
        "r5": np.mean(gt_ranks <= 5),
        "r10": np.mean(gt_ranks <= 10),
        "mean": np.mean(gt_ranks),
        "mrr": np.mean(1 / gt_ranks)
    }

My question is about how each score (round_datum[i], for i in [0, len(round_datum))) is calculated.
As I understand it, there are 100 candidates for each turn of a dialogue.

The first question:

  • Is a candidate's score (one candidate in one turn of one dialogue) computed by feeding the candidate as both input and target to the model and taking the cross-entropy loss?
  • Or is the candidate used as the input while the target is the ground truth?

The second question:

  • When a candidate sentence is fed into the model for scoring, I suppose it is fed with teacher forcing (i.e., without using the model's output word as the next input)? A rough sketch of the scoring I have in mind is below.
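
For illustration, a rough sketch of the first interpretation: each candidate is scored under teacher forcing, with the candidate itself serving as both the shifted decoder input and the target. The model(context_ids, decoder_input_ids) interface returning per-token logits is an assumption for this sketch, not the baseline's actual API.

import torch
import torch.nn.functional as F

def score_candidates(model, context_ids, candidate_ids_list):
    """Score each candidate by its (negative) summed token cross-entropy under teacher forcing."""
    scores = []
    for candidate_ids in candidate_ids_list:          # candidate_ids: (1, seq_len) token tensor
        decoder_input = candidate_ids[:, :-1]         # gold tokens as decoder input
        target = candidate_ids[:, 1:]                 # next gold token as target
        logits = model(context_ids, decoder_input)    # assumed interface -> (1, seq_len-1, vocab)
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target.reshape(-1), reduction="sum"
        )
        scores.append(-nll.item())                    # higher score = more likely candidate
    return scores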

Question about Fashion attributes

Hi,
I have some questions about the attribute values in the fashion dataset.
I was using "fashion_devtest_dials_api_calls.json", produced by the preprocessing scripts in mm_action_prediction, and when I looked into it I found some unexpected attribute values.

In the attribute vocabulary and action_evaluation.py, I found 6 available attributes and 8 ignored attributes.
However, in the "fashion_devtest_dials_api_calls.json" file, I found 27 kinds of attribute values.
available attributes:
["availableSizes", "brand", "color", "customerRating", "info", "other", "price"]
ignored attributes:
["minPrice", "maxPrice","furniture_id", "material", "decorStyle", "intendedRoom","raw_matches","focus"]
Found attributes:
['soldBy', 'pattern', 'clothingStyle', 'waistStyle', 'price', 'forOccasion', 'skirtLength', 'clothingCategory', 'customerRating', 'necklineStyle', 'color', 'embellishment', 'size', 'jacketStyle', 'material', 'info', 'ageRange', 'sleeveLength', 'hemLength', 'sweaterStyle', 'dressStyle', 'amountInStock', 'availableSizes', 'sleeveStyle', 'skirtStyle', 'hemStyle', 'brand']
What are these found attributes that are neither available nor ignored?
Should we just ignore them and delete them during training?
And will these attributes be ignored by the action_evaluation.py script?
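
In the meantime, a minimal sketch of one way to sanity-check this on the modelling side: keep only the attribute keys that the evaluation lists as available. The api_call layout with an "attributes" dict is an assumption for illustration.

# Attribute keys listed as available above.
EVAL_ATTRIBUTES = {"availableSizes", "brand", "color", "customerRating", "info", "other", "price"}

def filter_attributes(api_call):
    """Drop attribute keys outside the evaluation vocabulary (assumed dict layout)."""
    attributes = api_call.get("attributes", {})
    api_call["attributes"] = {k: v for k, v in attributes.items() if k in EVAL_ATTRIBUTES}
    return api_call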

Are we allowed to use "turn_label" fields for subtasks 1-2 ?

In the first turn of dialogue 4146 in the fashion dev set, the user asks to compare the price of the current object (present in visual_objects) with the price of the previously seen object. The only two annotations about the existence of a previous object are in "state_graph_2", which is not allowed as input, and in the "objects" subfield of "turn_label".
Are we allowed to use "turn_label" as input for action prediction and response generation?

Question about test-std files

Hi,
I am curious about the test-std files that will be released on Sept 28.
Will you provide retrieval-candidate files and API call files, or only one JSON file ({domain}_devtest_dials_teststd_format_public.json)?

We want to use the action information that is allowed at inference time, as mentioned in TASK_INPUTS.md. However, there is no annotated action information in {furniture/fashion}_devtest_dials_teststd_format_public.json. How can we get the ground-truth actions?

We are using the preprocessing script in mm_action_prediction, but it is not able to process {furniture/fashion}_devtest_dials_teststd_format_public.json.
Can you provide full test files before Sept 28 that we can run directly with the baseline models?
