Comments (9)
Hi, we're curious about the test-std file format.
Will there be retrieval_candidates files and files containing API information for test-std?
Since the team model entry deadline is getting closer, and we don't have the detailed format of the test-std files, we are afraid that our code (which runs directly on the devtest set) may not run directly on the test-std files.
Are we allowed to modify the preprocessing code after Sep. 28 to run on test-std (only changing some details so that it runs successfully, without changing the models)?
from simmc.
Hello @billkunghappy,
Thanks for raising this concern.
Regarding the comment, your understanding is correct.
(a) In the API calls file, the last round on which evaluation is to be performed will be excluded.
(b) For retrieval candidates, the last round will contain the retrieval candidates but will not contain the gt_index field that gives the index of the ground truth response.
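Code that scores retrieval candidates on devtest can crash on teststd-format files if it indexes gt_index unconditionally. A minimal defensive sketch, using a toy stand-in shaped loosely like the devtest retrieval-candidates files (the field names are illustrative assumptions, not the official teststd schema):

```python
import json

# Toy stand-in for the public format described above: every round lists
# candidate indices, but the last (evaluation) round omits "gt_index".
sample = json.loads("""
{
  "retrieval_candidates": [
    {
      "dialogue_idx": 0,
      "candidates": [
        {"turn_idx": 0, "retrieval_candidates": [3, 7, 1], "gt_index": 0},
        {"turn_idx": 1, "retrieval_candidates": [5, 2, 9]}
      ]
    }
  ]
}
""")

def gt_indices(dialogue):
    """Return gt_index per round, with None where it is withheld."""
    return [rnd.get("gt_index") for rnd in dialogue["candidates"]]

labels = gt_indices(sample["retrieval_candidates"][0])
```

Using `dict.get` instead of `dict[...]` makes the withheld last round show up as `None` rather than a `KeyError`.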
With respect to the files, I just realized that the suffixes public and private got switched for API calls. While I will fix these later, please take a look at the wrongly named furniture_devtest_dials_api_calls_teststd_format_private.json (should be public) and the corresponding file for fashion.
Hope this helps!
Hello @seo-95 ,
Thanks for raising these two important concerns. We've updated the API calls file to include these two images for the last turn on which evaluation is performed. Of course, you're not allowed to use the ground truth API calls for subtask 1, but you can use them for subtask 2 (as per the table).
Apologies for the confusion earlier, hope this addresses your concerns.
Same issue here. The ground truth action is missing from the test-std draft file. Additionally, I can't even find the focus item id for each turn; only partial information about the item is reported inside visual_objects, and sometimes it is not sufficient.
{
"domain": "fashion",
"visual_objects": {
"OBJECT_2": {
"hemLength": [
"mini",
"knee_length"
],
"pattern": [
"chevron",
"animal"
],
"pos": "focus",
"skirtStyle": [
"peplum",
"a_line",
"body_con",
"loose",
"fit_and_flare"
],
"embellishments": [
"pleated"
],
"type": "skirt"
}
},
"system_transcript": "Here is the skirt from Pedals & Gears. It retails for $124 and is rated at 3.96.",
"turn_idx": 1,
"belief_state": {},
"transcript": "sure"
}
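For what it's worth, the only focus signal in a turn like the one above is the "pos": "focus" marker inside visual_objects. A small helper can pull it out and make its absence explicit (a sketch against the JSON shape quoted above, not an official challenge API):

```python
def focus_object(turn):
    """Return (name, attributes) of the object marked "pos": "focus" in a
    turn's visual_objects, or None when no focus annotation is present."""
    for name, obj in turn.get("visual_objects", {}).items():
        if obj.get("pos") == "focus":
            return name, obj
    return None

# The turn quoted above, abbreviated to the relevant fields
turn = {
    "domain": "fashion",
    "visual_objects": {"OBJECT_2": {"pos": "focus", "type": "skirt"}},
    "turn_idx": 1,
}
found = focus_object(turn)
```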
Hi all, sorry that this info was not included, we will look into this and release a new file soon.
Please do note though that we are providing {domain}_devtest_dials_teststd_format_public.json just as a guide before we release the future test-std set, and the results for Phase 1 should be reported on the {domain}_devtest_dials.json, released earlier.
Hello all, sorry for not including this information before.
For the test-std split, we will also release the corresponding API calls and retrieval candidates (public versions, which exclude the last round on which evaluation is done) in the format of the corresponding existing files.
To allow checking for compatibility, we will now release the devtest API calls and retrieval candidates in this format.
Hello, we still have 2 questions about the submission.
First, for Challenge Phase 1 we should submit our devtest prediction files. The README of the simmc repo tells us to follow the submission instructions.
However, there is nothing there about submitting devtest results. Should we email you the devtest results? If so, to which email address, and in what format?
Secondly, in an earlier comment @satwikkottur said:
For the test-std split, we will also release the corresponding API calls and retrieval candidates (public versions, excludes the last round on which evaluation is done) in the format of corresponding existing files.
What does "excludes the last round on which evaluation is done" mean?
I thought it meant excluding the last round's API entry in each dialogue in the API calls file (excluding the last round of retrieval candidates would be odd, since we need those candidates to predict retrieval scores).
But when I checked fashion_devtest_dials_api_calls_teststd_format_public.json and furniture_devtest_dials_api_calls_teststd_format_public.json, both files do include the last-round API information corresponding to fashion_devtest_dials_teststd_format_public.json and furniture_devtest_dials_teststd_format_public.json.
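One quick way to verify this observation programmatically: for each dialogue, check whether the API-calls file carries an entry for the dialogue's final turn. The structures below are toy stand-ins shaped like the devtest files (one action list per dialogue); the real teststd schema may differ.

```python
# Toy stand-ins: one dialogue with 3 turns, and its API-calls entries.
dials = {"dialogue_data": [
    {"dialogue": [{"turn_idx": 0}, {"turn_idx": 1}, {"turn_idx": 2}]},
]}
api_calls = [
    [{"turn_idx": 0, "action": "SearchDatabase"},
     {"turn_idx": 1, "action": "None"},
     {"turn_idx": 2, "action": "SpecifyInfo"}],
]

def covers_last_turn(dialogue, actions):
    """True if the API-calls entries include the dialogue's final turn,
    i.e. the last (evaluation) round was NOT excluded."""
    last = dialogue["dialogue"][-1]["turn_idx"]
    return any(a["turn_idx"] == last for a in actions)

flags = [covers_last_turn(d, a)
         for d, a in zip(dials["dialogue_data"], api_calls)]
```

Run over the real files, a list of all-`True` flags would confirm that the last-round API information is still present.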
Hi, I have 2 questions raised by the test-std dataset release, regarding the response_generation task.
- Since the action and attributes annotations are not available for the current turn (differently from what was defined in TASK_INPUTS.md), are we able to slightly modify the code (not the model) to avoid using this information?
- Whenever we encounter a potential SearchMemory or SearchDatabase in the k-th turn (the one on which generation and action prediction are evaluated), we do not have the annotation about the new focus item (during the first phase of the challenge it was included in the dials_api JSON file). Since the wizard's response is conditioned on the item she/he is looking at, how can we generate such a response if we do not have information about the item?
An example below (dialogue 1902):
{
"domain": "fashion",
"visual_objects": {},
"system_transcript": "",
"turn_idx": 2,
"belief_state": {},
"transcript": "Show me another coat, but one that 212 Localts more."
}
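Until this is clarified, a pipeline that conditions generation on the focus item needs an explicit fallback for turns like the one above, where visual_objects is empty. A hedged sketch (the helper name and fallback policy are assumptions, not part of the challenge code):

```python
def focus_attributes(turn, fallback=None):
    """Attributes of the focused item for conditioning generation, or the
    given fallback when the turn (like the teststd draft example above)
    has empty visual_objects and thus no focus annotation."""
    for obj in (turn.get("visual_objects") or {}).values():
        if obj.get("pos") == "focus":
            return {k: v for k, v in obj.items() if k != "pos"}
    return fallback

bare_turn = {"domain": "fashion", "visual_objects": {},
             "turn_idx": 2, "belief_state": {}}
attrs = focus_attributes(bare_turn, fallback={})
```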
Related Issues (20)
- Incorrect evaluation script provided for MM-DST baseline
- Incorrect hyperparameters?
- Baseline results for API call prediction
- action_evaluation expected file format
- Question about retrieval evaluation
- Baseline results
- Bug in baseline? (missing sigmoid)
- Possible bugs in evaluation script in SubTask #1
- Are we allowed to use "turn_label" fields for subtasks 1-2?
- Question about Fashion attributes
- SubTask #3 evaluation lower-case issue
- Bug in mm_dst baseline
- Question about submission models
- Question about the new evaluation method for Tasks 1 & 2
- KeyError caused by ~teststd_dials_retrieval_candidates_public.json
- How to get images
- Bug in run scripts/preprocess_simmc.sh
- Question about mm_action_prediction/scripts/train_simmc_model.sh
- How can I get images of fashion items?