bigscience-workshop / promptsource Goto Github PK

View Code? Open in Web Editor NEW

2.6K 31.0 342.0 5.96 MB

Toolkit for creating, sharing and using natural language prompts.

License: Apache License 2.0

Python 99.46% Makefile 0.54%

natural-language-processing nlp machine-learning

promptsource's Introduction

PromptSource

PromptSource is a toolkit for creating, sharing and using natural language prompts.

Recent work has shown that large language models exhibit the ability to perform reasonable zero-shot generalization to new tasks. For instance, GPT-3 demonstrated that large language models have strong zero- and few-shot abilities. FLAN and T0 then demonstrated that pre-trained language models fine-tuned in a massively multitask fashion yield even stronger zero-shot performance. A common denominator in these works is the use of prompts which has gained interest among NLP researchers and engineers. This emphasizes the need for new tools to create, share and use natural language prompts.

Prompts are functions that map an example from a dataset to a natural language input and target output. PromptSource contains a growing collection of prompts (which we call P3: Public Pool of Prompts). As of January 20, 2022, there are ~2'000 English prompts for 170+ English datasets in P3.

PromptSource provides the tools to create, and share natural language prompts (see How to create prompts, and then use the thousands of existing and newly created prompts through a simple API (see How to use prompts). Prompts are saved in standalone structured files and are written in a simple templating language called Jinja. An example of prompt available in PromptSource for SNLI is:

{{premise}}

Question: Does this imply that "{{hypothesis}}"? Yes, no, or maybe? ||| {{answer_choices[label]}}

You can browse through existing prompts on the hosted version of PromptSource.

Setup

If you do not intend to create new prompts, you can simply run:

pip install promptsource

Otherwise, you need to install the repo locally:

Download the repo
Navigate to the root directory of the repo
Run pip install -e . to install the promptsource module

Note: for stability reasons, you will currently need a Python 3.7 environment to run the last step. However, if you only intend to use the prompts, and not create new prompts through the interface, you can remove this constraint in the setup.py and install the package locally.

How to use prompts

You can apply prompts to examples from datasets of the Hugging Face Datasets library.

# Load an example from the datasets ag_news
>>> from datasets import load_dataset
>>> dataset = load_dataset("ag_news", split="train")
>>> example = dataset[1]

# Load prompts for this dataset
>>> from promptsource.templates import DatasetTemplates
>>> ag_news_prompts = DatasetTemplates('ag_news')

# Print all the prompts available for this dataset. The keys of the dict are the UUIDs the uniquely identify each of the prompt, and the values are instances of `Template` which wraps prompts
>>> print(ag_news_prompts.templates)
{'24e44a81-a18a-42dd-a71c-5b31b2d2cb39': <promptsource.templates.Template object at 0x7fa7aeb20350>, '8fdc1056-1029-41a1-9c67-354fc2b8ceaf': <promptsource.templates.Template object at 0x7fa7aeb17c10>, '918267e0-af68-4117-892d-2dbe66a58ce9': <promptsource.templates.Template object at 0x7fa7ac7a2310>, '9345df33-4f23-4944-a33c-eef94e626862': <promptsource.templates.Template object at 0x7fa7ac7a2050>, '98534347-fff7-4c39-a795-4e69a44791f7': <promptsource.templates.Template object at 0x7fa7ac7a1310>, 'b401b0ee-6ffe-4a91-8e15-77ee073cd858': <promptsource.templates.Template object at 0x7fa7ac7a12d0>, 'cb355f33-7e8c-4455-a72b-48d315bd4f60': <promptsource.templates.Template object at 0x7fa7ac7a1110>}

# Select a prompt by its name
>>> prompt = ag_news_prompts["classify_question_first"]

# Apply the prompt to the example
>>> result = prompt.apply(example)
>>> print("INPUT: ", result[0])
INPUT:  What label best describes this news article?
Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\which has a reputation for making well-timed and occasionally\controversial plays in the defense industry, has quietly placed\its bets on another part of the market.
>>> print("TARGET: ", result[1])
TARGET:  Business

In the case that you are looking for the prompts available for a particular subset of a dataset, you should use the following syntax:

dataset_name, subset_name = "super_glue", "rte"

dataset = load_dataset(f"{dataset_name}/{subset_name}", split="train")
example = dataset[0]

prompts = DatasetTemplates(f"{dataset_name}/{subset_name}")

You can also collect all the available prompts for their associated datasets:

>>> from promptsource.templates import TemplateCollection

# Get all the prompts available in PromptSource
>>> collection = TemplateCollection()

# Print a dict where the key is the pair (dataset name, subset name)
# and the value is an instance of DatasetTemplates
>>> print(collection.datasets_templates)
{('poem_sentiment', None): <promptsource.templates.DatasetTemplates object at 0x7fa7ac7939d0>, ('common_gen', None): <promptsource.templates.DatasetTemplates object at 0x7fa7ac795410>, ('anli', None): <promptsource.templates.DatasetTemplates object at 0x7fa7ac794590>, ('cc_news', None): <promptsource.templates.DatasetTemplates object at 0x7fa7ac798a90>, ('craigslist_bargains', None): <promptsource.templates.DatasetTemplates object at 0x7fa7ac7a2c10>,...}

You can learn more about PromptSource's API to store, manipulate and use prompts in the documentation.

How to create prompts

PromptSource provides a Web-based GUI that enables developers to write prompts in a templating language and immediately view their outputs on different examples.

There are 3 modes in the app:

Sourcing: create and write new prompts
Prompted dataset viewer: check the prompts you wrote (or the existing ones) on the entire dataset
Helicopter view: aggregate high-level metrics on the current state of P3

To launch the app locally, please first make sure you have followed the steps in Setup, and from the root directory of the repo, run:

streamlit run promptsource/app.py

You can also browse through existing prompts on the hosted version of PromptSource. Note the hosted version disables the Sourcing mode (streamlit run promptsource/app.py -- --read-only).

Writing prompts

Before creating new prompts, you should read the contribution guidelines which give an step-by-step description of how to contribute to the collection of prompts.

Datasets that require manual downloads

Some datasets are not handled automatically by datasets and require users to download the dataset manually (story_cloze for instance ).

To handle those datasets as well, we require users to download the dataset and put it in ~/.cache/promptsource. This is the root directory containing all manually downloaded datasets.

You can override this default path using PROMPTSOURCE_MANUAL_DATASET_DIR environment variable. This should point to the root directory.

Development structure

PromptSource and P3 were originally developed as part of the BigScience project for open research 🌸, a year-long initiative targeting the study of large models and datasets. The goal of the project is to research language models in a public environment outside large technology companies. The project has 600 researchers from 50 countries and more than 250 institutions.

In particular, PromptSource and P3 were the first steps for the paper Multitask Prompted Training Enables Zero-Shot Task Generalization.

You will find the official repository to reproduce the results of the paper here: https://github.com/bigscience-workshop/t-zero. We also released T0* (pronounce "T Zero"), a series of models trained on P3 and presented in the paper. Checkpoints are available here.

Known Issues

Warning or Error about Darwin on OS X: Try downgrading PyArrow to 3.0.0.

ConnectionRefusedError: [Errno 61] Connection refused: Happens occasionally. Try restarting the app.

Citation

If you find P3 or PromptSource useful, please cite the following reference:

@misc{bach2022promptsource,
      title={PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts},
      author={Stephen H. Bach and Victor Sanh and Zheng-Xin Yong and Albert Webson and Colin Raffel and Nihal V. Nayak and Abheesht Sharma and Taewoon Kim and M Saiful Bari and Thibault Fevry and Zaid Alyafeai and Manan Dey and Andrea Santilli and Zhiqing Sun and Srulik Ben-David and Canwen Xu and Gunjan Chhablani and Han Wang and Jason Alan Fries and Maged S. Al-shaibani and Shanya Sharma and Urmish Thakker and Khalid Almubarak and Xiangru Tang and Xiangru Tang and Mike Tian-Jian Jiang and Alexander M. Rush},
      year={2022},
      eprint={2202.01279},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

promptsource's People

Contributors

Stargazers

Watchers

Forkers

srush craffel victorsanh shanyas10 mmarius tianjianjiang arnaudstiegler urmish gchhablani elsanns ftarlaci awebson ibeltagy cccntu jetrunner arunraja-hub hannight zaidalyafeai josrzn om304 trishalaneeraj tae898 drugilsberg osanseviero rbawden patrick-s-h-lewis abheesht17 thomasw21 nohtow majauhar yongzx debajyotidatta teelinsan rteehas satvikipathak relaxedplan tevenlescao sincerass priyanksonis wojciechkusa nihalnayak tbers22 stellaathena bwang482 nafabrar salam1995 dragomirradev dav009 graphgrailai akamil-etsy georgebregman andreajparker dvsrepo techthiyanes c00renut helloscribe fundou shubhampachori12110095 moqingxinai sambroy avineshwar codeaudit sukiakiumo allensmile ai-natural-language-processing-lab edward-sun yizhongw ianderrington magedsaeed zeeroocooll sbmaruf harsh4799 bigdatasciencegroup dliofindia shakedbr srulikbd jason-fries khalidalt vazkir jaketae big-data-ai xinjicong infinite-joy tangxiangru philschmid miyoungko cmcmaster1 guitaricet du-jia tojarieh97 lksenel jonnodt hendrikstrobelt shuhua886 manikant92 dumpmemory marcus-arcadius babajideowoyele rayjue jcmc00

promptsource's Issues

Error in loading proto_qa dataset

When attempting to view the proto_qa dataset I get the following error

App to scroll through an entire prompted source

It's sometimes hard to cover all the cases especially if you are not aware of them.
For instance, I was prompting the DROP dataset, but didn't initially realize that:

the "spans" field always contain multiple entries in the validation split
there can also be multiple spans in the train split... which made it not straightforward how to combine them (especially if you don't know much about the dataset)

It would be easier to spot these potential introduced bugs if we have a viz app to scroll through the dataset/subset/split for a given prompt, and look at a bunch of prompted examples.
Let's start simple with a second separate app (and can ultimately aim at integrating it into the main tagging app in some way).
At minimum, we need:

dataset selectbox
subset selectbox
split selectbox
-prompt selectbox
something like a 2 columns table with on the left the "raw" instances, and on the right the "prompted" instances

I already have a similar code for that, I'll adapt and integrate it

fix #13

Originally posted by @arnaudstiegler in #17 (comment)

Streamlit error version 0.88.0

Version 0.88.0 causes the following error

File "/home/zaid/tmp/add_story_cloze_hf/datasets/env/lib/python3.8/site-packages/streamlit/script_runner.py", line 354, in _run_script
    exec(code, module.__dict__)
  File "/home/zaid/tmp/all_offline_story_cloze/promptsource/promptsource/promptsource.py", line 600, in <module>
    state.sync()
  File "/home/zaid/tmp/all_offline_story_cloze/promptsource/promptsource/session.py", line 68, in sync
    self._state["session"].request_rerun()
TypeError: request_rerun() missing 1 required positional argument: 'client_state'

Downgrading to version 0.82.0 seems to fix the error

Option to add configuration names manually

I was trying to add templates for babi_qa dataset: https://huggingface.co/datasets/babi_qa
This dataset has a conditional-text-generation task that involves multiple types of questions. It has over 20 configurations per set. And it has 8 sets. See data splits.

However, in the app, only the following configurations appear.

This happens because when the dataset was added, the dataset_infos.json was not generated for all configs to keep things light. However, one might want to add configurations that are not present in infos. This can happen for other datasets as well.

Hence, there should be an option to add a custom config name. I'll be adding this in a PR.

Interface doesn't load correctly

I'm running on windows 10 with python 3.9
the web interface used to worked until I tried cloning the repo again..
this is the error:

omptsource.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.0.0.101:8501

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 116, i
n spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 125, i
n _main
    prepare(preparation_data)
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 236, i
n prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 287, i
n _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "c:\users\user\appdata\local\programs\python\python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\users\user\appdata\local\programs\python\python39\lib\runpy.py", line 97, in _run_module_cod
e
    _run_code(code, mod_globals, init_globals,
  File "c:\users\user\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\BigScience\promptsource\promptsource\promptsource
.py", line 34, in <module>
    state = _get_state()
  File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\BigScience\promptsource\promptsource\session.py",
 line 84, in _get_state
    session = _get_session()
  File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\BigScience\promptsource\promptsource\session.py",
 line 74, in _get_session
    session_id = get_report_ctx().session_id
AttributeError: 'NoneType' object has no attribute 'session_id'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 116, i
n spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 125, i
n _main
    prepare(preparation_data)
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 236, i
n prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\users\user\appdata\local\programs\python\python39\lib\multiprocessing\spawn.py", line 287, i
n _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "c:\users\user\appdata\local\programs\python\python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\users\user\appdata\local\programs\python\python39\lib\runpy.py", line 97, in _run_module_cod
e
    _run_code(code, mod_globals, init_globals,
  File "c:\users\user\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\BigScience\promptsource\promptsource\promptsource
.py", line 34, in <module>
    state = _get_state()
  File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\BigScience\promptsource\promptsource\session.py",
 line 84, in _get_state
    session = _get_session()
  File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\BigScience\promptsource\promptsource\session.py",
 line 74, in _get_session
    session_id = get_report_ctx().session_id
AttributeError: 'NoneType' object has no attribute 'session_id'

Markdown interpretation mess up examples visualisation in prompted dataset viewer mode

In prompted dataset viewer mode, if the template contains two _, it is interpreted as markdown syntax and begin writing the example in italic until the second _.

This comportement has been observed on winogrande_winogrande_xs_question_sentence_answer and winogrande_winogrande_xs_sentence_question_answer. A picture of the result is attached.

I don't know how promptsource will behave in case the example contains only one _, maybe it will write in italic until the end, maybe it won't interpret it as markdown syntax at all.
cc @VictorSanh

Writing: Final run for the gpt-3 and bigbench figures.

These scripts need to be run and finalized for all the tasks. Currently there are about 20% missing or not working.

Add some dummy checks to the jinja template

Add a suite of sanity checks on the jinja template. for instance,

are all variables called in the template actual keys in the example dictionary?
template content is a duplicated with another one

Please add ideas!

can duplicate these tests into the "common pitfalls" section of the contribution guide (#26)

[Epic] Data Augmentation Issue Overview and Discussions

@tianjianjiang (Mike) hopes that this issue ticket will act like an Epic issue that covers potential tasks of data augmentation.
Please kindly note that Mike is merely the first member who tries to work on data augmentation and by no means an authority of this topic. Feel free to participate in any possible way.

Issues (PRs)

#10 (#19)
#49

Request for Proposal

Although the section title is https://en.wikipedia.org/wiki/Request_for_proposal but this section isn't anywhere near a formal RFP yet. Please don't hesitate to edit or comment.

Purpose

Diversify our prompts and generalize our models.
(For example, see this comment: #52 (comment).)

Scopes

Prompt template
(Possibly in the future) perturbation or more complex approaches for training/fine-tuning, e.g., papers of prompt/prefix tuning.

Implementation Plans

Prompt template

Expected input/output in Jinja

Presumably world politics, sports, business, and science and technology must remain intact.

Input

{{text}} 
Is this a piece of news regarding world politics, sports, business, or science and technology? ||| 
{{ ["World politics", "Sports", "Business", "Science and technology"][label] }}

Output

{{text}} 
Is this news about world politics, sports, business, or science and technology? ||| 
{{ ["World politics", "Sports", "Business", "Science and technology"][label] }}

TBD

Input/output formats for a shared API
- Jinja compatibility; on-going discussions (Mike was hoping that we can use https://docs.github.com/en/discussions but no strong preference here):
  1. #49 (comment)
  2. #49 (comment)
  3. #49 (comment)
- Streamlit UI behavior
- Template PR flow
Different strategies (e.g, back-translation, summarization, https://github.com/makcedward/nlpaug, https://github.com/QData/TextAttack, etc.)
- Integration test

Example keys with dashes are inaccessible

Seems to be a quirk of the way we're using Jinja. See, e.g., TREC. A template like

{{text}}

Is this asking about a description, an entity, an abbreviation, a person, a quantity, or a location?
|||
{{ ["Description", "Entity", "Abbreviation", "Person", "Quantity", "Location"] [label-coarse] }}

doesn't render correctly because label-coarse is parsed as label. See, e.g., https://stackoverflow.com/questions/28652373/jinja2-in-variable-bug

Can probably just fix by replacing dashes with underscores in keys of the example dictionary when rendering. Might be good to change the displayed information in the sidebar on the fly to match.

Datasets which requires manual download failing to load with object has no attribute 'keys' error

I think we need to handle the datasets which require manual download (https://huggingface.co/docs/datasets/loading_datasets.html#manually-downloading-files). eg. msr_zhen_translation_parity, hippocorpus, msr_genomics_kbcomp etc.

Inconsistent output for code quality checks

See this for an example: bc0a8f1

make style says that it is reformatting promptsource/promptsource.py but there is no diff afterwards. Rerunning make style produces the same output and make quality fails, saying that it would be reformatted.

Jinja highlighter not working well for a dark template

Cancel button to cancel an incomplete template

While trying to create a template, if someone would want to Cancel it rather than Saving it, then having a Cancel button might be useful since deleting an incomplete entry leads to an error message (in Jinja).

🐛`UnboundLocalError: local variable 'd_to_b' referenced before assignment`

When loading a dataset (apparently, datasets with configs), this error pops up:

UnboundLocalError: local variable 'd_to_b' referenced before assignment
Traceback:
File "/home/hf/promptsource/.venv/lib/python3.7/site-packages/streamlit/script_runner.py", line 338, in _run_script
exec(code, module.dict)
File "/home/hf/promptsource/promptsource/promptsource.py", line 252, in
state.sync()
File "promptsource/session.py", line 70, in sync
self._state["hash"] = d_to_b

Writing: Broader Impacts Section

youtube_caption_corrections - downloading failed

this it the error messege:

NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=610252351, num_examples=532812, dataset_name='blog_authorship_corpus'), 'recorded': SplitInfo(name='train', num_bytes=614706451, num_examples=535568, dataset_name='blog_authorship_corpus')}, {'expected': SplitInfo(name='validation', num_bytes=37500394, num_examples=31277, dataset_name='blog_authorship_corpus'), 'recorded': SplitInfo(name='validation', num_bytes=32553710, num_examples=28521, dataset_name='blog_authorship_corpus')}] Traceback: File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\streamlit\script_runner.py", line 338, in _run_script exec(code, module.__dict__) File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\promptsource-main\promptsource\promptsource.py", line 181, in <module> dataset, failed = get_dataset(dataset_key, str(conf_option.name) if conf_option else None) File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\streamlit\caching.py", line 573, in wrapped_func return get_or_create_cached_value() File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\streamlit\caching.py", line 557, in get_or_create_cached_value return_value = func(*args, **kwargs) File "C:\Users\User\Desktop\Srulik\Work\HuggingFace\promptsource-main\promptsource\utils.py", line 48, in get_dataset builder_instance.download_and_prepare() File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\datasets\builder.py", line 574, in download_and_prepare self._download_and_prepare( File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\datasets\builder.py", line 662, in _download_and_prepare verify_splits(self.info.splits, split_dict) File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\datasets\utils\info_utils.py", line 74, in verify_splits raise NonMatchingSplitsSizesError(str(bad_splits))

I already prepared prompts so Ill try embed them into the yaml file.

Different Rendering in Sourcing Mode and Prompted Dataset Viewer Mode

I encounter a bug that arises when I switch between the two modes.

Expected Behavior: the second "no" text should not appear in the Prompted Dataset Viewer Mode.

Dataset: Mocha

Data Instance (the error appears for data coming from train/validation/test):

{
  "candidate": "He's a child and it's a very rare thing.",
  "candidate2": "",
  "constituent_dataset": "cosmosqa",
  "context": "Somewhere in me I knew it all along , there are all those moments when he stares into my eyes and his start to sparkle while this gorgeous grin spreads across his face . When he first started to do it I would ask him \" what ? What s funny ? \" he would always say nothing and attempt to divert his attention elsewhere .",
  "id": "002b5d9aa346d492b02705ae2c9f4abd",
  "metadata": {
    "scores": [
      1
    ],
    "source": "gpt2"
  },
  "question": "What's a possible reason the guy stares into the writer's eyes ?",
  "reference": "Because he likes her a lot .",
  "score": 1,
  "score2": -1
}

Code:

{% if score != 3 %}
Passage: {{ context }}

Question: {{ question }}

Answer: {{ reference }}

Is the answer "{{ candidate }}" similar to the answer above? Answer yes or no. 

{% if candidate2 and score2 != 3 %}
Is the answer "{{ candidate2 }}" similar to the answer above? Answer yes or no. 
{% endif %}
|||
{{ ["no", "yes"][score > 3] }} 

{{["no", "yes"][score2 > 3] if candidate2 != ""}}
{% endif %}

Linebreaks are rarely showing

Single linebreaks are not faithfully displayed on the streamlit interface (viewer mode or sourcing mode).
Example from glue/mrpc:

{{sentence1}}
{{sentence2}}

will not faithfully display the linebreak between sentence1 and sentence2

Surface datasets without prompts

It would be nice to visually indicate which datasets people should work on next, and which have a lot of prompts already.

Priority filter hides subsets that do not have minimum number of templates

For example, if adversarial_qa/adversarialQA has sufficient number, the other subsets are hidden.

Template checking fails for datasets that require additional imports

We are creating a dataset builder to get the features. While we can do this without downloading the data, datasets.load.prepare_module can try to import things that are not necessarily installed, causing the automated tests to crash.

Example: https://github.com/bigscience-workshop/promptsource/pull/63/checks?check_run_id=2747124281

Options:

Skip tests on such datasets
Add needed dependencies to requirements.txt as they arise.
Find a way to get the features without using prepare_module

Thoughts?

Write Template Writing Guide

There should be some guidelines for users. For now, we can assume they are BigScience participants, so the technical details of using the system are less important. It's more about what kind of information should be included in prompts, pitfalls to avoid, and other tips (i.e., how to map labels to strings in a way consistent with few-shot evaluation techniques).

Special eval metrics and scripts

Since we have the string outputs of all tasks, in principal we should be able to run arbitrary metrics, especially for datasets require fancy metrics.@lintangsutawika has imported the official eval scripts for ReCoRD, SQuAD v2, Natural Questions, TriviaQA, and DROP.

Update: Even when using Lintang's eval scripts, all extractive QAs and closed-book (generative) QAs still have abnormally low numbers, namely:

Also, I think all eval of extractive QA from the training mixture also failed.

(Note that ARC is closed-book, but its performance is fine because it's multiple-choice. A great point in case that machine task categories care more about format way more than human skill/knowledge.)

Others with issues to keep an eye on:

HellaSwag
Lambada
Winogrande

Adding Story Cloze which requires manual download

To allow manual download of a dataset, we can add a data_dir field in the builder. It will be helpful to know what the conf variable does.

def get_dataset_builder(path, conf=None):
    "Get a dataset builder from name and conf."
    module_path = datasets.load.prepare_module(path, dataset=True)
    builder_cls = datasets.load.import_main_class(module_path[0], dataset=True)
    if conf:
        builder_instance = builder_cls(name=conf, cache_dir=None, hash=module_path[1],
        data_dir = 'path/to/local/')
    else:
        builder_instance = builder_cls(cache_dir=None, hash=module_path[1])
    return builder_instance

Also, since Story Cloze is a special case we can add it to the dataset_list field i.e dataset_list += ["story_cloze"].
A couple of things that we need to figure out ?

What is the directory for storing the dataset locally in the server?
Since the dataset is not public we should only show a few samples to help in making the templates.
If the PR is not merged by the weekend we should consider adding custom data loaders.

Trivia-QA dataset download failed.

Could not download the trivia-QA dataset.

ChunkedEncodingError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))
Traceback:
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/streamlit/script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
File "/Users/bariamzn/Documents/promptsource/promptsource/promptsource.py", line 183, in <module>
    dataset, failed = get_dataset(dataset_key, str(conf_option.name) if conf_option else None)
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/streamlit/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/streamlit/caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
File "/Users/bariamzn/Documents/promptsource/promptsource/utils.py", line 48, in get_dataset
    builder_instance.download_and_prepare()
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/builder.py", line 574, in download_and_prepare
    self._download_and_prepare(
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/builder.py", line 630, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/Users/bariamzn/.cache/huggingface/modules/datasets_modules/datasets/trivia_qa/9977a5d6f72acfd92f587de052403e8138b43bb0d1ce595016c3baf7e14deba6/trivia_qa.py", line 172, in _split_generators
    file_paths = dl_manager.download_and_extract(download_urls)
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/download_manager.py", line 287, in download_and_extract
    return self.extract(self.download(url_or_urls))
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/download_manager.py", line 195, in download
    downloaded_path_or_paths = map_nested(
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 203, in map_nested
    mapped = [
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 204, in <listcomp>
    _single_map_nested((function, obj, types, None, True)) for obj in tqdm(iterable, disable=disable_tqdm)
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 142, in _single_map_nested
    return function(data_struct)
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/download_manager.py", line 218, in _download
    return cached_path(url_or_filename, download_config=download_config)
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
    output_path = get_from_cache(
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 663, in get_from_cache
    http_get(
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 497, in http_get
    for chunk in response.iter_content(chunk_size=1024):
File "/Users/bariamzn/anaconda3/envs/pt3.8-bigscience/lib/python3.8/site-packages/requests/models.py", line 756, in generate
    raise ChunkedEncodingError(e)

Can we download the dataset from any other source? We can also include the nq_open prompt blindly to the trivia-qa task since both of them are open domain QA task.

MRQA Dataset: Unable to download the dataset

I am unable to download the MRQA Dataset. I am not sure if it is a system-specific error.

Stacktrace:

    exec(code, module.__dict__)
File "/home/abheesht/Academics/Research/BigScience/promptsource/promptsource/promptsource.py", line 183, in <module>
    dataset, failed = get_dataset(dataset_key, str(conf_option.name) if conf_option else None)
File "/home/abheesht/anaconda3/lib/python3.7/site-packages/streamlit/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
File "/home/abheesht/anaconda3/lib/python3.7/site-packages/streamlit/caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
File "promptsource/utils.py", line 48, in get_dataset
    builder_instance.download_and_prepare()
File "/home/abheesht/anaconda3/lib/python3.7/site-packages/datasets/builder.py", line 579, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/home/abheesht/anaconda3/lib/python3.7/site-packages/datasets/builder.py", line 656, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
File "/home/abheesht/anaconda3/lib/python3.7/site-packages/datasets/builder.py", line 977, in _prepare_split
    generator, unit=" examples", total=split_info.num_examples, leave=False, disable=not_verbose
File "/home/abheesht/anaconda3/lib/python3.7/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
File "/home/abheesht/.cache/huggingface/modules/datasets_modules/datasets/mrqa/dc80e7d4b01c458c67875a0fb87f7e7e47b19320d2c816e5a9d05ac137fcd746/mrqa.py", line 171, in _generate_examples
    paragraph = json.loads(row)
File "/home/abheesht/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
File "/home/abheesht/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/abheesht/anaconda3/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)

Prompt filters

Some prompts may only apply to a subset of the data. (For instance datasets with multiple languages.) Maybe an optional filter function per prompt.

Creating unique identifier in the template.yaml

For now, it looks like we can sort of uniquely identify each template using a combination of template name and dataset name, but I'm expecting potential collisions when a lot of people start contributing. Besides, naming each template might not be useful (like if we end up with names like template1 template2 etc...), and it would help contributors if they don't have to add a name/check conflicts on the naming part before merging their template.yaml.

I was thinking that we could add an ID to each entry by getting the hash of timestamp + dataset + string of prompt python function or jinja template? That should be more than enough to prevent collisions

`find_undeclared_variables` issue on template

Because of this test my pull is failing.

Pull on #170
There is an unused variable needed to update a dictionary. Otherwise we need do extension access. Please see the template,

{% set _ner_label_dict = ({
0:"O",
1:"B-PER",
2:"I-PER",
3:"B-ORG",
4:"I-ORG",
5:"B-LOC",
6:"I-LOC",
7:"B-MISC",
8:"I-MISC"
}) %}
{% set _random_ner_label_dict = ({}) %}
{% for k,v in _ner_label_dict.items() -%}
    {% set _rand_num=range(0, 100) | random %}
    {% if _rand_num > 50 %}
        {% set _dummy = _random_ner_label_dict.update({k:v}) %}
    {% endif %}
{% endfor %}
Generate named entities from the following sentence. The named entities are
{% for k,v in _random_ner_label_dict.items() -%}
    {{ v }}
    {{- ", " if not loop.last else "" -}}
{% endfor %}
{{""}}
{% for i in tokens -%}
    {{- " " if not loop.last else "" -}}
    {{ i }}
{% endfor %}
|||
{%- for tok in tokens -%}
    {%- set outer_loop = loop -%}
    {%- for label in ner_tags -%}
         {%- if outer_loop.index == loop.index -%}
            {{- "" if outer_loop.first else " " -}}
            {%- if label in _random_ner_label_dict -%}
                {{tok}}:{{ _random_ner_label_dict[label] }}
            {%- else -%}
                {{tok}}:O
            {% endif %}
        {%- endif -%}
    {%- endfor -%}
{%- endfor -%}

there is no error in the template but it is failing because of the _dummy variable.

Pull on #214

 {% if language == "english" %}
{% if annotations.yes_no_answer[0] == "YES" or annotations.yes_no_answer[0] == "NO" %}
{% set _position = ["above", "following"] |choice %}
{% if  _position == "above" %}
Question: {{question_text}}
{{"\n"}}
{% endif %}
Answer the {{_position}} question.
{% if  _position == "following" %}
{{"\n"}}
Question: {{question_text}}
{% endif %}
|||
{{annotations. yes_no_answer[0]}}
{% endif %}
{% endif %}

The variable _position is detected as unused variable. Not sure why. But there is no error in the template. Because of this issue it is not passing the test.

Any workaround to solve the issue?

Cached target files are all -1

Currently, these target files have all -1. Colin proposed a fix in Slack and I will try that first thing tomorrow morning.

######
winogrande_winogrande_xl
Prompt: _Replace
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_xl_Replace_targets
Prompt: _does_underscore_refer_to
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_xl_does_underscore_refer_to_targets
Prompt: _fill_in_the_blank
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_xl_fill_in_the_blank_targets
Prompt: _stand_for
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_xl_stand_for_targets
Prompt: _underscore_refer_to
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_xl_underscore_refer_to_targets
######
winogrande_winogrande_debiased
Prompt: _Replace
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_debiased_Replace_targets
Prompt: _does_underscore_refer_to
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_debiased_does_underscore_refer_to_targets
Prompt: _fill_in_the_blank
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_debiased_fill_in_the_blank_targets
Prompt: _stand_for
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_debiased_stand_for_targets
Prompt: _underscore_refer_to
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/winogrande_winogrande_debiased_underscore_refer_to_targets
######
race_high
Prompt: _Read_the_article_and_answer_the_question_no_option_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/race_high_Read_the_article_and_answer_the_question_no_option__targets
Prompt: _Select_the_best_answer_generate_span_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/race_high_Select_the_best_answer_generate_span__targets
######
race_middle
Prompt: _Read_the_article_and_answer_the_question_no_option_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/race_middle_Read_the_article_and_answer_the_question_no_option__targets
Prompt: _Select_the_best_answer_generate_span_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/race_middle_Select_the_best_answer_generate_span__targets
######
social_i_qa
Prompt: _Generate_answer
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/social_i_qa_Generate_answer_targets
Prompt: _I_was_wondering
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/social_i_qa_I_was_wondering_targets
Prompt: _Show_choices_and_generate_answer
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/social_i_qa_Show_choices_and_generate_answer_targets
######
super_glue_copa
Prompt: _C1_or_C2_premise_so_because_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_C1_or_C2_premise_so_because__targets
Prompt: __As_a_result_C1_or_C2_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa__As_a_result_C1_or_C2__targets
Prompt: __What_could_happen_next_C1_or_C2_
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa__What_could_happen_next_C1_or_C2__targets
Prompt: __which_may_be_caused_by
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa__which_may_be_caused_by_targets
Prompt: __why_C1_or_C2
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa__why_C1_or_C2_targets
Prompt: _best_option
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_best_option_targets
Prompt: _cause_effect
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_cause_effect_targets
Prompt: _choose
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_choose_targets
Prompt: _exercise
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_exercise_targets
Prompt: _i_am_hesitating
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_i_am_hesitating_targets
Prompt: _more_likely
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_more_likely_targets
Prompt: _plausible_alternatives
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/super_glue_copa_plausible_alternatives_targets
######
piqa
Prompt: _finish_sentence_with_correct_choice
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/piqa_finish_sentence_with_correct_choice_targets
Prompt: _pick_correct_choice_with_choice_given_before_goal
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/piqa_pick_correct_choice_with_choice_given_before_goal_targets
Prompt: _what_is_the_correct_ending
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/piqa_what_is_the_correct_ending_targets
######
story_cloze_2016
Prompt: _Answer_Given_options
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/story_cloze_2016_Answer_Given_options_targets
Prompt: _Choose_Story_Ending
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/story_cloze_2016_Choose_Story_Ending_targets
Prompt: _Movie_What_Happens_Next
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/story_cloze_2016_Movie_What_Happens_Next_targets
Prompt: _Novel_Correct_Ending
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/story_cloze_2016_Novel_Correct_Ending_targets
Prompt: _Story_Continuation_and_Options
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/story_cloze_2016_Story_Continuation_and_Options_targets
######
hellaswag
Prompt: _Randomized_prompts_template
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/hellaswag_Randomized_prompts_template_targets
Prompt: _complete_first_then
gs://bigscience/experiment_d/finetune-t5-xxl-lm-d4-091621-512/validation_eval/hellaswag_complete_first_then_targets

basic quality code checks

It would be nice to set up some code quality checks (black, isort for instance) to ensure consistency in coding quality & style

[open for discussion] `get_dataset` for bigger datasets (downloading bottleneck)

Dumping here for discussion

When you query a new dataset, the datasets library downloads it (there is no way to look at a subset of the data without downloading it entirely), which becomes a bottleneck if 1/ you want to write prompts for bigger datasets, 2/ you write prompts for a lot of datasets.

To smooth as much as possible the prompt sourcing, a possibility to consider is to set up a unique (big) server with the app running on it with all the necessary datasets already loaded/downloaded.

This would also simplify the sync between multiple people (no need to push PRs on Github) but we would be compromising on the traceability...
How would we ensure "quality" though (for instance: grammaticality, understanding of the task, etc.)?

Fix save functionality to save a template

Currently, one needs to modify and save twice for template changes to reflect in templates.yaml. This needs to be fixed to save seamlessly at first go.

Priority dataset disappears from dataset list after adding a third template

Steps to reproduce the behavior:

Select Filter Priority Dataset
Select a dataset with 2 templates (e.g.: ag_news)
Type in a new template name, press Create
The form refreshes and the current dataset no longer appears in the template list (because of the count_dict[d]>2 condition)

One possible solution would be to save the name of the current working dataset and include it in the
templates list in list_datasets, something like:
dataset_list = list(set(dataset_list) - set(list(d for d in count_dict if count_dict[d]>2 and d!=state.current_dataset_name)))

When should it be cleared? Maybe as soon as the new template has been saved?

Dataset Schema getting displayed twice

Additional `<p>` tag in the template output

After writing the template, on the right side, there is a sample shown in the app.
I see <p> tags inside the samples (from the inspect of the browser). I am not sure if this is because of HTML or not. But we need make sure that it will not stay in the final version of the training data.

{% set _ner_lable_dict = ({
0:"O",
1:"B-PER",
2:"I-PER",
3:"B-ORG",
4:"I-ORG",
5:"B-LOC",
6:"I-LOC",
7:"B-MISC",
8:"I-MISC"
}) %}
Generate named entities from the following sentence. The named entities are
{% for k,v in _ner_lable_dict.items() -%}
    {{ v }}
    {{- ", " if not loop.last else "" -}}
    {{- "\n" if loop.last -}}
{% endfor %} 
{% for i in tokens -%}
    {{- " " if not loop.last else "" -}}
    {{ i }}
{% endfor %} 
|||
{% set flag = 0 %}
{% set outer_cnt = namespace(value=0) -%}
{% for tok in tokens -%}
    {% set inner_cnt = namespace(value=0) -%}
    {% for lable in ner_tags -%}
         {% if outer_cnt.value == inner_cnt.value -%}
            {% if flag != 0 -%}
                &nbsp;
            {% endif -%}
            {% set flag = 1 -%}
            {{tok}}:{{ _ner_lable_dict[lable] }}
        {% endif -%}
        {% set inner_cnt.value = inner_cnt.value + 1 -%}
    {% endfor -%} 
    {% set outer_cnt.value = outer_cnt.value + 1 -%}
{% endfor -%}

check the details here, d97c6b2

tweet_qa: failed to generate dataset

I am working on the prompts for tweet_qa.
After downloading the tweet_qa dataset, I encountered the following error on the app:

DuplicatedKeysError: FAILURE TO GENERATE DATASET ! Found duplicate Key: 2a167f9e016ba338e1813fed275a6a1e Keys should be unique and deterministic in nature
Traceback:
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\streamlit\script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
File "C:\Users\User\Documents\2021_dl_projects\promptsource\promptsource\promptsource.py", line 181, in <module>
    dataset, failed = get_dataset(dataset_key, str(conf_option.name) if conf_option else None)
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\streamlit\caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\streamlit\caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
File "promptsource\utils.py", line 48, in get_dataset
    builder_instance.download_and_prepare()
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\datasets\builder.py", line 575, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\datasets\builder.py", line 652, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\datasets\builder.py", line 992, in _prepare_split
    num_examples, num_bytes = writer.finalize()
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\datasets\arrow_writer.py", line 409, in finalize
    self.check_duplicate_keys()
File "c:\users\user\anaconda_spyder\envs\tf\lib\site-packages\datasets\arrow_writer.py", line 349, in check_duplicate_keys
    raise DuplicatedKeysError(key)

Rank eval of vanilla D4

Since Colin is using the TPUs exclusively for training, @VictorSanh would you please use Jean Zay to evaluate the vanilla D4 checkpoints instead of other ablations? You don't have evaluate all checkpoints, maybe schedule them such that, 1/2 steps checkpoint first, then 1/4 and 3/4 steps checkpoints, then 1/5, 2/5, 3/5, 4/5 checkpoints, and so on…

Also, if ReCoRD is slow to eval we should skip it. Seqio doesn't currently handle multiple correct answers so we need to run their official eval script anyway. Let me know if there are other datasets that are unusually slow to eval.

Template is rendering Error in `Prompted dataset viewer`

After writing the template I don't see any error generated. But when I go the Prompted dataset viewer I get the following error,

{% set _ner_lable_dict = ({
0:"O",
1:"B-PER",
2:"I-PER",
3:"B-ORG",
4:"I-ORG",
5:"B-LOC",
6:"I-LOC",
7:"B-MISC",
8:"I-MISC"
}) %}
Generate named entities from the following sentence. The named entities are
{% for k,v in _ner_lable_dict.items() -%}
    {{ v }}
    {{- ", " if not loop.last else "" -}}
    {{- "\n" if loop.last -}}
{% endfor %} 
{% for i in tokens -%}
    {{- " " if not loop.last else "" -}}
    {{ i }}
{% endfor %} 
|||
{% set flag = 0 %}
{% set outer_cnt = namespace(value=0) -%}
{% for tok in tokens -%}
    {% set inner_cnt = namespace(value=0) -%}
    {% for lable in ner_tags -%}
         {% if outer_cnt.value == inner_cnt.value -%}
            {% if flag != 0 -%}
                &nbsp;
            {% endif -%}
            {% set flag = 1 -%}
            {{tok}}:{{ _ner_lable_dict[lable] }}
        {% endif -%}
        {% set inner_cnt.value = inner_cnt.value + 1 -%}
    {% endfor -%} 
    {% set outer_cnt.value = outer_cnt.value + 1 -%}
{% endfor -%}

check the details here, d97c6b2

Streamlit version and `beta_columns`

beta_columns was moved out of beta in the latest streamlit version (at least 0.88.0 but might have happened before). We should replace these beta_columns into columns (and pin the streamlit version), otherwise there are warnings popping up everywhere.

Writing: Finalize dataset split image.

Need to know the exact splits for this figure.

datasets (2).pdf
e.

parse or annotate Jinja template | before augmentation → augmenters know where the off-limit variables are

related: #10 #48

One of the quick'n'dirty ways that I imagine comprises the 4 steps below:

Get the variable list of a template string

parsed_content = env.parse(template_string)
variable_list = meta.find_undeclared_variables(parsed_content)

Send parsed_content and variable_list to an augmenter
An augmenter replace variables with some special tokens that its algorithm won't touch (to some extent)
After augmentation, put variables back to where they were

FileNotFoundError while loading a dataset

Loading datasets like Salesforce/QAConv, wmt/wmt17, yluisfern/PBU etc. failing due to FileNotFoundError: Couldn't find file locally at Salesforce/QAConv\QAConv.py, or remotely at https://huggingface.co/datasets/Salesforce/QAConv/resolve/main/QAConv.py. Please provide a valid dataset name

Job Story

When

Using Streamlit for generating templates

I want to

Have data augmentation API defined

Input arguments
- Instance or template? Template might be tricky for preserving slots.
- Prompt and/or input?
Output type
- Just one string or a list of strings?

So I can

Produce variations of prompt and/or input

Notes

I'm gonna do it during the weekend (2021-05-23/24)

Error in loading definite_pronoun_resolution dataset

It seems that the data source http://www.hlt.utdallas.edu/~vince/data/emnlp12/train.c.txt is not available.

I have marked "Skip" on the spreadsheet, but I would like to bring this issue to attention.

.DS_Store breaks template loading

Went to look at the templates folder in OS X and broke template loading.) (For non-Mac users, Finder creates a hidden file for storing view preferences.)

NotADirectoryError: [Errno 20] Not a directory: './templates/.DS_Store'
Traceback:
File "/Users/bach/anaconda3/envs/bigscience/lib/python3.7/site-packages/streamlit/script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
File "/Users/bach/promptsource/promptsource/promptsource.py", line 84, in <module>
    template_collection = TemplateCollection()
File "promptsource/templates.py", line 27, in __init__
    self.datasets_templates: Dict[(str, Optional[str]), DatasetTemplates] = self._collect_dataset()
File "promptsource/templates.py", line 43, in _collect_dataset
    for filename in os.listdir(os.path.join(TEMPLATES_FOLDER_PATH, dataset)):

Should skip hidden files in the templates directory.