
ucinlp / autoprompt

566 stars · 11 watchers · 82 forks · 78.05 MB

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

License: Apache License 2.0

Python 76.38% Shell 1.37% HTML 22.25%
nlp language-model evaluation

autoprompt's People

Contributors

eric-wallace, jiminan, rloganiv, sameersingh, taylorshin, yasamanrazeghi7


autoprompt's Issues

I got different triggers from the ones in fact_retrieval_bert_prompts.jsonl

    from pathlib import Path

    class FactRetriveArgs:
        train = Path("/path/P108/train.jsonl")
        dev = Path('/path/P108/dev.jsonl')
        template = '[CLS] {sub_label} [T] [T] [T] [T] [T] [P]. [SEP]'
        num_cand = 10
        accumulation_steps = 1
        model_name = '/var/www/pre_models/bert-base-cased'
        bsz = 32
        eval_size = 32
        iters = 1000
        label_field = 'obj_label'
        tokenize_labels = True
        filter = True
        print_lama = True
        use_ctx = False
        seed = 0
        label_map = None
        initial_trigger = None
        perturbed = False
        patience = 5
        debug = False
        limit = None

I trained P108 from T-REx with the parameters above; the result is shown below.

    INFO:autoprompt.create_trigger:No improvement detected. Skipping evaluation.
    INFO:autoprompt.create_trigger:Iteration: 999
    INFO:autoprompt.create_trigger:Accumulating Gradient 100%|██████████| 1/1 [00:00<00:00, 36.45it/s]
    INFO:autoprompt.create_trigger:Evaluating Candidates 100%|██████████| 1/1 [00:00<00:00, 10.50it/s]
    INFO:autoprompt.create_trigger:No improvement detected. Skipping evaluation.
    INFO:autoprompt.create_trigger:Best tokens: ['helped', 'survives', 'computer', 'lawsuit', 'against']
    INFO:autoprompt.create_trigger:Best dev metric: -3.59419067382812
    {"relation": "P108", "template": "[ x ] helped survives computer lawsuit against [Y]."}

The final template is "template": "[ x ] helped survives computer lawsuit against [Y].", which differs from {"relation": "P108", "template": "[X] 1987adeNBC computing succeeded [Y]."} in fact_retrieval_bert_prompts.jsonl.

So is there anything wrong with my training?

A question about the meaning of the letter 'Ġ'

Hello Taylor,
Thank you again for solving my question last time; with the command you provided, I can now run create_trigger.py normally. Recently I have another question, about the verbalizer setting for sentiment analysis. In the command you provided, all the verbalizer words begin with the letter 'Ġ'. I didn't know what it meant, so I tried removing the 'Ġ' to figure it out, but after removing it and running create_trigger.py the program failed and I could not obtain the "Best dev metric". In summary, could you please explain the meaning or usage of the letter 'Ġ'?
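Here is the small experiment I ran while trying to figure it out. My guess (please correct me if I'm wrong) is that 'Ġ' comes from RoBERTa's byte-level BPE vocabulary and marks a leading space, so stripping it asks for a different (or non-existent) vocabulary entry:

    # Minimal illustration (not part of the repo): in RoBERTa's byte-level BPE,
    # 'Ġ' encodes a leading space, so "Ġworse" is the vocabulary entry for the
    # word "worse" preceded by a space. Removing the 'Ġ' requests a different
    # (possibly unknown) token, which would explain why the run failed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    print(tokenizer.tokenize(" worse"))               # ['Ġworse']
    print(tokenizer.convert_tokens_to_ids("Ġworse"))  # a real vocabulary id
    print(tokenizer.convert_tokens_to_ids("worse"))   # a different id, or the <unk> id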
Looking forward to your reply!

How to run the code

We configured the environment and downloaded the pre-trained model while debugging the code, but we are having problems running it: for example, how should run.py be executed, and in what order should the shell commands be run to generate templates? Please also tell us the necessary components and the steps needed to run the code. Thank you very much!

Label Search of NLI

Hi,

I noticed that the template is '[CLS] {sentence} [T] [T] [T] [P]. [SEP]' in the given command for NLI label search (See readme.md).

Is that correct? I suppose it should be '[CLS] {sentence_A} [P] [T] [T] [T] [T] {sentence_B}. [SEP]'.

Also, the keys of label_map should be capitalized, i.e. entailment -> ENTAILMENT.

By the way, this is the clearest repo I have ever seen! Many thanks for sharing it.

AttributeError: 'str' object has no attribute 'masked_select'

I am running the code on LAMA data (the fact retrieval dataset) and am facing the following issue.
I ran this command:

python -m autoprompt.create_trigger \
    --train data/train.jsonl \
    --dev data/dev.jsonl \
    --template '<s> {sub_label} [T] [T] [T] [P] . </s>' \
    --num-cand 10 \
    --accumulation-steps 1 \
    --model-name roberta-large \
    --bsz 56 \
    --eval-size 56 \
    --iters 1000 \
    --label-field 'obj_label' \
    --tokenize-labels \
    --filter \
    --print-lama

Traceback (most recent call last):
  File "/mnt/infonas/data/baekgupta/miniconda3/envs/tkgqa_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/infonas/data/baekgupta/miniconda3/envs/tkgqa_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/infonas/data/baekgupta/codlb/lm-kbc/autoprompt/autoprompt/create_trigger.py", line 529, in <module>
    run_model(args)
  File "/mnt/infonas/data/baekgupta/codlb/lm-kbc/autoprompt/autoprompt/create_trigger.py", line 295, in run_model
    predict_logits = predictor(model_inputs, trigger_ids)
  File "/mnt/infonas/data/baekgupta/codlb/lm-kbc/autoprompt/autoprompt/create_trigger.py", line 59, in __call__
    predict_logits = logits.masked_select(predict_mask.unsqueeze(-1)).view(logits.size(0), -1)
AttributeError: 'str' object has no attribute 'masked_select'
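For what it's worth, here is a minimal sketch of what I suspect is happening (an assumption on my part, not the repo's code): if the prediction wrapper unpacks the model output by iteration, newer transformers versions return a dict-like ModelOutput whose iteration yields string keys, so `logits` ends up being the string 'logits' rather than a tensor.

    # Hedged sketch (not the repo's code): reproducing how `logits` can end up a string.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    model = AutoModelForMaskedLM.from_pretrained("roberta-large")
    inputs = tokenizer("Paris is the capital of <mask>.", return_tensors="pt")

    output = model(**inputs)
    logits, *_ = output          # newer transformers: iterating a ModelOutput yields its keys,
    print(type(logits), logits)  # prints: <class 'str'> logits

    logits = output.logits                                # correct: attribute access
    # or: logits = model(**inputs, return_dict=False)[0]  # correct: tuple-style output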

What is the T in your label search algorithm?

I am very curious about your [T] tokens here, the trigger placeholders: are they newly added special tokens in the tokenizer with random initialization? In other words, can you give me some explanation of your label-set search algorithm? Thanks in advance for your kind help!
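To make my question concrete, here is how I currently imagine the trigger placeholders working (a sketch with made-up ids, not your actual code): [T] just marks trigger positions in the templatized input, and at each search step those positions are filled with ordinary vocabulary tokens chosen by the gradient-based candidate selection.

    # Minimal sketch (made-up ids, not the repo's exact code): [T] positions are
    # marked by a mask and filled with the current trigger token ids.
    import torch

    input_ids = torch.tensor([101, 2054, 0, 0, 0, 103, 102])   # 0 = hypothetical [T] placeholder id
    trigger_mask = torch.tensor([False, False, True, True, True, False, False])
    trigger_ids = torch.tensor([2307, 6919, 8680])              # current triggers: ordinary vocab ids

    filled = input_ids.masked_scatter(trigger_mask, trigger_ids)
    print(filled)  # tensor([ 101, 2054, 2307, 6919, 8680,  103,  102])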

A program error in create_trigger.py for the sentiment analysis task

When running create_trigger.py (for the sentiment analysis task), training runs normally, but when the program reaches the part that saves the template to a JSON file, a KeyError is raised. I think the error occurs because the only two keys in the dict are "[X]" and "[Y]", while the program tries to get the value of "label". Moreover, I think the __call__ function of the TriggerTemplatizer class is meant for loading the dataset, so the error is raised when the same function is reused to get the best template. I don't know how to fix it; looking forward to your reply!

Traceback (most recent call last):
  File "/home/yzh/remote_folder/Major_Code/autoprompt-master/autoprompt/create_trigger.py", line 527, in <module>
    run_model(args)
  File "/home/yzh/remote_folder/Major_Code/autoprompt-master/autoprompt/create_trigger.py", line 448, in run_model
    'obj_label': tokenizer.lama_y,
  File "/home/yzh/remote_folder/Major_Code/autoprompt-master/autoprompt/utils.py", line 157, in __call__
    label = format_kwargs[self._label_field]
KeyError: 'label'


Problem with autoprompt/create_trigger.py

I ran this file and got the best triggers with a 0.73 dev metric. However, when I ran it again using those best triggers as the initial triggers, I only got a 0.4 dev metric in the first iteration. Why?

Question about running the code on the SNLI dataset

Hi, thank you so much for such an informative codebase.

I'm now trying to run this code on other NLI datasets (like MNLI and SNLI), but they all perform very poorly in the label_search part, and the loss is difficult to converge, which also results in poor performance in the create_trigger part (acc ≈ 0.3). Before this, the results on the (short version) SICK-E dataset were consistent with those reported in your paper.

I would like to ask whether this code can be used to find suitable labels for the MNLI dataset. If so, do you have any recommendations for the hyperparameter settings (e.g., learning rate)? In addition, I am using the bert-base-cased model.

Thanks a lot.

The effect of proper nouns in the filtering part

Hi, I really enjoyed reading the paper and appreciate you making the code publicly available! I have a quick question about the filtering step, where you exclude proper nouns and the label tokens from the candidate vocabulary.

Would prompt quality degrade without that filtering, say on LAMA for instance? I'm thinking of adapting AutoPrompt to our setting, where the loss function is a bit different from the one shown in the paper, and I'm wondering whether I need this filtering or not.
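To make sure we are talking about the same thing, here is my rough reading of the filter as a sketch (an approximation on my part, not your exact code):

    # Hedged sketch of candidate filtering (my own approximation, not the repo's code):
    # drop candidate tokens that look like proper nouns (capitalized) or that equal a
    # gold label token, so the prompt cannot trivially leak the answer.
    gold_labels = {"London", "Paris"}

    def keep_candidate(token: str) -> bool:
        word = token.lstrip("Ġ").lstrip("##")   # strip BPE/WordPiece prefixes
        return bool(word) and not word[0].isupper() and word not in gold_labels

    candidates = ["Ġcity", "ĠParis", "Ġcapital", "ĠSeattle", "Ġlocated"]
    print([t for t in candidates if keep_candidate(t)])  # ['Ġcity', 'Ġcapital', 'Ġlocated']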

How to run the code: queries

@rloganiv @taylorshin
I am using this model to generate prompts for sentiment analysis on IMDB.
I ran this command to generate trigger tokens:
python -m autoprompt.create_trigger \
    --train glue_data/SST-2/train.tsv \
    --dev glue_data/SST-2/dev.tsv \
    --template '<s> {sentence} [T] [T] [T] [P] . </s>' \
    --label-map '{"0": ["Ġworse", "Ġincompetence", "ĠWorse", "Ġblamed", "Ġsucked"], "1": ["ĠCris", "Ġmarvelous", "Ġphilanthrop", "Ġvisionary", "Ġwonderful"]}' \
    --num-cand 100 \
    --accumulation-steps 30 \
    --bsz 24 \
    --eval-size 48 \
    --iters 180 \
    --model-name roberta-large

My first question: will the generated trigger words be the same for all IMDB reviews?

Secondly, I also ran the code to generate the labels (as you suggested in the README), so now I have trigger tokens and labels generated by your model. I want to know what the next step is: how do I prompt the LM to generate labels?
Also, is the set of labels generated by your model the same for all IMDB reviews in the dataset?
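To show what I mean, here is my current guess at the inference step (a hedged sketch with placeholder triggers and the label map from the command above, not your exact pipeline): the trigger and label tokens are fixed for the whole dataset, each review is inserted into the template, and the label whose tokens get the highest probability at the [P]/mask position is predicted.

    # Hedged sketch (illustrative tokens, not the authors' exact code or prompts):
    # classify a review by filling the template and comparing label-token
    # log-probabilities at the mask position.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    model = AutoModelForMaskedLM.from_pretrained("roberta-large")

    review = "a gripping, beautifully shot film"
    triggers = "trigger1 trigger2 trigger3"              # placeholder for the learned triggers
    text = f"{review} {triggers} {tokenizer.mask_token} ."

    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    log_probs = model(**inputs).logits[0, mask_pos].log_softmax(-1)

    label_map = {"0": ["Ġworse", "Ġsucked"], "1": ["Ġmarvelous", "Ġwonderful"]}  # from the command above
    scores = {
        label: log_probs[tokenizer.convert_tokens_to_ids(tokens)].sum().item()
        for label, tokens in label_map.items()
    }
    print(max(scores, key=scores.get))  # predicted label: "0" (negative) or "1" (positive)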

In the command for generating labels (which you wrote in the README), should I replace the [T] trigger tokens with the trigger tokens generated by the model?

I urgently need these answers and request the authors to look into these queries.
Thanks.

Query about the function "encode_label" in autoprompt/utils.py

Hi,

I am trying to reproduce your experimental results for relation extraction (Table 5 in the paper).

However, I got the error "Label "xxx" gets mapped to multiple tokens.", which is raised by the function encode_label in the file autoprompt/utils.py.

The command I used is:

python -m autoprompt.create_trigger \
    --train  {train_path} \
    --dev {dev_path} \
    --template '<s> {context} {sub_label} [T] [T] [T] [T] [T] [P] . </s>' \
    --num-cand 10 \
    --accumulation-steps 1 \
    --model-name roberta-large \
    --bsz 32 \
    --eval-size 32 \
    --iters 500 \
    --label-field 'obj_label' \
    --filter \
    --print-lama \
    --use-ctx \
    --perturbed \
    --tokenize-labels

I noticed that the exception message is one you defined. Could you please help me check what's wrong with my command?
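For reference, here is how I understand the check, based only on the error message (an assumption, not your code): the object label has to map to exactly one token in the model's vocabulary, and many entity names split into several BPE pieces under roberta-large.

    # Hedged illustration (my reading of the error, not the repo's exact check):
    # labels that tokenize to more than one piece would trigger the ValueError.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    for obj_label in ["London", "Reykjavik", "New York"]:
        tokens = tokenizer.tokenize(" " + obj_label)
        status = "ok (single token)" if len(tokens) == 1 else "would raise the error"
        print(f"{obj_label!r}: {tokens} -> {status}")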

About accuracy

I ran the program according to the GitHub instructions, and the accuracy based on roberta-large only reaches 86%. How can I modify it to improve the accuracy?
Thank you!

Demo Improvements Round 2

  • The demo is nice for someone who is already familiar with your method, but it's very inaccessible for someone who isn't, and therefore doesn't serve as a very good education/promotion tool. I would recommend adding a few paragraphs of explanatory text in the style of an accessible blog post, explaining how to use the demo, what the method could be used for, and how it works at a very simplified high level.

  • I'm not sure I understand the need for evaluation instances when using the manual input. I think the interactive demo is nice because it lets people see and play with the algorithm interactively, so having to come up with a set of evaluation points seems more cumbersome than helpful. If you do want to keep it, I'd change the free-form eval label fields to dropdowns populated with the available labels. Also, I assume "Train accuracy" should be "Evaluation accuracy"?

  • I recommend pre-populating the fields with a simple default example. This can go a long way in building the user's intuition of how the demo works.

  • The fact that labels have to be single tokens is unfortunate for our purposes. Maybe this is too ambitious, but perhaps the error message could suggest a few possible synonyms from WordNet that are single tokens?

  • The yellow highlighting of the predicted score makes the text unreadable. I'd make the background a darker version of the color, and would use green rather than yellow.

  • Add a train button so that the training script doesn't run prematurely after adjusting a single parameter or example.

  • I'd put the training parameters above the template section. Right now when you open the demo, it seems like you have to read and understand the whole template section in order to use the demo, when in reality only a minority of advanced users will likely play with the template. Moving it below the training parameters might help mitigate that impression.

Question about the proper settings for the prompt extraction attack

I ran this code with the following command:

python  carve_sigil.py \
        name=llama2_gcg_sys \
        wandb.tags=[extraction] \
        model=llama2-7b-chat \
        optimizer=gcg \
        sigil.num_tokens=32 \
        sigil=sysrepeater

I evaluated the resulting prompt on 50 test samples from the awesome-chatgpt-prompts dataset, but only 3 samples were answered correctly (i.e., the system prompt was repeated correctly). I don't know whether this is normal or whether I did something wrong. Could you please share the best settings for the prompt extraction attack, or tell me where I went wrong? Thank you very much.

TypeErrors in label_search.py

Hello, thanks for the great codebase!

In autoprompt/label_search.py, I found some TypeErrors (missing arguments):

  1. use_ctx is not passed when loading the dataset.
  2. When initializing utils.TriggerTemplatizer, the 'config' argument is missing.

The first one can be fixed by simply passing 'use_ctx=False', since label_search.py is only used for the sentiment analysis and NLI tasks (not for relation extraction).

Fixed in PR #31 .

Thank you again.

Additional Wikidata triples

Hi there,

We are interested in working with your training dataset for fact extraction.
In the paper you mention that T-REx does not contain 1000 triples for all properties, so you add extra triples from Wikidata. However, I cannot find these triples in the .jsonl files, and some of the properties actually don't have 1000 triples.
Am I missing something?
It would be nice if you could clarify how I can find these additional triples, or whether you ended up not using them in training.
Best,
Jan

Zero value for evaluation_fn in create_trigger.py

I am running your code for sentiment analysis with the given command, but I keep getting zero for evaluation_fn and the output is "no improvement detected". Is this supposed to happen?
I passed in the default label map.
I passed in the label map as default

NLI Templates

Hi all,

Thanks for releasing this codebase, it looks really nice! I'm interested in running some NLI experiments, but I noticed that the NLI templates are missing ("coming soon") on both the master and dev branches. Do you happen to remember what the run command would be?

Thanks!

Two arguments missing in label_search.py

Hello,
The first one is at line 70: config is not passed to utils.TriggerTemplatizer.__init__():

    templatizer = utils.TriggerTemplatizer(
        args.template,
        tokenizer,
        label_map=label_map,
        label_field=args.label_field,
        add_special_tokens=False
    )

The second one is at line 96, where use_ctx is required:

train_dataset = utils.load_trigger_dataset(args.train, templatizer)
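A sketch of the two fixes I applied locally (the position of the config argument is my guess from the error message, not the confirmed signature):

    # Sketch of the fixes (argument placement for `config` is an assumption):
    templatizer = utils.TriggerTemplatizer(
        args.template,
        config,                       # missing 'config' argument added
        tokenizer,
        label_map=label_map,
        label_field=args.label_field,
        add_special_tokens=False
    )

    train_dataset = utils.load_trigger_dataset(args.train, templatizer, use_ctx=False)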

Regards.
