yushi-hu / promptcap
natural language guided image captioning
Hi, thanks for your interesting work.
I wonder how you evaluated your final results on OK-VQA.
The paper says it was evaluated with the soft accuracy of VQAv2,
so I tried to evaluate your OKVQA_val_gpt3.json in Evaluation Logs using the VQA evaluation code:
https://github.com/GT-Vision-Lab/VQA
It gives a score of 58.89, but the paper reports 60.4.
I used the "mscoco_val2014_annotations.json" file from the OK-VQA website as the annotation file.
Didn't you use the VQA evaluation code,
or are the log files not the final results?
Thank you
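For reference, the core of the VQAv2 soft-accuracy rule can be sketched as below. Note this is a simplification: the official GT-Vision-Lab code additionally normalizes answers (punctuation, articles, number words) and averages over leave-one-annotator-out subsets, which can shift scores by a point or so and might explain part of the gap.

```python
# Simplified VQAv2 "soft accuracy": a predicted answer scores
# min(#matching annotators / 3, 1), so 3 or more of the 10 human
# answers agreeing with the prediction gives full credit.
def soft_accuracy(pred, gt_answers):
    matches = sum(1 for a in gt_answers if a == pred)
    return min(matches / 3.0, 1.0)
```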
Hi!
Thanks for the repository and the paper (cool idea!).
I was wondering how I can reproduce your results with either GPT-3 or Flan-T5. As far as I can see, the UnifiedQA setup you show in the README works without any few-shot demonstrations, and in my experiments it also performs significantly worse on OK-VQA (around 32%, even with a larger T5 than the one you use in the README).
Would I need to run https://github.com/Yushi-Hu/PromptCap/blob/main/new_pica/run_pica_okvqa.sh ?
If so, could you make the needed files available? They seem to be custom files that do not come with the standard datasets.
Thank you!
Benno
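For anyone else attempting this before the custom files are released, the GPT-3 pipeline in the paper follows the PICa-style setup: each in-context example is a (caption, question, answer) triple, concatenated into one text prompt. A minimal sketch of that prompt construction (the template, header wording, and function name here are illustrative guesses, not the repo's actual run_pica_okvqa.sh format):

```python
# Build a PICa-style few-shot prompt from PromptCap captions.
# `demos` is a list of (caption, question, answer) in-context examples;
# the final (caption, question) pair is the test instance to be answered.
def build_prompt(demos, caption, question):
    header = "Please answer the question according to the context.\n\n"
    shots = "".join(
        f"Context: {c}\nQuestion: {q}\nAnswer: {a}\n\n" for c, q, a in demos
    )
    return header + shots + f"Context: {caption}\nQuestion: {question}\nAnswer:"

demos = [("a man holding a red umbrella", "what color is the umbrella?", "red")]
prompt = build_prompt(demos, "a dog on a surfboard", "what animal is shown?")
```

The resulting string ends with "Answer:" so the language model completes it with a short answer.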
TypeError Traceback (most recent call last)
in <cell line: 7>()
5 image = "/content/temp1.jpg"
6
----> 7 print(model.caption(prompt, image))
2 frames
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1573 if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs:
1574 # if model is encoder decoder encoder_outputs are created and added to model_kwargs
-> 1575 model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
1576 inputs_tensor, model_kwargs, model_input_name, generation_config
1577 )
TypeError: OFAModel._prepare_encoder_decoder_kwargs_for_generation() takes from 3 to 4 positional arguments but 5 were given
The code that I tried to run is as follows:
In one cell, I run:
!pip install promptcap
In another cell, I run:
import torch
from promptcap import PromptCap
model = PromptCap("tifa-benchmark/promptcap-coco-vqa") # also support OFA checkpoints. e.g. "OFA-Sys/ofa-large"
if torch.cuda.is_available():
    model.cuda()
prompt = "what does the image describe?"
image = "/content/temp1.jpg"
print(model.caption(prompt, image))
Any help will be appreciated.
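This traceback usually points to a transformers version mismatch: newer releases of transformers pass an extra positional argument (the generation_config) into _prepare_encoder_decoder_kwargs_for_generation, while the OFAModel override bundled with promptcap was written against the older signature. A self-contained sketch of the failure mechanism (class and method names here are illustrative, not the real transformers API); pinning a transformers release that predates the extra argument is a common workaround:

```python
# Newer base class: generate() forwards four arguments to the hook.
class NewerBase:
    def _prepare(self, inputs_tensor, model_kwargs, model_input_name, generation_config):
        return model_kwargs

# Older-style override (like OFAModel's): accepts one positional fewer.
class OFALike(NewerBase):
    def _prepare(self, inputs_tensor, model_kwargs, model_input_name=None):
        return model_kwargs

try:
    OFALike()._prepare("x", {}, "input_ids", "gen_config")
    err = None
except TypeError as e:
    # mirrors "takes from 3 to 4 positional arguments but 5 were given"
    err = str(e)
```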
Hello, I encountered the following error while using this interface. Is it not working?
vqascore/promptcap-coco-vqa
<super: <class 'OFATokenizer'>, >
Traceback (most recent call last):
File "/home/lh/mukea-clip/pc.py", line 4, in <module>
model = PromptCap("vqascore/promptcap-coco-vqa") # also support OFA checkpoints. e.g. "OFA-Sys/ofa-large"
File "/home/lh/.conda/envs/pytorch/lib/python3.6/site-packages/promptcap/promptcap.py", line 12, in __init__
self.tokenizer = OFATokenizer.from_pretrained(ckpt)
File "/home/lh/.conda/envs/pytorch/lib/python3.6/site-packages/promptcap/tokenization_ofa.py", line 67, in from_pretrained
tokenizer = super().from_pretrained(pretrained_model_name_or_path, *init_inputs, **kwargs)
File "/home/lh/.conda/envs/pytorch/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1664, in from_pretrained
local_files_only=local_files_only,
File "/home/lh/.conda/envs/pytorch/lib/python3.6/site-packages/transformers/file_utils.py", line 2242, in get_file_from_repo
use_auth_token=use_auth_token,
File "/home/lh/.conda/envs/pytorch/lib/python3.6/site-packages/transformers/file_utils.py", line 1854, in cached_path
local_files_only=local_files_only,
File "/home/lh/.conda/envs/pytorch/lib/python3.6/site-packages/transformers/file_utils.py", line 2103, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Thanks for your interesting work and for sharing the code.
In the README, you only provide examples of how to generate captions for one image at a time (batch size = 1). Could you (@Yushi-Hu) explain how to generate captions in batches (multiple questions and corresponding images) in one go, instead of iteratively calling the model to improve time efficiency?
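The README does not show a batch API, but the usual pattern with Hugging Face seq2seq models is to pad all prompts to a common length with an attention mask, stack the preprocessed images, and call generate once on the whole batch. A minimal sketch of the prompt-padding step only (dummy token ids, no real tokenizer; PromptCap's internals may differ):

```python
# Pad variable-length token-id sequences to a common length and build
# the matching attention mask (1 = real token, 0 = padding), so a single
# generate() call can process the whole batch.
def pad_batch(seqs, pad_id=0):
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask

padded, mask = pad_batch([[5, 6], [7, 8, 9]])
```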
Are you using GPT-3 here to generate question-aware image captions? If so, is it the case that each image in the dataset has 5 human-annotated captions, as you mention in the paper? Or do only the 20 in-context examples here come with 5 human-annotated captions, while the rest of the images in the dataset still have only one caption each? If you have time, please give some suggestions.
PromptCap/promptcap-gen/caption_gen_greedy.py
Line 225 in 86dd6aa
How can we fine-tune this model? Is there any script available?
Will the code for training the model be provided? thanks!
Hey,
You've pushed your code with the OpenAI API key in it. Not sure if you wanted to put it out there, just letting you know :)
Hi, thanks for your nice work. I'm wondering about the source of the VQA samples used for in-context learning during the OK-VQA evaluation. Is it the VQAv2 training set, the OK-VQA training set, or the OK-VQA validation set? I would appreciate any information you could provide.