Comments (5)
Hi @giacomocamposampiero, thanks for your interest. Glad to hear you are making progress.
Your configuration looks good to me. I see two possible aspects that you might want to consider:
- Training data size: BLIP is trained mostly on short captions. It may be that fine-tuning BLIP on 5k samples for 20 epochs is not enough to shift its behaviour toward generating long captions. If possible, you might want to supplement your training data.
- BERT capacity: BERT was not originally trained for text generation. If long captions cannot be fitted well even on the training data after some over-fitting experiments, this may be the cause. In that case, you might want to consider other language models, such as GPT, though retraining the VL model may then be required.
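One more place a ~50-word ceiling can come from (just a guess worth checking, not something verified in this thread): caption text processors often truncate training captions at a fixed word count during preprocessing, independently of the generation `max_len`. A minimal sketch of that kind of word-level truncation, where the 50-word default is an assumption for illustration:

```python
import re

def pre_caption(caption: str, max_words: int = 50) -> str:
    """Word-level truncation of the kind caption text processors apply
    during preprocessing (the max_words default of 50 is an assumption
    for illustration, not a verified library value)."""
    # normalize whitespace and case
    caption = re.sub(r"\s{2,}", " ", caption.lower().strip())
    words = caption.split(" ")
    if len(words) > max_words:
        # anything beyond max_words is silently dropped before training
        caption = " ".join(words[:max_words])
    return caption

# An 80-word caption comes back capped at 50 words:
long_caption = " ".join(f"shape{i}" for i in range(80))
print(len(pre_caption(long_caption).split(" ")))  # 50
```

If the training captions are being truncated this way, the model never sees targets longer than the cap, so it learns to stop early no matter what the generation settings say.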
By the way, just out of curiosity, what data are you using? Is it a collection of images, each described by a paragraph?
These are just my guesses. Please feel welcome to discuss.
Thanks.
from lavis.
Thanks @dxli94 for the quick answer and your helpful suggestions! I will try to increase the training data size and number of epochs and, if that doesn't work, explore different language models better suited to longer text generation.
About the data: yes, I'm using a collection of images, each described by a paragraph. The images, however, are quite simple (compositions of abstract geometric shapes) and the captions are very structured and repetitive, so I was hoping my current data would be enough to fine-tune the model.
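On the max_len question more generally, one reason raising `max_len` alone rarely lengthens outputs is that decoding stops as soon as the model emits its end-of-sequence token; `min_len` is the knob that masks EOS early on. A toy decoding loop (purely illustrative, not the lavis implementation; all names here are made up) sketches the interaction:

```python
def toy_generate(next_token, eos_id=0, max_length=256, min_length=5, fallback=1):
    """Toy greedy decoding loop: illustrates that generation ends at the
    first EOS after min_length, so max_length is only an upper bound."""
    tokens = []
    while len(tokens) < max_length:
        tok = next_token(len(tokens))
        if tok == eos_id:
            if len(tokens) >= min_length:
                break  # model chose to stop; max_length is never reached
            tok = fallback  # EOS is masked before min_length; emit another token
        tokens.append(tok)
    return tokens

# A stand-in "model" that wants to stop after 40 tokens:
model = lambda step: 0 if step >= 40 else step + 1

print(len(toy_generate(model, max_length=256)))                 # 40
print(len(toy_generate(model, max_length=256, min_length=60)))  # 60
```

So if a fine-tuned model has learned to emit EOS around 50 words, a large `max_len` alone changes nothing; forcing length requires raising `min_len` (or retraining on longer targets).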
Hello, thanks for your excellent work on the library!
I am currently trying to fine-tune BLIP on a custom dataset. I followed your tutorial on custom dataset creation and set up all the necessary files for fine-tuning, and everything works as expected. The only problem I'm facing is the maximum length of the generated captions. In my training configuration file this length is set to 256, but the model never generates captions longer than roughly 50 words (about 90 tokens on average).
I have already increased the size of the BERT embeddings to 256, hard-coding it in this line:
and changed the default max_lengths to 256:
and here
My training configuration file looks like this:

```yaml
model:
  arch: blip_caption
  model_type: base_coco
  load_finetuned: False

datasets:
  custom_caption: # name of the dataset builder
    vis_processor:
      train:
        name: "blip_image_train"
      eval:
        name: "blip_image_eval"
    text_processor:
      train:
        name: "blip_caption"
        prompt: "a picture of "
      eval:
        name: "blip_caption"

run:
  task: captioning
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  weight_decay: 0.05
  max_epoch: 20
  batch_size_train: 2
  batch_size_eval: 8
  num_workers: 1

  max_len: 256
  min_len: 5
  num_beams: 3

  seed: 42
  output_dir: "output/BLIP/Caption_custom"

  amp: False
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True
```

I am training the model with 5000 samples. Do you have any suggestions about possible errors or omissions in my fine-tuning configuration? Should I use different parameters for the optimizer? Is it even possible to generate captions this long with BLIP?
Thanks!
Hi, brother!
May I ask if you also work on image captioning? I also want to use BLIP-2 to generate captions for my dataset. Have you implemented it? What is the quality of the captions it generates? Did you need to make any minor adjustments?
from lavis.
Hello, it did not work in the end for me because I was not able to generate labels longer than 50 words.
> Hello, it did not work in the end for me because I was not able to generate labels longer than 50 words.
If you want to generate longer sentences, you can try the LLaVA model. I have tried using the text generated by its demo.