BLIP generated caption length (lavis, open issue)

salesforce commented on June 4, 2024
BLIP generated caption length

from lavis.

Comments (5)

dxli94 commented on June 4, 2024

Hi @giacomocamposampiero, thanks for your interest. Glad to hear you are making progress.

Your configuration looks good to me. I see two possible aspects that you might want to consider:

  • Training data size. BLIP is trained mostly on short captions. Fine-tuning BLIP on 5k samples for 20 epochs may not be enough to shift its behaviour towards generating long captions. If possible, you might want to supplement your training data.
  • BERT capacity. BERT was not originally trained for text generation. If over-fitting experiments show that long captions cannot be fitted well even on the training data, this might be the cause. In that case, you might want to consider other language models, such as GPT, though re-training the VL model may then be required.
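
Before adding data or swapping the language model, it may help to check how many training captions exceed the length the model is allowed to see. The sketch below is only an illustration: it counts whitespace-separated words as a rough stand-in for BERT tokens, and `length_report` and the sample captions are hypothetical, not part of LAVIS.

```python
from statistics import mean


def length_report(captions, max_len):
    """Summarise caption lengths against a length limit.

    Lengths are whitespace word counts, a rough proxy for tokenizer tokens.
    """
    lengths = [len(c.split()) for c in captions]
    over = sum(1 for n in lengths if n > max_len)
    return {
        "mean": mean(lengths),
        "max": max(lengths),
        "fraction_over_limit": over / len(lengths),
    }


# Hypothetical sample data for illustration only.
captions = [
    "a red circle above a blue square",
    "a green triangle to the left of a yellow circle and a black square below both",
]
report = length_report(captions, max_len=10)
print(report)
```

If a large fraction of captions is over the limit, the targets may be cut short during training, which would be worth ruling out before concluding that the data size or the language model is the bottleneck.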

By the way, just curious what is the data you are using? Is it like a collection of images described by a paragraph each?

These are just my guesses. Please feel welcome to discuss.

Thanks.

giacomocamposampiero commented on June 4, 2024

Thanks @dxli94 for the quick answer and your helpful suggestions! I will try increasing the training data size and the number of epochs and, if that doesn't work, explore other language models better suited to generating longer text.

About the data: yes, I'm using a collection of images, each described by a paragraph. The images, however, are quite simple (compositions of abstract geometric shapes) and the captions are very structured and repetitive, so I was hoping that my current data would be enough to fine-tune the model.

shams2023 commented on June 4, 2024

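Hi, thanks for the great work on the library!

I'm currently trying to fine-tune BLIP on a custom dataset. I followed your tutorial on custom dataset generation and set up all the files needed for fine-tuning, and everything went as expected. The only problem I'm running into is the maximum length of the generated captions. In my training config file this length is set to 256, but the model never generates captions longer than about 50 words (roughly 90 tokens on average).

I have already increased the size of the BERT embeddings to 256, hardcoding it in this line:

self.max_txt_len = max_txt_len

and changed the default max_lengths to 256:

max_txt_len = cfg.get("max_txt_len", 40)

and here

My training config file looks like this: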

model:
  arch: blip_caption

  model_type: base_coco
  load_finetuned: False

datasets:
  custom_caption: # name of the dataset builder
    vis_processor:
        train:
          name: "blip_image_train"
        eval:
          name: "blip_image_eval"
    text_processor:
        train:
          name: "blip_caption"
          prompt: "a picture of "
        eval:
          name: "blip_caption"

run:
  task: captioning
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  weight_decay: 0.05
  max_epoch: 20
  batch_size_train: 2
  batch_size_eval: 8
  num_workers: 1

  max_len: 256
  min_len: 5
  num_beams: 3

  seed: 42
  output_dir: "output/BLIP/Caption_custom"

  amp: False
  resume_ckpt_path: null

  evaluate: False 
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True
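
The config above sets `max_len: 256` under `run` but has no `max_txt_len` entry under `model`. Assuming the quoted `cfg.get("max_txt_len", 40)` lookup reads from the `model` section (an assumption, not confirmed in this thread), a minimal sketch of how that fallback behaves is shown below. Plain dictionaries stand in for the parsed config, and `resolve_max_txt_len` is a hypothetical helper, not a LAVIS function.

```python
# Stand-in for part of the parsed YAML above, as plain dicts (an assumption:
# the real config object may be a different mapping type).
config = {
    "model": {"arch": "blip_caption", "model_type": "base_coco", "load_finetuned": False},
    "run": {"task": "captioning", "max_len": 256, "min_len": 5, "num_beams": 3},
}


def resolve_max_txt_len(model_cfg, default=40):
    """Mirror the quoted lookup: use the default when the key is absent."""
    return model_cfg.get("max_txt_len", default)


# `max_len` lives under `run`, so the `model` lookup still falls back to 40.
print(resolve_max_txt_len(config["model"]))  # 40

# Adding the key to the `model` section changes the result.
config["model"]["max_txt_len"] = 256
print(resolve_max_txt_len(config["model"]))  # 256
```

Setting the key in the config, rather than hardcoding the value in the source, would keep the change visible alongside the other training settings.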

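I'm training the model with 5000 samples. Do you have any suggestions about what might be wrong or missing in my fine-tuning configuration? Should I use different parameters for the optimizer? Is it even possible to generate captions this long with BLIP?

Thanks!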

Hi, brother!
May I ask whether you are also working on image captioning? I also want to use BLIP-2 to generate image captions for my dataset. Have you got it working? How good are the captions it generates? Did you need to make any adjustments, such as fine-tuning?

giacomocamposampiero commented on June 4, 2024

Hello, it did not work in the end for me because I was not able to generate labels longer than 50 words.

shams2023 commented on June 4, 2024

Hello, it did not work in the end for me because I was not able to generate labels longer than 50 words.

If you want to generate longer sentences, you can try the LLaVA model. I have tried using the text generated by its demo.
