Comments (5)
Hi @giacomocamposampiero, thanks for your interest. Glad to hear you are making progress.
Your configuration looks good to me. I see two possible aspects that you might want to consider:
- Training data size: BLIP is trained mostly on short captions. It may be that fine-tuning BLIP on 5k samples for 20 epochs is not enough to shift its behaviour toward generating long captions. If possible, you might want to supplement your training data.
- BERT capacity: BERT was not originally trained for text generation. If long captions cannot be fitted well even on the training data after some over-fitting experiments, this may be the cause. In that case, you might want to consider other language models, such as GPT, though retraining the VL model may then be required.
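One more place a ~50-word ceiling can come from (just a guess worth checking, not something verified in this thread): caption text processors often truncate training captions at a fixed word count during preprocessing, independently of the generation `max_len`. A minimal sketch of that kind of word-level truncation, where the 50-word default is an assumption for illustration:

```python
import re

def pre_caption(caption: str, max_words: int = 50) -> str:
    """Word-level truncation of the kind caption text processors apply
    during preprocessing (the max_words default of 50 is an assumption
    for illustration, not a verified library value)."""
    # normalize whitespace and case
    caption = re.sub(r"\s{2,}", " ", caption.lower().strip())
    words = caption.split(" ")
    if len(words) > max_words:
        # anything beyond max_words is silently dropped before training
        caption = " ".join(words[:max_words])
    return caption

# An 80-word caption comes back capped at 50 words:
long_caption = " ".join(f"shape{i}" for i in range(80))
print(len(pre_caption(long_caption).split(" ")))  # 50
```

If the training captions are being truncated this way, the model never sees targets longer than the cap, so it learns to stop early no matter what the generation settings say.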
By the way, just out of curiosity, what data are you using? Is it a collection of images, each described by a paragraph?
These are just my guesses. Please feel welcome to discuss.
Thanks.
from lavis.
Thanks @dxli94 for the quick answer and your helpful suggestions! I will try to increase the training data size and number of epochs and, if that doesn't work, explore different language models better suited to longer text generation.
About the data: yes, I'm using a collection of images, each described by a paragraph. The images, however, are quite simple (compositions of abstract geometric shapes) and the captions are very structured and repetitive, so I was hoping my current data would be enough to fine-tune the model.
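On the max_len question more generally, one reason raising `max_len` alone rarely lengthens outputs is that decoding stops as soon as the model emits its end-of-sequence token; `min_len` is the knob that masks EOS early on. A toy decoding loop (purely illustrative, not the lavis implementation; all names here are made up) sketches the interaction:

```python
def toy_generate(next_token, eos_id=0, max_length=256, min_length=5, fallback=1):
    """Toy greedy decoding loop: illustrates that generation ends at the
    first EOS after min_length, so max_length is only an upper bound."""
    tokens = []
    while len(tokens) < max_length:
        tok = next_token(len(tokens))
        if tok == eos_id:
            if len(tokens) >= min_length:
                break  # model chose to stop; max_length is never reached
            tok = fallback  # EOS is masked before min_length; emit another token
        tokens.append(tok)
    return tokens

# A stand-in "model" that wants to stop after 40 tokens:
model = lambda step: 0 if step >= 40 else step + 1

print(len(toy_generate(model, max_length=256)))                 # 40
print(len(toy_generate(model, max_length=256, min_length=60)))  # 60
```

So if a fine-tuned model has learned to emit EOS around 50 words, a large `max_len` alone changes nothing; forcing length requires raising `min_len` (or retraining on longer targets).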
Hello, thanks for your excellent work on the library!
I am currently trying to fine-tune BLIP on a custom dataset. I followed your tutorial on custom dataset creation and set up all the necessary files for fine-tuning, and everything works as expected. The only problem I'm facing is the maximum length of the generated captions. In my training configuration file this length is set to 256, but the model never generates captions longer than roughly 50 words (about 90 tokens on average).
I have already increased the size of the BERT embeddings to 256, hard-coding it in this line:
and changed the default max_lengths to 256:
and here
My training configuration file looks like this:

```yaml
model:
  arch: blip_caption
  model_type: base_coco
  load_finetuned: False

datasets:
  custom_caption: # name of the dataset builder
    vis_processor:
      train:
        name: "blip_image_train"
      eval:
        name: "blip_image_eval"
    text_processor:
      train:
        name: "blip_caption"
        prompt: "a picture of "
      eval:
        name: "blip_caption"

run:
  task: captioning
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  weight_decay: 0.05
  max_epoch: 20
  batch_size_train: 2
  batch_size_eval: 8
  num_workers: 1

  max_len: 256
  min_len: 5
  num_beams: 3

  seed: 42
  output_dir: "output/BLIP/Caption_custom"

  amp: False
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True
```

I am training the model with 5000 samples. Do you have any suggestions about possible errors or omissions in my fine-tuning configuration? Should I use different parameters for the optimizer? Is it even possible to generate captions this long with BLIP?
Thanks!
Hi, brother!
May I ask if you also work on image captioning? I also want to use BLIP-2 to generate captions for my dataset. Have you implemented it? What is the quality of the captions it generates? Did you need to make any minor adjustments?
from lavis.
Hello, it did not work in the end for me because I was not able to generate labels longer than 50 words.
> Hello, it did not work in the end for me because I was not able to generate labels longer than 50 words.
If you want to generate longer sentences, you can try the LLaVA model. I have tried using the text generated by its demo.