Comments (11)
We just released a set of CLIPPO checkpoints. Please refer to the README for details, and check out the Colab for an example of how to use the checkpoints.
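For context, big_vision checkpoints are distributed as NumPy `.npz` archives mapping flat parameter names to arrays. A minimal sketch of inspecting one, using an in-memory stand-in archive (the parameter name below is a made-up example, not an actual CLIPPO parameter; see the README/Colab for the real files and trees):

```python
import io

import numpy as np

# Stand-in for a downloaded checkpoint file: an .npz archive whose keys
# are flat, slash-separated parameter names. The name used here is
# purely illustrative.
buf = io.BytesIO()
np.savez(buf, **{"img/embedding/kernel": np.zeros((16 * 16 * 3, 768))})
buf.seek(0)

# Loading an .npz returns a lazy NpzFile; iterate its `files` to list
# parameter names and shapes before restoring them into a model.
params = np.load(buf)
for name in params.files:
    print(name, params[name].shape)
```

For a real checkpoint you would pass the downloaded file path to `np.load` instead of the in-memory buffer.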
from big_vision.
Tagging @zzhanghub @Adonis-galaxy @jianghaojun @nahidalam for visibility.
Could someone with permission please close this issue (it seems I can't close it myself).
Hi @yukang123, we did not plan to release additional checkpoints.
I could look into training an L/16 model for release, for example one with an ImageNet-21k init, trained on YFCC-100M + 25% C4 data. It might improve a bit over the corresponding released B/16 model, but in general the models trained on YFCC-100M do not perform as well as the main models in the paper, which were trained on WebLI. Let me know if such an L/16 model would be interesting for your use case.
Looking forward to the release of CLIPPO checkpoints too~
We're looking into it, but I can't promise a strict timeline. Near term we will likely only be able to release checkpoints trained on the same data sets as the released LiT models (CC12M and/or YFCC100M).
+1
Hi @mitscha, I am working on a distillation problem and a CLIPPO model checkpoint would be really useful. Looking forward to it.
Hi all,
I saw that multiple ViT-B/16 checkpoints have been released. I am wondering if you plan to release checkpoints for ViT models at other scales, such as ViT-H-14 or ViT-L. A pretrained ViT-H model seems better suited to our research on a downstream image generation task. I would appreciate it if you could share these pretrained checkpoints; that would help a lot!
@mitscha
Thanks!
@mitscha Thanks for your reply!
I am currently using the released Stable Diffusion v2 checkpoints, which use a CLIP text encoder (the corresponding image encoder is ViT-H-14) to generate text embeddings of length 1024, for AIGC tasks.
I would like to combine the image embedding generated by the CLIP image encoder with the text embedding. Training would involve less uncertainty if the dimension of the image embedding matched that of the text embedding (i.e., 1024), because I would not need to train an additional fully connected layer to transform the features before concatenation.
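For what it's worth, the extra layer described above is small: a single learned linear projection mapping the image embedding into the text-embedding width before concatenation. A sketch with NumPy, where the widths 768 and 1024 and the random initialization are assumptions for illustration, not values from the released models:

```python
import numpy as np

rng = np.random.default_rng(0)

d_img, d_txt = 768, 1024                 # assumed embedding widths
img_emb = rng.normal(size=(4, d_img))    # batch of image embeddings
txt_emb = rng.normal(size=(4, d_txt))    # batch of text embeddings

# Learned projection (randomly initialized here; in practice trained
# jointly with the downstream task) that maps image features into the
# text-embedding width so the two can be concatenated directly.
W = rng.normal(size=(d_img, d_txt)) / np.sqrt(d_img)
img_proj = img_emb @ W

combined = np.concatenate([img_proj, txt_emb], axis=-1)
print(combined.shape)  # (4, 2048)
```

In a real pipeline this projection would be a trainable layer (e.g. `nn.Linear` in PyTorch or `nn.Dense` in Flax) optimized with the rest of the model.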
Besides, the task I am currently working on could draw on CLIPPO's idea of using images with text rendered on them. It would therefore be very helpful for my research if I could transfer the released CLIPPO checkpoints to my task, and a ViT-H-14 pretrained CLIPPO model would fit my use case best. If such checkpoints are not available, could you please give me some suggestions on how to transform the dimension of the image embedding without losing the strengths of the pretrained CLIPPO model?
Thanks for your understanding! I appreciate it!
Related Issues (20)
- Error with putting arrays on CPU in cloud TPUs
- How to save fine tuned PaliGemma model?
- [QUESTION] How to perform inference on trained model?
- Can I convert paligemma npz model to pytorch to safetensors?
- Implementation of contrast() seems wrong
- Behavior of `solarize()` depends on integer overflow
- No ChartQA ft weights released on Kaggle
- PlaiGemma finetuned model
- Loss Scale for Training Siglip
- [Question] How to inference / captioning short video?
- Question about the evaluation on instance segmentation about PaliGemma
- Could you please give me an instruction on how to use paligemma's evaluator?
- What's a "`rezied` method"?
- Negative rho values in GSAM training
- FlexiVit is also flexible with image resolution?
- Load ViT with CLIPPO Weights
- Mixup Per Example?
- Clarification: SigLIP Image Transform
- Errors in notebooks
- Confusion on FlexiViT