Comments (7)
@theadamsabra if not, you are more than welcome to take it up
from multimodal.
@ebsmothers thanks! If I don't get a response by tomorrow I'll just pick it up myself
Thanks for your answer, @ebsmothers. I would like to add the model to torchmultimodal/models first.
That sounds reasonable to me. We already have CLIP visual encoders in the library here, so feel free to reuse those. Then the bulk of the work for the model should be to add the LLM. A couple pointers to help with that: TransformerDecoderLayer, RMSNorm. We also have an open PR for rotary positional embeddings (#450) that might be useful. Let me know if this makes sense, happy to provide more details as needed.
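For readers unfamiliar with the RMSNorm layer mentioned above, here is a minimal sketch of the operation in plain Python. It is not torchmultimodal's implementation: the function name `rms_norm`, the `eps` default, and the list-based interface are assumptions made for illustration only. It shows the core idea of dividing a vector by its root mean square and then applying a learned per-element scale.

```python
import math
from typing import List, Optional


def rms_norm(
    x: List[float],
    weight: Optional[List[float]] = None,
    eps: float = 1e-6,  # assumed default; real libraries may differ
) -> List[float]:
    """Illustrative RMS normalization over a single vector.

    Hypothetical helper, not the torchmultimodal API.
    """
    if not x:
        raise ValueError("x must be non-empty")
    if weight is None:
        # Without a learned scale, use an identity scale.
        weight = [1.0] * len(x)
    if len(weight) != len(x):
        raise ValueError("weight must have the same length as x")

    # Mean of squares; unlike LayerNorm, no mean is subtracted.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)

    # Normalize, then apply the per-element scale.
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0]))
```

In a decoder-only LLM this normalization is typically applied to each token's hidden vector before the attention and feed-forward sublayers; a tensor-based version would do the same computation along the last dimension.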
Nice! I'll come back to you with more questions later. I'm not sure I'll start working on it this week.
Hi @youssefadr, thanks for opening this issue. LLaVA is definitely something we're interested in adding and we would be happy to have you contribute. Is there a specific portion of the model you're especially interested in helping out with?
@youssefadr have you worked on this in any capacity? I'm interested in picking this up if not.