🚀 The feature, motivation and pitch LLaVA seems to be currently a


Add support for LLaVA model about multimodal HOT 7 OPEN

facebookresearch commented on July 21, 2024

Add support for LLaVA model

from multimodal.

Comments (7)

ebsmothers commented on July 21, 2024 2

@theadamsabra if not, you are more than welcome to take it up


theadamsabra commented on July 21, 2024 2

@ebsmothers thanks! If I don't get a response by tomorrow I'll just pick it up myself


youssefadr commented on July 21, 2024 1

Thanks for your answer @ebsmothers, I would like to add the model to torchmultimodal/models first.


ebsmothers commented on July 21, 2024 1

That sounds reasonable to me. We already have CLIP visual encoders in the library here, so feel free to reuse those. Then the bulk of the work for the model should be to add the LLM. A couple pointers to help with that: TransformerDecoderLayer, RMSNorm. We also have an open PR for rotary positional embeddings (#450) that might be useful. Let me know if this makes sense, happy to provide more details as needed.
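For anyone picking this up, here is a rough, dependency-free sketch of the arithmetic behind the RMSNorm pointer above. It is not the library's implementation: `rms_norm`, `weight`, and `eps` are hypothetical names chosen for illustration, and a real module would operate on tensors with a learnable gain rather than on Python lists.

```python
import math
from typing import List, Optional


def rms_norm(
    x: List[float],
    weight: Optional[List[float]] = None,
    eps: float = 1e-6,
) -> List[float]:
    """Divide x by its root mean square, then apply a per-feature gain.

    Illustrative only: the names and the list-based interface are assumptions,
    not the torchmultimodal API.
    """
    if weight is None:
        # Identity gain, which is the usual starting value for the learnable scale.
        weight = [1.0] * len(x)
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# After normalization the mean square of the features is approximately 1.
normed = rms_norm([3.0, 4.0])
```

Unlike LayerNorm, this does not subtract the mean or add a bias; it only rescales, which is why it is cheaper to compute.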


youssefadr commented on July 21, 2024 1

Nice! I'll come back to you with more questions later; I'm not sure I'll start working on it this week.


ebsmothers commented on July 21, 2024

Hi @youssefadr, thanks for opening this issue. LLaVA is definitely something we're interested in adding and we would be happy to have you contribute. Is there a specific portion of the model you're especially interested in helping out with?


theadamsabra commented on July 21, 2024

@youssefadr have you worked on this in any capacity? I'm interested in picking it up if not.
