Topic: llava
Something interesting about llava
RestAI is an AIaaS (AI as a Service) open-source platform built on top of LlamaIndex, Ollama and HF Pipelines. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama. Precise embeddings usage and tuning.
User: apocas
Home Page: https://apocas.github.io/restai/
Docker image for LLaVA: Large Language and Vision Assistant
User: ashleykleynhans
From-scratch implementation of a vision-language model in pure PyTorch
User: avisoori1x
Ollama API bindings for .NET
User: awaescher
Home Page: https://www.nuget.org/packages/OllamaSharp
MLX-VLM is a package for running vision LLMs locally on your Mac using MLX.
User: blaizzy
Give your computer an AI Brain
Organization: blib-la
Home Page: https://get-captain.com
ChatGPT's explosive popularity was a key step on the road to AGI. This project collects open-source alternatives to ChatGPT, including text LLMs and multimodal LLMs, as a convenient reference.
User: chenking2020
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
User: developersdigest
Home Page: https://developersdigest.tech
FreeGenius AI, an advanced AI assistant that can talk and take multi-step actions. Supports numerous open-source LLMs via Llama.cpp, Ollama or the Groq Cloud API, with optional integration with AutoGen agents, the OpenAI API, Google Gemini Pro and unlimited plugins.
User: eliranwong
Home Page: https://letmedoit.ai
SUPIR aims to develop Practical Algorithms for Photo-Realistic Image Restoration In the Wild
User: fanghua-yu
Home Page: http://supir.xpixel.group/
Self-host a ChatGPT-style web interface for Ollama 🦙
Organization: fly-apps
Home Page: https://fly.io/docs/gpus/
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
User: fuxiaoliu
Home Page: https://fuxiaoliu.github.io/LRV/
Famous Vision Language Models and Their Architectures
User: gokayfem
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
User: gokayfem
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
User: haotian-liu
Home Page: https://llava.hliu.cc
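Several projects in this list serve LLaVA locally through Ollama's HTTP API. As an illustrative sketch only (the endpoint and field names follow Ollama's documented `/api/generate` schema; the model tag `llava` and the helper function name are assumptions, not part of any repository above), a request for a local LLaVA model can be built like this:

```python
import base64
import json


def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> dict:
    """Build a JSON payload for Ollama's /api/generate endpoint.

    Ollama expects images as base64-encoded strings in the "images" list;
    "stream": False requests a single JSON response instead of a stream.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


if __name__ == "__main__":
    # Hypothetical usage: POST this payload as JSON to
    # http://localhost:11434/api/generate on a machine running Ollama.
    payload = build_llava_request("Describe this image.", b"\x89PNG\r\n")
    print(json.dumps(payload)[:60])
```

The same payload shape works for any multimodal model tag Ollama can pull, not just LLaVA.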
Demo Python script to interact with a llama.cpp server using the Whisper API, a microphone and webcam devices.
User: herrera-luis
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Organization: internlm
Home Page: https://xtuner.readthedocs.io/zh-cn/latest/
llmcord.py • Talk to LLMs with your friends!
User: jakobdylanc
Tag manager and captioner for image datasets
User: jhc13
LLaRA: Large Language and Robotics Assistant
User: lostxine
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Organization: mbzuai-oryx
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Organization: mbzuai-oryx
Home Page: https://mbzuai-oryx.github.io/Video-ChatGPT
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"
Organization: mbzuai-oryx
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Organization: modelscope
ms-swift: Use PEFT or full-parameter training to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Organization: modelscope
Home Page: https://swift.readthedocs.io/zh-cn/latest/
Matryoshka Multimodal Models
User: mu-cai
Home Page: https://matryoshka-mm.github.io/
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models and 20+ benchmarks
Organization: open-compass
Home Page: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.
Organization: paddlepaddle
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
User: rlhf-v
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Organization: roboflow
Home Page: https://maestro.roboflow.com
Code/Data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Organization: salt-nlp
Home Page: https://llavar.github.io/
A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) on your local device efficiently.
Organization: scisharp
Home Page: https://scisharp.github.io/LLamaSharp
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
User: skalskip
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
User: sshh12
🧘🏻♂️ KarmaVLM (相生): A family of highly efficient and powerful vision-language models.
User: thomas-yanxin
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Organization: tianyi-lab
A Framework of Small-scale Large Multimodal Models
User: tinyllava
Home Page: https://arxiv.org/abs/2402.14289
LLaVA server (llama.cpp).
User: trzy
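Servers built on llama.cpp, like the one above, commonly expose a `/completion` endpoint; for multimodal models the request can carry base64 images in an `image_data` list whose `id` matches an `[img-ID]` placeholder in the prompt. A minimal sketch of constructing such a payload (field names follow llama.cpp's server documentation; the helper name, port, and prompt template are my assumptions, and your server version may differ):

```python
import base64


def build_llamacpp_completion(question: str, image_bytes: bytes, image_id: int = 10) -> dict:
    """Build a /completion payload for a llama.cpp server running a LLaVA model.

    The prompt references the image via an [img-ID] placeholder that the
    server resolves against the matching "id" entry in "image_data".
    """
    return {
        "prompt": f"USER: [img-{image_id}]\n{question}\nASSISTANT:",
        "image_data": [
            {"data": base64.b64encode(image_bytes).decode("ascii"), "id": image_id}
        ],
        "n_predict": 128,  # cap the number of generated tokens
    }


# Hypothetical usage: POST the dict as JSON to http://localhost:8080/completion
payload = build_llamacpp_completion("What objects are visible?", b"fake-image-bytes")
```

The `USER:`/`ASSISTANT:` framing mirrors the LLaVA chat template; other models served through llama.cpp may expect a different prompt format.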
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Organization: unum-cloud
Home Page: https://unum-cloud.github.io/uform/
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Organization: wisconsinaivision
Home Page: https://vip-llava.github.io/
An open-source implementation of LLaVA-NeXT.
User: xiaoachen98
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
User: yihedeng9
Home Page: https://stic-lvlm.github.io/
The official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
User: yxxxb
Home Page: https://yxxxb.github.io/VoCo-LLaMA-page/