Topic: llava
Something interesting about llava
RestAI is an AIaaS (AI as a Service) open-source platform built on top of LlamaIndex, Ollama and HF Pipelines. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama. Precise embeddings usage and tuning.
User: apocas
Home Page: https://apocas.github.io/restai/
Docker image for LLaVA: Large Language and Vision Assistant
User: ashleykleynhans
From-scratch implementation of a vision-language model in pure PyTorch
User: avisoori1x
Ollama API bindings for .NET
User: awaescher
Home Page: https://www.nuget.org/packages/OllamaSharp
MLX-VLM is a package for running vision LLMs locally on your Mac using MLX.
User: blaizzy
Give your computer an AI Brain
Organization: blib-la
Home Page: https://get-captain.com
ChatGPT's explosive popularity was a key step on the road to AGI. This project collects open-source alternatives to ChatGPT, including text LLMs and multimodal LLMs, as a convenient reference.
User: chenking2020
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
User: developersdigest
Home Page: https://developersdigest.tech
FreeGenius AI, an advanced AI assistant that can talk and take multi-step actions. Supports numerous open-source LLMs via Llama.cpp, Ollama or the Groq Cloud API, with optional integration with AutoGen agents, the OpenAI API, Google Gemini Pro and unlimited plugins.
User: eliranwong
Home Page: https://letmedoit.ai
SUPIR aims to develop Practical Algorithms for Photo-Realistic Image Restoration In the Wild
User: fanghua-yu
Home Page: http://supir.xpixel.group/
Self-host a ChatGPT-style web interface for Ollama 🦙
Organization: fly-apps
Home Page: https://fly.io/docs/gpus/
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
User: fuxiaoliu
Home Page: https://fuxiaoliu.github.io/LRV/
Famous Vision Language Models and Their Architectures
User: gokayfem
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
User: gokayfem
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
User: haotian-liu
Home Page: https://llava.hliu.cc
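Several projects in this list serve LLaVA locally through Ollama's HTTP API. As an illustrative sketch only (the endpoint and field names follow Ollama's documented `/api/generate` schema; the model tag `llava` and the helper function name are assumptions, not part of any repository above), a request for a local LLaVA model can be built like this:

```python
import base64
import json


def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> dict:
    """Build a JSON payload for Ollama's /api/generate endpoint.

    Ollama expects images as base64-encoded strings in the "images" list;
    "stream": False requests a single JSON response instead of a stream.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


if __name__ == "__main__":
    # Hypothetical usage: POST this payload as JSON to
    # http://localhost:11434/api/generate on a machine running Ollama.
    payload = build_llava_request("Describe this image.", b"\x89PNG\r\n")
    print(json.dumps(payload)[:60])
```

The same payload shape works for any multimodal model tag Ollama can pull, not just LLaVA.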
Demo Python script to interact with a llama.cpp server using the Whisper API, a microphone and webcam devices.
User: herrera-luis
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Organization: internlm
Home Page: https://xtuner.readthedocs.io/zh-cn/latest/
llmcord.py • Talk to LLMs with your friends!
User: jakobdylanc
Tag manager and captioner for image datasets
User: jhc13
LLaRA: Large Language and Robotics Assistant
User: lostxine
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Organization: mbzuai-oryx
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Organization: mbzuai-oryx
Home Page: https://mbzuai-oryx.github.io/Video-ChatGPT
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"
Organization: mbzuai-oryx
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Organization: modelscope
ms-swift: Use PEFT or full-parameter training to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Organization: modelscope
Home Page: https://swift.readthedocs.io/zh-cn/latest/
Matryoshka Multimodal Models
User: mu-cai
Home Page: https://matryoshka-mm.github.io/
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models and 20+ benchmarks
Organization: open-compass
Home Page: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.
Organization: paddlepaddle
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
User: rlhf-v
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Organization: roboflow
Home Page: https://maestro.roboflow.com
Code/Data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Organization: salt-nlp
Home Page: https://llavar.github.io/
A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) on your local device efficiently.
Organization: scisharp
Home Page: https://scisharp.github.io/LLamaSharp
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
User: skalskip
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
User: sshh12
🧘🏻♂️ KarmaVLM (相生): A family of highly efficient and powerful vision-language models.
User: thomas-yanxin
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Organization: tianyi-lab
A Framework of Small-scale Large Multimodal Models
User: tinyllava
Home Page: https://arxiv.org/abs/2402.14289
LLaVA server (llama.cpp).
User: trzy
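Servers built on llama.cpp, like the one above, commonly expose a `/completion` endpoint; for multimodal models the request can carry base64 images in an `image_data` list whose `id` matches an `[img-ID]` placeholder in the prompt. A minimal sketch of constructing such a payload (field names follow llama.cpp's server documentation; the helper name, port, and prompt template are my assumptions, and your server version may differ):

```python
import base64


def build_llamacpp_completion(question: str, image_bytes: bytes, image_id: int = 10) -> dict:
    """Build a /completion payload for a llama.cpp server running a LLaVA model.

    The prompt references the image via an [img-ID] placeholder that the
    server resolves against the matching "id" entry in "image_data".
    """
    return {
        "prompt": f"USER: [img-{image_id}]\n{question}\nASSISTANT:",
        "image_data": [
            {"data": base64.b64encode(image_bytes).decode("ascii"), "id": image_id}
        ],
        "n_predict": 128,  # cap the number of generated tokens
    }


# Hypothetical usage: POST the dict as JSON to http://localhost:8080/completion
payload = build_llamacpp_completion("What objects are visible?", b"fake-image-bytes")
```

The `USER:`/`ASSISTANT:` framing mirrors the LLaVA chat template; other models served through llama.cpp may expect a different prompt format.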
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Organization: unum-cloud
Home Page: https://unum-cloud.github.io/uform/
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Organization: wisconsinaivision
Home Page: https://vip-llava.github.io/
An open-source implementation of LLaVA-NeXT.
User: xiaoachen98
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
User: yihedeng9
Home Page: https://stic-lvlm.github.io/
The official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
User: yxxxb
Home Page: https://yxxxb.github.io/VoCo-LLaMA-page/