Topic: vlm Goto Github
Something interesting about vlm
vlm,Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detection and segmentation.
User: adithya-s-k
vlm,Python companion to Low Speed Aerodynamics by Joseph Katz and Allen Plotkin
User: alwinw
vlm,Computational Aerodynamics Lab
User: andreagalle
vlm,The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Organization: baai-agents
Home Page: https://baai-agents.github.io/Cradle/
vlm,This repo is a live list of papers on game-playing agents and large multimodal models - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
Organization: baai-agents
vlm,DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Organization: baaivision
Home Page: https://huggingface.co/datasets/BAAI/DenseFusion-1M
vlm,EVE: Encoder-Free Vision-Language Models from BAAI
Organization: baaivision
vlm,A system for prompted weak supervision.
Organization: batsresearch
vlm,Vortex lattice method for inviscid lifting-surface aerodynamics
Organization: byuflowlab
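Several entries above implement the vortex lattice method, which ultimately recovers lift from bound circulation via the Kutta-Joukowski theorem. A minimal sketch of that final step (the numbers are illustrative, not taken from any listed project):

```python
# Kutta-Joukowski theorem: lift per unit span L' = rho * V * Gamma.
# Vortex lattice methods solve for the circulation Gamma of each panel,
# then apply this relation to recover aerodynamic forces.
rho = 1.225    # air density at sea level, kg/m^3 (illustrative value)
V = 30.0       # freestream speed, m/s (illustrative value)
gamma = 2.0    # bound circulation, m^2/s (illustrative value)

lift_per_span = rho * V * gamma  # N/m
print(lift_per_span)  # 73.5
```

In a full VLM solver, gamma comes from solving a linear system built from the influence coefficients of the horseshoe vortices; this snippet only shows the force-recovery step.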
vlm,Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
User: camurban
vlm,An implementation of the Vortex Lattice Method (VLM) and the Doublet Lattice Method (DLM) for aeroelasticity.
Organization: dlr-ae
vlm,[ICLR 2024 Spotlight] - [Best Paper Award, SoCal NLP 2023] - Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
User: erfanshayegani
Home Page: https://iclr.cc/virtual/2024/poster/17767
vlm,[ICRA 2024] Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
User: flycole
Home Page: https://www.robot-learning.uk/dream2real
vlm,Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
Organization: foundation-multimodal-models
vlm,A toolbox meant for aircraft design analyses.
User: godotmisogi
Home Page: https://godotmisogi.github.io/AeroFuse.jl/
vlm,Famous Vision Language Models and Their Architectures
User: gokayfem
vlm,Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
User: gokayfem
vlm,A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
User: haorand
vlm,[MICCAI 2024] HLSS, the first study to explore hierarchical information inherent in histopathology images and their language descriptions for strong multi-modal representation learning
User: hasindri
vlm,Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the "ColPali: Efficient Document Retrieval with Vision Language Models" paper.
Organization: illuin-tech
Home Page: https://huggingface.co/vidore
vlm,Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon
User: josefalbers
vlm,MATLAB implementation to simulate the non-linear dynamics of a fixed-wing unmanned aerial glider. Includes tools to calculate aerodynamic coefficients using a vortex lattice method implementation, and to extract longitudinal and lateral linear systems around the trimmed gliding state.
Organization: jrgenerative
vlm,Fluid-Structure Interaction Analysis Using FEM and UVLM
User: krproject-tech
vlm,[CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
User: letitiabanana
vlm,LLaRA: Large Language and Robotics Assistant
User: lostxine
vlm,Seamlessly integrate state-of-the-art transformer models into robotics stacks
Organization: mbodiai
Home Page: https://mbodi.ai/
vlm,[CVPR 2024] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Organization: mbzuai-oryx
Home Page: https://mbzuai-oryx.github.io/GeoChat
vlm,ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
User: niuzaisheng
Home Page: https://arxiv.org/abs/2402.07945
vlm,PsyDI: An MBTI agent that helps you understand your personality type through relaxed multi-modal interaction.
Organization: opendilab
Home Page: https://psydi.opendilab.org.cn
vlm,M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. Furthermore, M3DBench provides a new benchmark to assess large models across 3D vision-centric tasks.
Organization: openm3d
Home Page: https://m3dbench.github.io/
vlm,ezaero - Easy aerodynamics in Python :airplane:
User: partmor
Home Page: https://ezaero.readthedocs.io
vlm,Aircraft design optimization made fast through modern automatic differentiation. Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
User: peterdsharpe
Home Page: https://peterdsharpe.github.io/AeroSandbox/
vlm,Python scripts for captioning images with VLMs
User: progamergov
vlm,Okra, your all-in-one personal AI assistant
User: s4mpl3r
vlm,Awesome LLM-related papers and repos covering a comprehensive range of topics.
User: shure-dev
Home Page: https://shorturl.at/bmuwC
vlm,Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
User: sid2697
Home Page: https://sid2697.github.io/hoi-ref/
vlm,KarmaVLM: a family of efficient and powerful visual language models.
User: thomas-yanxin
vlm,A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
User: thuccslab
Home Page: https://github.com/ThuCCSLab/Awesome-LM-SSP
vlm,Official code for Paper "Mantis: Multi-Image Instruction Tuning"
Organization: tiger-ai-lab
Home Page: https://tiger-ai-lab.github.io/Mantis/
vlm,This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Organization: ucsc-vlaa
vlm,[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
User: wisdomikezogwo
Home Page: https://quilt1m.github.io/
vlm,Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
User: xiaohao-xu
Home Page: https://arxiv.org/pdf/2403.11083.pdf
vlm,OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Organization: xlang-ai
Home Page: https://os-world.github.io
vlm,Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Organization: xlang-ai
Home Page: https://spider2-v.github.io