# CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
This repository contains the official code for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.
[Project Page] [Paper]
## TODO

- Release training scripts
- Release pre-training dataset
- Release demo

## News

- 2023-11-01: Release 1.4B/2.8B checkpoints
- 2023-11-01: Release initial code
## Installation

Create a conda environment and install PyTorch, picking the command that matches your CUDA version:

```shell
conda create -n covlm python=3.9
conda activate covlm

# CUDA 10.2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
# CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
# CUDA 11.6
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
```

Then install the bundled dependencies, this package, and the spaCy model:

```shell
pip install -e transformers/
pip install -e YOLOX/
pip install -r requirements.txt
pip install -e .
python -m spacy download en_core_web_md
```
## Checkpoints

| Model | Vision encoder | LLM | Checkpoint |
|---|---|---|---|
| CoVLM-1.4B | ViT-L-14 | pythia-1.4b | Hugging Face |
| CoVLM-2.8B | ViT-L-14 | pythia-2.8b | Hugging Face |
## Evaluation

Each script takes the path to a downloaded checkpoint as its only argument:

```shell
# Referring expression grounding on RefCOCOg
bash eval_refcocog.sh CHECKPOINT
# Compositional text-to-image retrieval on Cola
bash eval_cola.sh CHECKPOINT
# Attribution, Relation, and Order (ARO) benchmark
bash eval_aro.sh CHECKPOINT
# Visual question answering on VQAv2
bash eval_vqav2.sh CHECKPOINT
```
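To run the full benchmark sweep in one go, the four scripts above can be driven from a small loop. The sketch below is a dry run that only prints the commands; the checkpoint path is a placeholder (not a path this repo ships), and the `eval_*.sh` names are the scripts listed above:

```shell
# Placeholder path: replace with your downloaded CoVLM weights.
CKPT=checkpoints/covlm-1.4b

# Print one eval command per benchmark; drop `echo` to actually launch them.
for task in refcocog cola aro vqav2; do
  echo "bash eval_${task}.sh ${CKPT}"
done
```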
## Citation

If you find our work useful or relevant to your research, please cite our paper:
```bibtex
@misc{li2023covlm,
      title={CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding},
      author={Junyan Li and Delin Chen and Yining Hong and Zhenfang Chen and Peihao Chen and Yikang Shen and Chuang Gan},
      year={2023},
      eprint={2311.03354},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```