
jiachen2cc / covlm


This project was forked from umass-foundation-model/covlm.



License: MIT License



CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding


This repository contains the official code for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.

[Project Page] [Paper]

News and ToDo List

  • To do: release training scripts
  • To do: release pre-training dataset
  • To do: release demo
  • 2023-11-1: Release 1.4B/2.8B checkpoint
  • 2023-11-1: Release initial code

Installation

conda create -n covlm python=3.9
conda activate covlm
# Install PyTorch 1.12.1 with ONE of the following commands, matching your CUDA version
# CUDA 10.2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
# CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
# CUDA 11.6
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
# From the repository root: install the bundled transformers and YOLOX, the remaining dependencies, and CoVLM itself
pip install -e transformers/
pip install -e YOLOX/
pip install -r requirements.txt
pip install -e .
# Download the spaCy English model
python -m spacy download en_core_web_md
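Once the installation steps finish, a quick check can confirm that the key packages are importable and whether PyTorch sees a GPU. The script below is a hypothetical helper, not part of the repository: the names `REQUIRED`, `missing_packages`, and `cuda_available` are illustrative, and the import names (in particular `yolox` for the bundled YOLOX package) are assumptions based on the install commands. The repository's own package is left out because its import name is not stated here.

```python
import importlib.util

# Import names assumed from the install commands; CoVLM's own package
# (installed with `pip install -e .`) is omitted because its import name is not given.
REQUIRED = [
    "torch",
    "torchvision",
    "torchaudio",
    "transformers",
    "yolox",
    "spacy",
    "en_core_web_md",  # spaCy models install as regular Python packages
]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available():
    """Return whether PyTorch sees a CUDA device, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("CUDA available:", cuda_available())
```

Using `importlib.util.find_spec` only locates each package without importing it, so the check stays fast and does not fail on the first missing dependency.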

Checkpoint

Model       | Vision encoder | LLM         | Checkpoint
CoVLM-1.4B  | ViT-L-14       | pythia-1.4b | Hugging Face
CoVLM-2.8B  | ViT-L-14       | pythia-2.8b | Hugging Face
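As a rough guide when choosing between the two checkpoints, the memory needed for the language model weights alone is the parameter count times the bytes per parameter. The sketch below uses the nominal sizes in the model names (1.4B and 2.8B) rather than exact counts. It deliberately ignores the ViT-L-14 vision encoder, activations, and any other CoVLM components, so actual usage will be higher. The helper names are hypothetical.

```python
# Rough lower bound on memory for the LLM weights only, using the nominal
# parameter counts from the model names. The vision encoder, activations,
# and framework overhead are NOT included.
NOMINAL_LLM_PARAMS = {
    "CoVLM-1.4B": 1_400_000_000,  # pythia-1.4b
    "CoVLM-2.8B": 2_800_000_000,  # pythia-2.8b
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def llm_weight_bytes(model: str, dtype: str = "fp16") -> int:
    """Parameter count multiplied by bytes per parameter for the model's LLM."""
    return NOMINAL_LLM_PARAMS[model] * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


for name in NOMINAL_LLM_PARAMS:
    for dtype in ("fp32", "fp16"):
        print(f"{name} {dtype}: {to_gib(llm_weight_bytes(name, dtype)):.1f} GiB")
```

In 16-bit precision this comes to about 2.6 GiB for the 1.4B LLM and 5.2 GiB for the 2.8B LLM, before counting anything else.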

Evaluation

Prepare evaluation datasets

RefCOCO/RefCOCOg/RefCOCOplus

bash eval_refcocog.sh CHECKPOINT

Cola

bash eval_cola.sh CHECKPOINT

ARO

bash eval_aro.sh CHECKPOINT

VQAv2

bash eval_vqav2.sh CHECKPOINT

Evaluation scripts for more tasks will be released soon.
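The four evaluation commands listed under Evaluation can be chained so that one checkpoint is evaluated on every task in sequence. The sketch below is a hypothetical wrapper, not part of the repository. It assumes it is run from the directory that contains the `eval_*.sh` scripts and that each script takes the CHECKPOINT argument shown above as its only argument. With `dry_run=True` (the default) it only prints the commands.

```python
import shlex
import subprocess

# Script names as listed in this README; the task keys are illustrative.
EVAL_SCRIPTS = {
    "refcoco": "eval_refcocog.sh",
    "cola": "eval_cola.sh",
    "aro": "eval_aro.sh",
    "vqav2": "eval_vqav2.sh",
}


def build_commands(checkpoint, tasks=None):
    """Return one `bash <script> <checkpoint>` argument list per requested task."""
    tasks = list(EVAL_SCRIPTS) if tasks is None else tasks
    return [["bash", EVAL_SCRIPTS[task], str(checkpoint)] for task in tasks]


def run_all(checkpoint, tasks=None, dry_run=True):
    """Print each command and, unless dry_run is set, run it; stop at the first failure."""
    for cmd in build_commands(checkpoint, tasks):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `run_all("/path/to/checkpoint", tasks=["cola", "aro"], dry_run=False)` would run only the Cola and ARO evaluations; the path is a placeholder. Passing an argument list instead of a shell string avoids quoting problems when the checkpoint path contains spaces.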

Citation

If you find our work useful or relevant to your research, please cite our paper:

@misc{li2023covlm,
      title={CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding}, 
      author={Junyan Li and Delin Chen and Yining Hong and Zhenfang Chen and Peihao Chen and Yikang Shen and Chuang Gan},
      year={2023},
      eprint={2311.03354},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contributors: senfu, eltociear, compositionalvlm, jiachen2cc
