theocoombes / clipcap

Using pretrained encoder and language models to generate captions from multimedia inputs.

Python 99.68% Shell 0.32%
audio-captioning encoder-decoder image-captioning language-model vision-transformer vqa

clipcap's Introduction

ClipCap

ClipCap uses pretrained encoders and language models to generate captions from multimedia inputs. This allows high-fidelity text generation that draws on the rich textual detail already learned by pretrained LMs, for tasks such as image captioning, VQA, audio captioning and more.

More details and results to come soon.
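As a rough illustration of the idea above: in the original ClipCap approach that this repository builds on, an encoder embedding is mapped to a short "prefix" of vectors in the language model's embedding space, and the LM then continues from that prefix to produce a caption. The sketch below is a toy, pure-Python version of that mapping step only. The function name `make_linear_mapper`, the sizes, and the random weights are all assumptions made for illustration; it is not this repository's code and it does not load any real encoder or LM.

```python
import random


def make_linear_mapper(embed_dim, prefix_len, lm_dim, seed=0):
    """Return a function mapping one encoder embedding to `prefix_len` LM-sized vectors.

    Hypothetical helper: the weights are random stand-ins for trained parameters.
    """
    rng = random.Random(seed)
    # One weight row per output value of the flattened prefix.
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
        for _ in range(prefix_len * lm_dim)
    ]

    def mapper(embedding):
        if len(embedding) != embed_dim:
            raise ValueError("unexpected embedding size")
        # Plain matrix-vector product: one dot product per output value.
        flat = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
        # Split the flat output into `prefix_len` vectors of size `lm_dim`.
        return [flat[i * lm_dim:(i + 1) * lm_dim] for i in range(prefix_len)]

    return mapper


# Toy sizes; real encoder and LM dimensions are much larger.
mapper = make_linear_mapper(embed_dim=8, prefix_len=4, lm_dim=6)
prefix = mapper([0.5] * 8)
print(len(prefix), len(prefix[0]))
```

In a real setup the prefix vectors would be placed in front of the caption's token embeddings before being passed to the language model; a single linear layer is used here only to keep the sketch short.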

Installation

By default, the encoders are left uninstalled for ease of access. See the data preprocessing documentation for information on how to install them.

pip install git+https://github.com/TheoCoombes/ClipCap.git
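After installing, one quick way to confirm that the package is visible to your Python environment is to look it up without importing it. This is a minimal sketch using only the standard library; the helper name `is_module_available` is made up for this example, and the module name `clipcap` is taken from the `python3 -m clipcap...` commands in this README.

```python
import importlib.util


def is_module_available(name):
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


print("clipcap importable:", is_module_available("clipcap"))
```

The same check can be reused for whichever encoder packages you install later, once you know their module names from the data preprocessing documentation.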

Supported Encoders

  • CLIP for tasks such as Image Captioning, VQA etc.
  • CLAP for tasks such as Audio Captioning, Audio Question Answering, etc.

You can run the data preprocessing script using the command below. (More info)

python3 -m clipcap.preprocess --help

You can run the training script using preprocessed data with the command below. (More info)

python3 -m clipcap.train --help
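If you want to call these two commands from a Python script instead of a shell, the sketch below shows one way to do it with the standard library. Only the `--help` flag shown above is used, since no other flags are documented here. The helper names `help_command` and `show_help` are assumptions for this example, and `sys.executable` is used in place of `python3` so the commands run in the current interpreter's environment.

```python
import subprocess
import sys


def help_command(script):
    """Build the argument list for `python -m clipcap.<script> --help`."""
    return [sys.executable, "-m", f"clipcap.{script}", "--help"]


def show_help(script):
    """Run the help command and return (exit code, captured stdout)."""
    result = subprocess.run(help_command(script), capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    for script in ("preprocess", "train"):
        code, output = show_help(script)
        print(f"clipcap.{script} exited with {code}")
        print(output)
```

A non-zero exit code most likely means the package is not installed in the interpreter that ran the script.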

Acknowledgments

This repository is heavily based on @rmokady's original implementation of ClipCap and also contains modified versions of @rom1504's clip-inference and embedding-reader libraries. Many thanks to both for their amazing work :)

TODO

Improved documentation and eval + inference scripts to come soon.

clipcap's People

Contributors

andreaskoepf, igor0, rom1504, theocoombes

clipcap's Issues

inference

Hello, thanks for your code. Could you share the package versions and the pretrained model? Thanks.

Evaluation using pre-trained model

Hello,

I love your work, really impressive stuff! I'm working on something similar and was wondering if you might have a pretrained model I could play around with for some basic tests.

Thanks!

minimal usage instruction

Install with pip install ClipClap

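Use with:

import clipclap
from PIL import Image

model = clipclap.load_pretrained()

# Caption an image (Image.open takes a local path or file object, not a URL)
text = clipclap.generate(Image.open("some/img.jpg"))

# Or caption a precomputed CLIP embedding
text = clipclap.generate(my_clip_embedding)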

(this is an example, please tune the API as you like)
