GithubHelp home page GithubHelp logo

niatzt / multilingual-clip Goto Github PK

View Code? Open in Web Editor NEW

This project forked from freddefrallan/multilingual-clip

0.0 0.0 0.0 8.48 MB

OpenAI CLIP text encoders for multiple languages!

Python 1.04% Shell 0.10% Jupyter Notebook 98.86%

multilingual-clip's Introduction


Multilingual-CLIP

OpenAI CLIP text encoders for any language

Colab Notebook · Pre-trained Models · Report Bug

Open In Colab

Overview

Alt text

OpenAI recently released the paper Learning Transferable Visual Models From Natural Language Supervision in which they present the CLIP (Contrastive Language–Image Pre-training) model. This model is trained to connect text and images, by matching their corresponding vector representations using a contrastive learning objective. CLIP consists of two separate models, a visual encoder and a text encoder. These were trained on a wooping 400 Million images and corresponding captions. OpenAI has since released a set of their smaller CLIP models, which can be found on the official CLIP Github.

We propose a fine-tuning to replace the original English text encoder with a pre-trained text model in any language. This method makes it possible to adapt the powerful CLIP model to any language in roughly 24 GPU hours.

This repository contains

  • Pytorch inference code
  • Tensorflow training code
  • Pre-trained CLIP-Text encoders for multiple languages
  • Training data and pre-computed CLIP text encodings for a large porton of the the image captions of GCC + MSCOCO + VizWiz

Requirements

While it is possible that other versions works equally fine, we have worked with the following:

  • Python = 3.6.9
  • Transformers = 4.1.1
  • Model Weights

Usage

Download CLIP Model
$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git

Replace cudatoolkit=11.0 above with the appropriate CUDA version on your machine or cpuonly when installing on a machine without a GPU. For more information please see the official CLIP repostitory.

Download Linear Weights
# Linear Model Weights
$ bash get-weights.sh

Inference

from src import multilingual_clip

print(multilingual_clip.AVAILABLE_MODELS.keys())

model = multilingual_clip.load_model('M-BERT-Distil-40')

embeddings = model(['Älgen är skogens konung!', 'Wie leben Eisbären in der Antarktis?', 'Вы знали, что все белые медведи левши?'])
print(embeddings.shape)
# Yields: torch.Size([3, 640])

For a more elaborate example, comparing the textual embeddings to the CLIP image embeddings see this colab notebook.

Pre-trained Models

Every text encoder is a Huggingface available transformer, with an additional linear layer on top. Neither of the models have been extensively tested, but for more information and qualitative test results for a specific model, click the Model Name to see its model card.

*** Make sure to update to the most recent version of the repostitory when downloading a new model, and re-run the shell script to download the Linear Weights. ***

Name Model Base Vision Model Pre-trained Languages Target Languages #Parameters
Multilingual
M-BERT Distil 40 M-BERT Distil RN50x4 101 Languages 40 Languages 66 M
M-BERT Base 69 M-BERT Base RN50x4 101 Languages 68 Languages 110 M
M-BERT Base ViT-B M-BERT Base ViT-B/32 101 Languages 68 Languages 110 M
Monolingual
Swe-CLIP 500k KB-BERT RN50x4 Swedish Swedish 110 M
Swe-CLIP 2M KB-BERT RN50x4 Swedish Swedish 110 M

Training a new model

This folder contains the code used for training the above models. If you wsh to train your own model you must do the following things:

  • Prepare a set of translated sentence pairs from English -> Your Language(s)
  • Compute regular CLIP-Text embeddings for the English sentences.
  • Edit Training.py to load your data.
  • Train a new CLIP-Text encoder via Teacher Learning

Pre-computed CLIP Embeddings & Translaton Data

This Google Drive folder contains both pre-computed CLIP-Text Embeddings for a large porton of the the image captions of GCC + MSCOCO + VizWiz.

The Google Drive folder also contains the translation data used to train the currently available models. Good Luck

Contribution

If you have trained a CLIP Text encoder specific to your language, or another model covering a language not supported here, Please feel free to contact us and we will either upload your model and credit you, or simply link to your already uploaded model.

Contact

If you have questions regarding the code or otherwise related to this Github page, please open an issue.

For other purposes, feel free to contact me directly at: [email protected]

Acknowledgements

License

Distributed under the MIT License. See LICENSE for more information.

multilingual-clip's People

Contributors

freddefrallan avatar ekgren avatar mrm8488 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.