jclip's Introduction

JCLIP

[Blog] [Paper]

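JCLIP is the Jittor version of CLIP. CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a wide variety of (image, text) pairs. Given an image, it can be instructed in natural language to predict the most relevant text snippet without being directly optimized for that task, similar to the zero-shot capabilities of GPT-2 and GPT-3.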

Network architecture

CLIP

Usage

Install dependencies

pip install jittor
pip install ftfy regex tqdm
python setup.py develop

Model weights

Download ViT-B-32, or use the conversion script below to convert the PyTorch weights to Jittor weights.

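import torch
import jittor as jt

# Load the PyTorch checkpoint and take its state dict
clip = torch.load('ViT-B-32.pt').state_dict()

# Convert every tensor to float32 on the CPU before saving
for k in clip.keys():
    clip[k] = clip[k].float().cpu()

# Save the weights to a .pkl file for Jittor
jt.save(clip, 'ViT-B-32.pkl')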

Demo

import jittor as jt
import jclip as clip
from PIL import Image

jt.flags.use_cuda = 1

model, preprocess = clip.load("ViT-B-32.pkl")

image = preprocess(Image.open("CLIP.png")).unsqueeze(0)

text = clip.tokenize(["a diagram", "a dog", "a cat"])

with jt.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).numpy()

print("Label probs:", probs)  # prints: [[0.9927937  0.00421068 0.00299572]]
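
The probabilities printed by the demo come from a softmax over image-text similarity scores. The following is a minimal pure-Python sketch of that step, assuming the standard CLIP formulation (L2-normalized embeddings, cosine similarity multiplied by a scale factor). The toy vectors, the scale of 100.0, and the function names are illustrative assumptions, not values or code taken from JCLIP.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Return one probability per text prompt for a single image.

    `scale` is an assumed temperature-like factor, not read from a checkpoint.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(scale * cosine)
    return softmax(logits)


# Toy embeddings (hypothetical): the image is closest to the first prompt.
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = zero_shot_probs(image_emb, text_embs)
print("Label probs:", probs)
```

In the real model the embeddings are produced by the image and text encoders; only the normalize, scale, and softmax steps are sketched here.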

Baseline for the 4th Jittor Competition

  • Training-free version
python baseline.py
  • Training version
python baseline_ft.py

This produces result.txt. Pack it into a zip archive and submit it.
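
The packaging step can be scripted with the Python standard library. This is a minimal sketch, assuming the result file sits in the working directory; the archive name result.zip and the helper `pack_result` are assumptions, so check the competition rules for the layout the submission site expects.

```python
import zipfile
from pathlib import Path


def pack_result(result_path="result.txt", zip_path="result.zip"):
    """Pack the result file into a zip archive and return the archive path.

    The default archive name is an assumption, not a competition requirement.
    """
    result = Path(result_path)
    if not result.is_file():
        raise FileNotFoundError(f"{result} not found; run the baseline first")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root, without its directory prefix.
        zf.write(result, arcname=result.name)
    return Path(zip_path)
```

Calling `pack_result()` after either baseline script has finished writes result.zip next to result.txt.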

jclip's People

Contributors

uyzhang

jclip's Issues

use

If Jittor was installed through the Ubuntu Docker image, how should this baseline be used?

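baseline_ft produces identical classification results for every image

After running baseline, the results are normal and the predicted classes match visual inspection. After running baseline_ft, however, the top-5 class IDs are the same for every test image, as shown in the image below.
[image]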

module 'jittor' has no attribute 'triu_'

Running demo.py raises the following error:
Traceback (most recent call last):
  File "demo.py", line 7, in <module>
    model, preprocess = clip.load("ViT-B-32.pkl")
  File "/data1/mcy/code/DailyProject/jittor-race-few-shot/baseline/JCLIP-main/jclip/clip.py", line 159, in load
    model = build_model(state_dict)
  File "/data1/mcy/code/DailyProject/jittor-race-few-shot/baseline/JCLIP-main/jclip/model.py", line 279, in build_model
    transformer_width, transformer_heads, transformer_layers)
  File "/data1/mcy/code/DailyProject/jittor-race-few-shot/baseline/JCLIP-main/jclip/model.py", line 160, in __init__
    attn_mask=self.build_attention_mask())
  File "/data1/mcy/code/DailyProject/jittor-race-few-shot/baseline/JCLIP-main/jclip/model.py", line 193, in build_attention_mask
    mask = jt.triu_(mask, 1) # zero out the lower diagonal
AttributeError: module 'jittor' has no attribute 'triu_'
Everything in the earlier Jittor framework setup ran normally.
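
The failing line builds the causal attention mask for the text transformer. Installing a Jittor version that provides `jt.triu_`, as the repository code expects, is likely the simpler fix. As an illustration of what the mask contains, here is a minimal pure-Python sketch that builds it without `triu_`, assuming the layout used by the original CLIP (negative infinity strictly above the diagonal, zero elsewhere). The helper `build_causal_mask` is hypothetical and not part of JCLIP.

```python
def build_causal_mask(context_length):
    """Return a square mask as nested lists.

    Entry [row][col] is -inf when col > row (a future token, masked out)
    and 0.0 otherwise (the current or an earlier token, visible).
    """
    neg_inf = float("-inf")
    return [
        [neg_inf if col > row else 0.0 for col in range(context_length)]
        for row in range(context_length)
    ]


mask = build_causal_mask(4)
for row in mask:
    print(row)
# Assumption: the nested list could then be wrapped as a Jittor variable
# with jt.array(mask) before being passed to the attention layers.
```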

clip.load("ViT-B-32.pkl") raises an error

Traceback (most recent call last):
  File "baseline.py", line 15, in <module>
    model, preprocess = clip.load("ViT-B-32.pkl")
  File "/root/contest/JCLIP-main/jclip/clip.py", line 159, in load
    model = build_model(state_dict)
  File "/root/contest/JCLIP-main/jclip/model.py", line 279, in build_model
    transformer_width, transformer_heads, transformer_layers)
  File "/root/contest/JCLIP-main/jclip/model.py", line 155, in __init__
    output_dim=embed_dim)
  File "/root/contest/JCLIP-main/jclip/model.py", line 99, in __init__
    self.transformer = Transformer(width, layers, heads)
  File "/root/contest/JCLIP-main/jclip/model.py", line 73, in __init__
    for _ in range(layers)
  File "/root/contest/JCLIP-main/jclip/model.py", line 73, in <listcomp>
    for _ in range(layers)
  File "/root/contest/JCLIP-main/jclip/model.py", line 47, in __init__
    self.attn = MultiheadAttention(d_model, n_head)
  File "/root/contest/JCLIP-main/jclip/mha.py", line 519, in __init__
    self.in_proj_bias = jt.empty(3 * embed_dim, **factory_kwargs)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.ops.empty)).

Types of your inputs are:
 self = module,
 args = (int, ),
 kwargs = {dtype=builtin_function_or_method, },

The function declarations are:
 VarHolder* empty(NanoVector shape, NanoString dtype=ns_float32)

Failed reason:[f 0714 20:31:21.807070 08 pyjt_jit_op_maker.cc:18171] Not a valid call.
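
The message shows `dtype` arriving as a `builtin_function_or_method`, while the declared signature expects a dtype name (`NanoString`, defaulting to float32). A version mismatch between Jittor and JCLIP is one possible cause, which this thread does not confirm. As a minimal sketch of a defensive workaround, assuming `jt.empty` accepts either a string dtype or no dtype at all as the declaration suggests, the keyword arguments could be filtered before the call. The helper `sanitize_factory_kwargs` is hypothetical and not part of JCLIP or Jittor.

```python
def sanitize_factory_kwargs(factory_kwargs):
    """Return a copy of the kwargs whose dtype, if present, is a plain string.

    Any non-string dtype (for example a function object) is dropped so the
    callee falls back to its default dtype. This is an assumed workaround,
    not a confirmed fix for the error above.
    """
    cleaned = dict(factory_kwargs)
    if not isinstance(cleaned.get("dtype"), str):
        cleaned.pop("dtype", None)
    return cleaned


# A function object stands in for the invalid dtype seen in the error message.
print(sanitize_factory_kwargs({"dtype": len}))
print(sanitize_factory_kwargs({"dtype": "float32"}))
```

With such a helper, the failing call would become `jt.empty(3 * embed_dim, **sanitize_factory_kwargs(factory_kwargs))`, again under the assumptions stated above.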
