rinnakk / japanese-clip
Japanese CLIP by rinna Co., Ltd.
Home Page: https://huggingface.co/rinna
License: Apache License 2.0
I tried to compute the CLIP loss using this code:
```python
from PIL import Image
import torch
import japanese_clip as ja_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# ja_clip.available_models()
# ['rinna/japanese-clip-vit-b-16', 'rinna/japanese-cloob-vit-b-16']
model, preprocess = ja_clip.load("rinna/japanese-clip-vit-b-16", cache_dir="/tmp/japanese_clip", device=device)
tokenizer = ja_clip.load_tokenizer()

image = preprocess(Image.open("./img/dog.jpeg")).unsqueeze(0).to(device)
encodings = ja_clip.tokenize(
    texts="象",  # "elephant"
    max_seq_len=77,
    device=device,
    tokenizer=tokenizer,  # optional; if not passed, the tokenizer is loaded each time
)

with torch.no_grad():
    res = model(input_ids=encodings['input_ids'], pixel_values=image, return_loss=True)
print("clip loss:", res.loss.item())
```
But no matter how many times I change the image, the CLIP loss is always 0.
If my usage is wrong, could you tell me how to use it correctly?
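For what it's worth, I suspect this happens because I pass a single text: assuming the model uses the standard batched contrastive (InfoNCE) loss, a batch of one image-text pair produces a 1×1 similarity matrix, and cross-entropy over a single logit is always log 1 = 0, regardless of the image. A minimal check with plain PyTorch:

```python
import torch
import torch.nn.functional as F

# With one image-text pair the similarity "matrix" is 1x1: softmax over a
# single logit gives probability 1, so the cross-entropy is exactly 0.
logits = torch.tensor([[25.3]])                # any single similarity value
labels = torch.arange(logits.size(0))          # [0]
print(F.cross_entropy(logits, labels).item())  # 0.0

# With two pairs the off-diagonal similarities matter, so the loss is
# generally non-zero.
logits2 = torch.tensor([[3.0, 1.0], [0.5, 2.0]])
print(F.cross_entropy(logits2, torch.arange(2)).item())
```

If that is the cause, passing a list of several texts (and/or several images) should make the loss non-trivial.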
requirements.txt pins sentencepiece >=0.1.91 and <=0.1.94, but there is no wheel of those sentencepiece versions for Apple-silicon macOS, so whenever I run
$ pip install git+https://github.com/rinnakk/japanese-clip.git
I get an error, as shown in the screenshot below.
However, when I downloaded the zip file, changed the sentencepiece version in requirements.txt to the latest, and installed from that folder, all the required dependencies were installed successfully.
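In case it helps others on Apple silicon, this is roughly what I did. It is only a sketch: the branch name in the zip URL and the exact form of the pin in requirements.txt are assumptions.

```shell
# Download the repo as a zip instead of installing straight from git
curl -L -o japanese-clip.zip https://github.com/rinnakk/japanese-clip/archive/refs/heads/master.zip
unzip japanese-clip.zip && cd japanese-clip-master

# Relax the sentencepiece pin so pip can pick a version that ships an
# Apple-silicon wheel (-i.bak works with both BSD and GNU sed)
sed -i.bak 's/^sentencepiece.*/sentencepiece/' requirements.txt

# Install from the local folder
pip install .
```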
Thank you very much for great work.
I would like to know how Japanese CC12M was generated. Did you translate it with your own machine translation model, or with some other public service?
Hi,
Awesome repo
Could you please share the ImageNet class names / prompts you used for zero-shot evaluation?
I'm trying to build a good evaluation framework for CLIP (https://github.com/LAION-AI/CLIP_benchmark), and having these class names in other languages would be valuable.
Thanks for your help!
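For concreteness, this is the kind of zero-shot pipeline I am trying to reproduce. The class names, the prompt template, and the random features below are placeholders, not what this repo actually uses; in practice the features would come from the model's text and image encoders.

```python
import torch
import torch.nn.functional as F

# Placeholder Japanese class names and prompt template (NOT the repo's own).
class_names = ["犬", "猫", "象"]                     # dog, cat, elephant
prompts = [f"{name}の写真" for name in class_names]  # "a photo of a {name}"

# Stand-ins for encoder outputs; in practice, encode `prompts` and the
# input image with the model, then L2-normalize.
torch.manual_seed(0)
text_features = F.normalize(torch.randn(len(prompts), 512), dim=-1)
image_features = F.normalize(torch.randn(1, 512), dim=-1)

# Zero-shot classification: pick the class whose prompt embedding has the
# highest cosine similarity with the image embedding.
probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(class_names[probs.argmax(dim=-1).item()], probs.tolist())
```

Having the actual class-name and prompt lists would let this match your reported numbers.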
First of all, great work!!
I strongly believe this model has made a big contribution to the Vision-and-Language community in Japan.
I find there is no description of how the vision encoder in CLIP/CLOOB was initialized.
Did you use some pre-trained weights available in HuggingFace, or just randomly initialize and train it from scratch?
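In case it is useful, one way I tried to probe this myself is to compare the released checkpoint's vision-tower state dict against candidate pretrained checkpoints: if the weights were copied, the shared tensors match exactly. A generic sketch with a toy model (the helper function is mine, not from this repo):

```python
import torch
import torch.nn as nn

def fraction_of_matching_tensors(sd_a, sd_b, atol=1e-6):
    """Share of identically-shaped tensors in both state dicts that are equal."""
    shared = [k for k in sd_a if k in sd_b and sd_a[k].shape == sd_b[k].shape]
    if not shared:
        return 0.0
    matches = sum(bool(torch.allclose(sd_a[k], sd_b[k], atol=atol)) for k in shared)
    return matches / len(shared)

# Toy demo: a weight-copied model matches fully; an independently
# re-initialized one matches (with overwhelming probability) not at all.
torch.manual_seed(0)
a = nn.Linear(8, 8)
b = nn.Linear(8, 8)
b.load_state_dict(a.state_dict())
c = nn.Linear(8, 8)
print(fraction_of_matching_tensors(a.state_dict(), b.state_dict()))  # 1.0
print(fraction_of_matching_tensors(a.state_dict(), c.state_dict()))
```

Still, a pointer to the actual initialization (pretrained HF weights vs. from scratch) in the README would be great.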