jina-ai / clip-as-service

๐Ÿ„ Scalable embedding, reasoning, ranking for images and sentences with CLIP

Home Page: https://clip-as-service.jina.ai

License: Other

Python 94.25% Shell 3.07% Dockerfile 2.68%
bert sentence-encoding deep-learning clip-model clip-as-service bert-as-service cross-modal-retrieval multi-modality neural-search openai

clip-as-service's Introduction

CLIP-as-service logo



CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.

⚡ Fast: Serve CLIP models with TensorRT, ONNX runtime and PyTorch w/o JIT at 800 QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks.

🫐 Elastic: Horizontally scale up and down multiple CLIP models on a single GPU, with automatic load balancing.

🐥 Easy-to-use: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.

👒 Modern: Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression (see the sketch below).

🍱 Integration: Smooth integration with the neural search ecosystem including Jina and DocArray. Build cross-modal and multi-modal solutions in no time.

[*] with default config (single replica, PyTorch no JIT) on GeForce RTX 3090.
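A minimal sketch (not part of the original README) of the protocol switching mentioned above: the client API stays the same and only the scheme in the server address changes; the secure schemes are assumed to enable TLS.

from clip_client import Client

c_grpc = Client('grpc://0.0.0.0:51000')  # gRPC
c_http = Client('http://0.0.0.0:51000')  # HTTP
c_ws = Client('ws://0.0.0.0:51000')      # WebSocket
# 'grpcs://', 'https://' and 'wss://' are the TLS-enabled counterparts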

Text & image embedding

via HTTPS 🔐:
curl \
-X POST https://<your-inference-address>-http.wolf.jina.ai/post \
-H 'Content-Type: application/json' \
-H 'Authorization: <your access token>' \
-d '{"data":[{"text": "First do it"}, 
    {"text": "then do it right"}, 
    {"text": "then do it better"}, 
    {"uri": "https://picsum.photos/200"}], 
    "execEndpoint":"/"}'
via gRPC 🔐⚡⚡:

# pip install clip-client
from clip_client import Client

c = Client(
    'grpcs://<your-inference-address>-grpc.wolf.jina.ai',
    credential={'Authorization': '<your access token>'},
)

r = c.encode(
    [
        'First do it',
        'then do it right',
        'then do it better',
        'https://picsum.photos/200',
    ]
)
print(r)

Visual reasoning

There are four basic visual reasoning skills: object recognition, object counting, color recognition, and spatial relation understanding. Let's try some:

You need to install jq (a JSON processor) to prettify the results.

via HTTPS 🔐:
curl \
-X POST https://<your-inference-address>-http.wolf.jina.ai/post \
-H 'Content-Type: application/json' \
-H 'Authorization: <your access token>' \
-d '{"data":[{"uri": "https://picsum.photos/id/1/300/300",
"matches": [{"text": "there is a woman in the photo"},
            {"text": "there is a man in the photo"}]}],
            "execEndpoint":"/rank"}' \
| jq ".data[].matches[] | (.text, .scores.clip_score.value)"

gives:

"there is a woman in the photo"
0.626907229423523
"there is a man in the photo"
0.37309277057647705
curl \
-X POST https://<your-inference-address>-http.wolf.jina.ai/post \
-H 'Content-Type: application/json' \
-H 'Authorization: <your access token>' \
-d '{"data":[{"uri": "https://picsum.photos/id/133/300/300",
"matches": [
{"text": "the blue car is on the left, the red car is on the right"},
{"text": "the blue car is on the right, the red car is on the left"},
{"text": "the blue car is on top of the red car"},
{"text": "the blue car is below the red car"}]}],
"execEndpoint":"/rank"}' \
| jq ".data[].matches[] | (.text, .scores.clip_score.value)"

gives:

"the blue car is on the left, the red car is on the right"
0.5232442617416382
"the blue car is on the right, the red car is on the left"
0.32878655195236206
"the blue car is below the red car"
0.11064132302999496
"the blue car is on top of the red car"
0.03732786327600479
curl \
-X POST https://<your-inference-address>-http.wolf.jina.ai/post \
-H 'Content-Type: application/json' \
-H 'Authorization: <your access token>' \
-d '{"data":[{"uri": "https://picsum.photos/id/102/300/300",
"matches": [{"text": "this is a photo of one berry"},
            {"text": "this is a photo of two berries"},
            {"text": "this is a photo of three berries"},
            {"text": "this is a photo of four berries"},
            {"text": "this is a photo of five berries"},
            {"text": "this is a photo of six berries"}]}],
            "execEndpoint":"/rank"}' \
| jq ".data[].matches[] | (.text, .scores.clip_score.value)"

gives:

"this is a photo of three berries"
0.48507222533226013
"this is a photo of four berries"
0.2377079576253891
"this is a photo of one berry"
0.11304923892021179
"this is a photo of five berries"
0.0731358453631401
"this is a photo of two berries"
0.05045759305357933
"this is a photo of six berries"
0.04057715833187103

Install

CLIP-as-service consists of two Python packages clip-server and clip-client that can be installed independently. Both require Python 3.7+.

Install server

PyTorch Runtime ⚡:

pip install clip-server

ONNX Runtime ⚡⚡:

pip install "clip-server[onnx]"

TensorRT Runtime ⚡⚡⚡:

pip install nvidia-pyindex
pip install "clip-server[tensorrt]"

You can also host the server on Google Colab, leveraging its free GPU/TPU.

Install client

pip install clip-client

Quick check

You can run a simple connectivity check after install.

On the server, run:

python -m clip_server

On the client, run:

from clip_client import Client

c = Client('grpc://0.0.0.0:23456')
c.profile()

You can change 0.0.0.0 to an intranet or public IP address to test connectivity over a private or public network.
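For example, a small sketch with a placeholder address: profiling a server running on another machine looks exactly the same, only the address changes.

from clip_client import Client

# 192.168.0.10 is a placeholder; substitute your server's intranet or public IP
c = Client('grpc://192.168.0.10:23456')
c.profile()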

Get Started

Basic usage

  1. Start the server: python -m clip_server. Remember its address and port.
  2. Create a client:
     from clip_client import Client
    
     c = Client('grpc://0.0.0.0:51000')
  3. To get sentence embeddings:
    r = c.encode(['First do it', 'then do it right', 'then do it better'])
    
    print(r.shape)  # [3, 512] 
  4. To get image embeddings:
    r = c.encode(['apple.png',  # local image 
                  'https://clip-as-service.jina.ai/_static/favicon.png',  # remote image
                  'data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7'])  # in image URI
    
    print(r.shape)  # [3, 512]

More comprehensive server and client user guides can be found in the docs.

Text-to-image cross-modal search in 10 lines

Let's build a text-to-image search using CLIP-as-service. Namely, a user can input a sentence and the program returns matching images. We'll use the Totally Looks Like dataset and DocArray package. Note that DocArray is included within clip-client as an upstream dependency, so you don't need to install it separately.

Load images

First we load images. You can simply pull them from Jina Cloud:

from docarray import DocumentArray

da = DocumentArray.pull('ttl-original', show_progress=True, local_cache=True)

Alternatively, you can download the images from the Totally Looks Like official website, unzip them, and load them:

from docarray import DocumentArray

da = DocumentArray.from_files(['left/*.jpg', 'right/*.jpg'])

The dataset contains 12,032 images, so it may take a while to pull. Once done, you can visualize it and get a first taste of the images:

da.plot_image_sprites()

Visualization of the image sprite of Totally looks like dataset

Encode images

Start the server with python -m clip_server. Let's say it's at 0.0.0.0:51000 with the gRPC protocol (you will get this information after running the server).

Create a Python client script:

from clip_client import Client

c = Client(server='grpc://0.0.0.0:51000')

da = c.encode(da, show_progress=True)

Depending on your GPU and client-server network, it may take a while to embed 12K images. In my case, it took about two minutes.

Download the pre-encoded dataset

If you're impatient or don't have a GPU, waiting can be Hell. In this case, you can simply pull our pre-encoded image dataset:

from docarray import DocumentArray

da = DocumentArray.pull('ttl-embedding', show_progress=True, local_cache=True)

Search via sentence

Let's build a simple prompt that lets a user type a sentence:

while True:
    vec = c.encode([input('sentence> ')])
    r = da.find(query=vec, limit=9)
    r[0].plot_image_sprites()

Showcase

Now you can input arbitrary English sentences and view the top-9 matching images. Search is fast and intuitive. Let's have some fun:

"a happy potato" "a super evil AI" "a guy enjoying his burger"

(top-9 result images for each of the three queries above)

"professor cat is very serious" "an ego engineer lives with parent" "there will be no tomorrow so lets eat unhealthy"

(top-9 result images for each of the three queries above)

Let's save the embedding result for our next example:

da.save_binary('ttl-image')

Image-to-text cross-modal search in 10 lines

We can also switch the input and output of the last program to achieve image-to-text search. Precisely, given a query image, find the sentence that best describes it.

Let's use all sentences from the book "Pride and Prejudice".

from docarray import Document, DocumentArray

d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').load_uri_to_text()
da = DocumentArray(
    Document(text=s.strip()) for s in d.text.replace('\r\n', '').split('.') if s.strip()
)

Let's look at what we got:

da.summary()
Documents Summary

  Length                 6403
  Homogenous Documents   True
  Common Attributes      ('id', 'text')

Attributes Summary

  Attribute   Data type   #Unique values   Has empty value
  id          ('str',)    6403             False
  text        ('str',)    6030             False

Encode sentences

Now encode these 6,403 sentences; it may take 10 seconds or less depending on your GPU and network:

from clip_client import Client

c = Client('grpc://0.0.0.0:51000')

r = c.encode(da, show_progress=True)

Download the pre-encoded dataset

Again, for people who are impatient or don't have a GPU, we have prepared a pre-encoded text dataset:

from docarray import DocumentArray

da = DocumentArray.pull('ttl-textual', show_progress=True, local_cache=True)

Search via image

Let's load our previously stored image embeddings, randomly sample 10 image Documents, then find the top-1 nearest neighbour of each.

from docarray import DocumentArray

img_da = DocumentArray.load_binary('ttl-image')

for d in img_da.sample(10):
    print(da.find(d.embedding, limit=1)[0].text)

Showcase

Fun time! Note, unlike the previous example, here the input is an image and the sentence is the output. All sentences come from the book "Pride and Prejudice".

(five query images; their top-1 matched sentences are listed below)

Besides, there was truth in his looks
Gardiner smiled
what's his name
By tea time, however, the dose had been enough, and Mr
You do not look well

(five more query images; their top-1 matched sentences are listed below)

"A gamester!" she cried
If you mention my name at the Bell, you will be attended to
Never mind Miss Lizzy's hair
Elizabeth will soon be the wife of Mr
I saw them the night before last

Rank image-text matches via CLIP model

Since 0.3.0, CLIP-as-service adds a new /rank endpoint that re-ranks cross-modal matches according to their joint likelihood under the CLIP model. For example, given an image Document with some predefined sentence matches as below:

from clip_client import Client
from docarray import Document

c = Client(server='grpc://0.0.0.0:51000')
r = c.rank(
    [
        Document(
            uri='.github/README-img/rerank.png',
            matches=[
                Document(text=f'a photo of a {p}')
                for p in (
                    'control room',
                    'lecture room',
                    'conference room',
                    'podium indoor',
                    'television studio',
                )
            ],
        )
    ]
)

print(r['@m', ['text', 'scores__clip_score__value']])
[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'], 
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]

One can see that a photo of a television studio is now ranked at the top with a clip_score of 0.992. In practice, one can use this endpoint to re-rank matching results from another search system and improve cross-modal search quality.
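For instance, a minimal sketch of that pattern (the candidate texts below are placeholders standing in for results returned by your existing search system):

from clip_client import Client
from docarray import Document

c = Client(server='grpc://0.0.0.0:51000')

# candidates retrieved by another system, e.g. a keyword search
candidates = ['a photo of a beach', 'a photo of a mountain', 'a photo of a city']

query = Document(
    uri='https://picsum.photos/id/1011/300/300',  # the query image
    matches=[Document(text=t) for t in candidates],
)

r = c.rank([query])
# matches now carry clip_score values and are ordered by CLIP's joint likelihood
print(r['@m', ['text', 'scores__clip_score__value']])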


Rank text-image matches via CLIP model

In the DALL·E Flow project, CLIP is called to rank the results generated by DALL·E. It has an Executor wrapped on top of clip-client, which calls .arank(), the async version of .rank():

from clip_client import Client
from jina import Executor, requests, DocumentArray


class ReRank(Executor):
    def __init__(self, clip_server: str, **kwargs):
        super().__init__(**kwargs)
        self._client = Client(server=clip_server)

    @requests(on='/')
    async def rerank(self, docs: DocumentArray, **kwargs):
        return await self._client.arank(docs)

CLIP-as-service used in DALL·E Flow

Intrigued? That's only scratching the surface of what CLIP-as-service is capable of. Read our docs to learn more.

Support

Join Us

CLIP-as-service is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

clip-as-service's People

Contributors

0xflotus, abhishekraok, astariul, bwanglzu, cbockman, dmitrykey, drndos, gabrielbianconi, hanxiao, jacobdevlin-google, jemmyshin, jhangsy, jina-bot, liam-thunder, likejazz, nickovs, numb3r3, orangesodahub, paperplanet, parsaghaffari, peterisp, roshanjossey, samsja, shan-mx, shazhou2015, shubhamgoel27, stefan-it, vsoch, yangyaofei, ziniuyu


clip-as-service's Issues

The principle.

Do you take the final representation of [CLS] as the sentence vector?

Key global_step not found in checkpoint

I cloned the repo and ran it, and it throws errors about finding the model files, like below:

usage: app.py -model_dir /tmp/chinese_L-12_H-768_A-12/ -num_worker=4
                 ARG   VALUE

      max_batch_size = 256
         max_seq_len = 25
           model_dir = /tmp/chinese_L-12_H-768_A-12/
          num_worker = 4
       pooling_layer = [-2]
    pooling_strategy = REDUCE_MEAN
                port = 5555
            port_out = 5556

I:VENTILATOR:[ser:__i: 78]:frontend-sink ipc: ipc://tmpFYdJp7/socket
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp6bk_6gdw
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb7af77dd90>) includes params argument, but params are not passed to Estimator.
I:WORKER-0:[ser:run:273]:ready and listening
self._model_dir: /tmp/tmp6bk_6gdw, checkpoint_path: None
Process BertWorker-2:
Traceback (most recent call last):
  File "/home/anaconda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/ssd1/NLP/bert-as-service/service/server.py", line 275, in run
    for r in self.estimator.predict(input_fn, yield_single_examples=False):
  File "/home/anaconda/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 491, in predict
    self._model_dir))
ValueError: Could not find trained model in model_dir: /tmp/tmp6bk_6gdw.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpkrst4jsl
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb7ac6f12f0>) includes params argument, but params are not passed to Estimator.
I:WORKER-1:[ser:run:273]:ready and listening
self._model_dir: /tmp/tmpkrst4jsl, checkpoint_path: None
Process BertWorker-3:
Traceback (most recent call last):
  File "/home/anaconda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/ssd1/NLP/bert-as-service/service/server.py", line 275, in run
    for r in self.estimator.predict(input_fn, yield_single_examples=False):
  File "/home/anaconda/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 491, in predict
    self._model_dir))
ValueError: Could not find trained model in model_dir: /tmp/tmpkrst4jsl.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpid1u669g
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb7ac6f1730>) includes params argument, but params are not passed to Estimator.
I:WORKER-2:[ser:run:273]:ready and listening
self._model_dir: /tmp/tmpid1u669g, checkpoint_path: None
Process BertWorker-4:
Traceback (most recent call last):
  File "/home/anaconda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/ssd1/NLP/bert-as-service/service/server.py", line 275, in run
    for r in self.estimator.predict(input_fn, yield_single_examples=False):
  File "/home/anaconda/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 491, in predict
    self._model_dir))
ValueError: Could not find trained model in model_dir: /tmp/tmpid1u669g.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp_s9pmmxl
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb7ac6f1b70>) includes params argument, but params are not passed to Estimator.
I:WORKER-3:[ser:run:273]:ready and listening
self._model_dir: /tmp/tmp_s9pmmxl, checkpoint_path: None

Then I set model_dir and checkpoint_path manually in predict, and it throws the exception "Key global_step not found in checkpoint".

Will other encoding blocks be released?

Since you linked your blog in the README, I read it: it was very interesting! Thanks for sharing it.


Now, the main pooling strategies are REDUCE_MEAN and REDUCE_MAX, as described in the first part of your blog.

Are you going to release other sequence encoding blocks?


If I understood correctly, it seems difficult because the other strategies are based on CNNs, which need data to train on. (Am I right?)

Undefined names: 'ident' and 'start'

flake8 testing of https://github.com/hanxiao/bert-as-service on Python 3.7.1

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./service/server.py:176:44: F821 undefined name 'ident'
                    worker.send_multipart([ident, b'', pickle.dumps(self.result)])
                                           ^
./service/server.py:178:55: F821 undefined name 'start'
                    time_used = time.perf_counter() - start
                                                      ^
./service/server.py:180:46: F821 undefined name 'ident'
                                (num_result, ident, time_used, int(num_result / time_used)))
                                             ^
./bert/tokenization.py:40:31: F821 undefined name 'unicode'
        elif isinstance(text, unicode):
                              ^
./bert/tokenization.py:63:31: F821 undefined name 'unicode'
        elif isinstance(text, unicode):
                              ^
5     F821 undefined name 'unicode'
5

TypeError: predict() got an unexpected keyword argument 'yield_single_examples'

I met a problem when running app.py, using the latest version 1.2.

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpf4r5hknt
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7f78dd079e18>) includes params argument, but params are not passed to Estimator.
I:WORKER-0:[ser:run:265]:ready and listening
Process BertWorker-2:
Traceback (most recent call last):
File "/home/work/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/work/bert-as-service-1.2/service/server.py", line 267, in run
for r in self.estimator.predict(input_fn, yield_single_examples=False):
TypeError: predict() got an unexpected keyword argument 'yield_single_examples'

publish to pypi

It would be nice to be able to depend on this library directly from PyPI.
Ideally, there would be two packages published: one for the client (without a dependency to tensorflow) and one for the server.

Why ZMQ?

Hi, what benefit do we get from building a model service using ZMQ? Thanks.

can not generate concurrent clients in a row

this doesn't work

[BertClient(show_server_config=True) for _ in range(num_concurrent_clients)]

whereas this works

[BertClient(show_server_config=False) for _ in range(num_concurrent_clients)]

some kind of deadlock in BertClient.get_server_config

client doesn't receive result

client b'8986ca10-6d73-4de6-9895-6d8beab68e11' 3 samples are done! sending back to client

you should NOT see this message multiple times! if you see it appears repeatedly, please consider moving "BertClient()" out of the loop.

Can I specify the service uses the specified GPU?

My machine has 4 GPUs. When I start the server, it runs on all 4 GPUs at the same time. But I still need a GPU to run other code, so can I make the service run on only one specified GPU?
BTW, I tried CUDA_VISIBLE_DEVICES='0' when running the server (see the sketch below), but it didn't work.
Thanks!
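For reference, a hedged sketch of what the reporter describes trying (the model path is a placeholder): pinning a single GPU via the environment variable before starting the server with one worker.

CUDA_VISIBLE_DEVICES=0 python app.py -model_dir /tmp/chinese_L-12_H-768_A-12/ -num_worker=1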

Chinese BERT

In Chinese BERT, does BERT return a vector for the whole sentence or for a single word? When I test it, no matter what I input, the cosine similarity between the results is always above 0.8. I hope you can reply.

zmq.error.ZMQError: Address already in use

/home/inplus-dm/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
usage: app.py -model_dir ./BERT_BASE_DIR/english_L-12_H-768_A-12/ -num_worker=4
                 ARG   VALUE
________________________________________________
 gpu_memory_fraction = 0.5
      max_batch_size = 256
         max_seq_len = 25
           model_dir = ./BERT_BASE_DIR/english_L-12_H-768_A-12/
          num_worker = 4
       pooling_layer = [-2]
    pooling_strategy = REDUCE_MEAN
                port = 5555
            port_out = 5556

Traceback (most recent call last):
  File "app.py", line 44, in <module>
    server = BertServer(args)
  File "/home/inplus-dm/gaoy/bert-as-service/service/server.py", line 62, in __init__
    self.frontend.bind('tcp://*:%d' % self.port)
  File "zmq/backend/cython/socket.pyx", line 547, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
-------------------------splitting line------------------------------
I'm new to ZMQ; when I followed the steps, I got this exception. It doesn't work even after retrying.

Question about the input, Thank you!

The original input for BERT is the concatenation of two sentences, but we only have one sentence here.
So, do you think the final 768 dim vector is good enough?
Thank you very much! @hanxiao

how to get word hidden states?

In QA tasks, for example, the hidden states of each word in the passage are used to predict the answer, but the service only returns the last hidden states of a sequence. Could you support word vectors?

run on cpu machine

Thumbs up for your smart work! I want to run this service on my CPU-only computer; what should I do?

KeyError on server side

Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/epbot/pipeline/bert-as-service/service/server.py", line 168, in run
self.client_checksum[client_id]))
KeyError: b'edaf95b1-7d24-4012-bdb7-881b0fdf654c'

After this, no more requests are accepted by the server. I'm not able to debug this; can anyone please help?

Does this error come from insufficient resources, i.e. CPU cores (my system's CPU cores are busy with other jobs)?

character embedding or word embedding?

Hello, thanks for your service, it is very useful. I notice that the word embedding is obtained for the character 'h' rather than the word 'hey', as follows. It seems like it doesn't match the BERT tokenizer.

bc = BertClient()
x = ['hey you', 'whats up']

bc.encode(x)         # [2, 25, 768]
bc.encode(x)[0]      # [1, 25, 768], word embeddings for 'hey you'
bc.encode(x)[0][0]   # [1, 1, 768], word embedding for [CLS]
bc.encode(x)[0][1]   # [1, 1, 768], word embedding for 'h'
bc.encode(x)[0][8]   # [1, 1, 768], word embedding for [SEP]
bc.encode(x)[0][9]   # [1, 1, 768], word embedding for 0_PAD, meaningless
bc.encode(x)[0][25]  # error, out of index!

Client usage problem

Hello,
After pulling this service, I tried the code locally:
from service.client import BertClient
bc = BertClient()
bc.encode(['First do it', 'then do it right', 'then do it better'])
The output shows: you should NOT see this message multiple times! if you see it appears repeatedly, consider moving "BertClient()" out of the loop.
Then the program just keeps running forever.
What could be the reason for this?

Classifier Predictions

Thanks for this service, it works like a charm and better than Google's!
How difficult would it be to support classifier predictions for the classification task here, from a pre-trained model?

ImportError: No module named 'tensorflow.python.platform'

Traceback (most recent call last):
  File "app.py", line 8, in <module>
    from service.server import BertServer
  File "/home/123/bert-as-service/service/server.py", line 12, in <module>
    import tensorflow as tf
  File "/home/123/.local/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/123/.local/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/123/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 25, in <module>
    from tensorflow.python.platform import self_check
ImportError: No module named 'tensorflow.python.platform'

Thanks

List index error

I encounter a list index error, which I believe is a bug.
In extract_features.py line 94, you passed a list to model.all_encoder_layers, which is also a list.

May I ask for a verification test?

Hi,
I successfully deployed the server, however it only utilized around 700MB of GPU memory, which makes me doubt whether something has gone wrong. (In https://github.com/google-research/bert, 12GB of GPU memory is suggested as a minimum, and no requirement is given in your README.md.)
I tried comparing the results generated on GPU and CPU and they are nearly the same.
Could you offer test code to check whether the server is giving the correct result vectors, if you have time (using default parameters and the standard model is OK; even better if you can offer the Chinese model's)? Verifying the head of a word's result vector would do as the simplest case. It would really help!

Environment I'm using:
Tesla K80
tensorflow 1.10.0
GPUtil 1.3.0
pyzmq 17.1.2

Thank you very much!

Support for multi-language version BERT

Hi,
nice idea and nice repo!
My question is whether this server application can also accept a multilingual BERT model as input, instead of the English model.

I tried this command, but an error occurred.

>>> python app.py -num_worker=4 -model_dir ../multilingual_L-12_H-768_A-12/

parameters: 
batch_size_per_worker = 256
         max_seq_len = 25
           model_dir = ../multilingual_L-12_H-768_A-12/
          num_worker = 4
                port = 5555
Traceback (most recent call last):
  File "app.py", line 32, in <module>
    server = BertServer(args)
  File "/src/text/BERT/bert-as-service/service/server.py", line 27, in __init__
    super().__init__()
TypeError: super() takes at least 1 argument (0 given)

Thanks

Service not using GPU

I am trying to host the service from the server with the BERT model "multilingual_L-12_H-768_A-12"
And it's not using GPU resources. I have Tesla M60 (8GB) x4. However, I am seeing this message:

/home/maybe/anaconda3/envs/asr/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
usage:
app.py -num_worker=4 -model_dir ../model/multilingual_L-12_H-768_A-12/
parameters: 
      max_batch_size = 256
         max_seq_len = 25
           model_dir = ../model/multilingual_L-12_H-768_A-12/
          num_worker = 4
                port = 5555
W:[server.py:85]:only 0 GPU(s) is available, but ask for 4

I have the tensorflow-gpu version installed; it works perfectly fine.

Command line to host the pre-trained BERT model,

python app.py -num_worker=4 -model_dir ../model/multilingual_L-12_H-768_A-12/

Another issue: I am trying to get the sentence embedding from the model via the client and it just hangs forever.

>>> from service.client import BertClient
>>> 
>>> ec = BertClient()
    

How to obtain the word embeddings

Thanks for this service, I can get the sentence embeddings. I want to know how can I get the context representations for every token in the sentence through the service. Thank you very much.

CUDA_ERROR_NOT_INITIALIZED error

When I run it on my machine, I encounter the error below. How can I fix it?

(/job:localhost/replica:0/task:0/device:GPU:0 with 10750 MB memory) -> physical GPU (device: 0, name: Tesla K40m, pci bus id: 0000:03:00.0, compute capability: 3.5)
2018-11-23 21:22:53.539204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10750 MB memory) -> physical GPU (device: 1, name: Tesla K40m, pci bus id: 0000:04:00.0, compute capability: 3.5)
2018-11-23 21:22:53.539319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10750 MB memory) -> physical GPU (device: 2, name: Tesla K40m, pci bus id: 0000:83:00.0, compute capability: 3.5)
2018-11-23 21:22:53.539443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10750 MB memory) -> physical GPU (device: 3, name: Tesla K40m, pci bus id: 0000:84:00.0, compute capability: 3.5)
Using TensorFlow backend.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp48khpdz2
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7f965c0398c8>) includes params argument, but params are not passed to Estimator.
I:WORKER-0:[ser:run:230]:ready and listening
2018-11-23 21:23:05.496519: E tensorflow/stream_executor/cuda/cuda_driver.cc:1201] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error

Restoring model issue

I'm wondering how you solved the problem of restoring the model every time estimator.predict is called.

ZMQError: Operation cannot be accomplished in current state

When running the client side, I encounter this error:
zmq.error.ZMQError: Operation cannot be accomplished in current state

Any idea how to solve this? Thanks!


On the server side, everything seems fine:

I:WORKER-0:[ser:run:234]:ready and listening
I:WORKER-0:[ser:gen:253]:received 64 from b'6bbd50cb-b7e1-46b0-b14f-f3e0511c85aa'
I:WORKER-0:[ser:run:242]:job b'6bbd50cb-b7e1-46b0-b14f-f3e0511c85aa' samples: 64 done: 10.66s
I:SINK:[ser:run:175]:received 64 of client b'6bbd50cb-b7e1-46b0-b14f-f3e0511c85aa' (64/64)
I:SINK:[ser:run:183]:client b'6bbd50cb-b7e1-46b0-b14f-f3e0511c85aa' 64 samples are done! sending back to client

Full stack trace:

File "train.py", line 175, in bert_embed
    embeddings = bert_client.encode(sentences)
  File "/home/remondn/workspace/Siamese_BERT/resources/BERT_Service/service/client.py", line 51, in encode
    self.socket.send_pyobj(texts)
  File "/home/remondn/.local/lib/python3.5/site-packages/zmq/sugar/socket.py", line 603, in send_pyobj
    return self.send(msg, flags=flags, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/zmq/sugar/socket.py", line 392, in send
    return super(Socket, self).send(data, flags=flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 725, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 772, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 247, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/socket.pyx", line 242, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Operation cannot be accomplished in current state

Sentences pair classification tasks

I can use bert-as-service to encode each sentence one by one.
Is it possible to use it to encode a pair of sentences, as described in the official paper?


I want to do:
bc.encode(['First do it ||| then do it right'])

So I can have one single vector for these 2 sentences:

[CLS] First do it [SEP] then do it right [SEP]

Dependency between sentences embeddings within request

I run this code:

bc = BertClient()
a = bc.encode(['hey you', 'hey you'])
b = bc.encode(['hey you'])
c = bc.encode(['hey you'])

If I compare b and c, they are the same:

print((b == c).all())

True

This is expected behavior


But why are a[0] and a[1] not the same?

print((a[0] == a[1]).all())

False

I would expect them to have the same embeddings.

Any ideas about sentence similarity in Chinese language?

Hi. This project is wonderful. But when I try it for sentence similarity in Chinese, the results are bad.

Here was my process:
I used the default parameters and loaded the Chinese BERT model (chinese_L-12_H-768_A-12), passed Chinese sentences (or words split by spaces) to bert-as-service, got ndarrays back from the service, and finally calculated cos() between them (see the sketch below). But the results weren't good.

Is there any suggestion for me? Where am I wrong? Should I pass the original sentences or do some preprocessing, like splitting, etc.?
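A minimal sketch of the process described above (the sentences are placeholders; cosine similarity is computed with numpy on the returned vectors):

import numpy as np
from service.client import BertClient

bc = BertClient()
vecs = bc.encode(['placeholder sentence one', 'placeholder sentence two'])

# cosine similarity between the two sentence vectors
cos = np.dot(vecs[0], vecs[1]) / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))
print(cos)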

Finetuning Example

Hi,
I am trying your code example (example5.py) to fine-tune on my own mood dataset for a text classification task. I have already adapted the code and training starts correctly.
After 5000 steps the loss is decreasing, but validation accuracy is poor and constant at 42%.

My suspicion is that the fine-tuning modifies the weights of the prediction task but not the input embeddings. When I used ELMo embeddings, for example, I set the input embeddings to be trainable and the classification results were much better than this.

Any suggestion about that?

Thanks

FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

Hi,
I am trying to use this repo with the command below, but I have no GPUs locally.

>>> python3.6 app.py -num_worker=4 -model_dir ../multilingual_L-12_H-768_A-12/


usage:
app.py -num_worker=4 -model_dir ../multilingual_L-12_H-768_A-12/
parameters: 
batch_size_per_worker = 256
         max_seq_len = 25
           model_dir = ../multilingual_L-12_H-768_A-12/
          num_worker = 4
                port = 5555
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/src/text/BERT/bert-as-service/service/server.py", line 67, in run
    available_gpus = GPUtil.getAvailable(limit=self.num_worker)
  File "/usr/local/lib/python3.6/dist-packages/GPUtil/GPUtil.py", line 123, in getAvailable
    GPUs = getGPUs()
  File "/usr/local/lib/python3.6/dist-packages/GPUtil/GPUtil.py", line 64, in getGPUs
    p = Popen(["nvidia-smi","--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode", "--format=csv,noheader,nounits"], stdout=PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

Is it possible to test this without having GPUs?

Thanks in advance

Problem with CPU server

Hi,

after the last commits I am no longer able to use the BERT CPU server. I launch these commands to initialize the server:

PATH_MODEL=`pwd`/cased_L_12_H_768_A_12/
docker build -t bert-as-service -f ./docker/Dockerfile_cpu .
docker run --runtime nvidia -it -p 5555:5555 -v $PATH_MODEL:/model -t bert-as-service

When I try to send a request from a different machine in the LAN:

from service.client import BertClient
bc = BertClient(ip='192.xxxx.xx.xx') 

this message comes out:

you should NOT see this message multiple times! if you see it appears repeatedly, consider moving "BertClient()" out of the loop.

and I am not able to encode my input sentence because the server does not seem to receive the request, even though it is listening.

P.S. I have already pulled the latest commit for both server and client, and I am able to ping my LAN server.

P.S. 2: If it helps, with commit b05a985dd6f36016090371e7751fc96f328a64c7 I am able to run the previous commands.

Thanks

On Windows, I got zmq.error.ZMQError: Protocol not supported when executing 'python app.py'

Your project is awesome. But I'm not sure if it will work on the Windows 10 platform. I just cloned your project and downloaded the BERT pre-trained model. The moment I run python app.py -model_dir F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/ -num_worker=4, I got an error:

λ python app.py -model_dir F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/ -num_worker=4
usage: app.py -model_dir F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/ -num_worker=4
                 ARG   VALUE
__________________________________________________
      max_batch_size = 256
         max_seq_len = 25
           model_dir = F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/
          num_worker = 4
       pooling_layer = -2
    pooling_strategy = REDUCE_MEAN
                port = 5555

Exception in thread Thread-1:
Traceback (most recent call last):
  File "D:\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "F:\Work\Github\bert-as-service\service\server.py", line 72, in run
    self.backend.bind('ipc://*')
  File "zmq/backend/cython/socket.pyx", line 495, in zmq.backend.cython.socket.Socket.bind (zmq\backend\cython\socket.c:5653)
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc (zmq\backend\cython\socket.c:10014)
zmq.error.ZMQError: Protocol not supported

I have no idea what this zmq is. I googled, and it seems that 'ipc' is not supported on Windows and we should use 'tcp' instead. I tried to just change 'ipc' to 'tcp' on line 72, but still got a similar error:

λ python app.py -model_dir F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/ -num_worker=4
usage: app.py -model_dir F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/ -num_worker=4
                 ARG   VALUE
__________________________________________________
      max_batch_size = 256
         max_seq_len = 25
           model_dir = F:\data\chinese_L-12_H-768_A-12\chinese_L-12_H-768_A-12/
          num_worker = 4
       pooling_layer = -2
    pooling_strategy = REDUCE_MEAN
                port = 5555

Exception in thread Thread-1:
Traceback (most recent call last):
  File "D:\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "F:\Work\Github\bert-as-service\service\server.py", line 72, in run
    self.backend.bind('tcp://*')
  File "zmq/backend/cython/socket.pyx", line 495, in zmq.backend.cython.socket.Socket.bind (zmq\backend\cython\socket.c:5653)
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc (zmq\backend\cython\socket.c:10014)
zmq.error.ZMQError: Invalid argument

Any idea on how to correct this?

Lower layer giving better results

I'm using a custom model that takes bert-as-service feature vectors as input.
I'm solving the problem of sentence textual similarity. I'm using the SICK dataset, but the STS-B dataset (from GLUE) is similar and could be used as well.

I tried to use the default layer, -2, and got a score of ~75%.

I tried using a concatenation of the last layers (as described in the paper), i.e. -1 -2 -3 -4, but the score didn't improve (it actually slightly decreased).

I finally tried a low layer, -11, and got a score of ~80%.


Why would a lower layer give a better score?
I don't understand...

ImportError: cannot import name 'autograph'

I have updated TF to 1.12.0 and pyzmq to 17.1.0, but the statement 'from tensorflow.contrib import autograph' raises an error. The Python version is 3.6. I do not know why.

How can I get the Word Embedding?

When I follow the steps:

bc = BertClient()
x = ['hey you', 'whats up?']

bc.encode(x)  # [2, 25, 768]

I got a vector of shape [2, 768].
So how can I get the word embeddings?

TypeError: not all arguments converted

I think line 234 in 'service/server.py' should be
self.logger.info(' %d is ready and listening' % self.worker_id)

The current code will cause a type error by missing its string formatting symbol.


[Clarification] Size of sentence vectors

From README.md :

Each sentence is translated to a 768-dimensional vector. One exception is REDUCE_MEAN_MAX pooling strategy, which translates a sentence into a 1536-dimensional vector.

Why doesn't the sentence vector's size change with the number of layers chosen?


From README.md :

pooling_layer : the encoding layer that pooling operates on, where -1 means the last layer, -2 means the second-to-last, etc.

If -pooling_layer=-4, I expected to have 4 vectors of size 768 concatenated into 1 vector of size 4 * 768 = 3072, because the BERT paper says:

The best performing method is to concatenate the token representations from the top four hidden layers of the pre-trained Transformer

ValueError: Could not find trained model in model_dir: /tmp/tmp_st5oe05

Has the service been started up correctly? Why is it using a temporary folder when I have already specified a model_dir in the params?

WARNINGs are shown as follows:

usage: app.py -model_dir /tmp/bert/chinese_L-12_H-768_A-12 -num_worker=1
                 ARG   VALUE
__________________________________________________
      max_batch_size = 256
         max_seq_len = 25
           model_dir = /tmp/bert/chinese_L-12_H-768_A-12
          num_worker = 1
       pooling_layer = -2
    pooling_strategy = REDUCE_MEAN
                port = 5555

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp_st5oe05
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7f80e7184598>) includes params argument, but params are not passed to Estimator.
I:WORKER-2:[ser:run:227]:ready and listening
Process BertWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/xxx/workspace/github/bert-as-service/service/server.py", line 229, in run
    for r in self.estimator.predict(input_fn, yield_single_examples=False):
  File "/home/xxx/pyenv/ternary/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 488, in predict
    self._model_dir))
ValueError: Could not find trained model in model_dir: /tmp/tmp_st5oe05.
