thepfarrer / jina

This project forked from jina-ai/jina

An easier way to build neural search in the cloud

Home Page: https://docs.jina.ai

License: Apache License 2.0

Cloud-Native Neural Search Framework for Any Kind of Data

Supported Python versions: 3.7, 3.8, 3.9

Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.

โฑ๏ธ Save time - The design pattern of neural search systems. Native support on PyTorch/Keras/ONNX/Paddle, solution building in just minutes.

๐ŸŒŒ All data types - Processing, indexing, querying, understanding of video, image, long/short text, music, source code, PDF, etc.

๐ŸŒฉ๏ธ Local & cloud friendly - Distributed architecture, scalable & cloud-native from day one. Same developer experience on both local and cloud.

๐Ÿฑ Own your stack - Keep end-to-end stack ownership of your solution. Avoid integration pitfalls you get with fragmented, multi-vendor, generic legacy tools.

Install

pip install -U jina

More install options, including Conda, Docker, and Windows, can be found in the documentation.

Get Started

We promise that you can build a scalable ResNet-powered image search service in 20 minutes or less, from scratch. If not, you can forget about Jina.

Basic Concepts

Document, Executor, and Flow are three fundamental concepts in Jina.

  • Document is the basic data type in Jina;
  • Executor is how Jina processes Documents;
  • Flow is how Jina streamlines and distributes Executors.
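
To make the relationship between the three concepts concrete, here is a toy sketch in plain Python. It does not use Jina and is not how Jina implements them: ToyDocument, ToyExecutor, and ToyFlow are hypothetical stand-ins that only mirror the roles described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: a piece of data plus derived fields."""
    uri: str
    tags: dict = field(default_factory=dict)


# Hypothetical stand-in for an Executor: any callable that processes Documents in place.
ToyExecutor = Callable[[List[ToyDocument]], None]


class ToyFlow:
    """Hypothetical stand-in for a Flow: chains Executors and runs them in order."""

    def __init__(self) -> None:
        self._executors: List[ToyExecutor] = []

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self._executors.append(executor)
        return self  # returning self allows chained .add() calls

    def post(self, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self._executors:
            executor(docs)
        return docs


def mark_loaded(docs: List[ToyDocument]) -> None:
    for d in docs:
        d.tags["loaded"] = True


def mark_embedded(docs: List[ToyDocument]) -> None:
    # Depends on the previous step, so the order of .add() calls matters.
    for d in docs:
        d.tags["embedded"] = d.tags.get("loaded", False)


flow = ToyFlow().add(mark_loaded).add(mark_embedded)
result = flow.post([ToyDocument(uri="img/00021.jpg")])
print(result[0].tags)
```

The real Flow additionally distributes Executors across processes or containers; this sketch only shows the ordering and data hand-off.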

Leveraging these three components, let's build an app that finds similar images using ResNet50.

ResNet50 Image Search in 20 Lines

💡 Preliminaries: download the dataset, install PyTorch & Torchvision

from jina import DocumentArray, Document

def preproc(d: Document):
    return (d.load_uri_to_image_blob()  # load
             .set_image_blob_normalization()  # normalize color 
             .set_image_blob_channel_axis(-1, 0))  # switch color axis
docs = DocumentArray.from_files('img/*.jpg').apply(preproc)

import torchvision
model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cuda')  # embed via GPU to speedup

q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))
q.embed(model)  # embed
q.match(docs)  # find top-20 nearest neighbours, done!

Done! Now print q.matches and you will see the URIs of the most similar images.

[Figure: printing q.matches shows visually similar images found by Jina using ResNet50]
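
Conceptually, q.match(docs) ranks the indexed embeddings by their distance to the query embedding. The sketch below shows that idea in plain Python on made-up 3-dimensional vectors. It assumes cosine distance (the client example later in this README reads scores["cosine"]); cosine_distance, top_k, and the sample data are hypothetical, and this is not Jina's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 20,
) -> List[Tuple[str, float]]:
    """Brute-force search: score every indexed vector, keep the k closest."""
    scored = [(uri, cosine_distance(query, emb)) for uri, emb in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Made-up embeddings standing in for ResNet50 outputs.
index = {
    "img/a.jpg": [1.0, 0.0, 0.0],
    "img/b.jpg": [0.9, 0.1, 0.0],
    "img/c.jpg": [0.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.0, 0.0], index, k=2)
for uri, dist in matches:
    print(f"{dist:.4f} {uri}")
```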

Add 3 lines of code to visualize them:

for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()
q.matches.plot_image_sprites()

[Figure: visually similar images found by Jina using ResNet50, plotted as image sprites]
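
The visualization step undoes the two preprocessing transforms in reverse order: it moves the channel axis back to the end, then inverts the color normalization. The sketch below illustrates that round trip on a tiny image stored as nested lists. The mean/std values are the widely used ImageNet statistics and the division by 255 is an assumption; Jina's actual defaults may differ, and all function names here are hypothetical.

```python
from typing import List

# Assumed per-channel statistics (commonly used ImageNet mean/std).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

Image = List[List[List[float]]]  # nested lists standing in for a 3-D array


def normalize(img_hwc: Image) -> Image:
    """Scale 0-255 values to 0-1, then standardize each color channel."""
    return [[[(px[ch] / 255.0 - MEAN[ch]) / STD[ch] for ch in range(3)]
             for px in row] for row in img_hwc]


def inv_normalize(img_hwc: Image) -> Image:
    """Exact inverse of normalize()."""
    return [[[(px[ch] * STD[ch] + MEAN[ch]) * 255.0 for ch in range(3)]
             for px in row] for row in img_hwc]


def hwc_to_chw(img: Image) -> Image:
    """Move the channel axis from last to first: (H, W, C) -> (C, H, W)."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]


def chw_to_hwc(img: Image) -> Image:
    """Move the channel axis from first to last: (C, H, W) -> (H, W, C)."""
    c, h, w = len(img), len(img[0]), len(img[0][0])
    return [[[img[ch][y][x] for ch in range(c)] for x in range(w)] for y in range(h)]


pixels = [[[255.0, 128.0, 0.0], [10.0, 20.0, 30.0]]]  # 1 x 2 image, 3 channels
model_input = hwc_to_chw(normalize(pixels))            # what the model would see
restored = inv_normalize(chw_to_hwc(model_input))      # what gets plotted
print(len(model_input), len(model_input[0]), len(model_input[0][0]))
```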

Sweet! FYI, you can also use Keras, ONNX, or PaddlePaddle for the embedding model; Jina supports them as well.

As-a-Service in 10 Extra Lines

With a trivial refactoring and 10 extra lines of code, you can turn the local script into a ready-to-serve service:

  1. Import what we need.

    from jina import Document, DocumentArray, Executor, Flow, requests
  2. Copy-paste the preprocessing step and wrap it via Executor:

    class PreprocImg(Executor):
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            for d in docs:
                (d.load_uri_to_image_blob()  # load
                 .set_image_blob_normalization()  # normalize color
                 .set_image_blob_channel_axis(-1, 0))  # switch color axis
  3. Copy-paste the embedding step and wrap it via Executor:

    class EmbedImg(Executor):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            import torchvision
            self.model = torchvision.models.resnet50(pretrained=True)        
    
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            docs.embed(self.model)
  4. Wrap the matching step into Executor:

    class MatchImg(Executor):
        _da = DocumentArray()
    
        @requests(on='/index')
        def index(self, docs: DocumentArray, **kwargs):
            self._da.extend(docs)
    
        @requests(on='/search')
        def foo(self, docs: DocumentArray, **kwargs):
            docs.match(self._da)
            for d in docs.traverse_flat('r,m'):  # only required for visualization
                d.convert_uri_to_datauri()  # convert to datauri
                d.pop('embedding', 'blob')  # remove unnecessary fields to save bandwidth
  5. Connect all Executors in a Flow and scale the embedding Executor to 3 replicas:

    f = Flow(port_expose=12345, protocol='http').add(uses=PreprocImg).add(uses=EmbedImg, replicas=3).add(uses=MatchImg)

    You can plot the Flow via f.plot('flow.svg').

  6. Index the image data and serve REST queries publicly:

    with f:
        f.post('/index', DocumentArray.from_files('img/*.jpg'), show_progress=True, request_size=8)
        f.block()
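
The MatchImg Executor above binds different methods to the /index and /search endpoints. As a rough mental model only, the toy sketch below shows how a decorator can tag methods with an endpoint and how a dispatcher can pick the right one. toy_requests, ToyMatcher, and dispatch are hypothetical names; this is not Jina's implementation, and the pass-through for unknown endpoints is a choice made for this sketch.

```python
from typing import Callable, Dict, List, Optional


def toy_requests(on: Optional[str] = None) -> Callable:
    """Hypothetical decorator: tag a method with the endpoint it serves."""
    def decorator(func: Callable) -> Callable:
        func._endpoint = on if on is not None else "*"  # "*" = any endpoint
        return func
    return decorator


class ToyMatcher:
    def __init__(self) -> None:
        self.indexed: List[str] = []

    @toy_requests(on="/index")
    def index(self, docs: List[str]) -> List[str]:
        self.indexed.extend(docs)
        return docs

    @toy_requests(on="/search")
    def search(self, docs: List[str]) -> list:
        # Pair every query with everything indexed so far (no real ranking).
        return [(q, list(self.indexed)) for q in docs]


def dispatch(executor: object, endpoint: str, docs: list) -> list:
    """Call the method tagged with `endpoint`; pass docs through if none matches."""
    handlers: Dict[str, Callable] = {}
    for name in dir(executor):
        attr = getattr(executor, name)
        tagged = getattr(attr, "_endpoint", None)
        if tagged is not None:
            handlers[tagged] = attr
    handler = handlers.get(endpoint) or handlers.get("*")
    return handler(docs) if handler else docs


matcher = ToyMatcher()
dispatch(matcher, "/index", ["img/a.jpg", "img/b.jpg"])
hits = dispatch(matcher, "/search", ["img/q.jpg"])
untouched = dispatch(matcher, "/unknown", ["img/x.jpg"])
print(hits)
```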

Done! You can now query it via curl to get the most similar images.

[Figure: using curl to query the image search service built with Jina and ResNet50]
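
The same kind of HTTP query can be sketched with Python's standard library. The JSON payload shape used here (a "data" list of Documents, each identified by its uri) is an assumption about the gateway's request schema, and build_search_request is a hypothetical helper; check the Swagger UI of your running Flow for the actual schema.

```python
import json
import urllib.request


def build_search_request(
    uri: str, host: str = "127.0.0.1", port: int = 12345
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /search endpoint."""
    # Assumed payload shape: a "data" list of Documents given by their URI.
    payload = json.dumps({"data": [{"uri": uri}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("img/00021.jpg")
print(req.get_method(), req.full_url)

# Sending only works while the Flow from the previous step is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```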

Or go to http://0.0.0.0:12345/docs and test requests via Swagger UI.

Or use a Python client to access the service:

from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when the task is done
    for idx, d in enumerate(resp.docs[0].matches):  # print every match of the first query
        print(f'[{idx}]{d.scores["cosine"].value:.2f}: "{d.uri}"')  # score with 2 decimals

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)

At this point you have probably spent about 15 minutes, and here we are with an image search service that has rich features:

  • ✅ Solution as microservices
  • ✅ Scale in/out any component
  • ✅ Query via HTTP/WebSocket/gRPC/Client
  • ✅ Distribute/Dockerize components
  • ✅ Async/non-blocking I/O
  • ✅ Extendable REST interface

Deploy to Kubernetes in 7 Minutes

Have another 7 minutes? We can show you how to bring your service to the next level by deploying it to Kubernetes.

  1. Create a Kubernetes cluster and get credentials (this example uses GCP; other K8s providers are also supported):
    gcloud container clusters create test --machine-type e2-highmem-2  --num-nodes 1 --zone europe-west3-a
    gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase
  2. Move each Executor class to a separate folder with one Python file:
    • PreprocImg -> 📁 preproc_img/exec.py
    • EmbedImg -> 📁 embed_img/exec.py
    • MatchImg -> 📁 match_img/exec.py
  3. Push all Executors to Jina Hub:
    jina hub push preproc_img
    jina hub push embed_img
    jina hub push match_img
    You will get three Hub Executors that can be used as Docker containers.
  4. Adjust the Flow slightly and open it:
    f = Flow(name='readme-flow', port_expose=12345, infrastructure='k8s').add(uses='jinahub+docker://PreprocImg').add(uses='jinahub+docker://EmbedImg', replicas=3).add(uses='jinahub+docker://MatchImg')
    with f:
        f.block()
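
The folder layout from step 2 can be scaffolded with a few lines of Python. This is only a convenience sketch: scaffold and LAYOUT are hypothetical names, the files are created as placeholders in a temporary directory, and any additional files that pushing to the Hub may require are not covered here.

```python
import tempfile
from pathlib import Path
from typing import List

# Mapping taken from step 2: Executor class name -> folder holding its exec.py.
LAYOUT = {
    "PreprocImg": "preproc_img",
    "EmbedImg": "embed_img",
    "MatchImg": "match_img",
}


def scaffold(root: Path) -> List[Path]:
    """Create <folder>/exec.py for each Executor without overwriting existing files."""
    created = []
    for class_name, folder in LAYOUT.items():
        target = root / folder / "exec.py"
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():
            target.write_text(f"# TODO: paste the {class_name} class here\n")
        created.append(target)
    return created


with tempfile.TemporaryDirectory() as tmp:
    paths = scaffold(Path(tmp))
    relative = sorted(p.relative_to(tmp).as_posix() for p in paths)
    print(relative)
```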

Intrigued? Find out more about Jina in our docs.

Join Us

Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

Contributors

hanxiao, jina-bot, joanfm, nan-wang, deepankarm, alexcg1, bwanglzu, cristianmtr, maximilianwerk, florian-hoenicke, catstark, davidbp, fhaase2, yongxuanzhang, numb3r3, jacobowitz, alaeddine-13, mapleeit, rutujasurve94, bhavsarpratik, anish2197, yueliu1415926, shivam-raj, bingho1013, allcontributors[bot], kelton8z, antonkurenkov, fionnd, winstonww, slettner
