
zhutony / jina


This project is forked from jina-ai/jina


An easier way to build neural search on the cloud

Home Page: https://docs2.jina.ai

License: Apache License 2.0

Python 94.07% HTML 2.50% Shell 0.95% Dockerfile 0.64% JavaScript 0.48% CSS 1.07% EJS 0.28%

jina's Introduction


Cloud-Native Neural Search Framework for Any Kind of Data


Jina allows you to build deep learning-powered search-as-a-service in just minutes.

🌌 Universal data type - Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.

🌩️ Fast & cloud-native - Distributed architecture from day one. Scalable and cloud-native by design: containerizing, distributing, sharding, async, and REST/gRPC/WebSocket are built in.

⏱️ Save time - Follow the design pattern of neural search systems to go from zero to a production-ready system in minutes.

🍱 Own your stack - Keep end-to-end ownership of your solution's stack and avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.

Installation

Jina 2.0 is still in pre-release; add --pre to install it.

$ pip install --pre jina
$ jina -v
2.0.0rcN

via Docker

$ docker run jinaai/jina:master -v
2.0.0rcN
📦 More installation options

x86/64,arm64,v6,v7,Apple M1 | On Linux/macOS & Python 3.7/3.8/3.9 | Docker Users
Standard                    | pip install --pre jina              | docker run jinaai/jina:master
Daemon                      | pip install --pre "jina[daemon]"    | docker run --network=host jinaai/jina:master-daemon
With Extras                 | pip install --pre "jina[devel]"     | docker run jinaai/jina:master-devel

Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.

Get Started

Document, Executor, and Flow are the three fundamental concepts in Jina.

Copy-paste the minimum example below and run it:

💡 Preliminaries: character embedding, pooling, Euclidean distance

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable one
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.score.value)  # sort matches by ascending distance

f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed, though unnecessary here
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of this file
    f.block()  # block for listening request

Keep the above running and start a simple client:

from jina import Client, Document

def print_matches(req):  # the callback function invoked when task is done
    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:2f}: "{d.text}"')
        
c = Client(host='localhost', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

It finds the lines in the server code snippet that are most similar to "request(on=something)" and prints the following:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"

😔 Doesn't work? Our bad! Please report it here.
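
To see where the scores printed above come from, the embedding and distance steps can be reproduced with NumPy alone. The following is a minimal sketch, not part of Jina: the helper names `char_embed`, `pairwise_euclidean`, and `rank` are hypothetical, and the logic simply mirrors the `CharEmbed` and `Indexer` executors from the server snippet.

```python
import numpy as np

OFFSET = 32                  # ASCII code of the space character
DIM = 127 - OFFSET + 1       # 96 slots; the last one is also used for `UNK`
ONE_HOT = np.eye(DIM)        # one row per character slot


def char_embed(text: str) -> np.ndarray:
    """Mean-pool the one-hot vectors of all characters in `text`."""
    idx = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return ONE_HOT[idx].mean(axis=0)


def pairwise_euclidean(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Distances between every row of `q` (Q x D) and every row of `d` (N x D)."""
    # Broadcasting (Q, 1, D) against (1, N, D) yields a (Q, N, D) difference tensor.
    return np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)


def rank(query: str, lines: list) -> list:
    """Return (distance, line) pairs sorted from nearest to farthest."""
    q = char_embed(query)[None, :]
    d = np.stack([char_embed(line) for line in lines])
    dist = pairwise_euclidean(q, d)[0]
    return sorted(zip(dist.tolist(), lines))


if __name__ == '__main__':
    lines = ["@requests(on='/index')", "@requests(on='/search')", 'import numpy as np']
    for dist, line in rank('request(on=something)', lines):
        print(f'{dist:.6f}: "{line}"')
```

Because each embedding is just a normalized character histogram, the two `@requests` lines should rank first with the same distances as in the client output above; lines that share few characters with the query end up further away.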

Run Quick Demo

Fork Demo & Build Your Own

Copy the source code of a hello world to your own directory and start from there:

$ jina hello fork fashion ../my-proj/ 

Read Tutorials

Support

  • Join our Slack community to chat to our engineers about your use cases, questions, and support queries.
  • Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
  • Subscribe to the latest video tutorials on our YouTube channel.

Join Us

Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.
