
jina-ai / docs


Jina V1 Official Documentation. For the latest one, please check out https://docs.jina.ai

Home Page: https://docs1.jina.ai

CSS 35.20% HTML 46.04% Shell 5.88% Makefile 1.23% Python 11.65%
jina neural-search jina-search docs sphinx-doc doc-star

docs's Introduction

Jina Documentation


“Open source is this magical thing right? You release code, and the code gnomes come and make it better for you.

Not quite. There are lots of ways that open source is amazing, but it doesn’t exist outside the laws of physics. You have to put work in, to get work out.

You only get contributions after you have put in a lot of work. You only get contributions after you have users. You only get contributions after you have documentation.”

From @ericholscher's guide to writing docs


Hierarchical structure

Jina documentation adheres to the following hierarchical structure. Each Jina product has its own section, containing three subsections below.

Overview: A high-level conceptual overview of the product that introduces terms and broad architectural concepts. Content here should apply to all Jina users. For example, all users should understand what a Pod is, but only some users need to understand deployment on a GPU.
Developer Guides: Technical how-to guides and tutorials that describe product features or implementations. They assume basic knowledge of the product and related terms.
API References: Detailed descriptions of the product API, possibly auto-generated from docstrings or open API references. They describe how the methods work and which parameters can be used.

Missing content

If you find a gap in our documentation, please submit a GitHub issue here.

Documentation process

All documentation should follow the same process as any other PR:

  1. The developer who wrote the code should also write the documentation.
  2. A documentation engineer reviews the PR.
  3. After the documentation engineer approves the PR, a technical writer reviews and edits it.
  4. The documentation engineer reviews it once more and approves it.

How to add pages

For getting started pages and developer guides:

  1. Read Documentation Style Guide

  2. Using Git, clone the repo: git clone https://github.com/jina-ai/docs

  3. Create a new Git branch: git checkout -b fix_pods

  4. Use a template from the page_templates folder. We want a uniform structure in all of our docs, so we provide two templates for you to use.

  5. Your commit messages should follow the standard Jina format seen here.

  6. Add your file to the chapters folder.

  7. Add your file to a table of contents.

  8. Push your branch and create a pull request. Add at least two people as reviewers for your PR: one product manager and one documentation engineer.

You can use Markdown or reStructuredText format. To preview how the docs website will look with your changes, navigate to checks and click 'preview with netlify'. After the pull request is merged, the website will automatically update.

Extra guides

  • A guide to RST can be found here.
  • A guide to MD formatting can be found here [Note that MD is more limited in functionality than RST].

Updating Docstrings

See details here.

Jina style guide

All documentation should follow this style guide.

Build docs locally

# Clone the code and enter the repository.
git clone https://github.com/jina-ai/docs.git
cd docs

# Install dependencies.
pip install -r requirements.txt

# Clean & build docs locally.
make dirhtml

# Serve the docs website with Python 3 (the -d flag needs Python 3.7 or later).
python -m http.server 8080 -d _build/dirhtml

docs's People

Contributors

alexcg1, arthur-milchior, bwanglzu, catstark, cristianmtr, davidbp, deepankarm, fionnd, florian-hoenicke, hanxiao, hofmannedv, jacobowitz, jina-bot, joanfm, kelton8z, lukeekul, maximilianwerk, nan-wang, rutujasurve94, samjoy, shivaylamba, sixhobbits, thepfarrer, yongxuanzhang

docs's Issues

Improve chapters folder.

Currently:
Our chapters folder has some files directly stored in chapters: chapters/pod_page.md
and some in their own folders: chapters/pod_page/index.md

In certain cases, this is to allow for local images to be stored in the chapter, but not always.

Problems:

  1. Inconsistent file naming affects SEO.
  2. Confusion for new developers adding files.
  3. Images will be duplicated across folders. (Two folders using the same image).
  4. Does not follow best web design practice.

Solution:

  1. Create a folder for images chapters/images/
  2. Move all images (.png/.gif etc) to that folder.
  3. Move all files to the root of the chapter folder. A file like chapters/pods/pod_page.md should be moved to chapters/pod_page.md. This might require renaming some files.
  4. When a file has been moved, the TOC should be updated.
  5. Update all files in the chapter folder so images are correctly shown. For example, a file which has .. image:: magnetic-balls.jpg should be updated to .. image:: ../images/magnetic-balls.jpg
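
Step 5 could be automated along the following lines. This is only a minimal sketch: the function name repoint_images and the default "../images" prefix are assumptions taken from the example above, and it handles RST image directives only, not Markdown image syntax.

```python
import re

# Matches an RST image directive such as ".. image:: magnetic-balls.jpg".
IMAGE_DIRECTIVE = re.compile(r"^([ \t]*\.\. image:: )(\S+)$", re.MULTILINE)


def repoint_images(rst_text, image_dir="../images"):
    """Rewrite local image paths so they point into the shared image folder."""

    def _replace(match):
        prefix, target = match.group(1), match.group(2)
        # Leave remote images and already-migrated paths untouched.
        if "://" in target or target.startswith(image_dir + "/"):
            return match.group(0)
        # Keep only the file name; the image now lives in the shared folder.
        filename = target.rsplit("/", 1)[-1]
        return f"{prefix}{image_dir}/{filename}"

    return IMAGE_DIRECTIVE.sub(_replace, rst_text)
```

Running the function twice gives the same result as running it once, so it is safe to apply to files that were already migrated.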

[Suggestion] Add a new github action to assign the issues.

Hi,
Right now we already have a few actions, but it would be great to have one that can be used to assign issues.

For example, suppose I open an issue as a suggestion or bug report, and I or someone else wants to work on it. They can simply comment Take or Assign me and be assigned to the issue automatically. I'd like to work on this if it would be useful. We can start by adding it here and later expand to other repositories.
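
The comment-matching part of such an action might look like the sketch below. The function name wants_assignment is hypothetical, the trigger phrases are the two mentioned above, and the wiring into a GitHub workflow (reading the comment event and calling the API to assign the commenter) is deliberately left out.

```python
import re

# Trigger phrases from the suggestion above: "Take" or "Assign me".
# The whole comment must consist of the phrase, so ordinary discussion
# that merely contains the word "take" does not trigger an assignment.
ASSIGN_PATTERN = re.compile(r"^\s*(take|assign me)\s*[.!]?\s*$", re.IGNORECASE)


def wants_assignment(comment_body):
    """Return True if the comment is a self-assignment request."""
    return ASSIGN_PATTERN.match(comment_body) is not None
```

Matching the whole comment rather than a substring is a design choice that trades some convenience for fewer accidental assignments.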

I'd love to know your thoughts. Thanks!

Update simple-executor section

This section is out-of-date and contains wrong information.

We should not add code copied from the core, since we risk it becoming outdated.

[Suggestion] Expand learning tutorials

As per my discussions with team members, we want a set of tutorials that are arranged in a very modular fashion, each building on the one(s) before.

See here for prior notes from product/dev rel interactions

My rough suggestion is below. Very open to edits and feedback!

To start

  1. 101, 102, what is neural search
  2. My First Jina App - we use this as a base moving forwards and build next units on top

Text track

Text search is a good way to start, since text is easily readable in editor and terminal, and if stuff goes wrong it's easier to understand why.

If users want to build up image/video/audio search, they can see other streams which also flow from My First Jina App

  1. Add metadata (similar to which character says which line in legacy South Park search)
  2. Segmenting/granularity (like lyrics search) - we may need to use dataset with longer Docs for this
  3. Crafting
  4. QueryLang
  5. Multimodal (e.g. searching tables and text in PDF)

Image track

Some steps are common to both text and image. Segmenting and crafting an image have very different IO and requirements than text so we add it here too, tweaked for images

  1. Convert My First Jina App to search images (like Pokemon example)
  2. Segmenting
  3. Crafting
  4. Feature detection
  5. Cross-modal (search text to image or vice-versa)

Video, audio tracks

TBC. Similar to image track I would guess

Scalability track

  1. Scale up with sharding/replicas
  2. Dockerize My First Jina App
  3. Add to Jina Hub
  4. Add incremental indexing (like this example)

`EncodeDriver` will only work on images if they have the same shape.

Describe the problem

EncodeDriver uses the all_contents property of a DocumentSet to extract the content of all the docs in the set in batches.

This has a hidden implication when working with image documents.

all_contents will try to extract the blob field from every Document and stack the blobs in batches.

So from B images of shape (3, 224, 224), it will obtain an np.array of shape (B, 3, 224, 224).

This will only work if every Document has a blob of the same shape.
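
Until this is documented, a caller could check the shapes before indexing. The helper below is a hypothetical, pure-Python sketch that is not part of Jina: it takes the blob shapes as plain tuples and either returns the shape of the stacked batch or reports the first mismatch.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def batched_shape(blob_shapes: Sequence[Shape]) -> Shape:
    """Return the shape of the stacked batch, or raise if the shapes differ."""
    if not blob_shapes:
        raise ValueError("cannot batch an empty list of blobs")
    first = tuple(blob_shapes[0])
    for index, shape in enumerate(blob_shapes):
        if tuple(shape) != first:
            raise ValueError(
                f"blob {index} has shape {tuple(shape)}, expected {first}"
            )
    # B blobs of shape S stack into one array of shape (B, *S).
    return (len(blob_shapes),) + first
```

With four images of shape (3, 224, 224) the result is (4, 3, 224, 224). A single image of a different size raises a ValueError that names the offending position.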

Proposed solution

For now, this should be clearly documented in the documentation.

Fixing this issue and generalizing could be too costly, because not stacking the contents would break the interface of every ImageEncoder, and detecting whether every document has the same shape could have extra undesired effects.

Explain JinaRuntime

As we add more JinaRuntime(s) to the core of Jina, we should explain this concept in the documentation.

Tasks:

  1. Add an explanation article about the purpose of JinaRuntime.
  2. Add definition of JinaRuntime to the glossary

Create using a GPU with Jina

  • Create a how-to guide on using a GPU within Jina.

  • Please follow the guide template. This should include step-by-step instructions.

This documentation should live within: Jina_core > Guides

Please follow the correct template and Jina Style guide that can be found in the Docs repo.

Questions can be directed to the docs slack channel.

incremental linkcheck

Currently, the docs have a link check that looks for broken links on each PR at the docs level, without blocking the PR.

Now we want to split the link check workflow into 2 separate parts:

  1. Perform a link check on each PR that covers only the changed files. For example, if you updated primitive_data_type.rst, we run the link check for that file only. If the link check fails, block the PR until it passes.
  2. Run a periodic docs-wide link check every week, on Monday.
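
For the per-PR part, selecting the files to check could be sketched as follows. The function name files_to_linkcheck and the set of suffixes are assumptions. The list of changed paths is expected to come from something like git diff --name-only against the target branch.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: only these formats hold documentation pages.
DOC_SUFFIXES = {".rst", ".md"}


def files_to_linkcheck(changed_paths: Iterable[str]) -> List[str]:
    """Keep only the changed documentation files, deduplicated and sorted."""
    return sorted(
        path
        for path in set(changed_paths)
        if PurePosixPath(path).suffix.lower() in DOC_SUFFIXES
    )
```

An empty result means the PR touched no documentation pages, so the incremental check can be skipped for it.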

Create a Jina Tutorial

User story:
As a user of Jina, I would like to learn how to build a Jina app in a step-by-step manner. Initially, this should cover building a text app. In the future, it should cover other modalities.

  • Create a Design document
  • Set up a meeting with dev rel about hosting tutorials
  • Decide on topic
  • Start writing
  • In review

add sitemaps to the robots.txt automatically for new versions

Currently, we have to add sitemaps for older versions manually.
Please automate this step by generating the robots.txt file in the script instead of keeping it directly in the repo.

User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.10/sitemap.xml

Moreover, each sitemap.xml needs to be adjusted to its actual version.
Example:
We have the following robots.txt:

User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml

Now we do a release. This means that sitemap: https://docs.jina.ai/sitemap.xml becomes sitemap: https://docs.jina.ai/v1.0.10/sitemap.xml, and we create a new sitemap entry for the new master, sitemap: https://docs.jina.ai/sitemap.xml:

User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.10/sitemap.xml

In addition, the links inside https://docs.jina.ai/v1.0.10/sitemap.xml have to be updated to v1.0.10.

UAC

  • robots.txt is automatically generated and updated
  • the links inside the sitemaps are updated
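
Both acceptance criteria could be covered by a small script along these lines. It is a sketch under assumptions: the function names are hypothetical, the list of released versions has to be supplied by the release script, and the versioned URL layout is inferred from the sitemap locations shown above.

```python
from typing import Iterable

BASE_URL = "https://docs.jina.ai"


def build_robots_txt(released_versions: Iterable[str]) -> str:
    """Render robots.txt with the master sitemap plus one per released version."""
    lines = ["User-agent: *", "Disallow:", f"sitemap: {BASE_URL}/sitemap.xml"]
    for version in released_versions:
        lines.append(f"sitemap: {BASE_URL}/{version}/sitemap.xml")
    return "\n".join(lines) + "\n"


def pin_to_version(url: str, version: str) -> str:
    """Point a master docs URL at the archived copy of the given version."""
    prefix = BASE_URL + "/"
    if not url.startswith(prefix):
        return url  # leave links to other sites untouched
    return f"{prefix}{version}/{url[len(prefix):]}"
```

Calling build_robots_txt(["v1.0.10"]) reproduces the four-line file shown above. pin_to_version is meant to be applied to each master link when a sitemap is archived for a release.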

Checks for broken links

There should be some system that notifies the Jina team of broken URLs in the current version of the docs.
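
The detection half of such a system could be sketched as below, using only the Python standard library. The function names are hypothetical, treating any status of 400 or above as broken is an assumption, and the notification step is not covered.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request, or 0 if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return 0


def find_broken_links(
    urls: Iterable[str], get_status: Callable[[str], int] = http_status
) -> Dict[str, int]:
    """Map each broken URL to the status that marked it as broken."""
    broken = {}
    for url in sorted(set(urls)):
        status = get_status(url)
        # 0 means the host could not be reached at all.
        if status == 0 or status >= 400:
            broken[url] = status
    return broken
```

The get_status parameter exists so the check can be exercised without network access. Some servers reject HEAD requests, so a real checker may need to fall back to GET.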

dirhtml for netlify

Netlify currently builds the website using make html
Please change it to make dirhtml
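
For context, the two Sphinx builders differ in where they write each page. The sketch below mirrors that naming rule in a hypothetical helper, which is not part of Sphinx, to show why dirhtml yields URLs without the .html suffix.

```python
def output_path(docname: str, builder: str) -> str:
    """Return the HTML file a Sphinx builder writes for a document name."""
    if builder == "html":
        return f"{docname}.html"
    if builder == "dirhtml":
        # Index pages keep their name; every other page gets its own directory.
        if docname == "index" or docname.endswith("/index"):
            return f"{docname}.html"
        return f"{docname}/index.html"
    raise ValueError(f"unsupported builder: {builder}")
```

A page such as chapters/pod_page is written to chapters/pod_page.html by html and to chapters/pod_page/index.html by dirhtml, so the latter can be served at chapters/pod_page/.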

Add explanation how to add drivers to .YAML

Feature request

As per this community request here, Max had to explain to a community member how to add a driver to their YAML file. We don't explain this in our documents.

To fix this, I suggest we add

  • A subsection to this page with how to add a driver to a YAML.
  • In point five of this page, add a reference to the above subsection, and that you should add the jina.resources/executors.requests.CompoundIndexer.yml and remove the !ExcludeQL as Max explains.

BaseUpdateDriver

Hey everyone, I'd like to write a custom UpdateDriver. Unfortunately, there is no driver for update requests like there is for all the other request types (index, search, delete).

I would appreciate it if you could add one. The KVIndexDriver would work absolutely fine in my opinion. One just needs to change the default method to 'update'. Thank you!!

let jina bot commit to master branch

After each release, we need to update the versions file and push it to the master branch. dev-bot fails to do this because master is a protected branch, so I'm currently updating the versions manually.

I'll figure out a way to update the version automatically.

Minimum working example...isn't?

The minimum working example should be something more like a minimum working example of a Jina app, no? So basic index and query Flows, like Wikipedia.

The current doc is more of a minimum working example of an Executor. I'm happy to re-title and move if no-one has objections.

Introduction section is a scary wall of text

Intro for core is a scary big wall of text with dozens and dozens of links. This gives the impression there are many, many pages of content.

The intro is one of the first things new users see. We don't want to terrify them!

Suggestion

One heading and introductory paragraph each for:

  • 101
  • 102
  • Hello world

With link to the full page below. No links to sub pages (and certainly not sub-sub pages)

Some images would be nice too

raise a PR to update version after release

We need to update versions, and the latest versions will be backed up and synced to the docs website. Since master is a protected branch, we cannot update it directly. A workaround would be:

After each jina-release-event, update the versions file and let JINA DEV BOT raise a PR. We manually approve and merge.
