jina-ai / docs

Jina V1 Official Documentation. For the latest one, please check out https://docs.jina.ai

docs's Introduction

Jina Documentation

“Open source is this magical thing right? You release code, and the code gnomes come and make it better for you.

Not quite. There are lots of ways that open source is amazing, but it doesn’t exist outside the laws of physics. You have to put work in, to get work out.

You only get contributions after you have put in a lot of work. You only get contributions after you have users. You only get contributions after you have documentation.”

From @ericholscher's guide to writing docs

Hierarchical structure
Missing content
How to add pages
Updating Docstrings
Jina style guide
Technical aspects of the doc site

Hierarchical structure

Jina documentation adheres to the following hierarchical structure. Each Jina product has its own section, containing the three subsections below.


Overview	A high-level conceptual overview of the product, introducing terms and broad architectural concepts. Content here should apply to all Jina users. For example, all users should understand what a Pod is, but only some users need to understand deployment on a GPU.
Developer Guides	Technical how-to guides and tutorials that describe product features or implementations. They assume basic knowledge of the product and related terms.
API References	Detailed descriptions of the product API, possibly auto-generated from docstrings or open API references. They describe how the methods work and which parameters can be used.

Missing content

If you find a gap in our documentation, please submit a GitHub issue here.

Documentation process

All documentation should follow the same process as any other PR:

Every developer who writes code should also write its documentation.
A documentation engineer reviews the PR.
After the documentation engineer approves the PR, a technical writer reviews and edits it.
The documentation engineer reviews it once more and approves it.

How to add pages

For getting started pages and developer guides:

Read the Documentation Style Guide.
Using Git, clone the repo: git clone https://github.com/jina-ai/docs
Create a new Git branch: git checkout -b fix_pods
Use a template from the page_templates folder. We want a uniform structure in all of our docs, so we provide two templates for you to use:
- The How-to Documentation is for concrete guidelines, for topics that are better explained step by step.
- The explanatory articles explain theory and background without any how-to details.
Your commit messages should follow the standard Jina format seen here.
Add your file to the chapters folder.
Add your file to a table of contents.
Push your branch and create a pull request. Add at least two reviewers to your PR: one product manager and one documentation engineer.

You can use Markdown or reStructuredText format. To preview how the docs website will look with your changes, navigate to checks and click 'preview with netlify'. After the pull request is merged, the website will automatically update.

Extra guides

A guide to RST can be found here.
A guide to MD formatting can be found here [Note that MD is more limited in functionality than RST].

Updating Docstrings

See details here.

Jina style guide

All documentation should follow this style guide.

Build docs locally

# Clone the code and enter the repo.
git clone https://github.com/jina-ai/docs.git
cd docs

# Install dependencies.
pip install -r requirements.txt

# Clean & build docs locally
make dirhtml

# Serve the docs website with Python 3 (the -d flag needs Python 3.7+)
python -m http.server 8080 -d _build/dirhtml

docs's Issues

Improve chapters folder.

Currently:
Our chapters folder has some files directly stored in chapters: chapters/pod_page.md
and some in their own folders: chapters/pod_page/index.md

In certain cases, this is to allow for local images to be stored in the chapter, but not always.

Problems:

Inconsistent file naming affects SEO.
Confusion for new developers adding files.
Images will be duplicated across folders. (Two folders using the same image).
Does not follow best web design practice

Solution:

Create a folder for images chapters/images/
Move all images (.png/.gif etc) to that folder.
Move all files to the root of the chapter folder. A file like chapters/pods/pod_page.md should be moved to chapters/pod_page.md. This might require renaming some files.
When a file has been moved, the TOC should be updated.
Update all files in the chapter folder so images are correctly shown. For example, a file which has .. image:: magnetic-balls.jpg should be updated to .. image:: ../images/magnetic-balls.jpg
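
The last step could be scripted. Below is a minimal sketch in Python, assuming RST files whose image directives use bare file names; the function name rewrite_image_paths is hypothetical, the ../images prefix is taken from the example above, and Markdown image syntax is not handled:

```python
import re

# Matches an RST image directive whose target is a bare file name
# (no directory part and no URL), e.g. ".. image:: magnetic-balls.jpg".
IMAGE_DIRECTIVE = re.compile(
    r"^([ \t]*\.\. image:: )([^/\s:]+)[ \t]*$", re.MULTILINE
)


def rewrite_image_paths(text: str, images_dir: str = "../images") -> str:
    """Point bare image file names at the shared images folder."""
    return IMAGE_DIRECTIVE.sub(
        lambda m: f"{m.group(1)}{images_dir}/{m.group(2)}", text
    )
```

Targets that already contain a slash or a URL scheme are left alone, so running the function twice on the same file should not change it again.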

update my_first_jina_app since cookiecutter is going to be removed

We can use the code in the example repo instead.
https://github.com/jina-ai/docs/blob/master/chapters/my_first_jina_app.md
https://github.com/jina-ai/examples

Missing release note after migration

While checking the docs, I discovered that the release notes were not up to date on the docs website.

This is serious, I'll prioritise this task.

cc @FionnD

[Suggestion] Add a new github action to assign the issues.

Hi,
Right now we have a few actions already, but it would be great to see an action that can be used to assign issues.

For example, I opened this issue as a suggestion or bug report, and I or someone else wants to work on it. They can simply comment Take or Assign me and they would be assigned to the issue automatically. I'd like to work on this if it would be useful. We can start by adding it here and later expand to other repositories.

I'd love to know your thoughts. Thanks!

Update simple-executor section

This section is out-of-date and contains wrong information.

We should not add code copied from the core, since we risk it becoming outdated.

[Suggestion] release process improvements

Adding Release Drafter and PR Labeler to Docs

Adds an action that labels submitted PRs
Drafts the releases automatically

Missing info: documenting _merge vs _pass

Add documentation for _merge vs _pass.

I think there is no good documentation on how to use _pass, _merge and how to use Flows with bifurcations.

[Suggestion] Expand learning tutorials

As per my discussions with team members, we want a set of tutorials that are arranged in a very modular fashion, each building on the one(s) before.

See here for prior notes from product/dev rel interactions

My rough suggestion is below. Very open to edits and feedback!

To start

101, 102, what is neural search
My First Jina App - we use this as a base moving forwards and build next units on top

Text track

Text search is a good way to start, since text is easily readable in editor and terminal, and if stuff goes wrong it's easier to understand why.

If users want to build up image/video/audio search, they can see other streams which also flow from My First Jina App

Add metadata (similar to which character says which line in legacy South Park search)
Segmenting/granularity (like lyrics search) - we may need to use dataset with longer Docs for this
Crafting
QueryLang
Multimodal (e.g. searching tables and text in PDF)

Image track

Some steps are common to both text and image. Segmenting and crafting an image have very different IO and requirements than text so we add it here too, tweaked for images

Convert My First Jina App to search images (like Pokemon example)
Segmenting
Crafting
Feature detection
Cross-modal (search text to image or vice-versa)

Video, audio tracks

TBC. Similar to image track I would guess

Scalability track

Scale up with sharding/replicas
Dockerize My First Jina App
Add to Jina Hub
Add incremental indexing (like this example)

`EncodeDriver` will only work on images if they have the same shape.

Describe the problem

EncodeDriver uses the all_contents property of a DocumentSet to extract the content of all the docs in the set in batches.

This has a hidden implication when working with image documents.

all_contents will try to extract the blob field from Document for all of them and stack them in batches

So from B images of shape (3, 224, 224), it will obtain an np.array of shape (B, 3, 224, 224).

This will only work if every Document has a blob of the same shape.

Proposed solution

For now, this should be clearly documented in the documentation.

Fixing this issue in a general way could be too costly: not stacking the contents would break the interface of every ImageEncoder, and detecting whether every document has the same shape can have undesired side effects.
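
The constraint can be reproduced with NumPy alone. The sketch below is not Jina's implementation. It assumes the batching behaves like np.stack, and stack_blobs is a hypothetical helper name:

```python
import numpy as np


def stack_blobs(blobs):
    """Stack per-document arrays into one batch, as a stand-in for all_contents."""
    shapes = {b.shape for b in blobs}
    if len(shapes) > 1:
        # np.stack would raise ValueError here as well; this message names the shapes.
        raise ValueError(f"blobs have different shapes: {sorted(shapes)}")
    return np.stack(blobs)


# Four images of identical shape stack into a single (B, 3, 224, 224) array.
same = [np.zeros((3, 224, 224)) for _ in range(4)]
batch = stack_blobs(same)
```

Passing a list that mixes, say, (3, 224, 224) and (3, 128, 128) arrays raises ValueError, which is the failure mode a documentation note would need to warn about.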

jina batching docs needs to be updated after refactor

Explain JinaRuntime

As we add more JinaRuntime(s) to the core of Jina, we should explain this concept in the documentation.

Tasks:

Add an explanation article about the purpose of JinaRuntime.
Add definition of JinaRuntime to the glossary

Create using a GPU with Jina

Create a how to guide on using GPU within Jina.
Please follow the guide template. This should include step by step instructions.

This documentation should live within: Jina_core > Guides

Please follow the correct template and Jina Style guide that can be found in the Docs repo.

Questions can be directed to the docs slack channel.

Adjust CI/CD pipeline with 2.0 changes

The master branch is now 2.0. Some of the CI/CD, version numbers, backups and linkcheck are failing (or impacted) and need to be fixed.

Fix broken links in docs

Run make linkcheck locally or see results here

fix numbering in crud limitations

incremental linkcheck

Currently, a link check runs across the whole docs on each PR and reports broken links without blocking the PR.

Now we want to split the link check workflow into 2 separate parts:

Perform a link check on each PR, checking only the changed files. For example, if you updated primitive_data_type.rst, the link check runs only for that file. If the link check fails, the PR is blocked until it passes.
A periodic link check across the whole docs, run every Monday.
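
The per-PR half of this split amounts to filtering the changed files down to documentation sources. Below is a minimal sketch, assuming the list of changed paths is already available (for example from git diff --name-only) and that documentation sources end in .rst or .md; files_to_linkcheck is a hypothetical name, not part of the existing workflow:

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".rst", ".md"}  # assumption: the formats the docs accept


def files_to_linkcheck(changed_paths):
    """Return the changed documentation files, in their original order."""
    return [
        p for p in changed_paths
        if PurePosixPath(p).suffix.lower() in DOC_SUFFIXES
    ]
```

An empty result would mean the PR touched no documentation sources, in which case the per-PR check could be skipped entirely.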

[improvement] changing link for issue creation

Currently the link points to the docs repo. It can be changed to github.com/jina-ai/docs/issues/new

Create a Jina Tutorial

User story:
As a user of Jina I would like to learn how to build a Jina app in a step-by-step manner. Initially, this should cover building a text app, in the future, it should cover other modalities.

Improve documentation on Document type and tags as a field to contain meta info

I guess we need better documentation for this.

Originally posted by @JoanFM in jina-ai/jina#1496 (comment)

add sitemaps to the robots.txt automatically for new versions

Currently, we have to add sitemaps for older versions manually.
Please automate this step by generating the robots.txt file in the script instead of having it directly in the repo.

User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.10/sitemap.xml

Moreover, the sitemap.xml needs to be adjusted to its actual version.
Example:
We have the following sitemap:

User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml

Now we do a release. This means sitemap: https://docs.jina.ai/sitemap.xml becomes sitemap: https://docs.jina.ai/v1.0.10/sitemap.xml, and we create a new sitemap for the new master, sitemap: https://docs.jina.ai/sitemap.xml

User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.10/sitemap.xml

In addition, the links inside https://docs.jina.ai/v1.0.10/sitemap.xml have to be updated to v1.0.10.

UAC

robots.txt is automatically generated and updated
the links inside the sitemaps are updated
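
The first criterion could be met by a small generator run at release time. Below is a sketch, assuming the release script knows the list of archived version strings; build_robots_txt and its parameters are hypothetical:

```python
def build_robots_txt(versions, base_url="https://docs.jina.ai"):
    """Render robots.txt with one sitemap line for master and one per archived version."""
    lines = ["User-agent: *", "Disallow:", f"sitemap: {base_url}/sitemap.xml"]
    # One extra sitemap line per archived release, e.g. "1.0.10" -> /v1.0.10/sitemap.xml
    lines += [f"sitemap: {base_url}/v{v}/sitemap.xml" for v in versions]
    return "\n".join(lines) + "\n"
```

Calling build_robots_txt(["1.0.10"]) yields the four-line file shown in the example above. Rewriting the links inside each archived sitemap.xml is a separate step that this sketch does not cover.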

Checks for broken links

There should be some system that notifies the Jina team of broken URLs in the current version of the docs.

dirhtml for netlify

Netlify currently builds the website using make html
Please change it to make dirhtml

the latest versions are not available when switching to previous version

After switching to 1.0.0, the versions afterwards are no longer selectable.

remove robots.txt files for older versions

Currently, we keep older versions of our robots.txt in each release.
That does no harm, but it would be nice to clean them up automatically.

Add explanation how to add drivers to .YAML

Feature request

As per this community request here, Max had to explain to a community member how to add a driver to their YAML file. We don't explain this in our documents.

To fix this, I suggest we add

A subsection to this page with how to add a driver to a YAML.
In point five of this page, add a reference to the above subsection, and that you should add the jina.resources/executors.requests.CompoundIndexer.yml and remove the !ExcludeQL as Max explains.

set up labels for docs

update docs with permanent delete

jina-ai/jina#1984 (comment)

Wrong url to check jinad is properly set up

In https://docs.jina.ai/master/chapters/remote/jinad.html, under the Usage->Prerequisites section->2nd paragraph, it says to visit http://3.16.166.3:8000/alive to check if jinad is properly set up. The url does not work. I think it should be http://3.16.166.3:8000/status

BaseUpdateDriver

Hey everyone, I'd like to write a custom UpdateDriver. Unfortunately, there is not a single driver for update requests, unlike for all the other request types (index, search, delete).

I would appreciate it, if you could add one. The KVIndexDriver would work absolutely fine in my opinion. One just needs to change the default method to 'update'. Thank you!!

prevent duplicates and incremental indexing chapters should be merged

They cover the same content, and should be merged into one single chapter

https://docs.jina.ai/chapters/incremental_indexing/index.html
https://docs.jina.ai/chapters/prevent_duplicate_indexing/index.html

Improve SEO

Read this guide and implement any of the possible steps.

For example, try adding a robots.txt file according to the instructions.

Flask is mentioned as an extra dependency when it should be fastapi

In https://docs.jina.ai/chapters/install/os/via-pip.html, under the Extra Dependencies Explained section, you mention using flask as one of your extra dependencies. Didn't you shift from flask to fastapi, as mentioned in jina-ai/jina#1348?

let jina bot commit to master branch

After a release, we need to update the versions file and push it to the master branch. dev-bot failed to do this because master is a protected branch, so currently I'm updating versions manually.

I'll figure out a way to update version automatically.

Wrong and incomplete documentation about CompoundExecutor

Sitemaps for old versions

create sitemaps for older versions and add them to the robots.txt
User-agent: *
Disallow:
sitemap: https://docs.jina.ai/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.4/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.3/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.2/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.1/sitemap.xml
sitemap: https://docs.jina.ai/v1.0.0/sitemap.xml

Minimum working example...isn't?

The minimum working example should be something more like a minimum working example of a Jina app, no? So basic index and query Flows, like Wikipedia.

The current doc is more of a minimum working example of an Executor. I'm happy to re-title and move if no-one has objections.

Introduction section is a scary wall of text

Intro for core is a scary big wall of text with dozens and dozens of links. This gives the impression there are many, many pages of content.

The intro is one of the first things new users see. We don't want to terrify them!

Suggestion

One heading and introductory paragraph each for:

101
102
Hello world

With link to the full page below. No links to sub pages (and certainly not sub-sub pages)

Some images would be nice too

JinaD docs are outdated

Since JinaD is the entry-point for users on Console and everything remote with Jina, the docs need to be rearranged & old code snippets should be removed - https://docs.jina.ai/chapters/remote

raise a PR to update version after release

We need to update versions so that the latest versions are backed up and synced to the docs website. Since master is a protected branch, we cannot update master directly. A workaround would be:

After each jina-release-event, update the versions file and let JINA DEV BOT raise a PR. We manually approve and merge.