deepset-ai / haystack-website


Contents moved to https://github.com/deepset-ai/haystack-home

Home Page: https://haystack.deepset.ai

License: BSD Zero Clause License

JavaScript 21.32% TypeScript 75.75% CSS 2.94%

haystack-website's Introduction

Development

Getting Started

First, install the dependencies (if you run into issues with this step, make sure to update Node to the latest version):

yarn install

Part of the documentation source lives within the Haystack repo and the build system expects to find it locally, so before running the development server run this command to get a local copy of Haystack:

yarn haystack

At this point you can run the development server:

yarn dev

Open http://localhost:3000 with your browser to see the result.

When editing .mdx files, you can run the following command to see your changes update automatically:

yarn dev:watch

Note: This setup is tested with Node v14.17.5 and might be incompatible with older or newer versions.

Environment Variables

If you have permission issues when starting up, get a personal access token from GitHub. The public_repo scope is sufficient.

Create a .env.local file and add your token as an env variable:

GITHUB_PERSONAL_ACCESS_TOKEN="youraccesstoken"
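Next.js loads .env.local into process.env on startup, so the token becomes available to server-side code. How this project consumes the token is not shown here; the following is only a minimal sketch of turning such a variable into GitHub API request headers. The helper name githubAuthHeaders and the Env type are hypothetical.

```typescript
// Hypothetical helper, not taken from this repository.
// Accepts any env-like object so it can be used with process.env or in tests.
type Env = Record<string, string | undefined>;

function githubAuthHeaders(env: Env): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
  };
  const token = env.GITHUB_PERSONAL_ACCESS_TOKEN;
  // Without a token the request is unauthenticated and subject to stricter limits.
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return headers;
}
```

In a running app you would call githubAuthHeaders(process.env); passing the environment as an argument keeps the helper easy to test.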

Required Reading

This project makes heavy use of Next.js's getStaticProps and getStaticPaths functions, to fetch markdown files at build time (locally from the docs directory as well as from GitHub using the GitHub API) and generate html pages for each of these files. Before working on the project, it's vital that you understand how these functions work and how they apply to this project. This example and this example may be used as simple demonstrations of these functions to solidify your understanding.
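As a rough illustration of how the two functions cooperate, here is a minimal sketch that runs without Next.js installed. The StaticPaths and StaticProps types are simplified stand-ins for the Next.js result shapes, and the in-memory docs map with its keys is hypothetical; the real project reads files from the docs directory and from GitHub instead.

```typescript
// Simplified stand-ins for the Next.js result shapes (assumption: trimmed to the fields used here).
type Params = { slug: string[] };
type StaticPaths = { paths: { params: Params }[]; fallback: boolean };
type StaticProps = { props: { title: string; markdown: string } } | { notFound: true };

// Hypothetical in-memory replacement for the markdown files on disk or on GitHub.
const docs: Record<string, string> = {
  "overview/intro": "# Intro\nWelcome.",
  "usage/reader": "# Reader\nHow to use the reader.",
};

// Build time, step 1: list every page that should be generated.
async function getStaticPaths(): Promise<StaticPaths> {
  return {
    paths: Object.keys(docs).map((key) => ({ params: { slug: key.split("/") } })),
    fallback: false,
  };
}

// Build time, step 2: called once per path to load the data for that page.
async function getStaticProps(context: { params: Params }): Promise<StaticProps> {
  const key = context.params.slug.join("/");
  const markdown = docs[key];
  if (markdown === undefined) {
    return { notFound: true };
  }
  const title = markdown.split("\n")[0].replace(/^#\s*/, "");
  return { props: { title, markdown } };
}
```

Each entry returned by getStaticPaths is passed as context.params to getStaticProps, and the returned props are handed to the page component that renders the HTML.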

Docs Publishing Process

Overview & Usage Docs

These docs live in the docs directory, in the given version directory. They are written in .mdx, which lets us include JSX inside these files and therefore use components built with Headless UI, a React component library designed to work with Tailwind CSS. See the components/Disclosures and components/Tabs components for examples, and docs/v0.9.0/overview/get-started.mdx for how they are used inside .mdx files.

To edit or create documentation, add .mdx files to the given version directory or edit the existing ones. New files require one additional step: add the new page to the menu.json file located in docs/vX.X.X. Likewise, remove a page from menu.json when it is no longer needed, for example because the corresponding module has been deleted from Haystack.

When you push a branch with your changes to GitHub, Vercel automatically generates a preview environment for you (check the Vercel Dashboard to find the preview URL).

Tutorial & Reference Docs

These docs live in the Haystack repository, in the given version directory. The docs are generated markdown files and must be fetched before the build starts. Thanks to Vercel's Incremental Static Regeneration, the static pages we create for these docs are always up-to-date. This means that if existing tutorials or references are changed, the changes will be visible on the docs website automatically.

Adding a new Tutorial Page

In the Haystack repo, add an entry to haystack/docs/_src/tutorials/tutorials/headers.py that corresponds to your new tutorial. When you push your changes to any branch, a GitHub Action calls haystack/docs/_src/tutorials/tutorials/convert_ipynb.py to generate a .md version of the tutorial in the same folder. These .md files are generally named after the tutorial number, for example 12.md.

Then, in this repo, add an entry to haystack-website/lib/constants.ts that refers to the new .md file in Haystack. Add the new file only to the latest version. If you remove files, remove them from the latest version as well. To make the tutorial appear in the left Table of Contents, add a new entry to haystack-website/docs/latest/menu.json.

Preview from non-main branches

To preview docs that are on a non-main branch of the Haystack repo, run this project locally and open lib/github.ts. There, add a ref parameter to the octokit.rest.repos.getContent function call, set to the name of the branch you would like to preview. You also need to add the tutorials/references you would like to preview to docs/{GIVEN_VERSION}/menu.json and lib/constants.ts.

For example:

const res = await octokit.rest.repos.getContent({
  owner: "deepset-ai",
  repo: "haystack",
  path: `docs${version && version !== "latest" ? `/${version}` : ""}${repoPath}${filename}`,
  // branch (or tag/commit) to read the file from
  ref: HAYSTACK_BRANCH_NAME
});
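The path expression in the getContent call shown above is dense, so here is the same logic pulled out into a standalone function as a sketch. The function name buildDocsPath and the sample repoPath and filename values are hypothetical; only the template expression itself comes from the call above.

```typescript
// Hypothetical helper wrapping the path expression from the getContent call.
// "latest" (or no version) maps to the unversioned docs folder; any other
// version gets its own subfolder.
function buildDocsPath(
  version: string | undefined,
  repoPath: string,
  filename: string
): string {
  const versionSegment = version && version !== "latest" ? `/${version}` : "";
  return `docs${versionSegment}${repoPath}${filename}`;
}

// Sample values below are assumptions for illustration only.
console.log(buildDocsPath("latest", "/_src/tutorials/tutorials/", "1.md"));
// prints "docs/_src/tutorials/tutorials/1.md"
console.log(buildDocsPath("v0.9.0", "/_src/tutorials/tutorials/", "1.md"));
// prints "docs/v0.9.0/_src/tutorials/tutorials/1.md"
```

Note that the ref parameter and the version segment are independent: ref chooses the branch, while the version segment chooses the folder within that branch.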

Redirects In Case of Renaming or Restructuring Pages

When renaming documentation pages, or restructuring the directories that contain them, the new file path can cause old links to break. For example, when the pipeline_nodes grouping was created, components/reader.mdx no longer existed because it had moved to pipeline_nodes/reader.mdx. As a result, existing links to the old page were broken.

To make sure links aren't broken please follow these steps:

  1. Identify what path is no longer valid and what new path is the most appropriate for it to point to

  2. Populate the redirects() function in next.config.js with an entry containing source, destination and permanent:

    {
      source: '/the/old/path',
      destination: '/the/new/path',
      permanent: true,
    }
    

    The haystack-website/docs/generate_redirect_table.py script will generate a set of suggested mappings. In cases where the directory structure has changed but the filename has stayed the same, this script will map from the old link to the new link in latest. In cases where the filename has changed, this script will identify the old link but not provide a suggestion for a new link. Update the MANUAL_REDIRECTS option to define any custom destinations.

  3. Push the changes to your branch and test that the old paths still work and point to the intended destination. You can do this by checking out the Preview that Vercel will produce.
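The steps above can be sketched as a small redirects() function. Next.js expects redirects to be an async function returning an array of objects with source, destination and permanent, and both paths must start with a slash. The MOVED_PAGES map and the /docs/latest URL prefix are assumptions for illustration; the real URL structure of the site may differ.

```typescript
type Redirect = { source: string; destination: string; permanent: boolean };

// Hypothetical old-to-new mapping, based on the reader example above.
// The "/docs/latest" prefix is an assumption about the site's URL layout.
const MOVED_PAGES: Record<string, string> = {
  "/docs/latest/components/reader": "/docs/latest/pipeline_nodes/reader",
};

// Same shape as the redirects() option in next.config.js.
async function redirects(): Promise<Redirect[]> {
  return Object.keys(MOVED_PAGES).map((source) => {
    const destination = MOVED_PAGES[source];
    // Fail early on a common mistake: paths without a leading slash.
    if (!source.startsWith("/") || !destination.startsWith("/")) {
      throw new Error(`Redirect paths must start with "/": ${source} -> ${destination}`);
    }
    return { source, destination, permanent: true };
  });
}
```

In next.config.js the function would be exported as part of the config object, for example module.exports = { redirects }. Keeping the mapping in one table makes it easy to paste in the suggestions produced by generate_redirect_table.py.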

Updating docs after a release

When there's a new Haystack release, we need to create a directory for the new version within the local /docs directory. In this directory, we can write new overview and usage docs in .mdx (or manually copy over the ones from the previous version directory). Once this is done, the project will automatically fetch the reference and tutorial docs for the new version from GitHub. Bear in mind that a menu.json file needs to exist in every new version directory so that our Menu components know which page links to display.

Moreover, links that currently point to the latest version need to point to the new version. Update the links in the docs using haystack-website/docs/update_links.py. The command you run should look something like python update_links.py -d v0.3.0 -v v0.3.0. The script prints the changes to the console; scan through them as a sanity check.

Additionally, the referenceFiles and tutorialFiles constants in lib/constants need to be updated with any new reference or tutorial docs created as part of the release. During a release, add a new object for the release number to both referenceFiles and tutorialFiles. This change also affects tutorials/[...slug].tsx and reference/[...slug].tsx: update the getStaticPaths and getStaticProps functions in both files with an array representing the latest version.

In the Haystack repo, we also have to release the API and tutorial docs by copying them to a new version folder. If you want to include files from a branch other than main, follow the steps in Preview from non-main branches. Lastly, update the constant specified in the components/VersionSelect component so that we default to the new version when navigating between pages.

After releasing the docs, we need to release the benchmarks. Create a new version folder in the folder benchmarks and copy all folders from latest to the new folder.

If you now start the local server and go to the new version, you will see the 404 page. We pull the versions from the Haystack release tags, and most likely the newest version has not been released yet. Therefore, you have to add it manually to the tagNames array in the getDocsVersions function by adding the statement tagNames.push('v0.10.0');.
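A slightly more defensive variant of that manual push is sketched below. The helper name withUnreleasedVersion is hypothetical and the body of getDocsVersions is not shown in this README, so treat this as an illustration of the idea rather than the project's code.

```typescript
// Hypothetical helper: returns the tag list with the unreleased version
// appended, unless a matching release tag already exists.
function withUnreleasedVersion(tagNames: string[], unreleased: string): string[] {
  return tagNames.includes(unreleased) ? tagNames : [...tagNames, unreleased];
}

// Sample tag names are assumptions for illustration.
const tagNames = withUnreleasedVersion(["v0.8.0", "v0.9.0"], "v0.10.0");
console.log(tagNames);
// prints [ 'v0.8.0', 'v0.9.0', 'v0.10.0' ]
```

Because the helper skips versions that already exist, the line can stay in place after the release tag is published without producing a duplicate entry.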

Styling

We use Tailwind for CSS. It's a CSS utility library, which allows us to write barely any CSS ourselves. The tailwind.config.js file contains configuration to provide classes that match deepset.ai's new style guide. Additionally, there is a styles/global.css file, which loads our custom font provided by the style guide. Lastly, we have two CSS module files within the components directory (markdown.module.css and tutorial.module.css), which are applied in the components/Layout component. These files let us provide defaults for certain HTML elements, which are applied to the HTML tags generated when we convert markdown to HTML at build time. We also use a React component library authored by the Tailwind team, called Headless UI. This allows us to easily create React components such as components/Tabs and components/Disclosures.

Deployment

This application gets deployed on Vercel. In the dashboard, connect the haystack-website repo to a new project and it should handle builds, preview environments (all branches other than main), and production environments (main branch) automatically. Be sure to include yarn haystack in the list of build commands.

Future Work

Convert the remote markdown files for references and tutorials to .mdx, so that we can inject React components into these. This would also allow for more code sharing between the overview+usage pages and tutorial+reference pages.

haystack-website's People

Contributors

agnieszka-m, baregawi, brandenchan, c4ndyfl1p, dependabot[bot], divya-19, dmigo, fstau, hsm207, ju-gu, julian-risch, kgeis, kubami, masci, maxast, mayankjobanputra, michelbartels, mpangrazzi, ontolox, oryx1729, piffpaffm, sjrl, stestagg, tanaysoni, tholor, tuanacelik, ugm2, ulyssebottello, vblagoje, zansara

haystack-website's Issues

Privacy Policy

As a user, I want to read the privacy statement so that I know what is going to happen with my data

+Alex

Small fixes in Documentation Website

  • When window is narrow, the text from Installation Editable can be seen
  • Text over text in DocumentStore/Choosing...
  • Even when on a documentation page, the window title is "Haystack Pricing"

[Bug] Mobile formatting

  • Navigation bar seems to be too much on the right
  • Navigation bar doesn't fully stick when scrolling
  • Edit button in docs

[Screenshots: three Android Chrome captures from 2020-10-23]

Terms and Conditions

As a user, I want to read the terms and conditions so that I know what is legal

+Alex

Extra navigation level for API Reference

We should have one additional level in the right navigation for the API Reference section (not for others) that shows the method names of the classes. Otherwise it's very hard to navigate.

If not possible on the right, then we could add a table of contents on the top of the markdown file - but this would be rather ugly.
Or we remove the level one headers (memory, faiss ...) during the markdown generation and rather have the class names as level 1 and methods as level 2

Aesthetic Changes for Documentation Website

  • Pick better font - try Lato
  • Bullet points seem too spaced out
  • Pick code font
  • Tabbed elements need borders (see DocumentStore/Choosing…)
  • Boxes around tips, tricks, recommendations
  • Make the hierarchy of different header/title levels clearer

Optimize link checker

The link checker run currently takes about 40 minutes. The goal should be no more than 20 minutes for a deployment run.

Look into ways to optimize it

Integrate a Haystack Search Bar

Add a search bar to the website and integrate it with the Haystack Hub API. The backend crawls all data from the website and gives back answers.

Could also be some kind of chatbot.

Solution pages

Pages:

  • Financial Governance
  • Portal Search
  • Market & Competitor Intelligence

As a user, I want to figure out a business case for my company so that I can implement it in the right context

Social media feed

We want to see the latest social media activities for haystack on the landing page.

Milvus as Production Example

Look into the open source project Milvus in order to understand their Gatsby setup. Do they use any special themes? How do they set up proper navigation and include Markdown files?

Define current deployment processes and automate gaps

Let's define all the steps for editing and deploying our website

  • Create overview on Wiki

  • Document how different sources are handled (e.g. tutorials / docstrings / benchmarks)

  • Define some requirements on a good system (e.g. local development, scripts to locally pull latest changes from Haystack)

  • Definition of internal and external link protocols (e.g. best not to have links that point to github current master since this can change)

  • Plan work on improvements

  • Integrate docstring generation into deployment process (option to deploy it to staging) -> Bring all together in one script (@brandenchan) and trigger this script in Github actions (@PiffPaffM)

  • Integrate Tutorial generation into deployment process (option to deploy it to staging) -> Use Python script in Github Actions (@PiffPaffM)

  • Automate "Change File Process" -> Come up with initial design (@PiffPaffM)

  • Automate "New File Process" -> Come up with initial design (@PiffPaffM)

  • Automate "New Version Process" -> Deployment to staging and then PR for haystack website (@PiffPaffM)

Contact Form

As a user, I want to contact deepset for open questions or feedback so that questions can be clarified.

Docs for Haystack Hub

As a user, I want to know how to use Haystack Hub in detail so that I can use it more efficiently.

Pricing page

As a user, I want to know the various pricing options and features within it so that I can choose the right product.

SEO for QA

haystack.deepset.ai should be top result on google.com

Select more than one tab per page

Currently, only one tab can be active per page because tabs are implemented with links. We want to be able to select more than one tab per page.

Landing page for Haystack Hub

As a user, I want to see all important information for Haystack Hub so that I have an overview about functionality, references etc.

Social Cards for Pages

Extend social card for the different pages of the web site so that we can share them in social media

Add nested tabs to docs

Currently, we only have normal tabs with one level. We want nested tabs for the languages so that the docs are more readable.

Deployment pipeline for documentation

Add deployment pipeline to the documentation. Current process:

  1. Change documentation in haystack core (Text, docstrings or notebooks)
  2. Add new documents to haystack-io
  3. Deploy haystack-io

Trigger Deployment for Docs Changes in haystack

Currently, we have a script in haystack-website to deploy a new version of the website. During this process all markdown files are synced with the last version from haystack master. This should be triggered from the haystack repository for new doc versions.

  • Trigger: Merge docs related changes to master (maybe filter by directory or tag)

Technology research

Decide whether we want to use Gatsby or React as the framework for development

Add version dropdown

Is your feature request related to a problem? Please describe.
We currently have only a single version of docs that reflects the latest master branch.
This is problematic when people are using a released version (e.g. 0.4.0) and the docs are showing "newer" functions

Describe the solution you'd like

  • Have separate folders for the docs in github (latest, 0.4.0 ...)
  • Have a dropdown on the top left on the website to choose which version to display
  • Editing of all versions is possible (e.g. to fix typos), but docstrings etc. don't change anymore after a release

[Bug] first line in API docs

We currently have a weird headline in all API docs.
Does this come from wrongly converted markdown files or is this about some integration in the website?

[EPIC] Documentation Landing Page for Haystack Core

Description

This feature will allow users to understand the concepts of Haystack. Moreover, they will get an introduction to how to use it.
We have multiple users using Haystack Core for their projects. Publicly available documentation will make it easier to start with Haystack and understand its benefits. This way, the user base will grow faster and existing users will be more successful in implementing the framework in the right way.
Ideally, the design should be enterprise ready and the documentation should be understandable.

Initiative / goal

Describe how this Epic impacts an initiative the business is working on.

Hypothesis

The public documentation will increase the user base and decrease the number of issues that are related to user errors while setting up Haystack.

Acceptance criteria and must have scope

[] Enterprise ready design
[] Documentation for Usage, API and Architecture

Stakeholders

Branden (Writing documentation)
Malte (Owner)
Markus (Setting up website)

Timeline

The first version should be online on 11/09/2020

Increase line spacing

It's a bit hard to read longer text passages in the docs. Increasing the space between lines a bit would be helpful.

Update Documentation Website Content

  • Recommend top-k retriever = 10, top-k reader = 5
  • Check top-k defaults in Haystack
  • Too many headings in "Writing Documents"
  • DocumentStore page needs to talk about Preprocessing, recommended splitting
  • Need to give top-k recommendations
  • Languages other than English deserve their own section (Let @PiffPaffM know when new page is created)
  • Write note about ES analyser (tokenizer) which can be language specific
  • Section on Preprocessing
  • RAG
  • Create new bigger groups: Overview vs Usage
  • Create Optimization section that includes top-k, doc length, model choices
  • Add RAG tutorial to website
  • Add RAG tutorial to github readme
  • Technology section maybe deserves to be its own page
  • Copy from site to latest
  • Create an image that shows early preprocessing stages and clarifies terminology (file -> doc -> n_docs)
  • Solve nbconvert issues for Tutorial 7
  • Edit script to download new files @PiffPaffM
  • Long tutorial titles overlap edit button @PiffPaffM
  • Talk about the choice of returning 5 docs of 10K vs 10 docs of 5k
  • Regenerate API docs
  • Rewrite docstring examples in new format
