RAG

Project Summary: Intelligent Document Analysis with Retrieval-Augmented Generation (RAG) and Vector Search

This open-source project leverages Optical Character Recognition (OCR) to convert files in various formats (PDF, TIFF, PNG, JPEG) into text. It integrates Retrieval-Augmented Generation (RAG) for extracting relevant attributes from the text. The core functionality involves taking a query text as input, performing a vector search to identify relevant parts of the file, and using Large Language Model (LLM) providers such as OpenAI, KIMI, and Tencent Hunyuan to generate answers from the search results.
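The retrieval core described above (embed the query, compare it against stored chunk embeddings, keep the best matches) can be sketched in plain Python. This is a toy illustration with hand-made vectors; the project itself uses provider embeddings and a real vector database:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs; returns the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A query vector pointing in the same direction as a chunk's embedding scores near 1.0, so that chunk's text is returned first.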

Feature List

| Feature | Description |
| --- | --- |
| File Upload | Facilitates the upload of files in supported formats for processing. |
| Multi-format OCR | Supports OCR for PDF, TIFF, PNG, and JPEG files, converting them into text. |
| Vector Search | Performs a vector search to identify relevant parts of the text based on embeddings. |
| LLM Integration | Integrates with LLM providers such as OpenAI, KIMI, and Tencent Hunyuan for generating responses. |
| Embedding-based Retrieval | Uses vector embeddings for accurate and efficient information retrieval. |

Getting Started with RAG

Install with Docker

  1. Clone the repo

  2. Set necessary environment variables

    Make sure to set your required environment variables in the .env file. You can read more about how to set them up in the API Keys section.

  3. Deploy using Docker

    With Docker installed and the rag repository cloned, navigate in your terminal or command prompt to the directory containing the Dockerfile. Run the following commands to build the image and start the rag application in detached mode, which allows it to run in the background:

# clone rag repo
git clone https://github.com/likid1412/rag

# navigate to rag
cd rag

# build the image; this will download the necessary Docker base images
docker build -t rag .

# run and start rag
docker run --env-file .env -dt --name rag -p 80:80 rag

# check rag logs; once startup succeeds, you should see `Application startup complete.`
docker container logs rag

Remember, Docker must be installed on your system to use this method. For installation instructions and more details about Docker, visit the official Docker documentation.

You can read FastAPI in Containers for a quick start.

  4. Access rag
  5. Logs

    Logged messages are sent both to the app.log file and to stdout using loguru

  • The app.log file is located at /rag/app.log
  • For stdout, check it with a command such as docker container logs -f rag; run docker container logs --help to read more
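The project uses loguru for this dual file-plus-stdout setup; as a rough stdlib-`logging` equivalent (a sketch of the same idea, not the project's actual configuration), it looks like:

```python
import logging
import os
import sys
import tempfile

# Stand-in path for /rag/app.log so the sketch runs anywhere.
log_path = os.path.join(tempfile.gettempdir(), "app.log")

logger = logging.getLogger("rag")
logger.setLevel(logging.INFO)
# One handler writes to the log file, the other to stdout.
for handler in (logging.FileHandler(log_path), logging.StreamHandler(sys.stdout)):
    handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
    logger.addHandler(handler)

logger.info("Application startup complete.")
```

With this in place, the same line appears both in the container's stdout (visible via `docker container logs`) and in the log file.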

API Keys

Before starting rag, you'll need to configure access to various components depending on your chosen technologies (such as OpenAI, hunyuan, and Kimi) via a .env file. Create this .env file in the directory from which you start rag. See .env.example for an example.

Make sure to set only the environment variables you intend to use; variables with missing or incorrect values may lead to errors.

Below is a comprehensive list of the API keys and variables you may require:

| Environment Variable | Value | Description |
| --- | --- | --- |
| MINIO_ENDPOINT | The endpoint of your MinIO storage | See Minio as local storage |
| MINIO_ACCESS_KEY | MinIO access key | See Minio as local storage |
| MINIO_SECRET_KEY | MinIO secret key | See Minio as local storage |
| TENCENT_VECTOR_URL | URL for Tencent Vector Database | Access to Tencent Vector Database |
| TENCENT_VECTOR_USER | Username for Tencent Vector Database | Access to Tencent Vector Database |
| TENCENT_VECTOR_KEY | API key for Tencent Vector Database | Access to Tencent Vector Database |
| TENCENTCLOUD_SECRET_ID | Tencent Cloud Secret ID for the Tencent hunyuan LLM | Access to the Tencent API for the hunyuan LLM |
| TENCENTCLOUD_SECRET_KEY | Tencent Cloud Secret Key for the Tencent hunyuan LLM | Access to the Tencent API for the hunyuan LLM |
| TENCENT_MODEL | Tencent hunyuan model name | Tencent hunyuan model |
| API_KEY | OpenAI SDK API key | API key for OpenAI or a compatible LLM provider such as Kimi |
| BASE_URL | OpenAI SDK base URL | Base URL for OpenAI or a compatible LLM provider such as Kimi |
| MODEL | OpenAI SDK model name | Model of OpenAI or a compatible LLM provider |
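For illustration, a minimal .env using only the OpenAI-compatible and MinIO variables from the table might look like the fragment below. All values are placeholders, not working credentials; check .env.example for the authoritative format:

```shell
# OpenAI-compatible LLM provider (example values for Kimi / Moonshot)
API_KEY=sk-your-api-key-here
BASE_URL=https://api.moonshot.cn/v1
MODEL=moonshot-v1-8k

# MinIO local storage
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=your-minio-access-key
MINIO_SECRET_KEY=your-minio-secret-key
```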

Storage

MinIO is used as local storage; see Minio as local storage for more detail.

Embedding

OpenAI embedding

You can get an API key from OpenAI

Tencent hunyuan embedding

Check hunyuan-embedding-API for more detail.

You can find instructions for obtaining a key here

Vector Database

You can get access from Tencent Vector Database

LLM providers

OpenAI

You can get an API key from OpenAI

Kimi (Moonshot)

Check Moonshot for more detail.

You can find instructions for obtaining a key here

Tencent hunyuan

Check hunyuan for more detail.

You can find instructions for obtaining a key here

Endpoint usage examples

Once you have access to rag, you can interact with the API using the interactive API docs. Below are usage examples for each endpoint.

File Upload Endpoint

Functionality

  • Accepts one or more file uploads (limited to PDF, TIFF, PNG, and JPEG formats).
  • Saves the processed files to a storage solution (e.g., MinIO), returning one or more unique file identifiers or signed URLs for the upload.

Usage example

  • Read the alternative automatic documentation for more details: Upload - ReDoc
  • Try it out: File Upload Endpoint: /upload
  • Click Add string item, choose a file to upload, and the endpoint will return the uploaded file info: the original file name, a unique file ID, a signed URL, and a unique file name that you can search for in MinIO

OCR Endpoint

Functionality

  • Runs an OCR service on the file downloaded from the signed_url
  • Processes the OCR results with embedding models (e.g., OpenAI, Tencent hunyuan)
  • Uploads the embeddings to a vector database (e.g., Pinecone, Tencent Vector Database) for future searches.

Usage example

  • Read the alternative automatic documentation for more details: Ocr - ReDoc
  • Try it out: OCR Endpoint: /ocr
  • Fill in the signed_url value with the URL returned by the upload endpoint. The endpoint returns immediately because the tasks mentioned above take some time and run in the background.
  • You can check progress using the Get OCR Progress Endpoint.

Get OCR Progress Endpoint

Functionality

  • Get OCR progress

Usage example
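Since the OCR endpoint runs its work in the background, a client typically polls this endpoint until the job finishes. A minimal polling sketch is shown below; the progress endpoint's path and response schema are not given here, so the fetch function is injected rather than hard-coded:

```python
import time


def wait_for_ocr(fetch_progress, poll_seconds=1.0, timeout=60.0):
    """Poll a progress-returning callable until it reports completion.

    fetch_progress() is assumed to return a float in [0, 1]; in practice it
    would wrap an HTTP GET to the Get OCR Progress endpoint and pull the
    progress value out of the JSON response.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_progress() >= 1.0:
            return True  # OCR, embedding, and upload tasks are done
        time.sleep(poll_seconds)
    return False  # gave up before the background tasks finished
```

Injecting the fetch callable keeps the retry logic independent of the exact response shape, which you should confirm in the interactive API docs.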

Attribute Extraction Endpoint

Functionality

  • Takes a query text and file_id as input, performs a vector search, and returns relevant text based on the embeddings.
  • Chats with an LLM provider (e.g., OpenAI, Tencent hunyuan) to generate the answer from the search results.

Usage example

  • Read the alternative automatic documentation for more details: Extract - ReDoc
  • Try it out: Attribute Extraction Endpoint: /extract
  • Takes a query text and file_id as input and a choice of LLM provider API (OpenAI or hunyuan); returns the answer generated by the LLM together with the relevant texts retrieved from the vector database for that file_id
    • For the OpenAI API, you can use an OpenAI model or a compatible LLM provider's model such as Kimi
    • For the hunyuan API, you can use a Tencent hunyuan model
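As an illustrative client sketch for this endpoint: the request field names and the port below are assumptions based on the description above, not the actual schema; check the interactive API docs for the real request body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:80"  # where the rag container is published in the Docker example


def build_extract_payload(query: str, file_id: str, api: str = "openai") -> dict:
    """Assemble the /extract request body (field names are hypothetical)."""
    return {"query": query, "file_id": file_id, "api": api}


def extract(query: str, file_id: str, api: str = "openai") -> dict:
    """POST the payload to /extract and return the parsed JSON answer."""
    data = json.dumps(build_extract_payload(query, file_id, api)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/extract",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Swapping `api` between `"openai"` and `"hunyuan"` would select the provider, mirroring the choice described in the bullet list above.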

TODO

One More Thing

About Chunking Strategies

The OCR result appears to already divide the content according to its structure and hierarchy (i.e., into paragraphs), producing semantically coherent pieces, so we can simply use fixed-size chunking based on those paragraphs.
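Fixed-size chunking on paragraph boundaries, as described, can be sketched like this (a minimal illustration, not the project's actual implementation; `max_chars` is an arbitrary budget):

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Pack paragraphs into chunks of up to max_chars characters.

    Paragraphs are never split: they are appended to the current chunk
    while it fits, and an oversized paragraph becomes its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # +2 accounts for the blank line re-inserted between paragraphs.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole paragraphs per chunk preserves the semantic coherence the OCR structure already provides, which is the point of the strategy above.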

Read more: Chunking Strategies for LLM Applications | Pinecone
