Integrated with LangChain to use the OpenAI API or Llama to read PDFs.
The project includes several components, which can be visualized as below.
Backend
The two major components of the backend server are Flask and LangChain: Flask serves all REST endpoints, and LangChain handles interaction with the LLMs.
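As a minimal sketch of how the two pieces fit together, the Flask endpoint below routes a user's question to an answering function. The route path `/api/ask` and the `answer_question` helper are illustrative assumptions; in the real backend, the helper would be a LangChain chain running retrieval and generation.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def answer_question(question: str) -> str:
    # Stand-in for the LangChain chain that queries the LLM;
    # the real implementation runs retrieval + generation.
    return f"(answer to: {question})"

@app.route("/api/ask", methods=["POST"])
def ask():
    payload = request.get_json()
    return jsonify({"answer": answer_question(payload["question"])})

# Exercise the endpoint with Flask's built-in test client
client = app.test_client()
resp = client.post("/api/ask", json={"question": "What is in the PDF?"})
print(resp.get_json()["answer"])
```

Keeping the LLM logic behind a plain function like this makes it easy to swap OpenAI for Llama without touching the REST layer.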
PostgreSQL
Postgres stores metadata, including user information and metadata for the PDFs that users upload.
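A rough sketch of what that metadata could look like, using an in-memory SQLite database as a stand-in for Postgres. The table and column names here are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the schema below is
# illustrative, not the project's real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE pdfs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    file_name TEXT NOT NULL,
    s3_key TEXT NOT NULL
);
""")
conn.execute("INSERT INTO users (email) VALUES ('user@example.com')")
conn.execute(
    "INSERT INTO pdfs (user_id, file_name, s3_key) VALUES (1, 'report.pdf', 'pdfs/1/report.pdf')"
)
row = conn.execute("SELECT file_name FROM pdfs WHERE user_id = 1").fetchone()
print(row[0])
```

Note that only metadata lives here; the PDF bytes themselves go to S3, referenced by `s3_key`.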
AWS S3
All uploaded PDFs are stored in S3.
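One common pattern, sketched below under assumptions (the key scheme and `upload_pdf` helper are hypothetical, not taken from this project), is to namespace each object key per user so uploads never collide. The actual boto3 upload is wrapped in a function that is defined but not called, since it needs AWS credentials.

```python
import uuid

def build_s3_key(user_id: int, filename: str) -> str:
    # Per-user prefix plus a random component avoids key collisions;
    # this exact scheme is an illustrative assumption.
    return f"pdfs/{user_id}/{uuid.uuid4().hex}-{filename}"

def upload_pdf(local_path: str, bucket: str, key: str) -> None:
    # Not executed here: requires boto3 and AWS credentials.
    import boto3
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

key = build_s3_key(42, "report.pdf")
print(key)
```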
Pinecone
Pinecone, a vector database, stores the vector embeddings of each PDF document; these embeddings are then used to retrieve content from the PDF that is relevant to the user's questions.
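The retrieval idea can be shown without Pinecone at all: embed the question, then return the PDF chunk whose embedding is most similar by cosine similarity. The toy 3-dimensional vectors below are made up for illustration; real embeddings come from an embedding model and are queried through Pinecone's index.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" of PDF chunks, keyed by section name.
chunks = {
    "intro":   [0.9, 0.1, 0.0],
    "methods": [0.1, 0.8, 0.3],
    "results": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # embedding of the user's question

# Return the chunk most similar to the question.
best = max(chunks, key=lambda name: cosine(query, chunks[name]))
print(best)
```

Pinecone performs this same nearest-neighbor search at scale, so the application never scans every chunk itself.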
Make sure you have the following installed:
- Python3
- Docker

Then follow these steps:
- Create a `.env` file with the global variables from `env-var-list`.
- Create a virtual environment: `python -m venv .venv`
- Activate the virtual environment: `source .venv/bin/activate`
- Start the Postgres and Redis instances with `inv run-local-infra`.
- Initialize the database with `inv init-db`.
- Start the application with `inv dev`.
- Open a new terminal (make sure to activate the virtual env there too), then start the Celery worker with `inv devworker`.
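For orientation only, a `.env` for a stack like this one (Postgres, Redis, S3, Pinecone, OpenAI) might contain variables along the lines below. Every name here is a guess for illustration; the authoritative set is the one in `env-var-list`.

```
# Hypothetical variable names -- consult env-var-list for the real ones
OPENAI_API_KEY=
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/app
REDIS_URL=redis://localhost:6379/0
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
S3_BUCKET_NAME=
PINECONE_API_KEY=
```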