mt7180 / quaigle

RAG-based LLM application and a project to explore different LLM frameworks like LlamaIndex, Marvin, and LangChain


quaigle's Introduction

Quaigle

Quaigle is a RAG-based LLM application and a project to explore different frameworks to provide generative AI with additional knowledge sources outside of its own training data.

[Screenshot: frontend view]

In general, Retrieval Augmented Generation (RAG) enables more factual consistency, improves reliability of the generated responses, and helps to mitigate the problem of "hallucinations".

In this project, RAG is used to specialize the model on a specific context and to answer questions about uploaded documents.

The external data sources can originate from multiple files and may come in different formats like:

  • txt files,
  • pdf files, or
  • websites (currently limited to static pages).

These data sources are converted into numerical vector representations, called embeddings, and stored in a vector database.

Vector databases empower semantic search: instead of relying on exact keyword matching, the actual meaning of the query is considered. Through the encoding of the (text) data into meaningful vector representations, the distances between vectors reflect the similarities between the elements. Utilizing algorithms like Approximate Nearest Neighbor (ANN), they enable rapid retrieval of results that closely match the query, facilitating efficient and precise searches.
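For illustration, a minimal sketch of how embedding similarity drives such a search (the toy vectors and document labels are made up; real embeddings come from an embedding model and have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (illustrative only)
query = np.array([0.9, 0.1, 0.0])
docs = {
    "doc about cats": np.array([0.8, 0.2, 0.1]),
    "doc about finance": np.array([0.1, 0.1, 0.9]),
}

# Rank documents by semantic closeness to the query, not by keyword overlap
ranked = sorted(docs.items(), key=lambda kv: cosine_similarity(query, kv[1]), reverse=True)
print(ranked[0][0])  # -> "doc about cats"
```

A vector database does the same ranking at scale, using ANN indexes instead of a brute-force sort.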

In addition to RAG, this project also explores querying with functions to interact with SQL databases using natural language.

Frameworks and Techniques

In this project, the LLM frameworks

  • LangChain,
  • LlamaIndex, and
  • Marvin

were examined and utilized according to their strengths (as of 11/2023) in the scope of RAG and querying with functions (to interact with a SQL database).

LlamaIndex

LlamaIndex turned out to be the most straightforward approach for transforming external data into embeddings, storing them in a persistent vector database, and setting up a stateful chat agent. The general approach needs only a few lines of code (see the sketch below).
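A minimal sketch of this pattern, assuming the llama_index package layout as of late 2023 (the data directory and question are illustrative):

```python
from llama_index import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# Load documents, embed them, and persist the vector index to disk
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Later: reload the persisted index and chat with the data
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
chat_engine = index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What is this document about?"))
```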

LangChain

LangChain's approach of chaining together different components and functions to create more advanced LLM applications, such as querying a database in natural language, turned out to be powerful. The chain structure makes it possible to give the LLM a specific procedure as an instruction, which makes this approach very explicit (see the sketch below).
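A minimal sketch of the chaining idea, using LangChain's runnable (LCEL) pipe syntax as of late 2023; the prompt and model choice are illustrative:

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

# Each step's output feeds the next step — the chain makes the procedure explicit
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "LangChain composes LLM calls from small, explicit steps."}))
```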

Marvin

Marvin's strengths are classification and structured output. It is not used for generating the multiple-choice questions (which require exactly this kind of structured output for further use in this project) because connecting to the vector database is possible but not straightforward with the pure Marvin library. Marvin is, however, used in this project for the extraction and augmentation of metadata about the given text, which is used for the vector index retriever (see the sketch below).
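A hedged sketch of this kind of metadata extraction, assuming Marvin 1.x's ai_model decorator (referenced in the project's issues); the TextMetadata fields are illustrative, not the project's actual model:

```python
from pydantic import BaseModel
from marvin import ai_model

@ai_model  # Marvin fills the fields by calling an LLM on the input text
class TextMetadata(BaseModel):
    category: str  # e.g. "science", "news", "fiction" (illustrative fields)
    summary: str   # short summary used to enrich the vector index nodes

meta = TextMetadata("Long article text goes here ...")
print(meta.category, meta.summary)
```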

The Different Tasks

Tech Stack

Text-Querying with LlamaIndex CondenseQuestionChatEngine

A LlamaIndex CondenseQuestionChatEngine with a RetrieverQueryEngine is used to extract relevant context from the vector database, which is then used for querying the OpenAI API.

As a chat engine, it provides a high-level interface for having a conversation with your data (multiple back-and-forth exchanges instead of a single question and answer), like with ChatGPT but augmented with your own knowledge base. It is stateful and keeps track of the conversation history.

The concept of decoupling "chunks used for retrieval" from "chunks used for synthesis", recommended for performant RAG applications, is applied here: the CondenseQuestionChatEngine is responsible for synthesis, while the RetrieverQueryEngine handles retrieval. The RetrieverQueryEngine uses a VectorIndexRetriever with a VectorStoreIndex, which is based on nodes. These nodes are chunks that were parsed by the SimpleNodeParser.

By using this setup, the chunks used for retrieval (handled by the RetrieverQueryEngine) are decoupled from the chunks used for synthesis (handled by the CondenseQuestionChatEngine). This allows relevant documents to be retrieved more efficiently and accurately before the specific chunks needed for synthesis are fetched.

-> https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/dev_practices/production_rag.html

But let's start from the beginning and clarify the tasks of the different components used (a combined sketch follows this list):

  • A SimpleNodeParser is used first: a tool from the LlamaIndex library that chunks documents into smaller nodes for indexing and retrieval, allowing large documents to be processed and searched more efficiently. It takes a list of documents and splits them into nodes of a specific size, with each node inheriting the attributes of the original document, such as metadata, text, and metadata templates.

  • The chunking is done using a TokenTextSplitter, with a default chunk size of 1024 tokens and a chunk overlap of 20 tokens.

  • The MetadataExtractor is used in the LlamaIndex library to extract contextual information from documents and add it as metadata to each node.

  • The VectorStoreIndex enables efficient indexing and querying of documents based on vector stores. It stores embeddings for the input text chunks and provides a query interface for retrieval, querying, deleting, and persisting the index.
    The VectorStoreIndex can be constructed over any collection of documents and uses a vector store within the index to hold the embeddings. By default, it uses an in-memory SimpleVectorStore that is initialized as part of the default storage context, but it also supports various other vector stores such as DeepLake, Elasticsearch, Redis, Faiss, Weaviate, Zep, Pinecone, Qdrant, Cassandra, Chroma, Epsilla, Milvus, and Zilliz.
    Once the index is constructed, you can query it by creating a query engine and executing queries:

```python
# Query the index through a query engine
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```
  • A VectorIndexRetriever is used to retrieve nodes from a VectorStoreIndex based on similarity search, which allows for efficient retrieval of similar nodes from the index.
    Once the VectorIndexRetriever is created, you can use its retrieve() method to perform a similarity search: pass in the query, and it returns the most similar nodes from the index.

  • A RetrieverQueryEngine is an end-to-end pipeline that allows you to perform queries and retrieve relevant context from a knowledge base using a retriever. It takes in a natural language query and returns a response along with the reference context retrieved from the knowledge base.
    The RetrieverQueryEngine uses a retriever, which defines how to efficiently retrieve relevant context from a knowledge base when given a query. One example of a retriever is the VectorIndexRetriever, which retrieves nodes from a VectorStoreIndex based on similarity search.
    The RetrieverQueryEngine handles the orchestration of the retrieval process and provides a convenient interface for querying.

  • The CondenseQuestionChatEngine is designed to condense the question, in combination with the conversation context, into a single representative question to query the query engine. Its use case is to improve the performance and accuracy of question-answering systems by reducing redundancy and optimizing the retrieval process. https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_condense_question.html
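Putting these components together, a hedged sketch of the retrieval/synthesis wiring (imports follow the llama_index package layout as of late 2023; the data directory, chunk sizes, and top-k are illustrative):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.chat_engine import CondenseQuestionChatEngine

# Parse documents into nodes (chunks) that inherit the documents' metadata
documents = SimpleDirectoryReader("./data").load_data()
parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

# Build the index over the nodes and a retriever on top of it
index = VectorStoreIndex(nodes)
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)

# Retrieval is handled by the query engine, synthesis by the chat engine
query_engine = RetrieverQueryEngine.from_args(retriever=retriever)
chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=query_engine)

print(chat_engine.chat("What are the key points of the document?"))
```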

Generating Multiple-Choice Questions on a Given Context with LlamaIndex and the LangChain PydanticOutputParser

In addition to having a conversation with your data, Quaigle also offers the possibility to generate multiple-choice questions on the given context. The challenge here was to obtain structured output from the LLM, which is designed and optimized for processing and generating unstructured natural text. For the further programmatic use of the generated information, however, it is necessary to be able to rely on type-safe answers. It turned out that the Pydantic framework is very powerful at forcing the LLM to generate an answer in a specific output structure and offers a very clean and explicit way to achieve this. As LlamaIndex was used for RAG in this project, the LangChain PydanticOutputParser was used here via LlamaIndex.
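A hedged sketch of the structured-output idea with LangChain's PydanticOutputParser; the MultipleChoiceQuestion model and the prompt are illustrative, not the project's exact code:

```python
from typing import List

from pydantic import BaseModel, Field
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate

class MultipleChoiceQuestion(BaseModel):
    question: str
    options: List[str] = Field(description="exactly four answer options")
    correct_option: int = Field(description="index of the correct option")

parser = PydanticOutputParser(pydantic_object=MultipleChoiceQuestion)

# The format instructions tell the LLM exactly which JSON schema to emit
prompt = PromptTemplate(
    template="Create one multiple-choice question about:\n{context}\n{format_instructions}",
    input_variables=["context"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

llm = ChatOpenAI(temperature=0)
# parse() raises if the answer does not match the schema — type-safe output
mcq = parser.parse(llm.predict(prompt.format(context="...your retrieved context...")))
print(mcq.question, mcq.options, mcq.correct_option)
```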

Querying SQL Databases with LangChain SQLDatabaseChain

Quaigle supports querying a SQLite database in natural language, which makes databases accessible to everyone. The size of the uploadable database is currently limited by the frontend (max 40 MB), but with appropriate frontend scaling there is no hard limit. A LangChain SQLDatabaseChain with Runnables is used to provide natural language answers. For convenience, the generated SQL query is also included in the output.
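A hedged sketch of this setup; SQLDatabaseChain lives in langchain_experimental as of late 2023, and the database path and question are illustrative:

```python
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

# Connect to the uploaded SQLite file (path is illustrative)
db = SQLDatabase.from_uri("sqlite:///uploaded.db")
llm = ChatOpenAI(temperature=0)

# return_intermediate_steps exposes the generated SQL alongside the answer
chain = SQLDatabaseChain.from_llm(llm, db, return_intermediate_steps=True)
result = chain("How many rows does the largest table have?")
print(result["result"])              # natural language answer
print(result["intermediate_steps"])  # includes the generated SQL query
```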

Continuous Deployment with GitHub Actions

The whole application is deployed via CI/CD using GitHub Actions and Pulumi for code-based creation of the infrastructure (IaC). The backend is deployed to an AWS EC2 instance and the frontend is hosted on fly.io. Changes to the frontend, backend, or infrastructure on the GitHub main branch trigger automated updates. See the full GitHub Actions yml file here: .github/workflows/main.yml
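A hedged sketch of the Pulumi (Python) pattern for provisioning such an EC2 instance; the AMI id, instance size, and names are placeholders, not the project's actual infrastructure code:

```python
import pulumi
import pulumi_aws as aws

# Minimal EC2 instance for the backend (AMI id and size are placeholders)
backend = aws.ec2.Instance(
    "quaigle-backend",
    ami="ami-0123456789abcdef0",  # placeholder AMI id
    instance_type="t3.small",
    tags={"Name": "quaigle-backend"},
)

# Expose the public IP of the instance as a stack output
pulumi.export("public_ip", backend.public_ip)
```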

Streamlit User Interface

[Screenshots: Streamlit frontend views]

Future Improvements

quaigle's People

Contributors

mt7180, Zaubeerer

Stargazers

Bob Belderbos

Watchers

Ankush Singal

Forkers

siri1410

quaigle's Issues

Adapt upload-route to use new Document Class

The class LlamaTextDocument loads and converts a text file into LlamaIndex nodes and a marvin ai_model predicts the text category and gives a short summary of the text.

  • make use of the LlamaTextDocument class by instantiating it with the UploadFile filename
  • implement a callback_manager to get information about the number of tokens used (see the sketch below)
  • use the callback manager while classifying and summarizing the text of the document and hand the results over to logging in debug mode
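A hedged sketch of what such a callback-manager setup could look like with LlamaIndex's TokenCountingHandler (late-2023 API; not the project's actual code):

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count tokens spent by LLM and embedding calls during indexing/querying
token_counter = TokenCountingHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

print("embedding tokens:", token_counter.total_embedding_token_count)
print("LLM tokens:", token_counter.total_llm_token_count)
```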

fix url website reader in backend

With a recent installation of the packages from requirements.txt, the OpenAI API sends back a 500: "InvalidRequestError: Missing parameter 'name': messages with role 'function' must have a 'name'." when reading some websites (not Wikipedia).

Deploy FastAPI backend to AWS using Pulumi

@mt7180, wow, congratulations!

I would say that you have successfully tested fly, but it may not be the right tool for this job.

So my suggestion is that I give you an intro to pulumi and we set up the backend in AWS with an EC2 instance.

Like such, we can get some beefy machines to run the backend.
We can even set it up with prefect such that the EC2 instance gets automatically started and stopped to save money ;-)

Would you like to explore that right now, or first tackle the second app and come back to this later?

Originally posted by @Zaubeerer in #45 (comment)

Detail backend (api) for qa on text files

  • write a TextDocument class to load text files and convert them into Llama nodes
  • write a custom ChatEngine class to use and customize the VectorIndexRetriever, and load the document nodes into it
  • test the main functionality in a script guarded by if __name__ == "__main__"

create test on upload route functionality processing file AND str

I totally messed up here, since I didn't recognize that the url upload works, but the file upload doesn't anymore. Currently working on a fix, but until now wasn't able to resolve the 422 Unprocessable Entity. I found out that it is not possible to combine UploadFile and a string url in one BaseModel, and as separate route function parameters it does not work either, yet ... working on this ...

Originally posted by @mt7180 in #29 (comment)

Compare results/ ai answers for different ai wrappers

LlamaIndex turned out to be the most straightforward approach for transforming external data into embeddings, storing them in a persistent vector database, and setting up a stateful chat agent. The general approach needs only a few lines of code. Nevertheless, since the development of AI frameworks is currently moving very fast, new chat engine interfaces may evolve that are even better suited here. Investigate whether better chat engine implementations have emerged in the meantime.

implement quiz function on texts/ html

The Streamlit frontend has a quiz section, where a user should be able to ask to be quizzed on the submitted text or website url.

  • check if a text/ html is submitted via uploader or url on frontend
  • implement route on backend to post questions and multiple choice answers
  • implement request for questions on frontend
  • implement functionality for multiple choice test on frontend

fix streamlit frontend with new streamlit version

  • the options menu disappears with the newest streamlit option-menu version
  • chat_input is no longer fixed to the bottom with the newest streamlit version (it needs to be outside the container to stay fixed to the bottom)

investigate how to leverage marvin to make qa bot

  • Is marvin able to get access to a (sqlite) database and answer questions about it?
  • Is it possible to tell marvin to save the vector database locally (in the session), so that the text does not have to be converted again every time?
  • Is it possible to generate python code with marvin to make diagrams?
  • Is langchain possibly the better choice? A langchain POC was already successful.

fix MARVIN_OPENAI_API_KEY error

Something changed so that marvin doesn't get the OpenAI API key anymore. The other frameworks, like langchain and llama index, still have access ....
