GithubHelp home page GithubHelp logo

limcheekin / embedchain Goto Github PK

View Code? Open in Web Editor NEW

This project forked from embedchain/embedchain

0.0 1.0 0.0 44 KB

Framework to easily create LLM powered bots over any dataset.

Home Page: https://twitter.com/taranjeetio/status/1671539269775634437

License: Apache License 2.0

Python 62.59% Jupyter Notebook 37.41%

embedchain's Introduction

embedchain

PyPI

This is the fork of Embedchain to run with OpenAI API compatible llama-cpp-python Web Server. Hence, you can run the Embedchain with any LLMs supported by llama-cpp-python package.

The notebook embedchain.ipynb is created to quick test the integrations is working fine. It is tested with orca-mini-7b.ggmlv3.q4_0.

Use of HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2") instead of OpenAIEmbeddings as the embedding endpoint of the llama-cpp-python Web Server is too slow and using too much compute resources to be usable. Hence, you need to install additional dependency, the sentence_transformers package in order to run the notebook successfully.

embedchain is a framework to easily create LLM powered bots over any dataset.

It abstracts the entire process of loading a dataset, chunking it, creating embeddings and then storing in a vector database.

You can add a single or multiple dataset using .add and .add_local function and then use .query function to find an answer from the added datasets.

If you want to create a Naval Ravikant bot which has 1 youtube video, 1 book as pdf and 2 of his blog posts, as well as a question and answer pair you supply, all you need to do is add the links to the videos, pdf and blog posts and the QnA pair and embedchain will create a bot for you.

from embedchain import App

naval_chat_bot = App()

# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")

# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.

Getting Started

Installation

First make sure that you have the package installed. If not, then install it using pip

pip install embedchain

Usage

  • We use OpenAI's embedding model to create embeddings for chunks and ChatGPT API as LLM to get answer given the relevant docs. Make sure that you have an OpenAI account and an API key. If you have dont have an API key, you can create one by visiting this link.

  • Once you have the API key, set it in an environment variable called OPENAI_API_KEY

import os
os.environ["OPENAI_API_KEY"] = "sk-xxxx"
  • Next import the App class from embedchain and use .add function to add any dataset.
from embedchain import App

naval_chat_bot = App()

# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")

# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
  • If there is any other app instance in your script or app, you can change the import as
from embedchain import App as EmbedChainApp

# or

from embedchain import App as ECApp
  • Now your app is created. You can use .query function to get the answer for any query.
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.

Format supported

We support the following formats:

Youtube Video

To add any youtube video to your app, use the data_type (first argument to .add) as youtube_video. Eg:

app.add('youtube_video', 'a_valid_youtube_url_here')

PDF File

To add any pdf file, use the data_type as pdf_file. Eg:

app.add('pdf_file', 'a_valid_url_where_pdf_file_can_be_accessed')

Note that we do not support password protected pdfs.

Web Page

To add any web page, use the data_type as web_page. Eg:

app.add('web_page', 'a_valid_web_page_url')

Text

To supply your own text, use the data_type as text and enter a string. The text is not processed, this can be very versatile. Eg:

app.add_local('text', 'Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.')

Note: This is not used in the examples because in most cases you will supply a whole paragraph or file, which did not fit.

QnA Pair

To supply your own QnA pair, use the data_type as qna_pair and enter a tuple. Eg:

app.add_local('qna_pair', ("Question", "Answer"))

More Formats coming soon

  • If you want to add any other format, please create an issue and we will add it to the list of supported formats.

How does it work?

Creating a chat bot over any dataset needs the following steps to happen

  • load the data
  • create meaningful chunks
  • create embeddigns for each chunk
  • store the chunks in vector database

Whenever a user asks any query, following process happens to find the answer for the query

  • create the embedding for query
  • find similar documents for this query from vector database
  • pass similar documents as context to LLM to get the final answer.

The process of loading the dataset and then querying involves multiple steps and each steps has nuances of it is own.

  • How should I chunk the data? What is a meaningful chunk size?
  • How should I create embeddings for each chunk? Which embedding model should I use?
  • How should I store the chunks in vector database? Which vector database should I use?
  • Should I store meta data along with the embeddings?
  • How should I find similar documents for a query? Which ranking model should I use?

These questions may be trivial for some but for a lot of us, it needs research, experimentation and time to find out the accurate answers.

embedchain is a framework which takes care of all these nuances and provides a simple interface to create bots over any dataset.

In the first release, we are making it easier for anyone to get a chatbot over any dataset up and running in less than a minute. All you need to do is create an app instance, add the data sets using .add function and then use .query function to get the relevant answer.

Tech Stack

embedchain is built on the following stack:

Author

embedchain's People

Contributors

cachho avatar dumoedss avatar limcheekin avatar mrbusche avatar satya131113 avatar taranjeet avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.