
mayooear / private-chatbot-mpt30b-langchain

178 stars · 4 watchers · 45 forks · 170 KB

Chat with your data privately using MPT-30b

Home Page: https://www.youtube.com/watch?v=CVd2XHcctJo

License: MIT License

Languages: Makefile 2.74%, Python 97.26%
Topics: ggml, gpt, langchain, llm

private-chatbot-mpt30b-langchain's Introduction

Chat with your documents privately without internet using MPT-30B & Langchain

MPT-30B is a powerful open-source model trained with an 8k context length, and it outperforms the original GPT-3. Announcement

Using the quantized version of MPT-30B, you can chat with your documents privately on your own computer, without an internet connection.

Requirements

Minimum system requirements: 32 GB of RAM and Python 3.10.

Installation

  1. Install poetry

pip install poetry

  2. Clone the repo

git clone {insert github repo url}

  3. Install project dependencies

poetry install

  4. Copy the .env.example file to .env

cp .env.example .env

  5. Download the model (approx. 19 GB)

python download_model.py

Or visit here and download the file. Then create a models folder in the root directory and place the file there.
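If you download the model manually, placing it can be sketched as follows. The filename below is a placeholder, not the actual name of the quantized model file; substitute whatever file you downloaded:

```shell
# Create the models folder in the repo root and move the downloaded file there.
# "mpt-30b.ggml.bin" is a hypothetical name; use your actual downloaded filename.
mkdir -p models
mv ~/Downloads/mpt-30b.ggml.bin models/
```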

  6. Ingest the docs you want to 'chat' with

By default, this repo includes a source_documents folder that stores the documents to be ingested. You can replace the documents there with your own.
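For example, to chat with your own files instead of the bundled samples (the paths here are illustrative):

```shell
# Remove the sample documents and copy in your own (example paths).
rm source_documents/*
cp ~/my-docs/annual-report.pdf ~/my-docs/notes.md source_documents/
```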

Supported document extensions include:

  • .csv: CSV
  • .docx: Word Document
  • .doc: Word Document
  • .eml: Email
  • .epub: EPub
  • .html: HTML File
  • .md: Markdown
  • .pdf: Portable Document Format (PDF)
  • .pptx: PowerPoint Document
  • .txt: Text file (UTF-8)

Then run the ingestion script:

python ingest.py

Output should look like this:

Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00,  1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run question_answer_docs.py to query your documents

It will create a db folder containing the local vectorstore. Ingestion takes roughly 20-30 seconds per document, depending on its size. You can ingest as many documents as you want; they all accumulate in the local embeddings database. If you want to start from an empty database, delete the db folder.
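Resetting the database is a single command (run from the repo root; note this permanently deletes the local vectorstore):

```shell
# Delete the local vectorstore, then re-ingest from scratch.
rm -rf db
python ingest.py
```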

Note: during the ingest process no data leaves your local environment. You could ingest without an internet connection, except for the first time you run the ingest script, when the embeddings model is downloaded.
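For intuition, the splitting step reported in the output above ("Split into 90 chunks of text") can be sketched in plain Python. This is a simplified stand-in, not the repo's actual implementation (which uses a LangChain text splitter), and it counts characters rather than tokens:

```python
# Simplified sketch of chunking: split text into overlapping fixed-size pieces.
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap for context
    return chunks

doc = "x" * 1200          # stand-in for a loaded document's text
chunks = split_into_chunks(doc)
print(len(chunks))        # 3
```

The overlap means neighboring chunks share some text, so an answer that straddles a chunk boundary is still retrievable.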

  7. Chat with your documents

Run these scripts to ask a question and get an answer from your documents:

First, launch the question-answering script:

poetry run python question_answer_docs.py

or

make qa

Second, wait until the command line shows the Enter a question: prompt. Type in your question and press enter.

Type exit to finish the script.

Note: Depending on the memory of your computer, prompt request, and number of chunks returned from the source docs, it may take anywhere from 40 to 300 seconds for the model to respond to your prompt.

You can use this chatbot without an internet connection.
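The interaction loop described above (prompt, answer, type exit to quit) can be sketched as follows. This is illustrative only; run_cli and the canned answer function are stand-ins, not the repo's actual code, which wires the answer function to MPT-30B via LangChain:

```python
# Minimal sketch of a question-answering CLI loop (illustrative stand-in).
def run_cli(answer, ask=input, out=print):
    while True:
        query = ask("Enter a question: ")
        if query.strip().lower() == "exit":
            break
        out(answer(query))

# Demonstration with canned input instead of the interactive prompt.
questions = iter(["What is MPT-30B?", "exit"])
replies = []
run_cli(lambda q: f"[answer to: {q}]",
        ask=lambda prompt: next(questions),
        out=replies.append)
print(replies)  # ['[answer to: What is MPT-30B?]']
```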

[Optional] Run the plain chatbot

If you don't want to chat with your docs and would prefer to simply interact with the MPT-30b chatbot, you can skip the ingestion phase and simply run the chatbot script.

poetry run python chat.py

or

make chat

Credits

Credit to abacaj for the original template here.
Credit to imartinez for the privateGPT ingest logic and docs guidance here.
Credit to TheBloke for the MPT-30B GGML model here.

private-chatbot-mpt30b-langchain's People

Contributors

mayooear


private-chatbot-mpt30b-langchain's Issues

Testing on a MacBook Pro

Thanks for sharing the model. I have been able to test it on my MacBook Pro (i9, 32 GB of RAM). I notice that the CPU goes to 400% when inferring the answer, while the GPU stays at 0%. Is it possible to make the model use the GPU (Radeon Pro Vega 20, 4 GB)?

Thanks and questions...

@mayooear thanks for this nice project. Very helpful. I have a couple of items:

  1. The current setup is missing a required package: sentence_transformers
  2. Any thoughts on how to run this in GPU?
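If sentence_transformers is indeed missing from the environment, one way to add it under Poetry (an assumption about the intended fix, not a confirmed patch from the maintainer) is:

```shell
# Add the missing package to the Poetry-managed environment.
poetry add sentence-transformers
```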

poetry install error m1

poetry install


Installing dependencies from lock file

Package operations: 54 installs, 0 updates, 0 removals

  • Installing scipy (1.9.3): Failed

  ChefBuildError

  Backend subprocess exited when trying to invoke build_wheel
  
  + meson setup /private/var/folders/1l/2jbdtw9n0qj2wyxkr1cz2lhh0000gn/T/tmp5x2_aw23/scipy-1.9.3 /private/var/folders/1l/2jbdtw9n0qj2wyxkr1cz2lhh0000gn/T/tmp5x2_aw23/scipy-1.9.3/.mesonpy-1p55ixp1/build -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/private/var/folders/1l/2jbdtw9n0qj2wyxkr1cz2lhh0000gn/T/tmp5x2_aw23/scipy-1.9.3/.mesonpy-1p55ixp1/build/meson-python-native-file.ini
  The Meson build system
  Version: 1.2.0
  Source dir: /private/var/folders/1l/2jbdtw9n0qj2wyxkr1cz2lhh0000gn/T/tmp5x2_aw23/scipy-1.9.3
  Build dir: /private/var/folders/1l/2jbdtw9n0qj2wyxkr1cz2lhh0000gn/T/tmp5x2_aw23/scipy-1.9.3/.mesonpy-1p55ixp1/build
  Build type: native build
  Project name: SciPy
  Project version: 1.9.3
  C compiler for the host machine: cc (clang 12.0.5 "Apple clang version 12.0.5 (clang-1205.0.22.9)")
  C linker for the host machine: cc ld64 650.9
  C++ compiler for the host machine: c++ (clang 12.0.5 "Apple clang version 12.0.5 (clang-1205.0.22.9)")
  C++ linker for the host machine: c++ ld64 650.9
  Host machine cpu family: aarch64
  Host machine cpu: aarch64
  Compiler for C supports arguments -Wno-unused-but-set-variable: NO 
  Compiler for C supports arguments -Wno-unused-but-set-variable: NO (cached)
  Compiler for C supports arguments -Wno-unused-function: YES 
  Compiler for C supports arguments -Wno-conversion: YES 
  Compiler for C supports arguments -Wno-misleading-indentation: YES 
  Compiler for C supports arguments -Wno-incompatible-pointer-types: YES 
  Library m found: YES
  
  ../../meson.build:57:0: ERROR: Unknown compiler(s): [['gfortran'], ['flang'], ['nvfortran'], ['pgfortran'], ['ifort'], ['ifx'], ['g95']]
  The following exception(s) were encountered:
  Running `gfortran --version` gave "[Errno 2] No such file or directory: 'gfortran'"
  Running `gfortran -V` gave "[Errno 2] No such file or directory: 'gfortran'"
  Running `flang --version` gave "[Errno 2] No such file or directory: 'flang'"
  Running `flang -V` gave "[Errno 2] No such file or directory: 'flang'"
  Running `nvfortran --version` gave "[Errno 2] No such file or directory: 'nvfortran'"
  Running `nvfortran -V` gave "[Errno 2] No such file or directory: 'nvfortran'"
  Running `pgfortran --version` gave "[Errno 2] No such file or directory: 'pgfortran'"
  Running `pgfortran -V` gave "[Errno 2] No such file or directory: 'pgfortran'"
  Running `ifort --version` gave "[Errno 2] No such file or directory: 'ifort'"
  Running `ifort -V` gave "[Errno 2] No such file or directory: 'ifort'"
  Running `ifx --version` gave "[Errno 2] No such file or directory: 'ifx'"
  Running `ifx -V` gave "[Errno 2] No such file or directory: 'ifx'"
  Running `g95 --version` gave "[Errno 2] No such file or directory: 'g95'"
  Running `g95 -V` gave "[Errno 2] No such file or directory: 'g95'"
  
  A full log can be found at /private/var/folders/1l/2jbdtw9n0qj2wyxkr1cz2lhh0000gn/T/tmp5x2_aw23/scipy-1.9.3/.mesonpy-1p55ixp1/build/meson-logs/meson-log.txt
  

  at /opt/homebrew/Cellar/poetry/1.5.1_1/libexec/lib/python3.11/site-packages/poetry/installation/chef.py:147 in _prepare
      143│ 
      144│                 error = ChefBuildError("\n\n".join(message_parts))
      145│ 
      146│             if error is not None:
    → 147│                 raise error from None
      148│ 
      149│             return path
      150│ 
      151│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with scipy (1.9.3) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "scipy (==1.9.3)"'.
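The Meson log above shows SciPy 1.9.3 being built from source and failing for lack of a Fortran compiler. A possible workaround on Apple Silicon, assuming Homebrew is installed, is to provide one and retry:

```shell
# SciPy's source build needs a Fortran compiler; Homebrew's gfortran usually suffices.
brew install gfortran
poetry install
```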

NVIDIA CUDA Toolkit libraries error

I am following all the instructions but I am getting the following error when I run python ingest.py.

Traceback (most recent call last):
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcufft.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/skhani/Documents/Recent/LLM/private-chatbot-mpt30b-langchain-main/ingest.py", line 162, in <module>
    main()
  File "/home/skhani/Documents/Recent/LLM/private-chatbot-mpt30b-langchain-main/ingest.py", line 126, in main
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/langchain/embeddings/huggingface.py", line 51, in __init__
    import sentence_transformers
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/sentence_transformers/__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, ParallelSentencesDataset
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/sentence_transformers/datasets/__init__.py", line 1, in <module>
    from .DenoisingAutoEncoderDataset import DenoisingAutoEncoderDataset
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/sentence_transformers/datasets/DenoisingAutoEncoderDataset.py", line 1, in <module>
    from torch.utils.data import Dataset
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/torch/__init__.py", line 189, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages/torch/__init__.py", line 154, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcublas.so.*[0-9] not found in the system path ['/home/skhani/Documents/Recent/LLM/private-chatbot-mpt30b-langchain-main', '/home/skhani/anaconda3/envs/llm/lib/python311.zip', '/home/skhani/anaconda3/envs/llm/lib/python3.11', '/home/skhani/anaconda3/envs/llm/lib/python3.11/lib-dynload', '/home/skhani/anaconda3/envs/llm/lib/python3.11/site-packages']
(llm) skhani@skhani-Precision-7760 ~/Documents/Recent/LLM/private-chatbot-mpt30b-langchain-main$ poetry install
Installing dependencies from lock file

No dependencies to install or update
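Since this project runs inference on the CPU, one possible workaround for the missing CUDA libraries (an assumption, not a maintainer-confirmed fix) is to install a CPU-only PyTorch build in the environment:

```shell
# Replace the CUDA-enabled torch with the CPU-only build from PyTorch's wheel index.
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cpu
```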

trouble during setup

Hi, great project. I wanted to try it, but I'm having these problems during setup:

private-chatbot-mpt30b-langchain$ python ingest.py
Traceback (most recent call last):
File "/home/im/Downloads/private-chatbot-mpt30b-langchain/ingest.py", line 6, in
from dotenv import load_dotenv
ModuleNotFoundError: No module named 'dotenv'

and:

private-chatbot-mpt30b-langchain$ poetry run python chat.py
Traceback (most recent call last):
File "/home/im/Downloads/private-chatbot-mpt30b-langchain/chat.py", line 5, in
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
ModuleNotFoundError: No module named 'langchain'

Am I missing some dependencies or something else? It seems like I might be missing something related to LangChain (but I already did pip install langchain). rgds, IM
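One likely cause, given that the first command above was run as python ingest.py rather than through Poetry: the dependencies are installed in Poetry's virtualenv, not the system Python, so the bare interpreter cannot see them. Running every script through Poetry should find them:

```shell
# Run inside the Poetry-managed virtualenv, where the project's dependencies live.
poetry run python ingest.py
poetry run python chat.py
```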

Error trying ingest.py

When I try to run "python3 ingest.py" I get this error:

Traceback (most recent call last):
  File "/home/ubuntu/private-chatbot-mpt30b-langchain/ingest.py", line 7, in <module>
    from langchain.docstore.document import Document
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/__init__.py", line 6, in <module>
    from langchain.agents import MRKLChain, ReActChain, SelfAskWithSearchChain
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/agents/__init__.py", line 2, in <module>
    from langchain.agents.agent import (
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 16, in <module>
    from langchain.agents.tools import InvalidTool
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/agents/tools.py", line 8, in <module>
    from langchain.tools.base import BaseTool, Tool, tool
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/tools/__init__.py", line 3, in <module>
    from langchain.tools.arxiv.tool import ArxivQueryRun
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/tools/arxiv/tool.py", line 12, in <module>
    from langchain.utilities.arxiv import ArxivAPIWrapper
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/utilities/__init__.py", line 3, in <module>
    from langchain.utilities.apify import ApifyWrapper
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/utilities/apify.py", line 5, in <module>
    from langchain.document_loaders import ApifyDatasetLoader
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/document_loaders/__init__.py", line 44, in <module>
    from langchain.document_loaders.embaas import EmbaasBlobLoader, EmbaasLoader
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain/document_loaders/embaas.py", line 54, in <module>
    class BaseEmbaasLoader(BaseModel):
  File "pydantic/main.py", line 204, in pydantic.main.ModelMetaclass.__new__
  File "pydantic/fields.py", line 488, in pydantic.fields.ModelField.infer
  File "pydantic/fields.py", line 419, in pydantic.fields.ModelField.__init__
  File "pydantic/fields.py", line 539, in pydantic.fields.ModelField.prepare
  File "pydantic/fields.py", line 801, in pydantic.fields.ModelField.populate_validators
  File "pydantic/validators.py", line 696, in find_validators
  File "pydantic/validators.py", line 585, in pydantic.validators.make_typeddict_validator
  File "pydantic/annotated_types.py", line 35, in pydantic.annotated_types.create_model_from_typeddict
  File "pydantic/main.py", line 972, in pydantic.main.create_model
  File "pydantic/main.py", line 204, in pydantic.main.ModelMetaclass.__new__
  File "pydantic/fields.py", line 488, in pydantic.fields.ModelField.infer
  File "pydantic/fields.py", line 419, in pydantic.fields.ModelField.__init__
  File "pydantic/fields.py", line 534, in pydantic.fields.ModelField.prepare
  File "pydantic/fields.py", line 638, in pydantic.fields.ModelField._type_analysis
  File "/usr/lib/python3.10/typing.py", line 1158, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

I have everything correctly installed, using python 3.10.6 and I have enough RAM, Storage and CPU.

I'm running it on an AWS machine, on ubuntu, but there shouldn't be any problem, right?

Thanks for the work you do, I hope you can help me.
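This issubclass() failure comes from pydantic v1 inspecting typing constructs at import time; in similar reports it has been resolved by upgrading typing_extensions and langchain. This is a guess at the fix, not a confirmed solution:

```shell
# Upgrade the typing machinery that pydantic v1 relies on, then retry the ingest.
pip install --upgrade typing_extensions langchain
python3 ingest.py
```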
