anshkathpal / documentor-pdfchatbot

Home Page: https://documentor-ai.vercel.app/

DocuMentor PDF Chatbot Readme

Description

DocuMentor is a sophisticated chatbot application designed to assist users in extracting valuable information from uploaded PDF documents. Users can upload PDF files, chat with the AI chatbot to ask questions or seek information related to the document, and receive well-informed responses. This readme provides an overview of the DocuMentor PDF Chatbot, including its features and the technology stack used.

Features

  • PDF Upload: Users can upload PDF documents for analysis and conversation with the AI chatbot.

  • AI Chatbot: Engage in a chat conversation with the AI chatbot to ask questions or discuss the content of the PDF.

  • Document Analysis: The chatbot splits the document into chunks and computes an embedding for each chunk so that its content can be searched.

  • Similarity Search: LangChain runs a similarity search over the embeddings to find the passages most relevant to a question.

  • ChromaDB Integration: Chunk embeddings are stored in ChromaDB so that similar content can be retrieved efficiently.
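
The Document Analysis feature depends on splitting the extracted text into chunks before embedding them. The following is a minimal sketch of fixed-size chunking with overlap, assuming character-based chunks. The function name `chunk_text` and the default sizes are illustrative and not taken from the project, which relies on LangChain for this step.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap trades some storage for recall: a passage that straddles a boundary is still retrievable from one of the two chunks that contain it.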

Tech Stack

Frontend

  • React: The user interface of DocuMentor is built using React, offering a modern and responsive design.

  • Chakra UI: Chakra UI provides a set of accessible and customizable components for creating a visually appealing and user-friendly interface.

Backend

  • Python Flask: The server-side logic of the chatbot is implemented using Flask, a micro web framework for Python.

Packages and Technologies

  • LangChain: LangChain is used for creating embeddings and performing similarity searches.

  • OpenAI: OpenAI's GPT-3.5 model powers the chatbot, offering natural language understanding and generation capabilities.

  • Embeddings: Embeddings are generated to analyze and represent the content of the PDF.

  • Tiktoken: tiktoken is used for tokenizing text and counting tokens (not words).

  • PyPDF: PyPDF is used for parsing and extracting text from PDF documents.

Database

  • ChromaDB: ChromaDB stores the embedding vectors for efficient retrieval and similarity searching.
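
To illustrate what storing embeddings and running a similarity search involves, here is a toy in-memory sketch based on cosine similarity. It is not ChromaDB's API: `TinyVectorStore`, its methods, and the short hand-written vectors used with it are hypothetical stand-ins for a real vector database and real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store mapping text chunks to vectors."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def query(self, vector: list[float], k: int = 2) -> list[tuple[str, float]]:
        """Return the k stored texts most similar to `vector`, best first."""
        scored = [(text, cosine_similarity(vector, v)) for text, v in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A real vector database adds persistence and approximate nearest-neighbour indexing on top of this idea, which is what makes retrieval efficient for large documents.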

Installation and Setup

  1. Clone the repository from GitHub.

  2. Navigate to the project directory and install the dependencies: run npm install for the React frontend and pip install -r requirements.txt for the Python backend.

  3. Set up a database connection to ChromaDB, and configure the database settings in the backend.

  4. Create environment variables for sensitive information, such as API keys and database connections.

  5. Start the frontend and backend servers using npm start for React and python app.py for Python Flask.

  6. Access the DocuMentor PDF Chatbot via a web browser by navigating to the specified URL (usually http://localhost:3000).
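
Step 4 can be handled in Python by reading settings from the environment when the backend starts. The following is a minimal sketch. `OPENAI_API_KEY` is the variable the OpenAI Python client reads by default, while `CHROMA_PERSIST_DIR`, its default value, and the `Settings` class are assumptions made for illustration and are not confirmed by the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chroma_persist_dir: str  # hypothetical setting name


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables.

    Defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail at startup rather than on the first chat request.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_db"),
    )
```

Keeping secrets in environment variables rather than in source files means they stay out of version control.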

Usage

  1. Open the DocuMentor PDF Chatbot in your web browser.

  2. Upload a PDF document for analysis and conversation.

  3. Engage in a chat conversation with the AI chatbot to ask questions or discuss the content of the PDF.

  4. The chatbot will analyze the document, create embeddings, and perform similarity searches to provide informed responses.

  5. The document's embeddings are stored in ChromaDB so that later questions can retrieve similar content efficiently.

  6. Use DocuMentor to unlock valuable insights from your PDF documents.
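
Step 4 amounts to retrieval-augmented prompting: the chunks returned by the similarity search are placed in the prompt alongside the user's question. The following sketch shows that assembly step, assuming the retrieved chunks are already available as strings. `build_prompt`, the `max_chars` budget, and the prompt wording are hypothetical, and the project may delegate this step to LangChain.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 6000) -> str:
    """Combine retrieved chunks and a question into a single prompt string.

    Chunks are added in order (most relevant first) until adding the next
    one would exceed the max_chars budget for the context.
    """
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)

    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Instructing the model to rely only on the supplied context is a common way to keep answers grounded in the uploaded document.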

Contributing

Contributions to the DocuMentor PDF Chatbot are welcome. Please follow the guidelines outlined in the CONTRIBUTING.md file.

License

This project is open-source and available under the MIT License.

Author

  • Ansh Kathpal

Acknowledgments

Special thanks to the React, Flask, and OpenAI communities for providing resources and libraries that made this advanced PDF chatbot possible.
