
mhg777 / 3


This project forked from aigptcode/askyourdocuments


💻 Ask your Documents 🤖

👋 Welcome to the Document QA system! This repository contains the code for a system that allows you to ask questions about your documents and get answers based on their contents. It supports a wide range of document formats, including PDF, Word, Excel, PowerPoint, text files, and even images!


🚀 Features

  • 💻 Supports a variety of document formats, including PDF, Word, Excel, PowerPoint, text files, and images
  • 🤖 Uses the Hugging Face Transformers library to create embeddings for document chunks
  • 🔍 Uses the FAISS library to create an index for those embeddings, allowing for efficient similarity search
  • 💬 Allows users to ask questions about their documents and get answers based on the contents of those documents
  • ⚡️ Uses multiprocessing to parallelize the creation of the index for improved performance
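
The pipeline these features describe (split documents into chunks, embed each chunk, index the embeddings, retrieve the closest chunks for a question) can be sketched with the standard library alone. This is an illustration only, not the repository's code: the word-count `embed` function stands in for a Hugging Face Transformers embedding model, the brute-force cosine scan stands in for a FAISS index, and every function name and the chunk size are assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=12):
    """Split text into chunks of at most `chunk_size` words (size is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding: a sparse word-count vector.

    Stand-in for a transformer embedding model, which would return a dense vector.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(question, chunks, k=1):
    """Return the k chunks most similar to the question.

    Brute-force scan over all chunks; a FAISS index serves the same role at scale.
    """
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


document = (
    "The invoice total is 420 dollars and payment is due in March. "
    "The warranty covers hardware defects for two years after purchase."
)
chunks = chunk_text(document, chunk_size=12)
best = search("How long does the warranty last?", chunks, k=1)
```

Here `best` holds the chunk about the warranty, which would then be passed to a language model as context for the answer.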

📋 Requirements

  • Python 3.6 or higher
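  • The following Python packages:
    • transformers
    • langchain
    • faiss-cpu (provides the FAISS library)
    • PyMuPDF (imported as fitz; the PyPI package named fitz is a different project)
    • Pillow
    • textract
    • pandas
    • python-pptx
    • opencv-python (for image support)
  • concurrent.futures is part of the Python 3 standard library and needs no separate installation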
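
Several of these packages are imported under a different name than the one passed to pip (PyMuPDF provides the `fitz` module, Pillow provides `PIL`, python-pptx provides `pptx`, opencv-python provides `cv2`). A small check like the following can report what is missing before running the script. It is a sketch, not part of the repository; the `IMPORT_NAMES` mapping and the `missing_packages` helper are assumptions to adapt to the script's actual imports.

```python
import importlib.util

# pip distribution name -> module name used in `import` statements.
# Assumed mapping; verify it against the imports in the script you run.
IMPORT_NAMES = {
    "transformers": "transformers",
    "langchain": "langchain",
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "textract": "textract",
    "pandas": "pandas",
    "python-pptx": "pptx",
    "opencv-python": "cv2",
}


def missing_packages(import_names):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```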

🔧 Usage

  1. Clone this repository to your local machine:
git clone https://github.com/AiGptCode/AskyourDocuments.git
  2. Install the required Python packages (PyMuPDF provides the fitz module; concurrent.futures ships with Python 3 and is not installed with pip):
pip install transformers langchain faiss-cpu PyMuPDF pillow textract pandas python-pptx opencv-python
  3. Set your Hugging Face API key as an environment variable:
export HUGGINGFACE_API_TOKEN=your-api-key
  4. Run the AskyourDocuments.py script and enter the path to the directory containing your documents:
python AskyourDocuments.py
  5. Ask a question about your documents and get an answer based on their contents.
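
A script that depends on the `HUGGINGFACE_API_TOKEN` variable can fail early with a clear message when it is not set. The following is a minimal sketch of such a check; `get_hf_token` is a hypothetical helper, not a function from this repository.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HUGGINGFACE_API_TOKEN is not set; run "
            "`export HUGGINGFACE_API_TOKEN=your-api-key` first."
        )
    return token
```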

Note: If you want to include images in your search, make sure they are in a supported format (e.g., JPEG, PNG) and are located in the same directory as your other documents.
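
Collecting the files to index from a single directory, images included, might look like the sketch below. The extension set and the `find_documents` name are assumptions chosen to match the formats listed above; the script's own list of supported extensions may differ.

```python
from pathlib import Path

# Assumed extension list covering the formats this README names.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".jpg", ".jpeg", ".png",
}


def find_documents(directory):
    """Return supported files directly inside `directory`, sorted by path."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The suffix comparison is case-insensitive, so `scan.PNG` and `scan.png` are both picked up; subdirectories are not searched.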

🤝 Contributing

If you would like to contribute to this project, please follow these steps:

  1. Fork this repository to your own GitHub account.
  2. Create a new branch for your changes:
git checkout -b my-feature-branch
  3. Make your changes and commit them:
git commit -am 'Add some feature'
  4. Push your changes to your fork:
git push origin my-feature-branch
  5. Open a pull request against the original repository.

📄 License

This project is licensed under the MIT License.

🎉 Acknowledgments

  • The Hugging Face Transformers library for providing pre-trained models and tokenizers
  • The FAISS library for providing efficient similarity search and clustering of dense vectors
  • The langchain library for providing utilities for creating and working with language models
  • The PyMuPDF library (imported as fitz) for providing utilities for working with PDF files
  • The Pillow library for providing utilities for working with image files
  • The textract library for providing utilities for extracting text from various file formats
  • The pandas library for providing utilities for working with tabular data in Python
  • The python-pptx library for providing utilities for working with PowerPoint files
  • The concurrent.futures module of the Python standard library for providing a high-level interface for asynchronously executing callables
  • The opencv-python library for providing utilities for working with image and video data (for image support)

