
kevingarabedian / gpt3_document_search


Search PDF documents using GPT-3 and the OpenAI API with this Flask app. Automatically generate embeddings and build a search index for fast and accurate searching, with results returned in JSON format. Uses Docker and a shared volume for efficient caching.

License: MIT License

Python 90.98% Dockerfile 9.02%


Document Search with GPT-3 and OpenAI API

This Flask app uses GPT-3 and the OpenAI API to generate embeddings and build a search index for a PDF document, and allows users to search for specific text within the document. The app uses a shared Docker volume to cache the embeddings and search index for faster searching.
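As a rough illustration of how an embedding-based search can pick the best passage, the sketch below ranks stored text chunks by cosine similarity to a query vector. This is not taken from the app's source: the function names and the toy 3-dimensional vectors are hypothetical, and the real app may use a different similarity measure or index structure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)

def best_match(query_vec, index):
    """Return the text whose vector is most similar to query_vec.

    index is a list of (text, vector) pairs (hypothetical layout).
    """
    return max(index, key=lambda item: cosine_similarity(query_vec, item[1]))[0]

# Toy vectors standing in for real embeddings (assumption, for illustration).
index = [
    ("Refund policy", [0.9, 0.1, 0.0]),
    ("Shipping times", [0.1, 0.9, 0.2]),
]
result = best_match([0.8, 0.2, 0.0], index)
```

Real embeddings have hundreds or thousands of dimensions, but the ranking step works the same way.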

Installation and Setup

  1. Clone the repository to your local machine.
  2. Install Docker and Docker Compose if you don't already have them installed.
  3. Set your OpenAI API key and GPT-3 engine in the docker-compose.yml file.
  4. Start the app by running docker-compose up from the repository root.

Endpoints

POST /build_index

Builds or refreshes the search index for a document and caches it in the shared Docker volume.

Request Body

{
  "url": "https://example.com/my-document.pdf"
}

Response

{
  "message": "Index built and cached successfully."
}
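A minimal client-side sketch of this call, using only the Python standard library. The base URL http://localhost:5000 (Flask's default port) and the helper name make_build_index_request are assumptions; check docker-compose.yml for the port the app is actually published on. The Authorization header follows the bearer token scheme described under Authentication.

```python
import json
import urllib.request

# Assumption: the app is reachable on Flask's default port.
BASE_URL = "http://localhost:5000"

def make_build_index_request(document_url, token, base_url=BASE_URL):
    """Prepare (but do not send) a POST /build_index request."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/build_index",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = make_build_index_request("https://example.com/my-document.pdf", "my-token")
# To send it against a running instance:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp)["message"])
```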

POST /search_index

Searches a document's cached index for a query and returns the matching text.

Request Body

{
  "url": "https://example.com/my-document.pdf",
  "query": "search query"
}

Response

{
  "matching_text": "matching text"
}
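The request and response shapes above can be handled with two small helpers. This is only a sketch: the helper names are hypothetical, and the response is assumed to always be a JSON object with a matching_text key, as in the example.

```python
import json

def build_search_payload(document_url, query):
    """Serialize the /search_index request body shown above."""
    return json.dumps({"url": document_url, "query": query})

def extract_match(response_text):
    """Pull matching_text out of a /search_index response body.

    Returns None if the key is absent (an assumption about error cases).
    """
    data = json.loads(response_text)
    return data.get("matching_text")

payload = build_search_payload("https://example.com/my-document.pdf", "search query")
match = extract_match('{"matching_text": "matching text"}')
```

The payload would be sent as the body of a POST to /search_index with a Content-Type of application/json.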

Environment Variables

The following environment variables can be set in the docker-compose.yml file:

  • OPENAI_API_KEY: Your OpenAI API key.
  • GPT3_ENGINE: The GPT-3 engine to use (default: davinci).
  • OPENAI_RATE_LIMIT: The rate limit for OpenAI API requests (default: 0).
  • OPENAI_RATE_PERIOD: The rate limit period in seconds for OpenAI API requests (default: 1).
  • LOCAL_RATE_LIMIT: The rate limit for local requests (default: 0).
  • LOCAL_RATE_PERIOD: The rate limit period in seconds for local requests (default: 1).
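For reference, one way such variables might be read on the Python side, using the defaults listed above. This is not the app's actual code: the load_settings helper, the dictionary keys, and the choice to parse the rate values as integers are assumptions.

```python
import os

def load_settings(env=os.environ):
    """Read the documented variables, falling back to the listed defaults."""
    return {
        "api_key": env.get("OPENAI_API_KEY", ""),
        "engine": env.get("GPT3_ENGINE", "davinci"),
        # Assumption: rate limits and periods are whole numbers.
        "openai_rate_limit": int(env.get("OPENAI_RATE_LIMIT", "0")),
        "openai_rate_period": int(env.get("OPENAI_RATE_PERIOD", "1")),
        "local_rate_limit": int(env.get("LOCAL_RATE_LIMIT", "0")),
        "local_rate_period": int(env.get("LOCAL_RATE_PERIOD", "1")),
    }

# With nothing set, every value falls back to its documented default.
defaults = load_settings({})
```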

Authentication

Both endpoints are protected by bearer token authentication: include the token in the Authorization header of every request. The expected token is set in the build_index and search_index functions in app.py.
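A bearer token check generally looks something like the sketch below. It illustrates the scheme only and is not the code in app.py; the is_authorized helper is hypothetical, and the constant-time comparison is a design choice made here, not something the source confirms.

```python
import hmac

def is_authorized(authorization_header, expected_token):
    """Return True if the header has the form 'Bearer <expected_token>'."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())

ok = is_authorized("Bearer my-token", "my-token")
```

A client would therefore send a header of the form Authorization: Bearer my-token with each request.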
