This Flask app uses GPT-3 embeddings from the OpenAI API to build a search index for a PDF document, and lets users search for specific text within that document. The embeddings and search index are cached in a shared Docker volume so repeat searches are faster.
- Clone the repository to your local machine.
- Install Docker and Docker Compose if you don't already have them installed.
- Set your OpenAI API key and GPT-3 engine in the `docker-compose.yml` file.
- Start the app using `docker-compose up`.
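As a reference for the setup steps above, a `docker-compose.yml` might look like the sketch below. The service name, port, and cache path are assumptions for illustration, not taken from the repository; only the environment variable names come from the configuration section of this README.

```yaml
# Hypothetical layout; adjust the service name, port, and volume path
# to match the actual docker-compose.yml in the repository.
version: "3"
services:
  app:
    build: .
    ports:
      - "5000:5000"
    environment:
      - OPENAI_API_KEY=your-openai-api-key
      - GPT3_ENGINE=davinci
    volumes:
      - cache:/app/cache   # shared volume caching embeddings and the index
volumes:
  cache: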
Builds or refreshes the search index for a document and caches it in the shared Docker volume.

Request body:

```json
{
  "url": "https://example.com/my-document.pdf"
}
```

Response:

```json
{
  "message": "Index built and cached successfully."
}
```
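A client call to this endpoint could be sketched as follows, using only the Python standard library. The endpoint path `/build_index` is an assumption based on the function name mentioned later in this README, and `BASE_URL` and the token are placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"   # assumed host/port for the Flask app
API_TOKEN = "your-bearer-token"      # placeholder; see the authentication section

def make_build_request(endpoint, document_url, token):
    """Construct the authenticated POST request that triggers index building."""
    payload = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# Hypothetical usage; uncomment the last line with the app running.
req = make_build_request(f"{BASE_URL}/build_index",
                         "https://example.com/my-document.pdf", API_TOKEN)
# response = urllib.request.urlopen(req)
```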
Searches the cached search index for a document for a specific query.

Request body:

```json
{
  "url": "https://example.com/my-document.pdf",
  "query": "search query"
}
```

Response:

```json
{
  "matching_text": "matching text"
}
```
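A search call can be sketched the same way. As above, the `/search_index` path mirrors the function name in `app.py` and is an assumption, as are the host, port, and token.

```python
import json
import urllib.request

def make_search_request(endpoint, document_url, query, token):
    """Construct the authenticated POST request for a search query."""
    payload = json.dumps({"url": document_url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# Hypothetical usage; with the app running, the response body would carry
# the "matching_text" field shown above.
req = make_search_request("http://localhost:5000/search_index",
                          "https://example.com/my-document.pdf",
                          "search query", "your-bearer-token")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["matching_text"])
```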
The following environment variables can be set in the `docker-compose.yml` file:

- `OPENAI_API_KEY`: Your OpenAI API key.
- `GPT3_ENGINE`: The GPT-3 engine to use (default: `davinci`).
- `OPENAI_RATE_LIMIT`: The rate limit for OpenAI API requests (default: `0`).
- `OPENAI_RATE_PERIOD`: The rate limit period in seconds for OpenAI API requests (default: `1`).
- `LOCAL_RATE_LIMIT`: The rate limit for local requests (default: `0`).
- `LOCAL_RATE_PERIOD`: The rate limit period in seconds for local requests (default: `1`).
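The settings above might be read inside the app roughly as follows. This is a sketch using the documented names and defaults; the actual parsing in `app.py` may differ.

```python
import os

# Sketch of reading the documented settings with their stated defaults.
GPT3_ENGINE = os.environ.get("GPT3_ENGINE", "davinci")
OPENAI_RATE_LIMIT = int(os.environ.get("OPENAI_RATE_LIMIT", "0"))
OPENAI_RATE_PERIOD = int(os.environ.get("OPENAI_RATE_PERIOD", "1"))
LOCAL_RATE_LIMIT = int(os.environ.get("LOCAL_RATE_LIMIT", "0"))
LOCAL_RATE_PERIOD = int(os.environ.get("LOCAL_RATE_PERIOD", "1"))
```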
Endpoints are protected by bearer token authentication. The bearer token must be included in the `Authorization` header of the request. The token can be set in the `build_index` and `search_index` functions in the `app.py` file.
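The header check described above could be implemented along these lines. The helper name and token value are hypothetical, not taken from `app.py`; the only assumption from the README is that the header must carry `Bearer <token>`.

```python
API_TOKEN = "change-me"  # hypothetical; app.py hard-codes its own token

def is_authorized(headers):
    """Return True when the Authorization header carries the expected bearer token."""
    auth = headers.get("Authorization", "")
    return auth == f"Bearer {API_TOKEN}"
```

In a Flask view this check would typically run first, returning a 401 response when it fails.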