Document Search Engine is a Python-based tool that indexes and searches text within documents stored either locally or in a Google Drive folder. The tool provides a GUI for ease of use and is capable of reading and extracting text from .pdf, .doc, .docx, .jpg, and .png file formats.
- Full-text search of documents.
- Supports .pdf, .doc, .docx, .jpg, and .png file formats.
- Can search documents located locally or in a Google Drive folder.
- Graphical user interface for ease of use.
- Customizable search context range.
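
As a rough illustration of how the local side of such a tool might decide which files to index, the sketch below walks a directory and keeps only files with the extensions listed above. The names `SUPPORTED_EXTENSIONS` and `collect_supported_files` are hypothetical and are not taken from this repository's code.

```python
from pathlib import Path

# Extensions named in the feature list above.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".jpg", ".png"}


def collect_supported_files(root):
    """Return a sorted list of files under `root` with a supported extension.

    Hypothetical helper for illustration; the comparison is case-insensitive,
    so `scan.PNG` is treated the same as `scan.png`.
    """
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")  # recurse into subdirectories
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `collect_supported_files("docs")` would return `Path` objects for every matching file under `docs`, including those in nested folders.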
- Clone this repository:
    git clone https://github.com/qepting91/document-search-engine.git
- Install the required Python packages:
    pip install -r requirements.txt
To search documents stored in Google Drive, you'll need to create a credentials.json file:
- Go to the Google Cloud Console.
- Create a new project, or select an existing one.
- In the sidebar, go to APIs & Services > Library.
- Search for "Drive API" and enable it for your project.
- Go to APIs & Services > Credentials.
- Click "Create Credentials", then "OAuth client ID".
- If you haven't configured the OAuth consent screen yet, you'll need to do so first; the default settings are sufficient.
- For application type, choose "Desktop app".
- Click "Create", then "OK".
- In the credentials list, you should now see an entry for your client ID. Click the download icon on the right to download your credentials.json file.
- Move credentials.json into the same directory as the main.py file in this repository.
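
Before launching the tool, it can help to confirm that the downloaded file looks like a "Desktop app" OAuth client file. The sketch below is a minimal sanity check that uses only the standard library. It assumes the file follows Google's client-secrets layout, with a top-level `installed` section. The helper name `check_credentials_file` and the list of required fields are illustrative choices, not part of this repository.

```python
import json
from pathlib import Path

# Fields expected under the top-level "installed" key of a Desktop app
# client file (an assumption based on Google's client-secrets format).
REQUIRED_FIELDS = ("client_id", "client_secret", "auth_uri", "token_uri")


def check_credentials_file(path="credentials.json"):
    """Return a list of problems found in the file; an empty list means none."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path} does not exist"]
    try:
        data = json.loads(file_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{file_path} is not valid JSON: {exc}"]
    installed = data.get("installed") if isinstance(data, dict) else None
    if not isinstance(installed, dict):
        return ['no top-level "installed" section; was the client created as a "Desktop app"?']
    # Report every required field that is absent.
    return [f'missing field "installed.{name}"' for name in REQUIRED_FIELDS if name not in installed]
```

Running `check_credentials_file()` from the directory that contains main.py returns an empty list when the file passes these checks. It does not verify that the credentials are actually accepted by Google.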
- Specify the local directories and the Google Drive folder to search in the config.json file.
- Run the main script:
    python main.py
- Enter your search query in the application window that appears and click "Search".
The application will display each matching document's path or link and the context around the match.
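
To make "the context around the match" concrete, here is a minimal sketch of how a context range could be applied to already-extracted text. It is not the repository's implementation. The function name `find_contexts` and the `context_range` parameter (the number of characters kept on each side of a match) are assumptions made for illustration.

```python
import re


def find_contexts(text, query, context_range=40):
    """Return a snippet of `text` around each case-insensitive match of `query`.

    `context_range` is the number of characters kept before and after the match.
    """
    if not query:
        return []
    # re.escape treats the query as literal text rather than a regex pattern.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        left = max(0, match.start() - context_range)
        right = min(len(text), match.end() + context_range)
        snippets.append(text[left:right])
    return snippets
```

A larger `context_range` yields longer snippets, and a value of 0 returns only the matched text itself.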
- Python 3
- PyPDF2
- python-docx
- Pillow (imported as PIL)
- pytesseract
- google-api-python-client
- google-auth
- google-auth-httplib2
- google-auth-oauthlib
- Whoosh
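
Several of these packages are imported under a different name than the one used to install them, which can make a missing dependency hard to spot. The sketch below reports which of the listed packages cannot be found in the current environment. The helper name `missing_packages` is hypothetical, and the mapping reflects the usual import names for these packages rather than anything read from this repository. Note that pytesseract also relies on the Tesseract OCR engine being installed separately, which this check does not cover.

```python
import importlib.util

# Package name from the list above -> module name used in `import` statements.
IMPORT_NAMES = {
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "PIL": "PIL",
    "pytesseract": "pytesseract",
    "google-api-python-client": "googleapiclient",
    "google-auth": "google.auth",
    "google-auth-httplib2": "google_auth_httplib2",
    "google-auth-oauthlib": "google_auth_oauthlib",
    "Whoosh": "whoosh",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose modules cannot be located."""
    missing = []
    for package, module in import_names.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is itself absent.
            found = False
        if not found:
            missing.append(package)
    return missing
```

If `missing_packages()` returns a non-empty list, re-running `pip install -r requirements.txt` is the likely fix.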
This project is licensed under the terms of the MIT license.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
If you encounter any problems or have any questions, please open an issue on this GitHub repository.