MediExtract is a FastAPI-based backend for extracting medical data from prescription and patient-details PDFs. It converts PDF pages to images with pdf2image, reads their text with pytesseract (OCR), and then applies regex-based parsers to pull out key medical fields.
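The OCR step described above can be sketched as follows. This is a minimal illustration, not MediExtract's actual code: the names `ocr_pages` and `extract_text_from_pdf` and the 300 DPI default are hypothetical choices. The sketch assumes poppler (needed by pdf2image) and the Tesseract engine (needed by pytesseract) are installed on the system.

```python
from typing import Callable, Iterable


def ocr_pages(pages: Iterable, image_to_text: Callable) -> str:
    """Run OCR on each page image and join the per-page text with newlines."""
    return "\n".join(image_to_text(page).strip() for page in pages)


def extract_text_from_pdf(pdf_bytes: bytes, dpi: int = 300) -> str:
    """Hypothetical helper: render PDF pages to images, then OCR them."""
    # Imported lazily so this module still loads where poppler/Tesseract are absent.
    from pdf2image import convert_from_bytes
    import pytesseract

    # convert_from_bytes returns one PIL image per page (requires poppler).
    pages = convert_from_bytes(pdf_bytes, dpi=dpi)
    # image_to_string calls the Tesseract binary; "eng" selects the English model.
    return ocr_pages(pages, lambda image: pytesseract.image_to_string(image, lang="eng"))
```

Keeping `ocr_pages` separate from the PDF rendering lets the text-joining logic be tested without poppler or Tesseract installed.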
- FastAPI Backend: Exposes an HTTP endpoint for uploading PDFs for medical data extraction.
- Text Extraction: Converts PDF pages to images with pdf2image and extracts their text with pytesseract.
- Regex Parsers: Extract patient names, medicines, and other fields from the OCR text.
- Testing and Reliability: Includes a pytest test suite for automated testing.
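A regex parser of this kind might look like the sketch below, written together with pytest-style tests. The field labels (`Name:`, `Date:`, `Medicines:`, `Directions:`), the function name `parse_prescription`, and the sample text are all hypothetical; real prescriptions will need patterns tuned to their layout and to typical OCR noise.

```python
import re
from typing import Dict, List, Optional

# Hypothetical field labels; adjust them to the layout of your prescription PDFs.
NAME_PATTERN = re.compile(r"Name:\s*(.+?)\s*Date:")
# DOTALL lets the medicines block span several lines.
MEDICINES_PATTERN = re.compile(r"Medicines:\s*(.*?)\s*Directions:", re.DOTALL)


def parse_prescription(text: str) -> Dict[str, object]:
    """Pull the patient name and medicine lines out of OCR text."""
    name_match = NAME_PATTERN.search(text)
    medicines_match = MEDICINES_PATTERN.search(text)
    medicines: List[str] = []
    if medicines_match:
        # One medicine per non-empty line between the two labels.
        medicines = [
            line.strip() for line in medicines_match.group(1).splitlines() if line.strip()
        ]
    patient_name: Optional[str] = name_match.group(1) if name_match else None
    return {"patient_name": patient_name, "medicines": medicines}


def test_parse_prescription_extracts_fields() -> None:
    text = (
        "Name: Jane Doe Date: 2024-01-15\n"
        "Medicines:\nAmoxicillin 500 mg\nIbuprofen 200 mg\n"
        "Directions: after meals"
    )
    assert parse_prescription(text) == {
        "patient_name": "Jane Doe",
        "medicines": ["Amoxicillin 500 mg", "Ibuprofen 200 mg"],
    }


def test_parse_prescription_missing_fields() -> None:
    # Fields that are not found come back as None / an empty list.
    assert parse_prescription("unreadable scan") == {"patient_name": None, "medicines": []}
```

Saved in a file named `test_*.py`, running `pytest` collects both `test_` functions automatically.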
- FastAPI
- pdf2image
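- pytesseract (a wrapper around the Tesseract OCR engine, which must be installed separately)
- poppler-utils (system dependency of pdf2image)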
- pytest
- Clone the repository.
- Install the Python dependencies (poppler-utils and Tesseract are system packages and must be installed separately):
pip install -r requirements.txt
- Run the FastAPI server:
uvicorn main:app --reload
- Open the interactive API docs at http://127.0.0.1:8000/docs and use the /extract_from_doc endpoint to extract data from a PDF.
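To call the endpoint from a script instead of the /docs page, a standard-library client could look like this. It assumes the endpoint accepts a multipart upload in a form field named `file` and returns JSON; both are assumptions, so check the parameter names listed on the /docs page. `build_multipart` and `upload_pdf` are hypothetical helpers.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

API_URL = "http://127.0.0.1:8000/extract_from_doc"


def build_multipart(
    file_field: str,
    filename: str,
    data: bytes,
    fields: Optional[Dict[str, str]] = None,
    content_type: str = "application/pdf",
) -> Tuple[bytes, str]:
    """Encode plain form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str, fields: Optional[Dict[str, str]] = None, file_field: str = "file") -> dict:
    """POST a PDF to the local server and decode the JSON response."""
    with open(path, "rb") as pdf:
        body, content_type = build_multipart(file_field, os.path.basename(path), pdf.read(), fields)
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `upload_pdf("prescription.pdf")`, or `upload_pdf("prescription.pdf", fields={"doc_type": "prescription"})` if the endpoint also expects a document-type form field (`doc_type` is a placeholder name). The body is encoded by hand so the sketch needs no third-party HTTP library.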
Contributions are welcome! Fork the repository, make your changes, and submit a pull request.