shakeebparwez / medi-extract

MediExtract simplifies medical data retrieval. FastAPI powers the backend, while pdf2image and pytesseract extract text from PDFs through OCR. Regex-based parsers then decode prescriptions into structured fields for healthcare data science work.

Jupyter Notebook 60.65% Python 39.35%

MediExtract

MediExtract is a FastAPI-based backend project for extracting medical data from prescription and patient detail PDFs. It uses pdf2image to render each PDF page as an image and pytesseract to run OCR on those images, then applies regex-based parsers to pull key medical information out of the recognized text.
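
The PDF-to-text step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name extract_text and its convert and ocr parameters are hypothetical, added so the pipeline can be exercised without the OCR toolchain installed. By default it falls back to pdf2image's convert_from_path and pytesseract's image_to_string.

```python
from typing import Callable, Iterable, Optional


def extract_text(
    pdf_path: str,
    convert: Optional[Callable[[str], Iterable[object]]] = None,
    ocr: Optional[Callable[[object], str]] = None,
) -> str:
    """Return the OCR text of every page in a PDF, pages joined by newlines.

    `convert` and `ocr` are hypothetical injection points for this sketch.
    """
    if convert is None:
        # pdf2image relies on the poppler-utils binaries being installed.
        from pdf2image import convert_from_path
        convert = convert_from_path
    if ocr is None:
        # pytesseract relies on the Tesseract OCR engine being installed.
        from pytesseract import image_to_string
        ocr = image_to_string

    pages = convert(pdf_path)  # one image per PDF page
    return "\n".join(ocr(page) for page in pages)
```

Passing stand-in callables for convert and ocr makes the function easy to check in a pytest test without real PDFs; calling extract_text("prescription.pdf") with no extra arguments uses the real libraries.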

Features

FastAPI Backend: Accepts PDF uploads over HTTP for medical data extraction.
Text Extraction: Uses pdf2image to render PDF pages as images and pytesseract to run OCR on them.
Regex Parsers: Dedicated regex parsers extract patient names, medicines, and other fields.
Testing and Reliability: Uses pytest for automated testing.
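
To make the regex-parser idea concrete, here is a small sketch of how a patient name and a medicine list might be pulled from OCR text. The document layout it assumes (a "Name:" line, and medicines listed one per line between "Prescription:" and "Directions:") is hypothetical, as are the names NAME_RE, MEDS_RE, and parse_prescription; the project's real patterns may look quite different.

```python
import re
from typing import Dict, List

# Assumed layout (hypothetical): "Name: <patient>" on its own line, and
# medicines listed one per line between "Prescription:" and "Directions:".
NAME_RE = re.compile(r"^Name:[ \t]*(?P<name>[^\n]+?)[ \t]*$", re.MULTILINE)
MEDS_RE = re.compile(
    r"Prescription:\s*\n(?P<meds>.*?)\n\s*Directions:", re.DOTALL
)


def parse_prescription(text: str) -> Dict[str, object]:
    """Extract the patient name and medicine lines from OCR text."""
    name_match = NAME_RE.search(text)
    meds_match = MEDS_RE.search(text)

    medicines: List[str] = []
    if meds_match:
        # Keep non-empty lines only; OCR output often has stray blank lines.
        medicines = [
            line.strip()
            for line in meds_match.group("meds").splitlines()
            if line.strip()
        ]

    return {
        "patient_name": name_match.group("name") if name_match else None,
        "medicines": medicines,
    }
```

A parser like this returns None or an empty list when a field is missing instead of raising, which keeps pytest assertions on noisy OCR input simple.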

Tech Stack

FastAPI
pdf2image
pytesseract
poppler-utils
pytest

Usage

Clone the repository.
Install dependencies: pip install -r requirements.txt
Run the FastAPI server: uvicorn main:app --reload
Access the API at http://127.0.0.1:8000/docs and use the /extract_from_doc endpoint for PDF extraction.
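
Besides the interactive /docs page, the endpoint can also be called from a script. The sketch below uses only the Python standard library to upload a PDF as multipart/form-data. The form field name "file" and the helper names build_multipart and upload_pdf are assumptions made for illustration; check the schema shown at /docs for the field names the endpoint actually expects, including any additional required form fields.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str, url: str = "http://127.0.0.1:8000/extract_from_doc") -> bytes:
    """POST a PDF to the running server and return the raw response body."""
    with open(path, "rb") as fh:
        # "file" is an assumed form field name, not confirmed by the README.
        body, content_type = build_multipart("file", "document.pdf", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server started via uvicorn, upload_pdf("prescription.pdf") sends the file and returns whatever the endpoint responds with.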

Contributing

Contributions are welcome! Fork the repository, make your changes, and submit a pull request.
