ocrpy's Introduction

ocrpy

Unified interface to Google Vision, AWS Textract, Azure, Tesseract and other OCR tools

The core objective of ocrpy is to let users perform OCR, archive, index and search any document with ease, providing an intuitive interface and a powerful Pipeline API to solve common OCR-based tasks.

ocrpy achieves this by wrapping the most popular OCR engines, such as Tesseract OCR, AWS Textract, Google Cloud Vision and Azure Computer Vision. It unifies the many interfaces offered by cloud tools and open-source libraries under a single, easy-to-use interface.
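The sketch below shows one common way a "common interface" over several engines can be built: an adapter class per engine, all sharing one abstract base, plus a registry that picks an adapter by name. All names here (OcrBackend, register_backend, get_backend, EchoBackend) are hypothetical illustrations and are not ocrpy's actual classes. EchoBackend stands in for a real engine so that the sketch runs without any OCR dependency.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class OcrBackend(ABC):
    """Hypothetical common interface that every engine adapter implements."""

    @abstractmethod
    def extract_text(self, image_bytes: bytes) -> str:
        """Return the plain text recognised in one document image."""


# Registry mapping a backend name to a factory (here, the adapter class itself).
_BACKENDS: Dict[str, Callable[[], OcrBackend]] = {}


def register_backend(name: str):
    """Class decorator that makes an adapter selectable by name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> OcrBackend:
    """Instantiate the adapter registered under `name`."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; choose from {sorted(_BACKENDS)}"
        ) from None


@register_backend("echo")
class EchoBackend(OcrBackend):
    """Stand-in adapter: treats the input bytes as UTF-8 text.
    A real adapter would call Tesseract, Textract, Cloud Vision, etc. here."""

    def extract_text(self, image_bytes: bytes) -> str:
        return image_bytes.decode("utf-8")


# Calling code depends only on OcrBackend, never on a specific engine.
backend = get_backend("echo")
print(backend.extract_text(b"hello ocr"))  # prints: hello ocr
```

Because callers only see OcrBackend, adding another engine means writing one more registered adapter; no calling code has to change.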

Getting Started

ocrpy is a Python-only package hosted on PyPI. The recommended installation method is pip:

   pip install ocrpy
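To confirm that the installation worked, or to see which version pip resolved, you can query the installed package metadata with the standard library's importlib.metadata (Python 3.8+). This is a general Python technique, not an ocrpy feature, and the helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


ocrpy_version = installed_version("ocrpy")
if ocrpy_version is None:
    print("ocrpy is not installed in this environment")
else:
    print(f"ocrpy {ocrpy_version} is installed")
```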

Day-to-Day Usage

ocrpy provides several levels of abstraction for performing OCR on different types of documents. The recommended way to use ocrpy is through its Pipeline API.

The Pipeline API can be invoked in two ways. The first is to define the pipeline configuration in a YAML file and then run the pipeline by loading that file:

   from ocrpy import TextOcrPipeline

   ocr_pipeline = TextOcrPipeline.from_config("ocrpy_config.yaml")
   ocr_pipeline.process()

Alternatively, you can run a pipeline by instantiating the pipeline class directly:

   from ocrpy import TextOcrPipeline

   pipeline = TextOcrPipeline(source_dir="s3://document_bucket/",
                              destination_dir="gs://processed_document_bucket/outputs/",
                              parser_backend="aws-textract",
                              credentials_config={"AWS": "path/to/aws-credentials.env/file",
                                                  "GCP": "path/to/gcp-credentials.json/file"})
   pipeline.process()
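The example above reads from S3, writes to Google Cloud Storage and parses with AWS Textract, which is presumably why credentials_config carries both an "AWS" and a "GCP" entry. The sketch below makes that reasoning explicit by inspecting the URI schemes. The helper required_credentials and both mapping tables are hypothetical illustrations, not part of ocrpy's API. Only the "s3" and "gs" schemes and the "aws-textract" backend from the example are mapped.

```python
from typing import Dict, Set
from urllib.parse import urlparse

# Hypothetical mapping from storage URI scheme to the credentials_config key it needs.
# A path with no recognised scheme is treated as local and needs no cloud credentials.
SCHEME_PROVIDER: Dict[str, str] = {"s3": "AWS", "gs": "GCP"}

# Only the backend named in the example; other backend names are not assumed here.
BACKEND_PROVIDER: Dict[str, str] = {"aws-textract": "AWS"}


def required_credentials(source_dir: str, destination_dir: str,
                         parser_backend: str) -> Set[str]:
    """Return the credentials_config keys a pipeline run would plausibly need."""
    needed: Set[str] = set()
    for uri in (source_dir, destination_dir):
        scheme = urlparse(uri).scheme
        if scheme in SCHEME_PROVIDER:
            needed.add(SCHEME_PROVIDER[scheme])
    if parser_backend in BACKEND_PROVIDER:
        needed.add(BACKEND_PROVIDER[parser_backend])
    return needed


print(sorted(required_credentials("s3://document_bucket/",
                                  "gs://processed_document_bucket/outputs/",
                                  "aws-textract")))  # prints: ['AWS', 'GCP']
```

A check like this could run before pipeline.process() to fail early when a credentials entry is missing.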

📝 A more detailed set of examples and tutorials on using ocrpy for your use case can be found in the ocrpy documentation.

Support and Documentation

For an in-depth reference to the ocrpy API, refer to our API docs.
For inspiration on how to use ocrpy for your use case, check out our tutorials or our examples.
If you're interested in understanding how ocrpy works, check out our Ocrpy Overview.

Feedback and Contributions

If you have any questions or feedback, or notice something wrong, please open an issue on GitHub Issues.
If you are interested in contributing to the project, please open a PR on GitHub Pull Requests.
Or if you just want to say hi, feel free to contact us.

Citation

If you wish to cite this project, feel free to use this BibTeX reference:

@misc{ocrpy,
    title={Ocrpy: OCR, Archive, Index and Search any documents with ease},
    author={maxentlabs},
    year={2022},
    publisher={GitHub},
    howpublished={\url{https://github.com/maxent-ai/ocrpy}}
}

License and Credits

ocrpy is licensed under the MIT license. The full license text can also be found in the source code repository.
ocrpy is written and maintained by Bharath G.S and Rita Anjana.
A full list of contributors can be found in GitHub's overview.

ocrpy's Issues

Package installation depends on version of opencv-python that's no longer distributed

Describe the bug
pip install ocrpy==0.3.10 fails because of the opencv-python version it requires:

ERROR: Could not find a version that satisfies the requirement opencv-python==4.1.2.30 (from ocrpy) (from versions: 3.4.0.14, 3.4.10.37, 3.4.11.39, 3.4.11.41, 3.4.11.43, 3.4.11.45, 3.4.13.47, 3.4.14.51, 3.4.14.53, 3.4.15.55, 3.4.16.57, 3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.40, 4.4.0.42, 4.4.0.44, 4.4.0.46, 4.5.1.48, 4.5.2.52, 4.5.2.54, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.62, 4.5.5.64, 4.6.0.66)
ERROR: No matching distribution found for opencv-python==4.1.2.30

To Reproduce
Steps to reproduce the behavior:

1. Create a new venv
2. Run pip install ocrpy==0.3.10
3. See error

Expected behavior
Installation to complete successfully

Screenshots
n/a
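The resolver error can be read directly from the version list it prints: the exact pin 4.1.2.30 is not among the versions pip found for that environment, and it falls between the last 3.4.x release and the first 4.x release offered. The sketch below checks this using only the data quoted in the error. The helpers as_tuple and neighbours are illustrative; the plain numeric comparison is adequate for these four-part version strings, but general-purpose code should use packaging.version instead.

```python
from typing import List, Optional, Tuple

# Versions reported as available in the pip error above.
AVAILABLE_TEXT = (
    "3.4.0.14, 3.4.10.37, 3.4.11.39, 3.4.11.41, 3.4.11.43, 3.4.11.45, "
    "3.4.13.47, 3.4.14.51, 3.4.14.53, 3.4.15.55, 3.4.16.57, 3.4.16.59, "
    "3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.40, 4.4.0.42, "
    "4.4.0.44, 4.4.0.46, 4.5.1.48, 4.5.2.52, 4.5.2.54, 4.5.3.56, "
    "4.5.4.58, 4.5.4.60, 4.5.5.62, 4.5.5.64, 4.6.0.66"
)
PINNED = "4.1.2.30"  # the exact version required by ocrpy==0.3.10


def as_tuple(v: str) -> Tuple[int, ...]:
    """Turn '4.1.2.30' into (4, 1, 2, 30) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def neighbours(pinned: str,
               available: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return the closest available versions below and above `pinned`."""
    key = as_tuple(pinned)
    lower = [v for v in available if as_tuple(v) < key]
    upper = [v for v in available if as_tuple(v) > key]
    return (max(lower, key=as_tuple) if lower else None,
            min(upper, key=as_tuple) if upper else None)


available = [v.strip() for v in AVAILABLE_TEXT.split(",")]
print(PINNED in available)            # prints: False
print(neighbours(PINNED, available))  # prints: ('3.4.18.65', '4.3.0.38')
```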

Does it work without internet?