Adapt to Google Vision API about ocrmypdf HOT 15 CLOSED

shaunc869 avatar shaunc869 commented on July 19, 2024 3
Adapt to Google Vision API

from ocrmypdf.

Comments (15)

bkanuka avatar bkanuka commented on July 19, 2024 1

Just to update, this has been attempted recently - https://github.com/ualiawan/OCRmyPDF

Unfortunately it's in the form of a complete fork rather than a plugin, but the logic is all there. It would be really neat for someone to repackage @ualiawan's work as a "pure plugin" that can be used with upstream OCRmyPDF.

jbarlow83 avatar jbarlow83 commented on July 19, 2024

Yes, the pipeline workflow I set up makes it quite easy to change the OCR engine or any other single part. I'll definitely look into it.

shaunc869 avatar shaunc869 commented on July 19, 2024

I had a couple more questions. Is there a way to get hold of you directly?
My email [...]. Thanks!

On Sat, Dec 19, 2015 at 5:27 PM, jbarlow83 wrote:

> Yes, the pipeline workflow I set up makes it quite easy to change the OCR
> engine or any other single part. I'll definitely look into it.

Shaun

jbarlow83 avatar jbarlow83 commented on July 19, 2024

I removed your email address for your privacy.

bharat-patidar avatar bharat-patidar commented on July 19, 2024

Hi jbarlow83 and Shaun,
I am also working on the exact same use case: I want to replace Tesseract with the Google Vision API. Can you help me out with the changes required to do that?

@shaunc869 If you have a working implementation and don't mind sharing it, that would be great.
Thanks in advance.

jbarlow83 avatar jbarlow83 commented on July 19, 2024

This has not been attempted to my knowledge. This is not a small project but would be desirable.

The way to proceed would be to refactor the existing use of Tesseract in _pipeline.py to use a generic OCR interface instead, keeping the existing Tesseract driver in exec/tesseract.py as the implementation for Tesseract and adding a new driver for cloud OCR.
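As a rough illustration of the refactor described above, here is a minimal sketch of a generic OCR interface with two interchangeable drivers. Everything in it is an assumption made for illustration: OcrEngine, TesseractEngine, CloudVisionEngine and run_ocr are hypothetical names, not OCRmyPDF's actual API, and both drivers are stubs that return placeholder strings instead of doing real OCR.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class OcrEngine(ABC):
    """Hypothetical interface the pipeline would call instead of Tesseract directly."""

    @abstractmethod
    def image_to_hocr(self, image: Path) -> str:
        """Return hOCR markup for one page image."""


class TesseractEngine(OcrEngine):
    def image_to_hocr(self, image: Path) -> str:
        # A real driver would run the local tesseract binary here (stubbed).
        return f"<!-- hOCR from local Tesseract for {image.name} -->"


class CloudVisionEngine(OcrEngine):
    def image_to_hocr(self, image: Path) -> str:
        # A real driver would call the cloud API and translate its JSON (stubbed).
        return f"<!-- hOCR from cloud OCR for {image.name} -->"


def run_ocr(engine: OcrEngine, image: Path) -> str:
    # The pipeline depends only on the interface, never on a specific engine.
    return engine.image_to_hocr(image)


if __name__ == "__main__":
    for engine in (TesseractEngine(), CloudVisionEngine()):
        print(run_ocr(engine, Path("page-0001.png")))
```

With this shape, swapping engines is a matter of passing a different object to run_ocr; the rest of the pipeline would not need to change.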

Some more code would be needed to translate the Google response, delivered as JSON, to PDF. One way to do this would be to convert the JSON format to hOCR and use hocrtransform.py. It's unclear to me whether the API can return a PDF with text annotations.
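To make the JSON-to-hOCR idea concrete, here is a minimal sketch. It assumes the response has already been parsed into a dict with a textAnnotations list, where the first entry holds the full page text and each later entry is one word with a description and a boundingPoly. The function name vision_words_to_hocr is hypothetical, and the output is only a bare page of word spans; hocrtransform.py may well expect line and paragraph elements too, which this sketch does not produce.

```python
from html import escape


def vision_words_to_hocr(response: dict, page_width: int, page_height: int) -> str:
    """Convert word-level textAnnotations into a minimal hOCR page (sketch)."""
    # Entry 0 is assumed to be the full page text; the rest are single words.
    words = response.get("textAnnotations", [])[1:]
    spans = []
    for i, word in enumerate(words, start=1):
        vertices = word["boundingPoly"]["vertices"]
        # Default to 0 in case a coordinate is missing from a vertex.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        bbox = f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"
        spans.append(
            f"<span class='ocrx_word' id='word_{i}' title='{bbox}'>"
            f"{escape(word['description'])}</span>"
        )
    body = "\n".join(spans)
    return (
        f"<div class='ocr_page' id='page_1' "
        f"title='bbox 0 0 {page_width} {page_height}'>\n{body}\n</div>"
    )


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not a real API response.
    box = [{"x": 10, "y": 20}, {"x": 60, "y": 20}, {"x": 60, "y": 40}, {"x": 10, "y": 40}]
    sample = {
        "textAnnotations": [
            {"description": "Hello", "boundingPoly": {"vertices": box}},
            {"description": "Hello", "boundingPoly": {"vertices": box}},
        ]
    }
    print(vision_words_to_hocr(sample, page_width=200, page_height=100))
```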

Google currently funds the development of Tesseract, so there is a nonzero chance that Tesseract is involved in delivering the text recognition component of Cloud Vision. That said, Cloud Vision clearly has some capabilities that Tesseract does not.

ualiawan avatar ualiawan commented on July 19, 2024

> Just to update, this has been attempted recently - https://github.com/ualiawan/OCRmyPDF
>
> Unfortunately it's in the form of a complete fork rather than a plugin, but the logic is all there. It would be really neat for someone to repackage @ualiawan's work as a "pure plugin" that can be used with upstream OCRmyPDF.

I should concede that the implementation in my repo is more like a quick fix to add support for GCV based OCR, rather than a thoroughly tested solution that can be merged with the original repo in its current form. However, it does what it is supposed to do, and as @bkanuka suggested the logic is all there. If someone is interested, we can work together to either repackage this as a pure plugin, or merge the fork with the original ocrmypdf repo.

ctrlcctrlv avatar ctrlcctrlv commented on July 19, 2024

@ualiawan I'm interested—but in Amazon Textract, not GCV.

scrollable-sidebar avatar scrollable-sidebar commented on July 19, 2024

> I should concede that the implementation in my repo is more like a quick fix to add support for GCV based OCR, rather than a thoroughly tested solution that can be merged with the original repo in its current form. However, it does what it is supposed to do, and as @bkanuka suggested the logic is all there. If someone is interested, we can work together to either repackage this as a pure plugin, or merge the fork with the original ocrmypdf repo.

I'd love to try out your fork, but I get the following error when I try to run it. I'm new to Python and the command line, so I don't really know what I'm doing, but I did manage to get the regular version of ocrmypdf working, so hopefully this isn't beyond me. I'd be very grateful for any help.

(venv) C:\Users\Me\AppData\Local\Microsoft\WindowsApps\venv\Scripts>ocrmypdf
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Me\AppData\Local\Microsoft\WindowsApps\venv\Scripts\ocrmypdf.exe\__main__.py", line 7, in <module>
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\__main__.py", line 36, in run
    _parser, options, plugin_manager = get_parser_options_plugins(args=args)
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\_plugin_manager.py", line 116, in get_parser_options_plugins
    plugin_manager = get_plugin_manager(pre_options.plugins)
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\_plugin_manager.py", line 107, in get_plugin_manager
    builtins=builtins,
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\_plugin_manager.py", line 45, in __init__
    self.setup_plugins()
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\_plugin_manager.py", line 73, in setup_plugins
    module = importlib.import_module(name)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\builtin_plugins\gcv_tesseract_ocr.py", line 13, in <module>
    from ocrmypdf._exec import gcv
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\_exec\gcv.py", line 21, in <module>
    gcv_client = vision.ImageAnnotatorClient()
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\google\cloud\vision_v1\gapic\image_annotator_client.py", line 135, in __init__
    ssl_credentials=ssl_credentials)
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\google\gax\grpc.py", line 106, in create_stub
    credentials = _grpc_google_auth.get_default_credentials(scopes)
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\google\gax\_grpc_google_auth.py", line 62, in get_default_credentials
    credentials, _ = google.auth.default(scopes=scopes)
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\google\auth\_default.py", line 488, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

ualiawan avatar ualiawan commented on July 19, 2024

You may need to export your Google application credentials as follows:

export GOOGLE_APPLICATION_CREDENTIALS={path_to_gcv_api_key}

For more information, please see https://cloud.google.com/docs/authentication/getting-started

scrollable-sidebar avatar scrollable-sidebar commented on July 19, 2024

> You may need to export google application credentials as follows:
>
> export GOOGLE_APPLICATION_CREDENTIALS={path_to_gcv_api_key}
>
> For more information, please see https://cloud.google.com/docs/authentication/getting-started

Thank you. It turned out I needed to use SET instead of EXPORT as I am on Windows. Now it works great with the Tesseract engine, but with the Google Vision engine it fails with the OCR stage at 0%.

... UnboundLocalError: local variable 'visible_image_out' referenced before assignment

Is providing the Google API key enough, or do I also need to install Google client libraries? The key itself should be fine as I use it with another app (without installing Google client libraries). I've tried with various PDFs and images as source files, and also with both Python 3.7 and 3.9.

Here is a screenshot with debugging on, showing the output up until the error message:

[screenshot: google-response]

And here is the full error:

(venv) C:\Users\Me\AppData\Local\Microsoft\WindowsApps\venv\Scripts>ocrmypdf --output-type pdf small.jpg output.pdf
Input file is not a PDF, checking if it is an image...
Input file is an image
Input image has no ICC profile, assuming sRGB
Image seems valid. Try converting to PDF...
Successfully converted to PDF, processing...
Scanning contents: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 341.64page/s]
OCR: 0%| | 0.0/1.0 [00:03<?, ?page/s]
An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\builtin_plugins\concurrency.py", line 135, in _execute
    result = future.result()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\_base.py", line 428, in result
    return self.__get_result()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.2544.0_x64__qbz5n2kfra8p0\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\me\appdata\local\microsoft\windowsapps\venv\lib\site-packages\ocrmypdf\_sync.py", line 214, in exec_page_sync
    page_context.image = visible_image_out
UnboundLocalError: local variable 'visible_image_out' referenced before assignment

Asvaghosa avatar Asvaghosa commented on July 19, 2024

It doesn't work with mine either.

pseudomonas avatar pseudomonas commented on July 19, 2024

I'm getting the same error, with Python 3.8, on Ubuntu Linux.

Doing the dumb-but-obvious thing and adding in […]/site-packages/ocrmypdf/_sync.py after line 211:

	else:
		visible_image_out = ocr_image_out

seems to make it work pretty much as expected. The issue seems to be that the variable is only initialised inside the branch guarded by "if not options.lossless_reconstruction:"; if that option isn't False then (as far as I can see) the variable is simply never assigned.
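For anyone unfamiliar with this kind of error, the pattern can be reproduced in isolation. The sketch below is not the code from _sync.py; the function names and the "preview-of-" string are made up for illustration. It only shows a variable assigned in one branch, and the same else fix as above.

```python
def choose_visible_image_buggy(ocr_image_out: str, lossless_reconstruction: bool) -> str:
    if not lossless_reconstruction:
        visible_image_out = f"preview-of-{ocr_image_out}"
    # When lossless_reconstruction is True, nothing was assigned above,
    # so this line raises UnboundLocalError.
    return visible_image_out


def choose_visible_image_fixed(ocr_image_out: str, lossless_reconstruction: bool) -> str:
    if not lossless_reconstruction:
        visible_image_out = f"preview-of-{ocr_image_out}"
    else:
        # Same idea as the workaround: fall back to the OCR image.
        visible_image_out = ocr_image_out
    return visible_image_out


if __name__ == "__main__":
    print(choose_visible_image_fixed("ocr.png", lossless_reconstruction=True))
    try:
        choose_visible_image_buggy("ocr.png", lossless_reconstruction=True)
    except UnboundLocalError as exc:
        print("buggy version failed:", exc)
```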

pseudomonas avatar pseudomonas commented on July 19, 2024

@ualiawan you might also want to note in your docs that the Google Cloud Vision libraries
a) accept the JSON file being put in $HOME/.config/gcloud/application_default_credentials.json in lieu of the environment variable (and similar things on Windows)
b) don't work with Python >= 3.10 (and possibly 3.9, which I didn't check; 3.8 works fine)
Neither of these is strictly to do with your library, but they affect use of the package as a whole.
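A quick way to check which of the two credential locations mentioned in this thread is actually present, before running ocrmypdf, is sketched below. The helper find_google_credentials is hypothetical and covers only the environment variable and the $HOME path from point a); the real lookup in the Google libraries has more steps, and the Windows location is not handled here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_google_credentials(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> Optional[Path]:
    """Return the credentials file this sketch can find, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        # Choice made for this sketch: a variable pointing at a missing
        # file counts as "not found" rather than falling back.
        return path if path.is_file() else None

    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_google_credentials()
    print(f"Credentials file: {found}" if found else "No credentials file found")
```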

alvitawa avatar alvitawa commented on July 19, 2024

I can't use the fork as-is due to the outdated dependencies (and I would rather not add an unmaintained dependency). Has any work been done to integrate it as a plugin? (An OCR API other than Google's would also be fine for my use case.)
