thomasgruebl / rusty-tesseract

A Rust wrapper for Google Tesseract

License: MIT License

Rust 100.00%

rusty-tesseract's Issues

Cargo warns "ignoring invalid dependency `rusty-tesseract` which is missing a lib target"

Hello and thanks for making rusty-tesseract.

I was getting started by running cargo add rusty-tesseract (or cargo rm rusty-tesseract; cargo add rusty_tesseract) and then adding use rusty_tesseract::Image; to my main.rs.

But cargo build warns me that it's ignoring rusty-tesseract, because it has no library target in its Cargo.toml.

Bundling tesseract

It would be nice to bundle tesseract like https://github.com/ledunguit/tesseract-native-rs

Not getting text from image

This is my code:

let img = Image::from_path("C:/Users/{my user}/Downloads/asd.png").unwrap();
let my_args = Args::default();
let output = rusty_tesseract::image_to_string(&img, &my_args).unwrap();
println!("The String output is: {:?}", output);

and this is the output

Tesseract Command: tesseract.exe C:/Users/{my user}/Downloads/asd.png stdout -l eng --dpi 150 --psm 3 --oem 3
The String output is: ""

This is the image I'm using it with.

Is this a problem with my image or my Tesseract args, or is it a problem with the crate?
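One way to narrow this down is to rerun the logged Tesseract command outside the crate with a different page segmentation mode. The sketch below rebuilds the argument list in the order shown in the log line above. It is only an illustration: `tesseract_args`, `run_tesseract`, and `compare_psm_modes` are hypothetical helpers, not part of rusty-tesseract, and the code assumes a `tesseract` binary is on PATH. Modes 6 (single uniform block of text) and 7 (single text line) are standard Tesseract `--psm` values.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper (not part of rusty-tesseract): builds the argument
/// list in the same order as the logged "Tesseract Command" line.
fn tesseract_args(image: &str, lang: &str, dpi: u32, psm: u32, oem: u32) -> Vec<String> {
    vec![
        image.to_string(),
        "stdout".to_string(),
        "-l".to_string(),
        lang.to_string(),
        "--dpi".to_string(),
        dpi.to_string(),
        "--psm".to_string(),
        psm.to_string(),
        "--oem".to_string(),
        oem.to_string(),
    ]
}

/// Runs the `tesseract` binary (assumed to be on PATH) and returns
/// its stdout and stderr as text.
fn run_tesseract(args: &[String]) -> io::Result<(String, String)> {
    let output = Command::new("tesseract").args(args).output()?;
    Ok((
        String::from_utf8_lossy(&output.stdout).into_owned(),
        String::from_utf8_lossy(&output.stderr).into_owned(),
    ))
}

/// Tries the logged mode (3) plus two stricter layouts and prints each result.
fn compare_psm_modes(image: &str) {
    for psm in [3, 6, 7] {
        let args = tesseract_args(image, "eng", 150, psm, 3);
        match run_tesseract(&args) {
            Ok((text, errors)) => println!("psm {}: text={:?} stderr={:?}", psm, text, errors),
            Err(e) => println!("psm {}: could not start tesseract: {}", psm, e),
        }
    }
}
```

If every mode returns an empty string when run directly, the image or the Tesseract installation is the more likely culprit than the crate.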

AWS Lambda Function Errors

I'm using rusty-tesseract in a Rust Lambda function and I'm getting some strange results.

When I run everything locally it works great, super easy to use!

But as soon as I deploy to an AWS Lambda, it breaks with this error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CommandExitStatusError("exit status: 1", "Error in pixCreateNoInit: pixdata_malloc fail for data\nError in pixCreateTemplateNoInit: pixd not made\nError in pixCreateTemplate: pixd not made\nError in pixCopy: pixd not made\nError in pixGetDepth: pix not defined\nError in pixGetWpl: pix not defined\nError in pixGetYRes: pix not defined\nError in pixClone: pixs not defined\nPlease call SetImage before attempting recognition.\nError during processing.\n")', src/main.rs:158:74

I am deploying using Lambda Layers with pre-built Tesseract binaries. I have tried this with Tesseract v4 and v5 but get the same error each time. Here are the versions I've tried with:

The tesseract version is: "tesseract 4.1.3\n leptonica-1.82.0\n libjpeg 6b (libjpeg-turbo 2.0.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0\n Found AVX2\n Found AVX\n Found FMA\n Found SSE\n"

And also v5:

The tesseract version is: "tesseract 5.3.2\n leptonica-1.83.1\n libjpeg 6b (libjpeg-turbo 2.0.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0\n Found AVX2\n Found AVX\n Found FMA\n Found SSE4.1\n Found OpenMP 201511\n"

This may well be a Lambda environment issue that I haven't figured out yet, but I wanted to put this out there in case there is something else going on that you may want to know about.

Feature Request: add hOCR output support

Hi! rusty-tesseract is amazing work! It works pretty well on both my Linux and macOS machines!

I have used it in my personal project https://github.com/strrl/dejavu, and I found that I need more detailed information such as page, paragraph, and line, not only the word. Ref: STRRL/dejavu#7

I found that both ALTO and hOCR output could make this possible, and both are XML-based. I prefer hOCR because its spec still seems to be maintained: https://github.com/kba/hocr-spec/

So here is my proposal:

Add a new function called image_to_hocr whose output is a string containing the XML-based hOCR.

What do you think? ❤️

I could draft a PR for that.
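As a rough illustration of the proposal, the sketch below shells out to the Tesseract CLI with the `hocr` config name, which makes it emit hOCR on stdout. The signature of `image_to_hocr` and the `hocr_args` helper are assumptions made for this example; they do not describe the crate's actual API, and a real implementation would presumably take the crate's own `Image` and `Args` types. A `tesseract` binary on PATH is assumed.

```rust
use std::io;
use std::process::Command;

/// Hypothetical argument builder for the proposed function.
/// The trailing `hocr` is Tesseract's config name for hOCR output.
fn hocr_args(image: &str, lang: &str) -> Vec<String> {
    [image, "stdout", "-l", lang, "hocr"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Sketch of `image_to_hocr`: returns the hOCR document as a string,
/// or an error carrying Tesseract's stderr if the command fails.
fn image_to_hocr(image: &str, lang: &str) -> io::Result<String> {
    let output = Command::new("tesseract")
        .args(hocr_args(image, lang))
        .output()?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ))
    }
}
```

The returned string can then be handed to any XML or HTML parser to pull out the page, paragraph, and line elements.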

Which Tesseract should I use?

Hi, I am a bit confused about this crate and the other tesseract crate:

https://crates.io/crates/tesseract

Which one should I use?

Always get "Invalid Tesseract version" when tesseract does not exit with 0

It would be better to report the exact exit code when tesseract does not work as expected.

rusty-tesseract/src/tesseract/command.rs

Lines 61 to 64 in 4a0c3da

match status.code() {
    Some(0) => Ok(out),
    _ => Err(TessError::VersionError(err)),
}
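One possible shape for the fix is sketched below. It maps a non-zero exit code to an error that carries the code and stderr instead of always reporting a version error. The `TessError` enum here is a stand-in written for this example: the `CommandExitStatusError(String, String)` payload is inferred from the panic message in the Lambda issue above, and `check_exit` is a hypothetical helper, not a function in the crate.

```rust
/// Stand-in for the crate's error type. Only the two variants mentioned
/// in these issues are modelled, and their payloads are assumptions.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum TessError {
    VersionError(String),
    CommandExitStatusError(String, String),
}

/// Hypothetical helper: turns an exit code plus captured output into a result.
/// `code` is what `ExitStatus::code()` returns; it is `None` when the
/// process was terminated by a signal on Unix.
fn check_exit(code: Option<i32>, out: String, err: String) -> Result<String, TessError> {
    match code {
        Some(0) => Ok(out),
        // Keep the real exit code so callers can see what went wrong.
        Some(n) => Err(TessError::CommandExitStatusError(
            format!("exit status: {}", n),
            err,
        )),
        None => Err(TessError::CommandExitStatusError(
            "terminated by signal".to_string(),
            err,
        )),
    }
}
```

With this shape, a version error would only be reported when parsing the version string itself fails.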
