Comments (5)
Since it generated an output.txt instead of an output.hocr or output.html, my guess would be that you're missing the configuration file hocr (/usr/share/tesseract-ocr/tessdata/configs/hocr with Tesseract 3.05 in Debian). If so, however, the error shouldn't have been silenced.
from pyocr.
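The diagnosis above (a .txt file where an hOCR file was expected) can be checked mechanically after a run. This is a minimal sketch, assuming tesseract was called with an output base name such as `output`; the helper name `detect_tesseract_output` is hypothetical, and checking both .hocr and .html reflects that hOCR output has used either extension depending on the Tesseract version.

```shell
#!/usr/bin/env bash
# Sketch: report which output file a tesseract run left behind for a given
# output base name (e.g. "output"). Hypothetical helper, not part of
# tesseract or pyocr.
detect_tesseract_output() {
    local base="$1"
    if [ -f "$base.hocr" ]; then
        echo "hocr"      # hOCR output (newer file naming)
    elif [ -f "$base.html" ]; then
        echo "html"      # hOCR output (older file naming)
    elif [ -f "$base.txt" ]; then
        # Plain text only: the hocr config was probably not found or not used.
        echo "txt"
    else
        echo "none"
    fi
}
```

In the situation reported here, `detect_tesseract_output output` would print `txt`.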
I compiled tesseract myself, and I do have a hocr file in /usr/local/share/tessdata/configs/hocr.
I got this output with the --print-parameters command-line option:
tesseract --print-parameters | grep hocr
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 0 Write .html hOCR output file
Does 0 mean disabled?
Is there a way to make sure tesseract uses the hocr specified above?
Thank you for your help
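To read a single value out of the `--print-parameters` dump instead of eyeballing grep output, a small filter can be used. This is a sketch with a hypothetical helper name (`param_value`); it assumes each line of the dump is a parameter name, its value and a description separated by whitespace, and it is only meant for numeric or boolean parameters such as `tessedit_create_hocr` (an empty string value would shift the fields).

```shell
#!/usr/bin/env bash
# Sketch: read `tesseract --print-parameters` output on stdin and print the
# value of one parameter. Hypothetical helper; assumes lines of the form
# "<name> <value> <description>" with a non-empty value.
param_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { if (!found) exit 1 }   # non-zero status if the name is absent
    '
}
```

For example, `tesseract --print-parameters | param_value tessedit_create_hocr` reads the same line as the grep above and prints only its value.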
0 = disabled. But this is expected, since you didn't tell Tesseract to use the hocr configuration file.
% tesseract --print-parameters | grep hocr
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 0 Write .html hOCR output file
% tesseract --print-parameters randomfile.jpeg randomoutputfile hocr | grep hocr
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 1 Write .html hOCR output file
What is the content of your /usr/local/share/tessdata/configs/hocr?
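To answer that question without guessing which file would be picked up, one can look for configs/hocr in the likely tessdata directories and check that it actually enables hOCR. This is a minimal sketch, not Tesseract's own lookup logic: the helper names are hypothetical, the fixed candidate directories are the two paths mentioned in this thread, and because TESSDATA_PREFIX has pointed either at the tessdata directory or at its parent depending on the Tesseract version, both layouts are tried. It assumes a working hocr config contains a `tessedit_create_hocr 1` line.

```shell
#!/usr/bin/env bash
# Sketch: print the first configs/hocr file found among candidate tessdata
# directories. Approximation only; not Tesseract's own lookup logic.
find_hocr_config() {
    local prefix="${TESSDATA_PREFIX:-}"
    local dir
    # TESSDATA_PREFIX has meant either the tessdata directory or its parent,
    # depending on the Tesseract version, so both layouts are tried first.
    for dir in \
        "$prefix" \
        "${prefix:+$prefix/tessdata}" \
        /usr/local/share/tessdata \
        /usr/share/tesseract-ocr/tessdata; do
        [ -n "$dir" ] || continue          # skip entries from an unset prefix
        if [ -f "$dir/configs/hocr" ]; then
            echo "$dir/configs/hocr"
            return 0
        fi
    done
    return 1
}

# Succeed if the given config file sets tessedit_create_hocr to 1.
config_enables_hocr() {
    grep -Eq '^[[:space:]]*tessedit_create_hocr[[:space:]]+1([[:space:]]|$)' "$1"
}
```

A typical check before running the command shown above: `cfg=$(find_hocr_config) && config_enables_hocr "$cfg" && tesseract randomfile.jpeg randomoutputfile hocr`.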
I completely rebuilt tesseract with the latest version and now it works without any problem.
It seems it was an issue with my tessdata path: the files were installed in a different place...
Thank you for your help ;-)
You're welcome