Comments (5)
Having a look through the pyocr sources this stands out to me:
src/pyocr/builders.py
307- file_ext = ["txt"]
308: tess_flags = ["-psm", str(tesseract_layout)]
309- cun_args = ["-f", "text"]
--
564- file_ext = ["html", "hocr"]
565: tess_flags = ["-psm", str(tesseract_layout)]
566- tess_conf = ["hocr"]
--
640- file_ext = ["html", "hocr"]
641: tess_flags = ["-psm", str(tesseract_layout)]
642- tess_conf = ["hocr"]
Does pyocr just use -psm instead of --psm as the parameter? I'm wondering whether that is not accepted anymore now.
It looks like this is the problem. I changed the options passed in builders.py to provide --psm instead of -psm, and it works fine now. I might create a pull request for this, though I'm not sure whether the change has any other implications.
The commit in question in tesseract is the following:
tesseract-ocr/tesseract@ee201e1
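For anyone who needs to support both old and new Tesseract releases, here is a minimal sketch of choosing the flag from the version. The helper name psm_flags and the version cutoff are assumptions for illustration, not pyocr's actual code; please check the cutoff against the Tesseract commit above before relying on it.

```python
def psm_flags(tesseract_version, layout):
    """Build the page-segmentation-mode arguments for a Tesseract CLI call.

    tesseract_version: version tuple such as (3, 4, 1) or (4, 0, 0)
    layout: integer page segmentation mode
    """
    # Assumption: 3.x and older expect "-psm", newer releases expect "--psm".
    flag = "-psm" if tesseract_version[0] <= 3 else "--psm"
    return [flag, str(layout)]
```

With something like this, the lines in builders.py could read tess_flags = psm_flags(version, tesseract_layout), provided the version is available at that point.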
I also came across this today. I note that -psm is used not just in builders.py but also in tesseract.py.
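To find every remaining use of the single-dash form, a small scan along these lines may help. This is only a sketch; the function name find_old_psm is made up for illustration, and it only looks for the literal -psm token.

```python
import re

# Matches "-psm" only when it is not preceded by another dash,
# so "--psm" is left alone.
OLD_PSM = re.compile(r"(?<!-)-psm\b")


def find_old_psm(lines):
    """Return (line_number, text) pairs for lines using the single-dash flag."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OLD_PSM.search(line)
    ]
```

Passing it the lines of each source file (for example from open(path).readlines()) should list the places that still need updating.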
I haven't had a chance yet to work out the circular import statements that I introduced in https://github.com/ddddavidmartin/pyocr/tree/update_deprecated_psm_option_string. If anyone wants to step in, feel free to give it a go.
For now, a quick and dirty fix is to just apply c136838.