xhm1027 / pytesser
Automatically exported from code.google.com/p/pytesser
License: Other
Introduction:
============

PyTesser is an Optical Character Recognition module for Python. It takes as
input an image or image file and outputs a string.

PyTesser uses the Tesseract OCR engine (an Open Source project at Google),
converting images to an accepted format and calling the Tesseract executable
as an external script. A Windows executable is provided along with the Python
scripts. The scripts should work in Linux as well.

PyTesser: http://code.google.com/p/pytesser/
Tesseract: http://code.google.com/p/tesseract-ocr/

Dependencies:
=============

PIL is required to work with images in memory. PyTesser has been tested with
Python 2.4 in Windows XP.

http://www.pythonware.com/products/pil/

Installation:
==============

PyTesser has no installation functionality in this release. Extract
pytesser.zip into directory with other scripts. Necessary files are listed in
File Dependencies below.

Usage:
================================

>>> from pytesser import *
>>> im = Image.open('phototest.tif')
>>> text = image_to_string(im)
>>> print text
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.
The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

>>> try:
...     text = image_file_to_string('fnord.tif', graceful_errors=False)
... except errors.Tesser_General_Exception, value:
...     print "fnord.tif is incompatible filetype. Try graceful_errors=True"
...     print value
...
fnord.tif is incompatible filetype. Try graceful_errors=True
Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
Tessedit:Error:Read of file failed:fnord.tif
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3

>>> text = image_file_to_string('fnord.tif', graceful_errors=True)
>>> print "fnord.tif contents:", text
fnord.tif contents: fnord

>>> text = image_file_to_string('fonts_test.png', graceful_errors=True)
>>> print text
12 pt And Arnazwngw few dwscotheques provwde jukeboxes
Tames Amazmgly few dnscotheques pmvxde Jukeboxes
24 pt:
Arial: Amazingly few discotheques provide jul<ebo><es.
Courier: Ama zimgly few discotheque S provide j u k e b ox e S .
Times: Amazingly few discotheques provide jukeboxes.

File Dependencies:
============================================

pytesser.py      Main module for importing
util.py          Utility functions used by pytesser.py
errors.py        Interprets exceptions thrown by Tesseract
tesseract.exe    Executable called by pytesser.py
tessdata/        Resources used by tesseract.exe
What steps will reproduce the problem?
1. Click on http://pytesser.googlecode.com/files/pytesser_v0.0.1.zip
2.
3.
What is the expected output? What do you see instead?
Dialog box to save/open appears but browser indicates "connecting" and
never connects even after 15 minutes. I've been trying to download this
file for two days. Have tried about 10 times over that period.
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 27 Jun 2008 at 4:33
It's currently not possible to install pytesser via PyPI as it lacks a setup.py
file.
pip install PyTesser==0.0.1 should allow pytesser to be installed.
Original issue reported on code.google.com by [email protected]
on 28 Mar 2014 at 2:30
What steps will reproduce the problem?
1. Calling image_to_string on any image object
What is the expected output? What do you see instead?
Expected: The result without the string "Tesseract Open Source OCR Engine with
LibTiff"
Seeing: "Tesseract Open Source OCR Engine with LibTiff" + the analyzed result.
What version of the product are you using? On what operating system?
0.0.1, Arch Linux 2.6.39
Please provide any additional information below.
I have installed tesseract 3.00-2, and using this directly outputs the expected
result.
Original issue reported on code.google.com by [email protected]
on 20 Jun 2011 at 10:57
Is there a way to get the coordinates of the recognized string?
Original issue reported on code.google.com by [email protected]
on 15 Oct 2012 at 1:26
Will it work for images containing handwritten words?
Original issue reported on code.google.com by [email protected]
on 10 Jun 2011 at 11:29
What steps will reproduce the problem?
1. Use the attached image as function argument to call image_to_string
What is the expected output? What do you see instead?
The expected string should be "S1.00". However, I get "$1.00" currently, which
is inaccurate.
What version of the product are you using? On what operating system?
Python 2.7.9, Windows 8
Please provide any additional information below.
Please let me know if any further information is needed to help fix this
issue. Thanks.
Original issue reported on code.google.com by [email protected]
on 2 Jun 2015 at 9:21
Attachments:
What steps will reproduce the problem?
from pytesser import *
im = Image.open('ibd.png')
im = im.convert('RGB')
text = image_to_string(im)
print text
received:
M A R K E T P U LS E
Aug; 'rm;nnn>gy^'E*`
Myriad <;eumiu*m;*`
un:ln.smm=:'^^
Eauehnkm rrimliueru-*`
aadm vamntwx
Alninn^*~x*`
uMq==iu'^?*`* a. mg
wel:.mm`*`*`*" 'rn
nun-inaae^*`*^` charm:
su:;hwalF**"Avagu^""]
raJn.s!c=?^*"`
Radix" aqua:-***^*`
m.ailsmq=ns^'-t
Aumhnme^*~**
sanoiskm"`
What is the expected output? What do you see instead?
see attached image
What version of the product are you using? On what operating system?
using windows version 0.0.1
Original issue reported on code.google.com by [email protected]
on 6 Apr 2014 at 10:43
Attachments:
What steps will reproduce the problem?
1. import pytesser
2. os.chdir([...])
3. call any pytesser function
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
0.0.1
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 21 May 2007 at 3:44
What steps will reproduce the problem?
1. There is an error when the system tries to execute tesseract.exe on Mac
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Mac OS X
Please provide any additional information below.
Please send me information on how to substitute the .exe file needed on Mac
OS X. Thanks
Original issue reported on code.google.com by [email protected]
on 12 Apr 2009 at 5:40
What steps will reproduce the problem?
>>> from pytesser import *
>>> import Image
>>> im = Image.open('C:\\Python24\\Lib\\site-packages\\pytesser\\phototest.tif')
>>> text = image_to_string(im)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\site-packages\pytesser\pytesser.py", line 31, in image_to_string
    call_tesseract(scratch_image_name, scratch_text_name_root)
  File "C:\Python24\lib\site-packages\pytesser\pytesser.py", line 21, in call_tesseract
    proc = subprocess.Popen(args)
  File "C:\Python24\lib\subprocess.py", line 542, in __init__
    errread, errwrite)
  File "C:\Python24\lib\subprocess.py", line 706, in _execute_child
    startupinfo)
WindowsError: [Errno 2] The system cannot find the file specified
>>>
Original issue reported on code.google.com by [email protected]
on 25 Jul 2007 at 6:18
My image contains only digits, but pytesser often recognizes some digits as
letters. I want to disable letter recognition and keep only digits. How can I
configure this?
Thanks a lot.
Original issue reported on code.google.com by [email protected]
on 29 Jul 2011 at 4:14
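One possible approach to the digits-only question, assuming the installed Tesseract accepts the `-c configvar=value` command-line option (available in later 3.x releases), is to pass the `tessedit_char_whitelist` variable when the executable is invoked. The helper name `build_tesseract_args` and its parameters are hypothetical and are not part of pytesser; this is only a sketch of the argument list a patched `call_tesseract` might hand to `subprocess.Popen`.

```python
def build_tesseract_args(image_path, output_base, whitelist=None, exe="tesseract"):
    """Build a Tesseract command line (hypothetical helper, not pytesser API).

    Assumes a Tesseract build whose CLI supports `-c configvar=value`.
    """
    args = [exe, image_path, output_base]
    if whitelist:
        # Restrict recognition to the given characters.
        args += ["-c", "tessedit_char_whitelist=" + whitelist]
    return args


# Example: limit recognition to digits for a scratch image.
digit_args = build_tesseract_args("temp.bmp", "temp", whitelist="0123456789")
```

The resulting list could then be passed to `subprocess.Popen` in place of the unmodified argument list.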
Is it possible to train pytesser the way you train Tesseract? If so, how do
you do it?
Original issue reported on code.google.com by [email protected]
on 11 Aug 2008 at 10:01
What steps will reproduce the problem?
1. Trying to run in Python 3+
What is the expected output? What do you see instead?
Expected: no errors. Seeing: errors.
What version of the product are you using? On what operating system?
3.4.1, Windows 7 x64
Please provide any additional information below.
Included is an updated working version
Original issue reported on code.google.com by [email protected]
on 19 Aug 2014 at 2:00
Attachments:
Is it possible to let pytesser run with Tesseract 2.01?
Original issue reported on code.google.com by [email protected]
on 21 Jan 2008 at 9:20
What steps will reproduce the problem?
1. When using Tesseract version 3.01, a title line is displayed.
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
To stop this being displayed, change this line in pytesser.py:
    proc = subprocess.Popen(args)
to this:
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
Original issue reported on code.google.com by [email protected]
on 3 Dec 2011 at 1:43
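The change above can be sketched as a small helper. Once a pipe is attached, the output should be drained with `communicate()` rather than only waiting on the process, since a child that fills the pipe buffer would otherwise block. The helper name `run_quietly` is hypothetical, and a Python one-liner stands in for the Tesseract executable so the sketch runs without Tesseract installed. Capturing stderr as well is an assumption, in case a given Tesseract version writes its banner there instead.

```python
import subprocess
import sys


def run_quietly(args):
    """Run a command and capture its output instead of printing it.

    Hypothetical helper, not part of pytesser.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # drains both pipes, then waits for exit
    return proc.returncode, out, err


# Stand-in for the Tesseract call: a child process that prints a banner line.
code, out, err = run_quietly([sys.executable, "-c", "print('banner line')"])
```

The banner ends up in `out` rather than on the console, and `code` can still be checked for failure.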
When I try to convert my file to a string, it does not work properly.
I am attaching my pht.tif, but the output does not show all of the text
contained in pht.jpg.
Original issue reported on code.google.com by [email protected]
on 2 Dec 2013 at 7:11
Attachments:
What steps will reproduce the problem?
1. Import pytesser into a script that's on a different path
2. Try and run pytesser.image_to_string()
3.
What is the expected output? What do you see instead?
This is how it crashes:
Traceback (most recent call last):
  File "z:\scripts\bruker\grabMicroStar.py", line 107, in <module>
    text = pytesser.image_to_string(snap).strip()
  File "z:\scripts\bruker\pytesser\pytesser.py", line 41, in image_to_string
    call_tesseract(scratch_image_name, scratch_text_name_root)
  File "z:\scripts\bruker\pytesser\pytesser.py", line 31, in call_tesseract
    proc = subprocess.Popen(args)
  File "c:\programs\Python26\lib\subprocess.py", line 633, in __init__
    errread, errwrite)
  File "c:\programs\Python26\lib\subprocess.py", line 842, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
What version of the product are you using? On what operating system?
v0.0.1
Please provide any additional information below.
The problem is that it doesn't know where to find the tesseract.exe executable
if current working directory is not the same as that of the script (likely).
The fix is: in pytesser.py, add the following lines after tesseract_exe_name,
which figure out the path:
#=========================================
if 'pytesser' in sys.modules:
    tesseract_exe_path = sys.modules['pytesser'].__path__[0]
else:
    tesseract_exe_path = os.getcwd()
tesseract_exe_name = os.path.join(tesseract_exe_path,tesseract_exe_name)
#=========================================
Original issue reported on code.google.com by [email protected]
on 15 Jun 2010 at 12:42
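An alternative sketch of the same idea resolves the executable relative to the module's own file, which does not depend on the current working directory. The fix quoted above reads `__path__`, an attribute that only packages have, so it assumes pytesser is imported as a package. The helper name `resolve_beside_module` is hypothetical and not part of pytesser; inside pytesser.py it would presumably be called with `__file__`.

```python
import os


def resolve_beside_module(exe_name, module_file):
    """Return the absolute path of exe_name in the directory of module_file.

    Hypothetical helper, not part of pytesser.
    """
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, exe_name)


# Inside pytesser.py the call might look like:
#     tesseract_exe_name = resolve_beside_module(tesseract_exe_name, __file__)
# Here an illustrative relative path stands in for __file__.
example_path = resolve_beside_module(
    "tesseract", os.path.join("some", "dir", "pytesser.py")
)
```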
When graceful_errors is True, pytesser will now ripple arguments through to the
call to image_to_string after the input image has been converted to a Tesseract
compatible format.
Original issue reported on code.google.com by [email protected]
on 18 Sep 2011 at 12:34
Attachments:
I have downloaded the latest pytesser, but the zip file contains an .exe file,
so I don't know how to install it on Ubuntu. Is there a pytesser module for
Ubuntu? Please help me.
Thanks.
Original issue reported on code.google.com by [email protected]
on 10 Apr 2012 at 5:48
I got D378984999, it should be 13378984999
Original issue reported on code.google.com by [email protected]
on 18 Jan 2009 at 2:14
Attachments:
What steps will reproduce the problem?
Trying to use the code that makes a whitelist for Tesseract like follows
ocr = tesseract.TessBaseAPI()
ocr.SetVariable("tessedit_char_whitelist", "0123456789;")
ocr.SetPageSegMode(tesseract.PSM_AUTO)
ocr.Init("C:\\Program Files (x86)\\Tesseract-OCR\\","eng",tesseract.OEM_DEFAULT)
What is the expected output? What do you see instead?
Intended output is to have only "0123456789;" characters be recognized when
using the image_to_string() function. Using code like what is above,
image_to_string() just ignores it and grabs whatever characters it finds.
What version of the product are you using? On what operating system?
pytesseract-0.1, Python 2.7, Windows 8.1
Please provide any additional information below.
I've been trying everything people use for Tesseract-OCR, but that doesn't work
with pytesseract. I haven't been able to find any solution or method to
whitelisting with the image_to_string() function anywhere, which would be
immensely helpful in improving the accuracy of the function.
Thanks in advance for any help on the matter.
Original issue reported on code.google.com by [email protected]
on 9 Jun 2015 at 6:58
The line "import Image" at line 6 of pytesser.py expects that /PIL/ is added to
the path. However, Pillow does not have the PIL.pth file that ensures you can
import modules like that. As such, the import line needs to be changed to "from
PIL import Image".
Original issue reported on code.google.com by [email protected]
on 31 Mar 2014 at 12:07
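A minimal sketch of an import that tolerates both layouts, assuming the goal is to keep pytesser.py working with Pillow as well as with an older PIL install that exposes a top-level `Image` module. The final fallback to `None` is an illustrative choice so that the snippet runs even where no imaging library is installed; real code would more likely let the `ImportError` propagate.

```python
try:
    from PIL import Image  # Pillow, and PIL installs that expose the package
except ImportError:
    try:
        import Image  # legacy PIL layout with PIL.pth on the path
    except ImportError:
        Image = None  # no imaging library available (illustrative fallback)
```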
What steps will reproduce the problem?
1. from pytesser import *
2.
3.
What is the expected output? What do you see instead?
No errors and allows me to continue. Instead I see the following error:
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in -toplevel-
    from pytesser import *
ImportError: No module named pytesser
What version of the product are you using? On what operating system?
v0.0.1, Windows 7 Ultimate 32-bit
Please provide any additional information below.
I am running python 2.4 because 2.6 gave me the same error message. I have
downloaded and installed PIL. I have also placed the pytesser dependents in
the C:\Python24\Tools\Scripts folder. Once I got the error message the first
time, I then also added the files to C:\Python24\Scripts just as a precaution.
Original issue reported on code.google.com by [email protected]
on 4 Sep 2010 at 5:42