tess4j's People

Contributors

4f2e4a2e, alexander7161, breathermachine, dependabot[bot], gitter-badger, grahams, kreyssel, manuel-ssg, nguyenq, qxo, siddharths1, stweil, waicool20

tess4j's Issues

Maven break

Hi,

I just cloned your repo locally to try it out, but when I import the project into Eclipse, the POM editor shows this failure message: Failure to transfer commons-collections:commons-collections:jar:3.2.1 from https bla bla bla...

So I went to the Maven repository and found that commons-collections has already moved to commons-collections4. I tried to resolve this issue locally but couldn't.

Please advise. Thanks, Nguyen :)

NoClassDefFoundError: net/sf/ghost4j/GhostscriptException

We are having trouble making the code below work. Could it be the PdfUtilities?
We really appreciate any help. The stack trace is at the bottom. Let us know if you need more information.

Environment

Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

Code

package com.jantogal.test.pdf;

//import org.ghost4j.Ghostscript;
//import org.ghost4j.GhostscriptException;
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.vietocr.PdfUtilities;

public class App {
    private static final String TEST_FILE_PATH = "~/workspace/pdfs/FT0.pdf";

    public static void main(String[] args) {
        Test02();
    }

    private static void Test02() {
        Tesseract instance = Tesseract.getInstance();
        File pdfDoc = new File(TEST_FILE_PATH);
        File pngImageFiles[] = PdfUtilities.convertPdf2Png(pdfDoc);

        for (int i = 0; i < pngImageFiles.length; i++) {
            try {
                String ocrResult = instance.doOCR(pngImageFiles[i]);
                System.out.println(ocrResult);
            } catch (TesseractException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            System.console().readLine();
        }
    }
}

Solutions attempted:

(1) Tried including Ghostscript ghost4j 1.0.0 in the Maven dependencies. The library was on the classpath correctly.
Result: the stack trace shown below. Note: I was able to instantiate Ghostscript objects using the factory without any issues inside void main.
(2) Followed the solution described in #4.
Result: same as (1).

Stack Trace:

The issue happens on the line: File pngImageFiles[] = PdfUtilities.convertPdf2Png(pdfDoc);

We are just testing for the moment. The exception is:
Exception in thread "main" java.lang.NoClassDefFoundError: net/sf/ghost4j/GhostscriptException
at com.jantogal.test.pdf.App.Test02(App.java:24)
at com.jantogal.test.pdf.App.main(App.java:19)
Caused by: java.lang.ClassNotFoundException: net.sf.ghost4j.GhostscriptException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
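The stack trace asks for net.sf.ghost4j.GhostscriptException, while the commented-out imports in the code use the org.ghost4j package. That may point to a package-name mismatch between the ghost4j jar on the classpath and the one this tess4j build expects; this is an inference from the trace, not something confirmed in the thread. A minimal sketch for checking which of the two class names the JVM can actually see (the class name Ghost4jProbe and its helper are hypothetical, for illustration only):

```java
public class Ghost4jProbe {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, Ghost4jProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above
        System.out.println("net.sf.ghost4j.GhostscriptException present: "
                + isOnClasspath("net.sf.ghost4j.GhostscriptException"));
        // Class name taken from the commented-out import in the code above
        System.out.println("org.ghost4j.GhostscriptException present: "
                + isOnClasspath("org.ghost4j.GhostscriptException"));
    }
}
```

Run it with the same classpath as the failing application; if only the org.ghost4j name is found, the two libraries are probably built against different ghost4j releases.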

Change from tess4j 2.0.1 to 3.1.0

Hello. I am trying to move from tess4j 2.0.1 to 3.1.0. I only changed the tess4j version in my pom (Maven) and the Java version to 1.7. Everything compiles fine. The problem is that when I launch my tests the same way as before (when they worked), Tesseract does not find the language files.
The problem must come from the new version, because when I go back to 2.0.1 with Java 1.6, everything works as before.

Include special characters when looking for character coordinates

How can I include whitespace, tabs, or special characters when looking for character coordinates?

I found out that I can get character coordinates using

int level = TessPageIteratorLevel.RIL_SYMBOL; (instead of RIL_WORD)

https://github.com/nguyenq/tess4j/blob/master/src/test/java/net/sourceforge/tess4j/TessAPI1Test.java#L465

The problem is that this excludes other characters that are important to me. By important I mean that I want to be able to provide text that makes sense to humans too (with spaces, line breaks, tabs).

So I really need to have the full OCR result of an image and also the coordinates of each character including special characters. Is this possible? Thanks.

UnsatisfiedLinkError: Add exception message with cause hint

Users should get a hint that the Visual C++ Redistributable must be installed. Here is an example running on Windows 7 with version 2.0.2-SNAPSHOT (the German message means "The specified module could not be found."):

java.lang.UnsatisfiedLinkError: Das angegebene Modul wurde nicht gefunden.

        at com.sun.jna.Native.open(Native Method)
        at com.sun.jna.Native.open(Native.java:1759)
        at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
        at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
        at com.sun.jna.Library$Handler.<init>(Library.java:147)
        at com.sun.jna.Native.loadLibrary(Native.java:412)
        at com.sun.jna.Native.loadLibrary(Native.java:391)
        at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:68)
        at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:41)
        at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:286)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:222)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:171)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:155)
        at com.example.tess4j.TestTess4J.test(TestTess4J.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
        at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
        at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
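One way the requested hint could be attached is to catch the UnsatisfiedLinkError where the native library is loaded and rethrow it with an extended message. The sketch below is not tess4j's actual code: the class LinkErrorHint, the method loadWithHint, and the hint wording are all hypothetical, and the loader is passed in as a Supplier so that the example runs without JNA.

```java
import java.util.function.Supplier;

public class LinkErrorHint {

    // Hypothetical hint text; the wording is an assumption, not taken from tess4j
    public static final String HINT =
            "Hint: on Windows, check that the Visual C++ Redistributable is installed.";

    /** Runs the loader and, if linking fails, rethrows with the hint appended. */
    public static <T> T loadWithHint(Supplier<T> loader) {
        try {
            return loader.get();
        } catch (UnsatisfiedLinkError e) {
            UnsatisfiedLinkError wrapped =
                    new UnsatisfiedLinkError(e.getMessage() + " " + HINT);
            wrapped.initCause(e); // keep the original error and its stack trace
            throw wrapped;
        }
    }

    public static void main(String[] args) {
        try {
            // Simulates the failure from the stack trace above
            loadWithHint(() -> {
                throw new UnsatisfiedLinkError("Das angegebene Modul wurde nicht gefunden.");
            });
        } catch (UnsatisfiedLinkError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a real integration the Supplier would wrap the call that loads the native library, so the original error stays available as the cause.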

Error:Assert failed:in file ratngs.cpp, line 321

Some images crash tess4j with the message:
start >= 0 && start + num <= length_:Error:Assert failed:in file ratngs.cpp, line 321,
causing java.lang.Error: Invalid memory access.
Processing the same image with tesseract from the command line (generating hOCR) works fine.
Any ideas?

TessAPI1.TessBaseAPIRecognize(handle, null); Invalid memory access @question

Hi.

I have posted my question on Stack Overflow http://stackoverflow.com/questions/35295582/tess4j-memory-access-error-in-tess4j-java ... but I still can't solve the error. I checked whether the problem is in the BufferedImage, but the image buffer looks okay, and the images on which it works and on which it fails look the same: same format, same resolution.

Here are the printouts of the image and the ByteBuffer for which the program works okay:
BufferedImage@169c6ba: type = 5 ColorModel: #pixelBits = 24 numComponents = 3 color space = java.awt.color.ICC_ColorSpace@daef40 transparency = 1 has alpha = false isAlphaPre = false ByteInterleavedRaster: width = 2550 height = 4200 #numDataElements 3 dataOff[0] = 2
Its ByteBuffer: java.nio.DirectByteBuffer[pos=0 lim=32130000 cap=32130000]

Image for which it returns the error below:
BufferedImage@169c6ba: type = 5 ColorModel: #pixelBits = 24 numComponents = 3 color space = java.awt.color.ICC_ColorSpace@daef40 transparency = 1 has alpha = false isAlphaPre = false ByteInterleavedRaster: width = 2550 height = 4200 #numDataElements 3 dataOff[0] = 2
Its ByteBuffer: java.nio.DirectByteBuffer[pos=0 lim=32130000 cap=32130000]

Exception in thread "main" java.lang.Error: Invalid memory access
at net.sourceforge.tess4j.TessAPI1.TessBaseAPIRecognize(Native Method)
at TesseractUtility.TessFoLogo.testResultIterator(TessFoLogo.java:125)
at TesseractUtility.TessFoLogo.main(TessFoLogo.java:76)

Please help me. Where can I look for the bug?

Cannot delete file after OCR

The following code throws java.io.IOException: Unable to delete file: multipage_tif_example.pdf

File input = new File("multipage_tif_example.tif");
Tesseract1 instance = new Tesseract1();
instance.createDocuments(input.getPath(), input.getPath() , Arrays.asList(RenderedFormat.PDF));
File output = new File("multipage_tif_example.tif.pdf");
FileUtils.forceDelete(output);

Does Tess4J keep a handle on the output files?
Is there a way to release that handle?

Ghostscript WARN messages

Using PdfUtilities.convertPdf2Png(inputPdfFile) outputs this:

log4j:WARN No appenders could be found for logger (org.ghost4j.Ghostscript).
log4j:WARN Please initialize the log4j system properly.

The conversion works fine, but the warning is annoying :)

logback.xml should not be packaged within the jar

At application startup, if we add our own logback.xml to a new project, we see this warning:

|-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
|-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/Users/[user]/.m2/repository/net/sourceforge/tess4j/tess4j/3.2.1/tess4j-3.2.1.jar!/logback.xml]
|-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/[project]/target/classes/logback.xml]

The result is pretty bad, since we lose control over our logging configuration. There is a workaround (we can specify a distinct name for the config file, etc.), but it's not supposed to be like that.

In the pom.xml, it is easy to exclude the file when packaging the jar:

                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-jar-plugin</artifactId>
                    <configuration>
                        <excludes>
                            <exclude>**/logback.xml</exclude>
                        </excludes>
                    </configuration>
                </plugin>
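Until the file is excluded from the jar, the workaround mentioned above (a config file with a distinct name) can be sketched as follows. Logback reads the logback.configurationFile system property, whose value may be a URL, a classpath resource, or a file path. The file name my-app-logback.xml and the class LoggingBootstrap are hypothetical.

```java
public class LoggingBootstrap {

    /** Points logback at an explicitly named configuration file. */
    public static void useConfig(String location) {
        // Must run before the first logger is created, otherwise logback
        // has already picked a logback.xml from the classpath.
        System.setProperty("logback.configurationFile", location);
    }

    public static void main(String[] args) {
        // Hypothetical file name; any name other than logback.xml avoids the clash
        useConfig("my-app-logback.xml");
        System.out.println("logback.configurationFile = "
                + System.getProperty("logback.configurationFile"));
        // ... create loggers and start the application only after this point
    }
}
```

The same property can be passed on the command line with -Dlogback.configurationFile=..., which avoids any ordering concerns inside the application.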

liblept.so.4: cannot open shared object file

Hello!
Unfortunately, I am not able to run Tesseract through tess4j on Linux (CentOS).
I think that Tesseract is being loaded correctly with JNA, but for some reason it can't find liblept.so.
Is it possible to tell Tesseract, when it is loaded through JNA, where it should look for liblept.so?
Or would it be possible to compile Tesseract in such a way that the Leptonica shared library isn't needed at execution time?

NOTE1: I can't set the LD_LIBRARY_PATH environment variable. It solves the problem but cannot be done on my system.
NOTE2: Leptonica is not installed on the local machine.
NOTE3: I have compiled Tesseract from source (tried the GitHub master branch and 3.02 from SourceForge).

OSGI enabled

We should add OSGi bundle support and deploy to Maven Central. Here is a request from Suvidh via the tesseract-ocr forum:

Hi

I am adding the pom dependency below to my project to download the 'tess4j' jar files.

It successfully downloads the jar files from the central repository and adds them to my classpath. But it is not an OSGi-bundled jar; it is a normal jar file.

To use this jar in my project, it has to be OSGi-bundled.

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>1.3.0</version>
</dependency>

Can I have the OSGi-bundled jar file?

Thanks

Suvidh

Tesseract1Test.testCreateDocuments() passes with created document of 1348 bytes

The output file for Tesseract1Test.testCreateDocuments() is 1348 bytes and is an invalid PDF.

Steps to reproduce:

  1. Open your workspace
  2. Run Tesseract1Test.testCreateDocuments()
  3. Check the file size of test/test-results/docrenderer1-1.pdf
  4. Open test/test-results/docrenderer1-1.pdf

Expected results:
The file size of docrenderer1-1.pdf is larger than 1348 bytes and the file opens successfully in Adobe Reader.

Actual results:
File size of test/test-results/docrenderer1-1.pdf is exactly 1348 bytes and cannot be opened.

Dropbox links to relevant files:
docrenderer1-1.pdf output file https://www.dropbox.com/s/bbkp67h37ksd4tu/docrenderer1-1.pdf?dl=0
updated Tesseract1Test.java to check file size https://www.dropbox.com/s/iu24h6ps1ij4i9m/Tesseract1Test.java?dl=0

Replace ghost4j with pdfbox

Due to the large size of Ghost4J's native libraries (gs*dll) and the fact that pdfbox is written in pure Java, we will be replacing the Ghost4J lib with pdfbox.

can tess4j support for linux

I want to develop an OCR program on Ubuntu Linux, but I found that tess4j only supports Windows.
My question is: when will tess4j support Linux, or what can I do to make tess4j work on Linux?

Not reading provided ./tessdata/eng.traineddata

Just cloned the repo, built the .jar with

mvn assembly:single

Added the tess4j-2.0.0-SNAPSHOT-all.jar file to my classpath and tried to use it.
On the first call to

instance.doOCR(file);

it spits this out:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != NULL:Error:Assert failed:in file ..\..\classify\adaptmatch.cpp, line 174

Double-checked, and yes, I do have TESSDATA_PREFIX set to: C:\Users\ribeirob\Applications\nguyenq\tess4j\src\main\resources\
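The error message says TESSDATA_PREFIX must point to the parent directory of "tessdata", yet the path it tries to open is relative (./tessdata/eng.traineddata), which suggests the variable may not be visible to the JVM process. A small check like the one below, run from the same launcher as the application, can confirm what the JVM actually sees and whether eng.traineddata exists at the expected place. The class TessdataCheck and its method names are hypothetical helpers, not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    /** Expected location of <language>.traineddata, given the parent of "tessdata". */
    public static Path trainedDataPath(Path tessdataParent, String language) {
        return tessdataParent.resolve("tessdata").resolve(language + ".traineddata");
    }

    /** True if the language file exists under tessdataParent/tessdata. */
    public static boolean hasLanguage(Path tessdataParent, String language) {
        return Files.isRegularFile(trainedDataPath(tessdataParent, language));
    }

    public static void main(String[] args) {
        String prefix = System.getenv("TESSDATA_PREFIX");
        System.out.println("TESSDATA_PREFIX as seen by the JVM: " + prefix);

        // Fall back to the working directory, matching the "./tessdata" in the error
        Path parent = Paths.get(prefix != null ? prefix : ".");
        System.out.println("Looking for: " + trainedDataPath(parent, "eng").toAbsolutePath());
        System.out.println("Found: " + hasLanguage(parent, "eng"));
    }
}
```

If the variable prints as null, the environment of the IDE or service that starts the JVM differs from the shell where it was set.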

Failed loading language Tesseract couldn't load any languages!

Tess4J version - 3.1.0
Tesseract version - Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Os - OSX 10.11.3 El Capitan

//TESS4J_FOLDER_PATH = "/usr/local/Cellar/tesseract/3.04.01_1/share/" - location of the language .traineddata files
instance = new Tesseract();
instance.setDatapath(TESS4J_FOLDER_PATH);
instance.setLanguage("chi_tra");

String result = "";
File imageFile = new File(filePath);
try {
     result = instance.doOCR(imageFile);
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}
return result;

I got this error:

Failed loading language 'chi_tra'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000012408d311, pid=68062, tid=0x0000000000001703
#
# JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.dylib+0x13311]  _ZN9tesseract9Tesseract15recog_all_wordsEP8PAGE_RESP10ETEXT_DESCPK4TBOXPKci+0xb9
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

Load tessdata folder automated by default

Since #2, the tessdata folder is the only resource/property that must be loaded/set in order to run tess4j.

For simplicity's sake, the default folder should be loaded automatically, like the libs in #2.

WDYT?

No support to UNLV files

When I run tesseract via the CLI, it automatically uses any .uzn file with the same name as the target image.

Example:
tesseract workingimage004.png output.txt -psm 4

If I have a workingimage004.uzn file in the same location as my workingimage004.png, it will use it.

Tess4J ignores the uzn file.

Here is my code:

        Tesseract instance = Tesseract.getInstance();
        //In case you don't have your own tessdata, let it also be extracted for you
        File tessDataFolder = LoadLibs.extractTessResources("tessdata");

        //Set the tessdata path
        instance.setDatapath(tessDataFolder.getAbsolutePath());

        instance.setPageSegMode(4);
        try {
            String content = instance.doOCR(page);
        } catch (TesseractException e) {
            e.printStackTrace();
        }

Additional info:
Found the function that reads and segments the image using a UNLV file at ccstruct\blread.cpp:36 (in the Tesseract source).
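As a possible stopgap while the .uzn file is ignored, the zones could be read on the Java side and OCR'd one region at a time. The sketch below only covers the parsing step and assumes each .uzn line has the form "left top width height label"; that layout is an assumption here and should be checked against your own files. UznZones is a hypothetical helper, not part of tess4j.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UznZones {

    /**
     * Parses zone lines of the assumed form "left top width height label"
     * into rectangles. Blank or malformed lines are skipped.
     */
    public static List<Rectangle> parse(List<String> lines) {
        List<Rectangle> zones = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // not enough numbers for a zone
            }
            try {
                zones.add(new Rectangle(
                        Integer.parseInt(fields[0]),   // left
                        Integer.parseInt(fields[1]),   // top
                        Integer.parseInt(fields[2]),   // width
                        Integer.parseInt(fields[3]))); // height
            } catch (NumberFormatException e) {
                // skip lines whose first four fields are not integers
            }
        }
        return zones;
    }

    public static void main(String[] args) {
        // Made-up zone values for illustration only
        List<String> sample = Arrays.asList("100 60 800 50 Text", "", "100 120 800 45 Text");
        for (Rectangle zone : parse(sample)) {
            System.out.println(zone);
        }
    }
}
```

Each rectangle could then be handed to a region-restricted OCR call, for example a doOCR overload that accepts a Rectangle if your tess4j version provides one; the lines would come from something like Files.readAllLines on the .uzn file.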

Ubuntu Installation

When running 'mvn clean package' I get the output below. Any ideas?


T E S T S

Running net.sourceforge.tess4j.TessAPI1Test
Tests run: 58, Failures: 0, Errors: 58, Skipped: 0, Time elapsed: 0.211 sec <<< FAILURE!
Running net.sourceforge.tess4j.util.PdfUtilitiesTest
getPdfPageCount
convertPdf2Png
convertPdf2Tiff
Tests run: 5, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 0.401 sec
Running net.sourceforge.tess4j.TestFolderExtraction
Feb 15, 2015 4:55:36 PM net.sourceforge.tess4j.TestFolderExtraction testFolderExtraction
INFO: Loading the tessdata folder into a temporary folder.
/tmp/tess4j/tessdata
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.071 sec
Running net.sourceforge.tess4j.TessAPITest
TessBaseAPISetImage
Error in pixCreateHeader: width must be > 0
Error in pixCreateNoInit: pixd not made
Error in pixCreate: pixd not made
Error in pixGetData: pix not defined
Error in pixGetWpl: pix not defined
Error in pixSetYRes: pix not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined
TessResultIteratorGetChoiceIterator
ocr alive: false
progress: 0
Message: 0
symbol n, conf: 78.800751
TessBaseAPIEnd
TessBaseAPIGetUTF8Text
ocr alive: true
progress: 100
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

OSD
PSM: PSM_AUTO_OSD
Orientation: ORIENTATION_PAGE_UP
WritingDirection: WRITING_DIRECTION_LEFT_TO_RIGHT
TextlineOrder: TEXTLINE_ORDER_TOP_TO_BOTTOM
Deskew angle: 0.0138

TessBaseAPIGetInitLanguagesAsString
TessBaseAPISetPageSegMode
TessBaseAPIRect
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

TessBaseAPIGetBoolVariable
TessBaseAPIPrintVariablesToFile
TessBaseAPISetVariable
TessResultRenderer
Error during processing.
TessBaseAPISetOutputName
TessBaseAPIProcessPages
TessBaseAPIClear
TessBaseAPIInit1
TessBaseAPIInit2
TessBaseAPIInit3
TessBaseAPIInit4
TessBaseAPIGetLoadedLanguagesAsVector
TessBaseAPICreate
TessBaseAPIDelete
TessBaseAPISetRectangle
TessBaseAPIGetPageSegMode
TessBaseAPIGetAvailableLanguagesAsVector
TessBaseAPIGetHOCRText
TessBaseAPISetInputName
TessBaseAPIGetIterator
ocr alive: false
progress: 0
ocr alive: false
progress: 0
ocr alive: false
progress: 0
ocr alive: false
progress: 0
ocr alive: false
progress: 0
ocr alive: false
progress: 0
ocr alive: false
progress: 0
Message: 0000000
Bounding boxes:
char(s) left top right bottom confidence font-attributes
The 105 66 178 97 90.477913 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
(quick) 205 67 347 106 87.597588 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
[brown] 376 69 528 109 89.419556 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
{fox} 559 71 663 110 89.750946 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
jumps! 687 73 823 113 90.885422 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
Over 104 115 199 147 90.350136 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
the 224 117 283 148 85.878929 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
$43,456.78 310 117 533 155 86.597305 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
561 121 696 162 90.273590 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
#90 722 123 791 154 90.512245 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false

dog 818 125 887 165 87.420654 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
& 103 165 134 196 91.004303 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
duck/goose, 160 166 396 206 88.156059 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
as 424 178 463 201 90.320709 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
12.5% 493 171 614 203 91.406738 font: timesbd, size: 10, font id: 118, bold: true, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
of 638 172 680 204 87.761482 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
E-mail 700 174 835 206 90.870163 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
from 103 215 194 247 89.095757 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
[email protected] 220 219 716 260 88.013405 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
ocr alive: true
progress: 100
is 742 223 773 255 91.161392 font: georgia, size: 9, font id: 113, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
spam. 799 233 911 264 87.911133 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
Der 102 266 173 297 91.957092 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
,,schnelle? 198 267 406 302 77.849152 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
braune 433 269 568 302 91.801796 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
Fuchs 594 272 709 304 89.163849 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
springt 735 274 877 314 81.703934 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
iiber 102 315 187 347 77.482101 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
den 212 317 280 348 88.632729 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
faulen 306 318 430 350 86.671204 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
Hund. 456 320 572 352 90.255211 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
Le 601 322 648 354 87.890198 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
renard 674 324 803 356 88.245270 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
brun 827 325 918 357 90.243034 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
<<rapide? 101 366 274 405 84.881035 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
saute 302 373 403 400 88.049896 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
par-dessus 428 371 641 409 85.305786 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
le 667 372 700 404 91.223145 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
chien 725 374 833 406 91.348145 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
paresseux. 100 424 308 454 88.653801 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
La 337 419 384 450 88.389328 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
volpe 409 420 516 459 90.512085 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
marrone 543 430 707 455 88.570694 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
rapida 733 424 859 464 85.022652 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
salta 100 466 192 497 87.467209 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
sopra 219 475 324 507 87.601593 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
i] 351 468 376 499 90.915085 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
cane 403 478 491 501 90.802002 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
pigro. 517 471 633 511 89.557564 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
El 662 473 703 504 95.099556 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
zorro 729 482 834 506 88.975525 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
marr?n 99 516 242 548 76.536209 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
r?pido 268 517 395 557 74.100021 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
salta 421 520 513 552 88.454437 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
sobre 540 521 644 554 92.879433 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
el 669 523 702 554 91.275024 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
perro 728 532 833 563 87.961540 font: times, size: 9, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
perezoso. 98 574 284 604 90.293137 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
A 313 568 342 598 93.391632 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
raposa 369 578 497 609 90.341415 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
marrom 523 579 677 604 89.250183 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
rzipida 703 573 829 613 77.346802 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
salta 98 616 190 647 81.067596 font: georgia, size: 10, font id: 113, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
sobre 217 617 320 649 90.234375 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
o 346 627 366 650 92.162483 font: trebucbd, size: 10, font id: 122, bold: true, italic: false, underlined: false, monospace: false, serif: false, smallcap: false
050 391 621 456 651 73.179626 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
preguieoso. 481 621 710 661 74.688721 font: times, size: 10, font id: 117, bold: false, italic: false, underlined: false, monospace: false, serif: true, smallcap: false
TessVersion
3.03
Tests run: 29, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 7.691 sec <<< FAILURE!
Running net.sourceforge.tess4j.Tesseract1Test
Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 0.006 sec <<< FAILURE!
Running net.sourceforge.tess4j.TesseractTest
doOCR with configs
1116 1111116111 11110111111 110111 111111951
01161 1116 343945678 4132317 11490 603
81 611611130056. 35 12.5010 01 15-111311
110111 35930061006096.6010 15 593111.
1361 91561111611619 131311116 5116115 59111131
111361 6611 13111611 1-111116. 1.6 1611316 1311111
4113916611 531116 1331-6655115 16 6111611
93165561114. 1.3 110196 1113110116 139163
53113 50913 11 63116 91310. 1-31 20110
111311011 139160 53113 501316 61 96110
96162050. 18 139053 1113110111 139163
53113 501316 0 650 91631116050.

doOCR on a skewed PNG image
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose. as 12.5% of E-mail
from [email protected] is spam.
Der ,.schnelle? braune Fuchs springt
?ber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien

paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro

marr?n r?piclo salta sobre el perro

perezoso. A raposa marrom r?pida

salta sobre o c?o preguieoso.

doOCR on a BMP image with bounding rectangle
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

doOCR on a buffered image of a GIF
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

createDocuments for an image
Feb 15, 2015 4:55:50 PM net.sourceforge.tess4j.Tesseract createDocuments
SEVERE: Error during processing.
net.sourceforge.tess4j.TesseractException: Error during processing.
at net.sourceforge.tess4j.Tesseract.createDocuments(Tesseract.java:490)
at net.sourceforge.tess4j.Tesseract.createDocuments(Tesseract.java:464)
at net.sourceforge.tess4j.TesseractTest.testCreateDocuments(TesseractTest.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)

Feb 15, 2015 4:55:51 PM net.sourceforge.tess4j.Tesseract createDocuments
SEVERE: Error during processing.
net.sourceforge.tess4j.TesseractException: Error during processing.
at net.sourceforge.tess4j.Tesseract.createDocuments(Tesseract.java:490)
at net.sourceforge.tess4j.Tesseract.createDocuments(Tesseract.java:464)
at net.sourceforge.tess4j.TesseractTest.testCreateDocuments(TesseractTest.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)

doOCR on a PDF document
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

doOCR on a PNG image
The (quick) [brown] {fox} jumps!
Over the $43,456.78 #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle? braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide? saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marr?n r?pido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o 050 preguieoso.

Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.825 sec <<< FAILURE!

Results :

Failed tests: testTessBaseAPIProcessPages(net.sourceforge.tess4j.TessAPITest): expected:<1> but was:<-2133209520>
testCreateDocuments(net.sourceforge.tess4j.TesseractTest)

Tests in error:
testTessBaseAPISetImage(net.sourceforge.tess4j.TessAPI1Test): Error looking up function 'TessChoiceIteratorGetUTF8Text': /usr/lib/libtesseract.so.3.0.3: undefined symbol: TessChoiceIteratorGetUTF8Text
testTessBaseAPISetImage(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testChoiceIterator(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testChoiceIterator(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIEnd(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIEnd(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetUTF8Text(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetUTF8Text(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testOSD(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testOSD(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetInitLanguagesAsString(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetInitLanguagesAsString(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetPageSegMode(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetPageSegMode(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIRect(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIRect(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetBoolVariable(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetBoolVariable(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIPrintVariablesToFile(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIPrintVariablesToFile(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetVariable(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetVariable(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testResultRenderer(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testResultRenderer(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetOutputName(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetOutputName(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIProcessPages(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIProcessPages(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIClear(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIClear(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit1(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit1(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit2(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit2(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit3(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit3(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit4(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIInit4(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetLoadedLanguagesAsVector(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetLoadedLanguagesAsVector(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPICreate(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPICreate(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIDelete(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIDelete(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetRectangle(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetRectangle(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetPageSegMode(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetPageSegMode(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetAvailableLanguagesAsVector(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetAvailableLanguagesAsVector(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetHOCRText(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPIGetHOCRText(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetInputName(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessBaseAPISetInputName(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testResultIterator(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testResultIterator(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessVersion(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testTessVersion(net.sourceforge.tess4j.TessAPI1Test): Could not initialize class net.sourceforge.tess4j.TessAPI1
testChoiceIterator(net.sourceforge.tess4j.TessAPITest): Error looking up function 'TessResultIteratorGetChoiceIterator': /usr/lib/libtesseract.so.3.0.3: undefined symbol: TessResultIteratorGetChoiceIterator
testDoOCR_File_With_Configs(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testDoOCR_SkewedImage(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testDoOCR_File_Rectangle(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testDoOCR_BufferedImage(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testCreateDocuments(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testDoOCR_List_Rectangle(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testDoOCR_File(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1
testExtendingTesseract1(net.sourceforge.tess4j.Tesseract1Test): net.sourceforge.tess4j.TessAPI1

Tests run: 108, Failures: 2, Errors: 67, Skipped: 2

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 33.024s
[INFO] Finished at: Sun Feb 15 16:55:53 UTC 2015
[INFO] Final Memory: 15M/102M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project tess4j: There are test failures.
[ERROR]
[ERROR] Please refer to /home/ubuntu/tess4j/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Tess4j on OpenCL / Multithread

Hello,
I have seen online that it is possible to use Tesseract with OpenCL.
Could tess4j support it? It would be a tremendous feature.
What about Vulkan? It may be a bit early for that, though :-)

HOCR output location

Hi, I've enabled hOCR output for Tess4J, but I'm unable to find the resulting hOCR file.

Where is the default output location for it?

Thanks

Invalid calling convention 63 (Ghostscript, Tess4J, Ghost4J)

Hello,

I'm using tess4j to build a webapp that scans PDF bills or pictures to extract a certain reference number. I developed the webapp on Windows, where everything works fine so far (although I still have some image pre-processing to implement). Since the webapp will run on a Linux server, I'm running some tests on a Debian system. I have compiled the Tesseract and Leptonica libraries and also installed Ghostscript and the required libraries. When testing an image I get the correct results, but when testing a PDF I get an "Invalid calling convention 63" error. I've browsed various posts related to this subject and applied the suggested solutions.
My system:
Debian 8 Jessie
Tesseract 3.04.01
Leptonica 1.72
The webapp is running with Tomcat 8.0.33

What I have done so far to fix this issue:

  • Switched the JNA from 4.2.2 to 4.1.0 in the provided libs for Ghost4j
    zippy1978/ghost4j#44
  • Switched Ghostscript version from 9.18 to 9.16 according to this post
    #30 (comment)

Regardless of all that, the issue persists, so I would appreciate some pointers on how to fix it. Is it related to Debian? Is there something I have done wrong? Maybe there's a library I failed to install. The OCR works for pictures, so my suspicion is that the problem is linked to PDF conversion.

The next option I will try is to run the tests on a CentOS system, but I thought that if I could get it running on Debian it should run anywhere. I'd like to point out that I'm a student, so I apologize if I seem unclear or miss some technicalities. This is my internship subject, so I'm doing my best to get it working. If needed, I'll provide additional information.

Thank you

Stacktrace
`
type Exception report

message Request processing failed; nested exception is java.lang.IllegalArgumentException: Invalid calling convention 63

description The server encountered an internal error that prevented it from fulfilling this request.

exception

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.IllegalArgumentException: Invalid calling convention 63
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:980)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:859)
javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:844)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
root cause

java.lang.IllegalArgumentException: Invalid calling convention 63
com.sun.jna.Native.createNativeCallback(Native Method)
com.sun.jna.CallbackReference.<init>(CallbackReference.java:239)
com.sun.jna.CallbackReference.getFunctionPointer(CallbackReference.java:413)
com.sun.jna.CallbackReference.getFunctionPointer(CallbackReference.java:395)
com.sun.jna.Function.convertArgument(Function.java:541)
com.sun.jna.Function.invoke(Function.java:305)
com.sun.jna.Library$Handler.invoke(Library.java:236)
com.sun.proxy.$Proxy49.gsapi_set_stdio(Unknown Source)
org.ghost4j.Ghostscript.initialize(Ghostscript.java:323)
net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:103)
fr.ocr.jouleye.analysis.FileAnalyzer.Analyse(FileAnalyzer.java:67)
fr.ocr.jouleye.controller.FileUploadController.showPDL(FileUploadController.java:112)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:817)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:731)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:968)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:859)
javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:844)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
note The full stack trace of the root cause is available in the Apache Tomcat/8.0.33 logs.`

Add tesseract linux-x86-64 shared objects to the distribution

Currently, only win32-x86 and win32-x86-64 are supported, via the DLLs included in the distribution. It is possible to get tess4j working under Linux via:

sudo apt-get install tesseract-ocr tesseract-ocr-vie
export TESSDATA_PREFIX=/usr/share/tesseract-ocr/

However, this is not possible when the code that uses tess4j is built on a Linux continuous integration server where one cannot install libraries. Would it therefore be possible to add 32- and 64-bit Tesseract Linux .so files to the repo?

jbig2 Issue: Information: Globals not set

Does anyone have an easy fix, one that is not too invasive for our project, for this persistent log message from jbig2:

com.levigo.jbig2.util.log.JDKLogger info
Information: Globals not set.

Thanks in advance!
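One low-impact option may be to raise the log level for the library's loggers. The sketch below assumes that jbig2 logs through java.util.logging and that its logger names sit under the `com.levigo.jbig2` namespace. That namespace is inferred from the class name in the output above and is not confirmed against the library source. The class name `Jbig2LogQuieter` is made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class Jbig2LogQuieter {
    // Assumed logger namespace, inferred from the class name in the log output.
    private static final String JBIG2_LOGGER_NAME = "com.levigo.jbig2";

    // java.util.logging holds loggers weakly, so keep a strong reference;
    // otherwise the level set below can be lost if the logger is collected.
    private static final Logger JBIG2_LOGGER = Logger.getLogger(JBIG2_LOGGER_NAME);

    /** Hides INFO and lower messages from loggers under the assumed namespace. */
    public static void quiet() {
        JBIG2_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        quiet();
        // Child loggers without their own level inherit the parent's level.
        Logger child = Logger.getLogger(JBIG2_LOGGER_NAME + ".util.log.JDKLogger");
        System.out.println(child.isLoggable(Level.INFO));    // prints false
        System.out.println(child.isLoggable(Level.WARNING)); // prints true
    }
}
```

Calling `Jbig2LogQuieter.quiet()` once at application startup would be enough under these assumptions. If the library uses different logger names, the constant needs adjusting.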

Tess4j 3.3.0 in Windows and Tomcat - UnsatisfiedLinkError

Hi, I am having some issues after upgrading my app to tess4j 3.3.0. JNA won't find the DLLs no matter where I put them.

java.lang.UnsatisfiedLinkError: The specified module could not be found.

at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:263)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:403)
at com.sun.jna.Library$Handler.<init>(Library.java:147)
at com.sun.jna.Native.loadLibrary(Native.java:502)
at com.sun.jna.Native.loadLibrary(Native.java:481)
at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:75)
at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)

I am using Windows 2012 R2 (64bit)
I have installed:

I was using tess4j 3.0.0 before without any problems.

Is there any place where I can find more docs about how to point to the right location?

Do I need to install anything else?

Thanks!
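For reference, JNA reads the `jna.library.path` system property when it searches for native libraries, and setting `jna.debug_load` to `true` makes it print the locations it tries. The sketch below sets both. The class name `JnaPathSetup` and the directory in `main` are hypothetical. It has to run before any JNA-backed class such as `TessAPI` is initialized. On Windows, "The specified module could not be found" can also mean that a DLL the library depends on is missing, in which case changing the search path alone would not help.

```java
import java.io.File;

public class JnaPathSetup {
    /**
     * Adds a directory to JNA's library search path and turns on load tracing.
     * An existing jna.library.path value is kept and the new entry is appended.
     */
    public static void configure(String nativeDir) {
        String existing = System.getProperty("jna.library.path");
        String combined = (existing == null || existing.isEmpty())
                ? nativeDir
                : existing + File.pathSeparator + nativeDir;
        System.setProperty("jna.library.path", combined);
        System.setProperty("jna.debug_load", "true");
    }

    public static void main(String[] args) {
        // Hypothetical directory holding the Tesseract and Leptonica DLLs.
        configure("C:\\tess4j\\win32-x86-64");
        System.out.println(System.getProperty("jna.library.path"));
    }
}
```

The same properties can be passed as `-D` options in Tomcat's JVM settings instead of being set in code.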

I cannot use the traineddata of chi_tra!

I cannot use tess4j for Traditional Chinese.
It shows output like this:

ParamsModel::Incomplete line ?覆G鋐蚆?�鵿?��蚇灸��K濘??�??琨k戒?�?0'??接$(??洚^?"反?=I1芚錄蹍??&????-渶郈姾?�??�?
ParamsModel::Unknown parameter 渣c勤穌1f?滇?>鶷@b抰?�,?p澦
ParamsModel::Incomplete line ?
ParamsModel::Incomplete line m�??-蜺?沁?�?(??D牒鼣�???��CI]???:濿)陽??3c撙?�鵡�??<駤容???篴??!?"!x??O罊???�褂剬?
ParamsModel::Incomplete line p??蕥}�8�抰@?��??�矓T�嬰???7JG????#q+��??s齊/>=捫?43t�F??k僱諗?"?(C敹w栭^埥醛??硨
ParamsModel::Incomplete line 吽鉌?T?2?�覦D�陲?徨響????0鏀湟蟫?�XpSl蛾案^?��??6?��??3?7?U諼篛]??粲趑?/遛飂7�篹?-髯�?
ParamsModel::Incomplete line ??�草)梲=i??W��???��筈Z棑諧KB?F灆?�pk;�?oS鏒-憎?C*�m�憀?G�H??a�}??j?>,馭恛磅???
ParamsModel::Incomplete line \捫�g?8??�E??7��]t{??銧???輾@▼郿

Automatically load libs from jar

Right now users have to extract the desired lib (DLL) themselves in order to use the tess4j library properly.
We should automatically extract the libs that match the OS into the Java temp folder and load them from there.

A proof of concept of this has already worked in the past.
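A minimal sketch of that idea, using only the JDK, could look like this. It is not the project's implementation. The class name, the jar layout (`/<platform folder>/<file name>`) and the library file name in `main` are assumptions for illustration. The folder names follow the `win32-x86` / `win32-x86-64` style mentioned elsewhere in these issues.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public class NativeLibExtractor {

    /** Maps os.name / os.arch values to a folder name such as "win32-x86-64". */
    static String platformFolder(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            arch = "x86-64";
        } else if (arch.equals("i386") || arch.equals("i686")) {
            arch = "x86";
        }
        if (os.startsWith("windows")) {
            return "win32-" + arch;
        }
        if (os.startsWith("mac")) {
            return "darwin";
        }
        // Everything else is treated as Linux in this sketch.
        return "linux-" + arch;
    }

    /** Copies a classpath resource into a fresh temp directory and returns the copy. */
    static Path extract(String resourcePath, String fileName) throws IOException {
        try (InputStream in = NativeLibExtractor.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourcePath);
            }
            Path dir = Files.createTempDirectory("tess4j-native");
            Path target = dir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            target.toFile().deleteOnExit(); // removes the file only, not the directory
            return target;
        }
    }

    public static void main(String[] args) throws IOException {
        String folder = platformFolder(System.getProperty("os.name"),
                                       System.getProperty("os.arch"));
        // Hypothetical file name and jar layout.
        String fileName = "libtesseract.so";
        Path lib = extract("/" + folder + "/" + fileName, fileName);
        // Point JNA at the directory holding the extracted library.
        System.setProperty("jna.library.path", lib.getParent().toString());
    }
}
```

Extraction has to happen before the first JNA call that loads the library, so a static initializer or an explicit startup hook is the natural place for it.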

Tess4j output differs from Tesseract cli output

Here are the steps

  1. git clone https://github.com/nguyenq/tess4j.git
  2. cd tess4j
  3. mvn clean package (it will create both shaded and normal jar)
  4. Install the normal jar into local repository
    mvn install:install-file -Dfile=target/tess4j-2.0.0-SNAPSHOT.jar -DpomFile=pom.xml

I created a simple repository to ease your testing
5. git clone https://github.com/brucardoso2/tess4jtest.git
6. mvn clean package (it will create both shaded and normal jar)

Now run the shaded jar with this image: https://www.dropbox.com/s/hhas9bddxpiaj97/workingimage001.png?dl=0
7. java -jar target/ShadedMaven-1.0-SNAPSHOT-shaded.jar workingimage001.png

It should output

X
Y
Y

But tess4j outputs "Empty page !!"


Tesseract version used:

$ tesseract --version
tesseract 3.02
 leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
  libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5

Error:Assert failed:in file unicharset.cpp, line 270

I'm getting this error on an image:

id < this->size():Error:Assert failed:in file unicharset.cpp, line 270

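import java.io.File
import javax.imageio.ImageIO
import net.sourceforge.tess4j.ITessAPI.TessPageIteratorLevel
import net.sourceforge.tess4j.Tesseract
import scala.collection.JavaConverters._

object TesseractTest {

  def main(args: Array[String]): Unit = {
    val tess = new Tesseract
    tess.setDatapath("/usr/local/share/tessdata")
    tess.setLanguage("equ") // equations training data

    val bi = ImageIO.read(new File("/tmp/test.png"))
    // getWords returns a java.util.List; convert it to a Scala List
    val words = tess.getWords(bi, TessPageIteratorLevel.RIL_WORD).asScala.toList
    println(words)
  }
}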

This is on Mac OS X.

I don't get this error when running tesseract on the command line:

$ tesseract /tmp/test.png out -l equ hocr
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Warning in pixReadMemPng: work-around: writing to a temp file
OSD: Weak margin (0.66), horiz textlines, not CJK: Don't rotate.


As you can see, I'm trying to use the equations training data. It seems this error always happens when trying to use that training data. I have it loaded in my /usr/local/share/tessdata/equ.trainingdata from https://github.com/tesseract-ocr/tessdata/blob/master/equ.traineddata. I have not seen this error on other languages.

Any idea what's happening?

Cannot use any language other than eng

OS X El Capitan 10.11.1
JDK8_60
tess4j 2.0.1
tesseract 3.04.00

I installed Tesseract with brew:

brew reinstall tesseract --all-languages --with-training-tools

The tessdata path is /usr/local/share/ and it contains chi_sim.traineddata.

But when I use tess4j to load chi_sim, it fails. Here is the code:

public class TesseractOCR {
    private static Logger logger = LoggerFactory.getLogger(TesseractOCR.class);

    //default config
    private final static String DEFAULT_TESSDATA_PATH = "/usr/local/share";
    private final static String DEFAULT_PAGE_SEG_MODE = "3";
    private final static String DEFAULT_LANG = "chi_sim";

    public static void main(String[] args) {
        Tesseract instance = new Tesseract();  // JNA Interface Mapping
        instance.setLanguage(DEFAULT_LANG);
        instance.setDatapath(DEFAULT_TESSDATA_PATH);
        instance.setPageSegMode(Integer.parseInt(DEFAULT_PAGE_SEG_MODE));
        BufferedImage image = Images.from("ocr/data/input/1.png");
        String result = "";
        try {
            result = instance.doOCR(image);
        } catch (TesseractException e) {
            logger.error("ocr image error!", e);
        }
        logger.info(result);
    }
}
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000012a54e933, pid=3139, tid=5891
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.dylib+0x12933]  tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0xb9
#

The JVM crashed. Here is the log: https://gist.github.com/fivesmallq/1f6d349c02e9bbab9b80

eng works fine.


Also, I cloned the tess4j project from GitHub, updated the JUnit test to set the language to chi_sim, and put chi_sim.traineddata in src/main/resources. The same problem appeared.

➜  tessdata git:(master) which tesseract
/usr/local/bin/tesseract
➜  tessdata git:(master) tesseract --list-langs
List of available languages (107):
...
chi_sim
chi_tra
...
➜  ocr  tesseract 2.jpg -l chi_sim result
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Warning in pixReadMemJpeg: work-around: writing to a temp file
Detected 56 diacritics

Running tesseract from the command line works fine.


Does tess4j currently not support tesseract 3.04.00?

Thank you
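
One thing that could be ruled out before calling into the native library is whether the traineddata file is actually reachable from the configured datapath. The following is a minimal sketch in plain Java (no tess4j calls); the class and method names are hypothetical, and whether the datapath should be the tessdata directory itself or its parent is an assumption that may depend on the tess4j and Tesseract versions in use, so the example checks both layouts.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks for <lang>.traineddata before starting OCR.
public class TessdataCheck {

    /** Builds the expected path of <lang>.traineddata inside the given directory. */
    public static Path trainedDataFile(String dir, String lang) {
        return Paths.get(dir).resolve(lang + ".traineddata");
    }

    /** Returns true if <lang>.traineddata exists as a regular file in the directory. */
    public static boolean languageAvailable(String dir, String lang) {
        return Files.isRegularFile(trainedDataFile(dir, lang));
    }

    public static void main(String[] args) {
        // Defaults mirror the values from the report; both are assumptions.
        String base = args.length > 0 ? args[0] : "/usr/local/share";
        String lang = args.length > 1 ? args[1] : "chi_sim";

        // Check the directory itself and its "tessdata" subdirectory.
        String nested = Paths.get(base, "tessdata").toString();
        System.out.println(base + ": " + languageAvailable(base, lang));
        System.out.println(nested + ": " + languageAvailable(nested, lang));
    }
}
```

If only the nested directory reports true, pointing setDatapath at that directory instead may be worth trying.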

Exception when running from a .jar

Exception in thread "main" java.util.ServiceConfigurationError: javax.imageio.spi.ImageWriterSpi: Provider com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageWriterSpi could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
    at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
    at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
    at javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
    at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:333)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:190)

How to change symbol value in TessResultIterator

Hi everyone, I am currently working on a project using Tess4J. I want to change the symbol value in the TessResultIterator like this:

TessResultIterator ri = api.TessBaseAPIGetIterator(handle);
TessPageIterator pi = api.TessResultIteratorGetPageIterator(ri);
api.TessPageIteratorBegin(pi);
int level = TessPageIteratorLevel.RIL_SYMBOL;
do {
    Pointer ptr = api.TessResultIteratorGetUTF8Text(ri, level);
    String symbol = ptr.getString(0);
    float confidence = api.TessResultIteratorConfidence(ri, level);
    if (confidence < 80){
        Rectangle rect =    // get symbol rect
        String newSymbol = doMyStuff(rect);
        ptr.setString(0, newSymbol);    // something like this
    }
} while (api.TessPageIteratorNext(pi, level) == TRUE);

Pointer utf8Text = api.TessBaseAPIGetUTF8Text(handle);
String result = utf8Text.getString(0);    // get final result here

How can I achieve something like that? I searched the API and have found TessMutableIterator, but I can’t make it work.
Any help would be appreciated. Thanks for your attention.

Proposal: Add UNZ Automated File Support

As one can read in #10, there is currently no automated support for unz files.
I propose adding automated support that would check for unz files, parse them, populate a java.awt.Rectangle object, and then proceed with the process.

WDYT?

How can I preprocess the picture to improve the recognition rate?

Hello, I would like to improve recognition accuracy with tess4j. Is there a relevant example I can refer to?

    File imageFile = new File("d:\\9.jpg");
    ITesseract instance = new Tesseract(); // JNA Interface Mapping
    // ITesseract instance = new Tesseract1(); // JNA Direct Mapping
    instance.setDatapath("D:\\Java\\Tesseract-OCR\\tessdata"); // replace <parentPath> with path to parent directory of tessdata
    instance.setLanguage("eng");

    try {
        BufferedImage bi = ImageIO.read(imageFile); // loaded for preprocessing, not used yet
        String result = instance.doOCR(imageFile);
        System.out.println(result);
    } catch (IOException | TesseractException e) {
        System.err.println(e.getMessage());
    }
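
As one possible starting point, here is a minimal sketch of a preprocessing step using only java.awt: it converts the image to grayscale and scales it up. The class name, method name, and scale factor are hypothetical, and whether this actually improves the recognition rate depends on the input image.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: grayscale conversion plus rescaling before OCR.
public class OcrPreprocess {

    /** Returns a grayscale copy of src, scaled by the given factor. */
    public static BufferedImage toGrayScaled(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));

        // Drawing into a TYPE_BYTE_GRAY image performs the color conversion.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The returned BufferedImage could then be passed to instance.doOCR(...) in place of the file, for example with a factor of 2.0 for small text.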

Tesseract 4.0.0

Hello,

I have seen some comments in the code saying that DLLs for Tesseract 4.0.0alpha were added.
How can I build and test tess4j with Tesseract 4.0.0 (ideally on a Windows machine)?

Thanks.

How to filter some characters

Hello, I want to filter out some characters. How can I do that?
instance.setTessVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
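
Besides the whitelist variable shown above, the recognized text could also be filtered after OCR. The following is a minimal sketch in plain Java; the class and method names are hypothetical, and keeping whitespace is an assumption made here so that word boundaries survive.

```java
// Hypothetical helper: drops every character not in the allowed set.
public class CharFilter {

    /** Keeps only characters found in 'allowed', plus whitespace. */
    public static String keepOnly(String text, String allowed) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> allowed.indexOf(cp) >= 0 || Character.isWhitespace(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

For example, keepOnly(result, "0123456789") would leave only digits and whitespace in the OCR result.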
