charlesw / tesseract

A .Net wrapper for tesseract-ocr

License: Apache License 2.0

C# 99.97% Batchfile 0.03%

tesseract's Issues

C# Tesseract 3.02: how do I access each character of a word in an image?

Hi, I'm a newbie here.
First, I need to draw a rectangle around each character of a word in an image.
In the old version of tesseract I found that we can access each character with:

foreach (tessnet2.Character c in word.CharList)
e.Graphics.DrawRectangle..........

But now I'm working on a C# WinForms application with Tesseract 3.02:

TesseractEngine a = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractAndCube);
Tesseract.Page page1 = a.Process(image);
foreach ( ....... in page1)
{
// draw rectangle from (bounding box of each character)
}

Question 1: How do I access each character of page1?

I tried many approaches, such as PageIteratorLevel, and could get parts of the page like the first line, first word, or first block, but I can't get the first character of any of them.
I also noticed that in the hOCR text output for page1, each element (word, line, block) has a bounding box value.

Question 2: How do I get the bounding box value of each element? I found only one method, "TryGetBoundingBox", and it returns only a boolean.

Thank you.

Leptonica: Update Pix to support accessing\creating colour images

The Tesseract.Pix class doesn't currently support exposing or defining colour metadata through the colour map. This is required for dealing with colour images.

Update build script to support versioning (automatic).

It would be good if we could update the build script that generates the NuGet package (build.ps1) to support versioning using a pre-defined file (say version.txt) in the repository.

The build script should also update the AssemblyInfo.cs files before compiling the project so the resulting dlls are also versioned.
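
The substitution step described above could look roughly like the following sketch. The real build script is PowerShell; C# is used here only to match the rest of the code on this page. The class name `AssemblyVersionStamper`, the method `Apply`, and the decision to stamp both `AssemblyVersion` and `AssemblyFileVersion` are assumptions made for illustration and are not part of the repository.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the tesseract repository.
public static class AssemblyVersionStamper
{
    // Matches [assembly: AssemblyVersion("...")] and [assembly: AssemblyFileVersion("...")].
    private static readonly Regex VersionAttribute = new Regex(
        @"(\[assembly:\s*Assembly(?:File)?Version\s*\(\s*"")[^""]*(""\s*\)\s*\])");

    // Returns the AssemblyInfo.cs text with every version attribute set to `version`.
    public static string Apply(string assemblyInfoText, string version)
    {
        // Parse first so that a malformed version.txt fails loudly
        // instead of silently producing a badly versioned dll.
        Version parsed = Version.Parse(version.Trim());

        // A match evaluator avoids ambiguity between "$1" and the digits that follow it.
        return VersionAttribute.Replace(
            assemblyInfoText,
            m => m.Groups[1].Value + parsed + m.Groups[2].Value);
    }
}
```

A build step would read version.txt, call `Apply` on the contents of each AssemblyInfo.cs, and write the result back before compiling.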

How can I set filters for OCR?

Hi,
I am trying to solve a captcha that contains only capital English letters (A-Z) on one line, as one word of 5 characters.

What is the correct code for that?
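
For a fixed-format target like this, a common approach is to restrict the engine's character set (Tesseract has a `tessedit_char_whitelist` variable for that purpose) and to treat the image as a single word. Whatever engine settings are used, the recognised text can also be validated afterwards. The sketch below shows only that post-processing step; `CaptchaText`, `Normalize` and `TryRead` are hypothetical names, and dropping characters outside A-Z (rather than upper-casing them) is an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical post-filter for OCR output; it does not call Tesseract itself.
public static class CaptchaText
{
    public const int ExpectedLength = 5;

    // Drops whitespace, digits, lowercase letters and anything else outside A-Z.
    public static string Normalize(string ocrText)
    {
        return new string((ocrText ?? string.Empty)
            .Where(c => c >= 'A' && c <= 'Z')
            .ToArray());
    }

    // True only when exactly five capital letters remain after normalisation.
    public static bool TryRead(string ocrText, out string captcha)
    {
        captcha = Normalize(ocrText);
        return captcha.Length == ExpectedLength;
    }
}
```

A result that fails `TryRead` can be rejected or retried instead of being submitted as a guess.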

OtsuBinarizationTest fails when using x64 dlls on AMD64

It fails due to the interop call returning 1.
Workaround: Change platform to x86.

How to read multipage TIF file

I have a TIF file that is multiple pages. The function Pix.LoadFromFile(filename) appears to be only loading the first page.
Is there a way to load all the pages?
I would like to be able to read the entire document at once.

Thanks

Save Text from image to a label or a string in Asp.net C# using OCR

Hello,
I have performed all the steps needed to use OCR in a web app (ASP.NET and C#) as described in #24.

The project is working fine and there are no errors, but I am unable to save the text value from the image to a label or any string. You can download the project here: https://skydrive.live.com/redir?resid=FD89CF01AC9EA16D!448&authkey=!ADLbEdiJPWTyF6I&ithint=file%2c.zip

Please tell me where I am making a mistake or what I am missing in the project.

I will be thankful to you.

Sajid

Not working in Website Application

When I use your library in a console application, it works fine,
but NOT in a VB.NET website application.
I get a compilation error as soon as I add the line engine.Process(img).
The error is not very explicit:
compilation failed: '0 xc0000005 'vbc: Command line

Do you have any idea what the problem is?
Thank you

PS : Sorry for my bad english

Orientation and WritingDirection

Thanks for all you do. I have successfully created a batch OCR application using your wrapper. However, I occasionally run into an image where the text does not read from left-to-right. I have tried scouring the web to find the best method for determining page/text orientation. These methods are not demonstrated in the sample applications.

Here is my current page level code...

                using (TesseractEngine engine = new TesseractEngine(System.Environment.GetEnvironmentVariable("TESSERACT_PREFIX"), "eng", EngineMode.Default))
                {
                    Pix img = Pix.LoadFromFile(MyImgFilePath);
                    Page page = engine.Process(img);
                    img.Deskew();
                    //GetPageOrientation here
                    string text = page.GetText();
                    page.Dispose();
                    img.Dispose();
                    text = text.Replace("\n", "\r\n");
                    return text;
                }

How would I "AutoRotate" to get the correct text?
I have attached an image for testing.

can't compile project with visual studio 2012

The errors I get are:
Error 1 Unsafe code may only appear if compiling with /unsafe c:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net20\PixData.cs 6 22 Tesseract.Net45
Error 2 Unsafe code may only appear if compiling with /unsafe c:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net20\Pix.cs 7 32 Tesseract.Net45
Error 3 Unsafe code may only appear if compiling with /unsafe c:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net20\Interop\MarshalHelper.cs 7 32 Tesseract.Net45
Error 4 Unsafe code may only appear if compiling with /unsafe c:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net20\Interop\LeptonicaApi.cs 8 32 Tesseract.Net45
Error 5 Unsafe code may only appear if compiling with /unsafe c:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net20\BitmapHelper.cs 9 29 Tesseract.Net45
Error 6 Metadata file 'C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net45\bin\Release\Tesseract.dll' could not be found C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Tests\CSC Tesseract.Tests
Error 7 Metadata file 'C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net45\bin\Release\Tesseract.dll' could not be found C:\Users\guy\Downloads\tesseract-master\tesseract-master\BaseApiTester\CSC BaseApiTester
Error 8 Metadata file 'C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Net45\bin\Release\Tesseract.dll' could not be found C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Tests.Console\CSC Tesseract.Tests.Console
Error 9 Metadata file 'C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Tests\bin\Release\Tesseract.Tests.dll' could not be found C:\Users\guy\Downloads\tesseract-master\tesseract-master\Tesseract.Tests.Console\CSC Tesseract.Tests.Console

Add font information to result iterator

Font information isn't yet available from the result iterator. Completing this has two parts:

Add TessResultIteratorWordFontAttributes to BaseApi.cs
Add GetFont method to ResultIterator; note this should return a font class (probably cached by font id).

Also make sure that you free any required resource (check the doc for TessResultIteratorWordFontAttributes).

Signature retrieved from capi.h from https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.02-win32-lib-include-dirs.zip

TESS_API const char* TESS_CALL TessResultIteratorWordFontAttributes(
    const TessResultIterator* handle, 
    BOOL* is_bold, BOOL* is_italic, BOOL* is_underlined, BOOL* is_monospace, 
    BOOL* is_serif, BOOL* is_smallcaps, int* pointsize, int* font_id);

Failed to load C:\TEMP\w3wp_6.0.3790.1830 (srv03_sp1_rtm.050324-1447)\liblept168.dll

Hi,

I've got a problem. I'm using the tesseract wrapper in a C# .NET web project.
I've made a webservice that is running in IIS. When I try the code on my windows XP machine everything works fine but when I deploy the webservice on a windows 2003 server I get the following error.

The type initializer for 'Tesseract.Interop.TessApi' threw an exception.Failed to load 'C:\TEMP\w3wp_6.0.3790.1830 (srv03_sp1_rtm.050324-1447)\liblept168.dll'. Line 24

When I look into the temp folder the file is there, but for some reason it doesn't get loaded. I've installed the 32-bit C++ 2005 and 2010 runtimes.

Am I still missing something? Does anybody know why I get this error?

Greetings

Meaning of Orientations

It is unclear to me what the different orientations denote; could you verify that I understand their meaning:

PageUp - The page is not rotated
PageRight - The right side is the top (the page is rotated 90 degrees)
PageDown - The bottom is the top (the page is rotated 180 degrees)
PageLeft - The left side is the top (the page is rotated 270 degrees)

Thank you!
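
The reading proposed in the question can be written down as a small lookup. This sketch only encodes the asker's interpretation, which the thread does not confirm, and it says nothing about whether the rotation is clockwise or counter-clockwise. `PageOrientation` is a local stand-in that mirrors the four names above so the example is self-contained; it is not the wrapper's own type.

```csharp
using System;

// Local stand-in mirroring the four orientation names used in the question.
public enum PageOrientation { PageUp, PageRight, PageDown, PageLeft }

public static class OrientationAngles
{
    // Page rotation in degrees according to the asker's reading (unconfirmed).
    public static int ToDegrees(PageOrientation orientation)
    {
        switch (orientation)
        {
            case PageOrientation.PageUp: return 0;      // page is not rotated
            case PageOrientation.PageRight: return 90;  // right side is the top
            case PageOrientation.PageDown: return 180;  // bottom is the top
            case PageOrientation.PageLeft: return 270;  // left side is the top
            default: throw new ArgumentOutOfRangeException("orientation");
        }
    }
}
```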

Leptonica: Convert Pix To Bitmap (1bpp)

Referring to Issue #3.

"Added support for converting 8bit, 16bit, and 32bit pix's (should cover most cases hopefully). Please file a new task if additional depths are required."

I work a lot with 1bppIndexed images, but I could not get the conversion from Pix to Bitmap working in order to display the image.

Thank you for your effort.

Copy of iterator

I need to make a copy of an iterator. I see that it isn't implemented.
I tried to implement it like this:

public ResultIterator Clone()
{
        return new ResultIterator(Interop.TessApi.ResultIteratorCopy (handle));
}

But it doesn't seem to work. When I call the GetText method of the copied iterator, the application crashes. What is wrong?
Thanks for any help.

working with uzn file not working

On the command line you would use "tesseract.exe pic1.bmp pic1.txt -psm 4" and put a pic1.uzn file in the current directory.
When I try:
Tesseract.TesseractEngine tesseract = new Tesseract.TesseractEngine("....path... tessdata", "eng", Tesseract.EngineMode.Default);
Tesseract.Pix picture = Tesseract.Pix.LoadFromFile(@"...path... pic1.bmp");
Tesseract.Page page = tesseract.Process(picture, Tesseract.PageSegMode.SingleColumn); //PSM -4
...
string text = page.GetText();

this leads to an exception on GetText (just as tesseract.exe would fail if there were no uzn file).
Therefore I assume that the .NET wrapper does not find (or search for) the uzn file.

Could you please tell me what to do or if this is a bug?

Using Tesseract in Web App

I'm trying to use this project with my Web App.

I created Class Library with the reference to Tesseract.Net45
When it comes to
var engine = new TesseractEngine(@"./tessdata", language.ToString(), EngineMode.Default) I get the error "Failed to initialise tesseract engine."
I see that it happens in Interop.TessApi.BaseApiInit function and I suppose the reason is the path to tessdata.

I've been trying different paths, put this directory to all possible places but it doesn't work.

How can I fix the problem?

Thank you in advance,
Igor
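
A relative path such as "./tessdata" is resolved against the process's current working directory, which for a web application hosted in IIS is usually not the application folder. One possible remedy, offered as an assumption because the thread does not confirm the cause, is to build an absolute path from the application's base directory. `TessdataLocator` and `Resolve` are hypothetical names; only `AppDomain.CurrentDomain.BaseDirectory`, `Path.Combine` and `Path.GetFullPath` are standard .NET calls.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the wrapper.
public static class TessdataLocator
{
    // Turns a path relative to the application into an absolute, normalised path.
    public static string Resolve(string relativePath)
    {
        // BaseDirectory is the application's root folder, regardless of the
        // working directory the hosting process happens to use.
        string combined = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, relativePath);
        return Path.GetFullPath(combined);
    }
}
```

The result of `TessdataLocator.Resolve("./tessdata")` would then be passed as the first argument to the `TesseractEngine` constructor shown in the question, with the tessdata folder deployed next to the application.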

Problems with setting the "classify_bln_numeric_mode" variable

Hi, I'm trying to read some numbers from a scanned document.
My first test was using the BaseApiTester. The only thing I changed in the project was the path to the image, and I added the following line:

bool ret = engine.SetVariable("classify_bln_numeric_mode", 1);

before the line:

using (var page = engine.Process(img)){

Whenever I set the variable the program will crash. If I don't set the variable the program does what it is supposed to.

I checked here (http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version)
to see what variable to set.

The error I'm getting is:

System.AccessViolationException
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

And the Command Window:

Process image
first_unichar != NULL:Error:Assert failed:in file .\wordrec\language_model.cpp, line 445

Thanks

Feature Request: Port to .NET for Windows 8 Store Apps

I would appreciate it if there would be a port of the existing library to .NET for Windows 8 Store Apps. Are there any plans in this area?

Working on Mono

If I understand correctly, this library doesn't work on Mono because the embedded libtesseract302.dll requires kernel32, which is not available on Mono.
But tesseract works on Linux.
My questions:
Do you plan to add Mono compatibility?
Is it possible, on non-Windows operating systems, to load the system tesseract library (such as libtesseract.so.3) instead of the embedded dll?

.Net Wrapper for .Net Framework 4 Client Profile

First of all, thank you so much for your excellent work on this project.
I just have an issue concerning the target framework profile. All of my applications target the .Net Framework 4 Client Profile, not the full framework. It seems Tesseract.dll cannot be added to a project targeting the .Net 4 Client Profile because of its dependency on "System.Web". Is that dependency really necessary? Is there any way to make the Tesseract DLL compatible with .Net 4 Client Profile applications?

Thank you in advance!

Hide/Wrapper Pix type?

Why not use Bitmap or Image in public method?

For instance:
public Page Process(Pix image, PageSegMode? pageSegMode = null)
=>
public Page Process(Bitmap image, PageSegMode? pageSegMode = null)
{
... internal convert....
}

How to read text from image in C# ASP.Net

Hello,
I am new to ASP.NET and C# development, and I want to create a program that takes an image, reads the text from it, and shows the text to the user. I want to use Tesseract OCR. I have downloaded all the files from the tesseract OCR project and the tesseractdotnet project, but none of them works properly. All have some errors or issues.
I am using Visual Studio 2012 with .NET Framework 4.5.
Can anybody please give me a step-by-step guide on how to read text from images in C# ASP.NET using Tesseract OCR?

Thanks
Sajid Manzoor

Could not build Tesseract.Net20 & Tesseract.Net40 projects

Could not build projects because of:

Could not copy the file "C:\Users\root\Documents\GitHub\tesseract\lib\TesseractOcr\liblept168.dll" because it was not found.

I believe this is because 0edf6a0 moved the libraries into separate folders.

Read a specific font size text in image in OCR

Hello,
I want to read only text of a specific font size in an image.
For example:
There are three lines in an image with different text and different font sizes.
Line 1 has the text ABC with font size 16 and the text 12 with font size 12. In this line I want to read only the text with font size 16.
Line 2 has the text 12345 with font size 16, and I want to read this line completely. Line 3 has the text ABCDEF with font size 12, and I don't want to read this line.

Please tell me how I can do this.

Thanks
Sajid

add 64bit support

Sorry, this is not a new issue, but can you provide the lib files liblept168 and libtesseract302 as 64-bit compiled DLLs?
I am trying to run the application on a 64-bit platform, but it keeps throwing an error when loading the mentioned DLLs.

Thank you

GetHOCRText always returns first OCR for first page

It seems that GetHOCRText always returns the OCR for the first page, even though I request other pages.

I have the following code

public static DocumentOCR PreformOcr(string fp)
{
    using (var eng = new Tesseract.TesseractEngine(@"./tessdata", "eng", Tesseract.EngineMode.Default))
    {
        using (Tesseract.Pix p = Tesseract.Pix.LoadFromFile(fp))
        {
            using (var page = eng.Process(p))
            {
                HtmlAgilityPack.HtmlDocument hDoc = new HtmlAgilityPack.HtmlDocument();
                var s = page.GetHOCRText(3);
                var path = Path.GetTempPath() + "hocr.html";
                File.WriteAllText(path, s);
                hDoc.LoadHtml(s);
                var body = hDoc.DocumentNode.ChildNodes[2].ChildNodes[3]; //should be the body of the html document
                return GetDoc(body);
            }
        }
    }
}

fp, in the case where I'm getting this issue, is a multi-page .tiff file. I request the HOCR for page 3 but I get the HOCR for page 1. In fact, no matter what page I ask for, even if I ask for page -1 or 0, I only get the first page.

A bunch of tests fail because of missing 'libtesseract302.dll'

After fixing the Tesseract.Net20 and Tesseract.Net40 projects with #46, 20 tests fail because 'libtesseract302.dll' is missing.

Suddenly the engine stopped working

I'm working on a desktop application that uses your engine. Some months ago I noticed that the engine had stopped working on some machines at my work. I don't know whether it's because of a Windows update or something else, but on my PCs at home (which I don't update very often) the engine works normally. The exception in question is thrown here:

    private void Initialise(string datapath, string language, EngineMode engineMode)
    {
        if (Interop.TessApi.BaseApiInit(handle, datapath, language, (int)engineMode, IntPtr.Zero, 0, IntPtr.Zero, 0, IntPtr.Zero, 0) != 0)
        {
            // Special case logic to handle cleaning up as init has already released the handle if it fails.
            handle = IntPtr.Zero;
            GC.SuppressFinalize(this);

            throw new TesseractException("Failed to initialise tesseract engine.");
        }
    }

I don't know how to fix it, or whether I should uninstall the updates.

ASP.NET Cannot Find lept/tess DLLs

Installed via NuGet successfully, but an exception gets thrown while creating a new engine: "{"Unable to load DLL 'libtesseract302': The specified module could not be found. (Exception from HRESULT: 0x8007007E)"} System.Exception {System.DllNotFoundException}"

I tried copying the sample .csproj, but got warning that ASP could not link the files, as they're already within the directory structure.

Which dll should I reference from win svr 2008 r2 iis

Hello,

First, thank you for this wrapper. I'm able to get it working with one exception as follows:

I'm having a dickens of a time getting this to work with an IIS application running on Windows Server 2008 R2. I've had the complete range of errors mentioned in these threads involving 32/64 bit, incorrect format, etc. I've changed the app pools to accept 32-bit apps and given IUSR permission basically everywhere. I've made sure to have the 2008 runtime installed, I've made sure the tessdata is in place, and I've made sure the two extra DLLs are accessible.

So I have a simple question..

If you were to create an iis app in win svr 2008 r2, which dll would you choose to reference? This is a 64 bit machine.

Thanks,
L
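
For native DLLs, what matters is the bitness of the worker process (which the app pool's 32-bit setting controls), not the bitness of the machine. A minimal sketch of checking this at run time follows; `NativePlatform` is a hypothetical name, and the "x86"/"x64" labels are illustrative rather than a statement about how the wrapper lays out its files.

```csharp
using System;

// Hypothetical helper, not part of the wrapper.
public static class NativePlatform
{
    // "x64" when the current process is 64-bit, otherwise "x86".
    // A 32-bit app pool on a 64-bit server therefore reports "x86".
    public static string Current
    {
        get { return IntPtr.Size == 8 ? "x64" : "x86"; }
    }
}
```

Logging `NativePlatform.Current` from inside the IIS application shows which set of native DLLs that process can actually load.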

Words coordinates

Hello,
is there any way to retrieve word coordinates?
Thanks!

XML Comments

Could you add in more XML comments to make this a little easier to use? I've found that a few of the field/method names are not intuitive.

For example:
// In TesseractEngine.cs
// What does this do? It's not obvious what "Seg" means
public PageSegMode DefaultPageSegMode
{
get;
set;
}

Color class and the System.Drawing.Color class naming conflicts

When using the Tesseract namespace, its Color class conflicts with System.Drawing.Color.

Leptonica: Create Bitmap from Pix

Need to be able to convert a pix to a bitmap so that they can be (easily) displayed in Winforms\ASP.NET apps.

Pix.Rotate method dropping image content

I have several images where the new Pix.Rotate() method truncates pixels.

Pix oldpix = Pix.LoadFromFile(MyImgFilePath);
MyAngle = (float)(90 * Math.PI / 180.0f);
Pix newpix = oldpix.Rotate(MyAngle);
newpix.Save(MyImgFilePath.Replace(".", ".new."));

I have also tried RotationMethod.Shear and also specifying larger height and width values.

I have attached 2 images, showing before and after, you can see that Rotate is truncating the email message metadata block at the top of the "old" images. The "new" image does NOT contain the data.

These were TIF images. GitHub doesn't allow uploading TIF files, so I saved them as JPG.
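
One possible explanation, which is an assumption since the thread does not confirm it, is that the rotated image is drawn onto a canvas of the original size. A portrait page rotated by 90 degrees needs a landscape canvas, so anything outside the old bounds would be cut off. The sketch below computes the canvas size that fully contains a rotated rectangle; `RotationGeometry` and `RotatedSize` are hypothetical names and do not come from the wrapper or from Leptonica.

```csharp
using System;

// Hypothetical helper: pure geometry, no image library involved.
public static class RotationGeometry
{
    // Size of the axis-aligned box that fully contains a width x height
    // rectangle after rotating it by angleRadians about its centre.
    public static void RotatedSize(int width, int height, double angleRadians,
                                   out int newWidth, out int newHeight)
    {
        double cos = Math.Abs(Math.Cos(angleRadians));
        double sin = Math.Abs(Math.Sin(angleRadians));

        // Rounding absorbs floating-point noise such as cos(pi/2) not being exactly 0.
        newWidth = (int)Math.Round(width * cos + height * sin);
        newHeight = (int)Math.Round(width * sin + height * cos);
    }
}
```

For a quarter turn the width and height simply swap, which is the size one would try passing when specifying the output dimensions explicitly.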

Improve error messages.

Add an error page to the wiki for each TesseractException that gets thrown and reference that page in the message. Use Tesseract.Internal.ErrorMessage.Format to generate a formatted error message. The corresponding wiki page name must be Error {ErrorNumber}.

Non-English language - cannot create an instance of the TesseractEngine

Hi,

If I try to use any language setting other than "eng":
var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default)

it fails with the standard exception:
TesseractException(ErrorMessage.Format(1, "Failed to initialise tesseract engine."));

I have tried with "dan" for Danish or "rus" for Russian. I have downloaded language data files from http://code.google.com/p/tesseract-ocr/downloads/list and put them in the same folder with "eng"- files.

Am I doing something wrong, or is it not supported?

Can't recognize 1-3 characters at all or very well.

When I have an image that has 4 or more characters, it works fairly well. But with 3 or fewer, it doesn't work so well, or at all.

http://i.imgur.com/yboy4sn.png
http://i.imgur.com/EzMdhyG.png

I've never gotten 1-2 characters to work. Sometimes 3 works, sometimes 3 spits garbage, and sometimes 3 doesn't work at all.

I don't know if this is an actual issue or if there's a setting or something I need to change.

EDIT Err... right after posting this I discovered the DefaultPageSegMode property. Changing it to single word worked.

Convert .NET Bitmap to Leptonica Pix

I looked but couldn't find a function that converts .NET Bitmap type to Leptonica Pix type. I tried the following code but the depth value returned by LeptonicaAPI is always -1, causing exceptions to be thrown:

    private IPix ConvertBitmapToPix(Bitmap bmp)
    {
        IntPtr pval = IntPtr.Zero;
        BitmapData bd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite, bmp.PixelFormat);

        try
        {
            pval = bd.Scan0;
            //var depth1 = Bitmap.GetPixelFormatSize(bmp.PixelFormat);
            //var depth2 = Tesseract.Interop.LeptonicaApi.GetDepth(pval);
            return Pix.Create(pval);
        }
        finally
        {
            bmp.UnlockBits(bd);
        }
    }

The type initializer for 'Tesseract.Interop.TessApi' threw an exception

I'm running Tesseract from a WCF service. It's currently running locally. When I try to call my method, this exception is thrown when I instantiate a new TesseractEngine.

I looked at the other issues people were having and installed the SP1 runtime mentioned in another issue. But that didn't resolve it.

I noticed in another issue that it might be a permission issue because the wrapper embeds the libraries, liblept168.dll and libtesseract302.dll, and then extracts them during runtime.

Does anyone know which directory that is? I didn't see any information on the main page that specified where it extracts to.

Thanks!

EDIT I looked at the code and it looks like it extracts the libraries to the local temp path, which on Windows 8 for my PC is C:\Users\eperez\AppData\Local\Temp. It then creates a directory named after the calling process and places them there. I'm looking into whether this is causing the issue.

EDIT OK. So, I cleaned up my temp directory just so I could more easily see if it would write the libraries there. But after running it, it worked. So I'm not sure if it worked because there wasn't enough space, or if it was conflicting with a version already there or what. But it's working. I'm going to keep playing with it to see what happens.

Failed to initialize Tesseract.

Hi,

I'm trying to run the BaseApiTester project. But every time I run it I get the following:

Error opening data file C:\Program Files (x86)\Tesseract-OCR\tessdata/nld.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

So I uninstalled Tesseract and removed the TESSDATA_PREFIX variable. But it's still not working?

Please help me 👯

Tesseract 3.02 falls over with skewed text lines.

Hello,
I have run into this bug:
http://code.google.com/p/tesseract-ocr/issues/detail?id=643

The resolution is to call Leptonica's pixFindSkewAndDeskew(),
but the LeptonicaApi (in the wrapper) doesn't contain such a method. I tried to add it myself but with no success. Could you add it, or give me some instructions on how to do this?
Thanks

Leptonica: Provide method to convert between depth levels

This is really to assist in testing Issue #15: in order to test it we need a 1bpp Pix which we can convert. The idea here is that we first add a conversion function that can convert from any depth (1, 2, 4, 8, 16, 32) to any other depth, maintaining as much information as possible. This will probably require an optional parameter for a hint that determines performance vs quality (for instance if we're down-sampling a 32-bit Pix to an 8-bit one).

Fixed an issue in TryGetBoolVariable implementation

The aforementioned method is implemented using TessApi.BaseApiGetIntVariable and is not working. The correct method should be TessApi.BaseApiGetBoolVariable. Tested this with 'classify_enable_learning' and 'classify_enable_adaptive_matcher' variables.

PixConverter.ToPix(Bitmap)

I am having issues with image conversion using the PixConverter.ToPix method.

I have images that are rotated either PageLeft or PageRight orientation. I need to rotate these so they will OCR correctly.

I have a couple options...

Load the bitmap directly into the engine (Page page = engine.Process(bitmap)).
Convert the bitmap to Pix and load the Pix (Pix img = Pix.LoadFromFile(filename)).

Either way, I need to rotate the images to the correct orientation.

Pix does not have a method to rotate (that I can see).
I can rotate with System.Drawing.Bitmap, then convert to Pix using PixConverter.ToPix. However, the image quality is diminished and the OCR quality becomes "suspect". I get the same quality passing the Bitmap directly to the engine.

I can "open" an image as a Bitmap, rotate it and save it,
then open the saved image as a Pix. This gives me great quality at the cost of speed.

Anything I am missing or doing incorrectly using the PixConverter.ToPix method?

Bitmap bmp = (Bitmap)Bitmap.FromFile(MyImgFilePath);
Pix img = PixConverter.ToPix(bmp);
Page page = engine.Process(img);

Cannot convert Bitmap to Image (x64)

The current web example fails when running in a x64 process with System.AccessViolationException when trying to convert the Bitmap to a Pix.

Steps to reproduce:

Load Tesseract.WebDemo in IIS (NOT IIS Express)
Load any test picture.
Hit submit, the mentioned exception is thrown (this also causes the process to crash so it's quite serious).

Wrappers around leptonica

I was looking at your commit for the addition of 1bpp support and feel that I could use it to identify what to tweak in order to add wrappers around some binarization methods like those defined in leptonica's leptprotos.h (e.g. pixOtsuAdaptiveThreshold or pixOtsuThreshOnBackgroundNorm). Nevertheless, I certainly need more insight and time to do so...
I would like to implement those in order to use your wrapper in my attempt to quickly set up a tesseract test class that I could use to test/search for the best image manipulation prior to OCR. Do you plan to add these?

Split Tesseract and Leptonica into separate projects

Request for comment:

I've been considering splitting Tesseract and Leptonica into separate projects\dlls. This way you could include the Leptonica reference without having to include Tesseract. However, you would still need Leptonica to use Tesseract.

The only disadvantage I can think of is that this would be a fairly substantial break in the API since all the Leptonica stuff would reside in a new dll and namespace.

So what do you think?

Can the tesseract OCR run in Windows Phone 8 platform?

Since I successfully installed the tesseract OCR from the NuGet package manager in a Windows Phone 8 app project, will it be able to run like normal? I ask because I am still at the beginning stage of using the library, and I am still trying my best to solve problems like "Error 1 The type or namespace name 'tesseract' could not be found (are you missing a using directive or an assembly reference?)". Many thanks in advance.
