tomkita / tesseract-ocr-for-php

This project forked from thiagoalessio/tesseract-ocr-for-php

A wrapper to work with TesseractOCR inside your PHP scripts.

License: MIT License

TesseractOCR for PHP

Installation

Via composer (https://packagist.org/packages/thiagoalessio/tesseract_ocr)

{
    "require": {
        "thiagoalessio/tesseract_ocr": ">= 0.2.0"
    }
}

Or just clone the repository and put it somewhere inside your project folder.

$ cd myapp/vendor
$ git clone https://github.com/thiagoalessio/tesseract-ocr-for-php.git

Dependencies

IMPORTANT: Make sure that the tesseract binary is on your $PATH. If you're running PHP on a web server, the user running it may not be you, but _www or similar. If needed, you can always modify the $PATH from within your script:

$path = getenv('PATH');
putenv("PATH=$path:/usr/local/bin");
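
To check up front whether PHP can actually see the binary, a small sketch like the following may help. The helper name findOnPath is hypothetical and not part of this library; it simply walks the directories listed in a PATH-style string. On Windows the file name would presumably be tesseract.exe instead.

```php
<?php
// Hypothetical helper (not part of this library): look for an executable
// in each directory listed in a PATH-style string.
function findOnPath(string $binary, string $path): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty entries
        }
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $binary;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null; // not found in any listed directory
}

$found = findOnPath('tesseract', (string) getenv('PATH'));
if ($found === null) {
    // Assumption: tesseract lives in /usr/local/bin, as in the snippet above.
    putenv('PATH=' . getenv('PATH') . PATH_SEPARATOR . '/usr/local/bin');
}
```

Running this check as the web server user (rather than from your own shell) is what tells you whether the $PATH seen by PHP is the problem.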

Windows users

I received several messages from people trying to get this library running under Windows, so I decided to write a short tutorial that can be found here.

Usage

Basic usage

<?php
require_once '/path/to/TesseractOCR/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer

$tesseract = new TesseractOCR('images/some-words.jpg');
echo $tesseract->recognize();

Defining language

Tesseract has training data for several languages, and specifying the language of your image improves recognition accuracy.

<?php
require_once '/path/to/TesseractOCR/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer

$tesseract = new TesseractOCR('images/sind-sie-deutsch.jpg');
$tesseract->setLanguage('deu'); // same 3-letter code used by the tesseract training data packages
echo $tesseract->recognize();

Inducing recognition

Sometimes tesseract confuses similar-looking characters, such as:

0 - O
1 - l
j - ,
etc ...

But you can improve recognition accuracy by specifying what kind of characters you're sending, for example:

<?php
$tesseract = new TesseractOCR('my-image.jpg');
$tesseract->setWhitelist(range('a','z')); // tesseract will treat everything as lowercase letters
echo $tesseract->recognize();

$tesseract = new TesseractOCR('my-image.jpg');
$tesseract->setWhitelist(range('A','Z'), range(0,9), '_-@.'); //you can pass as many ranges as you need
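
To make the arguments above concrete, here is a sketch of how several ranges and literal strings could be flattened into a single character list, which is the form Tesseract's tessedit_char_whitelist variable expects. The function flattenWhitelist is hypothetical and written only for illustration; it is not claimed to be how this library works internally.

```php
<?php
// Hypothetical illustration: merge arrays (as returned by range()) and
// plain strings into one whitelist string.
function flattenWhitelist(...$parts): string
{
    $chars = '';
    foreach ($parts as $part) {
        // Arrays are joined item by item; anything else is cast to string.
        $chars .= is_array($part) ? implode('', $part) : (string) $part;
    }
    return $chars;
}

echo flattenWhitelist(range('A', 'C'), range(0, 3), '_-@.');
// prints "ABC0123_-@."
```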

You can even do cool stuff like this one:

<?php
$tesseract = new TesseractOCR('617.jpg');
$tesseract->setWhitelist(range('A','Z'));
echo $tesseract->recognize(); //will return "GIT"

Troubleshooting

Warnings like Permission denied or No such file or directory

To solve this issue you can specify a custom directory for temp files:

<?php
$tesseract = new TesseractOCR('my-image.jpg');
$tesseract->setTempDir('./my-temp-dir');
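
These warnings usually mean the user running PHP cannot write to the temp location, so it can be worth verifying the custom directory before handing it to setTempDir. The following is a minimal sketch using only standard PHP filesystem functions; the helper name ensureWritableDir is an assumption made for illustration and is not part of this library.

```php
<?php
// Hypothetical helper (not part of this library): make sure a directory
// exists and is writable before using it for temp files.
function ensureWritableDir(string $dir): bool
{
    // Create the directory (and any missing parents) if it does not exist.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        return false; // could not create it
    }
    return is_writable($dir);
}

if (!ensureWritableDir('./my-temp-dir')) {
    error_log('Temp dir is not writable by the user running PHP');
}
```

If the check fails under the web server but passes from your shell, the directory's ownership or permissions are the likely cause.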
