fitrialif / ocr

This project forked from aksth/ocr

Optical character recognition using neural network. Implemented with Python and its libraries Numpy and OpenCV.

We recommend viewing the presentation file inside docs first; it gives a brief analysis of this project.

Python libraries needed: NumPy (neural network creation and data handling), OpenCV (image processing), and PyQt (GUI).

This project has two parts: the neural network and the image processing.

The OCR is performed in the following phases:

  1. Image is retrieved. The image should be cropped so that only text is present, and the background should be much lighter than the text. The ideal image is black text on a white paper background.

  2. Preprocessing. Noise is removed by blurring. The image is then converted to a binary image and inverted. For this we used OpenCV methods such as Gaussian blur and threshold.

  3. Segmentation. Segmentation is done in three steps. First, the image is segmented into lines. Then the lines are separated into words. Lastly, the words are separated into characters. Projections and OpenCV contour detection are used for this. The characters are then fed into the neural network.

  4. Neural network. This phase has two parts. The first is training the neural network. For training, we generated our own samples for each character, for a total of 260,000 samples. We used PHP's imagettftext() function with 10 different fonts; PHP generated each sample as an image. We then converted those images into NumPy arrays and combined all samples with the corresponding labels required by the neural network. The second part is recognizing characters. We used two neural networks for classification: the first classifies across all characters, and the second is based on the confusion classes. (For a detailed description of neural networks and their implementation, we referred to the book at http://neuralnetworksanddeeplearning.com/. If you want to know about neural networks, their algorithms, and how they work, it is well worth a visit.)

  5. Optimizations. Using two neural networks for classification helped improve the recognition. In addition, we checked each word against an English dictionary to fix spelling errors. (The dictionary was based on Peter Norvig's spell corrector: http://norvig.com/spell-correct.html)
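The line, word, and character segmentation in phase 3 can be illustrated with projection profiles alone. The sketch below uses only NumPy and leaves out contour detection; the helper names `find_runs` and `segment`, and the `word_gap` value of 4 pixels, are hypothetical choices for this example rather than values taken from the project.

```python
import numpy as np


def find_runs(profile, min_gap=1):
    """Return (start, end) spans of non-zero entries; end is exclusive.

    Runs separated by fewer than min_gap zeros are merged into one span.
    """
    spans = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans


def segment(binary, word_gap=4):
    """Split a binary image (text > 0) into lines, words, and character boxes.

    Returns a list of lines; each line is a list of words; each word is a
    list of (top, bottom, left, right) boxes with exclusive bottom/right.
    """
    result = []
    # Horizontal projection: rows containing text pixels form the lines.
    for top, bottom in find_runs((binary > 0).sum(axis=1)):
        line = binary[top:bottom]
        words = []
        # Vertical projection with a wide gap tolerance separates words.
        for w_left, w_right in find_runs((line > 0).sum(axis=0), min_gap=word_gap):
            word = line[:, w_left:w_right]
            # Any empty column inside a word separates two characters.
            chars = [
                (top, bottom, w_left + c0, w_left + c1)
                for c0, c1 in find_runs((word > 0).sum(axis=0))
            ]
            words.append(chars)
        result.append(words)
    return result


# Synthetic page: line 1 has two words of two "characters" each, line 2 has one.
img = np.zeros((20, 30), dtype=np.uint8)
for left in (1, 5, 15, 19):
    img[2:7, left:left + 3] = 255
img[10:15, 3:6] = 255
layout = segment(img)
```

A pure projection approach like this fails when characters touch or overlap horizontally, which is one reason contour detection is useful alongside it.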
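One plausible way to combine the two networks from phase 4 is to let the second one re-examine only those predictions that fall into a confusion class. The README does not spell out the exact mechanism, so the sketch below is an assumption throughout: `classify_two_stage`, the stand-in classifiers, and the example confusion set are all hypothetical.

```python
def classify_two_stage(features, primary, secondary, confusion_classes):
    """Classify with `primary`; defer to `secondary` for easily confused labels."""
    label = primary(features)
    if label in confusion_classes:
        # The second classifier only needs to separate the confusable labels.
        label = secondary(features)
    return label


# Stand-in classifiers (hypothetical): real ones would be trained networks.
def primary(features):
    return "O" if features[0] > 0.5 else "A"


def secondary(features):
    return "0" if features[1] > 0.5 else "O"


confusion = {"O", "0"}
first = classify_two_stage([0.9, 0.8], primary, secondary, confusion)
second = classify_two_stage([0.9, 0.1], primary, secondary, confusion)
third = classify_two_stage([0.1, 0.9], primary, secondary, confusion)
```

Here `first` is resolved by the second classifier to "0", `second` stays "O", and `third` never reaches the second classifier because "A" is not in the confusion set.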
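The dictionary check in phase 5 follows the idea in Peter Norvig's spell corrector: generate every string one edit away from the recognized word and keep the most frequent known one. The sketch below is a shortened variant that only considers one-edit candidates, and its tiny word list is a stand-in assumed for the example; a real dictionary would be built from a large corpus.

```python
from collections import Counter

# Tiny stand-in word list (assumption); real counts would come from a corpus.
WORDS = Counter("the quick brown fox jumps over the lazy dog the end".split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings that are one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the dictionary."""
    return {w for w in words if w in WORDS}


def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=lambda w: WORDS[w])


fixed = [correct(w) for w in ["teh", "quikc", "brwn", "fox", "zzzz"]]
```

Words that are already in the dictionary, or that have no known neighbor, pass through unchanged, so the correction step cannot make an unrecognizable word worse.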
