Java OCR Framework

An Optical Character Recognition Framework written purely in Java.

Installation

Build the project, then add the project jar, along with all the jars in the jar directory, to your compile-time libraries.

Usage

There are 4 main parts to OCR:

  1. Normalization
  2. Segmentation
  3. Feature Extraction
  4. Classification
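To make the first two stages concrete, here is a minimal sketch of what normalization and segmentation can look like in general. It is not taken from this framework: the class name PreprocessingSketch, the fixed-threshold binarization, and the blank-column splitting rule are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the framework: illustrates two preprocessing stages. */
final class PreprocessingSketch {

    /** Normalization (assumed form): binarize a grayscale image; true marks an ink pixel. */
    static boolean[][] binarize(int[][] gray, int threshold) {
        boolean[][] ink = new boolean[gray.length][];
        for (int r = 0; r < gray.length; r++) {
            ink[r] = new boolean[gray[r].length];
            for (int c = 0; c < gray[r].length; c++) {
                ink[r][c] = gray[r][c] < threshold; // darker than threshold counts as ink
            }
        }
        return ink;
    }

    /**
     * Segmentation (assumed form): split a text line at blank columns.
     * Each int[] is {startColumn, endColumnExclusive} of one glyph.
     */
    static List<int[]> segmentColumns(boolean[][] ink) {
        List<int[]> spans = new ArrayList<>();
        int width = ink.length == 0 ? 0 : ink[0].length;
        int start = -1; // -1 means "currently between glyphs"
        for (int c = 0; c < width; c++) {
            boolean hasInk = false;
            for (boolean[] row : ink) {
                if (row[c]) {
                    hasInk = true;
                    break;
                }
            }
            if (hasInk && start < 0) {
                start = c; // a glyph begins
            } else if (!hasInk && start >= 0) {
                spans.add(new int[] {start, c}); // a glyph ends at the first blank column
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, width}); // glyph touching the right edge
        }
        return spans;
    }

    public static void main(String[] args) {
        int[][] gray = {
            {0, 255, 255, 10, 20},
            {255, 255, 255, 0, 255}
        };
        for (int[] span : segmentColumns(binarize(gray, 128))) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

Real normalization usually also involves scaling glyphs to a fixed size, which this sketch leaves out.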

Feature Extraction and Classification are the only required parts; Normalization and Segmentation are optional. For Feature Extraction, there are 5 algorithms at your disposal:

  • Horizontal Celled Projection
  • Vertical Celled Projection
  • Horizontal Projection Histogram
  • Vertical Projection Histogram
  • Local Line Fitting
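As a rough illustration of the projection-based features listed above, the sketch below computes them over a binary glyph image. It follows a common textbook formulation and may differ from what the framework's classes do. The class name ProjectionSketch, the boolean[][] image representation, and the reading of the celled variant (split the image into vertical strips and record, per row, whether a strip contains any ink) are assumptions.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the framework: projection features on a binary image. */
final class ProjectionSketch {

    /** Horizontal projection histogram: number of ink pixels in each row. */
    static int[] horizontalHistogram(boolean[][] image) {
        int[] counts = new int[image.length];
        for (int row = 0; row < image.length; row++) {
            for (boolean pixel : image[row]) {
                if (pixel) {
                    counts[row]++;
                }
            }
        }
        return counts;
    }

    /** Vertical projection histogram: number of ink pixels in each column. */
    static int[] verticalHistogram(boolean[][] image) {
        int width = image.length == 0 ? 0 : image[0].length;
        int[] counts = new int[width];
        for (boolean[] row : image) {
            for (int col = 0; col < width; col++) {
                if (row[col]) {
                    counts[col]++;
                }
            }
        }
        return counts;
    }

    /**
     * Horizontal celled projection (assumed formulation): split the columns into
     * `cells` strips; for each strip, emit one 0/1 flag per row saying whether
     * that row has any ink inside the strip. Output length is cells * rows.
     */
    static int[] horizontalCelledProjection(boolean[][] image, int cells) {
        int rows = image.length;
        int width = rows == 0 ? 0 : image[0].length;
        int[] features = new int[cells * rows];
        for (int cell = 0; cell < cells; cell++) {
            int start = cell * width / cells;     // first column of this strip
            int end = (cell + 1) * width / cells; // one past the last column
            for (int row = 0; row < rows; row++) {
                for (int col = start; col < end; col++) {
                    if (image[row][col]) {
                        features[cell * rows + row] = 1;
                        break;
                    }
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        boolean[][] glyph = {
            {true, false, false, false},
            {true, true, false, true},
            {false, false, false, false}
        };
        System.out.println(Arrays.toString(horizontalHistogram(glyph)));           // [1, 3, 0]
        System.out.println(Arrays.toString(verticalHistogram(glyph)));             // [2, 1, 0, 1]
        System.out.println(Arrays.toString(horizontalCelledProjection(glyph, 2))); // [1, 1, 0, 0, 1, 0]
    }
}
```

The vertical celled projection would be the same idea with rows and columns swapped. Local Line Fitting is omitted here because its details are not described in this README.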

This framework loosely uses a Fluent Interface Builder syntax.

Example:

OCR ocr = OCRBuilder
            .create()
            .normalization(new Normalization())
            .segmentation(new Segmentation())
            .featureExtraction(
                FeatureExtractionBuilder
                    .create()
                    .children(
                        new HorizontalCelledProjection(5),
                        new VerticalCelledProjection(5),
                        new HorizontalProjectionHistogram(),
                        new VerticalProjectionHistogram(),
                        new LocalLineFitting(49))
                    .build())
            .neuralNetwork(
                NeuralNetworkBuilder
                    .create()
                    .fromFile("neural_network.eg")
                    .build())
            .build();
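For readers new to the pattern, the following is a minimal sketch of how a fluent builder with optional and required stages can be structured. It is not the framework's OCRBuilder: SketchPipeline, SketchPipelineBuilder, the double[] pixel representation, and the toy classifier are all hypothetical names and choices made for illustration.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical pipeline: normalize, extract features, then classify. */
final class SketchPipeline {
    private final UnaryOperator<double[]> normalization;
    private final Function<double[], double[]> featureExtraction;
    private final Function<double[], String> classifier;

    SketchPipeline(UnaryOperator<double[]> normalization,
                   Function<double[], double[]> featureExtraction,
                   Function<double[], String> classifier) {
        this.normalization = normalization;
        this.featureExtraction = featureExtraction;
        this.classifier = classifier;
    }

    String recognize(double[] pixels) {
        double[] features = featureExtraction.apply(normalization.apply(pixels));
        return classifier.apply(features);
    }
}

/** Hypothetical fluent builder: each setter returns this so calls can be chained. */
final class SketchPipelineBuilder {
    // Optional stage: defaults to a no-op when the caller does not set it.
    private UnaryOperator<double[]> normalization = UnaryOperator.identity();
    // Required stages: build() refuses to proceed without them.
    private Function<double[], double[]> featureExtraction;
    private Function<double[], String> classifier;

    private SketchPipelineBuilder() {}

    static SketchPipelineBuilder create() {
        return new SketchPipelineBuilder();
    }

    SketchPipelineBuilder normalization(UnaryOperator<double[]> step) {
        this.normalization = Objects.requireNonNull(step);
        return this;
    }

    SketchPipelineBuilder featureExtraction(Function<double[], double[]> step) {
        this.featureExtraction = Objects.requireNonNull(step);
        return this;
    }

    SketchPipelineBuilder classifier(Function<double[], String> step) {
        this.classifier = Objects.requireNonNull(step);
        return this;
    }

    SketchPipeline build() {
        if (featureExtraction == null || classifier == null) {
            throw new IllegalStateException("featureExtraction and classifier are required");
        }
        return new SketchPipeline(normalization, featureExtraction, classifier);
    }
}

final class SketchPipelineDemo {
    public static void main(String[] args) {
        SketchPipeline pipeline = SketchPipelineBuilder
                .create()
                .normalization(px -> Arrays.stream(px).map(v -> v / 255.0).toArray())
                .featureExtraction(px -> new double[] {Arrays.stream(px).average().orElse(0.0)})
                .classifier(f -> f[0] > 0.5 ? "light" : "dark")
                .build();
        System.out.println(pipeline.recognize(new double[] {255, 255, 0, 255})); // light
    }
}
```

Checking the required stages in build() keeps the chained calls order-independent while still failing fast on an incomplete configuration.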

Contributing

Want to help out? Feel free to share your ideas.

  1. Fork it.
  2. Create a branch (git checkout -b my_fancy_feature).
  3. Commit your changes (git commit -am "Added amazing feature").
  4. Push to the branch (git push origin my_fancy_feature).
  5. Open a Pull Request.

References

  • Arora, Sandhya (2008). “Combining Multiple Feature Extraction Techniques for Handwritten Devnagari Character Recognition”, IEEE Region 10 Colloquium. pp. 342-348
  • Haykin, Simon (1999). “Neural Networks: A Comprehensive Foundation”, 2nd Edition. Pearson Education.
  • Perez, Juan-Carlos ; Vidal, Enrique ; Sanchez, Lourdes (1994). “Simple and Effective Feature Extraction for Optical Character Recognition”, Selected Paper From the 5th Spanish Symposium on Pattern Recognition and Image Analysis.
  • Zahid Hossain, M. ; Ashraful Amin, M. ; Yan, Hong (2012). “Rapid Feature Extraction for Optical Character Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 6. pp. 801-813

Thanks

Thanks to Heaton Research for providing an amazing Neural Network framework. Thanks also to Apache Commons Math for doing all the math without the mess.

Contributors

  • zoso10
