
paulchai / line-segmentation-algorithm-to-gcp-vision


This project forked from sshniro/line-segmentation-algorithm-to-gcp-vision


Proposed line segmentation algorithm for Google Vision API.

License: Apache License 2.0

JavaScript 100.00%


Introduction

Google Vision outperforms most cloud OCR providers. It offers two options for OCR:

  • TEXT_DETECTION - Words with coordinates
  • DOCUMENT_TEXT_DETECTION - OCR on dense text to extract lines and paragraph information
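
As a rough illustration of how the two options differ on the wire, the sketch below builds the JSON body for the Vision REST `images:annotate` call, where the chosen option goes into the `features[].type` field. `buildAnnotateRequest` is a hypothetical helper written for this example and is not part of this repository; the image string is a placeholder.

```javascript
// Hypothetical helper (not part of this repository): builds the request body
// for the Vision API `images:annotate` REST endpoint.
function buildAnnotateRequest(base64Image, featureType) {
  const allowed = ['TEXT_DETECTION', 'DOCUMENT_TEXT_DETECTION'];
  if (!allowed.includes(featureType)) {
    throw new Error(`Unsupported feature type: ${featureType}`);
  }
  return {
    requests: [
      {
        image: { content: base64Image },     // base64-encoded image bytes
        features: [{ type: featureType }],   // selects the OCR option
      },
    ],
  };
}

// Placeholder image content; a real call needs actual base64 image bytes.
const body = buildAnnotateRequest('BASE64_IMAGE_BYTES', 'TEXT_DETECTION');
console.log(JSON.stringify(body, null, 2));
```

Switching to dense-text OCR only changes the second argument to `'DOCUMENT_TEXT_DETECTION'`.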

The second option is preferred for extracting data from dense text such as newspapers and books. For images with sparse text, such as retail invoices, the OCR can return lines in a different order from how they appear on the page: if two words on the same line are too far apart, Google Vision identifies them as two separate paragraphs or lines.

The image below shows sample Google Vision output for a typical invoice.

[Image: screen shot 2018-01-15 at 3 55 59 pm]

This behaviour is a problem for information extraction, for example when reading a retail invoice and matching each product to its price. The algorithm proposed below segments lines using the characters' bounding polygon coordinates so that such data can be extracted.

Proposed Algorithm

The implemented algorithm runs in two stages:

  • Stage 1 - Groups nearby words to generate a longer strip of line
  • Stage 2 - Connects words which are far apart using the bounding polygon approach

[Image: screen shot 2018-01-15 at 4 50 31 pm]

Explanation.

Stage 1 is needed because Google Vision can split price-related text such as $3.40 into two words (word 1: "$3.", word 2: "40"). This stage concatenates nearby characters into a single text block or word, which also reduces the computation needed in the second stage.
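
A minimal sketch of the Stage 1 idea, not the repository's actual implementation. It assumes each word arrives in reading order with a `description` string and `boundingPoly.vertices` (the shape of Vision's text annotations), and it merges a word into the previous block when the two overlap vertically and the horizontal gap is small relative to the text height. The `gapRatio` threshold, the `rect` test helper, and the function names are illustrative assumptions.

```javascript
// Axis-aligned extent of a word's bounding polygon.
function extent(word) {
  const xs = word.boundingPoly.vertices.map((v) => v.x);
  const ys = word.boundingPoly.vertices.map((v) => v.y);
  return {
    left: Math.min(...xs), right: Math.max(...xs),
    top: Math.min(...ys), bottom: Math.max(...ys),
  };
}

// Merge consecutive words whose horizontal gap is at most gapRatio * height.
// gapRatio is an assumed tuning knob, kept tight so that only fragments of
// one token (e.g. "$3." and "40") are joined.
function mergeNearbyWords(words, gapRatio = 0.5) {
  const merged = [];
  for (const word of words) {
    const cur = { text: word.description, ...extent(word) };
    const prev = merged[merged.length - 1];
    if (prev) {
      const height = prev.bottom - prev.top;
      const gap = cur.left - prev.right;
      const overlapsVertically = cur.top < prev.bottom && cur.bottom > prev.top;
      if (overlapsVertically && cur.left >= prev.left && gap <= gapRatio * height) {
        prev.text += cur.text;                      // join the fragments
        prev.right = Math.max(prev.right, cur.right);
        prev.top = Math.min(prev.top, cur.top);
        prev.bottom = Math.max(prev.bottom, cur.bottom);
        continue;
      }
    }
    merged.push(cur);
  }
  return merged;
}

// Illustrative helper that builds a word with a rectangular polygon.
function rect(description, left, top, right, bottom) {
  return {
    description,
    boundingPoly: {
      vertices: [
        { x: left, y: top }, { x: right, y: top },
        { x: right, y: bottom }, { x: left, y: bottom },
      ],
    },
  };
}

const sampleWords = [
  rect('Milk', 10, 10, 50, 30),
  rect('$3.', 300, 10, 330, 30),
  rect('40', 332, 10, 352, 30),
];
console.log(mergeNearbyWords(sampleWords).map((b) => b.text));
```

Here "Milk" stays separate because the gap to "$3." is far larger than the threshold, while "$3." and "40" are joined into one block. Connecting "Milk" to its price is left to Stage 2.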

Stage 2 draws an imaginary bounding polygon (with a threshold) over the words and computes which words belong to each line.
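
One way to picture Stage 2 is sketched below; it is an illustration under stated assumptions, not the repository's code. Each unassigned block acts as a seed: its top and bottom edges are extended across the page, widened by a `threshold` in pixels, and every other block whose centre falls between the two extended edges joins the seed's line. The sketch assumes each block has a `text` field and four `vertices` ordered top-left, top-right, bottom-right, bottom-left; all names are hypothetical.

```javascript
// y-coordinate at position x of the straight line through points p and q.
function lineY(p, q, x) {
  if (q.x === p.x) return p.y;
  return p.y + ((q.y - p.y) * (x - p.x)) / (q.x - p.x);
}

// Centre point of a polygon, taken as the average of its vertices.
function center(vertices) {
  const n = vertices.length;
  return {
    x: vertices.reduce((sum, v) => sum + v.x, 0) / n,
    y: vertices.reduce((sum, v) => sum + v.y, 0) / n,
  };
}

// Group blocks into lines using each seed's extended top and bottom edges.
function groupIntoLines(blocks, threshold = 0) {
  const used = new Set();
  const lines = [];
  blocks.forEach((seed, i) => {
    if (used.has(i)) return;
    const [tl, tr, br, bl] = seed.vertices; // assumed vertex order
    const line = [];
    blocks.forEach((block, j) => {
      if (used.has(j)) return;
      const c = center(block.vertices);
      const upper = lineY(tl, tr, c.x) - threshold; // extended top edge
      const lower = lineY(bl, br, c.x) + threshold; // extended bottom edge
      if (c.y >= upper && c.y <= lower) {
        used.add(j);
        line.push(block);
      }
    });
    line.sort((a, b) => center(a.vertices).x - center(b.vertices).x);
    lines.push(line.map((b) => b.text).join(' '));
  });
  return lines;
}

// Illustrative helper that builds a block with a rectangular polygon.
function makeBlock(text, left, top, right, bottom) {
  return {
    text,
    vertices: [
      { x: left, y: top }, { x: right, y: top },
      { x: right, y: bottom }, { x: left, y: bottom },
    ],
  };
}

const sampleBlocks = [
  makeBlock('Milk', 10, 10, 50, 30),
  makeBlock('$3.40', 300, 12, 350, 32),
  makeBlock('Bread', 10, 50, 60, 70),
  makeBlock('$2.00', 300, 52, 350, 72),
];
console.log(groupIntoLines(sampleBlocks));
```

Because the edges are extrapolated rather than assumed horizontal, a slanted seed carries its slant across the page, which is the property that lets this approach tolerate tilted images.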

Issues.

The algorithm works for most slanted and slightly crumpled images, but it fails on highly crumpled or folded images.

Usage

Node JS
  • cd nodejs
  • npm install
  • npm test

Future Work

Implement the water-flow algorithm for line segmentation and compare its accuracy with the bounding polygon approach.

[Image: waterflow]

Contributors

paulchai, sshniro
