devalente / text-xtractor

This project forked from roshan4798/text-xtractor

Text-Xtractor is a Java application that extracts text from images and PDFs and displays it in a separate panel of the interface.

Java 100.00%

Text-Xtractor

APIs Used:

  1. OpenCV
  2. Tess4J (Java wrapper for Tesseract OCR)
  3. PDFBox
  4. Webcam Capture

What does it do? It is a Java GUI application with two purposes:

  1. Start the webcam and convert the image it captures into text.
  2. Import PDF files from your PC and extract the text from them.

How is it useful?

  1. Copy text displayed in your mobile phone's browser to the clipboard of your PC.
  2. Easily import PDF files and get the text out of them.

How accurate is it? Accuracy is an issue here. It performs well most of the time, but because of the poor camera resolution on my PC it sometimes outputs random garbage. The underlying problem is image quality, so OpenCV is used to apply some image enhancements that improve the accuracy of the OCR.

How is OpenCV used? First the captured image is converted to grayscale. After that, Gaussian blur and adaptive thresholding are applied to remove noise, leaving a black-and-white image for the OCR step.
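
The three steps above can be sketched roughly as follows. This is not the project's code: it re-implements the idea in plain Java on integer arrays so that it runs without OpenCV's native libraries. In OpenCV's Java bindings the equivalent calls would be Imgproc.cvtColor, Imgproc.GaussianBlur and Imgproc.adaptiveThreshold. The class name PreprocessSketch, the 3x3 kernel, the block size of 5 and the constant 2 are all assumptions chosen for illustration, and the border handling is simpler than OpenCV's.

```java
public class PreprocessSketch {

    /** Converts packed 0xRRGGBB pixels to 8-bit gray using the common luma weights. */
    public static int[][] toGray(int[][] rgb) {
        int h = rgb.length, w = rgb[0].length;
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int p = rgb[y][x];
                int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                gray[y][x] = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
        return gray;
    }

    /** 3x3 Gaussian blur (weights 1-2-1 / 2-4-2 / 1-2-1, sum 16); coordinates are clamped at the border. */
    public static int[][] gaussianBlur3x3(int[][] gray) {
        int[][] k = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        sum += k[dy + 1][dx + 1] * gray[yy][xx];
                    }
                }
                out[y][x] = (sum + 8) / 16; // rounded division by the kernel sum
            }
        }
        return out;
    }

    /**
     * Adaptive mean threshold: a pixel becomes 255 if it is brighter than
     * (mean of its blockSize x blockSize neighbourhood - c), otherwise 0.
     * The neighbourhood is simply cut off at the image border.
     */
    public static int[][] adaptiveThreshold(int[][] gray, int blockSize, int c) {
        int r = blockSize / 2;
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int n = 0;
                for (int yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++) {
                    for (int xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++) {
                        sum += gray[yy][xx];
                        n++;
                    }
                }
                double mean = (double) sum / n;
                out[y][x] = gray[y][x] > mean - c ? 255 : 0;
            }
        }
        return out;
    }

    /** Full pipeline: grayscale, then blur, then adaptive threshold (parameters are assumed values). */
    public static int[][] preprocess(int[][] rgb) {
        return adaptiveThreshold(gaussianBlur3x3(toGray(rgb)), 5, 2);
    }

    public static void main(String[] args) {
        // Tiny synthetic "image": white background with one black pixel in the middle.
        int[][] rgb = new int[5][5];
        for (int[] row : rgb) java.util.Arrays.fill(row, 0xFFFFFF);
        rgb[2][2] = 0x000000;
        for (int[] row : preprocess(rgb)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

With a real capture, the pixel array could be filled from a BufferedImage via getRGB(x, y) & 0xFFFFFF, and the thresholded result is what would be handed to the OCR step.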

Contributors

roshan4798
