

Image Text Extraction Telegram Bot by Boramorka
🤖


How To Use | How To Run Locally | Built process | Feedback

You may be interested in this bot if you need to recognize text in an image. It's free and quick.

Supported languages:

  • ✔️ English
  • ✔️ Russian

The 🗝️ key technology is Tesseract, an open-source OCR engine (sponsored by Google for many years), which is used from Python through the pytesseract wrapper.

How To Use

🤖 Bot link: https://t.me/boramorka_text_extraction_bot

Usage

  • Send a photo of text. Type /lang to choose a language. ✔️
  • Make sure the document has a white background and readable black letters, and that the picture is not rotated. ✔️
  • In EN+RU mode the bot recognizes both languages at the same time, but more artifacts may appear. If your document is in one language, select that language. ✔️
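
Tesseract selects its recognition model by language code, and several languages can be combined with `+` (for example `eng+rus`), which is presumably what the EN+RU mode maps to. A minimal sketch of that mapping; the mode names and the `tesseract_lang()` helper are hypothetical, not taken from the bot's source:

```python
# Hypothetical mapping from the bot's /lang modes to Tesseract language codes.
LANG_MODES = {
    "EN": "eng",
    "RU": "rus",
    "EN+RU": "eng+rus",  # Tesseract loads both models when codes are joined with "+"
}


def tesseract_lang(mode: str) -> str:
    """Translate a user-facing mode such as "EN+RU" into Tesseract's lang string."""
    key = mode.strip().upper()
    if key not in LANG_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {sorted(LANG_MODES)}")
    return LANG_MODES[key]


if __name__ == "__main__":
    print(tesseract_lang("en+ru"))  # eng+rus
```

The returned string would be passed as the `lang=` argument of `pytesseract.image_to_string`. For the combined mode to work, the `eng` and `rus` traineddata files both have to be installed.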

How To Run Locally

# Clone this repository
$ git clone https://github.com/boramorka/text-extraction-app.git

# Go into the repository
$ cd text-extraction-app

# Install dependencies
$ pip install -r requirements.txt

# Run app
$ python bot.py
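
`pip` installs only the Python packages. pytesseract is a wrapper that calls the separate Tesseract binary, and bot.py reads the bot token from the `TEXT_EXTRACTOR_API_KEY` environment variable, so both must be set up before the bot can start. A small preflight check (a hypothetical helper, not part of the repository) could catch these problems before running `python bot.py`:

```python
import os
import shutil


def preflight_problems(token_var: str = "TEXT_EXTRACTOR_API_KEY") -> list:
    """Return human-readable setup problems that would stop bot.py from working."""
    problems = []
    # pytesseract only wraps the Tesseract CLI, so the binary itself must be installed.
    if shutil.which("tesseract") is None:
        problems.append("Tesseract binary not found on PATH (install tesseract-ocr)")
    # bot.py reads the Telegram bot token from this environment variable.
    if not os.getenv(token_var):
        problems.append(f"environment variable {token_var} is not set")
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("ready: run `python bot.py`")
```

If Tesseract is installed but not on PATH (common on Windows), setting `pytesseract.pytesseract.tesseract_cmd` to the binary's full path is the usual alternative.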

Built process

  • First, we create an app.py file for the main app. It contains:

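    import pytesseract

    # Path to the Tesseract executable that pytesseract calls
    # ("tesseract" is pytesseract's default; use a full path if it is not on PATH)
    pytesseract.pytesseract.tesseract_cmd = "tesseract"
    
    # Code for text recognition
    def get_text():
        ...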
  • The bot.py script starts the bot. It uses aiogram, a simple and fully asynchronous framework for the Telegram Bot API, written in Python 3.7 with asyncio and aiohttp. It helps you make bots faster and simpler.

    import os

    from aiogram import Bot, Dispatcher

    # The Bot class takes the bot token (API key) to connect to the Telegram servers.
    bot = Bot(token=os.getenv("TEXT_EXTRACTOR_API_KEY"))  # Note: the token is read from an environment variable
    
    """
    Dispatcher will process incoming updates: 
        • messages
        • edited messages
        • channel posts
        • edited channel posts
        • inline queries
        • chosen inline results
        • callback queries
        • shipping queries
        • pre-checkout queries.
    """
    dp = Dispatcher(bot) 
    
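    # Decorator that registers the coroutine below as a handler: aiogram calls it
    # with each incoming message that matches the filter (here, photos).
    @dp.message_handler(content_types=["photo"])
    async def handle_message(message):
        ...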
  • Heroku deployment. Important files:

    • 📄 bot.py: the bot application (refer to my GitHub for the source code)
    • 📄 Aptfile: system (apt) packages for Heroku to install (e.g. tesseract-ocr)
    • 📄 Procfile: the process types the app runs on Heroku
    • 📄 requirements.txt: the Python dependencies to install
    • 📄 runtime.txt: the Python version to run on Heroku (optional)
    # HEROKU DEPLOYMENT PROCESS
    
    # Note:
    # Add this line to bot.py
    pytesseract.pytesseract.tesseract_cmd = "/app/.apt/usr/bin/tesseract"
    # (refer to my Github for the source code)
    
    # Log in to Heroku and create a new app:
    $ heroku login
    $ git init
    $ heroku create boramorka-text-extraction-app
    $ heroku git:remote -a boramorka-text-extraction-app
    
    # Add Buildpacks:
    $ heroku buildpacks:add --index 1 https://github.com/heroku/heroku-buildpack-apt
    $ heroku buildpacks:add --index 2 heroku/python
    
    # Add Config Vars:
    $ heroku config:set TESSDATA_PREFIX=/app/.apt/usr/share/tesseract-ocr/4.00/tessdata
    
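    # The heroku-20 stack had compatibility problems with tesseract at the time of writing.
    # You may need to switch the stack from heroku-20 to heroku-18 with:
    # (heroku-18 has since reached end-of-life, so check which stacks Heroku currently supports.)
    $ heroku stack:set heroku-18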
    
    # Deploy app on Heroku:
    $ git add .
    $ git commit -m "Initial commit to Heroku"
    $ heroku git:remote -a boramorka-text-extraction-app
    $ git push heroku master
    
    # Check worker status:
    $ heroku ps
    
    # Run worker
    $ heroku ps:scale worker=1
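
Two parts of the outline above are left open: the get_text() body is elided, and the Tesseract path is hard-coded for Heroku. The following is a minimal sketch of one way to fill them in, assuming Pillow and pytesseract are installed. `get_text(image_path, lang)`, `clean_text()`, `resolve_tesseract_cmd()` and `get_text_async()` are hypothetical names, not the repository's code. Because pytesseract blocks while Tesseract runs, the async wrapper hands the call to a worker thread so an aiogram handler does not stall the event loop.

```python
import asyncio
import os
import shutil
from functools import partial
from typing import Callable

# Where the apt buildpack installs Tesseract on Heroku (path from the notes above).
HEROKU_TESSERACT = "/app/.apt/usr/bin/tesseract"


def resolve_tesseract_cmd() -> str:
    """Pick a Tesseract binary that works both on Heroku and locally."""
    if os.path.exists(HEROKU_TESSERACT):
        return HEROKU_TESSERACT
    # Fall back to whatever is on PATH; "tesseract" is pytesseract's own default.
    return shutil.which("tesseract") or "tesseract"


def clean_text(raw: str) -> str:
    """Strip trailing spaces and drop empty lines, including the trailing form feed."""
    lines = (line.rstrip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def get_text(image_path: str, lang: str = "eng") -> str:
    """Hypothetical get_text(): OCR one image file with pytesseract."""
    # Imported here so the helpers above can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    with Image.open(image_path) as image:
        return clean_text(pytesseract.image_to_string(image, lang=lang))


async def get_text_async(
    image_path: str,
    lang: str = "eng",
    ocr: Callable[[str, str], str] = get_text,
) -> str:
    """Run the blocking OCR call in the default thread pool, keeping the event loop free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(ocr, image_path, lang))
```

Inside an aiogram handler, a call such as `text = await get_text_async(path, "eng+rus")` would let the bot keep answering other users while Tesseract works. Swapping a thread pool for a process pool is an option if OCR load grows.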

Feedback

🤵 Feel free to send me feedback on Telegram. Feature requests are always welcome.

🧮 Check out my other projects.
