narkhedesam / pyim2speak

Read an image and speak its text content using Python

License: MIT License

Python 100.00%
pyim2speak python tesseract-ocr gtts playsound langdetect pil opencv-python

PyIm2Speak

Read an image and speak its text content using PIL, langdetect, tesseract, gtts and playsound with Python

Before you start

You must install tesseract-ocr on your local machine

Parameters

  • -i --image - path to the input image to run OCR on
  • -p --processing - type of pre-processing to apply (thresh, blur)
  • -f --format - format of the output (speak)
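
The three options above could be parsed with argparse roughly as follows. This is a minimal sketch, not the project's actual parser: the function names build_parser and parse_cli are hypothetical, and making -i required and defaulting -p to thresh and -f to speak are assumptions that the README does not state.

```python
import argparse


def build_parser():
    """Build a parser for the three options listed in the README."""
    parser = argparse.ArgumentParser(description="Read an image and speak its text")
    # Assumption: the image path is mandatory.
    parser.add_argument("-i", "--image", required=True,
                        help="path to the input image to run OCR on")
    # Assumption: "thresh" is the default pre-processing step.
    parser.add_argument("-p", "--processing", choices=["thresh", "blur"],
                        default="thresh", help="type of pre-processing to apply")
    # Assumption: "speak" is the default output format.
    parser.add_argument("-f", "--format", default="speak",
                        help="format of the output")
    return parser


def parse_cli(argv=None):
    """Return the parsed options as a dict, so values read as args['image']."""
    return vars(build_parser().parse_args(argv))
```

Calling parse_cli(["-i", "sample.png", "-p", "blur"]) yields a dict whose 'image' key holds "sample.png", which matches the args['image'] lookup used in the steps below.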

Coding

Read the Image

image = cv2.imread(args['image'])

Convert the image to grayscale

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Process the image with thresholding or blurring, according to the user's choice

For threshold

gray_image = cv2.threshold(gray_image, 0, 255,
                cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

For blur

gray_image = cv2.medianBlur(gray_image, 3)

Store the image in the temp folder, named after the process ID of the program

filename = "temp" + os.path.sep + "{}.png".format(os.getpid())
cv2.imwrite(filename, gray_image)

Set the path to the tesseract executable

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
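
The hardcoded path above only works for a default Windows install. One possible alternative, sketched here under the assumption that the tesseract binary is on the PATH, is to look it up with the standard library. The helper name find_tesseract is hypothetical.

```python
import shutil


def find_tesseract(explicit_path=None):
    """Return explicit_path if given, else the 'tesseract' binary found on PATH.

    Returns None when no explicit path is passed and nothing is found on PATH.
    """
    return explicit_path or shutil.which("tesseract")
```

The result can then be assigned to pytesseract.pytesseract.tesseract_cmd, falling back to the Windows path shown above when the lookup returns None.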

Get the text from the image

text = pytesseract.image_to_string(Image.open(filename))

Remove the temp file

os.remove(filename)
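
If OCR raises an error, the os.remove call above is never reached and the temp file is left behind. A small context manager is one way to guard against that. This is a sketch only: temp_image_path is a hypothetical helper, and creating the temp folder on demand is an assumption (the README does not say whether the folder already exists).

```python
import contextlib
import os


@contextlib.contextmanager
def temp_image_path(directory="temp"):
    """Yield a PID-based .png path inside `directory`, then delete the file."""
    os.makedirs(directory, exist_ok=True)  # assumption: create the folder if missing
    path = os.path.join(directory, "{}.png".format(os.getpid()))
    try:
        yield path
    finally:
        # Runs even if the body raised; ignore the case where nothing was written.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

Inside a `with temp_image_path() as filename:` block, the cv2.imwrite and pytesseract.image_to_string calls can run as before, and the file is removed when the block exits.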

Store the extracted text in the output folder

text_filepath = "output" + os.path.sep + "{}.txt".format(os.getpid())
with open(file=text_filepath, mode='w+') as file:
    file.write(text)    
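
The output paths in this walkthrough all follow the same pattern: the output folder, the process ID, and an extension. A helper along these lines could build them in one place. It is a sketch with hypothetical names (output_path, save_text); creating the folder on demand and writing UTF-8 are assumptions, chosen because OCR text is often non-ASCII.

```python
import os


def output_path(extension, directory="output"):
    """Return '<directory>/<pid>.<extension>', creating the folder if needed."""
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, "{}.{}".format(os.getpid(), extension))


def save_text(text, directory="output"):
    """Write the OCR text as UTF-8 and return the path of the written file."""
    path = output_path("txt", directory)
    with open(path, mode="w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

The same output_path helper would also serve for the .mp3 file, for example output_path("mp3").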

Detect the language of the text

lang = detect(text)
print("Language: %s" % lang)
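
When OCR returns empty or non-linguistic text, langdetect's detect raises LangDetectException, which would stop the script before any audio is produced. A defensive wrapper might look like the sketch below. The name detect_or_default and the "en" fallback are assumptions; the detector is passed in as an argument so the helper does not depend on langdetect itself.

```python
def detect_or_default(text, detector, default="en"):
    """Return detector(text), or `default` if the text is blank or detection fails."""
    if not text.strip():
        return default  # nothing to detect in blank OCR output
    try:
        return detector(text)
    except Exception:
        # langdetect raises LangDetectException when it finds no usable features;
        # a broad except keeps this sketch independent of that import.
        return default
```

With langdetect installed, the call would be lang = detect_or_default(text, detect).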

Convert the text to audio using gTTS

audio = gTTS(text=text, lang=lang, slow=False)

Store the audio file in the output folder

audio_filepath = "output" + os.path.sep + "{}.mp3".format(os.getpid())
audio.save(audio_filepath)

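Write the grayscale image to the output folder

# Use a path in the output folder; the temp path in `filename` was already removed
image_filepath = "output" + os.path.sep + "{}.png".format(os.getpid())
cv2.imwrite(image_filepath, gray_image)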

Play the converted audio

playsound(audio_filepath)

Author

Sameer Narkhede
Profile : https://github.com/narkhedesam
Website : https://narkhedesam.github.io/

Donation

If this project helps you reduce development time, you can buy me a cup of coffee ☺️
