
venkatarangan / python-tamil-samples


Quick samples in Python for handling Tamil

Home Page: https://venkatarangan.com/blog/2019/09/python-and-google-cloud-vision-for-tamil-text/

License: MIT License

Python 100.00%
ocr-python python speech-to-text tamil-language text-to-speech

python-tamil-samples

Overview

A set of quick samples that I wrote to experiment with using Python to handle Tamil text. These were written around 2019.

Details are in the following blog posts:

  1. Python and Google Cloud Vision for Tamil text [https://venkatarangan.com/blog/2019/09/python-and-google-cloud-vision-for-tamil-text/]

  2. Tools and Applications in Tamil [https://venkatarangan.com/blog/2019/09/tools-applications-available-for-tamil/]

  3. Python code snippets for Speech in Tamil [https://venkatarangan.com/blog/2019/08/python-code-snippets-for-speech-in-tamil/]

Disclaimer

These are simple examples and trials. Please treat them as such. Some might be useful too, but that's an unintended consequence.

Recent Updates

While randomly checking my GitHub account on 25 May 2024, I came across this repository and noticed it lacked basic context and comments. I have added these and made the repository public.

License

This project is licensed under the MIT License.

Contributions

This project was an experiment and is not maintained. Just take it as it is. Thanks.

Author

Venkatarangan Thirumalai venkatarangan.com

Samples available here:

Speech to Text:

I got the base code from here. You need to install the SpeechRecognition package, available through PIP, and PyAudio (available through PIP on Linux; on Windows, you need to install the appropriate package from here).
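As a rough illustration of how these two packages fit together, here is a minimal sketch. It is not the code from this repository. The function name `listen_and_transcribe` and the `ta-IN` language tag are assumptions made for the example. It relies on SpeechRecognition's `Recognizer`, `Microphone` and `recognize_google` calls, and imports the package inside the function so the file loads even when the package is missing.

```python
TAMIL_INDIA = "ta-IN"  # assumed language tag for Tamil (India)


def listen_and_transcribe(language=TAMIL_INDIA):
    """Record one phrase from the default microphone and return the text.

    Returns None when the speech could not be understood.
    """
    # Third-party packages: pip install SpeechRecognition PyAudio
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Blocks until a phrase has been spoken and silence follows.
        audio = recognizer.listen(source)
    try:
        # Sends the audio to Google's free web speech endpoint.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None


# Example use (needs a microphone and a network connection):
# print(listen_and_transcribe())
```

A network failure raises `sr.RequestError`, which this sketch deliberately lets propagate to the caller.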

Text to Speech:

In 2019, when I posted the speech-to-text code, a reader asked for code that does the reverse: speak a given sentence of Tamil text out loud. I got the base code from here. You need to install the gTTS package, available through PIP. The code uses os.system to play the output audio file. This command works out of the box on Windows; on Linux you may need a command-line player like MPG321 (which can be installed using sudo apt-get install mpg321). You can listen to the audio produced by the code in the sample output MP3 file, output-tamil-audio.mp3.
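The flow described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code. The helper names `synthesize_tamil`, `play_command` and `play` are hypothetical, and it swaps os.system for `subprocess.run` so that the player command can be inspected separately. The Windows branch assumes that `cmd /c start` opens the file in the default player.

```python
import platform
import subprocess


def synthesize_tamil(text, out_path="output-tamil-audio.mp3"):
    """Save the spoken form of `text` as an MP3 file and return its path."""
    from gtts import gTTS  # third-party package: pip install gTTS

    gTTS(text=text, lang="ta").save(out_path)  # "ta" is the Tamil language code
    return out_path


def play_command(path, system=None):
    """Return the command used to play `path` on the given platform."""
    system = system or platform.system()
    if system == "Windows":
        # Assumption: hand the file to whatever player is registered for MP3.
        return ["cmd", "/c", "start", "", path]
    # Elsewhere, assume mpg321 is installed (sudo apt-get install mpg321).
    return ["mpg321", path]


def play(path):
    """Play the file with the platform's command; the exit status is ignored."""
    subprocess.run(play_command(path), check=False)


# Example use (needs a network connection for gTTS):
# play(synthesize_tamil("வணக்கம்"))
```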

The above were possible thanks to the numerous ready-made packages that are available for free, and to the magic of the cloud. In these two cases I am using Google Cloud, which required no configuration or key for trial runs. You may use Bing if you have an Azure API key.

Vision and Translate

This is a quick 'n' dirty code snippet that uses Google Cloud Vision OCR to extract the Tamil text from a given image, then uses Google Cloud Translate to translate that text to English. The text parsing needs a lot more work.

Tamil text in the image file, as recognized by Cloud Vision OCR:

புதிய இடத்தில் அதெல்லாம் பலிக்குமா?
பெரியவர் ஆறுமுகத்தோடு மல்லிகா நாகப்பட்டினத்
தில் கப்பலேறியபோது வழியனுப்ப வந்தவர்கள் வாய்
வார்த்தையின்றி அழுதழுது கண்ணைக் கடலாக்கிக்
கொண்டார்கள்.
அவர்களை அவள்தான் தேற்ற வேண்டியிருந்தது
"மாமாவின் கடைசி ஆசையை நான் நிறைவேற்ற
வேண்டாமா? அந்த ஆத்மாவுக்குச் சாந்தி கிடைக்க
வேண்டாமா? கோடை விடுமுறையில் தானே போகிறேன்.

The output English translation:

Will it be sacrificed in the new place? When Mallika sailed to Nagapattinam with the eldest, the passersby cried without a word. "Do I not fulfill my uncle's last wish? Do I not find peace with that soul? I am going on summer vacation.
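The OCR-then-translate pipeline can be sketched as two small functions. This is an assumed outline, not the repository's snippet. The function names `ocr_text` and `tamil_to_english` and the file name `page.png` are hypothetical. The clients are passed in as arguments so that the logic does not depend on how credentials are configured. The commented usage assumes the google-cloud-vision and google-cloud-translate packages with working Google Cloud credentials.

```python
def ocr_text(vision_client, image):
    """Return the full text found in `image`, or "" if nothing was detected."""
    response = vision_client.text_detection(image=image)
    annotations = response.text_annotations
    # The first annotation holds the whole detected text block;
    # the remaining ones are the individual words.
    return annotations[0].description if annotations else ""


def tamil_to_english(translate_client, text):
    """Translate Tamil `text` to English with a Translate v2 style client."""
    result = translate_client.translate(
        text,
        source_language="ta",
        target_language="en",
        format_="text",  # plain text, so quotes are not returned as HTML entities
    )
    return result["translatedText"]


# Example use (hypothetical file name, credentials required):
# from google.cloud import vision, translate_v2
# with open("page.png", "rb") as f:
#     image = vision.Image(content=f.read())
# tamil = ocr_text(vision.ImageAnnotatorClient(), image)
# print(tamil_to_english(translate_v2.Client(), tamil))
```

Note that the OCR output keeps the line breaks of the printed page, including words split across lines, which is one reason the translation above reads unevenly.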

Tamil Letters Count

This project uses the Grapheme package, a wonderful package for performing string operations on characters as a reader of the language would recognize them, rather than on individual code points.

In 2004, when I needed to count the number of characters in a given Tamil string, the standard library functions were of little use for non-Latin scripts. I ended up writing a paper for a conference and submitting code written in .NET, Perl and VB.NET to solve this problem specifically for Tamil. Now Grapheme makes it super easy, and it is updated up to the Unicode 12.0.0 standard.
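To see why plain string length is not enough, here is a small standard-library sketch. It is an approximation for illustration, not the Grapheme package and not the repository's code. The function name `tamil_letter_count` is hypothetical. It counts only characters in a Unicode letter category, so vowel signs and the pulli, which are combining marks, are folded into the letter before them. Unlike the Grapheme package's `grapheme.length`, it ignores spaces, digits and punctuation.

```python
import unicodedata


def tamil_letter_count(text: str) -> int:
    """Count letters, ignoring combining marks, spaces and punctuation."""
    # Tamil consonants and independent vowels have category "Lo".
    # Dependent vowel signs and the pulli are "Mc"/"Mn" and attach
    # to the letter before them, so they are not counted separately.
    return sum(1 for ch in text if unicodedata.category(ch).startswith("L"))


word = "தமிழ்"
print(len(word))                 # 5 code points
print(tamil_letter_count(word))  # 3 letters: த, மி, ழ்
```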
