satvikvarshney / pdfsuno

Converts English PDFs and ebooks into Hindi audiobooks using NLP, TTS, and cloud technologies, making educational content more accessible in underprivileged areas with low education levels.

Python 44.23% Jupyter Notebook 55.77%

PDF Suno

PDF Suno is a work-in-progress project that converts English PDFs and ebooks into Hindi audiobooks. It is aimed at underprivileged and under-educated people in India who cannot read or write, or who face a language barrier with English content. Delivering the information as audio makes it understandable to people who previously could not access it.

Overview

PDF Suno converts the textual content of documents into audio. The application uses AWS Textract, Translate, and Polly to extract text from PDFs, translate it, and synthesize it into Hindi audiobooks. The development strategy relies on pre-existing training datasets, cloud TTS services, and translation services to keep the process cost-effective and efficient. The tool aims to give Hindi-speaking audiences, especially those from underprivileged backgrounds, access to educational and informational content in an easy-to-use audio format.

Sample Demonstration Overview

Below is an audio demonstration of PDF Suno's current capabilities. The audio sample is based on the PDF uploaded in the Sample Data folder:

Sample_Audiobook.mp4

If you encounter issues with playback, you can download the file in .mp3 format here: Download and play sample audio

Key Features

Current Features

  1. PDF Document Upload: PDF Suno supports secure uploading of PDF files and can handle multiple documents at once, which gives the application more source material to work with.

  2. Efficient Text Extraction: Utilizing AWS Textract, PDF Suno extracts text from uploaded PDFs efficiently, laying the groundwork for informed translations and audio synthesis.

  3. Text Translation: Leveraging AWS Translate, PDF Suno translates extracted text from English to Hindi, catering to users who prefer or require Hindi content.

  4. Audio Synthesis: Using AWS Polly, PDF Suno synthesizes Hindi text into high-quality audio, creating an accessible and convenient audiobook format.
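Translate and Polly both cap the size of a single request, so extracted text usually has to be split before it is translated or synthesized. The following is a minimal sketch of one way to do that, not the repository's own code. `split_text` is a hypothetical helper, and the default of 9,000 bytes assumes a per-request limit of about 10,000 UTF-8 bytes for Translate; check the current service quotas before relying on it. Polly limits input by characters rather than bytes, so passing a smaller `max_bytes` is a conservative way to stay under its limit as well.

```python
import re


def split_text(text: str, max_bytes: int = 9000) -> list[str]:
    """Split text into chunks of at most max_bytes UTF-8 bytes.

    Chunks end on sentence boundaries where possible. A sentence that is
    too long on its own is split on whitespace instead. A single word
    longer than max_bytes is kept whole (assumed not to occur in practice).
    """
    # Sentence boundary: whitespace after '.', '!', '?' or the Devanagari danda.
    sentences = re.split(r"(?<=[.!?\u0964])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        # The sentence does not fit: close the current chunk first.
        if current:
            chunks.append(current)
            current = ""
        # Add the sentence word by word, starting new chunks as needed.
        for word in sentence.split():
            candidate = f"{current} {word}" if current else word
            if len(candidate.encode("utf-8")) <= max_bytes:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks
```

Measuring in bytes rather than characters matters for Hindi output, because each Devanagari character takes three bytes in UTF-8.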

Planned Enhancements

  1. Migration to GCP: Reviews suggest that Google Cloud Platform offers better text extraction, translation, and TTS than AWS, so migrating could improve quality in these areas.

  2. Expanded Language Support: Future updates will include additional languages, broadening the application's accessibility.

  3. Enhanced Audio Quality: Improvements will focus on enhancing the quality and naturalness of synthesized audio, offering a superior listening experience. Emphasis may be placed on creating local TTS software for Hindi.

  4. User Interface Improvements: The goal is to enhance the application's user interface for a more intuitive and seamless experience.

  5. Improved Extraction: Local extraction, including OCR, will provide a more cost-effective solution.

How It Works

  1. Document Upload: Users upload their PDF documents directly through the application interface. Multiple documents can be uploaded at once.

  2. Text Extraction: Once uploaded, the documents are processed to extract text using AWS Textract.

  3. Text Translation: The extracted text is then translated into Hindi using AWS Translate.

  4. Audio Synthesis: Finally, the translated text is converted into audio using AWS Polly, creating a Hindi audiobook.
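The extraction, translation, and synthesis steps above can be sketched with Boto3 roughly as follows. This is an illustrative outline under stated assumptions, not the project's actual implementation: the function names, the 1,500-character chunk size, and the `Aditi` voice are choices made for the example, and the PDF is assumed to already be in an S3 bucket, since Textract's asynchronous text detection reads multi-page PDFs from S3.

```python
import time


def extract_text(textract, bucket: str, key: str, poll_seconds: float = 5.0) -> str:
    """Run asynchronous Textract text detection on a PDF in S3; return its lines."""
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines: list[str] = []
    next_token = None
    while True:
        kwargs = {"JobId": job["JobId"]}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract.get_document_text_detection(**kwargs)
        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)  # job not finished yet, poll again
            continue
        if status != "SUCCEEDED":
            raise RuntimeError(f"Textract job ended with status {status}")
        # Keep only LINE blocks; PAGE and WORD blocks are skipped.
        lines += [b["Text"] for b in page.get("Blocks", []) if b["BlockType"] == "LINE"]
        next_token = page.get("NextToken")
        if not next_token:
            return "\n".join(lines)


def group_lines(text: str, max_chars: int):
    """Yield groups of whole lines, each group at most max_chars where possible."""
    chunk = ""
    for line in text.splitlines():
        if chunk and len(chunk) + 1 + len(line) > max_chars:
            yield chunk
            chunk = line
        else:
            chunk = f"{chunk} {line}" if chunk else line
    if chunk:
        yield chunk


def translate_to_hindi(translate, text: str) -> str:
    """Translate English text to Hindi with Amazon Translate."""
    result = translate.translate_text(
        Text=text, SourceLanguageCode="en", TargetLanguageCode="hi"
    )
    return result["TranslatedText"]


def synthesize_hindi(polly, text: str, voice_id: str = "Aditi") -> bytes:
    """Synthesize Hindi text to MP3 bytes with Amazon Polly (voice is an assumption)."""
    result = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id, LanguageCode="hi-IN"
    )
    return result["AudioStream"].read()


def pdf_to_hindi_audio(textract, translate, polly, bucket: str, key: str,
                       max_chars: int = 1500) -> bytes:
    """Extract, translate, and synthesize a PDF; return concatenated MP3 bytes."""
    english = extract_text(textract, bucket, key)
    audio = b""
    for chunk in group_lines(english, max_chars):
        audio += synthesize_hindi(polly, translate_to_hindi(translate, chunk))
    return audio
```

With real clients, the entry point would be called with `boto3.client("textract")`, `boto3.client("translate")`, and `boto3.client("polly")`. Passing the clients in as arguments keeps the functions easy to exercise with stand-ins. Joining MP3 segments by plain byte concatenation plays in most players, but a production pipeline may prefer a proper audio tool for the merge.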

Technologies Used

  1. Boto3: For interacting with AWS services such as Textract, Translate, and Polly.
  2. Python: The core programming language used for the application logic.
  3. AWS Textract: Utilized for the extraction of text from PDF documents.
  4. AWS Translate: Used for translating text from English to Hindi.
  5. AWS Polly: Powers the text-to-speech synthesis for creating audiobooks.

Getting Started

Installation

Clone the repository to get started:

git clone https://github.com/SatvikVarshney/PDFSuno.git

Navigate to the project directory:

cd PDFSuno

Install the required Python packages:

pip install -r requirements.txt

Before running PDF Suno on your local machine, make sure the necessary AWS credentials are configured, for example through the AWS CLI or environment variables. Then use the following command to launch the application:

python pdf_suno.py
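If the application fails at startup, a quick check of the AWS setup can help narrow the cause. The sketch below is an optional diagnostic and not part of the repository; `check_aws_setup` is a hypothetical helper that relies only on Boto3's `Session.get_credentials()` and `Session.region_name`.

```python
def check_aws_setup(session=None) -> list[str]:
    """Return a list of problems found in the local AWS configuration.

    An empty list means credentials and a default region were both found.
    """
    if session is None:
        # Imported lazily so the helper can also be called with a stand-in session.
        import boto3
        session = boto3.Session()
    problems: list[str] = []
    if session.get_credentials() is None:
        problems.append("No AWS credentials found.")
    if not session.region_name:
        problems.append("No default AWS region configured.")
    return problems
```

Calling `check_aws_setup()` with no arguments inspects the default credential chain. It only confirms that credentials and a region can be found, not that they grant access to Textract, Translate, or Polly.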
