braineditor / speech_to_text_with_python

This project forked from dpbac/speech_to_text_with_python

The goal of this project is to explore some of Python's tools for audio analysis and transcription.

Jupyter Notebook 100.00%

speech_to_text_with_python's Introduction

🗣️ Speech to Text in Python 📜

Goals in this project:

  1. Explore some of Python's tools for audio manipulation and speech-to-text transcription.

  2. Transcribe audio files from telephone calls and apply sentiment and topic analysis to the transcripts.

During the tool exploration phase, audio files recorded under different conditions are used, such as:

  • Speakers of different languages (English and Dutch)

  • Multiple speakers on multiple channels

  • Presence of noise
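For the multi-channel case listed above, one option is to write each channel to its own mono file before transcription, so that each speaker can be transcribed separately. The following is a minimal sketch using only the standard-library `wave` module. It assumes uncompressed PCM WAV input, and the function name `split_channels` and the output naming scheme are illustrative, not part of the project.

```python
import wave


def split_channels(src_path, out_prefix):
    """Write one mono WAV per channel of a PCM WAV file and return the new paths."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    frame_size = n_channels * sample_width
    out_paths = []
    for ch in range(n_channels):
        # Samples are interleaved: every frame holds one sample per channel, in order.
        start = ch * sample_width
        mono = b"".join(
            frames[i + start : i + start + sample_width]
            for i in range(0, len(frames), frame_size)
        )
        out_path = f"{out_prefix}_ch{ch}.wav"  # hypothetical naming scheme
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(sample_width)
            dst.setframerate(frame_rate)
            dst.writeframes(mono)
        out_paths.append(out_path)
    return out_paths
```

This only helps when each speaker was recorded on a separate channel, as is common for two-sided telephone recordings. It does not separate speakers who share one channel.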

Project Organization

│
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   ├── audio_call_friend       <- Audio files downloaded from CallFriend.
│   ├── audio_call_home         <- Audio files downloaded from CallHome.
│   ├── audio_common_voices     <- Files downloaded from Common Voice.
│   │   └── nl                  <- Dutch audio and .tsv files.
│   │       └── raw             <- mp3 files.
│   ├── audio_openslr           <- Audio files downloaded from LibriSpeech.
│   │   └── dev-clean           <- English audio (clean development set).
│   │       └── 1272            <- flac files.
│   ├── interim        <- Intermediate data that has been transformed.
│   └── processed      <- The final, canonical data sets for modeling.
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── images             <- Images used in the project.
│
├── .gitignore         <- Contains entries of files or folders to ignore in a project.
│
└── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
                          generated with `pip freeze > requirements.txt`

The tools explored in this project are:

  • SpeechRecognition: This Python library provides an easy way to interact with many speech-to-text APIs.

  • Google Speech API: Of the speech APIs available through SpeechRecognition, this project uses the free version of the Google Speech API. The free version supports different languages, but it does not support speaker diarization (the process of separating multiple speakers within a single audio file) and cannot detect punctuation. Currently the following limits are applied:

  • PyDub: Allows different types of audio manipulation
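As a rough illustration of how SpeechRecognition and the free Google recognizer fit together, the sketch below transcribes a single audio file. It is not taken from the project's notebooks. The helper names `language_code` and `transcribe` and the `LANGUAGE_CODES` mapping are assumptions made for this example, and the third-party package is imported inside the function so the helper can be used without it installed.

```python
# Language tags passed to the recognizer; this mapping is illustrative only.
LANGUAGE_CODES = {"english": "en-US", "dutch": "nl-NL"}


def language_code(name):
    """Translate a language name such as 'Dutch' into a language tag."""
    return LANGUAGE_CODES[name.strip().lower()]


def transcribe(audio_path, language="english"):
    """Return the transcript of a WAV, AIFF, or FLAC file, or None if unintelligible."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        # Sends the audio to the free Google web API, so network access is required.
        return recognizer.recognize_google(audio, language=language_code(language))
    except sr.UnknownValueError:
        # Raised when the service could not make out any speech.
        return None
```

A call such as `transcribe("clip.wav", language="dutch")` (the file name is hypothetical) returns plain text without punctuation. Network or quota failures surface as `sr.RequestError`, which this sketch deliberately leaves unhandled.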

Datasets

  • Dutch and English datasets: Common Voice is an initiative from Mozilla that offers an open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications. At the time of writing, 18 languages were available. Each download includes the audio files plus .tsv files with information about them. Really great resource!

  • LibriSpeech: LibriSpeech is a carefully segmented and aligned corpus of approximately 1000 hours of 16 kHz read English speech, derived from audiobooks. I downloaded only dev-clean.tar.gz.

  • Multi-speaker audio files: There are two datasets, CallFriend and CallHome.

Both cover several languages (unfortunately not Dutch), and the audio is available in both wav and mp3. Transcriptions are also available.
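The .tsv files that accompany the Common Voice audio can be read with the standard library. The sketch below maps each clip's file name to its sentence. It assumes the file has `path` and `sentence` columns, so check the header of your download. The function name `load_transcripts` is illustrative.

```python
import csv


def load_transcripts(tsv_path):
    """Map each clip file name to its sentence, read from a tab-separated file."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row["path"]: row["sentence"] for row in reader}
```

The resulting dictionary can serve as a reference when comparing an automatic transcription with the sentence that was actually read.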

Notebooks

01-Speech Transcription using Speech Recognition and PyDub: Uses PyDub to read audio file information and modify audio files before transcribing them with SpeechRecognition and the Google Speech API.
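The kind of preparation described for this notebook could look roughly like the sketch below: inspect a file's basic attributes, then convert it to a mono WAV that SpeechRecognition can read. This is not the notebook's code. The function names, the 16 kHz default, and the returned dictionary keys are assumptions for illustration. PyDub is imported inside the function, and it needs ffmpeg to decode mp3 files.

```python
def describe(segment):
    """Summarize the basic attributes of a PyDub AudioSegment."""
    return {
        "channels": segment.channels,
        "frame_rate_hz": segment.frame_rate,
        "sample_width_bytes": segment.sample_width,
        "duration_s": round(segment.duration_seconds, 2),
    }


def to_mono_wav(src_path, dst_path, frame_rate=16000):
    """Convert any audio file PyDub can decode into a mono WAV at the given rate."""
    from pydub import AudioSegment  # third-party: pip install pydub

    segment = AudioSegment.from_file(src_path)
    # Downmix to one channel and resample; both calls return new segments.
    segment = segment.set_channels(1).set_frame_rate(frame_rate)
    segment.export(dst_path, format="wav")
    return describe(segment)
```

Converting to WAV first matters because SpeechRecognition's file reader accepts WAV, AIFF, and FLAC, but not mp3.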

02-Phone Call Analysis: The goal of this notebook is to transcribe some phone calls and perform sentiment and topic analysis on them. So far, data has been retrieved from CallFriend and the audio attributes of the retrieved files have been analyzed. This notebook is still in development.

Install/Technical requirements

  • conda version: 4.8.3
  • Install requirements using `pip install -r requirements.txt`.
    • Make sure you use Python 3 (I used 3.6.7).
    • You may want to use a virtual environment for this.

Project based on the cookiecutter data science project template. #cookiecutterdatascience

speech_to_text_with_python's People

Contributors

dpbac
