Goals in this project:
- Explore some Python tools for audio manipulation and speech-to-text transcription.
- Transcribe audio files from telephone calls and apply sentiment and topic analysis to the transcripts.

During the tool exploration phase, audio files recorded under different conditions are used, such as:
- Different speaker languages (English and Dutch)
- Multiple speakers on multiple channels
- Presence of noise
│
├── README.md                 <- The top-level README for developers using this project.
│
├── data
│   ├── audio_call_friend     <- Audio files downloaded from CallFriend.
│   ├── audio_call_home       <- Audio files downloaded from CallHome.
│   ├── audio_common_voices   <- Files downloaded from Common Voice.
│   │   ├── nl                <- Dutch audio and .tsv files.
│   │   └── raw               <- mp3 files.
│   ├── audio_openslr         <- Audio files downloaded from LibriSpeech.
│   │   ├── dev-clean         <- English audio files.
│   │   └── 1272              <- flac files.
│   ├── interim               <- Intermediate data that has been transformed.
│   └── processed             <- The final, canonical data sets for modeling.
├── notebooks                 <- Jupyter notebooks. Naming convention is a number (for ordering),
│                                the creator's initials, and a short `-` delimited description, e.g.
│                                `1.0-jqp-initial-data-exploration`.
│
├── images                    <- Images used in the project.
│
├── .gitignore                <- Contains entries of files or folders to ignore in a project.
│
└── requirements.txt          <- The requirements file for reproducing the analysis environment, e.g.
                                 generated with `pip freeze > requirements.txt`
The tools explored in this project are:
- SpeechRecognition: This Python library provides an easy way to interact with many speech-to-text APIs.
- Google Speech API: Of the speech APIs available through SpeechRecognition, this project uses the free version of the Google Speech API. It supports different languages, but the free version does not support speaker diarization (the process of separating more than one speaker within a single audio recording), and it cannot detect punctuation. Limits currently apply to the free version.
- PyDub: Allows different types of audio manipulation.
- Dutch and English datasets: Common Voice is an initiative from Mozilla that offers an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications. At the moment, 18 different languages are available. The download includes not only audio files but also .tsv files with information about those audio files. Really great resource!
- LibriSpeech: LibriSpeech is a carefully segmented and aligned corpus of approximately 1000 hours of 16kHz read English speech, derived from read audiobooks. I downloaded only `dev-clean.tar.gz`.
- Multiuser audio files: There are two datasets, CallFriend and CallHome. Both cover several languages (unfortunately not Dutch), and the audio is available in both `wav` and `mp3`. Transcriptions are also available.
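
To make the SpeechRecognition and Google Speech API combination more concrete, here is a minimal sketch of transcribing one file in English or Dutch. The helper names (`language_code`, `transcribe_file`), the language-code mapping, and the example path are assumptions for illustration and are not taken from the notebooks. `sr.AudioFile` reads WAV, AIFF, and FLAC, so mp3 files would need to be converted first.

```python
# Language names used in this README mapped to language codes (assumed values).
LANGUAGE_CODES = {"english": "en-US", "dutch": "nl-NL"}


def language_code(name):
    """Return the language code for a language name, ignoring case and spaces."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


def transcribe_file(path, language="english"):
    """Transcribe a WAV/AIFF/FLAC file; return "" if the speech is unintelligible."""
    # Imported lazily because SpeechRecognition is a third-party dependency.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        # Uses the free Google Web Speech API; needs an internet connection.
        # sr.RequestError (network or quota problems) is left to propagate.
        return recognizer.recognize_google(audio, language=language_code(language))
    except sr.UnknownValueError:
        return ""


# Example call (hypothetical path, not a file shipped with this project):
# text = transcribe_file("data/interim/example.wav", language="dutch")
```

The returned text has no punctuation or speaker labels, in line with the limitations of the free version described above.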
01-Speech Transcription using Speech Recognition and PyDub:
Use PyDub to access audio file information and modify audio files before performing transcriptions with SpeechRecognition and the Google Speech API.
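
As a rough illustration of this kind of preprocessing, the sketch below uses PyDub to print basic file information, split a multi-channel recording into one mono track per channel, and export short WAV chunks that SpeechRecognition can read. The function names, the 60-second chunk length, and the idea that each channel holds one speaker are assumptions for illustration, not the notebook's actual code.

```python
from pathlib import Path


def chunk_spans(duration_ms, chunk_ms=60_000):
    """Return (start, end) millisecond pairs that cover the whole duration."""
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def prepare_for_transcription(src_path, out_dir, chunk_ms=60_000):
    """Split a recording per channel and per chunk; return the written paths."""
    # Imported lazily: pydub is third-party and needs ffmpeg for mp3 input.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src_path)
    print(
        f"channels={audio.channels} frame_rate={audio.frame_rate} "
        f"sample_width={audio.sample_width} seconds={audio.duration_seconds:.1f}"
    )

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Assumption: each channel carries one speaker, so splitting by channel
    # roughly separates the speakers of a two-channel phone call.
    for index, channel in enumerate(audio.split_to_mono()):
        for start, end in chunk_spans(len(channel), chunk_ms):
            target = out_dir / f"{Path(src_path).stem}_ch{index}_{start}.wav"
            channel[start:end].export(str(target), format="wav")
            written.append(target)
    return written
```

Short chunks are used here because long recordings are harder to send to a free speech API in a single request; the right chunk length depends on the limits that apply.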
02-Phone calls Analysis:
The goal of this notebook is to transcribe some phone calls and perform sentiment and topic analysis on them. So far, it retrieves data from CallFriend and analyzes the attributes of the retrieved audio files, so this notebook is still in development.
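
An analysis of audio attributes could look roughly like the sketch below, which collects channel count, sample width, frame rate, and duration for every WAV file in a folder using only the standard library. The function names are hypothetical, and the sketch assumes uncompressed WAV input; mp3 files would need PyDub or a prior conversion.

```python
import wave
from pathlib import Path


def wav_attributes(path):
    """Read basic attributes from a single WAV file header."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "file": Path(path).name,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def summarize_folder(folder):
    """Return one attribute dictionary per WAV file, sorted by file name."""
    return [wav_attributes(path) for path in sorted(Path(folder).glob("*.wav"))]


# Example call (folder name taken from the project tree above):
# for row in summarize_folder("data/audio_call_friend"):
#     print(row)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame` if a tabular view is preferred.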
- conda version: 4.8.3
- Install requirements using `pip install -r requirements.txt`.
- Make sure you use Python 3 (I used 3.6.7).
- You may want to use a virtual environment for this.
Project based on the cookiecutter data science project template. #cookiecutterdatascience