flask-whisper-speech-recognition

WP4 Analytic: Privacy-Aware Speech Recognition

The Privacy-Aware Speech Recognition (PSR) model is designed to accurately convert spoken language into written text while also prioritizing privacy. This analytic utilizes computational linguistics to analyze audio signals and generate a verbatim and editable transcription of the spoken content. Importantly, any sensitive information within the audio is anonymized to protect privacy. Additionally, the PSR system allows for the generation of privacy-preserving versions of the original audio. By converting the anonymized text back into speech through text-to-speech translation, an audio output is created that maintains privacy while still conveying the intended message. These privacy-preserving transcriptions and audio can then be securely shared with external services, as they do not disclose any sensitive information.

The Privacy-Aware Speech Recognition system takes an audio sample containing speech as its input. The analytic processes WAV audio with a sampling rate of 16,000 Hz, a single (mono) channel, and 16-bit samples. An audio sample in a different format may need a preprocessing step to convert it to these input requirements. In addition to the audio sample, the analytic uses a list of entity types to be anonymized. These entity types, such as "PERSON", "ADDRESS", and "DATE", are predefined and managed by the analytic.
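Before sending a file to the analytic, it can help to check that it meets the input requirements listed above. The following is a minimal sketch using only Python's standard `wave` module; the function name `wav_format_problems` and the constant names are illustrative assumptions and are not part of this repository.

```python
import wave

# Input requirements stated above: 16,000 Hz, mono, 16-bit samples.
REQUIRED_RATE_HZ = 16000
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def wav_format_problems(path):
    """Return a list of mismatches; the list is empty if the file conforms.

    Hypothetical helper for illustration, not part of the analytic itself.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels, expected {REQUIRED_CHANNELS} (mono)")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{width * 8}-bit samples, expected 16-bit")
    return problems
```

A file for which the function returns a non-empty list would need to be resampled or converted before it is submitted.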

  • Speech to Text: We use Whisper, the automatic speech recognition (ASR) model developed and maintained by OpenAI, to convert spoken language into written text. Whisper is a deep learning model built on the transformer architecture, which has also been applied to many natural language processing (NLP) tasks. The transformer lets Whisper capture contextual information, recognize patterns, and model the relationships between different elements of the audio sequence, which is what allows it to generate accurate transcriptions.
  • Named Entity Recognition: Named entity recognition (NER) locates key phrases and nouns in a text and labels them as entities in categories such as names, locations, and addresses. How sensitive these entities are depends on the context in which the data analysis is applied; names of people and locations, for example, are highly sensitive. To protect the privacy of the user, these entities can be removed from the text, which leaves the data valid for analysis without violating privacy. We use a spaCy deep learning model to perform entity recognition on the text produced by the previous step. The process for Named Entity Recognition is composed of the following steps:
  • Sentence Segmentation: to split the text into sentences.
  • Tokenization: to split each sentence resulting from the previous step into tokens which are usually numbers, words, and punctuation marks.
  • Token Classification: to classify each token according to its part of speech (POS).
  • Entity Detection: to classify the word entities according to their type, such as an address, a time, a location, or a name.
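The anonymization that follows entity detection can be sketched as a simple span replacement. This is a minimal illustration, not the analytic's actual code: it assumes the NER step has already produced non-overlapping `(start_char, end_char, label)` spans (with spaCy these would come from `ent.start_char`, `ent.end_char`, and `ent.label_` for each `ent` in `doc.ents`), and the `[LABEL]` placeholder format and the function name `anonymize` are assumptions made for the example.

```python
def anonymize(text, entities, labels_to_mask):
    """Replace each entity span whose label is in labels_to_mask with "[LABEL]".

    entities: iterable of non-overlapping (start_char, end_char, label) tuples,
    assumed to come from a prior NER step. Hypothetical helper for illustration.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if label not in labels_to_mask:
            continue  # entity type is not configured as sensitive; keep it
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"[{label}]")        # placeholder instead of the entity
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last masked entity
    return "".join(pieces)


# Hand-written spans standing in for NER output.
text = "Alice will visit on Monday."
spans = [(0, 5, "PERSON"), (20, 26, "DATE")]
print(anonymize(text, spans, {"PERSON", "DATE"}))
# prints: [PERSON] will visit on [DATE].
```

Passing a smaller set of labels, for example only `{"PERSON"}`, leaves the other entities in place, which mirrors the idea of a configurable list of entity types to anonymize.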

Deploying

Privacy-Aware Speech Recognition in a container

The DHT and Analytics-API containers should be running before you build the image and start the container for Privacy-Aware Speech Recognition.

Privacy-Aware Speech Recognition is intended to run in a Docker container on port 5040. The Dockerfile at the root of this repo describes the container. To build and run it, execute the following commands:

docker build -t flask-whisper-speech-recognition .

docker-compose up

REST API of Privacy-Aware Speech Recognition

This section describes the REST endpoint that is available while Privacy-Aware Speech Recognition is running.


GET /whisper

Description: Accepts a WAV audio sample and returns its anonymized transcription.

Command:

curl -F "file=@file_location" http://localhost:5040/whisper/<file_name.wav>/<requestor_id>/<requestor_type>/<request_id>
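The request URL in the command above carries four path parameters after `/whisper`. As a small sketch of how a client might assemble it, the helper below percent-encodes each parameter and joins them in the same order; the function name `build_whisper_url` and the example values are hypothetical, and the audio file itself would still be uploaded as the multipart form field `file`, as the curl command does.

```python
from urllib.parse import quote


def build_whisper_url(base_url, file_name, requestor_id, requestor_type, request_id):
    """Join the four path parameters onto /whisper, percent-encoding each one.

    Hypothetical client-side helper; the parameter order follows the curl
    command shown above.
    """
    parts = [file_name, requestor_id, requestor_type, request_id]
    encoded = "/".join(quote(str(part), safe="") for part in parts)
    return f"{base_url.rstrip('/')}/whisper/{encoded}"


# Hypothetical values, for illustration only.
print(build_whisper_url("http://localhost:5040", "sample.wav", "user-1", "client", "req-42"))
# prints: http://localhost:5040/whisper/sample.wav/user-1/client/req-42
```

Encoding each segment separately keeps a slash or space inside a value from being read as an extra path component.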


License

Released under the MIT License.

Acknowledgements

This software has been developed in the scope of the H2020 project SIFIS-Home with GA n. 952652.

Contributors

wisamabbasi
