GithubHelp home page GithubHelp logo

test-mass-forker-org-1 / cognitive-custom-speech-service Goto Github PK

View Code? Open in Web Editor NEW

This project forked from microsoft/cognitive-custom-speech-service

0.0 0.0 0.0 5.6 MB

Code and Sample for Microsoft's Custom Speech Service

cognitive-custom-speech-service's Introduction

Cognitive Speech Service samples

Here you can find a set of audio data and language data for testing the Microsoft Cognitive Services Custom Speech service.

How it works?

The samples are a collection of data, both acoustic and language, which you can use to try out the Custom Speech service. The audio files (.wav) and transcript files should help you to get started with the service.

Simply clone this repository with

git clone https://github.com/Microsoft/Cognitive-Custom-Speech-Service.git

on the comandline.

Then go to the folder of the sample you like

cd <sample_folder>

If you have Powershell 5.0 installed you can use in Powershell the Compress-Archive cmdlet

Compress-Archive -Path '<sample_folder>\Test Data\Test Acoustic Data\*.wav' -DestinationPath data.zip

This will create a zip file containing all wav files without additional folder information as a flat list.

If you do not have Powershell installed you can use a zip tool of your choice but you have to zip the files as a flat list without path information.

Now, you have everthing prepared to get started with Custom Speech service.

There is a help page describing how to use the service or you can find some help here.

Biology Sample

Introduction

The biology sample is a great example of how powerful a custom language model can be for your application.

Data

The sample consist of approximately 150 statements about organisms that lived primarily in Triassic and Jurassic period. The names are in scientific form which rarely appear in everyday language. The facts are described in the ‘Language Data.txt’ file. In addition, the sample contains approximately 50 audio utterances (50 .wav files) which can be used to test the above model using the offline testing feature during deployment. There is a transcriptions.tsv file associated with those audio files that captures the transcriptions of each one of them. If you open the file you will see that it contains key value pairs where the name of the file is the key and the transcription the actual value. You can .zip those audio files and import them along with the associated transcription file using the ‘Import Acoustic Data’ menu option.

Modelling

You can use the ‘Language Data.txt’ file to create a language model on Custom Speech Service and deploy it along with an acoustic baseline model of the two available. Simply upload the file using the ‘Import Language Data’ and create a new Language model via the ‘Create Language Model’ menu option. Once you have created a language model you can deploy it via the ‘Deployments’ menu option. During deployment of your custom model you can tick the ‘Offline Test’ option. Point to the acoustic data you uploaded in the previous step. The model deployment process will start and once deployed the audio utterances will be tested against the newly deployed endpoint. The audio utterances will be send for transcription and the results will be displayed.

Results

As part of the results page you will see a set of numbers along with a list of transcriptions (as many as the audio files you uploaded). Read the transcriptions to get a feel of the quality of the results. The numbers serve as a comparison between the baseline model and the custom model that you created. They demonstrate transcription accuracy both from the baseline and the custom model. Remember that the transcriptions.tsv contains the correct transcriptions of all the audio files, so internally our system compares those against both the baseline and the custom models. The improvement over baseline for this sample is typically around 70% and the overall accuracy of the custom is close to 90%. Needless to say that with a richer language model accuracy can potentially go up.

cognitive-custom-speech-service's People

Contributors

panosperiorellis avatar marcmic avatar microsoftopensource avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.