joll59 / d-ser-t

d-ser-t quantifies speech recognition accuracy of the MSFT speech service and/or user created MSFT custom speech service models.

TypeScript 99.93% JavaScript 0.07%
speech-to-text sentence-error-rate word-error-rate

d-ser-t's People

Contributors

dependabot[bot], hobbyprojects, jcorrigan14, joll59, katieprochilo, nairmai, ovishesh, zanawar

d-ser-t's Issues

Consider calculating SER based only on significant words

Output from STT will ultimately be passed to LUIS. LUIS can extract intents and entities perfectly well when minor mistakes occur in transcriptions. For example:

  • I've is transcribed as I or I have
  • you is transcribed as ya

Down the line we might consider evaluating sentence error rate (or a new naming/variation of error rate) based only on words that are significant to LUIS, and ignoring minor errors in prepositions, etc.
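As a sketch of the idea, assuming a caller-supplied allow-list of significant words (d-ser-t's real data structures may differ):

```typescript
// Sketch: sentence error rate counting only "significant" words.
// `significantWords` is a hypothetical allow-list, not an existing
// d-ser-t structure.
const normalize = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter(Boolean);

function significantMismatch(
    expected: string,
    actual: string,
    significantWords: Set<string>
): boolean {
    const keep = (words: string[]) => words.filter(w => significantWords.has(w));
    return keep(normalize(expected)).join(' ') !== keep(normalize(actual)).join(' ');
}

function significantSER(
    pairs: Array<{ expected: string; actual: string }>,
    significantWords: Set<string>
): number {
    const errors = pairs.filter(p =>
        significantMismatch(p.expected, p.actual, significantWords)).length;
    return errors / pairs.length;
}
```

With this, `I've` vs. `I have` no longer counts as a sentence error as long as the significant words survive intact.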

Blank response when streaming multiple files, but a transcription is received when a single file is sent

Describe the bug
When utilizing the multiple file transcription method in the service, you sometimes receive an empty response string; if the same file is sent through without using the multiple file transcription methods, you get a string response back.

To Reproduce
Steps to reproduce the behavior:

  1. Start with a collection of audio files
  2. Utilize the multiple file transcription methods/classes to transcribe the audio files
  3. Error shows up in the form of empty response strings from the service.

Expected behavior
Files are transcribed equally whether via streaming or singular file transcription.

Desktop (please complete the following information):

  • OS: Win 10
  • Version
    • Node 12.2.0
    • package
      • d-ser-t-service: 1.2.0

Additional context
Possibility exists this is a service issue.

Resolve typing issue in main.start()

In main.start(), after service.batchTranscribe().then(), results is currently cast as any:

const results: any = service.resultArray.map( . . . )

We should resolve this typing issue or decide it's truly necessary.
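A minimal sketch of the typed alternative; the field names in `TranscriptionResult` are assumptions and should be matched to the real objects in `service.resultArray` (a local stand-in is used here):

```typescript
// Sketch: replace the `any` cast with an explicit interface.
// Field names are assumptions; align them with the real result objects.
interface TranscriptionResult {
    file: string;
    expectedTranscription: string;
    actualTranscription: string;
    wordErrorRate: number;
}

// Stand-in for service.resultArray:
const resultArray = [
    { file: 'a.wav', expectedTranscription: 'ok', actualTranscription: 'ok', wordErrorRate: 0 },
];

// The map in main.start() is then typed end to end:
const results: TranscriptionResult[] = resultArray.map(r => ({ ...r }));
```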

Note the proper bit rate for audio files

Properly formatted audio files should be mono with a sample rate of 16,000 Hz. The bit rate (visible in File Explorer if you right-click the column header row and enable the Bit rate column) should be 256 kbps.

[screenshot: File Explorer showing the Bit rate column at 256 kbps]

Creating the issue since I ran into problems where the audio appeared to be mono, but really wasn't. Looking at the bit rate is a nice sanity check.
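A sanity check like the one below could catch this in code; it assumes a canonical 44-byte RIFF header with the fmt chunk starting at byte 12 (real files may carry extra chunks before fmt):

```typescript
// Sketch: sanity-check a WAV header for mono / 16 kHz / 256 kbps.
// Assumes the canonical RIFF layout; a robust version would walk the
// chunk list rather than use fixed offsets.
function checkWavFormat(header: Buffer): { channels: number; sampleRate: number; kbps: number } {
    const channels = header.readUInt16LE(22);
    const sampleRate = header.readUInt32LE(24);
    const byteRate = header.readUInt32LE(28); // bytes per second
    const kbps = (byteRate * 8) / 1000;
    if (channels !== 1 || sampleRate !== 16000 || kbps !== 256) {
        throw new Error(
            `Expected mono / 16000 Hz / 256 kbps, got ${channels} ch, ${sampleRate} Hz, ${kbps} kbps`);
    }
    return { channels, sampleRate, kbps };
}
```

Note that mono 16-bit audio at 16,000 Hz works out to exactly 16000 × 16 × 1 = 256,000 bits/s, which is why 256 kbps is the expected figure.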

Cleaning actual transcriptions produces the same output as cleaning expected transcriptions.

Description:

  • The expected transcription currently:
    • is lowercase.
    • contains apostrophes in words that are contractions.
    • replaces "okay" with "ok".
    • replaces hyphens with a space.
    • errors out if any other special characters are present.
    • allows digits (which we may want to change since we expect lexical responses from the STT service).

Acceptance criteria:

  • Determine what special characters are returned from the STT service, and identify any patterns.
  • Account for these patterns when cleaning the actual transcription.
  • The actual transcription should match the expected transcription. Right now it:
    • is lowercase.
    • is lexical (e.g. the number 2 is returned as "two").
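A possible cleaning step for the actual transcription, assuming the separators seen so far (hyphens, underscores) and the "okay"/"ok" variation are the main offenders:

```typescript
// Sketch: clean the actual transcription so it matches the expected
// transcription's conventions (lowercase, "okay" -> "ok",
// hyphens/underscores -> spaces). The exact character set the STT
// service returns is an assumption pending the analysis above.
function cleanActualTranscription(actual: string): string {
    return actual
        .toLowerCase()
        .replace(/[-_]/g, ' ')
        .replace(/\bokay\b/g, 'ok')
        .replace(/\s+/g, ' ')
        .trim();
}
```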

Update ReadMe to latest bits

The readme currently contains vestigial data and needs to be updated.

  • The readme is inconsistent about how to start the package (cli.js vs. main.js).
  • Update the naming convention to match the package conventions (e.g. AUDIO_FOLDER_PATH to audio-directory).

validate audio file extension

Audio file(s) passed to the TranscriptionService should be .wav files; currently d-ser-t does not validate file compression or extension before processing.

Expected: When an unsupported audio file is received, an error surfaces reporting the unsupported format.
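A minimal sketch of the check, using only the file extension (it does not inspect the actual encoding):

```typescript
import * as path from 'path';

// Sketch: fail fast on non-.wav inputs before transcription starts.
function validateAudioExtension(filePath: string): void {
    if (path.extname(filePath).toLowerCase() !== '.wav') {
        throw new Error(`Unsupported audio file: ${filePath} (expected a .wav file)`);
    }
}
```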

[Bug] Concurrent calls exceeding 5 produce inconsistent results

Describe the bug
Setting the concurrency flag to a value exceeding 5 produces inconsistent results: fewer transcriptions are returned, and the service hangs.

To Reproduce
Steps to reproduce the behavior:

  1. Use service
  2. Set concurrency to a value > 5
  3. Service behaves inconsistently

Expected behavior
High concurrency results in faster response and service continues to work as expected.

Actual behavior
Service sometimes breaks. Results are missing.

Desktop (please complete the following information):

  • OS: Windows
  • Version
    • Node 12.2.0
    • package
      • d-ser-t-service: 1.1.3-alpha.2
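One way to keep at most 5 recognizers in flight is to process the files in batches; a sketch, assuming the 5-recognizer limit observed above (the real limit may depend on the subscription tier, and the right fix may be service-side):

```typescript
// Sketch: cap in-flight recognizers by chunking the file list.
// MAX_CONCURRENCY = 5 reflects the observed limit, not a documented one.
const MAX_CONCURRENCY = 5;

function chunk<T>(items: T[], size: number): T[][] {
    const out: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        out.push(items.slice(i, i + size));
    }
    return out;
}

async function batchTranscribeCapped<T, R>(
    files: T[],
    transcribe: (f: T) => Promise<R>
): Promise<R[]> {
    const results: R[] = [];
    // Each batch completes fully before the next one starts.
    for (const batch of chunk(files, MAX_CONCURRENCY)) {
        results.push(...await Promise.all(batch.map(transcribe)));
    }
    return results;
}
```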

Verify CLI arguments

In main.start(), at minimum we should verify:

  • endpointID
  • serviceRegion
  • subscriptionKey
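A sketch of such a check; the shape of the args object, the optionality of endpointID, and the GUID format check are all assumptions:

```typescript
// Sketch: minimal argument verification for main.start().
// Argument names come from the issue; everything else is assumed.
interface CliArgs {
    endpointID?: string;
    serviceRegion?: string;
    subscriptionKey?: string;
}

function verifyCliArgs(args: CliArgs): void {
    const missing = (['serviceRegion', 'subscriptionKey'] as const)
        .filter(k => !args[k] || !args[k]!.trim());
    if (missing.length > 0) {
        throw new Error(`Missing required argument(s): ${missing.join(', ')}`);
    }
    // Assumption: endpointID is only required for custom speech models,
    // and looks like a GUID when present.
    if (args.endpointID !== undefined && !/^[0-9a-f-]{36}$/i.test(args.endpointID)) {
        throw new Error(`endpointID does not look like a GUID: ${args.endpointID}`);
    }
}
```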

Refactor textfile parsing to be unit testable

Separation of concerns: the logic that verifies paths are valid and the logic that splits the string from a .txt file should exist as standalone units, not be tightly coupled.

  1. parseTextFile should be split into 2 separate units, retrieveDataFromFile & splitDataIntoTestUnits; code coverage will focus on testing splitDataIntoTestUnits.
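The proposed split could look like this; the tab-separated `<file> <transcription>` line format is an assumption to be adjusted to the real transcription.txt layout:

```typescript
import * as fs from 'fs';

// Sketch of the proposed split.
interface TestUnit {
    audioFile: string;
    expectedTranscription: string;
}

// Thin I/O wrapper -- not the target of unit tests.
function retrieveDataFromFile(filePath: string): string {
    return fs.readFileSync(filePath, 'utf8');
}

// Pure function -- the unit-test target.
function splitDataIntoTestUnits(data: string): TestUnit[] {
    return data
        .split(/\r?\n/)
        .filter(line => line.trim().length > 0)
        .map(line => {
            const [audioFile, ...rest] = line.split('\t');
            return { audioFile, expectedTranscription: rest.join('\t') };
        });
}
```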

Analyze actual transcription and save output that is currently unhandled

The STT service occasionally has unexpected behavior. For example:

  • Spoken okay is sometimes transcribed as OK.
  • Spoken I've is sometimes capitalized.
  • Spoken products will sometimes contain hyphens.

Any STT output that contains unexpected characters should be logged and saved to a file that will be appended to as testing continues. Hopefully we can learn from this file (remove the "black box" aspect), update our code accordingly, and eventually not rely on this function.

Match audio files with transcriptions rather than vice versa

If only a handful of audio wav files are provided to the tool, we'll see an issue like below:

(node:20560) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'replace' of undefined
at TranscriptionAnalysisService.cleanExpectedTranscription (C:\SBUX\d-ser-t\lib\TranscriptionAnalysisService.js:31:14)
at Object.exports.start (C:\SBUX\d-ser-t\lib\main.js:40:48)
at Object. (C:\SBUX\d-ser-t\lib\main.js:74:13)
at Module._compile (internal/modules/cjs/loader.js:774:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
at Module.load (internal/modules/cjs/loader.js:641:32)
at Function.Module._load (internal/modules/cjs/loader.js:556:12)
at Function.Module.runMain (internal/modules/cjs/loader.js:837:10)
at internal/main/run_main_module.js:17:11

The reason is that the tool looks at the transcription file and tries to find the corresponding audio file; when it doesn't find one, it breaks.

The matching should go the other way around to avoid issues like this.
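A sketch of matching driven from the audio files instead; names here are illustrative, not d-ser-t's real API:

```typescript
// Sketch: iterate the audio files that actually exist, warning on
// missing transcriptions instead of crashing on `undefined`.
function matchAudioToTranscriptions(
    audioFiles: string[],
    transcriptions: Map<string, string>
): Array<{ audioFile: string; expected: string }> {
    const matched: Array<{ audioFile: string; expected: string }> = [];
    for (const audioFile of audioFiles) {
        const expected = transcriptions.get(audioFile);
        if (expected === undefined) {
            console.warn(`No expected transcription for ${audioFile}; skipping.`);
            continue;
        }
        matched.push({ audioFile, expected });
    }
    return matched;
}
```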

Transcriptions with false negatives should not be graded

Here is an example of an utterance that is a false negative:

  • expectedTranscription: "all right that's one milkshake"
  • actualTranscription: "all right that's one milk_shake"

The important pieces of this utterance have clearly been picked up correctly by the speech service, but grading as it currently stands doesn't take into account these "false negatives."

Each transcription object should have something similar to a falseNegative field. Responses that are false negatives should not be taken into account when grading.
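One possible heuristic, based on the milk_shake example, treats transcriptions that differ only in separator characters (spaces, hyphens, underscores) as false negatives; the separator set is an assumption:

```typescript
// Sketch: flag "false negatives" whose only differences are
// separator characters, as in milkshake vs. milk_shake.
function isFalseNegative(expected: string, actual: string): boolean {
    if (expected === actual) return false; // a true match, not a false negative
    const fold = (s: string) => s.toLowerCase().replace(/[_\s-]/g, '');
    return fold(expected) === fold(actual);
}
```

A `falseNegative` field on each transcription object could then be set from this predicate, and such responses excluded from grading.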

2 incomplete results from a single .wav file

  • Transcribed a single .wav file whose expected transcription was stored in transcription.txt.
  • In test_results.json there were 2 results:
    • Actual Result 1:
      • ~60% of the expected transcription.
      • Succeeded by ~0.5 seconds of silence.
    • Actual Result 2:
      • Picked up exactly where AR1 left off.
      • ~5% of the expected transcription.
      • Succeeded by no silence.
  • Even with the 2 results, the remaining ~35% of the .wav file was not transcribed.

Any single .wav file should produce exactly one result.

Bug: setting the wrong concurrency number crashes the tool

Only 4 files and concurrency of 17 causes a crash. Please see below. Setting the concurrency to 2 resolved the problem.

node lib\main.js -s xxxxxxxx -r westus2 -e yyyyyyyy -c 17 -d C:\SBUX\d-ser-t\audio -t "C:\SBUX\ramsay-utterances\transcripts\add-to-order-manual-text-utterances.txt"
Starting Recognizer 0 . . .
Starting Recognizer 1 . . .
Starting Recognizer 2 . . .
Starting Recognizer 3 . . .

ENCOUNTERED AN ERROR ####:

TypeError: Cannot read property 'recording' of undefined
at C:\SBUX\d-ser-t\lib\TranscriptionService.js:83:62
at new Promise ()
at TranscriptionService.internalRecognizer (C:\SBUX\d-ser-t\lib\TranscriptionService.js:56:20)
at C:\SBUX\d-ser-t\lib\TranscriptionService.js:111:101
at Array.map ()
at TranscriptionService.batchTranscribe (C:\SBUX\d-ser-t\lib\TranscriptionService.js:111:45)
at Object.exports.start (C:\SBUX\d-ser-t\lib\main.js:42:23)
at Object. (C:\SBUX\d-ser-t\lib\main.js:74:13)
at Module._compile (internal/modules/cjs/loader.js:774:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)

Create an interface to work with any number of file types.

With the introduction of #77, d-ser-t now supports outputting JSON or XML. XmlWriterService closely mirrors TranscriptionFileService and is called in the same way as well.

As a design enhancement, we should create an interface to be used by this.localFileService in main.ts. The interface and design of these two files should work in a way that calling a single function will correctly route the work to either the XML or JSON writer.
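A sketch of what that interface might look like; the class and method names here are illustrative, not the real XmlWriterService / TranscriptionFileService API:

```typescript
// Sketch: one writer interface, with routing decided by extension.
interface ResultWriter {
    write(filePath: string, results: object[]): string;
}

class JsonResultWriter implements ResultWriter {
    write(_filePath: string, results: object[]): string {
        // The real service would fs.writeFileSync the payload.
        return JSON.stringify(results, null, 2);
    }
}

class XmlResultWriter implements ResultWriter {
    write(_filePath: string, results: object[]): string {
        const items = results
            .map(r => `  <result>${JSON.stringify(r)}</result>`)
            .join('\n');
        return `<results>\n${items}\n</results>`;
    }
}

// main.ts calls one function; the extension decides the writer.
function writerFor(filePath: string): ResultWriter {
    return filePath.endsWith('.xml') ? new XmlResultWriter() : new JsonResultWriter();
}
```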

Validate transcription.txt early so that we fail fast

We should fail as fast as possible. To do so, we need to change a couple of things:

  1. Factor the validation and cleaning components from validateExpectedTranscription() into separate functions - validateExpectedTranscription() and cleanExpectedTranscription().
  2. Call validateExpectedTranscription() as soon as possible in the code so that we can quickly fail.
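The split might look like this; the allowed character set mirrors the cleaning rules listed earlier in this page and is an assumption:

```typescript
// Sketch: validation throws early, cleaning is a separate pure step.
function validateExpectedTranscription(expected: string): void {
    // Assumed allow-list: letters, digits, apostrophes, hyphens, spaces.
    if (!/^[a-z0-9' -]+$/i.test(expected)) {
        throw new Error(`Invalid characters in expected transcription: "${expected}"`);
    }
}

function cleanExpectedTranscription(expected: string): string {
    return expected
        .toLowerCase()
        .replace(/\bokay\b/g, 'ok')
        .replace(/-/g, ' ');
}
```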

Remove work for unhandledSTTOutput

We originally generated unhandledSTTOutput.json to learn which special characters were returned from the speech service with our current configuration (e.g. we use the lexical response from NBest).

We no longer have questions in this arena, and this feature will only confuse future d-ser-t consumers, so we should nix all work having to do with unhandledSTTOutput. This includes any logging.

Convert d-ser-t to a Mocha test

d-ser-t needs to be converted into a Mocha test so it can be more easily integrated into a CI pipeline.

If the accuracy is less than a certain threshold, the test should fail. The issue currently is that the tool outputs its results to the console, so you'd need to post-process the data with another script before you could mark the pipeline as successful.

This inefficiency can be avoided if we simply use a Mocha test.
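A sketch of the assertion such a Mocha `it(...)` would run; the threshold value and result shape are illustrative:

```typescript
// Sketch: the pass/fail check a Mocha test would delegate to.
// 0.1 is an illustrative threshold, not a project requirement.
const SER_THRESHOLD = 0.1;

function assertSerBelowThreshold(ser: number, threshold: number = SER_THRESHOLD): void {
    if (ser > threshold) {
        throw new Error(`Sentence error rate ${ser} exceeds threshold ${threshold}`);
    }
}

// In a Mocha suite this becomes, roughly:
// describe('d-ser-t accuracy', () => {
//     it('keeps SER below threshold', async () => {
//         const results = await service.batchTranscribe(/* ... */);
//         assertSerBelowThreshold(computeSer(results));
//     });
// });
```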

Add KER metric functionality

Add the option for an additional metric, KER (Keyword Error Rate). The input would be a collection of words (not phrases) important to a use case or part of a domain-specific vocabulary, and the output would be a WER measurement taken only from these keywords.

We may not get around to implementing this, but dropping the issue so I don't forget 😀
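A sketch of the metric, counting a keyword as an error each time it occurs in the expected transcription but not (often enough) in the actual one; the exact scoring rules are an open design choice:

```typescript
// Sketch: keyword error rate -- an error rate restricted to a
// caller-supplied keyword list.
function keywordErrorRate(
    pairs: Array<{ expected: string; actual: string }>,
    keywords: Set<string>
): number {
    let keywordOccurrences = 0;
    let keywordErrors = 0;
    for (const { expected, actual } of pairs) {
        // Count each word's occurrences in the actual transcription.
        const actualCounts = new Map<string, number>();
        for (const w of actual.toLowerCase().split(/\s+/)) {
            actualCounts.set(w, (actualCounts.get(w) ?? 0) + 1);
        }
        // Consume a matching occurrence per expected keyword, or count an error.
        for (const w of expected.toLowerCase().split(/\s+/)) {
            if (!keywords.has(w)) continue;
            keywordOccurrences++;
            const remaining = actualCounts.get(w) ?? 0;
            if (remaining > 0) {
                actualCounts.set(w, remaining - 1);
            } else {
                keywordErrors++;
            }
        }
    }
    return keywordOccurrences === 0 ? 0 : keywordErrors / keywordOccurrences;
}
```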

Move literals into a config file

Literals such as certain file paths and regular expressions should be moved to a config file so that adding or removing a character won't require a recompile of the application.
