joll59 / d-ser-t

d-ser-t quantifies speech recognition accuracy of the MSFT speech service and/or user created MSFT custom speech service models.

TypeScript 99.93% JavaScript 0.07%
speech-to-text sentence-error-rate word-error-rate

d-ser-t's People

Contributors

dependabot[bot], hobbyprojects, jcorrigan14, joll59, katieprochilo, nairmai, ovishesh, zanawar

d-ser-t's Issues

Consider calculating SER based only on significant words

Output from STT will ultimately be passed to LUIS. LUIS can extract intents and entities perfectly well when minor mistakes occur in transcriptions. For example:

  • I've is transcribed as I or I have
  • you is transcribed as ya

Down the line we might consider evaluating sentence error rate (or a new naming/variation of error rate) based only on words that are significant to LUIS, and ignoring minor errors in prepositions, etc.
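As a sketch of the idea, assuming a caller-supplied allow-list of significant words (d-ser-t's real data structures may differ):

```typescript
// Sketch: sentence error rate counting only "significant" words.
// `significantWords` is a hypothetical allow-list, not an existing
// d-ser-t structure.
const normalize = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter(Boolean);

function significantMismatch(
    expected: string,
    actual: string,
    significantWords: Set<string>
): boolean {
    const keep = (words: string[]) => words.filter(w => significantWords.has(w));
    return keep(normalize(expected)).join(' ') !== keep(normalize(actual)).join(' ');
}

function significantSER(
    pairs: Array<{ expected: string; actual: string }>,
    significantWords: Set<string>
): number {
    const errors = pairs.filter(p =>
        significantMismatch(p.expected, p.actual, significantWords)).length;
    return errors / pairs.length;
}
```

With this, `I've` vs. `I have` no longer counts as a sentence error as long as the significant words survive intact.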

Blank response when streaming multiple files, but a transcription is received when a single file is sent

Describe the bug
When utilizing the multiple file transcription method in the service, you sometimes receive an empty response string; if the same file is sent through without using the multiple file transcription methods, you get a string response back.

To Reproduce
Steps to reproduce the behavior:

  1. Start with a collection of audio files
  2. Utilize the multiple file transcription methods/classes to transcribe the audio files
  3. Error shows up in the form of empty response strings from the service.

Expected behavior
Files are transcribed equally whether via streaming or singular file transcription.

Desktop (please complete the following information):

  • OS: Win 10
  • Version
    • Node 12.2.0
    • package
      • d-ser-t-service: 1.2.0

Additional context
Possibility exists this is a service issue.

Resolve typing issue in main.start()

In main.start(), after service.batchTranscribe().then(), results is currently cast as any:

const results: any = service.resultArray.map( . . . )

We should resolve this typing issue or decide it's truly necessary.
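A minimal sketch of the typed alternative; the field names in `TranscriptionResult` are assumptions and should be matched to the real objects in `service.resultArray` (a local stand-in is used here):

```typescript
// Sketch: replace the `any` cast with an explicit interface.
// Field names are assumptions; align them with the real result objects.
interface TranscriptionResult {
    file: string;
    expectedTranscription: string;
    actualTranscription: string;
    wordErrorRate: number;
}

// Stand-in for service.resultArray:
const resultArray = [
    { file: 'a.wav', expectedTranscription: 'ok', actualTranscription: 'ok', wordErrorRate: 0 },
];

// The map in main.start() is then typed end to end:
const results: TranscriptionResult[] = resultArray.map(r => ({ ...r }));
```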

Note the proper bit rate for audio files

Properly formatted audio files should be mono with a sample rate of 16,000 Hz. The bit rate (visible in File Explorer if you right-click the column header row and enable the Bit rate column) should be 256 kbps.

[screenshot: File Explorer showing the Bit rate column at 256 kbps]

Creating the issue since I ran into problems where the audio appeared to be mono, but really wasn't. Looking at the bit rate is a nice sanity check.
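A sanity check like the one below could catch this in code; it assumes a canonical 44-byte RIFF header with the fmt chunk starting at byte 12 (real files may carry extra chunks before fmt):

```typescript
// Sketch: sanity-check a WAV header for mono / 16 kHz / 256 kbps.
// Assumes the canonical RIFF layout; a robust version would walk the
// chunk list rather than use fixed offsets.
function checkWavFormat(header: Buffer): { channels: number; sampleRate: number; kbps: number } {
    const channels = header.readUInt16LE(22);
    const sampleRate = header.readUInt32LE(24);
    const byteRate = header.readUInt32LE(28); // bytes per second
    const kbps = (byteRate * 8) / 1000;
    if (channels !== 1 || sampleRate !== 16000 || kbps !== 256) {
        throw new Error(
            `Expected mono / 16000 Hz / 256 kbps, got ${channels} ch, ${sampleRate} Hz, ${kbps} kbps`);
    }
    return { channels, sampleRate, kbps };
}
```

Note that mono 16-bit audio at 16,000 Hz works out to exactly 16000 × 16 × 1 = 256,000 bits/s, which is why 256 kbps is the expected figure.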

Cleaning actual transcriptions produces the same output as cleaning expected transcriptions.

Description:

  • The expected transcription currently:
    • is lowercase.
    • contains apostrophes in words that are contractions.
    • replaces "okay" with "ok".
    • replaces hyphens with a space.
    • errors out if any other special characters are present.
    • allows digits (which we may want to change since we expect lexical responses from the STT service).

Acceptance criteria:

  • Determine what special characters are returned from the STT service, and identify any patterns.
  • Account for these patterns when cleaning the actual transcription.
  • The actual transcription should match the expected transcription. Right now it:
    • is lowercase.
    • is lexical (e.g. the number 2 is returned as "two").
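A possible cleaning step for the actual transcription, assuming the separators seen so far (hyphens, underscores) and the "okay"/"ok" variation are the main offenders:

```typescript
// Sketch: clean the actual transcription so it matches the expected
// transcription's conventions (lowercase, "okay" -> "ok",
// hyphens/underscores -> spaces). The exact character set the STT
// service returns is an assumption pending the analysis above.
function cleanActualTranscription(actual: string): string {
    return actual
        .toLowerCase()
        .replace(/[-_]/g, ' ')
        .replace(/\bokay\b/g, 'ok')
        .replace(/\s+/g, ' ')
        .trim();
}
```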

Update ReadMe to latest bits

The readme currently contains vestigial data and needs to be updated.

  • The readme is inconsistent about how to start the package (cli.js vs. main.js).
  • Update the naming convention to match the package conventions (e.g. AUDIO_FOLDER_PATH to audio-directory).

validate audio file extension

Audio file(s) passed to the TranscriptionService should be .wav files; currently d-ser-t does not validate file compression or extension before processing.

Expected: When an unsupported audio file is received, an error surfaces reporting the unsupported format.
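A minimal sketch of the check, using only the file extension (it does not inspect the actual encoding):

```typescript
import * as path from 'path';

// Sketch: fail fast on non-.wav inputs before transcription starts.
function validateAudioExtension(filePath: string): void {
    if (path.extname(filePath).toLowerCase() !== '.wav') {
        throw new Error(`Unsupported audio file: ${filePath} (expected a .wav file)`);
    }
}
```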

[Bug] Concurrent calls exceeding 5 produce inconsistent results

Describe the bug
Setting the concurrency flag to a value exceeding 5 produces inconsistent results: fewer transcriptions are returned, and the service hangs.

To Reproduce
Steps to reproduce the behavior:

  1. Use service
  2. Set concurrency to a value > 5
  3. Service behaves inconsistently

Expected behavior
High concurrency results in faster response and service continues to work as expected.

Actual behavior
Service sometimes breaks. Results are missing.

Desktop (please complete the following information):

  • OS: Windows
  • Version
    • Node 12.2.0
    • package
      • d-ser-t-service: 1.1.3-alpha.2
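One way to keep at most 5 recognizers in flight is to process the files in batches; a sketch, assuming the 5-recognizer limit observed above (the real limit may depend on the subscription tier, and the right fix may be service-side):

```typescript
// Sketch: cap in-flight recognizers by chunking the file list.
// MAX_CONCURRENCY = 5 reflects the observed limit, not a documented one.
const MAX_CONCURRENCY = 5;

function chunk<T>(items: T[], size: number): T[][] {
    const out: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        out.push(items.slice(i, i + size));
    }
    return out;
}

async function batchTranscribeCapped<T, R>(
    files: T[],
    transcribe: (f: T) => Promise<R>
): Promise<R[]> {
    const results: R[] = [];
    // Each batch completes fully before the next one starts.
    for (const batch of chunk(files, MAX_CONCURRENCY)) {
        results.push(...await Promise.all(batch.map(transcribe)));
    }
    return results;
}
```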

Verify CLI arguments

In main.start(), at minimum we should verify:

  • endpointID
  • serviceRegion
  • subscriptionKey
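A sketch of such a check; the shape of the args object, the optionality of endpointID, and the GUID format check are all assumptions:

```typescript
// Sketch: minimal argument verification for main.start().
// Argument names come from the issue; everything else is assumed.
interface CliArgs {
    endpointID?: string;
    serviceRegion?: string;
    subscriptionKey?: string;
}

function verifyCliArgs(args: CliArgs): void {
    const missing = (['serviceRegion', 'subscriptionKey'] as const)
        .filter(k => !args[k] || !args[k]!.trim());
    if (missing.length > 0) {
        throw new Error(`Missing required argument(s): ${missing.join(', ')}`);
    }
    // Assumption: endpointID is only required for custom speech models,
    // and looks like a GUID when present.
    if (args.endpointID !== undefined && !/^[0-9a-f-]{36}$/i.test(args.endpointID)) {
        throw new Error(`endpointID does not look like a GUID: ${args.endpointID}`);
    }
}
```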

Refactor textfile parsing to be unit testable

Separation of concerns: the logic that verifies paths are valid and the logic that splits the string from a .txt file should exist as standalone units, not be tightly coupled.

  1. parseTextFile should be split into 2 separate units, retrieveDataFromFile & splitDataIntoTestUnits; code coverage will focus on testing splitDataIntoTestUnits.
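The proposed split could look like this; the tab-separated `<file> <transcription>` line format is an assumption to be adjusted to the real transcription.txt layout:

```typescript
import * as fs from 'fs';

// Sketch of the proposed split.
interface TestUnit {
    audioFile: string;
    expectedTranscription: string;
}

// Thin I/O wrapper -- not the target of unit tests.
function retrieveDataFromFile(filePath: string): string {
    return fs.readFileSync(filePath, 'utf8');
}

// Pure function -- the unit-test target.
function splitDataIntoTestUnits(data: string): TestUnit[] {
    return data
        .split(/\r?\n/)
        .filter(line => line.trim().length > 0)
        .map(line => {
            const [audioFile, ...rest] = line.split('\t');
            return { audioFile, expectedTranscription: rest.join('\t') };
        });
}
```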

Analyze actual transcription and save output that is currently unhandled

The STT service occasionally has unexpected behavior. For example:

  • Spoken okay is sometimes transcribed as OK.
  • Spoken I've is sometimes capitalized.
  • Spoken products will sometimes contain hyphens.

Any STT output that contains unexpected characters should be logged and saved to a file that will be appended to as testing continues. Hopefully we can learn from this file (remove the "black box" aspect), update our code accordingly, and eventually not rely on this function.

Match audio files with transcriptions rather than vice versa

If only a handful of audio wav files are provided to the tool, we'll see an issue like below:

(node:20560) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'replace' of undefined
at TranscriptionAnalysisService.cleanExpectedTranscription (C:\SBUX\d-ser-t\lib\TranscriptionAnalysisService.js:31:14)
at Object.exports.start (C:\SBUX\d-ser-t\lib\main.js:40:48)
at Object. (C:\SBUX\d-ser-t\lib\main.js:74:13)
at Module._compile (internal/modules/cjs/loader.js:774:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
at Module.load (internal/modules/cjs/loader.js:641:32)
at Function.Module._load (internal/modules/cjs/loader.js:556:12)
at Function.Module.runMain (internal/modules/cjs/loader.js:837:10)
at internal/main/run_main_module.js:17:11

The reason is that the tool looks at the transcription file and tries to find the corresponding audio file; when it doesn't find one, it breaks.

The matching should go the other way around to avoid issues like this.
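A sketch of matching driven from the audio files instead; names here are illustrative, not d-ser-t's real API:

```typescript
// Sketch: iterate the audio files that actually exist, warning on
// missing transcriptions instead of crashing on `undefined`.
function matchAudioToTranscriptions(
    audioFiles: string[],
    transcriptions: Map<string, string>
): Array<{ audioFile: string; expected: string }> {
    const matched: Array<{ audioFile: string; expected: string }> = [];
    for (const audioFile of audioFiles) {
        const expected = transcriptions.get(audioFile);
        if (expected === undefined) {
            console.warn(`No expected transcription for ${audioFile}; skipping.`);
            continue;
        }
        matched.push({ audioFile, expected });
    }
    return matched;
}
```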

Transcriptions with false negatives should not be graded

Here is an example of an utterance that is a false negative:

  • expectedTranscription: "all right that's one milkshake"
  • actualTranscription: "all right that's one milk_shake"

The important pieces of this utterance have clearly been picked up correctly by the speech service, but grading as it currently stands doesn't take into account these "false negatives."

Each transcription object should have something similar to a falseNegative field. Responses that are false negatives should not be taken into account when grading.
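One possible heuristic, based on the milk_shake example, treats transcriptions that differ only in separator characters (spaces, hyphens, underscores) as false negatives; the separator set is an assumption:

```typescript
// Sketch: flag "false negatives" whose only differences are
// separator characters, as in milkshake vs. milk_shake.
function isFalseNegative(expected: string, actual: string): boolean {
    if (expected === actual) return false; // a true match, not a false negative
    const fold = (s: string) => s.toLowerCase().replace(/[_\s-]/g, '');
    return fold(expected) === fold(actual);
}
```

A `falseNegative` field on each transcription object could then be set from this predicate, and such responses excluded from grading.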

2 incomplete results from a single .wav file

  • Transcribed a single .wav file whose expected transcription was stored in transcription.txt.
  • In test_results.json there were 2 results:
    • Actual Result 1:
      • ~60% of the expected transcription.
      • Succeeded by ~0.5 seconds of silence.
    • Actual Result 2:
      • Picked up exactly where AR1 left off.
      • ~5% of the expected transcription.
      • Succeeded by no silence.
  • Even with the 2 results, the remaining ~35% of the .wav file was not transcribed.

Any single .wav file should produce exactly one result.

Bug: setting the wrong concurrency number crashes the tool

Only 4 files and concurrency of 17 causes a crash. Please see below. Setting the concurrency to 2 resolved the problem.

node lib\main.js -s xxxxxxxx -r westus2 -e yyyyyyyy -c 17 -d C:\SBUX\d-ser-t\audio -t "C:\SBUX\ramsay-utterances\transcripts\add-to-order-manual-text-utterances.txt"
Starting Recognizer 0 . . .
Starting Recognizer 1 . . .
Starting Recognizer 2 . . .
Starting Recognizer 3 . . .

ENCOUNTERED AN ERROR ####:

TypeError: Cannot read property 'recording' of undefined
at C:\SBUX\d-ser-t\lib\TranscriptionService.js:83:62
at new Promise ()
at TranscriptionService.internalRecognizer (C:\SBUX\d-ser-t\lib\TranscriptionService.js:56:20)
at C:\SBUX\d-ser-t\lib\TranscriptionService.js:111:101
at Array.map ()
at TranscriptionService.batchTranscribe (C:\SBUX\d-ser-t\lib\TranscriptionService.js:111:45)
at Object.exports.start (C:\SBUX\d-ser-t\lib\main.js:42:23)
at Object. (C:\SBUX\d-ser-t\lib\main.js:74:13)
at Module._compile (internal/modules/cjs/loader.js:774:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)

Create an interface to work with any number of file types.

With the introduction of #77, d-ser-t now supports outputting JSON or XML. XmlWriterService closely mirrors TranscriptionFileService and is called in the same way as well.

As a design enhancement, we should create an interface to be used by this.localFileService in main.ts. The interface and design of these two files should work in a way that calling a single function will correctly route the work to either the XML or JSON writer.
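A sketch of what that interface might look like; the class and method names here are illustrative, not the real XmlWriterService / TranscriptionFileService API:

```typescript
// Sketch: one writer interface, with routing decided by extension.
interface ResultWriter {
    write(filePath: string, results: object[]): string;
}

class JsonResultWriter implements ResultWriter {
    write(_filePath: string, results: object[]): string {
        // The real service would fs.writeFileSync the payload.
        return JSON.stringify(results, null, 2);
    }
}

class XmlResultWriter implements ResultWriter {
    write(_filePath: string, results: object[]): string {
        const items = results
            .map(r => `  <result>${JSON.stringify(r)}</result>`)
            .join('\n');
        return `<results>\n${items}\n</results>`;
    }
}

// main.ts calls one function; the extension decides the writer.
function writerFor(filePath: string): ResultWriter {
    return filePath.endsWith('.xml') ? new XmlResultWriter() : new JsonResultWriter();
}
```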

Validate transcription.txt early so that we fail fast

We should fail as fast as possible. To do so, we need to change a couple of things:

  1. Factor the validation and cleaning components from validateExpectedTranscription() into separate functions - validateExpectedTranscription() and cleanExpectedTranscription().
  2. Call validateExpectedTranscription() as soon as possible in the code so that we can quickly fail.
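The split might look like this; the allowed character set mirrors the cleaning rules listed earlier in this page and is an assumption:

```typescript
// Sketch: validation throws early, cleaning is a separate pure step.
function validateExpectedTranscription(expected: string): void {
    // Assumed allow-list: letters, digits, apostrophes, hyphens, spaces.
    if (!/^[a-z0-9' -]+$/i.test(expected)) {
        throw new Error(`Invalid characters in expected transcription: "${expected}"`);
    }
}

function cleanExpectedTranscription(expected: string): string {
    return expected
        .toLowerCase()
        .replace(/\bokay\b/g, 'ok')
        .replace(/-/g, ' ');
}
```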

Remove work for unhandledSTTOutput

We originally generated unhandledSTTOutput.json to learn which special characters were returned from the speech service with our current configuration (e.g. we use the lexical response from NBest).

We no longer have questions in this arena, and this feature will only confuse future d-ser-t consumers, so we should nix all work having to do with unhandledSTTOutput. This includes any logging.

Convert d-ser-t to a Mocha test

d-ser-t needs to be converted into a Mocha test so it can be more easily integrated into a CI pipeline.

If the accuracy is less than a certain threshold, the test should fail. The issue currently is that the tool outputs its results to the console, so you'd need to post-process the data with another script before you could mark the pipeline as successful.

This inefficiency can be avoided if we simply use a Mocha test.
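A sketch of the assertion such a Mocha `it(...)` would run; the threshold value and result shape are illustrative:

```typescript
// Sketch: the pass/fail check a Mocha test would delegate to.
// 0.1 is an illustrative threshold, not a project requirement.
const SER_THRESHOLD = 0.1;

function assertSerBelowThreshold(ser: number, threshold: number = SER_THRESHOLD): void {
    if (ser > threshold) {
        throw new Error(`Sentence error rate ${ser} exceeds threshold ${threshold}`);
    }
}

// In a Mocha suite this becomes, roughly:
// describe('d-ser-t accuracy', () => {
//     it('keeps SER below threshold', async () => {
//         const results = await service.batchTranscribe(/* ... */);
//         assertSerBelowThreshold(computeSer(results));
//     });
// });
```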

Add KER metric functionality

Add the option for an additional metric, KER (Keyword Error Rate). The input would be a collection of words (not phrases) important to a use case or part of a domain-specific vocabulary, and the output would be a WER measurement taken only from these keywords.

We may not get around to implementing this, but dropping the issue so I don't forget 😀
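A sketch of the metric, counting a keyword as an error each time it occurs in the expected transcription but not (often enough) in the actual one; the exact scoring rules are an open design choice:

```typescript
// Sketch: keyword error rate -- an error rate restricted to a
// caller-supplied keyword list.
function keywordErrorRate(
    pairs: Array<{ expected: string; actual: string }>,
    keywords: Set<string>
): number {
    let keywordOccurrences = 0;
    let keywordErrors = 0;
    for (const { expected, actual } of pairs) {
        // Count each word's occurrences in the actual transcription.
        const actualCounts = new Map<string, number>();
        for (const w of actual.toLowerCase().split(/\s+/)) {
            actualCounts.set(w, (actualCounts.get(w) ?? 0) + 1);
        }
        // Consume a matching occurrence per expected keyword, or count an error.
        for (const w of expected.toLowerCase().split(/\s+/)) {
            if (!keywords.has(w)) continue;
            keywordOccurrences++;
            const remaining = actualCounts.get(w) ?? 0;
            if (remaining > 0) {
                actualCounts.set(w, remaining - 1);
            } else {
                keywordErrors++;
            }
        }
    }
    return keywordOccurrences === 0 ? 0 : keywordErrors / keywordOccurrences;
}
```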

Move literals into a config file

Literals such as certain file paths and regular expressions should be moved to a config file so that adding or removing a character won't require a recompile of the application.
