cmusphinx / sphinx4
Pure Java speech recognition library
Home Page: cmusphinx.sourceforge.net
License: Other
Sphinx-4 Speech Recognition System
-------------------------------------------------------------------
Sphinx-4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created through a joint collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

The design of Sphinx-4 is based on patterns that have emerged from the design of past systems, as well as on new requirements from areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source under a very generous BSD-style license.

Because it is written entirely in the Java programming language, Sphinx-4 can run on a variety of platforms without requiring any special compilation or changes. We've tested Sphinx-4 successfully on a number of platforms.

To get started with sphinx4, visit our wiki: http://cmusphinx.sourceforge.net/wiki

Please give Sphinx-4 a try and post your questions, comments, and feedback to one of the CMU Sphinx Forums: http://sourceforge.net/p/cmusphinx/discussion/sphinx4

We can also be reached at [email protected].

Sincerely,

The Sphinx-4 Team (in alphabetical order):
Evandro Gouvea, CMU (developer and speech advisor)
Peter Gorniak, MIT (developer)
Philip Kwok, Sun Labs (developer)
Paul Lamere, Sun Labs (design/technical lead)
Beth Logan, HP (speech advisor)
Pedro Moreno, Google (speech advisor)
Bhiksha Raj, MERL (design lead)
Mosur Ravishankar, CMU (speech advisor)
Bent Schmidt-Nielsen, MERL (speech advisor)
Rita Singh, CMU/MIT (design/speech advisor)
JM Van Thong, HP (speech advisor)
Willie Walker, Sun Labs (overall lead)
Manfred Warmuth, UCSC (speech advisor)
Joe Woelfel, MERL (developer and speech advisor)
Peter Wolf, MERL (developer and speech advisor)
```
85,93d84
<
<
<           org.apache.maven.plugins
<           maven-compiler-plugin
<
<
```
I'm not entirely sure if this is the best place to ask these kinds of questions, so please point me to a better place in case there is one.
We are currently using the Sphinx4 Long Aligner with some success for a subtitling project at the University of Hamburg.
Today was the first time that I tried it successfully "in the field".
I took the transcription and this video from the CCC Congress and aligned the 35-minute video (more precisely, the converted wav, prepared according to your instructions) in about 88 minutes with the Sphinx Long Aligner, which I think is pretty good. (You can see the manually optimized results on the linked video page.)
So right now the biggest problem for this application is pauses in speech. The words are always placed directly next to each other, even when there are long pauses, which means a lot of manual dragging around of the results. Long story short: is there an option to turn on detection of pauses in speech?
Also, a second, smaller problem: when trying the Aligner with audio longer than 50 minutes, it fails with an Out of Memory error at the LiveCMN stage (the Java VM has a 7 GB limit) after about two hours. Is there a way to change this?
Thanks for your help and your great work, which enables us to subtitle the CCC videos an order of magnitude faster.
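Pending a built-in option, one workaround is to post-process the aligner's word timings and treat any large gap between consecutive words as a pause. A minimal sketch, assuming each aligned word exposes start/end times in milliseconds; `AlignedWord` below is a stand-in for the real result type, whose accessors may differ:

```java
import java.util.ArrayList;
import java.util.List;

public class PauseDetector {
    // Stand-in for an aligned word with timestamps; not the sphinx4 API.
    static final class AlignedWord {
        final String word;
        final long startMs, endMs;
        AlignedWord(String word, long startMs, long endMs) {
            this.word = word;
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    // Return [end, start] intervals where the gap between consecutive
    // words is at least minGapMs.
    static List<long[]> findPauses(List<AlignedWord> words, long minGapMs) {
        List<long[]> pauses = new ArrayList<>();
        for (int i = 1; i < words.size(); i++) {
            long gap = words.get(i).startMs - words.get(i - 1).endMs;
            if (gap >= minGapMs)
                pauses.add(new long[] { words.get(i - 1).endMs, words.get(i).startMs });
        }
        return pauses;
    }

    public static void main(String[] args) {
        List<AlignedWord> words = List.of(
                new AlignedWord("hello", 0, 400),
                new AlignedWord("world", 2400, 2800), // 2 s gap before this word
                new AlignedWord("again", 2900, 3300));
        System.out.println(findPauses(words, 1000).size()); // prints 1
    }
}
```

Subtitle segments could then be split at each detected pause instead of packing words back to back.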
Hello,
In the Tutorial Page, there is a typo in the code:
```java
public static void main(String[] args) throws Exception {
    Configuration configuration = new Configuration();
    configuration
            .setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
    configuration
            .setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
    configuration
            .setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
    StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(
            configuration);
    InputStream stream = new FileInputStream(new File("test.wav")))
    recognizer.startRecognition(stream);
    SpeechResult result;
    while ((result = recognizer.getResult()) != null) {
        System.out.format("Hypothesis: %s\n", result.getHypothesis());
    }
    recognizer.stopRecognition();
}
```
In the code above, `InputStream stream = new FileInputStream(new File("test.wav")))` should be `InputStream stream = new FileInputStream(new File("test.wav"));` instead.
Can anyone fix this on the website?
jinhua
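Beyond the missing semicolon, the tutorial snippet also never closes the stream. A try-with-resources form handles both; this sketch uses a temporary file as a stand-in for test.wav:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class StreamOpen {
    public static void main(String[] args) throws Exception {
        File wav = File.createTempFile("test", ".wav"); // stand-in for test.wav
        wav.deleteOnExit();
        // try-with-resources closes the stream even if recognition code throws
        try (InputStream stream = new FileInputStream(wav)) {
            System.out.println(stream != null); // prints true
        }
    }
}
```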
My computer is running Ubuntu 14.04, and I finished writing some code.
This code uses the LiveSpeechRecognizer and is supposed to print out what I say (while the program is open).
However, there is nothing in the result but the usual logging output.
Hi!
I want to download the files that you mention in this blog post, but I can't find them in the en_us_nostress.tar.gz archive. Could you tell me where to find them?
Thanks a lot!
edu.cmu.sphinx.frontend.feature.LiveCMN#sum was missing, which resulted in NPEs in some cases:
Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.frontend.feature.LiveCMN.normalize(LiveCMN.java:203)
at edu.cmu.sphinx.frontend.feature.LiveCMN.getData(LiveCMN.java:182)
at edu.cmu.sphinx.frontend.feature.AbstractFeatureExtractor.getNextData(AbstractFeatureExtractor.java:125)
at edu.cmu.sphinx.frontend.feature.AbstractFeatureExtractor.getData(AbstractFeatureExtractor.java:101)
at edu.cmu.sphinx.frontend.feature.FeatureTransform.getData(FeatureTransform.java:85)
at edu.cmu.sphinx.frontend.FrontEnd.getData(FrontEnd.java:222)
at edu.cmu.sphinx.decoder.scorer.SimpleAcousticScorer.getNextData(SimpleAcousticScorer.java:146)
at edu.cmu.sphinx.decoder.scorer.SimpleAcousticScorer.calculateScoresAndStoreData(SimpleAcousticScorer.java:104)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstLookaheadSearchManager.scoreFastMatchTokens(WordPruningBreadthFirstLookaheadSearchManager.java:286)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstLookaheadSearchManager.fastMatchRecognize(WordPruningBreadthFirstLookaheadSearchManager.java:206)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstLookaheadSearchManager.localStart(WordPruningBreadthFirstLookaheadSearchManager.java:244)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.startRecognition(WordPruningBreadthFirstSearchManager.java:274)
at edu.cmu.sphinx.decoder.Decoder.decode(Decoder.java:62)
at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:106)
at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:122)
at edu.cmu.sphinx.api.AbstractSpeechRecognizer.getResult(AbstractSpeechRecognizer.java:60)
at edu.cmu.sphinx.demo.transcriber.TranscriberDemo.main(TranscriberDemo.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
I've observed the following NullPointerException when calling getWords():
java.lang.NullPointerException
at edu.cmu.sphinx.result.Lattice.getWordResultPath(Lattice.java:1062)
at edu.cmu.sphinx.api.SpeechResult.getWords(SpeechResult.java:52)
...
It looks like node.getWord() sometimes returns null. I haven't dug deep enough to see if that is intentional or not.
Using 5prealpha-SNAPSHOT from oss.sonatype.org.
After I downloaded sphinx4, I found that I can't find the build.xml, so can anybody tell me how to build it?
I have a big issue with the Sphinx demo code. When I want to instantiate a Robot object, the program just does nothing on the Robot instantiation line and doesn't continue with the code, but the program is still running.
```java
ConfigurationManager cm;
cm = new ConfigurationManager(HelloWorld.class.getResource("helloworld.config.xml"));

Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();

// start the microphone or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
    System.out.println("Cannot start microphone.");
    recognizer.deallocate();
    System.exit(1);
}

Robot rob = new Robot();

// loop the recognition until the program exits.
System.out.println("Start speaking. Press Ctrl-C to quit.\n");

Result result = recognizer.recognize();
if (result != null) {
    recognizer.deallocate();
    microphone.stopRecording();
    String resultText = result.getBestFinalResultNoFiller();
    System.out.println("You said: " + resultText + '\n');
} else {
    System.out.println("I can't hear what you said.\n");
}
```
I have also tried to launch the Sphinx code in one thread and the Robot code in another thread, but with this solution I have the same problem.
I think the problem is with the "Java Sound Event Dispatcher" thread, but I have no idea how to solve it.
Note: without the line Robot rob = new Robot(); everything works fine.
Thanks for reading.
Hi all,
I am trying to use CMU Sphinx for the first time to do speech-text alignment. I have created a Java project and imported sphinx4-core and sphinx4-data jars as libraries.
First, I tried to run the instructions in http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4 but the following line does not compile because there is no SpeechAligner constructor with a single Configuration argument:
SpeechAligner aligner = new SpeechAligner(configuration);
Thus, I turned to sphinx4-samples and used the AlignerDemo. My current code is:
```java
import java.io.File;
import java.net.URL;
import java.util.List;

import edu.cmu.sphinx.api.SpeechAligner;
import edu.cmu.sphinx.result.WordResult;

public class TextAlignment {
    public static void main(String args[]) throws Exception {
        String acousticModelPath = "resource:/edu/cmu/sphinx/models/en-us/en-us";
        String dictionaryPath = "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict";
        String g2pPath = "resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin";
        SpeechAligner aligner = new SpeechAligner(acousticModelPath, dictionaryPath, g2pPath);
        URL audioURL = new File("data/10001-90210-01803.wav").toURI().toURL();
        List<WordResult> results = aligner.align(audioURL,
                "one zero zero zero one nine oh two one oh zero one eight zero three");
    }
}
```
Unfortunately, the last line results in the following error:
16:42:25.049 INFO dictionary Loading dictionary from: jar:file:/home/nicolas/IdeaProjects/TextAlignment/lib/sphinx4-data-1.0-20150630.174256-9.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
16:42:25.211 INFO dictionary Loading filler dictionary from: jar:file:/home/nicolas/IdeaProjects/TextAlignment/lib/sphinx4-data-1.0-20150630.174256-9.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:247)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
at edu.cmu.sphinx.api.SpeechAligner.align(SpeechAligner.java:110)
at edu.cmu.sphinx.api.SpeechAligner.align(SpeechAligner.java:65)
at edu.cmu.sphinx.demo.aligner.TextAlignment.main(TextAlignment.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.io.StreamCorruptedException: invalid stream header: 54726965
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
at java.io.ObjectInputStream.(ObjectInputStream.java:299)
at edu.cmu.sphinx.fst.ImmutableFst.loadModel(ImmutableFst.java:169)
at edu.cmu.sphinx.linguist.g2p.G2PConverter.(G2PConverter.java:86)
at edu.cmu.sphinx.linguist.dictionary.TextDictionary.allocate(TextDictionary.java:189)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:332)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
... 10 more
Do you have any idea what I am doing wrong?
Thanks a lot,
Nicolas
I am trying to implement a Sphinx4 demo application using Netbeans IDE 8.0.2 and jre8. It builds successfully.
But when I try to run the project after setting edu.cmu.sphinx.demo.DemoRunner as the main class, I get the following error:
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default-cli) on project sphinx4-samples: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.
For more information about the errors and possible solutions, please read the following articles:
[Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
The pom.xml file is:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>sphinx4-samples</artifactId>
  <packaging>jar</packaging>
  <name>Sphinx4 demo applications</name>
  <dependencies>
    <dependency>
      <groupId>edu.cmu.sphinx</groupId>
      <artifactId>sphinx4-core</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>edu.cmu.sphinx</groupId>
      <artifactId>sphinx4-data</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>edu.cmu.sphinx.demo.DemoRunner</mainClass>
            </manifest>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-javadoc-plugin</artifactId>
        <version>2.9</version>
        <executions>
          <execution>
            <id>attach-javadocs</id>
            <goals>
              <goal>jar</goal>
            </goals>
            <configuration>
              <additionalparam>-Xdoclint:none</additionalparam>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```
Is there anything I am missing?
I am using multiple threads, each with its own StreamSpeechRecognizer object. All threads use the same Configuration and therefore the same acoustic model. Using 5 threads duplicates resources on a massive scale: the ~70k acoustic model objects use up 4+ GB of RAM per thread, so with 5 threads I'm at 20 GB.
I wanted to confirm whether it's thread-safe to reuse the same acoustic model objects for multiple StreamSpeechRecognizer objects on different threads at the same time. Achieving this shouldn't be too difficult if we use a HashMap to cache models by their file path. However, before I do this, can the developers confirm whether acoustic model objects like HMMPool are safe to share among multiple threads? I plan to use one thread to load the first object and then reuse that object from the other threads. This will only work if sphinx4 itself uses the acoustic model objects in a thread-safe manner.
This would be a great feature to have, as reusing an acoustic model that's already loaded can reduce RAM usage by up to 4 GB per thread.
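The caching part of the proposal is straightforward on the Java side; whether the shared object is safe to use concurrently remains the open question above. A sketch with a stand-in Model class (not the real HMMPool):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ModelCache {
    // Stand-in for a heavyweight acoustic model; the real sphinx4
    // classes are not used here.
    static final class Model {
        final String path;
        Model(String path) { this.path = path; }
    }

    private static final Map<String, Model> CACHE = new ConcurrentHashMap<>();

    // computeIfAbsent guarantees the loader runs at most once per path,
    // even when several threads request the same model concurrently.
    static Model get(String path) {
        return CACHE.computeIfAbsent(path, Model::new);
    }

    public static void main(String[] args) {
        Model a = get("en-us");
        Model b = get("en-us");
        System.out.println(a == b); // prints true: one shared instance
    }
}
```

With the load-once problem handled this way, sharing only hinges on the model object being effectively read-only after loading.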
Why are exceptions not being wrapped in a RuntimeException and rethrown?
```java
public FeatureFileDumper(ConfigurationManager cm, String frontEndName)
        throws IOException {
    try {
        frontEnd = (FrontEnd) cm.lookup(frontEndName);
        audioSource = (StreamDataSource) cm.lookup("streamDataSource");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
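For comparison, a wrap-and-rethrow variant (a sketch, not the project's actual code) would surface the failure to the caller instead of leaving fields silently null:

```java
public class RethrowSketch {
    // Stand-in for cm.lookup(...), which can fail.
    static Object lookup(String name) throws Exception {
        throw new Exception("component not found: " + name);
    }

    static Object lookupOrFail(String name) {
        try {
            return lookup(name);
        } catch (Exception e) {
            // Wrap and rethrow, preserving the original exception as the cause.
            throw new RuntimeException("lookup failed for " + name, e);
        }
    }

    public static void main(String[] args) {
        try {
            lookupOrFail("streamDataSource");
        } catch (RuntimeException e) {
            System.out.println(e.getCause().getMessage());
        }
    }
}
```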
As far as I can tell, there aren't any official sphinx4 release artifacts in Maven Central, only unofficial forks.
http://search.maven.org/#search%7Cga%7C1%7Csphinx4
It'd be nice to have official artifacts published under the "edu.cmu.sphinx" group.
This would make it easy for developers to add Sphinx4 to their Maven projects, and use it with confidence that it's an official release.
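If official artifacts were published, usage would reduce to a plain dependency block like the following. The coordinates and version here are assumptions modeled on the snapshot coordinates used elsewhere in these reports, not real released artifacts:

```xml
<dependency>
  <groupId>edu.cmu.sphinx</groupId>
  <artifactId>sphinx4-core</artifactId>
  <version>5prealpha</version>
</dependency>
```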
Have been using Sphinx4 in my projects for a while now, mainly for basic continuous transcription. Decided to play around with the speakerid features, took a look at the speakerid demo. It seems that I can't run the (unmodified) code without getting the above exception.
My process:
mvn clean install -X
java -jar sphinx4-samples-1.0-SNAPSHOT-jar-with-dependencies.jar speakerid
It seems to go through a few iterations (I believe it gets down to the actual "stats collection" step in the inner loop that decodes each segment for a particular speaker), but it always crashes before it gets to the second-pass loop (the one that actually transcribes and dumps out text).
Here's the exception I'm getting (repeatable 100%):
Exception in thread "main" org.apache.commons.math3.linear.SingularMatrixException: matrix is singular
at org.apache.commons.math3.linear.LUDecomposition$Solver.solve(LUDecomposition.java:297)
at edu.cmu.sphinx.decoder.adaptation.Transform.computeMllrTransforms(Transform.java:112)
at edu.cmu.sphinx.decoder.adaptation.Transform.update(Transform.java:177)
at edu.cmu.sphinx.decoder.adaptation.Stats.createTransform(Stats.java:231)
at edu.cmu.sphinx.demo.speakerid.SpeakerIdentificationDemo.speakerAdaptiveDecoding(SpeakerIdentificationDemo.java:96)
at edu.cmu.sphinx.demo.speakerid.SpeakerIdentificationDemo.main(SpeakerIdentificationDemo.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at edu.cmu.sphinx.demo.DemoRunner.main(DemoRunner.java:44)
Please take a look at it when you have the time. My very rusty linear algebra remnants tell me that it's probably trying to solve (by inversion) and getting stuck at a singular matrix (zero determinant), so I'm hoping that it might be a small regression bug that you'd be able to quickly spot. Or it could be something with my setup; either way, please have a look.
Edit: a few more details about my setup.
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
OSX 10.10.3
14.3.0 Darwin Kernel Version 14.3.0: Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64 x86_64
Hi,
I have been googling for the past few hours to find out whether Sphinx4 currently supports keyword spotting.
I found:
Can you tell me if there is any information I am missing right now?
Hi!
I want to download the files that you mention in the http://cmusphinx.sourceforge.net/2014/07/long-audio-aligner-landed-in-trunk/ blog post, but I can't find them in the http://sourceforge.net/projects/cmusphinx/files/G2P%20Models/ archive. Could you tell me where to find them?
Thanks a lot!
I upgraded to Java 8 and can't get sphinx4 to work with Java 8. I am getting the following error while running TranscriberDemo:
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.pericent.test.TranscriberDemo.main(TranscriberDemo.java:32)
The TranscriberDemo.java class is as below:
```java
public class TranscriberDemo {

    public static void main(String[] args) throws Exception {
        System.out.println("Loading models...");

        Configuration configuration = new Configuration();

        // Load model from the jar
        configuration
                .setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // You can also load model from folder
        // configuration.setAcousticModelPath("file:en-us");
        configuration
                .setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration
                .setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(
                configuration);
        InputStream stream = TranscriberDemo.class
                .getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav");
        stream.skip(44);

        // Simple recognition with generic model
        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());

            System.out.println("List of recognized words and their times:");
            for (WordResult r : result.getWords()) {
                System.out.println(r);
            }

            System.out.println("Best 3 hypothesis:");
            for (String s : result.getNbest(3))
                System.out.println(s);
        }
        recognizer.stopRecognition();

        // Live adaptation to speaker with speaker profiles
        stream = TranscriberDemo.class
                .getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav");
        stream.skip(44);

        // Stats class is used to collect speaker-specific data
        Stats stats = recognizer.createStats(1);
        recognizer.startRecognition(stream);
        while ((result = recognizer.getResult()) != null) {
            stats.collect(result);
        }
        recognizer.stopRecognition();

        // Transform represents the speech profile
        Transform transform = stats.createTransform();
        recognizer.setTransform(transform);

        // Decode again with updated transform
        stream = TranscriberDemo.class
                .getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav");
        stream.skip(44);
        recognizer.startRecognition(stream);
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
```
My pom.xml is as below:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>MavenTest</groupId>
  <artifactId>MavenTest</artifactId>
  <packaging>war</packaging>
  <version>0.0.1-SNAPSHOT</version>
  <name>MavenTest Maven Webapp</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <!-- servlet api jar for servlets -->
    <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>javax.servlet-api</artifactId>
      <version>3.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-io</artifactId>
      <version>1.3.2</version>
    </dependency>
    <dependency>
      <groupId>edu.cmu.sphinx</groupId>
      <artifactId>sphinx4-core</artifactId>
      <version>5prealpha-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>edu.cmu.sphinx</groupId>
      <artifactId>sphinx4-data</artifactId>
      <version>5prealpha-SNAPSHOT</version>
    </dependency>
    <!-- <dependency>
      <groupId>org.jboss.resteasy</groupId>
      <artifactId>resteasy-jsapi</artifactId>
      <version>2.3.1.GA</version>
    </dependency> -->
  </dependencies>
  <repositories>
    <repository>
      <id>snapshots-repo</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>
  <build>
    <finalName>MavenTest</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.3</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <version>2.3</version>
        <configuration>
          <warName>MavenTest</warName>
          <!-- <outputDirectory>D:\jboss-as-7.1.1.Final\standalone\deployments</outputDirectory> -->
          <outputDirectory>/home/jboss-as-7.1.1.Final/standalone/deployments</outputDirectory>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```
I am running my program via Run As -> Java Application.
Need help installing sphinx on eclipse.
I tried to download the repo and test it, but when I start a Java project in Eclipse, some imports are automatically excluded. Is there any reason why? Thanks.
I am having a hard time integrating the models given here. I tried to integrate the model cmusphinx-en-us-5.2 in pocketsphinx-5prealpha. Since this model is missing the 'sendump' file, I was getting an error on running make.
Can I find some documentation where I can understand the meaning of the files inside a model?
Thanks
We have been working on a project pertaining to voice recognition, and we are using Sphinx4 for the digital signal processing phase. After analysing the code, we have seen that the FrontEnd module sends its output data to a Scorer. We discovered that, given a sample configuration, after the FrontEnd voice processing is done, the scorer receives a list of FloatData objects, each with a float array. We wanted to know what each element in the array represents. It would be really useful if you could help us with this.
I just started DialogDemo.java and got this exception:
Exception in thread "main" java.lang.IllegalStateException: javax.sound.sampled.LineUnavailableException: line with format PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian not supported.
at edu.cmu.sphinx.api.Microphone.<init>(Microphone.java:38)
at edu.cmu.sphinx.api.SpeechSourceProvider.getMicrophone(SpeechSourceProvider.java:18)
at edu.cmu.sphinx.api.LiveSpeechRecognizer.<init>(LiveSpeechRecognizer.java:35)
at edu.cmu.sphinx.demo.dialog.DialogDemo.main(DialogDemo.java:145)
Caused by: javax.sound.sampled.LineUnavailableException: line with format PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian not supported.
at com.sun.media.sound.DirectAudioDevice$DirectDL.implOpen(Unknown Source)
at com.sun.media.sound.AbstractDataLine.open(Unknown Source)
at com.sun.media.sound.AbstractDataLine.open(Unknown Source)
at edu.cmu.sphinx.api.Microphone.<init>(Microphone.java:36)
... 3 more
I'm not sure whether this is a bug.
I'm using sphinx4 in my project for speaker identification purposes. The sphinx implementation is able to handle a single file without any issues.
But when I run two threads with different files through speaker identification (the speaker cluster generation code), the implementation runs into a concurrency issue.
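Until the concurrency issue is resolved, a common workaround is to give each thread its own recognizer instance, for example via ThreadLocal. A sketch with a stand-in Recognizer class (not the real sphinx4 type):

```java
public class PerThreadRecognizer {
    // Stand-in for a recognizer that is not safe to share across threads.
    static final class Recognizer {}

    // Each thread lazily gets its own instance on first use.
    static final ThreadLocal<Recognizer> LOCAL =
            ThreadLocal.withInitial(Recognizer::new);

    public static void main(String[] args) throws InterruptedException {
        Recognizer mine = LOCAL.get();
        final Recognizer[] other = new Recognizer[1];
        Thread t = new Thread(() -> other[0] = LOCAL.get());
        t.start();
        t.join();
        System.out.println(mine != other[0]); // prints true: separate instances
    }
}
```

This trades memory for isolation, which is why sharing read-only model data (as discussed in another report above) matters so much.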
Building from master failed with:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project sphinx4-core: Fatal error compiling: invalid target release: 1.7 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project sphinx4-core: Fatal error compiling
Attention Windows users: the microphone does not work due to a known issue with CMUSphinx.
Edit: Sorry @nshmyrev, I sent this to the wrong repo. Thanks again!
Hello,
We are testing Sphinx to extract text from audio in Hadoop.
Is there a way we can give multiple files (rather than a directory) to setAcousticModelPath, as seen below? configuration.setAcousticModelPath expects a directory and not a file:
configuration.setAcousticModelPath("./wsj_8kHz233/wsj_8kHz233/wsj_8kHz11111");
Please help.
The demo was running well until I changed the English model to the Mandarin language model; then I got this error:
java.lang.IndexOutOfBoundsException: Index: 71680, Size: 71680
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at edu.cmu.sphinx.linguist.acoustic.tiedstate.Pool.get(Pool.java:55)
at edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader.createSenonePool(Sphinx3Loader.java:501)
at edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader.loadModelFiles(Sphinx3Loader.java:386)
at edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader.load(Sphinx3Loader.java:315)
...
This is my code:
```java
Configuration configuration = new Configuration();
configuration.setAcousticModelPath("file:" + configPath + "\\zh\\zh");
configuration.setDictionaryPath("file:" + configPath + "\\zh\\zh_broadcastnews_utf8.dic");
configuration.setLanguageModelPath("file:" + configPath + "\\zh\\zh_broadcastnews_64000_utf8.dmp");
try {
    StreamSpeechRecognizer streamRecognizer = new StreamSpeechRecognizer(configuration);
} catch (Exception ex) {
    ex.printStackTrace();
}
```
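One thing worth ruling out in paths like these is backslash escaping: building the file: URL from a java.io.File avoids manual "file:" + path concatenation entirely. A sketch with a placeholder path:

```java
import java.io.File;

public class ModelPath {
    public static void main(String[] args) {
        // Let File/URI produce a well-formed file: URL instead of
        // concatenating "file:" + configPath + "\\zh\\zh" by hand.
        File model = new File("configDir", "zh/zh"); // placeholder path
        String url = model.toURI().toString();
        System.out.println(url.startsWith("file:")); // prints true
    }
}
```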
I am stuck on how to implement this in Unity3D. Any suggestions? It should preferably be step by step. Thanks in advance.
I have the same issue with Sphinx4-5 (up to date). When I want to instantiate a Robot object, the program just does nothing on the Robot instantiation line and doesn't continue with the code, but the program is still running. I have no stack trace because Java just freezes. I think there is a conflict with an internal Java thread.
```java
public class HelloTest {

    public static void main(String[] args) {
        Configuration configuration = new Configuration();

        // Set path to acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to dictionary.
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        // Set language model.
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

        LiveSpeechRecognizer recognizer = null;
        try {
            recognizer = new LiveSpeechRecognizer(configuration);
            // Start recognition process pruning previously cached data.
            recognizer.startRecognition(true);

            boolean trouve = true;
            Robot robot = new Robot();
            while (trouve) {
                SpeechResult result = recognizer.getResult();
                System.out.println(result.getHypothesis());
            }
            // Pause recognition process. It can be resumed then with startRecognition(false).
            recognizer.stopRecognition();
        } catch (IOException | AWTException e) {
            e.printStackTrace();
        }
    }
}
```
I have also tried to launch the Sphinx code in one thread and the Robot code in another thread, but with this solution I have the same problem.
I think the problem is with the "Java Sound Event Dispatcher" thread, but I have no idea how to solve it.
Note: without the line Robot rob = new Robot(); everything works fine.
Thanks for reading.
I just downloaded the sources and cannot build:
$ gradle
FAILURE: Build failed with an exception.
* Where:
Build file '/home/luca/Documenti/src/sphinx/sphinx4/build.gradle' line: 13
* What went wrong:
A problem occurred evaluating root project 'sphinx4-parent'.
> No such property: ossrhUsername for class: org.gradle.api.publication.maven.internal.ant.DefaultGroovyMavenDeployer
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 4.539 secs
The documentation on the site right now is out of date, and there is not much other documentation out there.
The index.html says to run `ant javadoc` to build javadocs for the project. However, when this command is executed in the sphinx4 directory, the following error is thrown:
$ ant javadoc
Buildfile: build.xml does not exist!
Build failed
I am able to run the model, but I am not getting good accuracy, and I think my Indian English accent plays a role in this.
So how can I train the model with my own speech and text samples so that it better understands my accent?
I have downloaded sphinxtrain-5prealpha, but I am clueless about how to use it to train a new model.
The language model was built using the Kaldi toolkit. I used it to train the continuous acoustic model using the Sphinx 5prealpha release on Ubuntu Linux 14. In the end I got a sentence error rate of 78% and a word error rate of 17%, without fixing any of the errors and warnings shown during training.
When I was about to use the trained model in a simple application, as shown in the CMU Sphinx tutorial, the following error showed up:
16:12:52.663 INFO trieNgramModel Loading n-gram language model from:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 51361
at edu.cmu.sphinx.linguist.language.ngram.trie.BinaryLoader.readWords(BinaryLoader.java:151)
at edu.cmu.sphinx.linguist.language.ngram.trie.NgramTrieModel.allocate(NgramTrieModel.java:233)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:334)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:52)
at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:39)
at ASR.main(ASR.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
The file I used for the Java application is this:
Thanks in advance for your help.
Using code from TranscriberDemo.java, I successfully saved a Transform to a file as follows:
// Transform represents the speech profile
Transform transform = stats.createTransform();
transform.store("MyVoiceTransform", 0);
But when I loaded the Transform from the file into the Recognizer, the code didn't work.
I debugged the code and found an issue with the Scanner object in load() of the Transform class.
The Scanner is not working as expected, i.e. it is not able to parse the file. I use the following code to load the Transform from the file into the Recognizer:
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
recognizer.loadTransform("MyVoiceTransform", 1);
I don't know if the title is a good one, but I will try to explain what my goals are. English is not my native language, so please forgive any mistakes I make :)
The basic idea is to listen to user input using your LiveSpeechRecognizer. The user says commands and the program then executes them accordingly. A small example from the game I am working on:
| User Input | Game Interpretation |
|---|---|
| play card 1 | card 1 gets played |
| play card 2 | card 2 gets played |
| select card 2 | card 2 gets selected |
| card 2 | card 2 gets selected |
As you can see, the third and fourth commands are equivalent to each other, but the spoken command isn't actually the same. The usage of select feels more natural than just the plain card 2 command. Implementing this kind of behavior is not that hard. One just needs to create a simple grammar file and define select as optional:
<select_card> = [select] card <digit>
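In a complete JSGF grammar file, that rule might look like the following (the grammar name and the <digit> rule here are made up for illustration):

```
#JSGF V1.0;

grammar cards;

public <select_card> = [select] card <digit>;

<digit> = one | two | three | four | five;
```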
Now sphinx4 will recognize both commands, card 2 and select card 2. Now we come to the part where the feature request takes place :)
It is kind of hard to compare those two return values and decide whether they are the same command (which basically means they both match the grammar described above). The program needs to calculate the differences and then decide whether they match or not. But the fact that sphinx4 already loads the grammar and parses the input means that it knows both sentences are treated equally.
My request is that you might implement some kind of callback mechanism where people can register with sphinx4 via a simple interface, so they get notified when the user input matches some specific grammar condition. Here is a small (pseudo-like) example of what I mean:
public interface GrammarCondition {
    /**
     * Gets called when sphinx4 recognizes the user input and can match it to some specific grammar part.
     * @param grammarPartName the name of the grammar part that applies to the input (in this case, "select_card")
     * @param userInput the String that could be parsed from the captured sound.
     */
    void conditionFound(String grammarPartName, String userInput);
}
And maybe some registration service:
Conditions.register(String grammarPartName, GrammarCondition condition);
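To make the idea more concrete, such a registration service could be a thin layer that sphinx4 drives once it has matched an utterance to a rule. Here is a self-contained sketch of what I imagine (all names are made up; nothing here is existing sphinx4 API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConditionRegistry {
    // Same shape as the interface proposed above, repeated here so the sketch compiles on its own.
    public interface GrammarCondition {
        void conditionFound(String grammarPartName, String userInput);
    }

    // Maps a grammar rule name to the listeners registered for it.
    private final Map<String, List<GrammarCondition>> listeners = new HashMap<>();

    public void register(String grammarPartName, GrammarCondition condition) {
        listeners.computeIfAbsent(grammarPartName, k -> new ArrayList<>()).add(condition);
    }

    // sphinx4 would call this internally after matching an utterance to a grammar rule.
    public void fire(String grammarPartName, String userInput) {
        for (GrammarCondition c :
                listeners.getOrDefault(grammarPartName, Collections.<GrammarCondition>emptyList())) {
            c.conditionFound(grammarPartName, userInput);
        }
    }
}
```

A client would then just do something like registry.register("select_card", (rule, input) -> selectCard(input)); and never have to compare raw hypothesis strings itself.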
I would really like to get feedback from you guys! You did a great job on this project. Keep up the good work :)
Thank you for reading all of that,
Sven
I built pocketsphinx-5prealpha and then ran the following command:
./pocketsphinx_continuous -infile review2.mp3
I was expecting to get the speech-to-text output for review2.mp3; however, I got longish output that nowhere contains the text of the mp3 file.
How do I achieve this: input an mp3/wav file and get the text as output?
And is there a time limit on the length of the audio file?
The main method in FeatureFileDumper catches exceptions and writes to System.err, but it neither exits the process with a non-zero exit code nor throws a RuntimeException. One of these should occur.
try {
    ...
} catch (IOException ioe) {
    System.err.println("I/O Error " + ioe);
} catch (PropertyException p) {
    System.err.println("Bad configuration " + p);
}
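A minimal sketch of the exit-code variant of the fix, with the actual feature dumping replaced by a hypothetical run() method (only the catch block is the point here):

```java
import java.io.IOException;

public class FeatureFileDumperFix {
    // Hypothetical stand-in for FeatureFileDumper's real work; throws on bad input.
    static void run(String[] args) throws IOException {
        if (args.length == 0) {
            throw new IOException("no input file given");
        }
        // ... feature extraction and dumping would happen here ...
    }

    public static void main(String[] args) {
        try {
            run(args);
        } catch (IOException ioe) {
            System.err.println("I/O Error " + ioe);
            System.exit(1); // non-zero exit code so scripts can detect the failure
        }
    }
}
```

With this, a shell pipeline such as feature_dump.sh can test $? and stop instead of silently continuing with a missing output file.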
Sorry, does sphinx4 support LPC? How can I use LPC for feature extraction? Please help me :(
I copied the demo code from TranscriberDemo into a separate project.
After building sphinx from source, my code doesn't work anymore because it can't find some files:
edu.cmu.sphinx.util.props.InternalConfigurationException: Can't locate resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp
I unzipped sphinx-data-1.0-SNAPSHOT.jar in my .m2 folder; here are all the files in there:
META-INF/
META-INF/MANIFEST.MF
edu/
edu/cmu/
edu/cmu/sphinx/
edu/cmu/sphinx/models/
edu/cmu/sphinx/models/en-us/
edu/cmu/sphinx/models/en-us/en-us/
edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
edu/cmu/sphinx/models/en-us/en-us/feat.params
edu/cmu/sphinx/models/en-us/en-us/mdef
edu/cmu/sphinx/models/en-us/en-us/means
edu/cmu/sphinx/models/en-us/en-us/mixture_weights
edu/cmu/sphinx/models/en-us/en-us/noisedict
edu/cmu/sphinx/models/en-us/en-us/README
edu/cmu/sphinx/models/en-us/en-us/sendump
edu/cmu/sphinx/models/en-us/en-us/transition_matrices
edu/cmu/sphinx/models/en-us/en-us/variances
edu/cmu/sphinx/models/en-us/en-us.lm.bin
META-INF/maven/
META-INF/maven/edu.cmu.sphinx/
META-INF/maven/edu.cmu.sphinx/sphinx4-data/
META-INF/maven/edu.cmu.sphinx/sphinx4-data/pom.xml
META-INF/maven/edu.cmu.sphinx/sphinx4-data/pom.properties
Since everything used to work before (when Maven was downloading sphinx-data from the online repository), I assume that I'm building sphinx wrong (I did mvn install, which runs the tests, and they pass OK), though I can't find any reference in the documentation on how to do this.
The simpletokenizer.java and USEnglishTokenizer.java files are causing issues with character literals while building with Gradle. Please help!
New to sphinx here. I am using Java 1.7.
I downloaded and opened the project in Eclipse. But when I run any of the demo files, e.g. edu.cmu.sphinx.demo.allphone.AllphoneDemo.java, I get a NullPointerException with the following trace. I am simply using "Run As > Java Application" in Eclipse.
Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:71)
at edu.cmu.sphinx.util.props.ConfigurationManager.(ConfigurationManager.java:58)
at edu.cmu.sphinx.api.Context.(Context.java:59)
at edu.cmu.sphinx.api.Context.(Context.java:44)
at edu.cmu.sphinx.demo.allphone.AllphoneDemo.main(AllphoneDemo.java:42)
Am I missing something here?
I've developed a Java program that uses sphinx to perform speaker clustering via SpeakerIdentification.
The program/algorithm works fine, producing output in a consistent amount of time, when I test it with small audio files with similar attributes (duration, bit rate, mono/stereo, etc.).
The program runs into trouble when I supply it with a large audio file, i.e. longer than 1-1.5 hours (wav, 16 kHz, mono, PCM). The program runs for many hours and sometimes runs forever if the file is excessively large.
I do not know if I'm using sphinx the right way. Please guide me in solving this problem, as it would help me optimize my application's performance to a great extent.
The DialogDemo has been broken for several months, throwing the following exception on Windows:
Exception in thread "main" java.lang.IllegalStateException: javax.sound.sampled.LineUnavailableException: line with format PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian not supported.
at edu.cmu.sphinx.api.Microphone.<init>(Microphone.java:38)
at edu.cmu.sphinx.api.SpeechSourceProvider.getMicrophone(SpeechSourceProvider.java:18)
at edu.cmu.sphinx.api.LiveSpeechRecognizer.<init>(LiveSpeechRecognizer.java:35)
at edu.cmu.sphinx.demo.dialog.DialogDemo.main(DialogDemo.java:144)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: javax.sound.sampled.LineUnavailableException: line with format PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian not supported.
at com.sun.media.sound.DirectAudioDevice$DirectDL.implOpen(DirectAudioDevice.java:513)
at com.sun.media.sound.AbstractDataLine.open(AbstractDataLine.java:121)
at com.sun.media.sound.AbstractDataLine.open(AbstractDataLine.java:413)
at edu.cmu.sphinx.api.Microphone.<init>(Microphone.java:36)
... 8 more
This issue makes it impossible to stop recording on a LiveSpeechRecognizer and reuse the same microphone on Windows. According to @nshmyrev, this is due to be fixed soon: https://sourceforge.net/p/cmusphinx/bugs/412/
Related:
When using LiveSpeechRecognizer I'm getting
javax.sound.sampled.LineUnavailableException: line with format PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian not supported.
when starting speech recognition a second time after it has been stopped with the stopRecognition method.
I started porting Sphinx 4 to C#, but I noticed it needs JSAPI. Can it be done without JSAPI?
Sorry, I'm having trouble figuring out what I am doing wrong.
mvn clean package
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] Failure executing javac, but could not parse the error:
javac: invalid target release: 1.7
Usage: javac <options> <source files>
use -help for a list of possible options
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Sphinx4 ........................................... SUCCESS [1.081s]
[INFO] Sphinx4 models .................................... SUCCESS [7.561s]
[INFO] Sphinx4 core ...................................... FAILURE [0.154s]
[INFO] Sphinx4 demo applications ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.968s
[INFO] Finished at: Thu Jan 22 15:37:15 PST 2015
[INFO] Final Memory: 10M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project sphinx4-core: Compilation failure
[ERROR] Failure executing javac, but could not parse the error:
[ERROR] javac: invalid target release: 1.7
[ERROR] Usage: javac <options> <source files>
[ERROR] use -help for a list of possible options
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :sphinx4-core
If there is a better place to ask let me know.
Thanks!
Hello Nicolai, thank you for fixing the issue pertaining to the "singular matrix" exception. I can verify that it's no longer an issue; the speaker segments are being properly calculated and dumped to the console with printSpeakerIntervals().
However, my concern now is that speakerAdaptiveDecoding() never seems to print any hypotheses. A quick look at the demo code tells me that it should be iterating over the many speaker segments and running the (adapted) recognition. I can definitely see it iterate many times (loading the language/acoustic/etc. models) and run the speedTracker, but I never see any hypothesis output.
In addition, the speedTracker reports the transcription time as "0.00 X realtime", which leads me to believe that it never actually runs after the models are loaded.
I'm running the demo "as is", on the in-package /edu/cmu/sphinx/demo/speakerid/test.wav file, but I'd be happy to try this out on some of my own media if you think it would help.
I've been using Sphinx4 for over half a year now, and generally, under the right conditions, I feel it works pretty well. I was hoping to get a bit of an accuracy boost from speaker adaptation.