Comments (12)
Did you mix up sherpa-onnx and sherpa-ncnn?
Have you run the following?
pip install sherpa-onnx
from sherpa-onnx.
Yes, it is installed: Installing collected packages: sherpa-onnx
Successfully installed sherpa-onnx-1.9.10
Running on Windows:
python .\speech-recognition-from-microphone-onnx.py --tokens=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt --encoder=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.onnx --decoder=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx --joiner=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.onnx
#!/usr/bin/env python3
# Real-time speech recognition from a microphone with sherpa-onnx Python API
#
# Please refer to
# https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
# to download pre-trained models
import argparse
import sys
from pathlib import Path

try:
    import sounddevice as sd
except ImportError:
    print("Please install sounddevice first. You can use")
    print()
    print(" pip install sounddevice")
    print()
    print("to install it")
    sys.exit(-1)

import sherpa_onnx


def assert_file_exists(filename: str):
    assert Path(filename).is_file(), (
        f"{filename} does not exist!\n"
        "Please refer to "
        "https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html to download it"
    )


def get_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "--tokens",
        type=str,
        help="Path to tokens.txt",
    )
    parser.add_argument(
        "--encoder",
        type=str,
        help="Path to the encoder model",
    )
    parser.add_argument(
        "--decoder",
        type=str,
        help="Path to the decoder model",
    )
    parser.add_argument(
        "--joiner",
        type=str,
        help="Path to the joiner model",
    )
    parser.add_argument(
        "--decoding-method",
        type=str,
        default="greedy_search",
        help="Valid values are greedy_search and modified_beam_search",
    )
    return parser.parse_args()


def create_recognizer():
    args = get_args()
    assert_file_exists(args.encoder)
    assert_file_exists(args.decoder)
    assert_file_exists(args.joiner)
    assert_file_exists(args.tokens)
    # Please replace the model files if needed.
    # See https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
    # for download links.
    print(args.tokens,
          args.encoder,
          args.decoder,
          args.joiner)
    recognizer = sherpa_onnx.OnlineRecognizer(
        tokens=args.tokens,
        encoder=args.encoder,
        decoder=args.decoder,
        joiner=args.joiner,
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method=args.decoding_method,
    )
    return recognizer


def main():
    recognizer = create_recognizer()
    print("Started! Please speak")
    # The model is using 16 kHz, we use 48 kHz here to demonstrate that
    # sherpa-onnx will do resampling inside.
    sample_rate = 48000
    samples_per_read = int(0.1 * sample_rate)  # 0.1 second = 100 ms
    last_result = ""
    stream = recognizer.create_stream()
    # last_result = ""
    i = 0
    with sd.InputStream(channels=1, dtype="float32", samplerate=sample_rate) as s:
        while True:
            samples, _ = s.read(samples_per_read)  # a blocking read
            samples = samples.reshape(-1)
            stream.accept_waveform(sample_rate, samples)
            while recognizer.is_ready(stream):
                recognizer.decode_stream(stream)
            result = recognizer.get_result(stream)
            # if last_result != result:
            #     last_result = result
            #     print("\r{}".format(result), end="", flush=True)
            if last_result != result:
                if i == 0:
                    print("{}".format(result), end='')
                    last_result = result
                    i = i + 1
                else:
                    last_result_len = len(last_result)
                    new_word = result[last_result_len:]
                    # print(last_result,result,new_word)
                    print("{}".format(new_word), end='', flush=True)
                    last_result = result


if __name__ == "__main__":
    devices = sd.query_devices()
    print(devices)
    default_input_device_idx = sd.default.device[0]
    print(f'Use default device: {devices[default_input_device_idx]["name"]}')
    try:
        main()
    except KeyboardInterrupt:
        print("\nCaught Ctrl + C. Exiting")
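The loop in the script prints only the newly recognized text by slicing the new result at the length of the previous one. A minimal sketch of that step as a standalone helper is shown here. The name `new_suffix` is hypothetical, and using the common prefix instead of the raw length is an assumption, made so that a result whose earlier characters change does not produce a misaligned slice.

```python
import os


def new_suffix(previous: str, current: str) -> str:
    """Return the part of `current` that follows its common prefix with `previous`.

    Hypothetical helper, not part of sherpa-onnx.
    """
    # os.path.commonprefix compares the strings character by character.
    shared = os.path.commonprefix([previous, current])
    return current[len(shared):]
```

With a helper like this, the branch on `i == 0` is no longer needed, because an empty `previous` yields the whole of `current`.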
from sherpa-onnx.
import sherpa_onnx
print(sherpa_onnx.__file__)
print(help(sherpa_onnx.OnlineRecognizer))
What do these lines output?
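The same two questions (which installation is imported, and which constructors the class offers) can also be answered without reading the full help text. The following is only a sketch: `inspect_class` is a hypothetical helper, and it assumes the constructors are ordinary Python classmethods defined directly on the class.

```python
import importlib


def inspect_class(module_name: str, class_name: str) -> dict:
    """Report where a module was loaded from and which classmethods a class defines.

    Hypothetical helper for diagnosing which installation and API version is in use.
    """
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    # In a class __dict__, classmethods are stored as `classmethod` objects.
    factories = sorted(
        name for name, value in vars(cls).items() if isinstance(value, classmethod)
    )
    return {"file": getattr(module, "__file__", None), "classmethods": factories}


# Example call (requires sherpa-onnx to be installed):
# print(inspect_class("sherpa_onnx", "OnlineRecognizer"))
```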
from sherpa-onnx.
>>> import sherpa_onnx
>>> print(sherpa_onnx.__file__)
C:\Users\loong\.conda\envs\nlp\Lib\site-packages\sherpa_onnx\__init__.py
>>> print(help(sherpa_onnx.OnlineRecognizer))
Help on class OnlineRecognizer in module sherpa_onnx.online_recognizer:
class OnlineRecognizer(builtins.object)
| A class for streaming speech recognition.
|
| Please refer to the following files for usages
| - https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/python/tests/test_online_recognizer.py
| - https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/online-decode-files.py
|
| Methods defined here:
|
| create_stream(self, hotwords: Optional[str] = None)
|
| decode_stream(self, s: _sherpa_onnx.OnlineStream)
|
| decode_streams(self, ss: List[_sherpa_onnx.OnlineStream])
|
| get_result(self, s: _sherpa_onnx.OnlineStream) -> str
|
| is_endpoint(self, s: _sherpa_onnx.OnlineStream) -> bool
|
| is_ready(self, s: _sherpa_onnx.OnlineStream) -> bool
|
| reset(self, s: _sherpa_onnx.OnlineStream) -> bool
|
| timestamps(self, s: _sherpa_onnx.OnlineStream) -> List[float]
|
| tokens(self, s: _sherpa_onnx.OnlineStream) -> List[str]
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| from_paraformer(tokens: str, encoder: str, decoder: str, num_threads: int = 2, sample_rate: float = 16000, feature_dim: int = 80, enable_endpoint_detection: bool = False, rule1_min_trailing_silence: float = 2.4, rule2_min_trailing_silence: float = 1.2, rule3_min_utterance_length: float = 20.0, decoding_method: str = 'greedy_search', provider: str = 'cpu') from builtins.type
| Please refer to
| `<https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html>`_
| to download pre-trained models for different languages, e.g., Chinese,
| English, etc.
|
| Args:
| tokens:
| Path to ``tokens.txt``. Each line in ``tokens.txt`` contains two
| columns::
|
| symbol integer_id
|
| encoder:
| Path to ``encoder.onnx``.
| decoder:
| Path to ``decoder.onnx``.
| num_threads:
| Number of threads for neural network computation.
| sample_rate:
| Sample rate of the training data used to train the model.
| feature_dim:
| Dimension of the feature used to train the model.
| enable_endpoint_detection:
| True to enable endpoint detection. False to disable endpoint
| detection.
| rule1_min_trailing_silence:
| Used only when enable_endpoint_detection is True. If the duration
| of trailing silence in seconds is larger than this value, we assume
| an endpoint is detected.
| rule2_min_trailing_silence:
| Used only when enable_endpoint_detection is True. If we have decoded
| something that is nonsilence and if the duration of trailing silence
| in seconds is larger than this value, we assume an endpoint is
| detected.
| rule3_min_utterance_length:
| Used only when enable_endpoint_detection is True. If the utterance
| length in seconds is larger than this value, we assume an endpoint
| is detected.
| decoding_method:
| The only valid value is greedy_search.
| provider:
| onnxruntime execution providers. Valid values are: cpu, cuda, coreml.
|
| from_transducer(tokens: str, encoder: str, decoder: str, joiner: str, num_threads: int = 2, sample_rate: float = 16000, feature_dim: int = 80, enable_endpoint_detection: bool = False, rule1_min_trailing_silence: float = 2.4, rule2_min_trailing_silence: float = 1.2, rule3_min_utterance_length: float = 20.0, decoding_method: str = 'greedy_search', max_active_paths: int = 4, hotwords_score: float = 1.5, blank_penalty: float = 0.0, hotwords_file: str = '', provider: str = 'cpu', model_type: str = '', lm: str = '', lm_scale: float = 0.1) from builtins.type
| Please refer to
| `<https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html>`_
| to download pre-trained models for different languages, e.g., Chinese,
| English, etc.
|
| Args:
| tokens:
| Path to ``tokens.txt``. Each line in ``tokens.txt`` contains two
| columns::
|
| symbol integer_id
|
| encoder:
| Path to ``encoder.onnx``.
| decoder:
| Path to ``decoder.onnx``.
| joiner:
| Path to ``joiner.onnx``.
| num_threads:
| Number of threads for neural network computation.
| sample_rate:
| Sample rate of the training data used to train the model.
| feature_dim:
| Dimension of the feature used to train the model.
| enable_endpoint_detection:
| True to enable endpoint detection. False to disable endpoint
| detection.
| rule1_min_trailing_silence:
| Used only when enable_endpoint_detection is True. If the duration
| of trailing silence in seconds is larger than this value, we assume
| an endpoint is detected.
| rule2_min_trailing_silence:
| Used only when enable_endpoint_detection is True. If we have decoded
| something that is nonsilence and if the duration of trailing silence
| in seconds is larger than this value, we assume an endpoint is
| detected.
| rule3_min_utterance_length:
| Used only when enable_endpoint_detection is True. If the utterance
| length in seconds is larger than this value, we assume an endpoint
| is detected.
| decoding_method:
| Valid values are greedy_search, modified_beam_search.
| max_active_paths:
| Use only when decoding_method is modified_beam_search. It specifies
| the maximum number of active paths during beam search.
| blank_penalty:
| The penalty applied on blank symbol during decoding.
| hotwords_file:
| The file containing hotwords, one words/phrases per line, and for each
| phrase the bpe/cjkchar are separated by a space.
| hotwords_score:
| The hotword score of each token for biasing word/phrase. Used only if
| hotwords_file is given with modified_beam_search as decoding method.
| provider:
| onnxruntime execution providers. Valid values are: cpu, cuda, coreml.
| model_type:
| Online transducer model type. Valid values are: conformer, lstm,
| zipformer, zipformer2. All other values lead to loading the model twice.
|
| from_wenet_ctc(tokens: str, model: str, chunk_size: int = 16, num_left_chunks: int = 4, num_threads: int = 2, sample_rate: float = 16000, feature_dim: int = 80, enable_endpoint_detection: bool = False, rule1_min_trailing_silence: float = 2.4, rule2_min_trailing_silence: float = 1.2, rule3_min_utterance_length: float = 20.0, decoding_method: str = 'greedy_search', provider: str = 'cpu') from builtins.type
| Please refer to
| `<https://k2-fsa.github.io/sherpa/onnx/pretrained_models/wenet/index.html>`_
| to download pre-trained models for different languages, e.g., Chinese,
| English, etc.
|
| Args:
| tokens:
| Path to ``tokens.txt``. Each line in ``tokens.txt`` contains two
| columns::
|
| symbol integer_id
|
| model:
| Path to ``model.onnx``.
| chunk_size:
| The --chunk-size parameter from WeNet.
| num_left_chunks:
| The --num-left-chunks parameter from WeNet.
| num_threads:
| Number of threads for neural network computation.
| sample_rate:
| Sample rate of the training data used to train the model.
| feature_dim:
| Dimension of the feature used to train the model.
| enable_endpoint_detection:
from sherpa-onnx.
The code you wrote:
    recognizer = sherpa_onnx.OnlineRecognizer(
        tokens=args.tokens,
        encoder=args.encoder,
        decoder=args.decoder,
        joiner=args.joiner,
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method=args.decoding_method,
    )
    return recognizer
Where did this come from?
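For comparison, the help output above lists only class-method constructors such as `from_transducer`, not a keyword constructor. A minimal sketch of the equivalent call follows, using only keyword names that appear in that `from_transducer` signature. The helper names `transducer_kwargs` and `create_recognizer` are hypothetical, and the values mirror the ones in the script rather than any recommended settings.

```python
def transducer_kwargs(tokens, encoder, decoder, joiner,
                      decoding_method="greedy_search"):
    """Collect the keyword arguments for OnlineRecognizer.from_transducer.

    Hypothetical helper; every key appears in the help output shown earlier.
    """
    return {
        "tokens": tokens,
        "encoder": encoder,
        "decoder": decoder,
        "joiner": joiner,
        "num_threads": 1,
        "sample_rate": 16000,
        "feature_dim": 80,
        "decoding_method": decoding_method,
    }


def create_recognizer(**paths):
    # Imported lazily so transducer_kwargs can be used without sherpa-onnx installed.
    import sherpa_onnx

    return sherpa_onnx.OnlineRecognizer.from_transducer(**transducer_kwargs(**paths))
```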
from sherpa-onnx.
This is the latest code.
from sherpa-onnx.
OK, I will test with the new code. The copy I have is probably from last year.
from sherpa-onnx.
It still seems to fail with an error:
    recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
  File "C:\Users\loong\.conda\envs\nlp\Lib\site-packages\sherpa_onnx\online_recognizer.py", line 181, in from_transducer
    self.recognizer = _Recognizer(recognizer_config)
RuntimeError: Failed to load model because protobuf parsing failed.
from sherpa-onnx.
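Is the path to your model files wrong?
https://k2-fsa.github.io/sherpa/onnx/python/real-time-speech-recongition-from-a-microphone.html
This is the detailed documentation. Could you take a look?
Please make sure that
- you have downloaded the model
- you have given the correct paths to the model files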
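The two checks above can be extended with a quick look at the files themselves. The sketch below is an assumption about likely causes, not a statement of what sherpa-onnx checks: it flags a file that is missing, empty, or a Git LFS pointer (a small text file that begins with `version https://git-lfs.github.com/spec/v1`) instead of the real model. The name `check_model_file` is hypothetical, and a result of "ok" only means that none of these three problems was found, not that the ONNX file is valid.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_model_file(path: str) -> str:
    """Return 'missing', 'empty', 'git-lfs pointer', or 'ok' for a model file.

    Hypothetical helper; it does not parse the ONNX protobuf itself.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    with p.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        return "git-lfs pointer"
    return "ok"
```

Running it over the encoder, decoder, joiner, and tokens paths before creating the recognizer narrows down which file, if any, needs to be downloaded again.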
from sherpa-onnx.
Right, the files are probably corrupted. I downloaded them with huggingface-cli, and the downloaded files seem to have some problems.
from sherpa-onnx.
https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
from sherpa-onnx.
OK, got it. Thanks.
from sherpa-onnx.