
Comments (12)

csukuangfj avatar csukuangfj commented on May 29, 2024

Are you confusing sherpa-onnx with sherpa-ncnn?

Have you run

pip install sherpa-onnx

from sherpa-onnx.

lonngxiang avatar lonngxiang commented on May 29, 2024

Are you confusing sherpa-onnx with sherpa-ncnn?

Have you run

pip install sherpa-onnx

Yes, it is installed: Installing collected packages: sherpa-onnx
Successfully installed sherpa-onnx-1.9.10

Running on Windows:

python .\speech-recognition-from-microphone-onnx.py  --tokens=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt --encoder=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.onnx   --decoder=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx --joiner=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.onnx
#!/usr/bin/env python3

# Real-time speech recognition from a microphone with sherpa-onnx Python API
#
# Please refer to
# https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
# to download pre-trained models

import argparse
import sys
from pathlib import Path

try:
    import sounddevice as sd
except ImportError:
    print("Please install sounddevice first. You can use")
    print()
    print("  pip install sounddevice")
    print()
    print("to install it")
    sys.exit(-1)

import sherpa_onnx




def assert_file_exists(filename: str):
    assert Path(filename).is_file(), (
        f"{filename} does not exist!\n"
        "Please refer to "
        "https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html to download it"
    )


def get_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument(
        "--tokens",
        type=str,
        help="Path to tokens.txt",
    )

    parser.add_argument(
        "--encoder",
        type=str,
        help="Path to the encoder model",
    )

    parser.add_argument(
        "--decoder",
        type=str,
        help="Path to the decoder model",
    )

    parser.add_argument(
        "--joiner",
        type=str,
        help="Path to the joiner model",
    )

    parser.add_argument(
        "--decoding-method",
        type=str,
        default="greedy_search",
        help="Valid values are greedy_search and modified_beam_search",
    )

    return parser.parse_args()


def create_recognizer():
    args = get_args()
    assert_file_exists(args.encoder)
    assert_file_exists(args.decoder)
    assert_file_exists(args.joiner)
    assert_file_exists(args.tokens)
    # Please replace the model files if needed.
    # See https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
    # for download links.
    print(args.tokens,
        args.encoder,
        args.decoder,
        args.joiner)
    recognizer = sherpa_onnx.OnlineRecognizer(
        tokens=args.tokens,
        encoder=args.encoder,
        decoder=args.decoder,
        joiner=args.joiner,
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method=args.decoding_method,
    )
    return recognizer


def main():
    recognizer = create_recognizer()
    print("Started! Please speak")

    # The model is using 16 kHz, we use 48 kHz here to demonstrate that
    # sherpa-onnx will do resampling inside.
    sample_rate = 48000
    samples_per_read = int(0.1 * sample_rate)  # 0.1 second = 100 ms
    last_result = ""
    stream = recognizer.create_stream()
    # last_result = ""
    i=0
    with sd.InputStream(channels=1, dtype="float32", samplerate=sample_rate) as s:
        while True:
            samples, _ = s.read(samples_per_read)  # a blocking read
            samples = samples.reshape(-1)
            stream.accept_waveform(sample_rate, samples)
            while recognizer.is_ready(stream):
                recognizer.decode_stream(stream)
            result = recognizer.get_result(stream)
            # if last_result != result:
            #     last_result = result
            #     print("\r{}".format(result), end="", flush=True)

            if last_result != result:
                if i==0:
                    print("{}".format(result),end='')
                    last_result = result
                    i=i+1
                else:
                    last_result_len=len(last_result)
                    
                    new_word = result[last_result_len:]
                    # print(last_result,result,new_word)
                    print("{}".format(new_word),end='', flush=True)
                    last_result = result


if __name__ == "__main__":
    devices = sd.query_devices()
    print(devices)
    default_input_device_idx = sd.default.device[0]
    print(f'Use default device: {devices[default_input_device_idx]["name"]}')

    try:
        main()
    except KeyboardInterrupt:
        print("\nCaught Ctrl + C. Exiting")
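
The printing loop in the script above slices the new result at `len(last_result)`, which only works while each result extends the previous one. A minimal sketch of the same idea with that case made explicit; the helper name `new_suffix` is hypothetical, and the fallback assumes the recognizer may revise earlier text (for example under `modified_beam_search`), which the thread does not confirm:

```python
def new_suffix(previous: str, current: str) -> str:
    """Return the text to print so that the terminal ends up showing `current`."""
    if current.startswith(previous):
        # Normal case: the result only grew, so print just the added part.
        return current[len(previous):]
    # Assumed case: earlier text changed, so return to the line start and
    # reprint everything (leftover characters from a longer line are not erased).
    return "\r" + current
```

In the loop this would replace the `i == 0` branch as well, since `new_suffix("", result)` returns the whole first result.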

from sherpa-onnx.

csukuangfj avatar csukuangfj commented on May 29, 2024
import sherpa_onnx
print(sherpa_onnx.__file__)
print(help(sherpa_onnx.OnlineRecognizer))

What do these lines output?

from sherpa-onnx.

lonngxiang avatar lonngxiang commented on May 29, 2024
```python
>>> import sherpa_onnx
>>> print(sherpa_onnx.__file__)
C:\Users\loong\.conda\envs\nlp\Lib\site-packages\sherpa_onnx\__init__.py
>>> print(help(sherpa_onnx.OnlineRecognizer))
Help on class OnlineRecognizer in module sherpa_onnx.online_recognizer:

class OnlineRecognizer(builtins.object)
 |  A class for streaming speech recognition.
 |
 |  Please refer to the following files for usages
 |   - https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/python/tests/test_online_recognizer.py
 |   - https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/online-decode-files.py
 |
 |  Methods defined here:
 |
 |  create_stream(self, hotwords: Optional[str] = None)
 |
 |  decode_stream(self, s: _sherpa_onnx.OnlineStream)
 |
 |  decode_streams(self, ss: List[_sherpa_onnx.OnlineStream])
 |
 |  get_result(self, s: _sherpa_onnx.OnlineStream) -> str
 |
 |  is_endpoint(self, s: _sherpa_onnx.OnlineStream) -> bool
 |
 |  is_ready(self, s: _sherpa_onnx.OnlineStream) -> bool
 |
 |  reset(self, s: _sherpa_onnx.OnlineStream) -> bool
 |
 |  timestamps(self, s: _sherpa_onnx.OnlineStream) -> List[float]
 |
 |  tokens(self, s: _sherpa_onnx.OnlineStream) -> List[str]
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  from_paraformer(tokens: str, encoder: str, decoder: str, num_threads: int = 2, sample_rate: float = 16000, feature_dim: int = 80, enable_endpoint_detection: bool = False, rule1_min_trailing_silence: float = 2.4, rule2_min_trailing_silence: float = 1.2, rule3_min_utterance_length: float = 20.0, decoding_method: str = 'greedy_search', provider: str = 'cpu') from builtins.type
 |      Please refer to
 |      `<https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html>`_
 |      to download pre-trained models for different languages, e.g., Chinese,
 |      English, etc.
 |
 |      Args:
 |        tokens:
 |          Path to ``tokens.txt``. Each line in ``tokens.txt`` contains two
 |          columns::
 |
 |              symbol integer_id
 |
 |        encoder:
 |          Path to ``encoder.onnx``.
 |        decoder:
 |          Path to ``decoder.onnx``.
 |        num_threads:
 |          Number of threads for neural network computation.
 |        sample_rate:
 |          Sample rate of the training data used to train the model.
 |        feature_dim:
 |          Dimension of the feature used to train the model.
 |        enable_endpoint_detection:
 |          True to enable endpoint detection. False to disable endpoint
 |          detection.
 |        rule1_min_trailing_silence:
 |          Used only when enable_endpoint_detection is True. If the duration
 |          of trailing silence in seconds is larger than this value, we assume
 |          an endpoint is detected.
 |        rule2_min_trailing_silence:
 |          Used only when enable_endpoint_detection is True. If we have decoded
 |          something that is nonsilence and if the duration of trailing silence
 |          in seconds is larger than this value, we assume an endpoint is
 |          detected.
 |        rule3_min_utterance_length:
 |          Used only when enable_endpoint_detection is True. If the utterance
 |          length in seconds is larger than this value, we assume an endpoint
 |          is detected.
 |        decoding_method:
 |          The only valid value is greedy_search.
 |        provider:
 |          onnxruntime execution providers. Valid values are: cpu, cuda, coreml.
 |
 |  from_transducer(tokens: str, encoder: str, decoder: str, joiner: str, num_threads: int = 2, sample_rate: float = 16000, feature_dim: int = 80, enable_endpoint_detection: bool = False, rule1_min_trailing_silence: float = 2.4, rule2_min_trailing_silence: float = 1.2, rule3_min_utterance_length: float = 20.0, decoding_method: str = 'greedy_search', max_active_paths: int = 4, hotwords_score: float = 1.5, blank_penalty: float = 0.0, hotwords_file: str = '', provider: str = 'cpu', model_type: str = '', lm: str = '', lm_scale: float = 0.1) from builtins.type
 |      Please refer to
 |      `<https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html>`_
 |      to download pre-trained models for different languages, e.g., Chinese,
 |      English, etc.
 |
 |      Args:
 |        tokens:
 |          Path to ``tokens.txt``. Each line in ``tokens.txt`` contains two
 |          columns::
 |
 |              symbol integer_id
 |
 |        encoder:
 |          Path to ``encoder.onnx``.
 |        decoder:
 |          Path to ``decoder.onnx``.
 |        joiner:
 |          Path to ``joiner.onnx``.
 |        num_threads:
 |          Number of threads for neural network computation.
 |        sample_rate:
 |          Sample rate of the training data used to train the model.
 |        feature_dim:
 |          Dimension of the feature used to train the model.
 |        enable_endpoint_detection:
 |          True to enable endpoint detection. False to disable endpoint
 |          detection.
 |        rule1_min_trailing_silence:
 |          Used only when enable_endpoint_detection is True. If the duration
 |          of trailing silence in seconds is larger than this value, we assume
 |          an endpoint is detected.
 |        rule2_min_trailing_silence:
 |          Used only when enable_endpoint_detection is True. If we have decoded
 |          something that is nonsilence and if the duration of trailing silence
 |          in seconds is larger than this value, we assume an endpoint is
 |          detected.
 |        rule3_min_utterance_length:
 |          Used only when enable_endpoint_detection is True. If the utterance
 |          length in seconds is larger than this value, we assume an endpoint
 |          is detected.
 |        decoding_method:
 |          Valid values are greedy_search, modified_beam_search.
 |        max_active_paths:
 |          Use only when decoding_method is modified_beam_search. It specifies
 |          the maximum number of active paths during beam search.
 |        blank_penalty:
 |          The penalty applied on blank symbol during decoding.
 |        hotwords_file:
 |          The file containing hotwords, one words/phrases per line, and for each
 |          phrase the bpe/cjkchar are separated by a space.
 |        hotwords_score:
 |          The hotword score of each token for biasing word/phrase. Used only if
 |          hotwords_file is given with modified_beam_search as decoding method.
 |        provider:
 |          onnxruntime execution providers. Valid values are: cpu, cuda, coreml.
 |        model_type:
 |          Online transducer model type. Valid values are: conformer, lstm,
 |          zipformer, zipformer2. All other values lead to loading the model twice.
 |
 |  from_wenet_ctc(tokens: str, model: str, chunk_size: int = 16, num_left_chunks: int = 4, num_threads: int = 2, sample_rate: float = 16000, feature_dim: int = 80, enable_endpoint_detection: bool = False, rule1_min_trailing_silence: float = 2.4, rule2_min_trailing_silence: float = 1.2, rule3_min_utterance_length: float = 20.0, decoding_method: str = 'greedy_search', provider: str = 'cpu') from builtins.type
 |      Please refer to
 |      `<https://k2-fsa.github.io/sherpa/onnx/pretrained_models/wenet/index.html>`_
 |      to download pre-trained models for different languages, e.g., Chinese,
 |      English, etc.
 |
 |      Args:
 |        tokens:
 |          Path to ``tokens.txt``. Each line in ``tokens.txt`` contains two
 |          columns::
 |
 |              symbol integer_id
 |
 |        model:
 |          Path to ``model.onnx``.
 |        chunk_size:
 |          The --chunk-size parameter from WeNet.
 |        num_left_chunks:
 |          The --num-left-chunks parameter from WeNet.
 |        num_threads:
 |          Number of threads for neural network computation.
 |        sample_rate:
 |          Sample rate of the training data used to train the model.
 |        feature_dim:
 |          Dimension of the feature used to train the model.
 |        enable_endpoint_detection:
```

from sherpa-onnx.

csukuangfj avatar csukuangfj commented on May 29, 2024

The code you wrote:

    recognizer = sherpa_onnx.OnlineRecognizer(
        tokens=args.tokens,
        encoder=args.encoder,
        decoder=args.decoder,
        joiner=args.joiner,
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method=args.decoding_method,
    )
    return recognizer

Where did this come from?

from sherpa-onnx.

csukuangfj avatar csukuangfj commented on May 29, 2024

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speech-recognition-from-microphone.py

This is the latest code.
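
For readers following along: the `help()` output earlier in the thread lists class-method constructors (`from_transducer`, `from_paraformer`, `from_wenet_ctc`), so the direct `OnlineRecognizer(tokens=..., ...)` call is the part that needs replacing. Below is a minimal sketch of the equivalent call, using only parameter names that appear in that help output. The helper names `transducer_kwargs` and `create_recognizer` are hypothetical, and `sherpa_onnx` is imported lazily so the argument mapping can be inspected without the package installed.

```python
from typing import Any, Dict


def transducer_kwargs(tokens: str, encoder: str, decoder: str, joiner: str,
                      decoding_method: str = "greedy_search") -> Dict[str, Any]:
    """Collect the keyword arguments for OnlineRecognizer.from_transducer."""
    return {
        "tokens": tokens,
        "encoder": encoder,
        "decoder": decoder,
        "joiner": joiner,
        "num_threads": 1,
        "sample_rate": 16000,
        "feature_dim": 80,
        "decoding_method": decoding_method,
    }


def create_recognizer(**model_paths: str):
    """Build a streaming transducer recognizer via the factory method."""
    import sherpa_onnx  # imported here so transducer_kwargs works without it

    return sherpa_onnx.OnlineRecognizer.from_transducer(
        **transducer_kwargs(**model_paths)
    )
```

The values 1, 16000 and 80 are carried over from the original script; the linked example remains the authoritative version.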

from sherpa-onnx.

lonngxiang avatar lonngxiang commented on May 29, 2024

The code you wrote:

    recognizer = sherpa_onnx.OnlineRecognizer(
        tokens=args.tokens,
        encoder=args.encoder,
        decoder=args.decoder,
        joiner=args.joiner,
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method=args.decoding_method,
    )
    return recognizer

Where did this come from?

OK, I'll test with the new one. This code is probably from last year.

from sherpa-onnx.

lonngxiang avatar lonngxiang commented on May 29, 2024

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speech-recognition-from-microphone.py

This is the latest code.

It still seems to fail:

recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
  File "C:\Users\loong\.conda\envs\nlp\Lib\site-packages\sherpa_onnx\online_recognizer.py", line 181, in from_transducer
    self.recognizer = _Recognizer(recognizer_config)
RuntimeError: Failed to load model because protobuf parsing failed.

from sherpa-onnx.

csukuangfj avatar csukuangfj commented on May 29, 2024

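Is your model file path wrong?

https://k2-fsa.github.io/sherpa/onnx/python/real-time-speech-recongition-from-a-microphone.html

This is the detailed documentation. Could you take a look?

Please make sure that

  1. you have downloaded the model
  2. you have given the correct model file paths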
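
The two checks above can be partly automated before the recognizer is created. The sketch below is a heuristic only: it flags missing files, `.onnx` files that are suspiciously small, and Git LFS pointer stubs (small text files that begin with `version https://git-lfs`). The function name `find_suspect_files` and the `min_bytes` threshold are hypothetical, and whether any of these cases is what caused the `protobuf parsing failed` error here is not stated in the thread. A file that passes is not guaranteed to be a valid ONNX model.

```python
from pathlib import Path
from typing import List

# First bytes of a Git LFS pointer file (a text stub, not the real model).
LFS_PREFIX = b"version https://git-lfs"


def find_suspect_files(paths: List[str], min_bytes: int = 1024) -> List[str]:
    """Return one human-readable problem per file that looks wrong."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file():
            problems.append(f"{p}: missing")
            continue
        with p.open("rb") as f:
            head = f.read(len(LFS_PREFIX))
        if head == LFS_PREFIX:
            problems.append(f"{p}: Git LFS pointer, not the model itself")
        elif p.suffix == ".onnx" and p.stat().st_size < min_bytes:
            # Size check applies to models only; tokens.txt can be small.
            problems.append(f"{p}: only {p.stat().st_size} bytes")
    return problems
```

Calling it with the four paths passed as `--tokens`, `--encoder`, `--decoder` and `--joiner` returns an empty list when nothing looks obviously wrong.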

from sherpa-onnx.

lonngxiang avatar lonngxiang commented on May 29, 2024

Yes, the files are probably corrupted. I downloaded them with huggingface-cli, and the downloaded files seem to have some problem.

from sherpa-onnx.

csukuangfj avatar csukuangfj commented on May 29, 2024

https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models

You can download the models from here. Could you take a look? Just use wget.
[Screenshot 2024-02-21 at 12 08 38]

from sherpa-onnx.

lonngxiang avatar lonngxiang commented on May 29, 2024

https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models

You can download the models from here. Could you take a look? Just use wget. [Screenshot 2024-02-21 at 12 08 38]

OK, got it. Thanks.

from sherpa-onnx.
