espnet / espnet_model_zoo
ESPnet Model Zoo
License: Apache License 2.0
Hi, thanks for developing this very useful, user-friendly library. I tried to run STT with the following code, but it could not finish due to a resource error: watching with the htop command, I saw memory usage grow to 200 GB.
#!/usr/bin/env python
import soundfile as sf
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text

d = ModelDownloader()
info = d.download_and_unpack(task='asr', corpus='csj')
speech2text = Speech2Text(**info)

wav, fs = sf.read("./A02F0116.wav")  # CSJ wav file
text, token, *_ = speech2text(wav)[0]
print(text)
Hi all, I noticed that these two model tags link to the same download. Is there a pre-trained ljspeech vits model with space/pauses?
kan-bayashi/ljspeech_tts_train_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave
kan-bayashi/ljspeech_vits
Hi, I recently ran a successful evaluation for Japanese ASR using the pretrained CSJ transformer-based model. I understand that the model zoo now defaults to huggingface for the models, and the published pretrained conformer model is no longer on the list. How do I change the run.sh arguments to accommodate downloading the conformer model from zenodo?
To clarify, I can download the files using the direct link on zenodo via python, but I'm clarifying whether there's a new way to download via the recipe's run.sh script or the CLI commands.
It would be nice to have an option for the downloader so that it does not pull from the huggingface git remote when the model is already downloaded and cached on disk. Right now, a query to huggingface is made even if the model is cached. Also, if there is no internet connection or huggingface is unreachable, there appears to be no timeout and the downloader hangs.
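A minimal sketch of the requested offline-first behaviour (the helper name and callback shape below are hypothetical, not the actual espnet_model_zoo API):

```python
import os
import tempfile

def cached_or_download(cache_dir: str, name: str, fetch) -> str:
    # Hypothetical helper illustrating the feature request: `fetch` is a
    # callback that performs the actual (ideally timeout-bounded) download.
    local = os.path.join(cache_dir, name)
    if os.path.exists(local):
        return local      # cache hit: no query to huggingface at all
    return fetch(local)   # cache miss: go to the network

# Usage: a cache hit must never invoke the network callback.
cache = tempfile.mkdtemp()
model = os.path.join(cache, "model.zip")
open(model, "w").close()

def fetch(path):
    raise RuntimeError("network should not be touched on a cache hit")

print(cached_or_download(cache, "model.zip", fetch) == model)  # True
```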
Hello,
I've been trying to use the chime4 pretrained model for ASR.
Unfortunately, I can't load the model.
Using the model name doesn't work; the model isn't found.
Using the direct link to the zenodo zip file makes it possible to download the model, but the init method raises the following error:
__init__() got an unexpected keyword argument 'ignore_nan_grad'
Is this because the model was trained with an outdated version of ESPnet? If so, what is the easiest way for me to use this model?
I'm trying to figure out how to use models from here in the espnet2 colab demo: https://colab.research.google.com/github/espnet/notebook/blob/master/espnet2_tts_realtime_demo.ipynb
I'm using these values for tag, vocoder_tag, etc.:
fs, lang = 24000, "English"
tag = "kan-bayashi/vctk_tts_train_gst_tacotron2_raw_phn_tacotron_g2p_en_no_space_train.loss.best"
vocoder_tag = "ljspeech_multi_band_melgan.v2"
And I'm trying to get it to run by setting a random value for speech:
x = "This is my favorite sentence!"
speech = torch.randn(512, 80)  # this is wrong

# synthesis
with torch.no_grad():
    start = time.time()
    wav, c, *_ = text2speech(x, speech=speech)
    wav = vocoder.inference(c)
rtf = (time.time() - start) / (len(wav) / fs)
print(f"RTF = {rtf:5f}")

# let us listen to generated samples
from IPython.display import display, Audio
display(Audio(wav.view(-1).cpu().numpy(), rate=fs))
But I get the error:
RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (1024, 1024) at dimension 2 of input (1,.,.) = ...
I feel like this is because the shape of speech is wrong. Are there any examples of how to compute a proper value for it?
I see in the code that it expects shape (Lmax, idim), and I can see that odim is 80 when I look at the value of text2speech.tts.odim.
On the other hand, perhaps I've misunderstood something obvious about how to approach this. :)
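For what it's worth, the speech argument here is a reference feature matrix of shape (Lmax, odim), i.e. a log-mel spectrogram of a real utterance, not random noise. A rough numpy-only sketch of computing features with that shape from a reference waveform (the FFT size, hop length, and mel settings below are illustrative assumptions, not the exact frontend of this model):

```python
import numpy as np

def log_mel_spectrogram(wav, fs=24000, n_fft=1024, hop=256, n_mels=80):
    # Frame the padded signal and take the magnitude spectrum.
    pad = n_fft // 2
    x = np.pad(wav, pad, mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))

    # Triangular mel filterbank, 0 Hz .. Nyquist.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # (Lmax, n_mels) log-mel features
    return np.log10(np.maximum(spec @ fb.T, 1e-10))

ref_wav = np.random.randn(24000).astype(np.float32)  # 1 s of dummy audio
feats = log_mel_spectrogram(ref_wav)
print(feats.shape)  # (94, 80)
```

In practice the features would need to match the frontend configuration the model was trained with (fs, n_fft, hop length, mel scale), which is recorded in the model's training config.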
Hi everybody,
I am trying to test some speech enhancement models but I cannot find any in the table.csv.
Are they somewhere?
Hello,
I am pretty new to ESPnet and I am attempting to perform inference using the vctk_tts_train_xvector_transformer_raw_phn_tacotron_g2p_en_no_space_train.loss.ave
pretrained model.
Steps taken: I used speechbrain/spkrec-xvect-voxceleb to create speaker embeddings for specific voices. The problem is that the generated audios are extremely short (0.125 or 0.013 seconds) and sound noisy.
I am using the Python API. I only provided text and spembs fields when calling the Text2Speech class. I also have successfully used the Python API with other pretrained models that do not require speaker embeddings. I am unsure if there are additional arguments or steps required when using this specific model with speaker embeddings.
If more information is needed, I am happy to provide it. Has anyone experienced a similar issue or can provide guidance on how to resolve this?
Thank you for your assistance,
TypeError: __init__() got an unexpected keyword argument 'train_config'
import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text

d = ModelDownloader()
speech2text = Speech2Text(
    **d.download_and_unpack("Emiru Tsunoo/aishell_asr_train_asr_streaming_transformer_raw_zh_char_sp_valid.acc.ave"),
    # Decoding parameters are not included in the model file
    maxlenratio=0.0,
    minlenratio=0.0,
    beam_size=20,
    ctc_weight=0.3,
    lm_weight=0.5,
    penalty=0.0,
    nbest=1,
)
Error message:
Traceback (most recent call last):
  File "test_asr.py", line 5, in <module>
    speech2text = Speech2Text(
  File "/home/ming-y/anaconda3/envs/espnet/lib/python3.8/site-packages/espnet2/bin/asr_inference.py", line 73, in __init__
    asr_model, asr_train_args = ASRTask.build_model_from_file(
  File "/home/ming-y/anaconda3/envs/espnet/lib/python3.8/site-packages/espnet2/tasks/abs_task.py", line 1834, in build_model_from_file
    model = cls.build_model(args)
  File "/home/ming-y/anaconda3/envs/espnet/lib/python3.8/site-packages/espnet2/tasks/asr.py", line 388, in build_model
    encoder_class = encoder_choices.get_class(args.encoder)
  File "/home/ming-y/anaconda3/envs/espnet/lib/python3.8/site-packages/espnet2/train/class_choices.py", line 75, in get_class
    raise ValueError(
ValueError: --encoder must be one of ('conformer', 'transformer', 'vgg_rnn', 'rnn'): --encoder contextual_block_transformer
I don't know why.
https://github.com/espnet/espnet/tree/master/egs2/libritts/tts1/conf/tuning
There are many model configs in libritts, but the only models that can be downloaded are transformer and fastspeech2. T T
When I try to run inference, something goes wrong related to the pretrained model. Where should I place the ".lock" file?
FileNotFoundError: [Errno 2] No such file or directory: 'D:\anaconda3\envs\style\lib\site-packages\espnet_model_zoo\79ec90b8bd3dbaba9b8d75d7f7e53392\asr_train_asr_conformer5_raw_bpe5000_frontend_confn_fft512_frontend_confhop_length256_scheduler_confwarmup_steps25000_batch_bins140000000_optim_conflr0.0015_initnone_sp_valid.acc.ave.zip.lock'
Hi,
Thanks for the work. I am trying to use the pre-trained model, but I don't know how to get the decoding score for the corresponding decoding results.
nbests = speech2text(speech)
text, *_ = nbests[0]
print(text)
The code above only prints text. I would like to get decoding confidence as well.
I checked speech2text class.
for hyp in nbest_hyps:
    assert isinstance(hyp, Hypothesis), type(hyp)

    # remove sos/eos and get results
    token_int = hyp.yseq[1:-1].tolist()

    # remove blank symbol id, which is assumed to be 0
    token_int = list(filter(lambda x: x != 0, token_int))

    # Change integer-ids to tokens
    token = self.converter.ids2tokens(token_int)

    if self.tokenizer is not None:
        text = self.tokenizer.tokens2text(token)
    else:
        text = None
    results.append((text, token, token_int, hyp))

assert check_return_type(results)
return results
From the code above I conjecture that the confidence should be obtained from the "hyp", but it is not clear to me how
to parse "hyp" to get the score.
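Judging from the snippet above, each entry of results is a (text, token, token_int, hyp) tuple, and in recent espnet versions the Hypothesis NamedTuple carries the accumulated beam-search log-probability in a score field, with per-scorer contributions in scores. A self-contained sketch using a stand-in class (the field names are assumptions about espnet.nets.beam_search.Hypothesis, not verified against every release):

```python
from typing import Dict, List, NamedTuple

# Stand-in mimicking espnet's Hypothesis NamedTuple; in the real library
# `yseq` is a torch tensor and `score` may be a 0-dim tensor.
class Hypothesis(NamedTuple):
    yseq: List[int]
    score: float                # total log-probability of the hypothesis
    scores: Dict[str, float]    # per-scorer partial scores

def decode_confidence(nbest):
    # nbest entries are (text, token, token_int, hyp), per the loop above.
    text, token, token_int, hyp = nbest[0]
    return hyp.score, hyp.scores

nbest = [
    ("hello world", ["hello", "world"], [5, 9],
     Hypothesis(yseq=[1, 5, 9, 2], score=-3.2,
                scores={"decoder": -2.0, "ctc": -0.7, "lm": -0.5})),
]
total, parts = decode_confidence(nbest)
print(total, parts["ctc"])  # -3.2 -0.7
```

Note that score is a log-probability, so values closer to 0 indicate higher confidence; normalizing by hypothesis length is a common way to compare utterances of different durations.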
I can't install espnet_model_zoo because of the sentencepiece build problem below. Which version of sentencepiece does this use? I can't figure it out, as setup.py doesn't pin a version.
I have successfully run pip install sentencepiece, so it is installed. All I can think is that this requires a different version.
pip install espnet_model_zoo
Collecting espnet_model_zoo
Using cached espnet_model_zoo-0.1.7-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: pandas in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet_model_zoo) (2.0.3)
Requirement already satisfied: requests in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet_model_zoo) (2.31.0)
Requirement already satisfied: tqdm in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet_model_zoo) (4.66.1)
Requirement already satisfied: numpy in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet_model_zoo) (1.24.3)
Collecting espnet (from espnet_model_zoo)
Using cached espnet-202402-py3-none-any.whl.metadata (68 kB)
Requirement already satisfied: huggingface-hub in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet_model_zoo) (0.22.2)
Requirement already satisfied: filelock in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet_model_zoo) (3.12.2)
Requirement already satisfied: setuptools>=38.5.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (69.5.1)
Requirement already satisfied: packaging in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (23.1)
Collecting configargparse>=1.2.1 (from espnet->espnet_model_zoo)
Using cached ConfigArgParse-1.7-py3-none-any.whl.metadata (23 kB)
Collecting typeguard==2.13.3 (from espnet->espnet_model_zoo)
Using cached typeguard-2.13.3-py3-none-any.whl.metadata (3.6 kB)
Collecting humanfriendly (from espnet->espnet_model_zoo)
Using cached humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Requirement already satisfied: scipy>=1.4.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (1.11.2)
Collecting librosa==0.9.2 (from espnet->espnet_model_zoo)
Using cached librosa-0.9.2-py3-none-any.whl.metadata (8.2 kB)
Requirement already satisfied: jamo==0.4.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (0.4.1)
Requirement already satisfied: PyYAML>=5.1.2 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (6.0.1)
Requirement already satisfied: soundfile>=0.10.2 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (0.12.1)
Collecting h5py>=2.10.0 (from espnet->espnet_model_zoo)
Using cached h5py-3.11.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (2.5 kB)
Collecting kaldiio>=2.18.0 (from espnet->espnet_model_zoo)
Using cached kaldiio-2.18.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: torch>=1.11.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (2.1.0)
Collecting torch-complex (from espnet->espnet_model_zoo)
Using cached torch_complex-0.4.3-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: nltk>=3.4.5 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (3.8.1)
Collecting numpy (from espnet_model_zoo)
Using cached numpy-1.23.5-cp311-cp311-macosx_11_0_arm64.whl.metadata (2.3 kB)
Requirement already satisfied: protobuf in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (4.24.1)
Collecting hydra-core (from espnet->espnet_model_zoo)
Using cached hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting opt-einsum (from espnet->espnet_model_zoo)
Using cached opt_einsum-3.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting sentencepiece==0.1.97 (from espnet->espnet_model_zoo)
Using cached sentencepiece-0.1.97.tar.gz (524 kB)
Preparing metadata (setup.py) ... done
Collecting ctc-segmentation>=1.6.6 (from espnet->espnet_model_zoo)
Using cached ctc_segmentation-1.7.4-cp311-cp311-macosx_13_0_arm64.whl
Collecting pyworld>=0.3.4 (from espnet->espnet_model_zoo)
Using cached pyworld-0.3.4-cp311-cp311-macosx_13_0_arm64.whl
Collecting pypinyin<=0.44.0 (from espnet->espnet_model_zoo)
Using cached pypinyin-0.44.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting espnet-tts-frontend (from espnet->espnet_model_zoo)
Using cached espnet_tts_frontend-0.0.3-py3-none-any.whl.metadata (3.4 kB)
Collecting ci-sdr (from espnet->espnet_model_zoo)
Using cached ci_sdr-0.0.2-py3-none-any.whl
Collecting fast-bss-eval==0.1.3 (from espnet->espnet_model_zoo)
Using cached fast_bss_eval-0.1.3-py3-none-any.whl
Requirement already satisfied: asteroid-filterbanks==0.4.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet->espnet_model_zoo) (0.4.0)
Collecting editdistance (from espnet->espnet_model_zoo)
Using cached editdistance-0.8.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (3.9 kB)
Collecting importlib-metadata<5.0 (from espnet->espnet_model_zoo)
Using cached importlib_metadata-4.13.0-py3-none-any.whl.metadata (4.9 kB)
Requirement already satisfied: typing-extensions in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from asteroid-filterbanks==0.4.0->espnet->espnet_model_zoo) (4.9.0)
Requirement already satisfied: audioread>=2.1.9 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from librosa==0.9.2->espnet->espnet_model_zoo) (3.0.0)
Requirement already satisfied: scikit-learn>=0.19.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from librosa==0.9.2->espnet->espnet_model_zoo) (1.3.0)
Requirement already satisfied: joblib>=0.14 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from librosa==0.9.2->espnet->espnet_model_zoo) (1.3.2)
Requirement already satisfied: decorator>=4.0.10 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from librosa==0.9.2->espnet->espnet_model_zoo) (4.4.2)
Collecting resampy>=0.2.2 (from librosa==0.9.2->espnet->espnet_model_zoo)
Using cached resampy-0.4.3-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: numba>=0.45.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from librosa==0.9.2->espnet->espnet_model_zoo) (0.57.0)
Requirement already satisfied: pooch>=1.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from librosa==0.9.2->espnet->espnet_model_zoo) (1.6.0)
Requirement already satisfied: fsspec>=2023.5.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from huggingface-hub->espnet_model_zoo) (2023.6.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from pandas->espnet_model_zoo) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from pandas->espnet_model_zoo) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from pandas->espnet_model_zoo) (2023.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from requests->espnet_model_zoo) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from requests->espnet_model_zoo) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from requests->espnet_model_zoo) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from requests->espnet_model_zoo) (2023.7.22)
Requirement already satisfied: Cython in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from ctc-segmentation>=1.6.6->espnet->espnet_model_zoo) (0.29.30)
Collecting zipp>=0.5 (from importlib-metadata<5.0->espnet->espnet_model_zoo)
Using cached zipp-3.18.1-py3-none-any.whl.metadata (3.5 kB)
Requirement already satisfied: click in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from nltk>=3.4.5->espnet->espnet_model_zoo) (8.1.7)
Requirement already satisfied: regex>=2021.8.3 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from nltk>=3.4.5->espnet->espnet_model_zoo) (2023.8.8)
Requirement already satisfied: six>=1.5 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->espnet_model_zoo) (1.16.0)
Requirement already satisfied: cffi>=1.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from soundfile>=0.10.2->espnet->espnet_model_zoo) (1.15.1)
Requirement already satisfied: sympy in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from torch>=1.11.0->espnet->espnet_model_zoo) (1.12)
Requirement already satisfied: networkx in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from torch>=1.11.0->espnet->espnet_model_zoo) (2.8.8)
Requirement already satisfied: jinja2 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from torch>=1.11.0->espnet->espnet_model_zoo) (3.1.2)
Requirement already satisfied: einops in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from ci-sdr->espnet->espnet_model_zoo) (0.6.1)
Collecting unidecode>=1.0.22 (from espnet-tts-frontend->espnet->espnet_model_zoo)
Using cached Unidecode-1.3.8-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: inflect>=1.0.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from espnet-tts-frontend->espnet->espnet_model_zoo) (5.6.0)
Collecting jaconv (from espnet-tts-frontend->espnet->espnet_model_zoo)
Using cached jaconv-0.3.4-py3-none-any.whl
Collecting g2p-en (from espnet-tts-frontend->espnet->espnet_model_zoo)
Using cached g2p_en-2.1.0-py3-none-any.whl.metadata (4.5 kB)
Requirement already satisfied: omegaconf<2.4,>=2.2 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from hydra-core->espnet->espnet_model_zoo) (2.3.0)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from hydra-core->espnet->espnet_model_zoo) (4.9.3)
Requirement already satisfied: pycparser in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from cffi>=1.0->soundfile>=0.10.2->espnet->espnet_model_zoo) (2.21)
Requirement already satisfied: llvmlite<0.41,>=0.40.0dev0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from numba>=0.45.1->librosa==0.9.2->espnet->espnet_model_zoo) (0.40.1)
Requirement already satisfied: appdirs>=1.3.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from pooch>=1.0->librosa==0.9.2->espnet->espnet_model_zoo) (1.4.4)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from scikit-learn>=0.19.1->librosa==0.9.2->espnet->espnet_model_zoo) (3.2.0)
Collecting distance>=0.1.3 (from g2p-en->espnet-tts-frontend->espnet->espnet_model_zoo)
Using cached Distance-0.1.3-py3-none-any.whl
Requirement already satisfied: MarkupSafe>=2.0 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from jinja2->torch>=1.11.0->espnet->espnet_model_zoo) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages (from sympy->torch>=1.11.0->espnet->espnet_model_zoo) (1.3.0)
Using cached espnet_model_zoo-0.1.7-py3-none-any.whl (19 kB)
Using cached espnet-202402-py3-none-any.whl (1.8 MB)
Using cached librosa-0.9.2-py3-none-any.whl (214 kB)
Using cached typeguard-2.13.3-py3-none-any.whl (17 kB)
Using cached numpy-1.23.5-cp311-cp311-macosx_11_0_arm64.whl (13.3 MB)
Using cached ConfigArgParse-1.7-py3-none-any.whl (25 kB)
Using cached h5py-3.11.0-cp311-cp311-macosx_11_0_arm64.whl (2.9 MB)
Using cached importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Using cached kaldiio-2.18.0-py3-none-any.whl (28 kB)
Using cached pypinyin-0.44.0-py2.py3-none-any.whl (1.3 MB)
Using cached editdistance-0.8.1-cp311-cp311-macosx_11_0_arm64.whl (79 kB)
Using cached espnet_tts_frontend-0.0.3-py3-none-any.whl (11 kB)
Using cached humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Using cached torch_complex-0.4.3-py3-none-any.whl (9.1 kB)
Using cached resampy-0.4.3-py3-none-any.whl (3.1 MB)
Using cached Unidecode-1.3.8-py3-none-any.whl (235 kB)
Using cached zipp-3.18.1-py3-none-any.whl (8.2 kB)
Using cached g2p_en-2.1.0-py3-none-any.whl (3.1 MB)
Building wheels for collected packages: sentencepiece
Building wheel for sentencepiece (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [88 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-13.4-arm64-cpython-311
creating build/lib.macosx-13.4-arm64-cpython-311/sentencepiece
copying src/sentencepiece/__init__.py -> build/lib.macosx-13.4-arm64-cpython-311/sentencepiece
copying src/sentencepiece/_version.py -> build/lib.macosx-13.4-arm64-cpython-311/sentencepiece
copying src/sentencepiece/sentencepiece_model_pb2.py -> build/lib.macosx-13.4-arm64-cpython-311/sentencepiece
copying src/sentencepiece/sentencepiece_pb2.py -> build/lib.macosx-13.4-arm64-cpython-311/sentencepiece
running build_ext
Package sentencepiece was not found in the pkg-config search path.
Perhaps you should add the directory containing `sentencepiece.pc'
to the PKG_CONFIG_PATH environment variable
No package 'sentencepiece' found
Cloning into 'sentencepiece'...
Note: switching to '58f256cf6f01bb86e6fa634a5cc560de5bd1667d'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
./build_bundled.sh: line 19: cmake: command not found
./build_bundled.sh: line 20: nproc: command not found
./build_bundled.sh: line 20: cmake: command not found
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/var/folders/y8/jgtt0gmx5z54mbjd0zx_hb8m0000gn/T/pip-install-2zqoh_nv/sentencepiece_ad33f7f96f2a4cc08ae00c731de66ee4/setup.py", line 136, in <module>
setup(
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
self.run_command("build")
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 91, in run
_build_ext.run(self)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
self._build_extensions_serial()
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
self.build_extension(ext)
File "/private/var/folders/y8/jgtt0gmx5z54mbjd0zx_hb8m0000gn/T/pip-install-2zqoh_nv/sentencepiece_ad33f7f96f2a4cc08ae00c731de66ee4/setup.py", line 89, in build_extension
subprocess.check_call(['./build_bundled.sh', __version__])
File "/Users/willwade/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./build_bundled.sh', '0.1.97']' returned non-zero exit status 127.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for sentencepiece
Running setup.py clean for sentencepiece
Failed to build sentencepiece
ERROR: Could not build wheels for sentencepiece, which is required to install pyproject.toml-based projects
This is my first time submitting an issue. I apologize for any inadequacies.
I want to add my original TTS model to "espnet_model_zoo", and when I run the following code I get the error message below at the push stage.
Executed code
~/espnet_model_zoo$ git push origin develop
Error messages displayed
remote: Permission to espnet/espnet_model_zoo.git denied to c44128.
fatal: unable to access 'https://github.com/espnet/espnet_model_zoo.git/': The requested URL returned error: 403
I wanted to create a pull request to have it added to "table.csv", but I am told that I do not have write access to the repository.
I am asking this question because I could not solve the problem on my own.
I would appreciate it if you could tell me how to solve this problem.
Thank you for always providing good tools and continuous support.
I have a good model trained on espnet1 and would like to use it in espnet model zoo, can I use the model trained on espnet1 in model zoo?
I am trying to reproduce the same results with espnet2 but so far have not been successful.
Hi everyone,
I am trying to use the ESPnet model zoo with a model I already trained using the regular ESPnet before ESPnet2 came out.
First, I made it work using a downloaded model I chose from table.csv.
I couldn't find documentation on how to use a pretrained model so I contacted @sw005320, who directed me to this section of the asr.sh script.
My plan now is to create a new script that would only run espnet2.bin.pack asr with the following arguments:
--lm_train_config <PATH/TO/lm.yaml>
--lm_file <PATH/TO/rnnlm.model.best>
--asr_train_config <PATH/TO/train.yaml>
--asr_model_file <PATH/TO/model.acc.best>
--option <PATH/TO/train_clean_unigram_${nbpe}.model> # instead of ${bpemodel}
--outpath <PATH/TO/packed_model.zip>
I have a couple of questions:
1. In the asr.sh script it's given --option ${asr_stats_dir}/train/feats_stats.npz, but my ESPnet1 model has a cmvn.ark instead. How can I pass it as an argument to the python script?
2. Are these also needed: ${lm_exp}/perplexity_test/ppl, ${lm_exp}/images, "${asr_exp}"/RESULTS.md, "${asr_exp}"/images?
Thank you very much,
Daniel
I installed espnet_model_zoo and torch successfully but get the following error:
speech2text = Speech2Text.from_pretrained(
AttributeError: type object 'Speech2Text' has no attribute 'from_pretrained'
Hi everyone!
I am trying to use espnet_zoo for a locally trained model.
Everything works fine when I use a pre-trained model from Zenodo's table.csv.
However, when I change the model's path to my local trained model (the zip file generated from ESPnet 2) I get the error:
RuntimeError: /home/ubuntu/espnet/tools/venv/lib/python3.7/site-packages/espnet_model_zoo/0abcf46495c3333043c8e3679ea9a844/exp/<my_models_name>/396epoch.pth is a zip archive (did you mean to use torch.jit.load()?)
Environment:
In this Python GitHub issue, where they get the same error, they say it is a bug, although it comes from a different command.
Has anyone gotten this issue when trying to use a local model with the zoo?
Thank you in advance!
Hello, I am trying to export my ESPnet2 model to Zenodo, but I got the issue pasted below.
The cause is that my status_code is 403 (blocked) here.
What should I do to upload the model?
Thank you !
Traceback (most recent call last):
  File "/jet/home/berrebbi/miniconda3/envs/espnet/bin/espnet_model_zoo_upload", line 8, in <module>
    sys.exit(main())
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/espnet_model_zoo/zenodo_upload.py", line 297, in main
    upload_espnet_model(**kwargs)
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/espnet_model_zoo/zenodo_upload.py", line 230, in upload_espnet_model
    publish=publish,
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/espnet_model_zoo/zenodo_upload.py", line 141, in upload
    r = zenodo.create_deposition()
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/espnet_model_zoo/zenodo_upload.py", line 48, in create_deposition
    raise RuntimeError(r.json()["message"])
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/jet/home/berrebbi/miniconda3/envs/espnet/lib/python3.7/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Hi, I tried a model from Huggingface (https://huggingface.co/espnet/simpleoier_librispeech_asr_train_asr_conformer7_wavlm_large_raw_en_bpe5000_sp) and copied the code from the "Use in ESPnet" button.
The example was broken; I had to change
text, *_ = model(speech)
to
text, *_ = model(speech)[0]
According to the readme of espnet_model_zoo, the user has to use the getitem first.
I don't know, how to fix that. Could you fix the example on huggingface?
Here, the examples from huggingface and github with the mismatch of the expected output of Speech2Text
:
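For reference, a minimal stub illustrating the indexing the README describes (the function below is hypothetical and only mirrors the return shape; it is not the real ESPnet API call):

```python
# Hypothetical stub mirroring the return shape of espnet2's Speech2Text:
# calling the model yields a LIST of n-best hypotheses, each a tuple of
# (text, tokens, token_ids, hypothesis), so [0] selects the best one.
def fake_speech2text(speech):
    return [("hello world", ["hel", "lo", "world"], [1, 2, 3], None)]

nbests = fake_speech2text([0.0] * 16000)
text, *_ = nbests[0]  # index the n-best list first, then unpack
print(text)  # hello world
```

This is why `text, *_ = model(speech)` fails: it tries to unpack the list of hypotheses itself rather than the top hypothesis.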
ASR demo is not thread-safe
I have created a new conda env and installed espnet_model_zoo.
I ran this command:
from espnet_model_zoo.downloader import ModelDownloader
and got this error:
ERROR:root: espnet_model_zoo is not installed. Please install via pip install -U espnet_model_zoo.
Traceback (most recent call last):
File "", line 1, in
File "/home/knit/espnet/my_scripts/espnet_model_zoo.py", line 8, in
speech2text = Speech2Text.from_pretrained(
File "/home/knit/anaconda3/envs/tf2onnx/lib/python3.8/site-packages/espnet2/bin/asr_inference.py", line 358, in from_pretrained
from espnet_model_zoo.downloader import ModelDownloader
ModuleNotFoundError: No module named 'espnet_model_zoo.downloader'; 'espnet_model_zoo' is not a package
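The last line ("'espnet_model_zoo' is not a package") together with the script path my_scripts/espnet_model_zoo.py suggests the script itself is shadowing the installed package. A quick, generic way to check which file Python actually resolves for a module name (a diagnostic sketch, not part of ESPnet):

```python
import importlib.util

def import_origin(name: str):
    """Return the file Python would actually import for `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# If this prints something like .../my_scripts/espnet_model_zoo.py instead of
# .../site-packages/espnet_model_zoo/__init__.py, a local file is shadowing
# the installed package; renaming the script fixes the import.
print(import_origin("espnet_model_zoo"))
```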
Downloaded models are stored by default in the package folder.
However, when espnet_model_zoo is imported as a system library from a distribution package, its path is read-only.
Proposed solution:
Check for output-folder write access using os.access(modelcache, os.W_OK).
If it is not writable, fall back to a different directory, for example $XDG_CACHE_HOME.
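The proposed fallback could be sketched like this (the function name and layout are my own; this is not the downloader's actual code):

```python
import os
from pathlib import Path

def resolve_cache_dir(default: Path) -> Path:
    """Sketch of the proposed behaviour: keep the package folder when it is
    writable, otherwise fall back to $XDG_CACHE_HOME (or ~/.cache)."""
    if os.access(default, os.W_OK):
        return default
    xdg = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    return Path(xdg) / "espnet_model_zoo"

# A read-only site-packages path would be redirected to the user cache:
print(resolve_cache_dir(Path("/usr/lib/python3.9/site-packages/espnet_model_zoo")))
```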
Steps to reproduce the issue:
espnet_model_zoo
as system package.from espnet_model_zoo.downloader import ModelDownloader
d = ModelDownloader()
wsjmodel = d.download_and_unpack("kamo-naoyuki/wsj")
With this, the following exception is thrown:
https://zenodo.org/record/4003381/files/asr_train_asr_transformer_raw_char_valid.acc.ave.zip?download=1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 156M/156M [00:09<00:00, 17.0MB/s]
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
<ipython-input-3-64ba61e9d85d> in <module>
----> 1 wsjmodel = d.download_and_unpack("kamo-naoyuki/wsj")
/usr/lib/python3.9/site-packages/espnet_model_zoo/downloader.py in download_and_unpack(self, name, version, quiet, **kwargs)
288
289 # Download the file to an unique path
--> 290 filename = self.download(url, quiet=quiet)
291
292 # Extract files from archived file
/usr/lib/python3.9/site-packages/espnet_model_zoo/downloader.py in download(self, name, version, quiet, **kwargs)
243 # Download the model file if not existing
244 if not (outdir / filename).exists():
--> 245 download(url, outdir / filename, quiet=quiet)
246
247 # Write the url for debugging
/usr/lib/python3.9/site-packages/espnet_model_zoo/downloader.py in download(url, output_path, retry, chunk_size, quiet)
82 pbar.update(len(chunk))
83
---> 84 Path(output_path).parent.mkdir(parents=True, exist_ok=True)
85 shutil.move(Path(d) / "tmp", output_path)
86
/usr/lib/python3.9/pathlib.py in mkdir(self, mode, parents, exist_ok)
1310 """
1311 try:
-> 1312 self._accessor.mkdir(self, mode)
1313 except FileNotFoundError:
1314 if not parents or self.parent == self:
PermissionError: [Errno 13] Permission denied: '/usr/lib/python3.9/site-packages/espnet_model_zoo/b2d27107e15dd714684f5767ef10d402'
Hi @kamo-naoyuki , could you please publish the latest version of this repo to PyPI? I'm working on some demonstrations for espnet2 and would like to include some of the latest models, which unfortunately are not in the pip version. Many thanks!
Hi, is there a Mandarin multi-speaker pretrained model? I didn't find one.
I've successfully trained a streaming transformer for German with 13,000 hours of data and end-to-end punctuation. See https://huggingface.co/speechcatcher/speechcatcher_german_espnet_streaming_transformer_13k_train_size_m_raw_de_bpe1024 . ESPnet2 with asr.sh is really nice, thanks for that!
Following https://github.com/espnet/notebook/blob/master/espnet2_streaming_asr_demo.ipynb, inference also works on the training machine (Linux with CUDA). However, when I tried to run inference with my model on a Mac mini (M1, ARM), I got this error:
File "/Users/me/projects/speechcatcher/speechcatcher.py", line 66, in <module>
recognize("test1.wav")
File "/Users/me/projects/speechcatcher/speechcatcher.py", line 52, in recognize
results = speech2text(speech=speech[i*sim_chunk_length:(i+1)*sim_chunk_length], is_final=False)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/espnet2/bin/asr_inference_streaming.py", line 310, in __call__
feats, feats_lengths, self.frontend_states = self.apply_frontend(
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/espnet2/bin/asr_inference_streaming.py", line 253, in apply_frontend
feats, feats_lengths = self.asr_model._extract_feats(**batch)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/espnet2/asr/espnet_model.py", line 407, in _extract_feats
feats, feats_lengths = self.frontend(speech, speech_lengths)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/espnet2/asr/frontend/default.py", line 87, in forward
input_stft, feats_lens = self._compute_stft(input, input_lengths)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/espnet2/asr/frontend/default.py", line 122, in _compute_stft
input_stft, feats_lens = self.stft(input, input_lengths)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/espnet2/layers/stft.py", line 139, in forward
stft = librosa.stft(input[i].numpy(), **stft_kwargs)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/librosa/util/decorators.py", line 88, in inner_f
return f(*args, **kwargs)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/librosa/core/spectrum.py", line 204, in stft
fft_window = get_window(window, win_length, fftbins=True)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/librosa/util/decorators.py", line 88, in inner_f
return f(*args, **kwargs)
File "/Users/me/projects/speechcatcher/speechcatcher_env/lib/python3.10/site-packages/librosa/filters.py", line 1191, in get_window
raise ParameterError(
librosa.util.exceptions.ParameterError: Window size mismatch: 512 != 400
I've used the frontend conf from https://github.com/espnet/espnet/blob/master/egs2/jsut/asr1/conf/tuning/train_asr_conformer.yaml#L3-L8 to generate the features on the fly, assuming I'm getting the standard 25ms / 10ms hop filterbank features with this.
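For context, those frontend settings presumably correspond to 25 ms windows with a 10 ms hop at 16 kHz; the values below are inferred from the error message and the stated window/hop sizes, so treat them as an assumption rather than a quote of that config:

```yaml
frontend: default
frontend_conf:
    n_fft: 512        # FFT size (window is zero-padded up to this)
    win_length: 400   # 25 ms at 16 kHz
    hop_length: 160   # 10 ms at 16 kHz
```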
In espnet2/layers/stft.py:
# NOTE(kamo):
# The default behaviour of torch.stft is compatible with librosa.stft
# about padding and scaling.
# Note that it's different from scipy.signal.stft
# For the compatibility of ARM devices, which do not support
# torch.stft() due to the lack of MKL.
if input.is_cuda or torch.backends.mkl.is_available():
    # use torch.stft ...
else:
    # use librosa ...
This basically claims that both implementations are compatible. On my training machine it must have used torch.stft, and on the Mac mini librosa. It seems they are not 100% interchangeable STFT implementations after all: librosa can't use a window size of 400 with an FFT size of 512. There is a comment in the librosa code that says:
Raises
------
ParameterError
If `window` is supplied as a vector of length != `n_fft`,
or is otherwise mis-specified.
Do I need to train the model with a window size of 512 to make it compatible with librosa?
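One possible workaround, short of retraining (a sketch under the assumption that an explicit n_fft-length window vector satisfies librosa's length check): center-pad the 400-sample periodic Hann window to n_fft = 512, which is essentially what torch.stft does internally when win_length < n_fft:

```python
import numpy as np

def periodic_hann(win_length: int) -> np.ndarray:
    """Periodic Hann window, as typically used for STFT analysis."""
    n = np.arange(win_length)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / win_length)

def pad_window(window: np.ndarray, n_fft: int) -> np.ndarray:
    """Center-pad a short analysis window to n_fft samples."""
    pad = n_fft - len(window)
    left = pad // 2
    return np.pad(window, (left, pad - left))

# A 512-sample vector passed as `window=` has len(window) == n_fft, so it
# would avoid the "Window size mismatch: 512 != 400" ParameterError.
win = pad_window(periodic_hann(400), n_fft=512)
print(len(win))  # 512
```

Whether this exactly reproduces the torch.stft features your model was trained on would still need to be verified numerically.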