yeyupiaoling / paddlepaddle-deepspeech


621.0 621.0 140.0 15.13 MB

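Chinese speech recognition implemented with PaddlePaddle. The project is feature-complete and recognition quality is good. Supports training and inference on Windows and Linux, and inference on NVIDIA Jetson development boards.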

Home Page: https://yeyupiaoling.blog.csdn.net/article/details/102904306

License: Apache License 2.0

Python 93.21% JavaScript 3.60% HTML 2.50% CSS 0.68%

asr chinese deep-learning deepspeech deepspeech2 docker nvidia-docker paddlepaddle speech-recognition speech-to-text

paddlepaddle-deepspeech's Introduction

Hello, developers!

Core projects

Project type	PyTorch version	PaddlePaddle version
Speech recognition	MASR	PPASR
Voiceprint recognition	VoiceprintRecognition-Pytorch	VoiceprintRecognition-PaddlePaddle
Sound classification	AudioClassification-Pytorch	AudioClassification-PaddlePaddle
Speech emotion recognition	SpeechEmotionRecognition-Pytorch	SpeechEmotionRecognition-PaddlePaddle
Speech synthesis	VITS-Pytorch	VITS-PaddlePaddle

Speech projects

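Speech recognition project based on PaddlePaddle (dynamic graph): PPASR
Speech recognition project based on PyTorch: MASR
Whisper model fine-tuning and accelerated inference: Whisper-Finetune
Speech recognition project based on PaddlePaddle (static graph): PaddlePaddle-DeepSpeech
Sound classification project based on PyTorch: AudioClassification-Pytorch
Sound classification project based on PaddlePaddle: AudioClassification-PaddlePaddle
Voiceprint recognition project based on PaddlePaddle: VoiceprintRecognition-PaddlePaddle
Voiceprint recognition project based on PyTorch: VoiceprintRecognition-Pytorch
Voiceprint recognition project based on TensorFlow: VoiceprintRecognition-Tensorflow
Voiceprint recognition project based on Keras: VoiceprintRecognition-Keras
Speech emotion recognition based on PaddlePaddle: SpeechEmotionRecognition-PaddlePaddle
Speech emotion recognition based on PyTorch: SpeechEmotionRecognition-Pytorch
VITS speech synthesis based on PaddlePaddle: VITS-PaddlePaddle
VITS speech synthesis based on PyTorch: VITS-Pytorch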

Vision projects

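Face recognition project based on PaddlePaddle: PaddlePaddle-MobileFaceNets
Face recognition project based on PyTorch: Pytorch-MobileFaceNet
SSD object detection model based on PaddlePaddle: PaddlePaddle-SSD
MTCNN face landmark detection model based on PyTorch: Pytorch-MTCNN
MTCNN face landmark detection model based on PaddlePaddle: PaddlePaddle-MTCNN
CRNN text recognition model based on PaddlePaddle: PaddlePaddle-CRNN
CrowdNet crowd density estimation model based on PaddlePaddle: PaddlePaddle-CrowdNet
Age and gender recognition project based on MXNet: Age-Gender-MXNET
Deploying image classification models on Android with TensorFlow Lite, Paddle Lite, MNN, and TNN: ClassificationForAndroid
PP-YOLOE model based on PaddlePaddle: PP-YOLOE
Face detection, mask recognition, and facial keypoint detection models deployed on Android: FaceKeyPointsMask
Replacing the background behind people with a semantic segmentation model deployed on Android: ChangeHumanBackground
Face recognition project based on TensorFlow: Tensorflow-FaceRecognition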

Tutorial series

Tutorial series for PaddlePaddle V2: LearnPaddle
Tutorial series for PaddlePaddle Fluid: LearnPaddle2

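Book source code

Source code for 《PaddlePaddle从入门到实战》 (PaddlePaddle: From Getting Started to Practice): PaddlePaddleCourse
Source code for 《深度学习应用实战之PaddlePaddle》 (Deep Learning Applications in Practice with PaddlePaddle): BookSource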


paddlepaddle-deepspeech's People

Stargazers

Watchers

Forkers

laugha gaozheyuan 4colors maxenergy hehaoq weimingtom cxzhou007 wuchaowei2012 hommmm gt-acerzhang zhangrong-mz zcswdt org-mars yinghuochongxiaoq shixiangbupt 1364468984qqcom cqutwanghong ceasarlee panzuanxin jasonzhang-jx coderboy24x7 yyht cocobar cxapython undercontroller ntzzc scorpiokay zhzhuangxue asyncgo zhangyifei1 yueyedeai zelda3721 fangrn davidhefan tim-chen-code bitdaocao sky8652 chinayiqun sodawater05 jianchi2001 xiaohuochai123 litianw yxun6966 johndoe117 lansefangzhou coderchuan winderwl a-new-eruption justwdh littlestone0806 shuiniu86 kerwinchina cy5211 ground-truth xtly2012 bonima123 maggic303 juie destinyming hongsiyu zoand xiahongjin rookie-j ly03240921 imogenqi leo812993 caixiong110 chenhaohan88 flash-lw saxh ekicham thinkinchaos alanlv jiahong3837 perfyperfect nlp-chnproject 2259798112 convect-bot lianglili shadofung daish98 fanhuafeng errolyan anonymouslycn ohki-ki lonelyxmas wanghaisheng hangzhou1 lty628 wtszu bluep0int hailangzz yangboz maxuanjun tommy13579 guoxiongfei skinny-joey sqjkl normonisping dannyneo

paddlepaddle-deepspeech's Issues

Meaning of export FLAGS_sync_nccl_allreduce=0

export FLAGS_sync_nccl_allreduce=0
Is this line run in the terminal before training starts? What does it do?
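Paddle reads `FLAGS_*` settings from environment variables when it initializes, so the `export` has to run in the same shell before `python train.py` is launched; it has no effect on a process that is already running. As far as we can tell from Paddle's flag documentation, `FLAGS_sync_nccl_allreduce` controls whether Paddle calls `cudaStreamSynchronize` on the NCCL stream after each allreduce, and setting it to 0 turns that synchronization off. The sketch below is an in-Python equivalent of the shell `export`; the helper `nccl_sync_disabled` is hypothetical and only reads the variable back.

```python
import os

# Paddle picks up FLAGS_* environment variables when it initializes, so set
# them before the first `import paddle` in this process. setdefault keeps any
# value that was already exported in the shell.
os.environ.setdefault("FLAGS_sync_nccl_allreduce", "0")


def nccl_sync_disabled() -> bool:
    """Hypothetical helper: True if FLAGS_sync_nccl_allreduce is set to a false value."""
    value = os.environ.get("FLAGS_sync_nccl_allreduce", "1")
    return value.strip().lower() in ("0", "false")


# import paddle  # import only after the flag is in place
```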

ValueError: Failed to parse the augmentation config json: [Errno 2] No such file or directory: 'dataset/manifest.noise'

ValueError: Failed to parse the augmentation config json: [Errno 2] No such file or directory: 'dataset/manifest.noise'. How do I fix this?
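In upstream DeepSpeech the augmentation config is a JSON list of augmentors, and a `noise` entry points at a noise manifest through `noise_manifest_path`; if that file (here `dataset/manifest.noise`) was never generated, loading the config fails as above. Either prepare the noise data so the manifest exists, or drop the noise augmentor. The sketch below does the latter. It assumes this project keeps the upstream config layout, and `drop_missing_noise` is a hypothetical helper, not part of the project.

```python
import json
import os


def drop_missing_noise(config_path: str) -> str:
    """Return the augmentation config JSON without noise entries whose manifest is missing.

    Assumes the upstream DeepSpeech layout: a JSON list of dicts such as
    {"type": "noise", "params": {"noise_manifest_path": "..."}, "prob": 0.5}.
    Relative manifest paths are checked against the current working directory.
    """
    with open(config_path, encoding="utf-8") as f:
        augmentors = json.load(f)
    kept = []
    for aug in augmentors:
        path = aug.get("params", {}).get("noise_manifest_path")
        if aug.get("type") == "noise" and (path is None or not os.path.exists(path)):
            # Skip the noise augmentor instead of failing on a missing manifest.
            continue
        kept.append(aug)
    return json.dumps(kept)
```

The returned string can be written back to the config file, or the noise entry can be removed by hand; once a real noise manifest exists, restoring the entry re-enables noise augmentation.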

python tools/tune.py hangs for a long time

As the title says, after I start the program it stops at
finish initing model from pretrained params from .checkpoints/step_final and stays there for a long time. Is this normal?
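In the upstream DeepSpeech design, `tools/tune.py` searches a grid of language-model weights (alpha) and word-insertion weights (beta), and every pair requires beam-search decoding of the evaluation data, so a long quiet period after the model loads is plausible rather than a sure sign of a hang. The sketch below shows the shape of that search. `evaluate` is a hypothetical callback standing in for one full decoding pass, and any grids you pass in are illustrative, not the script's defaults.

```python
import itertools
from typing import Callable, Tuple

import numpy as np


def grid_search(evaluate: Callable[[float, float], float],
                alphas: np.ndarray, betas: np.ndarray) -> Tuple[float, float, float]:
    """Return (best_alpha, best_beta, best_error) over every alpha/beta pair.

    `evaluate` is a hypothetical callback that decodes the whole dev set with
    the given LM weight (alpha) and word-insertion weight (beta) and returns
    the error rate. Total cost is len(alphas) * len(betas) decoding passes.
    """
    best = (float("nan"), float("nan"), float("inf"))
    for alpha, beta in itertools.product(alphas, betas):
        err = evaluate(float(alpha), float(beta))
        if err < best[2]:
            best = (float(alpha), float(beta), err)
    return best
```

Because the cost grows with the product of the two grid sizes, shrinking either grid or tuning on a subset of the dev set cuts the runtime proportionally.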

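Number of channels, sample rate, and models

1. Do the number of channels and the sample rate have a large effect on recognition accuracy, and which values give the best results?
2. Do you have models trained on larger datasets? The currently released models still don't work well enough on real conversational recordings.
Thank you very much.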
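The main factor is probably whether the test audio matches the format the model was trained on; AISHELL-1, for example, is recorded as 16 kHz mono, 16-bit PCM. A quick way to compare a recording against that format with Python's standard `wave` module is sketched below. The 16 kHz mono defaults in `matches_training_format` are an assumption about the released models, not a documented requirement.

```python
import wave
from typing import Tuple


def wav_format(path: str) -> Tuple[int, int, int]:
    """Return (channels, sample_rate_hz, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getsampwidth()


def matches_training_format(path: str, channels: int = 1, sample_rate: int = 16000) -> bool:
    """Check a file against the assumed training format (defaults are assumptions)."""
    ch, sr, _ = wav_format(path)
    return ch == channels and sr == sample_rate
```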

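How to use Baidu's official English model with this project for English speech recognition

I downloaded the English model provided by Baidu, but simply replacing the corresponding paths in your project doesn't seem to work. Could you give me some pointers?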

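Hello, developer

Hello, I work in PaddlePaddle (飞桨) operations at Baidu. I looked at your project and think it's excellent, and I'd like to get in touch. Could you add me on WeChat (paddlehelp) with the note "PaddlePaddle developer"?
Looking forward to your reply!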

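I want to use the author's trained model only for testing. Can I skip installing Docker, and which dependencies do I need?

I want to use it as a tool that transcribes audio to text.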

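Questions about the prediction API

Hello,
1. Does input audio have to be converted to WAV before being fed to the API? Which audio formats does the API currently support?
2. Does the API convert the sample rate itself, or must the audio already be at the right sample rate before it is passed in?
Thanks.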
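This page does not say whether the prediction API resamples internally, so a cautious approach is to convert input audio to the format of the training data before calling it; compressed formats such as MP3 or M4A can first be decoded to WAV with a tool such as ffmpeg. The numpy sketch below downmixes to mono and resamples with linear interpolation. The 16 kHz target is an assumption, and linear interpolation has no anti-aliasing filter, so a dedicated resampler is preferable for serious use.

```python
import numpy as np


def to_mono_16k(samples: np.ndarray, sample_rate: int, target_rate: int = 16000) -> np.ndarray:
    """Downmix to mono and resample by linear interpolation.

    samples: float array shaped (num_samples,) or (num_samples, channels).
    The 16 kHz default is an assumed model input rate. Linear interpolation
    is a rough stand-in for a band-limited resampler (no anti-aliasing).
    """
    if samples.ndim == 2:
        samples = samples.mean(axis=1)  # average the channels
    if sample_rate == target_rate:
        return samples.astype(np.float32)
    duration = len(samples) / sample_rate
    num_out = int(round(duration * target_rate))
    old_t = np.arange(len(samples)) / sample_rate  # timestamps of input samples
    new_t = np.arange(num_out) / target_rate       # timestamps of output samples
    return np.interp(new_t, old_t, samples).astype(np.float32)
```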

FileNotFoundError: [Errno 2] No such file or directory: './dataset/mean_std.npz'

Running python3 eval.py --model_path=./models/step_final/ fails with FileNotFoundError: [Errno 2] No such file or directory: './dataset/mean_std.npz'
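`mean_std.npz` holds the feature normalization statistics (a per-dimension mean and standard deviation) applied to every spectrogram. To evaluate a downloaded model, use the `mean_std.npz` shipped with that model, since statistics computed from other data shift the inputs the network sees. When training from scratch, upstream DeepSpeech computes the file from a sample of training features; the sketch below reproduces that computation, assuming this project keeps upstream's `mean`/`std` keys and a `(feature_dim, frames)` layout. `save_mean_std` is a hypothetical helper.

```python
from typing import Iterable

import numpy as np


def save_mean_std(features: Iterable[np.ndarray], out_path: str) -> None:
    """Compute per-dimension mean/std over all frames and save them as .npz.

    Each item is a feature matrix shaped (feature_dim, num_frames), e.g. a
    spectrogram. The keys "mean" and "std" follow upstream DeepSpeech; check
    them against this project's loader before relying on the file.
    """
    stacked = np.hstack(list(features))           # (feature_dim, total_frames)
    mean = stacked.mean(axis=1).reshape([-1, 1])  # column vectors broadcast over frames
    std = stacked.std(axis=1).reshape([-1, 1])
    np.savez(out_path, mean=mean, std=std)
```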

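An RTX 3070 trains normally, but inference hits the following problem

Because the 30 series only supports CUDA 11.0 and later, I replaced CUDA 10.0 and cuDNN 7.3 with CUDA 11.3 and cuDNN 8.2,
and upgraded PaddlePaddle to version 2.1.0.
In models.py I added
import paddle
paddle.enable_static()

python3 train.py trains normally and saves the model normally.

But after adding
import paddle
paddle.enable_static()
to infer_path.py, the following error occurs: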

python3 infer_path.py --model_path=./models/epoch_0/ --wav_path=./dataset/test.wav

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
W0520 03:02:36.909706 16902 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.3, Runtime API Version: 11.2
W0520 03:02:36.911597 16902 device_context.cc:422] device: 0, cuDNN Version: 8.2
> Successfully loaded pretrained model: ./models/epoch_0/
[INFO 2021-05-20 03:02:40,163 model.py:524] begin to initialize the external scorer for decoding
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  kenlm/lm/vocab.cc:43 in void lm::ngram::{anonymous}::ReadWords(int, lm::EnumerateVocab*, lm::WordIndex, uint64_t) threw FormatLoadException because `memcmp(check_unk, "<unk>", 6)'.
Vocabulary words are in the wrong place.  This could be because the binary file was built with stale gcc and old kenlm.  Stale gcc, including the gcc distributed with RedHat and OS X, has a bug that ignores pragma pack for template-dependent types.  New kenlm works around this, so you'll save memory but have to rebuild any binary files using the probing data structure.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   Scorer::Scorer(double, double, std::string const&, std::vector<std::string, std::allocator<std::string > > const&)
1   Scorer::setup(std::string const&, std::vector<std::string, std::allocator<std::string > > const&)
2   Scorer::load_lm(std::string const&)
3   lm::ngram::LoadVirtual(char const*, lm::ngram::Config const&, lm::ngram::ModelType)
4   lm::ngram::detail::GenericModel<lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>, lm::ngram::ProbingVocabulary>::GenericModel(char const*, lm::ngram::Config const&)
5   lm::ngram::ProbingVocabulary::LoadedBinary(bool, int, lm::EnumerateVocab*, unsigned long)
6   paddle::framework::SignalHandle(char const*, int)
7   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()

----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1621479760 (unix time) try "date -d @1621479760" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x4206) received by PID 16902 (TID 0x7f254b481700) from PID 16902 ***]

Compiling ctc_decoders separately

Does "compiling ctc_decoders separately" mean running the setup.sh file in the decoder folder?

Docker image is very large; is there another way to get the container?

The docker pull download is very large. Is there another way to download the Docker container?

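Audio longer than 15 s

All the audio in my dataset is longer than 15 s. With batch_size=16 the GPU runs out of memory, so I set batch_size=8, but then I ran into loss=nan. Is there a way to split long audio? Right now training is almost impossible.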
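Cutting long recordings is mechanically simple, but each chunk then needs its own transcript, and a fixed-length cut can slice through a word, so cutting at silences (for example with a voice activity detector) or using forced alignment usually gives cleaner training pairs. The stdlib-only sketch below covers just the mechanics of fixed-length cutting; `split_wav` and its 15-second default are illustrative choices, not part of this project.

```python
import os
import wave
from typing import List


def split_wav(path: str, out_dir: str, max_seconds: float = 15.0) -> List[str]:
    """Cut a PCM WAV file into consecutive chunks of at most max_seconds each."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    outputs = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * src.getframerate())
        index = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break
            out_path = os.path.join(out_dir, f"{stem}_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is corrected on close
                dst.writeframes(data)
            outputs.append(out_path)
            index += 1
    return outputs
```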

Chinese encoding problem on Python 2

UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-31: ordinal not in range(128)
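On Python 2 this error usually means a `unicode` string was implicitly encoded with the default ASCII codec, for example when writing to a file opened with the built-in `open()` or printing to a non-UTF-8 terminal (setting `PYTHONIOENCODING=utf-8` helps in the latter case). Opening files with an explicit encoding through `io.open` behaves the same on Python 2 and 3; `write_transcript` below is a hypothetical helper showing the pattern. On Python 2, source files containing Chinese literals also need the `coding` declaration on the first line.

```python
# -*- coding: utf-8 -*-
import io


def write_transcript(path, text):
    """Write text as UTF-8 instead of relying on the default ASCII codec.

    On Python 2, `text` must be a unicode object (e.g. u"..."); io.open in
    text mode behaves the same way on Python 2 and Python 3.
    """
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
```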

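The ultra-large-scale pretrained model does not match the official vocab

Hello, the official vocab and mean_std don't match this pretrained model. Could you provide the vocab and mean_std that go with it?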
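The vocabulary file maps each output index of the acoustic model to a character, so it must be the exact file, in the exact order, that the model was trained with; a regenerated file will not line up with someone else's pretrained weights, which is why the matching vocab and mean_std have to come from whoever trained the model. For a model you train yourself, a vocabulary can be built from the training transcripts roughly as below. The sketch assumes a JSON-lines manifest with a `text` field, `build_vocab` is hypothetical, and its frequency-descending order follows the idea of upstream DeepSpeech's vocabulary tool rather than reproducing its exact output.

```python
import json
from collections import Counter
from typing import List


def build_vocab(manifest_path: str, vocab_path: str, min_count: int = 0) -> List[str]:
    """Collect characters from a JSON-lines manifest, most frequent first."""
    counter = Counter()
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counter.update(json.loads(line)["text"])
    # Sort by descending count, then by character, for a deterministic order.
    vocab = [ch for ch, n in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
             if n >= min_count]
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    return vocab
```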

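If the test audio contains noise such as background music, accuracy seems to drop considerably. Is there a way to handle this?

Does the code include preprocessing such as voice extraction, or can you recommend a similar method? Thanks.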

Inference error

Hello, when I run infer_path, it prints Warning: The pretrained params do not exist. and then the program exits.
The parameters are as follows:

How can I fix this?

tune

In tune.py,
add_arg('model_path', str, './models/srep_final', should be step_final.
Also, when running
PYTHONPATH=.:$PYTHONPATH CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/tune.py only one GPU is used. What could be causing this?

Can it run on Windows 10?

Can it run on Windows 10? I get ModuleNotFoundError: No module named 'swig_decoders', and that package fails to install when I run it.
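`swig_decoders` is the compiled CTC decoder extension (beam search with a language model); building it needs SWIG and a C++ toolchain, which can be hard to set up on Windows. If only a transcript is needed, best-path (greedy) CTC decoding can be done in pure Python without it, usually at some cost in accuracy because no language model is used. The sketch below is not part of this project, and the `blank` index in particular is an assumption (upstream DeepSpeech places the blank after the last vocabulary entry).

```python
from typing import List, Sequence

import numpy as np


def ctc_greedy_decode(probs: np.ndarray, vocab: Sequence[str], blank: int) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    probs: (num_frames, num_classes) posteriors from the acoustic model.
    blank: index of the CTC blank symbol (assumed; check the project's convention).
    """
    best_path = np.argmax(probs, axis=1)
    tokens: List[str] = []
    prev = None
    for idx in best_path:
        idx = int(idx)
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            tokens.append(vocab[idx])
        prev = idx
    return "".join(tokens)
```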

Is the online demo site not enabled?

I uploaded an audio file to see how it performs and got back {'error': 1001, 'msg': '这是测试！'} ("This is a test!").

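Pretrained models

Does yeyupiaoling (夜雨飘零) have pretrained models? I tried PaddlePaddle's official pretrained model and the results were not very good.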

Error when running sudo sh setup.sh

Running setup.py install for llvmlite ... error

RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
How should I fix this error?
Do I need to install LLVM?

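Hello, sorry to bother you. I want to build a project with three functions: Chinese speech recognition, speaker recognition, and acoustic scene classification, and I have found some open-source libraries. Since I have always worked in software development and have no experience with speech, I'd like to ask you a few questions:
1. I used the official PaddlePaddle DeepSpeech open-source code to build an online Chinese speech recognition system with the baidu1.2k model, but the accuracy is not very good. How well does your project perform with the baidu1.2k model and with a self-trained model? Or could you recommend a Chinese speech recognition project whose code and models are both open source and that performs well?
2. I have also set up an open-source speaker recognition library but haven't tested it on a dataset yet. How well does your PaddlePaddle-based voiceprint recognition project perform?
3. I saw your other two acoustic scene classification projects built on UrbanSound8K with PaddlePaddle and TensorFlow. How well do they perform?

Sorry again for the trouble; I would really appreciate any help you can spare. Thank you.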

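Multi-GPU environment

Hello, after pulling the image and installing following your tutorial, running
import paddle
paddle.fluid.install_check.run_check()
reported an error. The log is as follows: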
///////////////////////////////////////////////////////
Running Verify Fluid Program ...
W0318 01:57:14.200362 29 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 11.0, Runtime API Version: 10.0
W0318 01:57:14.200688 29 device_context.cc:260] device: 0, cuDNN Version: 7.6.
Your Paddle Fluid works well on SINGLE GPU or CPU.
/usr/local/python3.5.1/lib/python3.5/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
WARNING:root:Your Paddle Fluid has some problem with multiple GPU. This may be caused by:

There is only 1 or 0 GPU visible on your Device;
No.1 or No.2 GPU or both of them are occupied now
Wrong installation of NVIDIA-NCCL2, please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
Original Error is:

C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, ncclUniqueId*, unsigned long, unsigned long)
3 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)

Error Message Summary:
ExternalError: Nccl error, unhandled system error at (/paddle/paddle/fluid/platform/nccl_helper.h:114)

Your Paddle Fluid is installed successfully ONLY for SINGLE GPU or CPU!
Let's start deep Learning with Paddle Fluid now
////////////////////////////////////////////////////////////////////
I plan to train on multiple GPUs but can currently use only one. Did you use multiple GPUs when training?

Can this project be deployed on Windows?

Model question

Which performs better, the official model or the self-trained one? (Downloading from Weiyun is very slow.)

infer_path

Hello, I'd like to use the model you provided to run ASR directly on my audio (https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm).
Which dataset's mean_std.npz and zh_vocab.txt should I use with it?
If you already have these two files prepared, could you share them?

Docker environment problem

Hello, I followed your tutorial for configuring Docker but ran into:

E1210 12:48:08.426810 4204 pybind.cc:1261] Cannot use GPU because there is no GPU detected on your machine.

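GPU inference error

Version and environment information:
1) PaddlePaddle version: PaddlePaddle 2.0.2-gpu (installed directly from the officially compiled whl)
2) System environment: NVIDIA Jetson AGX Xavier, JetPack 4.3, Ubuntu 18.04, Python 3.6.9, CUDA 10.0, cuDNN 7.6
Hello, when I set 'use_gpu' to True, the following error occurs:

Setting it to False runs normally, but much more slowly.
Have you run into this problem?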

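Cannot open the microphone. Exception: 18

When I visit localhost:5002 and click the record button, I get the error "Cannot open the microphone. Exception: 18". What could be the cause?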

Can the dataloader load data with multiple processes?

Could you add a num_worker option for multi-process data loading, as in PPASR?

Model loading problem

As shown in the screenshot, it gets stuck at this step.

How can I resume training from the previously trained model?

FileNotFoundError: [Errno 2] No such file or directory: './dataset/manifest.test'

How do I fix this?
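Manifest files such as `manifest.test` are generated by the data-preparation step, so this error usually means that step has not been run for the test set. In upstream DeepSpeech a manifest is a JSON-lines file with one utterance per line, carrying `audio_filepath`, `duration` (in seconds), and `text`. Assuming this project uses the same layout, a manifest for your own WAV files could be written as below; `write_manifest` is a hypothetical helper.

```python
import json
import wave
from typing import Iterable, Tuple


def write_manifest(pairs: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write one JSON object per line (audio path, duration in seconds, transcript).

    Returns the number of utterances written.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, text in pairs:
            with wave.open(wav_path, "rb") as wf:
                duration = wf.getnframes() / float(wf.getframerate())
            record = {"audio_filepath": wav_path, "duration": duration, "text": text}
            # ensure_ascii=False keeps Chinese transcripts readable in the file.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```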

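Multi-GPU speed problem

Hello, during training I found that one batch is slower on two GPUs than on one: a single GPU takes 3 minutes per batch, while two GPUs take 8 minutes.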

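Comparison of the AISHELL model and the 1300-hour model

Hello, I downloaded both the AISHELL model and the 1300-hour model. On a Mandarin clip I downloaded, the AISHELL model works better; on the data_thchs30 dataset, the 1300-hour model is somewhat better. Did you see the same thing in your tests?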

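Offline environment setup problem

Our company's offline server can't run setup.sh in the decoder folder directly. How can I install ctc_decoders?
Running the main program currently fails with:
ModuleNotFoundError: No module named 'ctc_decoders'
Thank you very much!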

Hi, loading my own previously trained model raises an error. What should I do?

Complexity problem

Error Message Summary:

InvalidArgumentError: Cannot parse tensor desc
[Hint: Expected desc.ParseFromArray(buf.get(), size) == true, but received desc.ParseFromArray(buf.get(), size):0 != true:1.] at (/paddle/paddle/fluid/framework/tensor_util.cu:527)
[operator < load_combine > error]

Error when running setup.sh

Could not find a version that satisfies the requirement numba==0.52.0 (from versions: 0.1, 0.2, 0.3, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.12.2, 0.13.0, 0.13.2, 0.13.3, 0.13.4, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.18.1, 0.18.2, 0.19.1, 0.19.2, 0.20.0, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.23.1, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.1, 0.29.0, 0.30.0, 0.30.1, 0.31.0, 0.32.0, 0.33.0, 0.34.0, 0.35.0, 0.36.1, 0.36.2, 0.37.0, 0.38.0, 0.38.1, 0.39.0, 0.40.0, 0.40.1, 0.41.0, 0.42.0, 0.42.1, 0.43.0, 0.43.1, 0.44.0, 0.44.1, 0.45.0, 0.45.1, 0.46.0, 0.47.0)
No matching distribution found for numba==0.52.0

Will using a different numba version affect whether it runs?

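Hyperparameter settings

@yeyupiaoling Hello, I'm running your code on the "MAGICDATA Mandarin Chinese Read Speech Corpus" dataset; the training set has about 570,000 utterances. My machine has a single RTX 2080 Ti and I set batch_size to 16. How should I set the corresponding learning rate? Training is too slow for me to tune it bit by bit myself. Thanks.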
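There is no exact rule, but a common starting point is the linear scaling heuristic: if a learning rate works at one batch size, scale it in proportion to the batch size, then adjust based on validation loss. The sketch below only does that arithmetic; any base learning rate and base batch size you feed it are placeholders to be replaced with the values the project actually ships with.

```python
import math


def linear_scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size roughly constant."""
    return base_lr * batch / base_batch


def steps_per_epoch(num_utterances: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every utterance once."""
    return math.ceil(num_utterances / batch_size)
```

With about 570,000 utterances and `batch_size=16`, one epoch is 35,625 steps, so checking validation loss every few thousand steps gives feedback on a learning rate long before the first epoch finishes.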