open-speech / speech-aligner

speech-aligner is a tool that generates phoneme-level time alignments from human speech and its transcription.

License: Other

CMake 0.28% Shell 2.63% C++ 67.34% C 24.65% Makefile 3.89% M4 0.08% Batchfile 0.01% Python 1.12%

speech kaldi cpp

speech-aligner's Introduction

speech-aligner

Chinese readme (translated):

speech-aligner is a tool that generates phoneme-level time alignments from human speech and its text transcription.

Example

# Call the binary: input the speech list and text, output the alignment
cd egs/cn_phn
speech-aligner --config=conf/align.conf data/wav.scp data/text data/out.ali
# View the output alignment, which contains: file name, phoneme start time (s), phoneme end time (s), phoneme
cat data/text data/out.ali
BAC009S0002W0122 而对楼市成交抑制作用最大的限购
BAC009S0002W0122
0.000 0.535 sil
0.535 0.540 $0
0.540 0.745 er_2
0.745 0.850 d
0.850 0.895 ui_4
0.895 1.305 l
1.305 1.435 ou_2
...
4.955 5.055 x
5.055 5.525 ian_4
5.525 5.745 g
5.745 5.930 ou_4
5.930 5.975 sil
.
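The .ali output above is easy to post-process. As a minimal sketch (the helper name ali_to_tsv is hypothetical, and the parsing assumes exactly the layout shown: an utterance-ID line, "start end phoneme" lines in seconds, and a closing "."), this awk function flattens it into one tab-separated row per phoneme, with the phoneme's duration appended:

```shell
# Hypothetical helper: flatten a speech-aligner .ali file into TSV rows of
#   utterance-id, start (s), end (s), phoneme, duration (s)
# Assumes the layout shown above: ID line, "start end phoneme" lines, closing ".".
ali_to_tsv() {
  awk '
    $0 == "." { utt = ""; next }     # end of the current utterance
    NF == 1   { utt = $1; next }     # utterance-ID line
    NF == 3 && $1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.]+$/ {
      printf "%s\t%s\t%s\t%s\t%.3f\n", utt, $1, $2, $3, $2 - $1
    }
  ' "$@"                             # reads the named files, or stdin if none
}
```

For example, from egs/cn_phn: ali_to_tsv data/out.ali > data/out.tsv. Feed it only the .ali file; the numeric checks on the first two fields are there to skip any other line.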

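Build

Prerequisites:
- cmake >= 3.1
- One of the following BLAS math libraries:
  - mkl (recommended)
    - Install conda, then install mkl through it: conda install mkl (mkl is installed with conda by default)
    - When building, make sure conda can be executed (which conda prints a path)
  - atlas
    - Ubuntu: sudo apt-get install libatlas3-base
    - Library paths vary across Linux distributions and change between releases, so you can specify the path with the following command:
    - cmake -DBLAS_VENDORS=ATLAS -DBLAS_ATLAS_LIB_DIRS=[/path/to/atlas/lib] ..
  - OS X (Darwin) ships with the Accelerate framework, so this item can be skipped
  - ...for other math libraries, see cmake/Modules/FindBLAS.cmake for the ones supported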
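If you are unsure which directory to pass as BLAS_ATLAS_LIB_DIRS, the dynamic-linker cache is one place to look. A minimal sketch, assuming glibc's ldconfig -p output format; the helper name atlas_lib_dirs is hypothetical, and the library names it matches (libatlas, libcblas, libf77blas) are an assumption about what an ATLAS package installs:

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and print each distinct
# directory containing an ATLAS-looking library. Assumed line format:
#   libatlas.so.3 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libatlas.so.3
atlas_lib_dirs() {
  awk '$1 ~ /^lib(atlas|cblas|f77blas)\.so/ && $(NF-1) == "=>" {
         path = $NF
         sub("/[^/]*$", "", path)        # drop the file name, keep the directory
         if (!(path in seen)) { seen[path] = 1; print path }
       }'
}
```

Usage: ldconfig -p | atlas_lib_dirs prints one directory per line; pass one of them as the BLAS_ATLAS_LIB_DIRS value in the cmake command above. If nothing is printed, the ATLAS libraries are probably not installed or not in the linker cache.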

Building with cmake

git clone .../speech-aligner.git
cd speech-aligner
mkdir build && cd build
cmake ..
make -j

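Build output
- bin/speech-aligner: the executable binary; see egs/cn_phn/run.sh for a typical invocation. It takes three kinds of arguments:
  - Configuration: parameters can be read from a config file or the command line; a config file is recommended, e.g. --config=egs/cn_phn/conf/align.conf
  - Input: the audio list and the corresponding text list
  - Output: the phoneme-level time alignment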
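The audio list and text list follow Kaldi's table conventions: each wav.scp line is "<utterance-id> <path to wav>", and each text line is "<utterance-id> <transcription>" using the same IDs (as in the BAC009S0002W0122 example above). A minimal sketch for generating wav.scp from a folder of .wav files, assuming the file basename serves as the utterance ID and names contain no spaces; the helper name make_wav_scp is hypothetical:

```shell
# Hypothetical helper: print a Kaldi-style wav.scp ("<utterance-id> <path>")
# for every .wav file in the directory given as $1.
# Assumption: utterance ID = file basename without .wav, no spaces in names.
make_wav_scp() {
  for f in "$1"/*.wav; do
    [ -e "$f" ] || continue                  # glob matched nothing: emit nothing
    printf '%s %s\n' "$(basename "$f" .wav)" "$f"
  done | LC_ALL=C sort                       # Kaldi data files are conventionally C-sorted
}
```

For example, make_wav_scp data/wav > data/wav.scp. Paths are written exactly as given, so relative paths resolve only if speech-aligner is run from the same directory.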

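Applications and examples

Research:
- Generating phoneme-level time-aligned training data for TTS
  - egs/cn_phn
Engineering:
- Lyric alignment
  - egs/cn_lyric [todo]
- Subtitle alignment
  - egs/cn_subtitle [todo]
for fun:
- Kichiku (鬼畜) remix videos
  - egs/cn_gc [todo]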

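Updates

Added support for Chinese pinyin input (with tones); see egs/cn_phn/data/text

Todo

Chinese context: handling of punctuation and English words
Add more examples

About

This project is based on the well-known open-source speech project Kaldi; its copyright follows that of the original project.
The phoneme set used in the example egs/cn_phn comes from DaCiDian, another open-source Chinese dictionary project.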

English readme:

speech-aligner is a tool that generates phoneme-level alignments between human speech and its transcription.

Usage example

# Call the binary with the speech list and transcript list as inputs
./bin/speech-aligner --config=egs/cn_phn/conf/align.conf egs/cn_phn/data/wav.scp egs/cn_phn/data/text egs/cn_phn/data/out.ali
# Check the output alignment, including: file name, and each phoneme with its start/end time
cat egs/cn_phn/data/text egs/cn_phn/data/out.ali
BAC009S0002W0123
0.000 0.025 y
0.025 0.460 e_3
0.460 0.850 sil
0.850 0.985 ch
0.985 1.095 eng_2
...
2.655 2.735 zh
2.735 2.900 ong_1
2.900 2.960 d
2.960 3.665 ing_1
3.665 3.845 sil
.
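To inspect an alignment by ear, it can help to load it next to the audio in an editor. A minimal sketch that pulls one utterance out of a .ali file as an Audacity label track (one tab-separated "start end label" line per phoneme); the choice of Audacity and the helper name ali_to_audacity_labels are assumptions, not part of speech-aligner:

```shell
# Hypothetical helper: extract utterance $1 from the .ali file $2 (or stdin)
# as Audacity label lines: start<TAB>end<TAB>phoneme.
# Assumes the .ali layout shown above: ID line, "start end phoneme" lines, closing ".".
ali_to_audacity_labels() {
  awk -v utt="$1" '
    $0 == "."         { inside = 0; next }           # utterance finished
    NF == 1           { inside = ($1 == utt); next } # start of some utterance
    inside && NF == 3 { printf "%s\t%s\t%s\n", $1, $2, $3 }
  ' "${2:--}"                                        # "-" tells awk to read stdin
}
```

For example: ali_to_audacity_labels BAC009S0002W0123 egs/cn_phn/data/out.ali > BAC009S0002W0123.txt, then import that file as labels in Audacity alongside the corresponding wav.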

Compile

Requirements
- cmake >= 3.1
- one of the following BLAS math libraries:
  - mkl (recommended)
    - install conda and use it to install mkl: conda install mkl (mkl is installed with conda by default)
    - when running cmake, conda must be on your PATH
  - atlas
    - Ubuntu: sudo apt-get install libatlas3-base
    - cmake may not find your ATLAS installation automatically, so set the math library path as below:
    - cmake -DBLAS_VENDORS=ATLAS -DBLAS_ATLAS_LIB_DIRS=[/path/to/atlas/lib] ..
  - Accelerate framework (nothing to do on macOS/Darwin)
  - ... (for others, see cmake/Modules/FindBLAS.cmake)

cmake

git clone .../speech-aligner.git
cd speech-aligner
mkdir build && cd build
cmake ..
make -j

results
- bin/speech-aligner: the executable binary, which takes:
  - configuration: via a config file (recommended, e.g. --config=egs/cn_phn/conf/align.conf) or the command line
  - inputs: the wav list and the corresponding transcription list (e.g. in egs/cn_phn/data)
  - output: the resulting alignment
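Since every utterance needs both an audio entry and a transcription, a quick consistency check before running the aligner can save a failed run. A minimal sketch (the helper name check_ids is hypothetical), assuming the first whitespace-separated field of each line in both files is the utterance ID, as in Kaldi data directories:

```shell
# Hypothetical helper: compare utterance IDs in wav.scp ($1) and text ($2).
# Prints IDs present in only one file; exits 1 if any mismatch is found.
check_ids() {
  awk '
    NF == 0             { next }                  # skip blank lines
    FILENAME == ARGV[1] { in_wav[$1] = 1; next }  # first file: wav.scp
                        { in_text[$1] = 1 }       # second file: text
    END {
      for (id in in_wav)  if (!(id in in_text)) { print "no transcription: " id; bad++ }
      for (id in in_text) if (!(id in in_wav))  { print "no audio: " id; bad++ }
      exit (bad > 0)
    }
  ' "$1" "$2"
}
```

For example, check_ids egs/cn_phn/data/wav.scp egs/cn_phn/data/text prints nothing and exits 0 when the two ID sets match. Output order is not guaranteed, so pipe it through sort if you need a stable listing.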

Applications

for research:
- generate training data for TTS
  - egs/cn_phn: generate Chinese phoneme alignments
for engineering:
- align lyrics
  - egs/cn_lyric [todo]
- align subtitles
  - egs/cn_subtitle [todo]
for fun:
- kichiku (きちく) remixes
  - egs/cn_gc [todo]

About

This project is based on Kaldi, a well-known open-source speech project.
The phoneme set used in the example egs/cn_phn comes from DaCiDian, an open-source Chinese dictionary project.

speech-aligner's Issues

Don't the basic Kaldi alignment binaries do this anyway?

Just curious what makes this different.

What kinds of audio are supported?

Hello, what data was this model trained on? I need to process 16 kHz audio from a mobile app; will it perform well?

Failed to load data

I am trying to train on my own data, but I ran into a problem:

WARNING (speech-aligner[5.4.2154-f2b7]:EnsureObjectLoaded():util/kaldi-table-inl.h:310) Failed to open file data/wav/IC0001W0001.wav
ERROR (speech-aligner[5.4.2154-f2b7]:Value():util/kaldi-table-inl.h:164) Failed to load object from data/wav/IC0001W0001.wav (to suppress this error, add the permissive (p, ) option to the rspecifier.
How can I solve this problem? The data format is the same as in the examples.
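One possible cause of "Failed to open file" is a relative path in wav.scp that does not resolve from the directory the command is run in, since relative paths are opened against the current working directory. A minimal diagnostic sketch (the helper name missing_wavs is hypothetical), assuming plain "<id> <path>" lines with no piped commands and no spaces in paths:

```shell
# Hypothetical helper: print each wav.scp ($1) entry whose audio file cannot be
# read from the current directory. Assumes "<id> <path>" lines, no spaces in paths.
missing_wavs() {
  while read -r id path _ || [ -n "$id" ]; do  # also handles a last line without newline
    [ -n "$id" ] || continue                   # skip blank lines
    [ -r "$path" ] || printf '%s %s\n' "$id" "$path"
  done < "$1"
}
```

Run it from the same directory you run speech-aligner from, e.g. missing_wavs data/wav.scp; any line it prints names a file the aligner will not be able to open from there.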

ERROR (speech-aligner[5.4.215~4-f2b7]:Input():util/kaldi-io.cc:756) Error opening input stream res/tree

I ran into the following error:
ERROR (speech-aligner[5.4.215~4-f2b7]:Input():util/kaldi-io.cc:756) Error opening input stream res/tree

Are pure English and mixed Chinese-English input supported now?

Why does my own audio always raise a sample frequency error?

Command: $ ./bin/speech-aligner --acoustic-scale=0.01 --careful=true --sample-frequency=48000 --config=egs/cn_phn/conf/align.conf egs/cn_phn/data1/wav.scp egs/cn_phn/data1/text egs/cn_phn/data/out1.ali
Error:
./bin/speech-aligner --acoustic-scale=0.01 --careful=true --sample-frequency=48000 --config=egs/cn_phn/conf/align.conf egs/cn_phn/data1/wav.scp egs/cn_phn/data1/text egs/cn_phn/data/out1.ali
LOG (speech-aligner[5.4.2154-f2b7]:main():bin/speech-aligner.cc:351) zhuni
ERROR (speech-aligner[5.4.2154-f2b7]:main():bin/speech-aligner.cc:425) Sample frequency mismatch: you specified 16000 but data has 48000 (use --sample-frequency option). Utterance is zhuni

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
main
__libc_start_main
_start
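The error above reports 48 kHz data where 16 kHz is expected. Before converting audio, it can help to find which files are affected. A minimal sketch (the helper names wav_rate and rate_mismatches are hypothetical) that reads the sample rate straight from the WAV header; it assumes canonical RIFF/WAVE files whose fmt chunk immediately follows the WAVE tag, so the little-endian rate sits at byte offset 24, and a little-endian host, since od decodes in host byte order:

```shell
# Hypothetical helper: print the sample rate stored in a canonical WAV header.
# Assumes "fmt " directly follows "WAVE" (rate at byte offset 24) and a little-endian host.
wav_rate() {
  od -An -t u4 -j 24 -N 4 "$1" | tr -d ' '
}

# Hypothetical helper: list wav.scp ($1) entries whose rate differs from $2
# (default 16000, the rate the error message says is expected).
rate_mismatches() {
  expected=${2:-16000}
  while read -r id path _ || [ -n "$id" ]; do
    [ -n "$id" ] || continue                   # skip blank lines
    rate=$(wav_rate "$path")
    [ "$rate" = "$expected" ] || printf '%s %s\n' "$id" "$rate"
  done < "$1"
}
```

For example, rate_mismatches egs/cn_phn/data1/wav.scp lists each utterance whose rate is not 16000 together with its actual rate. If sox is installed, one way to convert such a file is sox input.wav -r 16000 output.wav.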

make error

When running make -j, I get the following error:
/usr/bin/ld: CMakeFiles/speech-aligner.dir/src/matrix/kaldi-matrix.cc.o: undefined reference to symbol 'cblas_ssyrk'
//usr/lib/x86_64-linux-gnu/libblas.so.3: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
CMakeFiles/speech-aligner.dir/build.make:1636: recipe for target '../bin/speech-aligner' failed
make[2]: *** [../bin/speech-aligner] Error 1
CMakeFiles/Makefile2:150: recipe for target 'CMakeFiles/speech-aligner.dir/all' failed
make[1]: *** [CMakeFiles/speech-aligner.dir/all] Error 2
Makefile:148: recipe for target 'all' failed
make: *** [all] Error 2
How do I fix this?

Cannot change the sampling rate

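When I change the --sample-frequency entry in the config file speech_aligner/egs/data/conf/align.conf, it seems to have no effect. Is that because only a 16000 Hz sampling rate is currently supported?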

Did not successfully decode file BAC009S0002W0125, len = 629

What does this mean?

How can other models be supported?

ERROR (speech-aligner[5.4.215~4-f2b7]:LogLikelihoodZeroBased():gmm/decodable-am-diag-gmm.cc:50) Dim mismatch: data dim = 48 vs. model dim = 39

How do I change the features to the Deltas + Delta-Deltas form?
3*13=39
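The arithmetic behind this error, as a hedged reading: appending delta and delta-delta features triples the base feature dimension. The model's 39 matches 13 base coefficients, while the data's 48 would correspond to 16 base dimensions under the same scheme. One configuration that yields 16 is 13 MFCCs plus Kaldi's 3 pitch features, but that is an assumption, not something the error message confirms.

```latex
d_{\text{feat}} = 3\,d_{\text{base}}, \qquad
d_{\text{model}} = 3 \times 13 = 39, \qquad
d_{\text{data}} = 48 = 3 \times 16 \;\Rightarrow\; d_{\text{base}} = 16 \neq 13
```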

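Can the input be changed to phonemes?

As the title says: my input is pinyin, but the tool seems to support only Chinese characters, so I have to force-convert the pinyin into characters. Is there a way to input pinyin directly?

Improve the build guide

On Ubuntu 16.04, the correct dependency package is libatlas-base-dev; with it installed, just run cmake .. with no other arguments.

The example is not aligned well

Hi, I have worked on alignment before, though my tool was not as user-friendly as yours. I ran your example and opened the result in Praat, and found that in BAC009S0002W0123 the leading silence and ye3 are not aligned at all. Are you still working on this? Is there a good way to get reasonably accurate alignments? Thanks.

How do I change the sampling rate?

What advantages does it have over Montreal-Forced-Aligner?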

Montreal-Forced-Aligner:
https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner

I often use it to train models and run alignment.

cmake version must be 3.1.0 or higher

cmake version must be 3.1 or higher

What algorithm does speech-aligner use?

cmake error

-- FindBLAS: Searching for MKL mkl_rt - BLAS_MKL_mkl_rt_LIBRARY-NOTFOUND
-- FindBLAS: Searching for ATLAS BLAS
-- FindBLAS: Searching for ATLAS f77blas - BLAS_ATLAS_f77blas_LIBRARY-NOTFOUND
-- FindBLAS: Searching for OpenBLAS
-- FindBLAS: Searching for OPEN openblas - BLAS_OPEN_openblas_LIBRARY-NOTFOUND
-- FindBLAS: Searching for AMD ACML
-- FindBLAS: Searching for IBM ESSL (int64)
-- FindBLAS: Searching for ESSL essl6464 - BLAS_ESSL_essl6464_LIBRARY-NOTFOUND
-- FindBLAS: Searching for GotoBLAS2
-- FindBLAS: Searching for GOTO goto2 - BLAS_GOTO_goto2_LIBRARY-NOTFOUND
-- FindBLAS: Searching for SGI SCSL (int64)
-- FindBLAS: Searching for SCSL scs_i8 - BLAS_SCSL_scs_i8_LIBRARY-NOTFOUND
-- FindBLAS: Searching for Sun PerfLib
-- FindBLAS: Searching for SUNPERF sunperf - BLAS_SUNPERF_sunperf_LIBRARY-NOTFOUND
-- FindBLAS: Searching for VECLIB (int64)
-- FindBLAS: Searching for VECLIB veclib8 - BLAS_VECLIB_veclib8_LIBRARY-NOTFOUND
-- FindBLAS: Searching for generic BLAS
-- FindBLAS: Searching for GENERIC blas - /usr/lib/x86_64-linux-gnu/libblas.so
-- FindBLAS: BLAS vendors found: GENERIC
-- FindBLAS: BLAS vendor selected: GENERIC
using blas lib: /usr/lib/x86_64-linux-gnu/libblas.so
-- Found a library with BLAS API (GENERIC).
CMake Error at CMakeLists.txt:32 (message):
Cannot find atlas or clapack, CMake will exit.

-- Configuring incomplete, errors occurred!
See also "/media/hacker/ErrolYan/00Github/A6HY_voice_V1.2/speech-aligner/CMakeFiles/CMakeOutput.log".
See also "/media/hacker/ErrolYan/00Github/A6HY_voice_V1.2/speech-aligner/CMakeFiles/CMakeError.log".

LOG (speech-aligner[5.4.215~4-f2b7]:main():bin/speech-aligner.cc:430) file

Hello, could you tell me what causes this?

cannot find blas

I am building with cmake 3.23.2 on Ubuntu 20.04 and get the following error; it seems BLAS cannot be found. How can I fix it?
-- FindBLAS: BLAS vendors found:
-- FindBLAS: BLAS library not found
CMake Error at CMakeLists.txt:35 (message):
Cannot find blas, CMake will exit.
-- Configuring incomplete, errors occurred!