
open-speech / speech-aligner


speech-aligner is a tool that generates phoneme-level alignment between human speech and its transcription.

License: Other

CMake 0.28% Shell 2.63% C++ 67.34% C 24.65% Makefile 3.89% M4 0.08% Batchfile 0.01% Python 1.12%
speech kaldi cpp

speech-aligner's Introduction

speech-aligner

Chinese readme:

speech-aligner is a tool that generates phoneme-level time-alignment annotations from human speech and its text transcription.

Example

# Run the binary: input the wav list and the transcripts, output the alignment
cd egs/cn_phn
speech-aligner --config=conf/align.conf data/wav.scp data/text data/out.ali
# Inspect the output alignment, which contains: the file name, then per phoneme: start time (s), end time (s), phoneme
cat data/text data/out.ali
BAC009S0002W0122 而对楼市成交抑制作用最大的限购
BAC009S0002W0122
0.000 0.535 sil
0.535 0.540 $0
0.540 0.745 er_2
0.745 0.850 d
0.850 0.895 ui_4
0.895 1.305 l
1.305 1.435 ou_2
...
4.955 5.055 x
5.055 5.525 ian_4
5.525 5.745 g
5.745 5.930 ou_4
5.930 5.975 sil
.
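The alignment output above follows a simple layout. Each utterance starts with an utterance-ID line. It is followed by one "start end phoneme" line per segment, with times in seconds, and a lone "." ends the utterance. The first line printed by cat, the one with the Chinese sentence, comes from data/text, not from out.ali.

Below is a minimal Python sketch for reading out.ali into per-utterance lists, assuming that layout holds for every utterance. parse_alignment is a hypothetical helper, not part of speech-aligner.

```python
from typing import Dict, List, Optional, Tuple

# (start_sec, end_sec, phoneme)
Segment = Tuple[float, float, str]


def parse_alignment(text: str) -> Dict[str, List[Segment]]:
    """Parse speech-aligner output laid out as in the example above.

    Assumed layout: an utterance-ID line, then one "start end phoneme"
    line per segment, and a line containing only "." ending the block.
    """
    result: Dict[str, List[Segment]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == ".":
            current = None  # end of the current utterance block
            continue
        fields = line.split()
        if current is None:
            # The first line of a block is the utterance ID.
            current = fields[0]
            result[current] = []
        elif len(fields) == 3:
            start, end, phone = fields
            result[current].append((float(start), float(end), phone))
        else:
            raise ValueError(f"unexpected line in {current}: {raw!r}")
    return result
```

For example, parse_alignment(open("data/out.ali", encoding="utf-8").read()) returns a dict keyed by utterance ID. The "..." lines in the listing above are elisions in this README; a real out.ali would not contain them.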

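Compile

  • Prerequisites:

    • cmake >= 3.1

    • One of the following BLAS math libraries:

      • mkl (recommended)

        • Install conda, then use it to install mkl: conda install mkl (mkl is installed with conda by default)
        • When building, make sure conda is executable (which conda prints a path)
      • atlas

        • Ubuntu: sudo apt-get install libatlas3-base

        • Linux distributions vary, and math-library paths differ between them and change over time, so you can specify the path with the following command:

        • cmake -DBLAS_VENDORS=ATLAS -DBLAS_ATLAS_LIB_DIRS=/path/to/atlas/lib ..
      • macOS (Darwin) ships with the Accelerate framework, so you can skip this item

      • … for other math libraries, see cmake/Modules/FindBLAS.cmake for the list of supported ones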

  • Build with cmake

    git clone .../speech-aligner.git
    cd speech-aligner
    mkdir build && cd build
    cmake ..
    make -j
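  • Build output

    • bin/speech-aligner: the binary executable; see egs/cn_phn/run.sh for a typical invocation. It takes three kinds of arguments:
      • Configuration: parameters can be read from a config file or from the command line; a config file is recommended, e.g. --config=egs/cn_phn/conf/align.conf
      • Input: the audio list and the corresponding transcript list
      • Output: phoneme-level time-alignment annotations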
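The two input lists appear to be Kaldi-style table files, since the error logs in the issues below come from Kaldi's table readers. Each line of wav.scp is "<utterance-id> <wav path>". Each line of text is "<utterance-id> <transcript>", matching the data/text line in the example above.

A minimal Python sketch that builds both files' contents from a dict follows. It assumes plain file paths without whitespace, and format_inputs is a hypothetical helper name. Sorting by utterance ID follows the usual Kaldi convention; whether speech-aligner strictly requires it is an assumption.

```python
from typing import Dict, Tuple


def format_inputs(utts: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return the contents of (wav.scp, text).

    utts maps utterance ID -> (wav path, transcript). Every line is
    "<utt-id> <value>"; IDs are emitted in sorted order (Kaldi convention,
    assumed rather than verified for speech-aligner).
    """
    scp_lines, text_lines = [], []
    for utt_id in sorted(utts):
        wav_path, transcript = utts[utt_id]
        # The utterance ID ends at the first whitespace, so it must not
        # contain any; this sketch also rejects paths with whitespace.
        if any(ch.isspace() for ch in utt_id + wav_path):
            raise ValueError(f"whitespace in ID or path: {utt_id!r}, {wav_path!r}")
        scp_lines.append(f"{utt_id} {wav_path}\n")
        text_lines.append(f"{utt_id} {transcript}\n")
    return "".join(scp_lines), "".join(text_lines)
```

The returned strings can then be written out with open("data/wav.scp", "w", encoding="utf-8") and open("data/text", "w", encoding="utf-8").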

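Use cases and examples

  • Research:
    • Generating phoneme-timing training data for TTS
      • egs/cn_phn
  • Engineering:
    • Lyric alignment
      • egs/cn_lyric [todo]
    • Subtitle alignment
      • egs/cn_subtitle [todo]
  • For fun:
    • Guichu (鬼畜, remix/meme videos)
      • egs/cn_gc [todo]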

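Updates

  • Added support for Chinese pinyin input (with tones); see egs/cn_phn/data/text

Todo

  • Chinese setting: handling of punctuation and English words
  • Add more examples

About

  • This project is based on the well-known open-source speech project Kaldi; copyright follows the original project.
  • The phoneme set used in the egs/cn_phn example comes from DaCiDian, another open-source Chinese dictionary project.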

English readme:

speech-aligner is a tool that generates phoneme-level alignment between human speech and its transcription.

Usage example

# Call the binary with the wav list and transcripts as inputs
./bin/speech-aligner --config=egs/cn_phn/conf/align.conf egs/cn_phn/data/wav.scp egs/cn_phn/data/text egs/cn_phn/data/out.ali
# Check the output alignment: the file name, then each phoneme's start/end time (s) and label
cat egs/cn_phn/data/text egs/cn_phn/data/out.ali
BAC009S0002W0123
0.000 0.025 y
0.025 0.460 e_3
0.460 0.850 sil
0.850 0.985 ch
0.985 1.095 eng_2
...
2.655 2.735 zh
2.735 2.900 ong_1
2.900 2.960 d
2.960 3.665 ing_1
3.665 3.845 sil
.

Compile

  • requirements

    • cmake >= 3.1

    • one of the following BLAS math libraries:

      • mkl (recommended)

        • install conda, and use it to install mkl: conda install mkl (mkl is installed with conda by default)
        • when running cmake, conda must be on your PATH
      • atlas

        • Ubuntu: sudo apt-get install libatlas3-base

        • cmake may not find your atlas installation automatically; in that case, set the math library path as below:

        • cmake -DBLAS_VENDORS=ATLAS -DBLAS_ATLAS_LIB_DIRS=/path/to/atlas/lib ..
      • Accelerate framework (nothing to do on macOS/Darwin)

      • ...

  • cmake

    git clone .../speech-aligner.git
    cd speech-aligner
    mkdir build && cd build
    cmake ..
    make -j
  • results

    • bin/speech-aligner: a binary executable file, with arguments:
      • configuration: through a config file (recommended, e.g. --config=egs/cn_phn/conf/align.conf) or the command line
      • inputs: the wav list and the corresponding transcription list (e.g. egs/cn_phn/data)
      • output: the result alignment

Applications

  • for research:
    • generate training data for TTS
      • egs/cn_phn: generate Chinese phoneme alignments
  • for engineering:
    • align lyrics
      • egs/cn_lyric [todo]
    • align subtitles
      • egs/cn_subtitle [todo]
  • for fun:
    • kichiku (きちく, remix/meme videos)
      • egs/cn_gc [todo]

About

  • This project is based on Kaldi, a well-known open-source speech project.
  • The phonemes used in the egs/cn_phn example come from DaCiDian, an open-source Chinese dictionary project.

speech-aligner's People

Contributors

megazone87


speech-aligner's Issues

What kinds of audio are supported?

Hello, what data was this model trained on? I need to process 16 kHz audio from a mobile app; will it perform well?

Failed to load data

I am trying to train on my data, but I ran into a problem:

WARNING (speech-aligner[5.4.215~4-f2b7]:EnsureObjectLoaded():util/kaldi-table-inl.h:310) Failed to open file data/wav/IC0001W0001.wav
ERROR (speech-aligner[5.4.215~4-f2b7]:Value():util/kaldi-table-inl.h:164) Failed to load object from data/wav/IC0001W0001.wav (to suppress this error, add the permissive (p, ) option to the rspecifier.
How do I solve this problem? The data format is the same as in the examples.
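Kaldi reports "Failed to open file" when a path listed in wav.scp cannot be opened. Relative paths are resolved against the directory the command is run from.

Below is a minimal sketch for listing such entries before running speech-aligner. It assumes each line is "<utt-id> <path>" with a plain file path, and skips Kaldi command-pipe entries that end in "|". missing_wavs is a hypothetical helper, not part of the tool.

```python
import os
from typing import List


def missing_wavs(scp_text: str, base_dir: str = ".") -> List[str]:
    """Return "utt-id: path" for each wav.scp entry whose file does not exist.

    Relative paths are resolved against base_dir, which should be the
    directory speech-aligner is run from. Pipe entries are skipped.
    """
    missing = []
    for line in scp_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        utt_id, path = parts[0], parts[1].strip()
        if path.endswith("|"):
            continue  # a command pipe; cannot be checked as a file
        full = path if os.path.isabs(path) else os.path.join(base_dir, path)
        if not os.path.isfile(full):
            missing.append(f"{utt_id}: {full}")
    return missing
```

For example, running print("\n".join(missing_wavs(open("data/wav.scp", encoding="utf-8").read()))) from the working directory lists every entry that Kaldi would fail to open.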

Why does my own audio always raise a sample frequency error?

Command: $ ./bin/speech-aligner --acoustic-scale=0.01 --careful=true --sample-frequency=48000 --config=egs/cn_phn/conf/align.conf egs/cn_phn/data1/wav.scp egs/cn_phn/data1/text egs/cn_phn/data/out1.ali
Error:
./bin/speech-aligner --acoustic-scale=0.01 --careful=true --sample-frequency=48000 --config=egs/cn_phn/conf/align.conf egs/cn_phn/data1/wav.scp egs/cn_phn/data1/text egs/cn_phn/data/out1.ali
LOG (speech-aligner[5.4.215~4-f2b7]:main():bin/speech-aligner.cc:351) zhuni
ERROR (speech-aligner[5.4.215~4-f2b7]:main():bin/speech-aligner.cc:425) Sample frequency mismatch: you specified 16000 but data has 48000 (use --sample-frequency option). Utterance is zhuni

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
main
__libc_start_main
_start
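The error above says the audio is 48 kHz while the aligner was set up for 16 kHz. Another issue below reports that changing --sample-frequency in the config has no effect. That suggests the bundled model expects 16 kHz audio, though this is an assumption and is not confirmed here.

A minimal sketch using Python's standard wave module to list WAV files whose rate is not 16000 Hz follows. Such files would need resampling with an external tool before aligning, for example sox in.wav -r 16000 out.wav. The EXPECTED_RATE constant and rate_mismatches are hypothetical names.

```python
import wave
from typing import List, Tuple

# Assumption: the bundled example model expects 16 kHz audio.
EXPECTED_RATE = 16000


def rate_mismatches(paths: List[str], expected: int = EXPECTED_RATE) -> List[Tuple[str, int]]:
    """Return (path, actual_rate) for each WAV file whose sample rate differs.

    The standard-library wave module only reads uncompressed PCM WAV files.
    """
    bad = []
    for path in paths:
        with wave.open(path, "rb") as w:
            rate = w.getframerate()
        if rate != expected:
            bad.append((path, rate))
    return bad
```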

make error

When I run make -j, I get the following error:
/usr/bin/ld: CMakeFiles/speech-aligner.dir/src/matrix/kaldi-matrix.cc.o: undefined reference to symbol 'cblas_ssyrk'
//usr/lib/x86_64-linux-gnu/libblas.so.3: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
CMakeFiles/speech-aligner.dir/build.make:1636: recipe for target '../bin/speech-aligner' failed
make[2]: *** [../bin/speech-aligner] Error 1
CMakeFiles/Makefile2:150: recipe for target 'CMakeFiles/speech-aligner.dir/all' failed
make[1]: *** [CMakeFiles/speech-aligner.dir/all] Error 2
Makefile:148: recipe for target 'all' failed
make: *** [all] Error 2
How do I fix this?

Cannot change the sampling rate

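When I change the --sample-frequency option in the config file speech_aligner/egs/data/conf/align.conf, it seems to have no effect. Is that because only a 16000 Hz sample rate is currently supported?

How can other models be supported?

ERROR (speech-aligner[5.4.215~4-f2b7]:LogLikelihoodZeroBased():gmm/decodable-am-diag-gmm.cc:50) Dim mismatch: data dim = 48 vs. model dim = 39

How do I change the features to the Deltas + Delta-Deltas form?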
3*13=39
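The 39 above is 13 static coefficients plus 13 deltas plus 13 delta-deltas. Below is a minimal pure-Python sketch of that stacking. It uses the common regression formula d_t = Σ n(c[t+n] − c[t−n]) / (2 Σ n²) for n = 1..N, with window N = 2 and edge frames clamped.

This is illustrative only. It assumes 13-dimensional input frames and is not guaranteed to match Kaldi's own delta computation bit-for-bit, especially at utterance edges. deltas and add_deltas are hypothetical names.

```python
from typing import List

Frame = List[float]


def deltas(feats: List[Frame], window: int = 2) -> List[Frame]:
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2).

    Frame indices outside the utterance are clamped to the first/last frame.
    """
    T = len(feats)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        dim = len(feats[t])
        d = [0.0] * dim
        for n in range(1, window + 1):
            ahead = feats[min(t + n, T - 1)]
            behind = feats[max(t - n, 0)]
            for k in range(dim):
                d[k] += n * (ahead[k] - behind[k])
        out.append([x / denom for x in d])
    return out


def add_deltas(feats: List[Frame], window: int = 2) -> List[Frame]:
    """Append deltas and delta-deltas: 13 static dims -> 3 * 13 = 39 dims."""
    d1 = deltas(feats, window)
    d2 = deltas(d1, window)  # delta-deltas are deltas of the deltas
    return [c + a + b for c, a, b in zip(feats, d1, d2)]
```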

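Can the input be phonemes instead?

As the title says: my input is pinyin, but the software seems to accept only Chinese characters, so I have to force-convert the pinyin into characters. Is there a way to input pinyin directly?

Improve the build guide

On Ubuntu 16.04, the correct dependency package is libatlas-base-dev; after installing it, simply running cmake .. works with no other arguments.

The example is not aligned well

Hi, I have done alignment work before too, though not as user-friendly as yours. I got your example running and opened it in Praat, and found that in BAC009S0002W0123 the leading silence and ye3 are not aligned at all. Are you still working on this? Is there a good way to ensure reasonably accurate alignment? Thanks.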
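The issue above inspects the result in Praat. One way to do that is to convert one utterance's segments into a Praat TextGrid. Below is a minimal sketch that writes Praat's long text format with a single IntervalTier.

It assumes the segments are contiguous, meaning each start equals the previous end, as in the example output. to_textgrid and the tier name "phones" are hypothetical. The sample segments used to test it are the first three from BAC009S0002W0123 in the English example.

```python
from typing import List, Tuple

# (start_sec, end_sec, label)
Segment = Tuple[float, float, str]


def to_textgrid(segments: List[Segment], tier: str = "phones") -> str:
    """Render contiguous segments as a Praat TextGrid (long text format)."""
    if not segments:
        raise ValueError("no segments")
    # Praat interval tiers must have no gaps between intervals.
    for (_, end0, _), (start1, _, _) in zip(segments, segments[1:]):
        if abs(end0 - start1) > 1e-6:
            raise ValueError(f"gap between {end0} and {start1}")
    xmin, xmax = segments[0][0], segments[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(segments)}",
    ]
    for i, (start, end, label) in enumerate(segments, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            # Praat escapes a double quote inside a string by doubling it.
            '            text = "{}"'.format(label.replace('"', '""')),
        ]
    return "\n".join(lines) + "\n"
```

The result can be saved with open("BAC009S0002W0123.TextGrid", "w", encoding="utf-8") and opened in Praat together with the corresponding wav file.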

cmake error

-- FindBLAS: Searching for MKL mkl_rt - BLAS_MKL_mkl_rt_LIBRARY-NOTFOUND
-- FindBLAS: Searching for ATLAS BLAS
-- FindBLAS: Searching for ATLAS f77blas - BLAS_ATLAS_f77blas_LIBRARY-NOTFOUND
-- FindBLAS: Searching for OpenBLAS
-- FindBLAS: Searching for OPEN openblas - BLAS_OPEN_openblas_LIBRARY-NOTFOUND
-- FindBLAS: Searching for AMD ACML
-- FindBLAS: Searching for IBM ESSL (int64)
-- FindBLAS: Searching for ESSL essl6464 - BLAS_ESSL_essl6464_LIBRARY-NOTFOUND
-- FindBLAS: Searching for GotoBLAS2
-- FindBLAS: Searching for GOTO goto2 - BLAS_GOTO_goto2_LIBRARY-NOTFOUND
-- FindBLAS: Searching for SGI SCSL (int64)
-- FindBLAS: Searching for SCSL scs_i8 - BLAS_SCSL_scs_i8_LIBRARY-NOTFOUND
-- FindBLAS: Searching for Sun PerfLib
-- FindBLAS: Searching for SUNPERF sunperf - BLAS_SUNPERF_sunperf_LIBRARY-NOTFOUND
-- FindBLAS: Searching for VECLIB (int64)
-- FindBLAS: Searching for VECLIB veclib8 - BLAS_VECLIB_veclib8_LIBRARY-NOTFOUND
-- FindBLAS: Searching for generic BLAS
-- FindBLAS: Searching for GENERIC blas - /usr/lib/x86_64-linux-gnu/libblas.so
-- FindBLAS: BLAS vendors found: GENERIC
-- FindBLAS: BLAS vendor selected: GENERIC
using blas lib: /usr/lib/x86_64-linux-gnu/libblas.so
-- Found a library with BLAS API (GENERIC).
CMake Error at CMakeLists.txt:32 (message):
Cannot find atlas or clapack, CMake will exit.

-- Configuring incomplete, errors occurred!
See also "/media/hacker/ErrolYan/00Github/A6HY_voice_V1.2/speech-aligner/CMakeFiles/CMakeOutput.log".
See also "/media/hacker/ErrolYan/00Github/A6HY_voice_V1.2/speech-aligner/CMakeFiles/CMakeError.log".

cannot find BLAS

I'm building on Ubuntu 20.04 with cmake 3.23.2 and get the error below; it seems BLAS cannot be found. How can I fix this?
-- FindBLAS: BLAS vendors found:
-- FindBLAS: BLAS library not found
CMake Error at CMakeLists.txt:35 (message):
Cannot find blas, CMake will exit.
-- Configuring incomplete, errors occurred!
