huakunyang / summertts

SummerTTS is a standalone Chinese and English speech synthesis (TTS) project written in C++. It runs locally without a network connection, has almost no external dependencies, and is ready for Chinese and English speech synthesis after a one-step build.

CMake 1.41% C++ 88.34% C 1.77% Shell 0.17% HTML 0.07% Cuda 0.57% Fortran 4.87% XSLT 0.02% Python 0.04% JavaScript 0.03% CSS 0.02% Makefile 2.05% Cython 0.63%
cplusplus no-dependencies speech-synthesis standalone tts vits

summertts's Issues

How to compile DLL

I compiled the project with Ubuntu on WSL and was able to generate .so files. However, I need DLL and LIB files for the Windows platform. If I set CMAKE_CXX_COMPILER to x86_64-w64-mingw32-g++, the build fails with compilation errors. What should I do? Any suggestions? @huakunyang

This is my CMakeLists.txt:

cmake_minimum_required(VERSION 3.5)

project(tts)

#set(CMAKE_CXX_FLAGS " -O3 -w -std=c++11 ")
#set(CMAKE_C_FLAGS " -O3 -w -std=c++11 ")

set(CMAKE_CXX_FLAGS " -O3 -fopenmp -std=c++11 ")
set(CMAKE_C_FLAGS " -O3 -fopenmp -std=c++11 ")

#set(CMAKE_CXX_COMPILER x86_64-w64-mingw32-g++)
#set(CMAKE_C_COMPILER x86_64-w64-mingw32-gcc)

add_library(tts
        SHARED
        ${SOURCE_FILES}
        ./test/main.cpp
        ./src/tn/glog/src/demangle.cc
        ./src/tn/glog/src/logging.cc
        ./src/tn/glog/src/raw_logging.cc
        ./src/tn/glog/src/symbolize.cc
        ./src/tn/glog/src/utilities.cc
        ./src/tn/glog/src/vlog_is_on.cc
        ./src/tn/glog/src/signalhandler.cc
        ./src/tn/gflags/src/gflags.cc
        ./src/tn/gflags/src/gflags_reporting.cc
        ./src/tn/gflags/src/gflags_completions.cc
        ./src/tn/openfst/src/lib/compat.cc
        ./src/tn/openfst/src/lib/flags.cc
        ./src/tn/openfst/src/lib/fst.cc
        ./src/tn/openfst/src/lib/fst-types.cc
        ./src/tn/openfst/src/lib/mapped-file.cc
        ./src/tn/openfst/src/lib/properties.cc
        ./src/tn/openfst/src/lib/symbol-table.cc
        ./src/tn/openfst/src/lib/symbol-table-ops.cc
        ./src/tn/openfst/src/lib/util.cc
        ./src/tn/openfst/src/lib/weight.cc
        ./src/tn/processor.cc
        ./src/tn/token_parser.cc
        ./src/tn/utf8_string.cc
        ./src/engipa/EnglishText2Id.cpp
        ./src/engipa/InitIPASymbols.cpp
        ./src/engipa/alphabet.cpp
        ./src/engipa/ipa.cpp
        ./src/hz2py/hanzi2phoneid.cpp
        ./src/hz2py/Hanz2Piny.cpp
        ./src/hz2py/pinyinmap.cpp
        ./src/nn_op/nn_conv1d.cpp
        ./src/nn_op/nn_softmax.cpp
        ./src/nn_op/nn_layer_norm.cpp
        ./src/nn_op/nn_relu.cpp
        ./src/nn_op/nn_gelu.cpp
        ./src/nn_op/nn_tanh.cpp
        ./src/nn_op/nn_flip.cpp
        ./src/nn_op/nn_cumsum.cpp
        ./src/nn_op/nn_softplus.cpp
        ./src/nn_op/nn_clamp_min.cpp
        ./src/nn_op/nn_sigmoid.cpp
        ./src/nn_op/nn_conv1d_transposed.cpp
        ./src/nn_op/nn_leaky_relu.cpp
        ./src/platform/tts_file_io.cpp
        ./src/platform/tts_logger.cpp
        ./src/utils/utils.cpp
        ./src/modules/iStft.cpp
        ./src/modules/hann.cpp
        ./src/modules/attention_encoder.cpp
        ./src/modules/multi_head_attention.cpp
        ./src/modules/ffn.cpp
        ./src/modules/ConvFlow.cpp
        ./src/modules/DDSConv.cpp
        ./src/modules/ElementwiseAffine.cpp
        ./src/modules/random_gen.cpp
        ./src/modules/ResidualCouplingLayer.cpp
        ./src/modules/ResBlock1.cpp
        ./src/modules/WN.cpp
        ./src/modules/pqmf.cpp
        ./src/models/TextEncoder.cpp
        ./src/models/StochasticDurationPredictor.cpp
        ./src/models/FixDurationPredictor.cpp
        ./src/models/DurationPredictor_base.cpp
        ./src/models/ResidualCouplingBlock.cpp
        ./src/models/Generator_base.cpp
        ./src/models/Generator_hifigan.cpp
        ./src/models/Generator_MS.cpp
        ./src/models/Generator_Istft.cpp
        ./src/models/Generator_MBB.cpp
        ./src/models/SynthesizerTrn.cpp)

target_include_directories(tts PUBLIC ./eigen-3.4.0
                                            ./src/tn/header
                                              ./include
                                              ./src/header)

set_target_properties(tts PROPERTIES
    OUTPUT_NAME "tts"
)

Segmentation fault

./tts_test ../test.txt ../models/single_speaker_fast.bin out.wav

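Variables in TTS text

Take the following sentence as an example: 您的小型汽车xxxxxx被交通技术监控设备记录,请立即驶离。 ("Your small car xxxxxx has been recorded by traffic monitoring equipment; please leave immediately.")
The x part is the license plate number, which is a dynamic variable; the rest is fixed.

Converting the whole sentence directly takes too long because the sentence is long.
The usual approach is to generate and save the fixed parts in advance, have the TTS convert the dynamic part on the fly, and then play the three audio files in sequence.
But this makes playback sound choppy at the joins.

Would it be possible to save not only the audio generated for "您的小型汽车" but also its internal context data, and then generate the license plate audio on the fly from that context, so that the speech flows smoothly?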
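
Reusing the model's internal context would require changes inside the synthesizer, and nothing in this thread shows that SummerTTS exposes such a hook. A lighter mitigation for audible joins is a short crossfade between the pre-rendered clip and the freshly synthesized one. The following is a minimal sketch of that generic audio technique, assuming mono 16-bit PCM clips at the same sample rate. crossfadeConcat is a hypothetical helper, not a SummerTTS function; it can soften clicks at the seam but cannot fix prosody mismatches between the clips.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Joins two mono 16-bit PCM clips, blending the last `overlap` samples of `a`
// with the first `overlap` samples of `b` using a linear crossfade.
// Hypothetical helper: a generic audio technique, not a SummerTTS API.
std::vector<int16_t> crossfadeConcat(const std::vector<int16_t>& a,
                                     const std::vector<int16_t>& b,
                                     std::size_t overlap) {
    // The overlap cannot be longer than either clip.
    overlap = std::min({overlap, a.size(), b.size()});
    const std::ptrdiff_t ov = static_cast<std::ptrdiff_t>(overlap);

    std::vector<int16_t> out;
    out.reserve(a.size() + b.size() - overlap);

    // Copy the part of `a` that is not blended.
    out.insert(out.end(), a.begin(), a.end() - ov);

    // Blend: the weight of `a` falls towards 0 while the weight of `b` rises.
    for (std::size_t i = 0; i < overlap; ++i) {
        const float wb = static_cast<float>(i + 1) / static_cast<float>(overlap + 1);
        const float wa = 1.0f - wb;
        const float mixed = wa * a[a.size() - overlap + i] + wb * b[i];
        out.push_back(static_cast<int16_t>(mixed));
    }

    // Copy the rest of `b`.
    out.insert(out.end(), b.begin() + ov, b.end());
    assert(out.size() == a.size() + b.size() - overlap);
    return out;
}
```

The blended region shortens the total audio by `overlap` samples, so this works best when the clips carry a little silence at their edges. At a 16 kHz sample rate, for instance, an overlap of 160 samples corresponds to a 10 ms blend.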

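Selecting the speaker for the multi model

Simple and easy to use, so a thumbs-up first.
When synthesizing with the multi model, it generates ten utterances from ten different speakers by default. Is it possible to specify one particular aishell3 speaker? What is the parameter? I could not find it.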

windows compile

@huakunyang
First of all, thank you very much for creating this project.

Because the question I asked a few days ago (#33) was not effectively resolved, I did some research on building for Windows.

Here is the result of building this project into my game:

2024-01-06.18-04-05.mp4

These are the main changes needed for a Windows build:

  1. Compilation issues with glog and gflags: the build does not go through because the project compiles these libraries by referencing their source files directly, while a Windows build needs the Windows-specific variants of some of those files. The build scripts therefore need some adjustment; otherwise you hit the error described in #5 (comment). Simply adding #include "windows/port.h" only makes similar errors appear in other files. The solution I settled on is to build glog and gflags as a separate library first, and then build the project and link it against that library.

  2. Compilation errors and crashes caused by Openfst: Skipping the steps of identifying the problem, the fundamental reason is that the MSVC toolchain, which uses the cl compiler, cannot correctly recognize FST types (such as StdVectorFst), leading to crashes. This is because all type definitions are registered through fst-types, but MSVC cannot link smoothly, resulting in crashes. However, if you adjust the compiler to clang-cl, you won't encounter this problem.

  3. Missing definitions:
    3.1. M_PI:

#ifndef M_PI
#define M_PI 3.1415926535897932384626433832795
#endif

    3.2. unistd.h, dirent.h, getopt.c, getopt.h: you need to replace some #include lines in certain files, such as tts_file_io.cpp.
code.zip

After these adjustments, you should be able to successfully compile the project.

English letters

How can I add support for spelling out English letters, for example APP or WIFI?

Successfully run on Windows

The command is as follows:

  cmake -G "MinGW Makefiles" ..
  mingw32-make
  tts_test ../test.txt ../models/single_speaker_small.bin out.wav

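Thanks! I built it as a shared library, but it does not work in a HiSilicon environment

Using a makefile and a cross-compilation toolchain, I built the .so library. It is reported as GNU/Linux:
ELF 32-bit LSB shared object, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, stripped
A shared library built from my own code in the same environment is reported as SYSV instead of GNU/Linux, and I do not know why:
ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, stripped
As soon as my program links against this library, it can no longer run. Tracing the process with strace shows that it is stuck in a deadlock.
Without linking this library, my application works fine.
So I still suspect that something is wrong with the build environment of the .so. I defined the macros -DOS_LINUX -DARCH=arm -D__arm__ -D__ARMCC_VERSION
during the build, but the problem persists.

Could you please take a look?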

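Request for the files hosted on Baidu Netdisk

I noticed that Baidu Netdisk requires installing its client before files can be downloaded, so I hope the author (or someone else) can share the files stored there.

I think publishing them to this repo's releases after tar --zstd would be a good way to distribute the files. Cloudflare R2 is also worth recommending.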

Segmentation fault: what should I do?

root@DESKTOP-J3L5B5E:/ff/SummerTTS-main/build# cmake ..
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /ff/SummerTTS-main/build
root@DESKTOP-J3L5B5E:/ff/SummerTTS-main/build# make
[ 1%] Building CXX object CMakeFiles/tts_test.dir/test/main.cpp.o
[ 2%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/demangle.cc.o
[ 4%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/logging.cc.o
[ 5%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/raw_logging.cc.o
[ 6%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/symbolize.cc.o
[ 8%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/utilities.cc.o
[ 9%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/vlog_is_on.cc.o
[ 11%] Building CXX object CMakeFiles/tts_test.dir/src/tn/glog/src/signalhandler.cc.o
[ 12%] Building CXX object CMakeFiles/tts_test.dir/src/tn/gflags/src/gflags.cc.o
[ 13%] Building CXX object CMakeFiles/tts_test.dir/src/tn/gflags/src/gflags_reporting.cc.o
[ 15%] Building CXX object CMakeFiles/tts_test.dir/src/tn/gflags/src/gflags_completions.cc.o
[ 16%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/compat.cc.o
[ 18%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/flags.cc.o
[ 19%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/fst.cc.o
[ 20%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/fst-types.cc.o
[ 22%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/mapped-file.cc.o
[ 23%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/properties.cc.o
[ 25%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/symbol-table.cc.o
[ 26%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/symbol-table-ops.cc.o
[ 27%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/util.cc.o
[ 29%] Building CXX object CMakeFiles/tts_test.dir/src/tn/openfst/src/lib/weight.cc.o
[ 30%] Building CXX object CMakeFiles/tts_test.dir/src/tn/processor.cc.o
[ 31%] Building CXX object CMakeFiles/tts_test.dir/src/tn/token_parser.cc.o
[ 33%] Building CXX object CMakeFiles/tts_test.dir/src/tn/utf8_string.cc.o
[ 34%] Building CXX object CMakeFiles/tts_test.dir/src/engipa/EnglishText2Id.cpp.o
[ 36%] Building CXX object CMakeFiles/tts_test.dir/src/engipa/InitIPASymbols.cpp.o
[ 37%] Building CXX object CMakeFiles/tts_test.dir/src/engipa/alphabet.cpp.o
[ 38%] Building CXX object CMakeFiles/tts_test.dir/src/engipa/ipa.cpp.o
[ 40%] Building CXX object CMakeFiles/tts_test.dir/src/hz2py/hanzi2phoneid.cpp.o
[ 41%] Building CXX object CMakeFiles/tts_test.dir/src/hz2py/Hanz2Piny.cpp.o
[ 43%] Building CXX object CMakeFiles/tts_test.dir/src/hz2py/pinyinmap.cpp.o
[ 44%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_conv1d.cpp.o
[ 45%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_softmax.cpp.o
[ 47%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_layer_norm.cpp.o
[ 48%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_relu.cpp.o
[ 50%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_gelu.cpp.o
[ 51%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_tanh.cpp.o
[ 52%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_flip.cpp.o
[ 54%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_cumsum.cpp.o
[ 55%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_softplus.cpp.o
[ 56%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_clamp_min.cpp.o
[ 58%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_sigmoid.cpp.o
[ 59%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_conv1d_transposed.cpp.o
[ 61%] Building CXX object CMakeFiles/tts_test.dir/src/nn_op/nn_leaky_relu.cpp.o
[ 62%] Building CXX object CMakeFiles/tts_test.dir/src/platform/tts_file_io.cpp.o
[ 63%] Building CXX object CMakeFiles/tts_test.dir/src/platform/tts_logger.cpp.o
[ 65%] Building CXX object CMakeFiles/tts_test.dir/src/utils/utils.cpp.o
[ 66%] Building CXX object CMakeFiles/tts_test.dir/src/modules/iStft.cpp.o
[ 68%] Building CXX object CMakeFiles/tts_test.dir/src/modules/hann.cpp.o
[ 69%] Building CXX object CMakeFiles/tts_test.dir/src/modules/attention_encoder.cpp.o
[ 70%] Building CXX object CMakeFiles/tts_test.dir/src/modules/multi_head_attention.cpp.o
[ 72%] Building CXX object CMakeFiles/tts_test.dir/src/modules/ffn.cpp.o
[ 73%] Building CXX object CMakeFiles/tts_test.dir/src/modules/ConvFlow.cpp.o
[ 75%] Building CXX object CMakeFiles/tts_test.dir/src/modules/DDSConv.cpp.o
[ 76%] Building CXX object CMakeFiles/tts_test.dir/src/modules/ElementwiseAffine.cpp.o
[ 77%] Building CXX object CMakeFiles/tts_test.dir/src/modules/random_gen.cpp.o
[ 79%] Building CXX object CMakeFiles/tts_test.dir/src/modules/ResidualCouplingLayer.cpp.o
[ 80%] Building CXX object CMakeFiles/tts_test.dir/src/modules/ResBlock1.cpp.o
[ 81%] Building CXX object CMakeFiles/tts_test.dir/src/modules/WN.cpp.o
[ 83%] Building CXX object CMakeFiles/tts_test.dir/src/modules/pqmf.cpp.o
[ 84%] Building CXX object CMakeFiles/tts_test.dir/src/models/TextEncoder.cpp.o
[ 86%] Building CXX object CMakeFiles/tts_test.dir/src/models/StochasticDurationPredictor.cpp.o
[ 87%] Building CXX object CMakeFiles/tts_test.dir/src/models/FixDurationPredictor.cpp.o
[ 88%] Building CXX object CMakeFiles/tts_test.dir/src/models/DurationPredictor_base.cpp.o
[ 90%] Building CXX object CMakeFiles/tts_test.dir/src/models/ResidualCouplingBlock.cpp.o
[ 91%] Building CXX object CMakeFiles/tts_test.dir/src/models/Generator_base.cpp.o
[ 93%] Building CXX object CMakeFiles/tts_test.dir/src/models/Generator_hifigan.cpp.o
[ 94%] Building CXX object CMakeFiles/tts_test.dir/src/models/Generator_MS.cpp.o
[ 95%] Building CXX object CMakeFiles/tts_test.dir/src/models/Generator_Istft.cpp.o
[ 97%] Building CXX object CMakeFiles/tts_test.dir/src/models/Generator_MBB.cpp.o
[ 98%] Building CXX object CMakeFiles/tts_test.dir/src/models/SynthesizerTrn.cpp.o
[100%] Linking CXX executable tts_test
[100%] Built target tts_test
root@DESKTOP-J3L5B5E:/ff/SummerTTS-main/build# ./tts_test ../test.txt ../models/single_speaker_fast.bin out.wav
Segmentation fault
root@DESKTOP-J3L5B5E:/ff/SummerTTS-main/build#

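After the model is initialized, does each synthesis call get faster?

If the model has already been loaded once, will subsequent synthesis be faster?
If so, I suggest adding an interface that initializes the model, so that afterwards only the synthesis interface needs to be called to produce speech directly.
I am not an expert, so I am not sure whether my understanding is correct. Thank you.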

onnxruntime option support?

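Eigen is nice but hard to maintain (and probably not the fastest option either). Would you be interested in supporting onnxruntime as an alternative backend?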

How do I change the pitch?

@huakunyang
The interface I need to implement is a generic interface inside a game engine, which already defines:

void tts_speak(text: String, voice: String, volume: int = 50, pitch: float = 1.0, rate: float = 1.0, utterance_id: int = 0, interrupt: bool = false)

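Adds an utterance to the queue. If interrupt is true, the queue is cleared first.

- voice: the voice identifier, either the "id" value returned by tts_get_voices() or a value returned by tts_get_voices_for_language().

- volume: ranges from 0 (lowest) to 100 (highest).

- pitch: ranges from 0.0 (lowest) to 2.0 (highest); 1.0 is the default pitch of the current voice.

- rate: ranges from 0.1 (lowest) to 10.0 (highest); 1.0 is normal speed. Other values are relative percentages.

- utterance_id: the utterance ID is passed as an argument to the callback function.

I need to know which parameters of SynthesizerTrn::infer(const string & line, int32_t sid, float lengthScale, int32_t & dataLen) correspond to volume, pitch and rate.

As I understand it:
rate corresponds to lengthScale,
volume can be implemented as retData[i]*=volume/100
Is this mapping correct?

Then how can I get an implementation for pitch? I could not find one in the code. Could you give me some guidance?

It is fine if it is not implemented, since I can write one myself. I just want to know whether the libraries or low-level code that SummerTTS already uses provide a native implementation that I could use directly.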
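
The thread does not show a built-in pitch control, so the following is only a minimal post-processing sketch, assuming the synthesizer output is mono 16-bit PCM. applyVolume and resamplePitch are hypothetical helpers, not SummerTTS functions. Two caveats: with an int volume, the expression volume/100 is integer division in C++ and yields 0 for any volume below 100, so the gain is computed in floating point here; and changing pitch by plain resampling also changes duration and shifts the voice timbre, so it is a rough stand-in for a real pitch shifter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scales 16-bit PCM samples by volume/100 (volume clamped to 0..100), using
// floating-point math and clamping the result to the int16_t range.
void applyVolume(std::vector<int16_t>& pcm, int volume) {
    const float gain = static_cast<float>(std::max(0, std::min(volume, 100))) / 100.0f;
    for (int16_t& s : pcm) {
        const float v = std::round(static_cast<float>(s) * gain);
        s = static_cast<int16_t>(std::max(-32768.0f, std::min(v, 32767.0f)));
    }
}

// Naive pitch change by resampling with linear interpolation: reading the
// input `pitch` times faster raises the pitch by that factor, but also
// shortens the audio by the same factor.
std::vector<int16_t> resamplePitch(const std::vector<int16_t>& pcm, double pitch) {
    assert(pitch > 0.0);
    if (pcm.empty()) return {};
    const std::size_t outLen =
        static_cast<std::size_t>(static_cast<double>(pcm.size()) / pitch);
    std::vector<int16_t> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i) {
        const double pos = static_cast<double>(i) * pitch;  // position in the input
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), pcm.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, pcm.size() - 1);
        const double frac = pos - static_cast<double>(i0);
        out[i] = static_cast<int16_t>(
            std::lround((1.0 - frac) * pcm[i0] + frac * pcm[i1]));
    }
    return out;
}
```

If lengthScale stretches the utterance duration, as the question assumes, synthesizing with lengthScale multiplied by the pitch factor and then resampling would roughly restore the original duration. That mapping is an assumption and has not been checked against the SummerTTS sources.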

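Suggestion: upgrade the underlying model from VITS to Bert-VITS2

First of all, thank you very much for providing this project. Respect!

VITS has been around for quite a while, but because its Chinese prosody has never been quite satisfactory, it has been praised more than it has been adopted, and few people have put it to practical use. The recent https://github.com/fishaudio/Bert-VITS2 project, however, addresses the Chinese prosody problem well by integrating BERT. Upgrading to that approach would be ideal.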

Performance

Timed with the chrono library:

root@d255e0717767:/tmp_workspace/SummerTTS-main/build# ./tts_test ../test.txt ../models/single_speaker_big.bin out.wav
ttsLoadModel cost time = 110ms
new Synthesizer cost time = 68ms
infer synthesizer cost time = 42259ms
root@d255e0717767:/tmp_workspace/SummerTTS-main/build# ./tts_test ../test.txt ../models/single_speaker_medium.bin out.wav
ttsLoadModel cost time = 76ms
new Synthesizer cost time = 50ms
infer synthesizer cost time = 18842ms
root@d255e0717767:/tmp_workspace/SummerTTS-main/build# ./tts_test ../test.txt ../models/single_speaker_small.bin out.wav
ttsLoadModel cost time = 64ms
new Synthesizer cost time = 45ms
infer synthesizer cost time = 9463ms
root@d255e0717767:/tmp_workspace/SummerTTS-main/build# ./tts_test ../test.txt ../models/single_speaker_tiny.bin out.wav
ttsLoadModel cost time = 66ms
new Synthesizer cost time = 43ms
infer synthesizer cost time = 4891ms

out.wav dur = 5744ms
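
Measurements like the ones above can be taken with std::chrono, and dividing synthesis time by audio duration gives a real-time factor (RTF). The following is a minimal sketch; timeMs and realTimeFactor are hypothetical helper names, and the actual instrumentation used in this post is not shown in the thread. If the 5744 ms duration is taken as the length of the audio produced by the tiny model run, its RTF is 4891 / 5744, or about 0.85, which is slightly faster than real time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is used because it never jumps backwards.
int64_t timeMs(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Real-time factor: synthesis time divided by audio duration.
// Values below 1.0 mean synthesis is faster than playback.
double realTimeFactor(int64_t inferMs, int64_t audioMs) {
    assert(audioMs > 0);
    return static_cast<double>(inferMs) / static_cast<double>(audioMs);
}
```

In these logs, model loading and synthesizer construction take on the order of 100 ms while inference takes seconds, so the inference step dominates the total time.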

Windows 10 build problem and solution

Build environment

  1. Operating system: Windows 10
  2. Toolchain: MinGW
  3. g++ version: 8.1.0

Build problem:

error: 'M_PI' was not declared in this scope

Solution:

Add add_definitions(-D_USE_MATH_DEFINES) at the top level of CMakeLists.txt.

Build commands:

cd SummerTTS/build
cmake -G "MinGW Makefiles" ..
make -j 8

Reference:

https://github.com/orgs/robotology/discussions/81

make fails on Apple Silicon

On macOS Ventura, running
cmake ..
make
clang: error: unsupported option '-fopenmp'
make[2]: *** [CMakeFiles/tts_test.dir/test/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/tts_test.dir/all] Error 2
make: *** [all] Error 2

I tried brew install gcc, and which gcc in the shell now points to /opt/homebrew/bin/gcc-13, but the same error still occurs.

How to train a new model

@huakunyang
Thanks again for your project. After persistent effort, I have managed a first integration of it into my game:

2024-01-10.23-46-46.mp4

my godot addon

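That was a digression. What I would like to know now is:

  1. How to train a new model.
  2. What the underlying model architecture is. If it is not convenient to publish the code, could you say whether a common model architecture was used for training? Judging by your acknowledgements, are you using the MB-iSTFT-VITS model? If so, how is a pth file converted into a bin file?
  3. Whether bin files of models for other languages are available for direct use.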

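Suggestion on a mixed Chinese-English dataset

First of all, this project is great and very lightweight (compared with bark).

But mixed Chinese-English text is still a weak point. Have you considered using edge-tts to generate training data?
You would only need to collect mixed Chinese-English text and then generate the audio files with edge-tts.

For example, after installing edge-tts with pip you can use it directly: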

pip install edge-tts
edge-tts -f "./text001.txt" --voice zh-CN-YunxiNeural --write-media ./text001.mp3

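If you are worried about Microsoft's commercial licensing and the like, you could open-source the training steps and we will do it ourselves.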

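How can I control volume and pitch?

I can control the speaking rate, but it seems I cannot control the volume or the pitch.

Also, how can I generate a file with an 8000 Hz sampling rate? Speech on telephone lines uses 8000 Hz.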
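
The thread does not state the model's native sample rate or whether it can be changed, so one option is to convert the synthesized audio afterwards. The following is a minimal sketch, assuming mono 16-bit PCM and a source rate read from the generated WAV header. downsample is a hypothetical helper, not a SummerTTS function. It averages the input samples that fall into each output sample's time window, which is only a crude low-pass filter; a dedicated resampler would give cleaner telephone-band audio.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downsamples mono 16-bit PCM from srcRate to dstRate (dstRate <= srcRate).
// Each output sample is the average of the input samples in its time window.
std::vector<int16_t> downsample(const std::vector<int16_t>& pcm,
                                int srcRate, int dstRate) {
    assert(srcRate > 0 && dstRate > 0 && dstRate <= srcRate);
    const std::size_t src = static_cast<std::size_t>(srcRate);
    const std::size_t dst = static_cast<std::size_t>(dstRate);

    const std::size_t outLen = pcm.size() * dst / src;
    std::vector<int16_t> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i) {
        // Input window [begin, end) that maps onto output sample i.
        const std::size_t begin = i * src / dst;
        std::size_t end = (i + 1) * src / dst;
        if (end > pcm.size()) end = pcm.size();

        long long sum = 0;
        for (std::size_t j = begin; j < end; ++j) sum += pcm[j];
        out[i] = static_cast<int16_t>(sum / static_cast<long long>(end - begin));
    }
    return out;
}
```

When the result is written back to a WAV file, the header must declare the new rate (8000 Hz here), otherwise players will play it at the wrong speed.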

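Line breaks and spaces are not supported

Only a single passage is supported. Could text-to-speech for a whole article be supported, including special cases such as spaces, blank lines and line breaks?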
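
Until the engine handles this itself, a caller could pre-process the article and synthesize it one line at a time. The following is a minimal sketch of that pre-processing step, assuming UTF-8 input; splitIntoLines is a hypothetical helper, not a SummerTTS function. It only strips ASCII spaces and tabs at line edges, so full-width spaces are left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits UTF-8 text into non-empty lines, trimming ASCII spaces and tabs.
// Bytes of multi-byte UTF-8 characters never equal '\n', '\r', ' ' or '\t',
// so scanning byte by byte is safe here.
std::vector<std::string> splitIntoLines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;

    // Stores the trimmed current line if anything is left, then resets it.
    auto flush = [&]() {
        const std::size_t first = current.find_first_not_of(" \t");
        if (first != std::string::npos) {
            const std::size_t last = current.find_last_not_of(" \t");
            assert(last >= first);
            lines.push_back(current.substr(first, last - first + 1));
        }
        current.clear();
    };

    for (char c : text) {
        if (c == '\n' || c == '\r') {
            flush();
        } else {
            current.push_back(c);
        }
    }
    flush();  // the last line may lack a trailing newline
    return lines;
}
```

Each returned line can then be passed to the synthesizer separately and the resulting audio segments appended in order, optionally with a short pause between paragraphs.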

Successful Android build

Two main changes are needed.

1: Work around glog's use of <execinfo.h>, which Android does not provide.

"execinfo.h" is part of glibc, whereas Android uses bionic rather than glibc.

In /src/tn/glog/src/stacktrace_generic-inl.h, comment out #include <execinfo.h>.

Replace backtrace(stack, kStackLength); with 0;

2: Remove -fopenmp from CMakeLists.txt; otherwise a runtime error occurs: android library "libomp.so" not found.

The modified CMakeLists.txt:
cmake_minimum_required(VERSION 3.5)
project(tts)
set(CMAKE_CXX_FLAGS " -O3 -w -std=c++11 ")
set(CMAKE_C_FLAGS " -O3 -w -std=c++11 ")

include_directories(./eigen-3.4.0
                    ./src/tn/header
                    ./include
                    ./src/header)
set(lib_src ./src/tn/glog/src/demangle.cc
            ./src/tn/glog/src/logging.cc
            ./src/tn/glog/src/raw_logging.cc
            ./src/tn/glog/src/symbolize.cc
            ./src/tn/glog/src/utilities.cc
            ./src/tn/glog/src/vlog_is_on.cc
            ./src/tn/glog/src/signalhandler.cc
            ./src/tn/gflags/src/gflags.cc
            ./src/tn/gflags/src/gflags_reporting.cc
            ./src/tn/gflags/src/gflags_completions.cc
            ./src/tn/openfst/src/lib/compat.cc
            ./src/tn/openfst/src/lib/flags.cc
            ./src/tn/openfst/src/lib/fst.cc
            ./src/tn/openfst/src/lib/fst-types.cc
            ./src/tn/openfst/src/lib/mapped-file.cc
            ./src/tn/openfst/src/lib/properties.cc
            ./src/tn/openfst/src/lib/symbol-table.cc
            ./src/tn/openfst/src/lib/symbol-table-ops.cc
            ./src/tn/openfst/src/lib/util.cc
            ./src/tn/openfst/src/lib/weight.cc
            ./src/tn/processor.cc
            ./src/tn/token_parser.cc
            ./src/tn/utf8_string.cc
            ./src/engipa/EnglishText2Id.cpp
            ./src/engipa/InitIPASymbols.cpp
            ./src/engipa/alphabet.cpp
            ./src/engipa/ipa.cpp
            ./src/hz2py/hanzi2phoneid.cpp
            ./src/hz2py/Hanz2Piny.cpp
            ./src/hz2py/pinyinmap.cpp
            ./src/nn_op/nn_conv1d.cpp
            ./src/nn_op/nn_softmax.cpp
            ./src/nn_op/nn_layer_norm.cpp
            ./src/nn_op/nn_relu.cpp
            ./src/nn_op/nn_gelu.cpp
            ./src/nn_op/nn_tanh.cpp
            ./src/nn_op/nn_flip.cpp
            ./src/nn_op/nn_cumsum.cpp
            ./src/nn_op/nn_softplus.cpp
            ./src/nn_op/nn_clamp_min.cpp
            ./src/nn_op/nn_sigmoid.cpp
            ./src/nn_op/nn_conv1d_transposed.cpp
            ./src/nn_op/nn_leaky_relu.cpp
            ./src/platform/tts_file_io.cpp
            ./src/platform/tts_logger.cpp
            ./src/utils/utils.cpp
            ./src/modules/iStft.cpp
            ./src/modules/hann.cpp
            ./src/modules/attention_encoder.cpp
            ./src/modules/multi_head_attention.cpp
            ./src/modules/ffn.cpp
            ./src/modules/ConvFlow.cpp
            ./src/modules/DDSConv.cpp
            ./src/modules/ElementwiseAffine.cpp
            ./src/modules/random_gen.cpp
            ./src/modules/ResidualCouplingLayer.cpp
            ./src/modules/ResBlock1.cpp
            ./src/modules/WN.cpp
            ./src/modules/pqmf.cpp
            ./src/models/TextEncoder.cpp
            ./src/models/StochasticDurationPredictor.cpp
            ./src/models/FixDurationPredictor.cpp
            ./src/models/DurationPredictor_base.cpp
            ./src/models/ResidualCouplingBlock.cpp
            ./src/models/Generator_base.cpp
            ./src/models/Generator_hifigan.cpp
            ./src/models/Generator_MS.cpp
            ./src/models/Generator_Istft.cpp
            ./src/models/Generator_MBB.cpp
            ./src/models/SynthesizerTrn.cpp)



add_library(tts SHARED ${lib_src})

add_library(tts_static STATIC ${lib_src})

add_executable(tts_test ./test/main.cpp ${lib_src} )

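You can create an android-build directory next to CMakeLists.txt, put the build script in it, adjust the NDK path and run it. The build artifacts then appear in the libs directory.

Build script: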
#!/bin/bash


## Set the NDK path
NDK_PATH=~/ndk/android-ndk-r21e
## Directory containing CMakeLists.txt
PROJECT_DIR=$(dirname $(pwd))
BUILD_DIR=$(pwd)
JOBS=$(nproc 2> /dev/null || sysctl -n hw.ncpu 2> /dev/null || echo 4)
echo "Using $JOBS jobs for make"


buildAndroid(){

ABI=$1

rm -rf "$ABI"
mkdir -p "$ABI" && cd "$ABI"
rm -rf CMakeCache.txt
rm -rf CMakeFiles

cmake \
-DCMAKE_TOOLCHAIN_FILE=$NDK_PATH/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_SYSTEM_NAME=Android \
-DCMAKE_SYSTEM_VERSION=21 \
-DANDROID_STL=c++_shared \
-DANDROID_LD=lld \
-DANDROID_ABI=$ABI \
-DANDROID_NDK=$NDK_PATH \
-DANDROID_PLATFORM=android-21 \
-DANDROID_TOOLCHAIN=clang \
-DBUILD_SHARED_LIBS=1 $PROJECT_DIR
make -j$JOBS

sleep 2
mkdir -p "$BUILD_DIR/libs/$ABI"
mv -f libtts.so $BUILD_DIR/libs/$ABI/libtts.so
mv -f libtts_static.a $BUILD_DIR/libs/$ABI/libtts_static.a
mv -f tts_test $BUILD_DIR/libs/$ABI/tts_test
sleep 2

cd ..


}


for cpu in "arm64-v8a" "armeabi-v7a" "x86" "x86_64"
do
    buildAndroid $cpu
done
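
As an alternative to commenting out the include by hand in step 1, the stack capture could be guarded with a preprocessor check so that the same source builds on glibc and on Android. The following is a minimal sketch of that idea; captureStack and TTS_HAVE_BACKTRACE are hypothetical names for illustration and do not appear in glog or SummerTTS.

```cpp
#include <cassert>  // also pulls in the libc feature macros such as __GLIBC__

#if defined(__GLIBC__) || defined(__APPLE__)
#  include <execinfo.h>          // provides backtrace() on glibc and macOS
#  define TTS_HAVE_BACKTRACE 1
#else
#  define TTS_HAVE_BACKTRACE 0   // e.g. a bionic build without <execinfo.h>
#endif

// Fills `frames` with up to `maxFrames` return addresses and returns how many
// were captured; returns 0 where backtrace() is unavailable.
int captureStack(void** frames, int maxFrames) {
    assert(frames != nullptr && maxFrames >= 0);
#if TTS_HAVE_BACKTRACE
    return backtrace(frames, maxFrames);
#else
    (void)frames;
    (void)maxFrames;
    return 0;
#endif
}
```

Returning 0 matches the manual change described above: stack traces are simply empty on platforms without backtrace(), while other platforms keep them.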

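Would you consider shifting from a tool to an SDK?

  • A C++ inference library that can be built on its own (static and shared, cross-platform or at least across desktop operating systems)
  • A general-purpose model conversion script

These are must-haves~

PS: PaddleSpeech has been tormenting me. Its CPU inference is too slow, and converting to ONNX runs into countless bugs.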

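GPU acceleration

May I ask whether the current C++ code runs on the CPU? As I understand it, Eigen cannot use the GPU. Can this project use the GPU for acceleration?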
