ntt123 / vietTTS
Vietnamese Text to Speech library
License: MIT License
Hi, me again.
I'm training your TTS. My dataset is about 16 hours.
First, because my dataset's utterances are similar to yours, I'm training the acoustic model with two approaches (continuing from your checkpoint, and from scratch):
Continuing from your acoustic checkpoint to 1.46M steps: val loss 0.227, and it is close to converging.
Here is full detail : https://drive.google.com/drive/folders/1j0OT7KgJOk5hmcOVNPdcdkaekRRxHekk?usp=sharing
Second, I trained the HiFi-GAN vocoder (with the 1.46M acoustic model) for about 290k steps:
My transcript text: "xin chào tôi là phương anh bản thử số chín"
I got this : https://drive.google.com/file/d/1UtgE1gTC8mwo1SV1b7chauvWPC7uPjxM/view?usp=sharing
=> The result: the speaker talks nonsense, but the intonation is quite good.
Here is the 50k vocoder + 1.46M acoustic, just for comparison:
https://drive.google.com/file/d/1InQ8ykYC_P7qaKhv_58SmTC0r-b_4_0h/view?usp=sharing
And from the 50k vocoder + an 800k acoustic model trained from scratch: https://drive.google.com/file/d/1E-FjOfBqFf9vHTKXmAUhamtB2FsAlAMT/view?usp=sharing
I'm stuck. Should I focus on the acoustic model, the vocoder, or the dataset to improve the result?
Thanks!
I cannot download jaxlib on Windows.
Please help.
Hi,
Could you share your Gen Loss and Mel-Spec Error when you got the final model?
I have been training on another dataset for around 3 days, and I see the loss and error decrease, but the voice quality when testing is quite creepy :)))))))
Thank you so much, I hope you see my question.
Hi,
Based on your repo and your answers, I have successfully built a Vietnamese text-to-speech app with my own dataset. It sounds very good in the majority of cases. But I am still stuck on how to handle English words (e.g., vaccine, morning, ...) that appear in the text. I have created a list of English words mapped to Vietnamese pronunciations (e.g., vaccine - vắc xin) and update it whenever new English words appear. However, this seems inefficient.
Do you have any advice for me in this case? Thank you so much.
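For what it's worth, a minimal sketch of the dictionary approach described above; the word list and the pronunciation for "morning" are placeholders, and a longer-term option would be a g2p fallback for unseen English words:

```python
import re

# hypothetical loanword map applied before synthesis; extend as new
# English words appear (the "morning" entry is a placeholder, tune by ear)
LOANWORDS = {
    "vaccine": "vắc xin",
    "morning": "moóc ninh",
}

def map_loanwords(text: str) -> str:
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, LOANWORDS)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: LOANWORDS[m.group().lower()], text)

print(map_loanwords("Tiêm vaccine vào buổi morning"))
```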
Hi, I preprocessed the data following your pipeline, but when I run the fine-tuning code I get this error:
checkpoints directory : small_cp_hifigan
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2021-06-18 07:29:10.395782: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Epoch: 1
/usr/local/lib/python3.7/dist-packages/torch/functional.py:581: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:639.)
normalized, onesided, return_complex)
[the same stft UserWarning repeats eight more times]
Steps : 0, Gen Loss Total : 88.451, Mel-Spec. Error : 1.812, s/b : 2.910
train.py:199: UserWarning: Using a target size (torch.Size([1, 80, 305])) that is different to the input size (torch.Size([1, 80, 304])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
val_err_tot += F.l1_loss(y_mel, y_g_hat_mel).item()
Traceback (most recent call last):
File "train.py", line 271, in
main()
File "train.py", line 267, in main
train(0, a, h)
File "train.py", line 199, in train
val_err_tot += F.l1_loss(y_mel, y_g_hat_mel).item()
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2897, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/usr/local/lib/python3.7/dist-packages/torch/functional.py", line 74, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore
RuntimeError: The size of tensor a (304) must match the size of tensor b (305) at non-singleton dimension 2
I don't know why; maybe something goes wrong in how the tensors are produced. What is dimension 2 here? Help!
Thanks!
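Regarding the one-frame mismatch (dimension 2 is the time-frame axis of the [batch, n_mels, frames] mel tensors): a common workaround, sketched here under the assumption that the extra frame comes from STFT padding/rounding, is to trim both mels to the shorter length before the validation loss:

```python
# inside the validation loop of train.py (a sketch, not the official fix):
# y_mel and y_g_hat_mel are [batch, n_mels, frames]; trim the time axis
# (dimension 2) to the shorter of the two before the L1 loss
min_len = min(y_mel.size(2), y_g_hat_mel.size(2))
val_err_tot += F.l1_loss(y_mel[..., :min_len],
                         y_g_hat_mel[..., :min_len]).item()
```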
Hi,
I'm trying to play with this model, but how do I keep the same speaker's voice across sentences?
I heard your synthesized audio clip, and it's good!
But the phonemes are just single "characters" like "ô ư i u ...". What about more complex units (which might work better, I think), such as: hươu -> h ươu, thái -> th ái?
I am trying FastSpeech2, but it does not seem good. Have you heard it?
It would be great if we could get in touch.
Thanks!
Thanks for your great work!
I want to implement the UnivNet vocoder, but your model is written in Haiku and jax.numpy.
https://github.com/mindslab-ai/univnet
I followed your conversion code but got stuck:
jax.numpy has no function unfold, to, or transpose?
How can I fix these problems?
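For reference, a sketch of how those torch ops usually map over, assuming `unfold` here means `torch.Tensor.unfold` (sliding windows): `.to(dtype)` becomes `.astype(dtype)`, `jnp.transpose` does exist in jax.numpy, and `unfold` can be emulated like this:

```python
import jax
import jax.numpy as jnp

def unfold(x, axis, size, step):
    """Emulate torch.Tensor.unfold: windows of `size` taken every `step`
    along `axis`, with the window dimension appended last."""
    starts = range(0, x.shape[axis] - size + 1, step)
    windows = jnp.stack(
        [jax.lax.dynamic_slice_in_dim(x, s, size, axis=axis) for s in starts],
        axis=axis,
    )
    return jnp.moveaxis(windows, axis + 1, -1)  # window dim goes last

print(unfold(jnp.arange(10.0), 0, 4, 2).shape)  # (4, 4), as in torch
```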
Reading speed is too fast with 48k audio samples. Is there a way to reduce the audio speed? Looking forward to everyone's feedback. Thank you so much.
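A guess plus a sketch: if the model generates at its training rate (16 kHz by default, per another question in this thread) but the samples are written or played back at 48 kHz, the audio sounds 3x too fast; re-writing the same samples with the correct rate restores normal speed. File names here are hypothetical:

```python
import soundfile as sf  # assumed helper library: pip install soundfile

# re-write the samples with the model's training rate (assumed 16 kHz)
data, sr = sf.read("output.wav")
sf.write("output_fixed.wav", data, 16000)
```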
Today, I ran your acoustic model on Colab and got this issue:
training: 0% 0/1900001 [00:00<?, ?it/s]2021-12-07 03:51:13.659473: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2085] Execution of replica 0 failed: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
training: 0% 0/1900001 [00:16<?, ?it/s]
Traceback (most recent call last):
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 139, in
train()
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 101, in train
loss, (params, aux, rng, optim_state) = update(params, aux, rng, optim_state, batch)
File "/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/jax/_src/api.py", line 419, in cache_miss
donated_invars=donated_invars, inline=inline)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1632, in bind
return call_bind(self, fun, *args, **params)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1623, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1635, in process
return trace.process_call(self, fun, tracers, params)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 627, in process_call
return primitive.impl(f, *tracers, **params)
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 690, in _xla_call_impl
out = compiled_fun(*args)
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 1100, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 139, in
train()
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 101, in train
loss, (params, aux, rng, optim_state) = update(params, aux, rng, optim_state, batch)
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 1100, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
2021-12-07 03:51:14.389335: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
_PyModule_ClearDict
PyImport_Cleanup
Py_FinalizeEx
_Py_UnixMain
__libc_start_main
_start
*** End stack trace ***
I guess this issue comes from mismatched dependency versions.
Could you please pin the specific versions of your dependencies or update the requirements?
This project is really good; I would like to ask you for a technology transfer. Please send your contact details to [email protected] so I can get back to you.
I tried it, but it can't seem to handle numbers in the text.
Thanks, a great library!
Hi,
I have trained with my own dataset, and the results look good. However, I have a problem with special phonemes. The TTS does not pause at special phonemes (when silence_duration = 0), and sometimes it speaks noise. When setting silence_duration = 0.1 or higher, the TTS always speaks noise.
I guess something is wrong with my *.TextGrid files, because there are no sil or sp phones in them. (https://drive.google.com/drive/folders/1RAsq-qPMjHMn-seJy3iapWAmLmjwz9nb?usp=sharing) And I have no idea how to fix it.
This is my demo: http://6640-113-161-90-253.ngrok.io/
Can you give me some advice?
Thank you.
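Not an official fix, but a quick way to check whether the alignments contain sil/sp at all, assuming the `textgrid` Python package and a tier named "phones" (both are assumptions that depend on your setup):

```python
import textgrid  # pip install textgrid (package name assumed)

tg = textgrid.TextGrid.fromFile("sample.TextGrid")  # hypothetical file
for tier in tg.tiers:
    if tier.name == "phones":  # tier name depends on the MFA version
        marks = [iv.mark for iv in tier.intervals]
        print("has sil/sp:", any(m in ("sil", "sp", "") for m in marks))
```

If no sil/sp intervals exist, the fix likely lies on the MFA/alignment side rather than in this repo.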
Hi, I'm new to ML. I trained the HiFi-GAN vocoder for about 100k steps (your pre-trained model is 800k) and the sound is acceptable. Now I want to use a different dataset to train the model. Should I train a new HiFi-GAN model or continue training the pre-trained one? I'm not sure. Also, when I chose to fine-tune using the VIVOS dataset:
%cd '/content/drive/MyDrive/vietTTS/hifi-gan'
!python3 train.py --fine_tuning True --config ../assets/hifigan/config.json --input_wavs_dir=data --input_training_file=train_files.txt --input_validation_file=val_files.txt
And I got this error:
checkpoints directory : cp_hifigan
Loading 'cp_hifigan/g_00105000'
Complete.
Loading 'cp_hifigan/do_00110000'
Complete.
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2021-06-15 03:07:39.878886: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Epoch: 119
Traceback (most recent call last):
File "train.py", line 271, in
main()
File "train.py", line 267, in main
train(0, a, h)
File "train.py", line 113, in train
for i, batch in enumerate(train_loader):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in next
data = self._next_data()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/drive/My Drive/vietTTS/hifi-gan/meldataset.py", line 144, in getitem
os.path.join(self.base_mels_path, os.path.splitext(os.path.split(filename)[-1])[0] + '.npy'))
File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'ft_dataset/VIVOSSPK46_184.npy'
I think the reason is that my earlier vocoder training dataset did not include these new files. How can I fix it?
And the second question: what happens if I continue training my pre-trained vocoder model on a different dataset?
Thanks!
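On the first question, a hedged sanity check (assuming the stock HiFi-GAN fine-tuning layout, where a precomputed mel "<name>.npy" must exist in --input_mels_dir, default ft_dataset, for every file in the training list):

```python
import os

# list training entries whose fine-tuning mel (.npy) is missing
mels_dir = "ft_dataset"  # HiFi-GAN's default --input_mels_dir
with open("train_files.txt") as f:
    names = [os.path.splitext(line.split("|")[0].strip())[0]
             for line in f if line.strip()]

missing = [n for n in names
           if not os.path.exists(os.path.join(mels_dir, n + ".npy"))]
print(len(missing), "missing mels, e.g.:", missing[:5])
```

The usual cause is exactly what you suspect: the mels were generated for the old dataset, so the new VIVOS files need their mels generated before fine-tuning.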
I tried to read this text: "Xin mời bệnh nhân thứ 1".
The function works OK; however, it keeps the number at the end of the sentence.
Hello, author. Did you build this repo on Linux?
I am creating TextGrid files for my dataset. Can you guide me on how to create them, or point me to some information? Thank you so much!
Hi @NTT123
Thanks for your work!
What is the config of HiFi-GAN at 24 kHz? (The default is 16 kHz.)
Or at least, could you share your tips on how to calculate the parameters?
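Not the author's config, but the usual constraint, sketched under the assumption of the stock HiFi-GAN config fields: the generator turns one mel frame into hop_size samples, so prod(upsample_rates) must equal hop_size, and segment_size must be a multiple of hop_size. A hypothetical 24 kHz fragment:

```python
# hypothetical 24 kHz HiFi-GAN config fragment (values are illustrative)
config_24k = {
    "sampling_rate": 24000,
    "n_fft": 1024,
    "hop_size": 240,                        # 10 ms frames at 24 kHz
    "win_size": 960,
    "upsample_rates": [8, 6, 5],            # 8*6*5 == 240 == hop_size
    "upsample_kernel_sizes": [16, 12, 10],  # commonly 2x each rate
    "segment_size": 9600,                   # a multiple of hop_size
}
assert 8 * 6 * 5 == config_24k["hop_size"]
```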
Hello! How can I quickly create a TextGrid file with audio (.wav) and text (.txt) files? I want to create a personal speech dataset for a Vietnamese text-to-speech task similar to training and using your model. Can you please help me? Thank you very much.
ERROR: Could not find a version that satisfies the requirement jaxlib (from viettts) (from versions: none)
ERROR: No matching distribution found for jaxlib
I see an issue: text2mel uses only one CPU core, and mel2wave runs entirely on the CPU. Do you have any solution to optimize the processing time? Thank you so much!
It's a great repo.
I have tried to train my own model, but I am still stuck at preparing the dataset. Can you instruct me on how to create a lexicon.txt file for my dataset, so that I can then use MFA to create the TextGrid files?
Thank you very much.
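Not the author's tooling, but a minimal sketch of one way to build it, assuming the character-level lexicon style this repo appears to use (each word maps to its space-separated characters, judging from the lexicon errors quoted in another issue below):

```python
# hypothetical lexicon builder; "transcripts.txt" is an assumed file
# holding one transcript per line
words = set()
with open("transcripts.txt", encoding="utf-8") as f:
    for line in f:
        words.update(line.lower().split())

with open("lexicon.txt", "w", encoding="utf-8") as f:
    for w in sorted(words):
        f.write(f"{w}\t{' '.join(w)}\n")
```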
I checked out your project and ran: bash ./scripts/quick_start.sh
FileNotFoundError: [Errno 2] No such file or directory: 'assets/infore/hifigan/hk_hifi.pickle'
Can you help me?
Hi,
I'm trying to use resblock: 2 from the HiFi-GAN v3 config. Could you please help me fix the convert script so that inference works correctly?
The model converted with convert_torch_model_to_haiku.py does not work with v3.
Here is the error I'm dealing with:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/vietTTS/vietTTS/synthesizer.py", line 38, in <module>
wave = mel2wave(mel)
File "/content/vietTTS/vietTTS/hifigan/mel2wave.py", line 40, in mel2wave
wav, aux = forward.apply(params, aux, rng, mel)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/transform.py", line 400, in apply_fn
out = f(*args, **kwargs)
File "/content/vietTTS/vietTTS/hifigan/mel2wave.py", line 32, in forward
return net(x)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 433, in wrapped
out = f(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 284, in run_interceptors
return bound_method(*args, **kwargs)
File "/content/vietTTS/vietTTS/hifigan/model.py", line 119, in __call__
xs = self.resblocks[i * self.num_kernels + j](x)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 433, in wrapped
out = f(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 284, in run_interceptors
return bound_method(*args, **kwargs)
File "/content/vietTTS/vietTTS/hifigan/model.py", line 72, in __call__
xt = c(xt)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 433, in wrapped
out = f(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 284, in run_interceptors
return bound_method(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/conv.py", line 200, in __call__
w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
File "/usr/local/lib/python3.7/dist-packages/haiku/_src/base.py", line 333, in get_parameter
name, bundle_name))
ValueError: Unable to retrieve parameter 'w' for module 'generator/~/res_block1_0/~/conv1_d'. All parameters must be created as part of `init`.
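That error means the generator asked for a parameter the converted tree never created during init, i.e., the checkpoint's module names don't match what the v3 generator builds. A hedged first step, assuming the checkpoint is a plain pickled Haiku parameter dict like the repo's hk_hifi.pickle, is to print its module names and compare them with the ones in the traceback:

```python
import pickle

# inspect which module names the converted checkpoint actually contains
with open("hk_hifi.pickle", "rb") as f:  # path assumed from this repo's assets
    params = pickle.load(f)
for name in sorted(params):
    print(name)  # e.g. generator/~/res_block..., compare with the traceback
```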
How do I run this with a GPU?
Hi all, I see you suggest using MFA to prepare the dataset (TextGrid files). I tried it, but it has a lot of issues with Vietnamese. I generated a lexicon.txt file based on a g2p model, but when using the acoustic model to generate the TextGrid files, the error is:
"There were phones in the dictionary that do not have acoustic models: au_T4, au_T6, eu_T5, ieu_T5, ui2_T2, uoi2_T2, uoi3_T6, uou_T1, uou_T3".
And I tried using your lexicon.txt file but the error is : "There were phones in the dictionary that do not have acoustic models: a, c, d, e, i, o, q, u, y, à, á, â, ã, è, é, ê, ì, í, ò, ó, ô, õ, ù, ú, ý, ă, đ, ĩ, ũ, ơ, ư, ạ, ả, ấ, ầ, ẩ, ẫ, ậ, ắ, ằ, ẳ, ẵ, ặ, ẹ, ẻ, ẽ, ế, ề, ể, ễ, ệ, ỉ, ị, ọ, ỏ, ố, ồ, ổ, ỗ, ộ, ớ, ờ, ở, ỡ, ợ, ụ, ủ, ứ, ừ, ử, ữ, ự, ỳ, ỵ, ỷ, ỹ"
Could you share the g2p and acoustic models you used? Thank you so much.
Hi @NTT123,
First of all, thank you for your brilliant work! I have successfully run MFA on my dataset, but the generated .TextGrid files contain no markers for silence or space. Could you please help me with how to detect and add these symbols to the TextGrid files?
Hi,
I want to train FastSpeech2 on a Persian dataset with durations extracted from MFA. Which version of MFA do you use for training? Do you apply any preprocessing to your datasets?
I got an error when continuing training from my checkpoint:
Resuming from latest checkpoint at /content/drive/MyDrive/vietTTS_Model/acoustic_latest_ckpt.pickle
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 183, in <module>
train()
File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 114, in train
dic = pickle.load(f)
EOFError: Ran out of input
What do I need to do to make it work with numbers, dates, etc.? normalize_text or the model config?
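As a stopgap on the normalize_text side, a sketch of a digit-by-digit Vietnamese reader applied before synthesis (a fuller normalizer would handle cardinals, dates, currency, and so on; this function is hypothetical, not part of the repo):

```python
import re

# spell out digits as Vietnamese words before handing text to the model
DIGITS = {"0": "không", "1": "một", "2": "hai", "3": "ba", "4": "bốn",
          "5": "năm", "6": "sáu", "7": "bảy", "8": "tám", "9": "chín"}

def normalize_digits(text: str) -> str:
    spelled = re.sub(r"\d", lambda m: f" {DIGITS[m.group()]} ", text)
    return re.sub(r"\s+", " ", spelled).strip()

print(normalize_digits("Xin mời bệnh nhân thứ 1"))
# -> Xin mời bệnh nhân thứ một
```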