ntt123 / viettts

Vietnamese Text to Speech library

License: MIT License

Shell 1.06% Python 88.59% Jupyter Notebook 10.34%

tts-engines deep-learning tacotron vocoder hifi-gan vietnam vietnamese text-to-speech

viettts's Issues

improve tts

Hi, me again.
I'm training your TTS. My dataset is about 16 hours.
First, because my dataset's utterances are similar to yours, I'm training the acoustic model with two approaches. Here are my results:

- Continue training from your acoustic checkpoint up to 1.46M steps: val loss 0.227, and it is about to converge.
- Train from scratch: about 800k steps, val loss 0.301.

Full details are here: https://drive.google.com/drive/folders/1j0OT7KgJOk5hmcOVNPdcdkaekRRxHekk?usp=sharing
Second, I trained the HiFi-GAN vocoder (with the 1.46M acoustic model) for about 290k steps.
My transcript text: "xin chào tôi là phương anh bản thử số chín"

I got this: https://drive.google.com/file/d/1UtgE1gTC8mwo1SV1b7chauvWPC7uPjxM/view?usp=sharing
=> The speaker talks nonsense, but the intonation is quite good.
Here is the 50k vocoder + 1.46M acoustic model, just to compare:
https://drive.google.com/file/d/1InQ8ykYC_P7qaKhv_58SmTC0r-b_4_0h/view?usp=sharing
And the 50k vocoder + the 800k from-scratch model: https://drive.google.com/file/d/1E-FjOfBqFf9vHTKXmAUhamtB2FsAlAMT/view?usp=sharing

I'm stuck. Should I focus on the acoustic model, the vocoder, or the dataset to improve the result?
Thanks!

Could not find a version that satisfies the requirement jaxlib

I cannot download jaxlib on Windows.
Please help.

[Gen Loss and Mel-Spec Error]

Hi,
Could you share the Gen Loss and Mel-Spec Error you had when you got the final model?
I have been training on another dataset for around 3 days. The loss and error are decreasing, but the voice quality at test time is quite creepy :)))))))
Thank you so much, I hope you see my question.

How to handle english words in vietnamese text

Hi,
Based on your repo and your answers, I have successfully built a Vietnamese text-to-speech app with my own dataset. It sounds very good in the majority of cases. But I am still stuck on how to handle English words (e.g., vaccine, morning...) that appear in the text. I have created a list of English words mapped to Vietnamese pronunciations (e.g., vaccine - vắc xin), and I update it whenever new English words appear. However, this seems inefficient.
Do you have any advice for this case? Thank you so much.
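
For reference, the lookup-table approach described above could be sketched as follows. This is only an illustration, not code from the repo: the table name, the function name, and the respelling of "morning" are made up, and only the "vaccine" entry comes from the post.

```python
import re

# Hypothetical table: English word -> Vietnamese respelling.
# Only "vaccine" -> "vắc xin" is taken from the post; "morning" is illustrative.
EN_TO_VI = {
    "vaccine": "vắc xin",
    "morning": "mo ninh",
}


def respell_english(text, table=EN_TO_VI):
    """Replace whole words found in `table`, ignoring case."""
    if not table:
        return text
    # Longest keys first, so longer entries win over entries they contain.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: table[m.group(0).lower()], text)


print(respell_english("Tôi tiêm Vaccine sáng nay"))
```

The table still has to be maintained by hand, which is the inefficiency the post mentions. The sketch only makes the replacement step itself robust to case and word boundaries.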

Error when fine tune new dataset

Hi, I preprocessed my data following your pipeline, but when I run the fine-tuning code I get this error:

checkpoints directory : small_cp_hifigan
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2021-06-18 07:29:10.395782: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Epoch: 1
/usr/local/lib/python3.7/dist-packages/torch/functional.py:581: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:639.)
normalized, onesided, return_complex)
Steps : 0, Gen Loss Total : 88.451, Mel-Spec. Error : 1.812, s/b : 2.910
train.py:199: UserWarning: Using a target size (torch.Size([1, 80, 305])) that is different to the input size (torch.Size([1, 80, 304])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
val_err_tot += F.l1_loss(y_mel, y_g_hat_mel).item()
Traceback (most recent call last):
File "train.py", line 271, in <module>
main()
File "train.py", line 267, in main
train(0, a, h)
File "train.py", line 199, in train
val_err_tot += F.l1_loss(y_mel, y_g_hat_mel).item()
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2897, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/usr/local/lib/python3.7/dist-packages/torch/functional.py", line 74, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore
RuntimeError: The size of tensor a (304) must match the size of tensor b (305) at non-singleton dimension 2

I don't know why; maybe the error comes from freezing a tensor. What is dimension 2? Please help!
Thanks!
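
On the "dimension 2" question: the shapes in the warning are (1, 80, 305) and (1, 80, 304), which reads as (batch, mel bins, frames), so dimension 2 (counting from 0) is the time axis and the two spectrograms differ by one frame. A common workaround is to trim both to the shorter length before computing the loss. Whether that is the right fix for this pipeline is an assumption. The sketch below uses plain nested lists and hypothetical helper names; with PyTorch tensors the same trim would be a slice such as `y_mel[:, :, :n]`.

```python
def shape_of(mel):
    """Shape of a nested list laid out as [batch][mel_bins][frames]."""
    return (len(mel), len(mel[0]), len(mel[0][0]))


def trim_to_common_frames(a, b):
    """Cut both spectrograms to the shorter length along dimension 2."""
    n = min(shape_of(a)[2], shape_of(b)[2])

    def cut(mel):
        return [[row[:n] for row in item] for item in mel]

    return cut(a), cut(b)


# Shapes from the traceback: target (1, 80, 305) vs. generated (1, 80, 304).
target = [[[0.0] * 305 for _ in range(80)]]
generated = [[[0.0] * 304 for _ in range(80)]]
target, generated = trim_to_common_frames(target, generated)
print(shape_of(target), shape_of(generated))  # (1, 80, 304) (1, 80, 304)
```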

How to keep the voice of the same person

Hi,
I tried playing with this model, but how do I keep the voice as the same person for each sentence?

Graphemes to phonemes

I saw that your synthesized audio clips are good!
But each phoneme is just a single character, like "ô ư i u ...". What about more complex units (which I think may work better), such as: hươu -> h ươu, thái -> th ái?
I am trying FastSpeech2, but the results do not seem good. Have you heard of it?
It would be great if we could keep in touch.
Thanks!
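
The onset/rime split proposed above (hươu -> h ươu, thái -> th ái) could be prototyped like this. It is a sketch of the poster's idea, not the repo's phoneme scheme. The onset list and function name are assumptions, and "gi" and "qu" are deliberately left out because they need extra rules.

```python
import unicodedata

# Syllable-initial consonants, longest first so that "ngh" is tried before "ng".
ONSETS = sorted(
    ["ngh", "ng", "nh", "th", "tr", "ch", "kh", "gh", "ph",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p",
     "r", "s", "t", "v", "x"],
    key=len,
    reverse=True,
)


def split_syllable(syllable):
    """Split one Vietnamese syllable into [onset, rime], or [rime] if vowel-initial."""
    s = unicodedata.normalize("NFC", syllable.lower())
    for onset in ONSETS:
        if s.startswith(onset) and len(s) > len(onset):
            return [onset, s[len(onset):]]
    return [s]


for word in ["hươu", "thái", "nghe", "anh"]:
    print(word, "->", " ".join(split_syllable(word)))
```

Note that this enlarges the phoneme inventory considerably (every distinct rime with its tone becomes a unit), so rare rimes may be poorly covered by a small dataset.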

Feature: UnivNet implementation

Thanks for your great work!
I want to implement the UnivNet vocoder, but your model is written in Haiku and jax.numpy.
https://github.com/mindslab-ai/univnet

I followed your conversion code but got stuck:

Does jax.numpy have no unfold, to, or transpose functions?

How can I fix these problems?
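
Some context that may help, offered tentatively: jax.numpy does have `transpose`, while `.to(...)` is a PyTorch device/dtype method (a dtype change would be `astype` on the array), and `unfold` has no direct counterpart and has to be rebuilt from slicing. The plain-Python sketch below only shows what PyTorch's `Tensor.unfold(dimension, size, step)` computes in the 1-D case, so the same windows can be reproduced with array slicing. The function name is hypothetical.

```python
def unfold_1d(values, size, step):
    """All length-`size` windows of `values`, taken every `step` elements.

    Mirrors the 1-D behaviour of torch.Tensor.unfold(0, size, step):
    windows that would run past the end are dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [
        values[start:start + size]
        for start in range(0, len(values) - size + 1, step)
    ]


print(unfold_1d([1, 2, 3, 4, 5, 6, 7], size=2, step=2))  # [[1, 2], [3, 4], [5, 6]]
```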

Audio playback speed is too fast

Reading speed is too fast with 48k audio samples. Is there a way to reduce the audio speed? Looking forward to everyone's feedback. Thank you so much.
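
One possible cause, not confirmed in this thread: the samples are generated at one rate but written or played with a higher rate in the file header. A WAV player trusts the header, so 16 kHz samples labelled as 48 kHz play three times too fast. The sketch below uses only the standard library to show the effect; the helper names are made up.

```python
import io
import wave


def wav_bytes(samples, sample_rate):
    """Encode 16-bit mono PCM samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)  # the rate a player will trust
        w.writeframes(
            b"".join(int(s).to_bytes(2, "little", signed=True) for s in samples)
        )
    return buf.getvalue()


def duration_seconds(data):
    """Playback duration implied by the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()


one_second_at_16k = [0] * 16000
print(duration_seconds(wav_bytes(one_second_at_16k, 16000)))  # 1.0
print(duration_seconds(wav_bytes(one_second_at_16k, 48000)))  # about 0.333
```

If the header rate is already correct and the speech itself is too fast, the durations predicted by the model would need to be scaled instead, which is a different fix.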

could not synchronize on CUDA context

Today I ran your acoustic model on Colab and got this issue:

training: 0% 0/1900001 [00:00<?, ?it/s]2021-12-07 03:51:13.659473: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2085] Execution of replica 0 failed: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
training: 0% 0/1900001 [00:16<?, ?it/s]
Traceback (most recent call last):
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 139, in <module>
train()
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 101, in train
loss, (params, aux, rng, optim_state) = update(params, aux, rng, optim_state, batch)
File "/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/jax/_src/api.py", line 419, in cache_miss
donated_invars=donated_invars, inline=inline)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1632, in bind
return call_bind(self, fun, *args, **params)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1623, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1635, in process
return trace.process_call(self, fun, tracers, params)
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 627, in process_call
return primitive.impl(f, *tracers, **params)
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 690, in _xla_call_impl
out = compiled_fun(*args)
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 1100, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 139, in <module>
train()
File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 101, in train
loss, (params, aux, rng, optim_state) = update(params, aux, rng, optim_state, batch)
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 1100, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
2021-12-07 03:51:14.389335: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***

_PyModule_ClearDict
PyImport_Cleanup
Py_FinalizeEx

_Py_UnixMain
__libc_start_main
_start

*** End stack trace ***

2021-12-07 03:51:14.389456: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_executable.cc:124] Check failed: pair.first->SynchronizeAllActivity()

I guess this issue comes from mismatched dependency versions.
Could you please pin the specific versions of your dependencies or update the requirements?

Technology transfer

This project is great. I would like to ask you for a technology transfer. Please send your contact details to [email protected] so that I can get back to you.

the number in the text

I tried, but it can't seem to handle numbers in the text.

thanks. a great library

load Pretrained hifi-gan error

Problems with Special Phonemes

Hi,
I have trained with my own dataset, and the results look good. However, I have a problem with special phonemes. The TTS does not pause at special phonemes (when silence_duration = 0), and sometimes it produces noise. With silence_duration = 0.1 or higher, the TTS always produces noise.
I guess something is wrong with my *.textgrid files, because they contain no sil or sp phones. (https://drive.google.com/drive/folders/1RAsq-qPMjHMn-seJy3iapWAmLmjwz9nb?usp=sharing) I have no idea how to fix it.
This is my demo: http://6640-113-161-90-253.ngrok.io/
Can you give me some advice?
Thank you.

Error when fine tuning using new dataset.

Hi, I'm new to ML. I trained the HiFi-GAN vocoder for about 100K steps (your pre-trained model is at 800K) and the sound is acceptable. Now I want to use a different dataset to train the model. Should I train a new HiFi-GAN model or continue training the pre-trained one? I'm not sure. When I chose the fine-tuning option with the VIVOS dataset:
%cd '/content/drive/MyDrive/vietTTS/hifi-gan'
!python3 train.py --fine_tuning True --config ../assets/hifigan/config.json --input_wavs_dir=data --input_training_file=train_files.txt --input_validation_file=val_files.txt

And I got this error:
checkpoints directory : cp_hifigan
Loading 'cp_hifigan/g_00105000'
Complete.
Loading 'cp_hifigan/do_00110000'
Complete.
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2021-06-15 03:07:39.878886: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Epoch: 119
Traceback (most recent call last):
File "train.py", line 271, in <module>
main()
File "train.py", line 267, in main
train(0, a, h)
File "train.py", line 113, in train
for i, batch in enumerate(train_loader):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/drive/My Drive/vietTTS/hifi-gan/meldataset.py", line 144, in __getitem__
os.path.join(self.base_mels_path, os.path.splitext(os.path.split(filename)[-1])[0] + '.npy'))
File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'ft_dataset/VIVOSSPK46_184.npy'

I think the dataset I used to train the vocoder before did not include these new files. How can I fix it?
My second question: what happens if I continue training my pre-trained vocoder on a different dataset?
Thanks!
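
Judging from the traceback, fine-tuning mode loads one precomputed mel file per utterance, built as `<mels dir>/<wav stem>.npy`, and `ft_dataset/VIVOSSPK46_184.npy` was never generated. A quick way to list every missing file before starting a run is sketched below. The function name is hypothetical, and the way wav names are read from the training list is left to the caller because that file format is not shown in the thread.

```python
import os


def missing_mels(wav_names, mels_dir):
    """Return the .npy paths fine-tuning would try to open but that do not exist.

    The path is built the same way as in the traceback:
    <mels_dir>/<file name without directory or extension>.npy
    """
    missing = []
    for name in wav_names:
        stem = os.path.splitext(os.path.basename(name))[0]
        path = os.path.join(mels_dir, stem + ".npy")
        if not os.path.isfile(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Example call; the wav name is taken from the error message.
    print(missing_mels(["data/VIVOSSPK46_184.wav"], "ft_dataset"))
```

Every path it reports needs a mel spectrogram generated for it (for fine-tuning, typically the acoustic model's output for that utterance) before training can read it.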

Need training to read numbers

I tried to read this text: "Xin mời bệnh nhân thứ 1"
The function works, but it keeps the number at the end of the sentence as-is.

Some libraries do not support Windows

Hello author. Did you build this repo on Linux?

[Textgrid for dataset]

I am creating TextGrid files for my dataset. Can you guide me on how to create them, or point me to some information? Thank you so much!

Hifi GAN with 24kHz

Hi @NTT123
Thanks for your work!
What is the HiFi-GAN config for 24 kHz? (The default is 16 kHz.)
Or at least, could you share your tips for calculating the parameters?
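
One constraint that holds for the HiFi-GAN generator in general: each mel frame is upsampled to `hop_size` audio samples, so the product of `upsample_rates` must equal `hop_size`. The check below is a sketch. The 24 kHz values (hop size 300, rates 5, 5, 4, 3) are an illustrative choice and not the author's config, and the key names follow the usual HiFi-GAN config.json but should be verified against the file you use.

```python
from math import prod


def frame_shift_ms(cfg):
    """Validate upsample_rates against hop_size and return the frame shift in ms."""
    total = prod(cfg["upsample_rates"])
    if total != cfg["hop_size"]:
        raise ValueError(
            f"upsample_rates multiply to {total}, but hop_size is {cfg['hop_size']}"
        )
    return 1000.0 * cfg["hop_size"] / cfg["sampling_rate"]


# Hypothetical 24 kHz settings, for illustration only.
cfg_24k = {"sampling_rate": 24000, "hop_size": 300, "upsample_rates": [5, 5, 4, 3]}
print(frame_shift_ms(cfg_24k))  # 12.5
```

The acoustic model has to produce mel frames with the same sampling rate, hop size, and number of mel bins, otherwise the vocoder receives features it was not trained on.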

How to create TextGrid files from any Vietnamese dataset (.wav, .txt)?

Hello! How can I quickly create a TextGrid file with audio (.wav) and text (.txt) files? I want to create a personal speech dataset for a Vietnamese text-to-speech task similar to training and using your model. Can you please help me? Thank you very much.

ERROR: Could not find a version that satisfies the requirement jaxlib

ERROR: Could not find a version that satisfies the requirement jaxlib (from viettts) (from versions: none)
ERROR: No matching distribution found for jaxlib

Prediction time is too slow

I see that text2mel uses only 1 CPU while mel2wave uses all CPUs. Do you have any solution to reduce the processing time? Thank you so much!

How to create lexicon.txt file

It's a great repo.
I have tried to train my own model, but I am still stuck at preparing the dataset. Can you explain how to create a lexicon.txt file for my dataset, so that I can use MFA to create the TextGrid files?
Thank you very much.
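
Other issues on this page suggest that the repo treats each character as a phoneme. Under that assumption, a lexicon in the usual MFA dictionary layout (one word per line, followed by its space-separated phones) can be generated straight from the transcripts. The function name and the punctuation handling here are guesses, not the author's script.

```python
import string
import unicodedata


def build_lexicon(transcripts):
    """Return sorted lexicon lines of the form 'word<TAB>c h a r s'."""
    words = set()
    for line in transcripts:
        line = unicodedata.normalize("NFC", line.lower())
        for token in line.split():
            token = token.strip(string.punctuation)
            if token:
                words.add(token)
    return [f"{word}\t{' '.join(word)}" for word in sorted(words)]


for entry in build_lexicon(["Xin chào!", "chào bạn"]):
    print(entry)
```

Writing the returned lines to lexicon.txt, one per line, gives a file MFA can read as a pronunciation dictionary. A character-level lexicon like this will not match a pretrained acoustic model's phone set, so the aligner would have to be trained on the dataset itself.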

High RAM usage acoustic model

After MFA training, I have 39778 TextGrid files and 39778 wav files. The issue is that when I run the acoustic trainer, it uses 14 GB of RAM and keeps increasing. How can I fix this? Thank you.

Training infinity ;))

Training not working run on colab

FileNotFoundError: [Errno 2] No such file or directory: 'assets/infore/hifigan/hk_hifi.pickle'

I checked out your project and ran: bash ./scripts/quick_start.sh

FileNotFoundError: [Errno 2] No such file or directory: 'assets/infore/hifigan/hk_hifi.pickle'
Can you help me?

How to convert hifigan V3 to haiku correctly?

Hi,
I'm trying to use resblock: 2 with the HiFi-GAN v3 config. Could you please help me fix the conversion script so that inference works correctly?

The model converted with convert_torch_model_to_haiku.py does not work with v3.

Here is the error I'm dealing with:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/vietTTS/vietTTS/synthesizer.py", line 38, in <module>
    wave = mel2wave(mel)
  File "/content/vietTTS/vietTTS/hifigan/mel2wave.py", line 40, in mel2wave
    wav, aux = forward.apply(params, aux, rng, mel)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/transform.py", line 400, in apply_fn
    out = f(*args, **kwargs)
  File "/content/vietTTS/vietTTS/hifigan/mel2wave.py", line 32, in forward
    return net(x)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 433, in wrapped
    out = f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 284, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/content/vietTTS/vietTTS/hifigan/model.py", line 119, in __call__
    xs = self.resblocks[i * self.num_kernels + j](x)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 433, in wrapped
    out = f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 284, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/content/vietTTS/vietTTS/hifigan/model.py", line 72, in __call__
    xt = c(xt)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 433, in wrapped
    out = f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/module.py", line 284, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/conv.py", line 200, in __call__
    w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
  File "/usr/local/lib/python3.7/dist-packages/haiku/_src/base.py", line 333, in get_parameter
    name, bundle_name))
ValueError: Unable to retrieve parameter 'w' for module 'generator/~/res_block1_0/~/conv1_d'. All parameters must be created as part of `init`.

How to run with GPU and TPU

How do I run this with a GPU?

Training with other dataset

Hi all, I saw you recommend using MFA to prepare the dataset (TextGrid files). I tried it, but it has a lot of issues with Vietnamese. I generated a lexicon.txt file with the g2p model, but when I use the acoustic model to generate the TextGrid files, I get this error:
"There were phones in the dictionary that do not have acoustic models: au_T4, au_T6, eu_T5, ieu_T5, ui2_T2, uoi2_T2, uoi3_T6, uou_T1, uou_T3".
And I tried using your lexicon.txt file but the error is : "There were phones in the dictionary that do not have acoustic models: a, c, d, e, i, o, q, u, y, à, á, â, ã, è, é, ê, ì, í, ò, ó, ô, õ, ù, ú, ý, ă, đ, ĩ, ũ, ơ, ư, ạ, ả, ấ, ầ, ẩ, ẫ, ậ, ắ, ằ, ẳ, ẵ, ặ, ẹ, ẻ, ẽ, ế, ề, ể, ễ, ệ, ỉ, ị, ọ, ỏ, ố, ồ, ổ, ỗ, ộ, ớ, ờ, ở, ỡ, ợ, ụ, ủ, ứ, ừ, ử, ữ, ự, ỳ, ỵ, ỷ, ỹ"

Could you share the g2p and acoustic models you used? Thank you so much.

How to add sil and sp markers to TextGrid after MFA?

Hi @NTT123,
First of all, thank you for your brilliant work! I have successfully trained my dataset with MFA, but the generated .TextGrid files have no markers for silence and space (sil, sp). Could you please help with how to detect these symbols and add them to the TextGrid files?
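
A possible post-processing step, under two assumptions that the thread does not confirm: the aligner leaves silent stretches as intervals with an empty label, and the trainer expects "sil" at the utterance edges and "sp" between words. Intervals are modelled here as plain (start, end, label) tuples, so reading and writing the TextGrid files is left out.

```python
def label_silences(intervals, sil="sil", sp="sp"):
    """Give empty-label intervals an explicit silence label.

    intervals: list of (start, end, label) tuples covering one utterance.
    Empty intervals at the very start or end become `sil`; others become `sp`.
    """
    last = len(intervals) - 1
    labelled = []
    for i, (start, end, label) in enumerate(intervals):
        if label.strip() == "":
            label = sil if i in (0, last) else sp
        labelled.append((start, end, label))
    return labelled


words = [(0.0, 0.2, ""), (0.2, 0.5, "xin"), (0.5, 0.6, ""), (0.6, 1.0, "chào"), (1.0, 1.3, "")]
print([label for _, _, label in label_silences(words)])
```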

mfa version

Hi,
I want to train FastSpeech2 on a Persian dataset with durations extracted from MFA. Which version of MFA do you use for training? Do you apply any preprocessing to your datasets?

EOFError: Ran out of input

I got an error when continuing training from my checkpoint:
Resuming from latest checkpoint at /content/drive/MyDrive/vietTTS_Model/acoustic_latest_ckpt.pickle
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 183, in <module>
train()
File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 114, in train
dic = pickle.load(f)
EOFError: Ran out of input
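
`EOFError: Ran out of input` from `pickle.load` usually means the file is empty or truncated, for example because the process was stopped while the checkpoint was being written. That diagnosis is a guess for this case. A common safeguard is to write to a temporary file and rename it over the real path only after the write has finished. The helper names below are hypothetical, and the rename may not be atomic on a mounted Google Drive.

```python
import os
import pickle
import tempfile


def save_checkpoint(obj, path):
    """Write to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # replaces the old file in one step on most filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path):
    """Return the checkpoint, or None if the file is missing, empty, or truncated."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        return None
```

With a loader like this, a broken latest checkpoint can be detected up front and training can fall back to an older checkpoint instead of crashing.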

number, date, ... to speech

What do I need to do to handle numbers, dates, etc.? Use normalize_text or change the model config?
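
Numbers are normally spelled out in a text-normalization step before the text reaches the model, so no model config change should be needed. Below is a minimal sketch for 0 to 99 only. The function names are hypothetical, and how this would plug into the repo's own normalize_text is not shown in the thread. Dates, decimals, ordinals (e.g. "thứ nhất"), and larger numbers need additional rules.

```python
import re

DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


def read_number(n):
    """Spell out an integer from 0 to 99 in Vietnamese."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n < 10:
        return DIGITS[n]
    tens, unit = divmod(n, 10)
    head = "mười" if tens == 1 else DIGITS[tens] + " mươi"
    if unit == 0:
        return head
    if unit == 1 and tens > 1:
        return head + " mốt"   # 21 -> hai mươi mốt
    if unit == 5:
        return head + " lăm"   # 15 -> mười lăm, 25 -> hai mươi lăm
    return head + " " + DIGITS[unit]


def spell_out_numbers(text):
    """Replace standalone 1- or 2-digit numbers; longer numbers are left untouched."""
    return re.sub(
        r"(?<!\d)\d{1,2}(?!\d)",
        lambda m: read_number(int(m.group())),
        text,
    )


print(spell_out_numbers("tôi có 25 quyển sách"))
```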

acoustic training generate strange melspectrogram

Hi @NTT123,
After I pulled the latest PR #22 and reran MFA with the new format, adding --disable_textgrid_cleanup,
the mel I generate looks like this, even after training for 61k steps:

Previously, my dataset could speak at this step, but now it always generates a buzzing sound.
Should I keep training?
