isletennos / MMVC_Trainer
A real-time AI voice changer (Trainer)
Thank you for publishing this wonderful project!
I'd like to try training with WSL2 + Docker, so if you already have a Dockerfile, would it be possible to share it?
Thank you in advance!
Execution failed with the following error:
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV
Does this still only run on Colab?
I have some questions about MMVC_Trainer.
(1) G_180000.pth and D_180000.pth
In the fine_model, there are G_180000.pth and D_180000.pth model files.
What is G_180000.pth for?
What is D_180000.pth for?
(2) G_latest_99999999.pth and D_latest_99999999.pth
In the logs/20220306_24000, there are G_latest_99999999.pth and D_latest_99999999.pth model files.
What kind of training is done for G_latest_99999999.pth?
What kind of training is done for D_latest_99999999.pth?
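For background: in the VITS-style GAN training this repository builds on, the G_*.pth files hold the generator (the synthesis network actually used at inference time) and the D_*.pth files hold the discriminator, which is only needed to continue or fine-tune training. A small sketch (describe_checkpoint is a hypothetical helper, not part of the repo) for peeking inside such a checkpoint without constructing the model:

```python
import torch

def describe_checkpoint(path):
    """Peek at a VITS-style checkpoint saved by train_ms.py.

    These files are typically plain dicts holding entries such as
    'model', 'optimizer', 'learning_rate' and 'iteration'.
    """
    ckpt = torch.load(path, map_location="cpu")
    return sorted(ckpt.keys())
```

Comparing the key lists of a G_*.pth and a D_*.pth file this way makes the generator/discriminator split visible without any GPU.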
After clicking "Open in Colab" in the README and running 00_Clone_Repo.ipynb, I ran 00_Rec_Voice.ipynb to do the recording, and the following error occurred when recording finished.
librosa.display.waveplot appears to be a method that was removed in librosa 0.9. Does this mean a version newer than intended is being installed unintentionally?
Note: rewriting waveplot to waveshow seems to make it work, though I'm not sure whether that produces the intended display...
えっ嘘でしょ。
えっうそでしょ。
---終了---
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-65ce3032714f> in <cell line: 1>()
----> 1 rec(3, "emotion001", "えっ嘘でしょ。", "えっうそでしょ。")
<ipython-input-10-0d9a847134a7> in rec(sec, filename, text, hira)
72 with open(mytext_dir + filename + ".txt", 'w') as mytext:
73 mytext.write(hira)
---> 74 librosa.display.waveplot(speecht, sr=rate)
75 plt.show()
76 display(Audio(speecht, rate=rate))
AttributeError: module 'librosa.display' has no attribute 'waveplot'
This may be a bad question, but the start_https.bat file is not included in any of the zip files I downloaded. Where is that file?
In this repository, which of the following is the character to be trained?
(1) source character
(2) target character
(3) both source character and target character
The cell fails with:
UsageError: Cell magic %%ccapture not found.
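The magic in that cell is misspelled: the IPython built-in is %%capture (one "c"), which suppresses a cell's output. Assuming IPython is installed, you can confirm which cell magics are actually registered:

```python
from IPython.testing.globalipapp import get_ipython

ip = get_ipython()  # a throwaway IPython instance for inspection
cell_magics = ip.magics_manager.magics["cell"]
assert "capture" in cell_magics       # the real built-in cell magic
assert "ccapture" not in cell_magics  # the misspelled name is not registered
```

Correcting the cell header to %%capture should resolve the UsageError.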
I run 03_MMVC_Interface.ipynb and I have questions about it.
(1) SOURCE_SPEAKER_ID
SOURCE_SPEAKER_ID is preset as 107.
Then, I'd like to use models trained for multiple source speakers.
How do I set the ID number for them?
(2) TARGET_ID
TARGET_ID is preset as 100.
Then, I'd like to use models trained for multiple target speakers.
How do I set the ID number for them?
(3) TARGET_ID trained model
The source speaker's trained model is saved in the log folder.
Where should I put the target speaker's trained model?
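For context on what these IDs do: in a multi-speaker VITS model like this one, the speaker ID is simply a row index into a learned speaker-embedding table. A sketch of how an ID selects a conditioning vector, assuming the usual MMVC config sizes (n_speakers = 104 rows, gin_channels = 256; these values are an assumption here, taken from a typical training config, not from 03_MMVC_Interface.ipynb itself):

```python
import torch

# Assumed config values: n_speakers=104, gin_channels=256.
emb_g = torch.nn.Embedding(104, 256)  # one learned vector per speaker ID
sid = torch.tensor([100])             # e.g. TARGET_ID = 100
g = emb_g(sid)                        # conditioning vector fed to the decoder
assert g.shape == (1, 256)
```

This is why SOURCE_SPEAKER_ID and TARGET_ID must match the indices the model was trained with: a different number does not point at a different file, it points at a different row of this table.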
When I ran the following cell in Train_MMVC.ipynb on Google Colab:
!python train_ms.py -c configs/jsontest.json -m 20220311_24000 -fg fine_model/G_232000.pth -fd fine_model/D_232000.pth
the following error appeared.
[INFO] {'train': {'log_interval': 1000, 'eval_interval': 4000, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 4096, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/jsontest_textful.txt', 'validation_files': 'filelists/jsontest_textful_val.txt', 'training_files_notext': 'filelists/jsontest_textless.txt', 'validation_files_notext': 'filelists/jsontest_val_textless.txt', 'text_cleaners': ['japanese_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 24000, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 104, 'cleaned_text': False}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 8, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'fine_flag': True, 'fine_model_g': 'fine_model/G_232000.pth', 'fine_model_d': 'fine_model/D_232000.pth', 'model_dir': './logs/20220311_24000'}
0it [00:00, ?it/s]
0it [00:00, ?it/s]
[INFO] FineTuning : True
[INFO] Load model : fine_model/G_232000.pth
[INFO] Load model : fine_model/D_232000.pth
Traceback (most recent call last):
File "train_ms.py", line 303, in <module>
main()
File "train_ms.py", line 53, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/content/drive/MyDrive/MMVC_Trainer/train_ms.py", line 108, in run
_, _, _, epoch_str = utils.load_checkpoint(hps.fine_model_g, net_g, optim_g)
File "/content/drive/MyDrive/MMVC_Trainer/utils.py", line 38, in load_checkpoint
model.module.load_state_dict(new_state_dict)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
size mismatch for emb_g.weight: copying a param with shape torch.Size([106, 256]) from checkpoint, the shape in current model is torch.Size([104, 256]).
The URL of the notebook I ran is below. I hope it is useful as a reference 🙇
https://colab.research.google.com/drive/1VWYkTNjftG3MeCSdgesiPgw5NIE9E0WN?authuser=1
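The final RuntimeError says the checkpoint's speaker-embedding table has 106 rows, while the model built from configs/jsontest.json (n_speakers: 104) has only 104, i.e. the fine-tuning checkpoint was trained with a different n_speakers value than the current config. A minimal reproduction of the same failure:

```python
import torch

ckpt_emb = torch.nn.Embedding(106, 256)   # shape saved in the checkpoint
model_emb = torch.nn.Embedding(104, 256)  # shape built from n_speakers: 104
try:
    model_emb.load_state_dict(ckpt_emb.state_dict())
    raise AssertionError("expected a size mismatch")
except RuntimeError as e:
    assert "size mismatch" in str(e)
```

Matching n_speakers in the json to the value the fine_model checkpoints were trained with should make the shapes agree.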
Hi! Thanks for the amazing open source work!
I was looking through onnx_export.py
and onnx_bench.py
and I was wondering how to run it end to end in a standalone Colab notebook.
Specifically, how do we replace dummy_specs = torch.rand(1, 257, 60) with an mp3/wav audio file (of variable length) converted to a torch Tensor (by the rmvpe model? I'm really new to speech model architectures, so I'm not sure) when using the ONNX-converted checkpoint.
Thanks
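A hedged sketch of producing a (1, freq_bins, frames) tensor like dummy_specs from raw audio: the 257 frequency bins suggest a linear magnitude spectrogram with n_fft = 512 (257 = 512/2 + 1), and rmvpe is, as far as I know, a pitch (F0) estimator rather than the spectrogram front end. All values below are assumptions, not something confirmed by onnx_export.py:

```python
import torch

n_fft, hop_length = 512, 256  # assumed: 257 bins = n_fft // 2 + 1
wav = torch.randn(16000)      # stand-in for audio decoded from a wav/mp3 file

spec = torch.stft(
    wav, n_fft, hop_length=hop_length,
    window=torch.hann_window(n_fft), return_complex=True,
).abs().unsqueeze(0)          # magnitude spectrogram, shape (1, 257, frames)
assert spec.shape[:2] == (1, 257)
```

The frame count (the last dimension) then varies with audio length, matching the "variable time length" part of the question.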
When I ran the MMVC_Trainer config-file-creation ipynb on Google Colab, the following error appeared at the end of the output of cell 4 (the one that creates the config files), and no files other than baseconfig were generated.
...(omitted)
['らーてゃん。']
WARNING: JPCommonLabel_insert_pause() in jpcommon_label.c: First mora should not be short pause.
sil-r-a-a-ty-a-N-sil
dataset/textful/00_myvoice/wav/emotion099.wav|0|sil-r-a-a-ty-a-N-sil
Error: there is no audio data in dataset/textful/01_target/wav
Output of the confirmation cell (cell 5):
Directory: filelists
Directory: configs
baseconfig.json
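The message means dataset/textful/01_target/wav contains no target-speaker wav files, so config generation aborts before writing the per-run json files. A sketch of the same check (check_voice_dir is a hypothetical helper, not code from the notebook):

```python
from pathlib import Path

def check_voice_dir(path):
    """Raise if a dataset wav directory is empty, mirroring the notebook's error."""
    wavs = sorted(Path(path).glob("*.wav"))
    if not wavs:
        raise FileNotFoundError(f"Error: no audio data in {path}")
    return wavs
```

Placing the target speaker's wav files under dataset/textful/01_target/wav before rerunning cell 4 should let the remaining config files be generated.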
I've watched the tutorials for setting up MMVC and installed everything, but the tutorial videos never explain how to generate the .json files needed to then train with Colab. I've looked in forums with no luck. Is there something I'm missing, or is that information region-locked to Japan? I really want to get this software working; I don't want to have to buy a new computer and GPU just to use the original w-okada version.