archivoice / diff-svc-notebooks

diff-svc-notebooks's Issues

If you get this error: 'HifiGAN' object has no attribute 'device'

#27
I got this error as well. What worked for me was adding this line inside spec2wav:

self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

just before the line device = self.device, and then moving
/content/diff-svc/checkpoints/checkpoints/0109_hifigan_bigpopcs_hop128 to /content/diff-svc/checkpoints/0109_hifigan_bigpopcs_hop128
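The directory move described above can also be scripted. This is only a sketch: the helper name flatten_nested_checkpoints is made up for illustration, and it assumes the vocoder folder ended up one level too deep (checkpoints/checkpoints/...) as in the comment above.

```python
import shutil
from pathlib import Path


def flatten_nested_checkpoints(checkpoints_dir):
    """Move everything in <checkpoints_dir>/checkpoints/ up one level.

    Hypothetical helper, not part of diff-svc. Returns the names that were
    moved. Entries that already exist at the destination are left alone.
    """
    root = Path(checkpoints_dir)
    nested = root / "checkpoints"
    moved = []
    if not nested.is_dir():
        return moved
    for entry in sorted(nested.iterdir()):
        target = root / entry.name
        if target.exists():
            continue  # do not overwrite anything that is already in place
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    # Remove the now-empty nested folder so the layout is unambiguous.
    if not any(nested.iterdir()):
        nested.rmdir()
    return moved
```

In the notebook this would be called as flatten_nested_checkpoints("/content/diff-svc/checkpoints") before starting Step 5.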

Step 5: Training AttributeError: 'HifiGAN' object has no attribute 'device'

use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False,
val_check_interval: 1000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.hifigan.HifiGAN,
vocoder_ckpt: checkpoints/0109_hifigan_bigpopcs_hop128, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 512,
work_dir: checkpoints/huji,
| Mel losses: {'ssim': 0.5, 'l1': 0.5}
12/14 03:50:58 AM gpu available: True, used: True
| load 'model' from '/content/diff-svc/pretrain/opencpop.ckpt'.
| model Trainable Parameters: 39.915M
Validation sanity check: 0% 0/1 [00:00<?, ?batch/s]
sample time step: 0% 0/100 [00:00<?, ?it/s]
sample time step: 4% 4/100 [00:00<00:02, 32.05it/s]
sample time step: 8% 8/100 [00:00<00:02, 31.75it/s]
sample time step: 12% 12/100 [00:00<00:02, 33.70it/s]
sample time step: 16% 16/100 [00:00<00:02, 34.73it/s]
sample time step: 20% 20/100 [00:00<00:02, 35.29it/s]
sample time step: 24% 24/100 [00:00<00:02, 35.64it/s]
sample time step: 28% 28/100 [00:00<00:02, 35.94it/s]
sample time step: 32% 32/100 [00:00<00:01, 35.98it/s]
sample time step: 36% 36/100 [00:01<00:01, 36.20it/s]
sample time step: 40% 40/100 [00:01<00:01, 36.23it/s]
sample time step: 44% 44/100 [00:01<00:01, 36.18it/s]
sample time step: 48% 48/100 [00:01<00:01, 36.21it/s]
sample time step: 52% 52/100 [00:01<00:01, 36.24it/s]
sample time step: 56% 56/100 [00:01<00:01, 36.24it/s]
sample time step: 60% 60/100 [00:01<00:01, 36.30it/s]
sample time step: 64% 64/100 [00:01<00:00, 36.21it/s]
sample time step: 68% 68/100 [00:01<00:00, 36.35it/s]
sample time step: 72% 72/100 [00:02<00:00, 36.35it/s]
sample time step: 76% 76/100 [00:02<00:00, 36.29it/s]
sample time step: 80% 80/100 [00:02<00:00, 36.30it/s]
sample time step: 84% 84/100 [00:02<00:00, 36.29it/s]
sample time step: 88% 88/100 [00:02<00:00, 36.29it/s]
sample time step: 92% 92/100 [00:02<00:00, 36.24it/s]
sample time step: 96% 96/100 [00:02<00:00, 36.26it/s]
sample time step: 100% 100/100 [00:02<00:00, 35.89it/s]
Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "/content/diff-svc/training/task/base_task.py", line 234, in start
    trainer.fit(task)
  File "/content/diff-svc/utils/pl_utils.py", line 495, in fit
    self.run_pretrain_routine(model)
  File "/content/diff-svc/utils/pl_utils.py", line 571, in run_pretrain_routine
    self.evaluate(model, self.get_val_dataloaders(), self.num_sanity_val_steps, self.testing)
  File "/content/diff-svc/utils/pl_utils.py", line 1192, in evaluate
    output = self.evaluation_forward(model,
  File "/content/diff-svc/utils/pl_utils.py", line 1316, in evaluation_forward
    output = model.validation_step(*args)
  File "/content/diff-svc/training/task/SVC_task.py", line 155, in validation_step
    self.plot_wav(batch_idx, sample['mels'], model_out['mel_out'], is_mel=True, gt_f0=gt_f0, f0=pred_f0)
  File "/content/diff-svc/training/task/SVC_task.py", line 218, in plot_wav
    gt_wav = self.vocoder.spec2wav(gt_wav, f0=gt_f0)
  File "/content/diff-svc/network/vocoders/hifigan.py", line 63, in spec2wav
    device = self.device
AttributeError: 'HifiGAN' object has no attribute 'device'

how to resume training

I saved the checkpoint at step 14000 to Drive using "Step 6: Package Model".

I then tried to continue training by unzipping the archive into /content/diff-svc/checkpoints, but training starts from step 0 (the global step is also 0). How do I continue from step 14000?
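One thing worth checking here is whether the unzipped checkpoint files actually sit in the folder that work_dir points to. The sketch below lists the highest step found in a folder. It assumes checkpoints are named model_ckpt_steps_<N>.ckpt (the pattern visible in the pe_ckpt setting in the logs on this page) and that the trainer looks for them in work_dir. Both are assumptions, and latest_checkpoint is a made-up helper name.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from "model_ckpt_steps_60000.ckpt" in the logs.
CKPT_PATTERN = re.compile(r"model_ckpt_steps_(\d+)\.ckpt$")


def latest_checkpoint(work_dir):
    """Return (step, path) for the highest-numbered checkpoint, or None."""
    best = None
    for path in Path(work_dir).glob("model_ckpt_steps_*.ckpt"):
        match = CKPT_PATTERN.search(path.name)
        if match is None:
            continue  # name looks similar but has no numeric step
        step = int(match.group(1))
        if best is None or step > best[0]:
            best = (step, path)
    return best
```

If this returns None for the work_dir used in Step 5, the archive was probably extracted one folder too high or too low, which would explain a restart from step 0.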

ModuleNotFoundError: No module named 'parselmouth'

Trying to manually install it throws:

Collecting parselmouth
  Using cached parselmouth-1.1.1.tar.gz (33 kB)
  Preparing metadata (setup.py) ... done
Collecting googleads==3.8.0 (from parselmouth)
  Using cached googleads-3.8.0.tar.gz (23 kB)
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
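The pip log above shows the PyPI package named parselmouth pulling in googleads, which suggests it is not the Praat binding that the pitch extractor needs. The Praat binding is published on PyPI as praat-parselmouth and is imported as parselmouth. A small sketch of a check that prints the right install command (install_hint and PIP_NAMES are made-up names for illustration):

```python
import importlib.util

# Import name -> name to pass to pip, for packages where the two differ.
PIP_NAMES = {"parselmouth": "praat-parselmouth"}


def install_hint(module_name):
    """Return a pip command if the module is missing, otherwise None."""
    if importlib.util.find_spec(module_name) is not None:
        return None  # already importable, nothing to do
    pip_name = PIP_NAMES.get(module_name, module_name)
    return f"pip install {pip_name}"


if __name__ == "__main__":
    print(install_hint("parselmouth"))
```

If the wrong parselmouth package was already installed, uninstalling it first avoids the two packages shadowing each other.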

Training notebook TypeError

Today I attempted to use the training notebook, and everything was going alright until I got to the cell labeled "Step 5: Training". Upon running it, I get the following error:

Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "/content/diff-svc/training/task/base_task.py", line 234, in start
    trainer.fit(task)
  File "/content/diff-svc/utils/pl_utils.py", line 489, in fit
    self.optimizers, self.lr_schedulers = self.init_optimizers(model.configure_optimizers())
  File "/content/diff-svc/training/task/base_task.py", line 174, in configure_optimizers
    optm = self.build_optimizer(self.model)
  File "/content/diff-svc/training/task/SVC_task.py", line 65, in build_optimizer
    weight_decay=hparams['weight_decay'])
  File "/usr/local/lib/python3.7/dist-packages/torch/optim/adamw.py", line 78, in __init__
    if not 0.0 <= lr:
TypeError: '<=' not supported between instances of 'float' and 'str'
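The last frame shows the optimizer comparing 0.0 with lr and finding a string, so the learning rate most likely reached build_optimizer as text rather than as a number (for example because it was written to the config in quotes, or in a form the YAML loader reads as a string). A minimal sketch of a conversion that could be applied before the optimizer is built, assuming that is the cause; as_float is a made-up helper, not part of diff-svc:

```python
def as_float(name, value):
    """Convert a config value to float, with a readable error on failure."""
    if isinstance(value, bool):
        # bool is a subclass of int, but True/False is never a valid rate.
        raise TypeError(f"hparams[{name!r}] is a bool, expected a number")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            raise ValueError(
                f"hparams[{name!r}] = {value!r} is not a number"
            ) from None
    raise TypeError(f"hparams[{name!r}] has unsupported type {type(value).__name__}")
```

For example, hparams['lr'] = as_float('lr', hparams['lr']) turns "0.0008" into 0.0008 and leaves a value that is already numeric unchanged.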

Can I activate a pretrained model after 40k steps?

Hi, I trained for 40k steps without using a pretrained model. Now I want to use the nyaru pretrained model and continue training to 100k steps. Will this make the voice better, or do I have to start from 0 again? Thank you.

Error: HifiGAN model file is not found!

I'm having problems in step 5 of the training notebook.
I'm trying to train a voice from scratch, using 44.1 kHz samples.

Complete output of step 5:

/content/diff-svc
| Hparams chains: ['/content/diff-svc/training/config_nsf.yaml']
| Hparams:
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False},
binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/dross, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1,
config_path: training/config_nsf.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2,
cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9,
dec_layers: 4, decay_steps: 20000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet,
diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'],
dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4,
encoder_K: 8, encoder_type: fft, endless_ds: False, f0_bin: 256, f0_max: 1100.0,
f0_min: 40.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 2048, fmax: 16000,
fmin: 40, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1,
hidden_size: 256, hop_size: 512, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False,
keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3,
lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: , log_interval: 100,
loud_norm: False, lr: 0.0008, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1,
max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 12, max_tokens: 128000,
max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120,
no_fs2: True, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1,
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98,
out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False,
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame,
pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1,
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256,
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/dross, ref_norm_layer: bn,
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False,
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear,
seed: 1234, sort_by_len: True, speaker_id: dross, spec_max: [0.2816964089870453, 0.6110045313835144, 0.7528443932533264, 0.7719852328300476, 0.7578747868537903, 0.7870495319366455, 0.928855836391449, 0.915518581867218, 0.9106525182723999, 1.052038311958313, 1.0322246551513672, 0.9403936266899109, 1.0780105590820312, 1.022165298461914, 0.98377925157547, 1.0139509439468384, 1.0601212978363037, 0.9910836219787598, 0.9987587332725525, 0.8733547925949097, 0.8284812569618225, 0.8044165968894958, 0.8117375373840332, 0.7631716132164001, 0.8004911541938782, 0.8732689023017883, 0.8700592517852783, 0.837287425994873, 0.8866966366767883, 0.8396021127700806, 0.819175660610199, 0.9263454079627991, 0.880441427230835, 0.8278772234916687, 0.8070288300514221, 0.82593834400177, 0.9075272679328918, 0.7374939918518066, 0.7339229583740234, 0.5838717222213745, 0.7390212416648865, 0.5914533138275146, 0.6568486094474792, 0.7018999457359314, 0.650595486164093, 0.7557802200317383, 0.6265286803245544, 0.6484942436218262, 0.547179639339447, 0.5296093821525574, 0.40601256489753723, 0.37959158420562744, 0.4374527037143707, 0.3697531819343567, 0.30621394515037537, 0.3554210066795349, 0.3598262369632721, 0.3712518513202667, 0.216549813747406, 0.30987581610679626, 0.3893497586250305, 0.2443387508392334, 0.24721182882785797, 0.4849996268749237, 0.4686632752418518, 0.15373729169368744, 0.189516082406044, 0.1884053349494934, 0.16127777099609375, 0.3267746567726135, 0.22321538627147675, 0.12231604009866714, 0.19100888073444366, 0.06677097827196121, 0.15172165632247925, 0.004269076976925135, 0.07318542897701263, 0.0790969505906105, 0.045008596032857895, -0.0033863128628581762, -0.07382304221391678, -0.06529872864484787, -0.06318709254264832, -0.16331058740615845, -0.2981128394603729, -0.37530261278152466, -0.4302491545677185, -0.35962632298469543, -0.06664707511663437, -0.009034375660121441, -0.07002700865268707, -0.17129261791706085, -0.13444754481315613, -0.04389636218547821, 
0.16330686211585999, -0.029020613059401512, -0.2405114322900772, -0.287506639957428, -0.23881807923316956, -0.22608397901058197, -0.3683353662490845, -0.4233062267303467, -0.40162914991378784, -0.3776197135448456, -0.39424625039100647, -0.4183795750141144, -0.599024772644043, -0.6727768182754517, -0.6512080430984497, -0.6985474824905396, -0.7823814749717712, -0.7961130738258362, -0.8495840430259705, -0.8956512212753296, -0.9007495045661926, -0.8376040458679199, -0.978445291519165, -0.9590984582901001, -1.0561996698379517, -1.038326621055603, -1.0919842720031738, -0.9782500267028809, -0.8888759016990662, -1.0536094903945923, -1.132426142692566, -1.1358226537704468, -1.2419252395629883, -1.0913069248199463], ;33;mspec_min: [-4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, 
-4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999879837036133, -4.999994277954102, -4.984480857849121, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102],
spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: ,
test_num: 0, test_prefixes: ['test'], test_set_name: test, timesteps: 1000, train_set_name: train,
use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False,
use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False,
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False,
val_check_interval: 1000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN,
vocoder_ckpt: checkpoints/nsf_hifigan/model, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048,
work_dir: /content/drive/MyDrive/diff-svc/dross,
| Mel losses: {'ssim': 0.5, 'l1': 0.5}
Error: HifiGAN model file is not found!
12/07 12:58:18 PM gpu available: True, used: True
| model Trainable Parameters: 33.709M
Validation sanity check: 0% 0/1 [00:00<?, ?batch/s]
sample time step: 0% 0/100 [00:00<?, ?it/s]
sample time step: 6% 6/100 [00:00<00:01, 52.70it/s]
sample time step: 12% 12/100 [00:00<00:01, 50.63it/s]
sample time step: 21% 21/100 [00:00<00:01, 64.14it/s]
sample time step: 29% 29/100 [00:00<00:01, 67.10it/s]
sample time step: 37% 37/100 [00:00<00:00, 69.55it/s]
sample time step: 46% 46/100 [00:00<00:00, 73.67it/s]
sample time step: 54% 54/100 [00:00<00:00, 73.50it/s]
sample time step: 62% 62/100 [00:00<00:00, 74.58it/s]
sample time step: 71% 71/100 [00:00<00:00, 77.50it/s]
sample time step: 79% 79/100 [00:01<00:00, 75.66it/s]
sample time step: 88% 88/100 [00:01<00:00, 77.15it/s]
sample time step: 100% 100/100 [00:01<00:00, 72.19it/s]
Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "/content/diff-svc/training/task/base_task.py", line 234, in start
    trainer.fit(task)
  File "/content/diff-svc/utils/pl_utils.py", line 495, in fit
    self.run_pretrain_routine(model)
  File "/content/diff-svc/utils/pl_utils.py", line 571, in run_pretrain_routine
    self.evaluate(model, self.get_val_dataloaders(), self.num_sanity_val_steps, self.testing)
  File "/content/diff-svc/utils/pl_utils.py", line 1192, in evaluate
    output = self.evaluation_forward(model,
  File "/content/diff-svc/utils/pl_utils.py", line 1316, in evaluation_forward
    output = model.validation_step(*args)
  File "/content/diff-svc/training/task/SVC_task.py", line 155, in validation_step
    self.plot_wav(batch_idx, sample['mels'], model_out['mel_out'], is_mel=True, gt_f0=gt_f0, f0=pred_f0)
  File "/content/diff-svc/training/task/SVC_task.py", line 218, in plot_wav
    gt_wav = self.vocoder.spec2wav(gt_wav, f0=gt_f0)
  File "/content/diff-svc/network/vocoders/nsf_hifigan.py", line 48, in spec2wav
    if self.h.sampling_rate != hparams['audio_sample_rate']:
AttributeError: 'NsfHifiGAN' object has no attribute 'h'

I'm using the latest notebook; I did not have this problem with the previous one.
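The log prints "Error: HifiGAN model file is not found!" well before the traceback, and the config shows vocoder_ckpt: checkpoints/nsf_hifigan/model. A plausible reading is that the vocoder was never loaded, so the later AttributeError is only a follow-on symptom. The sketch below checks what exists at that path before training starts. vocoder_candidates is a made-up helper, and treating the path as relative to /content/diff-svc is an assumption.

```python
from pathlib import Path


def vocoder_candidates(vocoder_ckpt, base_dir="."):
    """List the files found at the configured vocoder checkpoint path.

    Returns [path] if the path is a file, the files inside it if it is a
    directory, and an empty list if nothing exists there.
    """
    target = Path(base_dir) / vocoder_ckpt
    if target.is_file():
        return [target]
    if target.is_dir():
        return sorted(p for p in target.iterdir() if p.is_file())
    return []


if __name__ == "__main__":
    found = vocoder_candidates("checkpoints/nsf_hifigan/model", "/content/diff-svc")
    print(found if found else "vocoder checkpoint is missing")
```

An empty result would mean the NSF-HiFiGAN download or unzip step did not put the model where the config expects it.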
