The avocodo from ncsoft

The learning rate

Hello,

Thank you for presenting awesome ideas with your work and addressing fundamental issues in previous works.

In the Training Setup section of your paper, the learning rate is given as 2e-3, whereas your implementation uses 2e-4.

2e-4 sounds more reasonable (given the HiFi-GAN baseline). However, I couldn't achieve balanced training with this value; it always ended up with a slight metallic artifact.

I am 1M steps in with 2e-3 and it looks better, but I still have doubts about it.

Can you explain the discrepancy?

Thank you

Would you please provide some pretrained models?

Nice work! The example results sound promising. It would be better if you could provide some pretrained models.

Is teacher forcing training strategy needed for TTS?

As I saw in HiFiGAN, after training on ground-truth (GT) mels, they further fine-tuned the model on teacher-forced mels from TTS inference and got better results.
Is this strategy also suitable for Avocodo?

PQMF change for 32 kHz version

Hi,

Thanks for sharing Avocodo. I'd like to use this vocoder at a higher sampling rate, 32 kHz. Can you give me some suggestions on how to change the PQMF part when training a 32 kHz Avocodo?

Thanks,
Bolong Wen
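
For reference when thinking about the question above: a uniform filter bank with N bands splits the range from 0 Hz to the Nyquist frequency into N equal bands, so the same number of sub-bands covers wider bands at a higher sampling rate. The sketch below only does that arithmetic. The band counts 4 and 16 are the first entries of the `pqmf_config` levels posted in another issue on this page; treating those entries as the number of sub-bands, and using 22050 Hz as the comparison rate, are assumptions, and `subband_edges` is a hypothetical helper, not part of the Avocodo code.

```python
def subband_edges(sample_rate_hz, n_bands):
    """Return (low_hz, high_hz) for each band of a uniform n_bands filter bank."""
    nyquist = sample_rate_hz / 2
    width = nyquist / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]


# Compare an assumed 22050 Hz setup with the 32000 Hz target.
for sr in (22050, 32000):
    for n_bands in (4, 16):
        low, high = subband_edges(sr, n_bands)[0]
        print(f"{sr} Hz, {n_bands} bands: {high - low:.1f} Hz per band")
```

Whether the remaining PQMF values (filter length, cutoff, Kaiser beta) need retuning for 32 kHz is not something this arithmetic answers; it only shows how the band widths in Hz scale with the sampling rate.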

Feature matching loss increases

Hello, I'm training an Avocodo model on my own dataset, which consists of multiple datasets.

I changed some generator parameters to change the input and target sample rates: the model generates a 32 kHz waveform from a 24 kHz mel spectrogram. The hop size is 400.

When I train my Avocodo model, the feature matching loss increases even after the discriminator loss stops decreasing.
As an aside, strangely enough, the mel loss keeps decreasing and the quality of the audio output is pretty good.

Is this normal when training a vocoder? Will the feature matching loss ever stop increasing?

We'd love to hear about your experiences.

Thank you.

HYPER PARAMETERS
model:
  upsample_rates: '[[5], [5], [4], [4]]'
  upsample_kernel_sizes: '[[11], [11], [8], [8]]'
  upsample_initial_channel: 384
  resblock_kernel_sizes: '[3,7,11]'
  resblock_dilation_sizes: '[[1,3,5], [1,3,5], [1,3,5]]'
  projection_filters: '[0, 1, 1, 1]'
  projection_kernels: '[0, 5, 7, 11]'
  combd_h_u: '[[16, 64, 256, 1024, 1024, 1024], [16, 64, 256, 1024, 1024, 1024], [16,
    64, 256, 1024, 1024, 1024]]'
  combd_d_k: '[[7, 11, 11, 11, 11, 5], [11, 21, 21, 21, 21, 5], [15, 41, 41, 41, 41,
    5]]'
  combd_d_s: '[[1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1]]'
  combd_d_d: '[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]'
  combd_d_g: '[[1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256,
    1]]'
  combd_d_p: '[[3, 5, 5, 5, 5, 2], [5, 10, 10, 10, 10, 2], [7, 20, 20, 20, 20, 2]]'
  combd_op_f: '[1, 1, 1]'
  combd_op_k: '[3, 3, 3]'
  combd_op_g: '[1, 1, 1]'
  sbd_filters: '[[64, 128, 256, 256, 256],[64, 128, 256, 256, 256],[64, 128, 256,
    256, 256],[32, 64, 128, 128, 128]]'
  sbd_strides: '[[1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1]]'
  sbd_kernel_sizes: '[        [[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7]],        [[5,
    5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5]],        [[3, 3, 3],[3, 3, 3],[3,
    3, 3],[3, 3, 3],[3, 3, 3]],        [[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5,
    5, 5]]    ]'
  sbd_dilations: '[        [[5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7,
    11]],        [[3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7]],        [[1,
    2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],        [[1, 2, 3], [1, 2,
    3], [1, 2, 3], [2, 3, 5], [2, 3, 5]]    ]'
  sbd_band_ranges: '[[0, 6], [0, 11], [0, 16], [0, 64]]'
  sbd_transpose: '[False, False, False, True]'
  model_pqmf_config: '{        ''sbd'': [16, 256, 0.03, 10.0],        ''fsbd'': [64,
    256, 0.1, 9.0]    }'
  segment_size: 32000
  pqmf_config: '{        ''lv1'': [4, 192, 0.25, 10.0],        ''lv2'': [16, 256,
    0.03, 10.0]    }'
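
A quick sanity check on the configuration above, as a minimal sketch: in a HiFi-GAN-style generator the product of the upsampling rates is normally expected to equal the hop size, and the training segment should contain a whole number of mel frames. The values are copied from the post (`upsample_rates`, `segment_size`, hop size 400); the helper names `total_upsampling` and `frames_per_segment` are hypothetical and not part of the Avocodo code.

```python
import ast
from math import prod


def total_upsampling(upsample_rates):
    """Multiply every rate in a nested list such as [[5], [5], [4], [4]]."""
    return prod(rate for stage in upsample_rates for rate in stage)


def frames_per_segment(segment_size, hop_size):
    """Number of mel frames covered by one training segment of audio samples."""
    if segment_size % hop_size != 0:
        raise ValueError("segment_size is not a multiple of hop_size")
    return segment_size // hop_size


# The config stores lists as quoted strings, so parse them first.
upsample_rates = ast.literal_eval("[[5], [5], [4], [4]]")
hop_size = 400        # stated in the post
segment_size = 32000  # from the config

factor = total_upsampling(upsample_rates)
frames = frames_per_segment(segment_size, hop_size)
print(f"total upsampling: {factor}, frames per segment: {frames}")
```

With these values the upsampling factor matches the stated hop size, so the rising feature matching loss is unlikely to come from a length mismatch between the mel input and the generated waveform.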
