ncsoft / avocodo

Stars: 150, Watchers: 4, Forks: 21, Size: 18 KB

Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)

License: Other

Python 100.00%
gan pytorch vocoder avocodo

avocodo's People

Contributors

daeun0921

avocodo's Issues

The learning rate

Hello,

Thank you for presenting awesome ideas with your work and addressing fundamental issues in previous works.

In the Training Setup section of your paper, the learning rate is given as 2e-3, whereas your implementation uses 2e-4.

2e-4 sounds more reasonable, given the HiFi-GAN baseline. However, I couldn't achieve balanced training with this value; it always ended up with a slight metallic artifact.

I am 1M steps into training with 2e-3 and the results look better, but I still have doubts about it.

Can you explain the discrepancy?

Thank you
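To make the gap between the two values concrete, here is a small sketch. It assumes Avocodo keeps HiFi-GAN's optimizer setup (`torch.optim.AdamW` with `betas=(0.8, 0.99)` and `torch.optim.lr_scheduler.ExponentialLR` with `gamma=0.999`, stepped once per epoch); neither setting is stated in this issue, and `lr_at_epoch` is a hypothetical helper. Under a purely multiplicative schedule, a run started at 2e-3 stays exactly ten times higher than one started at 2e-4 at every epoch, so the decay alone never reconciles the paper and the code.

```python
# Assumption: HiFi-GAN-style per-epoch exponential decay,
# lr(epoch) = lr0 * gamma ** epoch, with gamma = 0.999.
def lr_at_epoch(initial_lr: float, epoch: int, gamma: float = 0.999) -> float:
    """Learning rate after `epoch` ExponentialLR steps (hypothetical helper)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr * gamma ** epoch


if __name__ == "__main__":
    for lr0 in (2e-4, 2e-3):  # value in the code vs. value in the paper
        for epoch in (0, 1000, 3000):
            print(f"lr0={lr0:.0e} epoch={epoch:5d} lr={lr_at_epoch(lr0, epoch):.3e}")
```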

PQMF change for 32 kHz version

Hi,

Thanks for sharing Avocodo. I'd like to use this vocoder at a higher sampling rate, 32 kHz. Can you give me some suggestions on how to change the PQMF part when training a 32 kHz Avocodo?

Thanks,
Bolong Wen
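As background for this question, the sketch below shows how a PQMF analysis filter bank is commonly built from four numbers. It assumes each PQMF config entry (for example `lv1: [4, 192, 0.25, 10.0]` in the hyperparameters posted further down this page) means `[subbands, taps, cutoff_ratio, beta]`, following the Kaiser-windowed, cosine-modulated design used in Parallel WaveGAN's PQMF; Avocodo's own PQMF module should be checked for the exact argument order. `design_prototype_filter` and `pqmf_analysis_filters` are hypothetical names. Because the prototype is designed in frequency normalized to Nyquist, the coefficients do not depend on the sample rate; at 32 kHz each of N subbands simply spans 32000 / (2 * N) Hz.

```python
import numpy as np


def design_prototype_filter(taps: int, cutoff_ratio: float, beta: float) -> np.ndarray:
    """Kaiser-windowed ideal low-pass prototype with taps + 1 coefficients."""
    if taps % 2 != 0:
        raise ValueError("taps must be even so the filter has a center tap")
    if not 0.0 < cutoff_ratio < 1.0:
        raise ValueError("cutoff_ratio is relative to Nyquist and must be in (0, 1)")
    n = np.arange(taps + 1) - taps / 2
    # c * sinc(c * n) == sin(pi * c * n) / (pi * n); np.sinc gives c at n == 0.
    ideal = cutoff_ratio * np.sinc(cutoff_ratio * n)
    return ideal * np.kaiser(taps + 1, beta)


def pqmf_analysis_filters(subbands: int, taps: int, cutoff_ratio: float, beta: float) -> np.ndarray:
    """Cosine-modulate the prototype into analysis filters of shape (subbands, taps + 1)."""
    proto = design_prototype_filter(taps, cutoff_ratio, beta)
    n = np.arange(taps + 1) - taps / 2
    k = np.arange(subbands)[:, None]  # one row per subband
    phase = (-1.0) ** k * np.pi / 4
    return 2 * proto * np.cos((2 * k + 1) * np.pi / (2 * subbands) * n + phase)


if __name__ == "__main__":
    sample_rate = 32000  # target rate from the question
    levels = {"lv1": [4, 192, 0.25, 10.0], "lv2": [16, 256, 0.03, 10.0]}
    for name, (subbands, taps, cutoff, beta) in levels.items():
        filters = pqmf_analysis_filters(subbands, taps, cutoff, beta)
        print(name, filters.shape, "band width (Hz):", sample_rate / (2 * subbands))
```

In that design, analysis is then a convolution of the waveform with each filter followed by downsampling by `subbands`; that step is omitted here.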

Feature matching loss increases

Hello, I'm training an Avocodo model on my own dataset, which combines multiple datasets.

I modified some of the Generator's parameters to change the input and target sample rates: the model generates a 32 kHz waveform from a 24 kHz mel spectrogram. The hop size is 400.

When I train my Avocodo model, the feature matching loss keeps increasing even after the discriminator loss stops decreasing.
Strangely enough, the mel loss keeps decreasing, and the quality of the audio output is pretty good.

Is this normal when training a vocoder? Will the feature matching loss ever stop increasing?
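For context, in HiFi-GAN-derived vocoders the feature matching loss is usually the L1 distance between discriminator activations for real and generated audio, averaged per layer, summed over layers and discriminators, and scaled by 2. The NumPy sketch below assumes Avocodo follows that form (`feature_matching_loss` is a hypothetical name; a PyTorch version would detach the real features). It illustrates one plausible reason the loss can rise while the mel loss and audio quality improve: the loss is measured in the discriminator's own feature space, so if the discriminator's activations grow in magnitude, the same relative mismatch yields a larger value. This is an illustration, not a diagnosis of the reported run.

```python
import numpy as np


def feature_matching_loss(real_feats, fake_feats):
    """2 * sum over discriminators and layers of mean |real - fake|.

    real_feats / fake_feats: one list of feature maps per discriminator.
    """
    loss = 0.0
    for real_layers, fake_layers in zip(real_feats, fake_feats):
        for real_map, fake_map in zip(real_layers, fake_layers):
            loss += float(np.mean(np.abs(real_map - fake_map)))
    return 2.0 * loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = [[rng.normal(size=(8, 100)), rng.normal(size=(16, 25))]]
    fake = [[m + 0.1 * rng.normal(size=m.shape) for m in real[0]]]
    base = feature_matching_loss(real, fake)
    # Same relative mismatch, but activations 3x larger: the loss scales by 3.
    scaled = feature_matching_loss([[3 * m for m in real[0]]], [[3 * m for m in fake[0]]])
    print(f"base={base:.4f} scaled={scaled:.4f} ratio={scaled / base:.2f}")
```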

avocodo training

We'd love to hear about your experiences.

Thank you.

HYPER PARAMETERS
model:
  upsample_rates: '[[5], [5], [4], [4]]'
  upsample_kernel_sizes: '[[11], [11], [8], [8]]'
  upsample_initial_channel: 384
  resblock_kernel_sizes: '[3,7,11]'
  resblock_dilation_sizes: '[[1,3,5], [1,3,5], [1,3,5]]'
  projection_filters: '[0, 1, 1, 1]'
  projection_kernels: '[0, 5, 7, 11]'
  combd_h_u: '[[16, 64, 256, 1024, 1024, 1024], [16, 64, 256, 1024, 1024, 1024], [16, 64, 256, 1024, 1024, 1024]]'
  combd_d_k: '[[7, 11, 11, 11, 11, 5], [11, 21, 21, 21, 21, 5], [15, 41, 41, 41, 41, 5]]'
  combd_d_s: '[[1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1]]'
  combd_d_d: '[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]'
  combd_d_g: '[[1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256, 1]]'
  combd_d_p: '[[3, 5, 5, 5, 5, 2], [5, 10, 10, 10, 10, 2], [7, 20, 20, 20, 20, 2]]'
  combd_op_f: '[1, 1, 1]'
  combd_op_k: '[3, 3, 3]'
  combd_op_g: '[1, 1, 1]'
  sbd_filters: '[[64, 128, 256, 256, 256], [64, 128, 256, 256, 256], [64, 128, 256, 256, 256], [32, 64, 128, 128, 128]]'
  sbd_strides: '[[1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1]]'
  sbd_kernel_sizes: '[[[7, 7, 7], [7, 7, 7], [7, 7, 7], [7, 7, 7], [7, 7, 7]], [[5, 5, 5], [5, 5, 5], [5, 5, 5], [5, 5, 5], [5, 5, 5]], [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], [[5, 5, 5], [5, 5, 5], [5, 5, 5], [5, 5, 5], [5, 5, 5]]]'
  sbd_dilations: '[[[5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7, 11]], [[3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7]], [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3], [1, 2, 3], [2, 3, 5], [2, 3, 5]]]'
  sbd_band_ranges: '[[0, 6], [0, 11], [0, 16], [0, 64]]'
  sbd_transpose: '[False, False, False, True]'
  model_pqmf_config: '{''sbd'': [16, 256, 0.03, 10.0], ''fsbd'': [64, 256, 0.1, 9.0]}'
  segment_size: 32000
  pqmf_config: '{''lv1'': [4, 192, 0.25, 10.0], ''lv2'': [16, 256, 0.03, 10.0]}'
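A quick way to sanity-check hyperparameters like these is to decode the stringified values and verify the arithmetic they imply. The sketch below assumes a 32 kHz output and a mel computed from 24 kHz audio, as described in the feature matching issue above; this issue itself does not state its sample rates. It also assumes the first three `sbd_band_ranges` index the 16-band `sbd` split and the last indexes the 64-band `fsbd` split. `parse` and `check_config` are hypothetical helpers. With these values, the upsampling factors multiply to a hop of 400, so `segment_size: 32000` is 80 frames (one second at 32 kHz), and a 24 kHz mel needs a hop of 300 to keep the same 80 frames per second.

```python
import ast
from math import prod

# Values copied from the HYPER PARAMETERS block; list/dict values are strings there.
CONFIG = {
    "upsample_rates": "[[5], [5], [4], [4]]",
    "segment_size": 32000,
    "sbd_band_ranges": "[[0, 6], [0, 11], [0, 16], [0, 64]]",
    "model_pqmf_config": "{'sbd': [16, 256, 0.03, 10.0], 'fsbd': [64, 256, 0.1, 9.0]}",
    "pqmf_config": "{'lv1': [4, 192, 0.25, 10.0], 'lv2': [16, 256, 0.03, 10.0]}",
}
OUTPUT_SR = 32000  # assumption: 32 kHz target waveform
MEL_SR = 24000     # assumption: mel computed from 24 kHz audio


def parse(key):
    """Decode a stringified list/dict with ast.literal_eval; pass other values through."""
    value = CONFIG[key]
    return ast.literal_eval(value) if isinstance(value, str) else value


def check_config():
    hop = prod(rate for (rate,) in parse("upsample_rates"))  # 5 * 5 * 4 * 4
    segment = parse("segment_size")
    frame_rate = OUTPUT_SR / hop  # mel frames per second
    report = {
        "hop": hop,
        "frames_per_segment": segment // hop,
        "segment_divisible_by_hop": segment % hop == 0,
        # Hop in 24 kHz samples that gives the same frame rate on the input side.
        "mel_hop_for_alignment": MEL_SR / frame_rate,
    }
    levels = {**parse("pqmf_config"), **parse("model_pqmf_config")}
    # Each PQMF split (first entry = number of subbands) should divide the segment evenly.
    report["segment_divisible_by_subbands"] = all(segment % cfg[0] == 0 for cfg in levels.values())
    # Assumption: three ranges over the 16-band 'sbd' split, one over the 64-band 'fsbd' split.
    pqmf = parse("model_pqmf_config")
    bands_per_range = [pqmf["sbd"][0]] * 3 + [pqmf["fsbd"][0]]
    report["band_ranges_hz"] = [
        (lo * OUTPUT_SR / (2 * n), hi * OUTPUT_SR / (2 * n))
        for (lo, hi), n in zip(parse("sbd_band_ranges"), bands_per_range)
    ]
    return report


if __name__ == "__main__":
    for key, value in check_config().items():
        print(f"{key}: {value}")
```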
