I think the upsampling network can be replaced by a single `torch.nn.Upsample` operation.
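For an integer scale factor, `torch.nn.Upsample(scale_factor=s, mode="nearest")` on a `(batch, channels, frames)` tensor simply repeats each conditioning frame `s` times along time. The sketch below is a NumPy stand-in for that behaviour, so it runs without PyTorch; the `(channels, frames)` layout, the 80 mel channels and the hop length of 275 samples are illustrative assumptions, and `upsample_nearest` is a hypothetical helper name.

```python
import numpy as np

def upsample_nearest(cond: np.ndarray, scale: int) -> np.ndarray:
    """Repeat each frame `scale` times along the last (time) axis.

    Mirrors torch.nn.Upsample(scale_factor=scale, mode="nearest") for an
    integer scale factor. Note it has no trainable parameters at all.
    """
    return np.repeat(cond, scale, axis=-1)

# Hypothetical example: 80 mel channels, 10 frames, hop length 275.
mels = np.random.randn(80, 10)
upsampled = upsample_nearest(mels, 275)
print(upsampled.shape)  # (80, 2750)
```

Nearest-neighbour repetition alone leaves step edges at frame boundaries; the learned layers in the thread effectively smooth those steps.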


Upsampling network may be simplified (wavernn issue, closed)

fatchord commented on July 17, 2024

Upsampling network may be simplified


Comments (7)

geneing commented on July 17, 2024

@fatchord I find it almost like magic that the upsampling network finds a reasonable interpolation method all by itself. Convolution with a Gaussian kernel is a reasonable interpolation method.

I trained a network with the simplified upsampling and it produces very similar results; it just has fewer trainable parameters.
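A fixed, parameter-free version of "convolution with a Gaussian kernel" can be sketched as a nearest-neighbour stretch followed by smoothing with a sampled Gaussian. This is an illustration of the idea, not geneing's actual code: the helper names, the kernel width of ±2 frames and `sigma_frames=0.5` are all assumptions.

```python
import numpy as np

def gaussian_kernel(scale: int, sigma_frames: float = 0.5) -> np.ndarray:
    """Sampled Gaussian at the output (sample) rate, spanning +/- 2 frames."""
    half = 2 * scale
    t = np.arange(-half, half + 1) / scale  # tap positions in units of frames
    k = np.exp(-0.5 * (t / sigma_frames) ** 2)
    return k / k.sum()  # unit DC gain: a constant input stays constant

def upsample_gaussian(cond: np.ndarray, scale: int, sigma_frames: float = 0.5) -> np.ndarray:
    """Nearest-neighbour stretch by `scale`, then smooth each channel with a fixed Gaussian."""
    stretched = np.repeat(cond, scale, axis=-1)
    k = gaussian_kernel(scale, sigma_frames)
    # 'same' keeps the output length; the edges see zero padding and sag toward 0
    return np.stack([np.convolve(row, k, mode="same") for row in stretched])

cond = np.ones((2, 10))  # 2 channels, 10 frames of constant features
y = upsample_gaussian(cond, scale=4)
print(y.shape)  # (2, 40)
```

The only free choice here is `sigma_frames`; everything the learned network would otherwise fit is fixed in advance, which is where the saving in trainable parameters comes from.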


fatchord commented on July 17, 2024

@geneing I probably would've agreed had someone not mentioned to me that my upsampling "is basically a Gaussian convolution with a time-shift". So I had a look at the kernel weights after training, and this is what I found:

[Image: upsampling kernel weights after training]

That first kernel is going to be more important overall, and I reckon it does indeed look something like a Gaussian, but shifted to the right a bit. I checked another model and found more or less the same thing.

What do you make of it?
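One way to put a number on "shifted to the right a bit" is to compare a kernel's peak and centroid with its centre tap. The trained weights from the plot are not reproduced in this thread, so the kernel below is synthetic: a Gaussian deliberately placed 3 taps right of centre. The helper names are hypothetical.

```python
import numpy as np

def peak_offset(kernel: np.ndarray) -> int:
    """Offset (in taps) of the kernel's peak from its centre tap.
    Positive means the peak sits to the right of centre."""
    centre = (len(kernel) - 1) // 2
    return int(np.argmax(kernel)) - centre

def centroid_offset(kernel: np.ndarray) -> float:
    """Offset of the centre of mass of the (non-negative part of the) kernel."""
    centre = (len(kernel) - 1) / 2
    w = np.clip(kernel, 0, None)  # ignore negative lobes
    return float((np.arange(len(kernel)) * w).sum() / w.sum() - centre)

taps = np.arange(21)                                  # centre tap is 10
fake = np.exp(-0.5 * ((taps - 13) / 2.0) ** 2)        # synthetic kernel peaking at tap 13
print(peak_offset(fake))  # 3
```

The centroid is the more robust measure for smooth or noisy kernels, where the single largest tap can jump around.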


bliep commented on July 17, 2024

@fatchord Here are the kernel responses and the total impulse response of the upsampling layers (my kernels are a bit smoother since my model differs a bit from yours):

[Image: per-layer kernel responses and total impulse response of the upsampling layers]
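The total impulse response of a stack of upsampling layers can be computed by feeding a single-frame impulse through every layer in turn. The sketch assumes each layer is a nearest-neighbour stretch followed by a 1D convolution; the scales `(5, 5, 11)` (product 275) and the box kernels of width `2 * scale + 1` are illustrative assumptions, not trained weights from either model.

```python
import numpy as np

def total_impulse_response(layers):
    """Response of a stack of (stretch, convolve) upsampling layers
    to a unit impulse at a single conditioning frame.

    Each layer is (scale, kernel): repeat every sample `scale` times
    (nearest-neighbour stretch), then convolve with `kernel`.
    """
    x = np.array([1.0])
    for scale, kernel in layers:
        x = np.repeat(x, scale)
        x = np.convolve(x, kernel)  # 'full' mode keeps the whole response
    return x

# Hypothetical stack: scales 5, 5, 11 with box (moving-average) kernels.
layers = [(s, np.ones(2 * s + 1) / (2 * s + 1)) for s in (5, 5, 11)]
h = total_impulse_response(layers)
print(len(h))  # 957
```

Every layer in this sketch is symmetric, so `h` is symmetric too; any time shift like the one in the trained kernels would have to come from the learned weights rather than from this kind of initialization.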


bliep commented on July 17, 2024

Btw, linear interpolation is just convolution with a 'triangular' filter, so linear upsampling might indeed be just as good 👍
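The triangular-filter claim can be checked directly: insert `s - 1` zeros between input samples, convolve with a hat kernel of length `2s - 1`, and compare with plain linear interpolation (`np.interp`). The signal and `s = 4` are arbitrary choices for the check, and `linear_upsample_via_conv` is a hypothetical helper name.

```python
import numpy as np

def linear_upsample_via_conv(x: np.ndarray, s: int) -> np.ndarray:
    # Zero-insertion: place each input sample every s positions.
    z = np.zeros((len(x) - 1) * s + 1)
    z[::s] = x
    # Triangular ("hat") kernel of length 2s-1 with peak 1 at its centre.
    h = 1.0 - np.abs(np.arange(-(s - 1), s)) / s
    # Full convolution, then drop the s-1 extra samples at each end
    # so the output lines up with z.
    return np.convolve(z, h)[s - 1 : s - 1 + len(z)]

x = np.array([0.0, 2.0, -1.0, 3.0])
s = 4
y = linear_upsample_via_conv(x, s)
ref = np.interp(np.arange(len(y)), np.arange(len(x)) * s, x)
print(np.allclose(y, ref))  # True
```

Between two input samples only two taps of the hat overlap non-zero values, with weights `1 - t/s` and `t/s`, which is exactly the linear-interpolation formula.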


hdmjdp commented on July 17, 2024

@geneing So you just use the mode="linear" interpolation method? I think the 1D ResNet needs more computation than upsampling.


fatchord commented on July 17, 2024

@geneing Maybe I'm totally off the mark here, but hear me out... the fact that the Gaussian is shifted forward in time intrigues me.

I mean, what if one tried shifting the conditioning features back in time by the same amount as the offset in the Gaussian? Would the Gaussian end up centred? Could one interpret the offset as saying 'this model is prioritising future conditioning features over current ones'? Why not have two upsampling networks and give each a slice of the current and future conditioning features?

Sorry, that's a lot of questions but I'm just curious what you'd think about that line of reasoning.
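The first question can be checked in a toy setting. PyTorch convolution layers compute a cross-correlation, so a kernel whose peak sits `d` taps right of centre weights inputs `d` samples in the future. The sketch below (hypothetical offset `d = 3`, a 9-tap Gaussian, random input) checks that this gives the same output as shifting the input back in time by `d` and using the centred kernel; it works on the sequence the convolution sees, not on frame-level features.

```python
import numpy as np

def xcorr_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Cross-correlation with zero 'same' padding, as in a PyTorch Conv1d:
    y[t] = sum_k w[k] * x[t + k - c], where c is the centre tap."""
    c = (len(w) - 1) // 2
    xp = np.concatenate([np.zeros(c), x, np.zeros(len(w) - 1 - c)])
    return np.array([np.dot(w, xp[t:t + len(w)]) for t in range(len(x))])

d = 3                                                       # hypothetical offset in taps
g = np.exp(-0.5 * (np.arange(-4, 5) / 1.5) ** 2)            # 9-tap Gaussian
w_centred = np.concatenate([np.zeros(d), g, np.zeros(d)])   # peak at the centre tap
w_shifted = np.concatenate([np.zeros(2 * d), g])            # peak d taps right of centre

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
x_advanced = np.concatenate([x[d:], np.zeros(d)])           # x_advanced[t] = x[t + d]

y_shifted = xcorr_same(x, w_shifted)
y_centred = xcorr_same(x_advanced, w_centred)
c = (len(w_centred) - 1) // 2
print(np.allclose(y_shifted[c:], y_centred[c:]))  # True (away from the start edge)
```

The two outputs differ only in the first few samples, where the advanced input has already dropped the values the shifted kernel still reaches.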


G-Wang commented on July 17, 2024

@geneing, @fatchord Amazon recently published a paper on a universal neural vocoder: https://arxiv.org/abs/1811.06292

Their architecture is quite simple: a unidirectional RNN followed by a dense layer and then a softmax (with 10-bit mu-law), conditioned on outputs from an upsampling network (they use RNNs for upsampling). From reading the paper, it seems the most important thing is a varied dataset (74 speakers, 17 different languages, etc.).
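For a softmax over 10-bit mu-law, each audio sample becomes one of 1024 classes. The thread does not give the paper's exact companding constants, so this sketch assumes the standard mu-law formula with `mu = 2**bits - 1` and audio scaled to `[-1, 1]`; the function names are hypothetical.

```python
import numpy as np

def mulaw_encode(x: np.ndarray, bits: int = 10) -> np.ndarray:
    """Map audio in [-1, 1] to integer classes 0 .. 2**bits - 1 via mu-law companding."""
    mu = 2 ** bits - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # compressed, still in [-1, 1]
    return np.floor((y + 1) / 2 * mu + 0.5).astype(np.int64)  # round to nearest class

def mulaw_decode(q: np.ndarray, bits: int = 10) -> np.ndarray:
    """Invert mulaw_encode (up to quantization error)."""
    mu = 2 ** bits - 1
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

q = mulaw_encode(np.array([-1.0, 0.0, 1.0]))
print(q.tolist())  # [0, 512, 1023]
```

Companding spends more of the 1024 classes on quiet samples, where small absolute errors are most audible.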
