This repo provides the network code and the processed samples of the manuscript "Glance and Gaze: A Collaborative Learning Framework for Single-channel Speech Enhancement", which was accepted by Elsevier Applied Acoustics.
For InstanceNorm2d, the input shape is [batch, channel, num_frames, freq_feature_size], and the mean and variance are computed over [num_frames, freq_feature_size] for each (batch, channel) pair. That is, they cover all frames, including future ones. So InstanceNorm2d appears to be non-causal.
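To make the causality point concrete, here is a minimal pure-Python sketch (no PyTorch) that mimics the statistics InstanceNorm2d computes on a single [num_frames][freq] map. It assumes PyTorch's defaults: eps=1e-5, biased variance, and no affine parameters. For contrast, it also includes a cumulative (frame-wise causal) normalization in which frame t only sees frames 0..t. That variant is a swapped-in illustration, not something this repo is confirmed to use, and both helper names are hypothetical.

```python
import math


def instance_norm(x, eps=1e-5):
    """Hypothetical helper: normalize a [num_frames][freq] map with ONE mean and
    (biased) variance taken over all frames, like InstanceNorm2d per (batch, channel)."""
    vals = [v for frame in x for v in frame]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / math.sqrt(var + eps) for v in frame] for frame in x]


def cumulative_norm(x, eps=1e-5):
    """Hypothetical causal alternative: frame t is normalized with statistics
    accumulated over frames 0..t only, so future frames cannot affect it."""
    out = []
    total, total_sq, count = 0.0, 0.0, 0
    for frame in x:
        total += sum(frame)
        total_sq += sum(v * v for v in frame)
        count += len(frame)
        mean = total / count
        # Clamp at 0 to guard against tiny negative values from rounding.
        var = max(total_sq / count - mean * mean, 0.0)
        out.append([(v - mean) / math.sqrt(var + eps) for v in frame])
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    x_future = [[1.0, 2.0], [30.0, 40.0]]  # only the second (future) frame differs
    print(instance_norm(x)[0], instance_norm(x_future)[0])      # first frame changes
    print(cumulative_norm(x)[0], cumulative_norm(x_future)[0])  # first frame identical
```

Changing only the second frame alters the normalized first frame under instance_norm but leaves it untouched under cumulative_norm, which is the non-causal behavior described above.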
Sorry to bother you, and thank you for open-sourcing the model.
I tried to train the model and ran into a small problem.
After training, the model returns a list of length three (one output per GGM; the number of GGMs defaults to 3).
If I want to run istft to synthesize the enhanced waveform, which element of the inferred list should I use?
Thanks!