huyanxin / phasen

An unofficial PyTorch implementation of Microsoft's PHASEN

Python 93.63% Shell 6.37%
speech-enhancement pytorch-implementation pytorch

phasen's Introduction

PHASEN


Unofficial PyTorch implementation of MSRA's PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network.


My results on a real-world test

[Comparison omitted: noisy vs. enhanced (enh)]

This implementation may differ from the paper in some details, but it works reasonably well.


How to use it?

  1. Install the dependencies:
pip install -r requirements.txt
  2. Download the datasets.

If you don't have WSJ0, you can use aishell-1 instead by following se-cldnn-torch.

Attention

Unlike se-cldnn-torch, the training lists in this repo (tr.lst, cv.lst, ...) need duration information, which se-cldnn-torch does not require (because the two dataset.py files are different).

So, in this repo, the training and cross-validation lists need to look like this:

/path/noisy1.wav /path/ref1.wav 3.0233
/path/noisy2.wav /path/ref2.wav 2.3213
/path/noisy3.wav /path/ref3.wav 8.8127
...

To add duration information, you can use tools/add_duration.py like:

python tools/add_duration.py data/tr_wsj0.lst

For the inference stage (decoding or evaluation), the list only needs the paths of the noisy files:

/path/noisy1.wav
/path/noisy2.wav
/path/noisy3.wav
...
  3. Run. Before running, set the correct parameters in ./run_phasen.sh:
bash run_phasen.sh
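
The duration-annotated list format from step 2 can be produced in a few lines of Python. The following is a minimal sketch, not the repo's actual tools/add_duration.py: the function names `wav_duration_seconds` and `add_duration` are hypothetical, and it assumes uncompressed PCM WAV files that the standard-library `wave` module can read.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def add_duration(list_path, out_path):
    """Copy a 'noisy ref' list to out_path, appending the noisy file's duration."""
    with open(list_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            noisy, ref = fields[0], fields[1]
            duration = wav_duration_seconds(noisy)
            dst.write(f"{noisy} {ref} {duration:.4f}\n")
```

Writing to a separate output file, rather than editing the list in place, keeps the original list intact if a file turns out to be unreadable halfway through.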

Reference:

funcwj's voice-filter

wangkenpu's Conv-Tasnet

pseeth's torch-stft

phasen's People

Contributors

huyanxin

phasen's Issues

How to train with a CPU

I want to run this script, but my computer does not have a GPU. I tried to use the CPU to train, but it failed. How can it be made compatible with the CPU?

I got this error:
outputs, wav = data_parallel(model, (inputs, ))
File "torch/nn/parallel/data_parallel.py", line 190, in data_parallel
output_device = device_ids[0]
IndexError: list index out of range
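
The traceback suggests that data_parallel found no CUDA devices, so its device list was empty when it read device_ids[0]. Below is a minimal sketch of one possible workaround, assuming the model is a regular torch.nn.Module and the inputs are a tuple of tensors. `forward_any_device` is a hypothetical helper, not part of this repo.

```python
import torch
from torch import nn
from torch.nn.parallel import data_parallel


def forward_any_device(model, inputs):
    """Run model on all visible GPUs, or directly on the CPU if there are none.

    `inputs` is a tuple of tensors, as passed to data_parallel in the traceback.
    """
    if torch.cuda.is_available():
        device_ids = list(range(torch.cuda.device_count()))
        # data_parallel expects the module and inputs on the first device.
        model = model.cuda(device_ids[0])
        inputs = tuple(x.cuda(device_ids[0]) for x in inputs)
        return data_parallel(model, inputs, device_ids=device_ids)
    # No GPU: a plain forward call avoids the empty device list.
    return model(*inputs)
```

Training on a CPU this way is likely to be slow for a model of this size, so it may only be practical for debugging.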

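Mixloss produces NaN

Hi, I'm using Mixloss, and the loss becomes NaN as soon as training starts.
1. I have already set the LR very low (0.00001);
2. There is no division by zero. What else could be causing this?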

Loss does not decrease

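Hello, and thank you for the reimplementation. When I train the model on my own data, however, the loss does not decrease. How should I go about finding the cause?
My data consists of clean recordings in both Chinese and English, with MUSAN noise added to create the training data. I use mixloss; the mixloss value stays around 40 and the SI-SNR value stays between 7 and 8, neither decreasing nor improving.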
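
One way to judge whether an SI-SNR of 7 to 8 reflects any enhancement is to compute the same metric for the unprocessed noisy input against the clean reference and compare the two. A minimal NumPy sketch of the usual SI-SNR definition follows. The function name `si_snr` and the `eps` value are assumptions, and the repo's own loss may differ in sign or scaling.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
```

If the value for the noisy input is already close to 7 or 8, the model output is barely changing the signal, which points toward the data pipeline or the loss rather than the learning rate.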

How to preprocess the data?

I am trying to reproduce PHASEN, but I have a problem with data preprocessing. What should I do when the audio signal is shorter than 4 seconds?
I found this in your code:
wave_inputs = np.concatenate([wave_inputs, wave_inputs[:segement_length-wave_inputs.shape[0]]])
wave_s1 = np.concatenate([wave_s1, wave_s1[:segement_length-wave_s1.shape[0]]])
What confuses me is that when segement_length-wave_inputs.shape[0] > wave_inputs.shape[0], the code won't work.
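
When the clip is shorter than half the segment length, the slice wave_inputs[:segement_length-wave_inputs.shape[0]] can return at most the whole clip, so the concatenated result is still too short. A possible alternative, sketched here with a hypothetical helper `repeat_to_length`, is to repeat the clip cyclically until it fills the segment; np.resize does this for 1-D arrays. This illustrates the idea and is not the repo's code.

```python
import numpy as np


def repeat_to_length(wave, segment_length):
    """Repeat a 1-D signal cyclically (or truncate it) to exactly segment_length samples."""
    wave = np.asarray(wave)
    if wave.shape[0] == 0:
        raise ValueError("cannot pad an empty signal")
    if wave.shape[0] >= segment_length:
        return wave[:segment_length]
    # np.resize fills the larger array with repeated copies of the input.
    return np.resize(wave, segment_length)


short = np.array([1.0, 2.0, 3.0])
padded = repeat_to_length(short, 8)
print(padded)  # [1. 2. 3. 1. 2. 3. 1. 2.]
```

Applying the same call to the noisy signal and the clean reference keeps the two aligned, since both start with the same length.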

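Clicking sounds or dropouts at audio segment boundaries

Hello, after splitting the audio into 4-second segments and enhancing each one, there are clicking sounds or dropouts (muted audio) where the segments are joined. Changing 4 s to 1 s makes it even worse. How can this be removed? Is it caused by the audio being discontinuous?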
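
Hard cuts between independently enhanced segments are a common source of clicks, because neighbouring segments need not agree at the boundary. One widely used mitigation is to process overlapping segments and cross-fade them. The sketch below assumes a hypothetical `enhance_fn` that maps a 1-D NumPy array to an enhanced array of the same length, and that hop is smaller than seg_len. It is an illustration, not this repo's decoding code.

```python
import numpy as np


def enhance_with_overlap(wave, enhance_fn, seg_len, hop):
    """Enhance overlapping segments and blend them with a weighted overlap-add."""
    wave = np.asarray(wave, dtype=np.float64)
    # Drop the zero endpoints of the Hann window so every sample has nonzero weight.
    window = np.hanning(seg_len + 2)[1:-1]
    out = np.zeros(len(wave))
    norm = np.zeros(len(wave))
    start = 0
    while True:
        end = min(start + seg_len, len(wave))
        enhanced = enhance_fn(wave[start:end])
        w = window[: end - start]
        out[start:end] += enhanced * w
        norm[start:end] += w
        if end == len(wave):
            break
        start += hop
    # Divide by the accumulated weights so overlapping regions are averaged.
    return out / np.maximum(norm, 1e-8)
```

With hop set to half of seg_len, each sample is covered by about two segments, and the window weights fade one segment out as the next fades in.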

Fix an error

I got an error (torch 1.10.0) and fixed it by changing:
phase_conv2(Conv1d) ------> phase_conv2(Conv2d)

Fix NaN loss

I got NaN when using Mix loss for training (not a speech denoising task), and fixed it by adding gradient clipping as follows:

loss.backward()
nn.utils.clip_grad_norm_(self.estimator.parameters(), 10.0) # add this to clip grad
self.optimizer.step()

time_dataset.py error

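Line 100: duration=item['duration'] raises a KeyError.
I checked target_list and it has no 'duration' entry, so the data processing step is probably at fault. However, the error persists even with the lst files provided in the code. Which step went wrong?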
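
Given the list format described in the README, one plausible cause is a list line without the third (duration) column. The following is a small validation sketch, assuming whitespace-separated fields. The function name and the 'inputs' and 'labels' keys are hypothetical; only 'duration' comes from the error message.

```python
def parse_list_line(line, lineno=0):
    """Parse 'noisy_path ref_path duration' and fail early with a clear message."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(
            f"line {lineno}: expected 'noisy ref duration', got {len(fields)} field(s)"
        )
    noisy, ref, duration = fields
    return {"inputs": noisy, "labels": ref, "duration": float(duration)}


item = parse_list_line("/path/noisy1.wav /path/ref1.wav 3.0233", lineno=1)
print(item["duration"])  # 3.0233
```

Checking the field count while reading the list reports the offending line directly, instead of a KeyError later in the dataset code.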

How to implement conv_stft in TensorFlow?

Hi, I am implementing conv_stft in TensorFlow like this:

    import numpy as np
    import tensorflow as tf
    from scipy.signal import get_window

    def init_kernels(win_len, win_inc, fft_len, win_type=None, invers=False):
        if win_type == 'None' or win_type is None:
            window = np.ones(win_len)
        else:
            window = get_window(win_type, win_len, fftbins=True)**0.5

        N = fft_len
        fourier_basis = np.fft.rfft(np.eye(N))[:win_len]
        real_kernel = np.real(fourier_basis)
        imag_kernel = np.imag(fourier_basis)
        kernel = np.concatenate([real_kernel, imag_kernel], 1).T

        if invers:
            kernel = np.linalg.pinv(kernel).T

        kernel = kernel * window
        # Shape: [fft_len + 2, 1, win_len], the PyTorch conv1d weight layout.
        kernel = kernel[:, None, :]
        return tf.convert_to_tensor(kernel, tf.float32)

    class ConvSTFT(tf.keras.layers.Layer):

        # 'hann' is used instead of 'hanning', which newer SciPy versions may not accept.
        def __init__(self, win_len=400, win_inc=200, fft_len=512, win_type='hann', feature_type='real', fix=True):
            super(ConvSTFT, self).__init__()

            self.fft_len = fft_len

            kernel = init_kernels(win_len, win_inc, self.fft_len, win_type)
            # tf.nn.conv1d expects filters as [width, in_channels, out_channels],
            # so transpose from [out_channels, 1, width].
            kernel = tf.transpose(kernel, [2, 1, 0])
            self.weight = tf.Variable(kernel, trainable=not fix)
            self.feature_type = feature_type
            self.stride = win_inc
            self.win_len = win_len
            self.dim = self.fft_len

        def call(self, inputs):
            # inputs: [batch, time, 1] (channels last), not PyTorch's [batch, 1, time].
            # torch.nn.functional.conv1d cannot be applied to TensorFlow tensors,
            # so tf.nn.conv1d is used here.
            outputs = tf.nn.conv1d(inputs, self.weight, stride=self.stride, padding='VALID')

            # outputs: [batch, frames, fft_len + 2]; real and imaginary parts are
            # stacked along the last axis.
            dim = self.dim // 2 + 1
            real = outputs[:, :, :dim]
            imag = outputs[:, :, dim:]
            return [real, imag]

Is this right?

Loss fitting

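I'd like to ask what loss value indicates a good fit for this model. I'm using aishell data with an SNR range of -5 to 20, and the phase loss is currently rather large.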
