seorim0 / dnn-based-speech-enhancement-in-the-frequency-domain

DNN-based speech enhancement (SE) in the frequency domain using PyTorch. You can test several state-of-the-art networks using the T-F masking or spectral mapping method.

License: MIT License

Python 62.12% Jupyter Notebook 29.09% MATLAB 8.78%
deep-learning speech-enhancement pytorch dccrn crn fullsubnet

dnn-based-speech-enhancement-in-the-frequency-domain's Introduction

DNN-based Speech Enhancement in the frequency domain

This repository lets you perform DNN-based speech enhancement (SE) in the frequency domain using various methods.
First, you create noisy data by mixing clean speech with noise; this dataset is used to train the network.
You can then adjust the network type and configuration in various ways, as listed below.
The network's results can be evaluated with several objective metrics (PESQ, STOI, CSIG, CBAK, COVL).

You can change
  1. Networks
  2. Learning methods
  3. Loss functions

Requirements

This repository was tested on Ubuntu 20.04 with:

  • Python 3.7
  • CUDA 11.1
  • cuDNN 8.0.5
  • PyTorch 1.9.0

Getting Started

  1. Install the necessary libraries
  2. Make a dataset for training and validation
    # The shape of the dataset
    [data_num, 2 (inputs and targets), sampling_frequency * data_length]

    # For example, 1,000 three-second clips sampled at 16 kHz give the shape
    [1000, 2, 48000]
  3. Set dataloader.py
    self.input_path = "DATASET_FILE_PATH"
  4. Set config.py
    # If you need to adjust any settings, simply change this file.   
    # When you run this project for the first time, you need to set the path where the model and logs will be saved. 
  5. Run train_interface.py
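
The dataset shape from step 2 can be produced with a short script like the one below. It is a sketch under stated assumptions: the helper names `fit_length` and `build_dataset`, the ordering (index 0 for the noisy input, index 1 for the clean target), and the output file name are hypothetical, and the signals are synthetic stand-ins for real recordings.

```python
import numpy as np

FS = 16000                   # sampling frequency in Hz
SECONDS = 3                  # length of each example
NUM_SAMPLES = FS * SECONDS   # 48000 samples per example

def fit_length(x, n):
    """Crop or zero-pad a 1-D signal to exactly n samples."""
    x = np.asarray(x, dtype=np.float32)[:n]
    return np.pad(x, (0, n - len(x)))

def build_dataset(pairs, n=NUM_SAMPLES):
    """Stack (noisy, clean) pairs into an array of shape [data_num, 2, n]."""
    data = np.zeros((len(pairs), 2, n), dtype=np.float32)
    for i, (noisy, clean) in enumerate(pairs):
        data[i, 0] = fit_length(noisy, n)  # assumed: index 0 holds the input
        data[i, 1] = fit_length(clean, n)  # assumed: index 1 holds the target
    return data

# Synthetic stand-ins for real recordings (hypothetical data).
rng = np.random.default_rng(0)
pairs = []
for _ in range(4):
    clean = rng.standard_normal(NUM_SAMPLES).astype(np.float32)
    noisy = clean + 0.1 * rng.standard_normal(NUM_SAMPLES).astype(np.float32)
    pairs.append((noisy, clean))

dataset = build_dataset(pairs)
print(dataset.shape)  # (4, 2, 48000)
# np.save("train_dataset.npy", dataset)  # hypothetical file name to use as DATASET_FILE_PATH
```

Check the index ordering and file format against dataloader.py before training, since the layout above is only inferred from the shape description.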

Tutorials

'SE_tutorials.ipynb' is provided as a tutorial.
With this Colab notebook you can train the CRN without any preparation.

Networks

The networks you can select and adjust in config.py are:

  • Real network
    • convolutional recurrent network (CRN)
      a real-valued version of DCCRN
    • FullSubNet [1]
  • Complex network
    • deep complex convolutional recurrent network (DCCRN) [2]
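
To illustrate the difference between the real and complex networks listed above: a complex-valued convolution can be built from four real-valued convolutions, which is the general idea behind complex layers such as those in DCCRN. The sketch below uses 1-D NumPy convolutions only to show the arithmetic; the function name `complex_conv1d` is hypothetical and this is not the repository's layer code.

```python
import numpy as np

def complex_conv1d(x_real, x_imag, w_real, w_imag):
    """Complex convolution expressed with four real convolutions.

    (x_r + j*x_i) * (w_r + j*w_i)
        = (x_r*w_r - x_i*w_i) + j*(x_r*w_i + x_i*w_r)
    """
    out_real = (np.convolve(x_real, w_real, mode="valid")
                - np.convolve(x_imag, w_imag, mode="valid"))
    out_imag = (np.convolve(x_real, w_imag, mode="valid")
                + np.convolve(x_imag, w_real, mode="valid"))
    return out_real, out_imag

# Synthetic input and kernel (hypothetical data).
rng = np.random.default_rng(0)
x_real, x_imag = rng.standard_normal(32), rng.standard_normal(32)
w_real, w_imag = rng.standard_normal(5), rng.standard_normal(5)
out_real, out_imag = complex_conv1d(x_real, x_imag, w_real, w_imag)
```

A real-valued network such as the CRN would instead apply ordinary real convolutions to its input features, without the cross terms between real and imaginary parts.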

Learning Methods

  • T-F masking
  • Spectral mapping
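
The two learning methods differ in how the network output is interpreted. The sketch below shows that difference on magnitude spectrograms; the function names, the sigmoid mask activation, and the random arrays standing in for a network's output are assumptions for illustration, not the repository's code.

```python
import numpy as np

def masking_estimate(noisy_mag, net_output):
    """T-F masking: the network output is turned into a mask applied to the noisy spectrum."""
    mask = 1.0 / (1.0 + np.exp(-net_output))  # sigmoid keeps the mask in (0, 1); an assumed choice
    return mask * noisy_mag

def mapping_estimate(noisy_mag, net_output):
    """Spectral mapping: the network output is taken as the enhanced spectrum itself."""
    return np.maximum(net_output, 0.0)  # magnitudes are non-negative

# Synthetic data with shape [freq_bins, frames] (hypothetical).
rng = np.random.default_rng(0)
noisy_mag = np.abs(rng.standard_normal((257, 100)))
net_output = rng.standard_normal((257, 100))  # stand-in for a network's raw output

masked = masking_estimate(noisy_mag, net_output)
mapped = mapping_estimate(noisy_mag, net_output)
```

With masking, the estimate is always tied to the noisy input; with mapping, the network must generate the clean spectrum directly.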

Loss Functions

  • MSE
  • SDR
  • SI-SNR
  • SI-SDR

You can also combine these loss functions with a perceptual loss:

  • LMS
  • PMSQE
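
As an example of one of the losses listed above, SI-SNR is commonly computed by projecting the estimate onto the target and comparing the energy of the projection with the energy of the residual. The NumPy sketch below follows that common definition; the function names are hypothetical and the repository's own implementation may differ in details such as batching or the value of `eps`.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for 1-D signals."""
    estimate = estimate - np.mean(estimate)
    target = target - np.mean(target)
    # Project the estimate onto the target to get the scaled target component.
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10 * np.log10(ratio)

def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)

# Synthetic signals (hypothetical data).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
estimate = target + 0.1 * rng.standard_normal(16000)
loss = si_snr_loss(estimate, target)
```

Because of the projection step, rescaling the estimate leaves the value essentially unchanged, which is what distinguishes SI-SNR from a plain SNR or MSE loss.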

Tensorboard

You can monitor training in real time through 'write_on_tensorboard.py', which logs the following:

  • loss
  • pesq, stoi
  • spectrogram

Reference

[1] FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
    Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
    [arXiv] [code]
[2] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
    Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie
    [arXiv] [code]

Other tools
  • https://github.com/usimarit/semetrics
  • https://ecs.utdallas.edu/loizou/speech/software.htm

dnn-based-speech-enhancement-in-the-frequency-domain's People

Contributors

seorim0


dnn-based-speech-enhancement-in-the-frequency-domain's Issues

Data process time

Hello. I am curious how long your model takes to process data. My model (DCCRN, LSTM: real, rnn_layers = 2, rnn_units = 256) takes about 1 second to process its input, whether that is 1 frame or 30 seconds of audio. I think 1 second is rather long, especially for real-time processing. Is there any way to optimize the model to cut down the processing time? Thank you.

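How to convert wav files to numpy?

Hello, I see that your dataloader loads np files directly with torch, but the data generation part produces wav files directly. Could you share how you convert the wav files to npy files? Thanks.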

pretrained model

Thanks for your template. Will you release a pretrained checkpoint of FullSubNet? It seems that the official pretrained model does not fit your model.

how to prepare the dataset?

How do I make a dataset for training and validation?
The instructions give the shape of the dataset, but which tool or command do I need to run to prepare it?

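CRN training problem

Hello, have you trained the CRN? When I train the CRN, I find that it cannot converge unless the data is normalized. Have you run into this? Also, is there anything to watch out for when preparing CRN training data, such as the length of each wav file and normalization?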

nested Unet

Hi, I saw your new nested U-Net project this afternoon, but when I went to look at the details, it seemed the project had been deleted. Its results were much better than DCCRN-E and FullSubNet. I hope to see it again...

datasets

Hello, I don't understand how to prepare the dataset. Do you have any code for processing it?

Colab Notebook With All Models

This is not an issue but more like an attempt to share something that may be helpful.

While I was working on a project, I created a notebook to experiment with different models using the DNS challenge dataset. I thought it would be helpful to share the notebook in case anyone else is interested in trying out the models on Colab. Feel free to give it a go! Notebook

error when running SE_tutorials.ipynb (maybe dataloading problem...)

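Hello :) You seem to be Korean, so I am leaving my question in Korean!
I am an undergraduate running deep learning for the first time, so you may find this a very basic question, but I would really appreciate an answer :)

I ran the SE_tutorials.ipynb you uploaded as-is on Colab, and the following error appeared in the Validation part of Train in the last cell, Train_interface: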

    2022-6-28 9:13:53

    total params : 1703436 (1.70 M, 6.81 MBytes)

    Load the data...
    Load the data...
    Starting new training run...
    300/300: [================================================>.] - ETA 4.9s
    10/50: [>.................................................] - ETA 0.0s
    ---------------------------------------------------------------------------
    NoUtterancesError Traceback (most recent call last)
    [](https://localhost:8080/#) in ()
        114
        115 # Validation
    --> 116 vali_loss, vali_pesq, vali_stoi = model_validate(model, validation_loader, dir_to_save, epoch, DEVICE)
        117 model_validate(model, validation_loader, dir_to_save, epoch, DEVICE)
        118

    5 frames
    /usr/local/lib/python3.7/dist-packages/pesq/cypesq.pyx in cypesq.cypesq()

    NoUtterancesError: b'No utterances detected'

I suspect the dataset was not loaded, so the data was not recognized. I looked through all of the code, and there does not seem to be any place where the data path is set..!

So I uncommented the data path part in the Dataloader cell and set the path. Since I used glob while setting it, I added import glob, os to the Requirement cell. Even so, an error still occurs in the last cell, Train_interface, because of data loading. Could you check what the problem is?? The modified code is here: https://colab.research.google.com/drive/1-Hw6kxeGe_hPkxpgBZ8r8Z-ZWTqz52RL?usp=sharing

Thank you for reading my question! Have a nice day! :)

About octave

Hello, dear author.

I would like to use the MOS tool in tools_for_estimate.py, but I am told that I need the Octave component. I spent a long time trying to install it, but it did not work. Could you tell me how to install it? Or is there a Dockerfile available?
