yxlu-0102 / mp-senet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

License: MIT License

Python 100.00%
phase-estimation pytorch speech-enhancement mp-senet

mp-senet's Introduction

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

In our paper, we proposed MP-SENet: a TF-domain monaural SE model with parallel magnitude and phase spectra denoising.
We provide our implementation as open source in this repository.

Abstract: This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder aims to encode time-frequency representations from the input noisy magnitude and phase spectra. The decoder is composed of parallel magnitude mask decoder and phase decoder, directly recovering clean magnitude spectra and clean-wrapped phase spectra by incorporating learnable sigmoid activation and parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used to train the MP-SENet model jointly. Experimental results show that our proposed MP-SENet achieves a PESQ of 3.50 on the public VoiceBank+DEMAND dataset and outperforms existing advanced SE methods.

A long version of the MP-SENet paper is now available on arXiv. Audio samples can be found here.

This source code is only for the MP-SENet accepted by Interspeech 2023.

Pre-requisites

  1. Python >= 3.6.
  2. Clone this repository.
  3. Install the Python requirements. Please refer to requirements.txt.
  4. Download and extract the VoiceBank+DEMAND dataset. Resample all wav files to 16kHz, and move the clean and noisy wavs to VoiceBank+DEMAND/wavs_clean and VoiceBank+DEMAND/wavs_noisy, respectively. You can also directly download the downsampled 16kHz dataset here.

Training

CUDA_VISIBLE_DEVICES=0,1 python train.py --config config.json

Checkpoints and a copy of the configuration file are saved in the cp_mpsenet directory by default.
You can change the path by adding the --checkpoint_path option.

Inference

python inference.py --checkpoint_file [generator checkpoint file path]

You can also use the pretrained best checkpoint file we provide in best_ckpt/g_best.
Generated wav files are saved in generated_files by default.
You can change the path by adding the --output_dir option.

Model Structure

model

Comparison with other SE models

comparison

Acknowledgements

We referred to HiFiGAN, NSPP and CMGAN to implement this.

Citation

@inproceedings{lu2023mp,
  title={{MP-SENet}: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra},
  author={Lu, Ye-Xin and Ai, Yang and Ling, Zhen-Hua},
  booktitle={Proc. Interspeech},
  pages={3834--3838},
  year={2023}
}

mp-senet's People

Contributors

yxlu-0102

mp-senet's Issues

Question on Bandwidth extension task formulation

I have a question regarding BWE. My apologies if my question doesn't make sense.

The journal paper states: "For BWE, we use the PRelu activation to predict an unbounded high-frequency magnitude mask".

Question:
The input narrow-band signal has no high frequencies, which means zeros in the high-frequency bins. If we multiply the high-frequency mask predicted by the magnitude decoder with the input magnitude, the result is still zeros in the high-frequency bins. How, then, does this architecture achieve bandwidth extension?
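The concern raised here can be illustrated numerically. The following is a minimal sketch with made-up values (none of them come from the repository or the paper): a purely multiplicative mask cannot create energy in bins that are exactly zero, while bins holding even a tiny nonzero floor can be scaled up by an unbounded mask. Whether the narrow-band inputs in the authors' setup contain such a floor is not stated in this thread.

```python
import numpy as np

# Hypothetical magnitude spectrum of one frame: four low-frequency bins with
# energy and four high-frequency bins that are exactly zero (idealized input).
noisy_mag = np.array([0.8, 0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0])

# Hypothetical unbounded mask, standing in for a PReLU-activated decoder output.
mask = np.array([1.0, 1.0, 1.0, 1.0, 50.0, 40.0, 30.0, 20.0])

# Case 1: exact zeros stay zero, no matter how large the mask is.
masked_zero = noisy_mag * mask

# Case 2: the same input with an assumed floor of 1e-3 in every bin.
floor_mag = np.maximum(noisy_mag, 1e-3)
masked_floor = floor_mag * mask

print("high band, exact zeros:", masked_zero[4:])
print("high band, with floor: ", masked_floor[4:])
```

In the second case the high band becomes nonzero only because the input was not exactly zero there, which is the crux of the question.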

Also, are all three tasks (denoising, dereverberation and BWE) trained independently?

Thanks in advance for your time and patience!

Real-time or causal system

Hello,

Thanks for releasing the code and other details.

Is the MP-SENet model causal? Is it possible to use it for real-time speech enhancement?

Error dividing by zero if noisy_audio is silence

If during training the noisy audio is silence (np.max == 0), the model will have NaN loss.
This is one of the ways it can be fixed in dataset.py:

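import os

import librosa
import numpy as np
...
def __getitem__(self, index):
	filename = self.audio_indexes[index]
	if self._cache_ref_count == 0:
		clean_audio, _ = librosa.load(os.path.join(self.clean_wavs_dir, filename + '.wav'), sr=self.sampling_rate)
		noisy_audio, _ = librosa.load(os.path.join(self.noisy_wavs_dir, filename + '.wav'), sr=self.sampling_rate)

		# Use the peak absolute amplitude so that only an all-zero clip counts as silence.
		if np.max(np.abs(noisy_audio)) == 0.0:
			# Add very low-level Gaussian noise to avoid the division by zero described above;
			# keep the original dtype so later tensor conversion is unchanged.
			noise = np.random.normal(0, 0.00001, len(noisy_audio)).astype(noisy_audio.dtype)
			noisy_audio = noisy_audio + noise
...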

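Is there a good way to reduce the Metric Loss?

Hello, I recently reproduced your experiment. First, the loss easily becomes NaN, which looks like exploding gradients. In addition, my metric loss does not come down and stays at around 0.7 for a long time. What should I do?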

GPU memory

Hello, dear author. Thank you for providing such clear code!

However, I have some questions:

  1. How much GPU memory did you use for this training?
  2. I ran out of GPU memory during training, so I tried to reduce the size of the dataset, but I am not sure where to make that change. Could you point me to the place where this can be changed?

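Very confused about the contents of training.txt for training

Hello, I downloaded the VoiceBank+DEMAND dataset (the 28-speaker version) and downsampled it to 16 kHz. The directory structure is: train contains 11572 clean and noisy files, and test contains 824 clean and noisy files.

I then reorganized the folder structure as described in the project README: I put the 824 test files into wavs_noisy and wavs_clean, and test.txt contains the paths of those 824 files (updated to the paths on my machine).

However, when I try to train on VoiceBank, I am told that there is no training.txt file. Where should I put the 11572 training files, and what should training.txt contain? I am not clear on this. Did I put the files in the wrong folder, or is something else going on? I would like to know how the author set up the folders.

Thanks!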

Complex loss calculated using compressed magnitude

In the long-version paper, the magnitude loss is defined as a distance between the compressed magnitudes (Eq. 2). Meanwhile, the complex loss is defined as a distance between the non-compressed complex representations of the signals (Eq. 8). However, since the dataset class returns a complex signal built from the compressed magnitude, it seems to me that the complex loss actually uses compressed versions of the signals. Can you confirm?
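To make the distinction concrete, here is a minimal sketch of the two candidate inputs to a complex loss: a complex spectrum rebuilt from a power-law compressed magnitude, and one rebuilt from the raw magnitude. The compression exponent 0.3, the helper name to_complex and the sample values are assumptions for illustration, not values confirmed by the repository.

```python
import numpy as np

def to_complex(mag, pha, compress_factor=1.0):
    """Return (real, imag) of mag**compress_factor * exp(j * pha)."""
    mag_c = np.power(mag, compress_factor)
    return mag_c * np.cos(pha), mag_c * np.sin(pha)

# Hypothetical magnitudes and phases for three time-frequency bins.
mag = np.array([0.01, 1.0, 4.0])
pha = np.array([0.0, np.pi / 2, np.pi])

# Compressed variant (assumed exponent 0.3) versus uncompressed variant.
real_c, imag_c = to_complex(mag, pha, compress_factor=0.3)
real_u, imag_u = to_complex(mag, pha, compress_factor=1.0)

# Compression changes the modulus of each bin but leaves its phase untouched.
print("compressed modulus:  ", np.hypot(real_c, imag_c))
print("uncompressed modulus:", np.hypot(real_u, imag_u))
```

A loss computed on the first pair weights quiet and loud bins very differently from a loss computed on the second pair, so the two are not interchangeable.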

About Time Loss And STFT Consistency Loss

Section B of the Methodology in the paper describes an STFT consistency loss.

The current code uses the MAE loss between the synthesized audio and the clean audio, referred to as the Time Loss.

Do they have the same effect? Or is the version currently implemented in the code more effective?

The result I obtained from VoiceBank+DEMAND: 😇
image

a question about inference

Thank you very much for your paper and for releasing the code. I have a question about inference. I used the provided checkpoint with inference.py to generate enhanced speech and then computed the metrics with cal_metrics.py, but the metrics differ significantly from those in the paper. Throughout the process, I only replaced librosa.load with soundfile.read, because librosa.load does not work on my computer. Can you help me analyze the reasons for this result? Thank you again.

Out of memory when running inference on a 60-second file

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 137.36 GiB. GPU 0 has a total capacity of 47.54 GiB of which 44.17 GiB is free. Process 1932274 has 3.36 GiB memory in use. Of the allocated memory 1.89 GiB is allocated by PyTorch, and 23.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Why does it need 137 GiB to run inference on a file of just 60 seconds? Is this model only meant for real-time use?

question about inference

Thank you very much for releasing the code for your work.

I want to confirm: does split=true in dataset.py mean that, with the default segment_size of 32000, two seconds are randomly selected from each audio file for training?
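For reference, random cropping of this kind is commonly written as below. This is a minimal sketch of the general technique, not a copy of the repository's dataset.py; the function name random_segment and the zero-padding of short clips are assumptions. At a 16 kHz sampling rate, a segment_size of 32000 samples corresponds to 32000 / 16000 = 2 seconds.

```python
import numpy as np

def random_segment(audio, segment_size=32000, rng=None):
    """Return a random contiguous crop of segment_size samples.

    Clips shorter than segment_size are zero-padded at the end (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(audio) >= segment_size:
        # Pick a start index so the crop fits entirely inside the clip.
        start = int(rng.integers(0, len(audio) - segment_size + 1))
        return audio[start:start + segment_size]
    return np.pad(audio, (0, segment_size - len(audio)))

# Usage: a hypothetical 5-second clip at 16 kHz yields a 2-second crop.
clip = np.arange(5 * 16000, dtype=np.float32)
crop = random_segment(clip, segment_size=32000, rng=np.random.default_rng(0))
print(len(crop) / 16000, "seconds")
```

Cropping bounds the memory used per training example, which is why training can succeed on a GPU where full-length inference does not.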

I set split=true for both training and validation, and both run successfully. However, when I run inference.py, my GPU does not have enough memory for inference.

Is there any way I can resolve this problem?

Dereverberation

Hi, thank you for the nice work!

I tested the provided checkpoint on samples from your provided examples ( https://yxlu-0102.github.io/MP-SENet/ ). The model works as expected on noisy samples. However, it totally fails on samples with reverb. I assume these tasks were trained separately.

Could you please provide a checkpoint for the dereverberation task?

Sample rate (fs) - No default. Must select either 8000 or 16000.

I am training at 48 kHz, but during validation I get this:

    Run model on reference(ref) and degraded(deg)
    Sample rate (fs) - No default. Must select either 8000 or 16000.
    Note there is narrow band (nb) mode only when sampling rate is 8000Hz.

How can I fix this, or is it impossible?
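Judging only from the error message quoted above, the PESQ implementation accepts sampling rates of 8000 or 16000 Hz. One possible workaround is to keep training at 48 kHz and resample the reference and degraded signals to 16 kHz just before computing the metric. The sketch below only validates the arguments; the helper name check_pesq_args is hypothetical, and the rule that wide-band mode needs 16000 Hz is an interpretation of the message rather than something verified against the metric library here.

```python
def check_pesq_args(fs, mode="wb"):
    """Validate a (sampling rate, mode) pair as the quoted message describes.

    Returns (fs, mode) unchanged when the pair is acceptable.
    """
    if mode not in ("nb", "wb"):
        raise ValueError(f"mode must be 'nb' or 'wb', got {mode!r}")
    if fs not in (8000, 16000):
        # 48 kHz audio lands here: resample to 16 kHz before scoring.
        raise ValueError(f"fs must be 8000 or 16000 Hz, got {fs}; resample first")
    if fs == 8000 and mode != "nb":
        raise ValueError("only narrow-band (nb) mode is available at 8000 Hz")
    return fs, mode

# Usage: 16 kHz wide-band passes, 48 kHz is rejected with a hint to resample.
print(check_pesq_args(16000, "wb"))
try:
    check_pesq_args(48000, "wb")
except ValueError as err:
    print("rejected:", err)
```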

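Inference with the author's provided model does not reach the paper's results; asking for help

Hello, I downloaded your project from GitHub and the VoiceBank+DEMAND dataset (the 28-speaker version), downsampled it to 16 kHz, and put the 824 test wav files into the main/VoiceBank+DEMAND/wavs_noisy and wavs_clean folders. I then ran the command from the README: python inference.py --checkpoint_file best_ckpt/g_best

Then I computed the PESQ score on the speech generated by the project: python cal_metrics.py --input_test_file /MP-SENet-main/VoiceBank+DEMAND/test.txt --clean_wav_dir /MP-SENet-main/VoiceBank+DEMAND/wavs_clean --noisy_wav_dir /MP-SENet-main/generated_files

The result:
PESQ scores computed on the wav files produced by inference: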
(mp-se) (base) i@node01:~/work/MP-SENet-main/cal_metrics$ python cal_metrics.py --input_test_file /MP-SENet-main/VoiceBank+DEMAND/test.txt --clean_wav_dir /MP-SENet-main/VoiceBank+DEMAND/wavs_clean --noisy_wav_dir /MP-SENet-main/generated_files
824
Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:03:39
pesq: 3.006312488207539 csig: 4.232832530393066 cbak: 3.694722579138527 covl: 3.678736791613473 ssnr: 10.314748606080455 stoi: 0.9564755445004552

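PESQ is only 3.00, which I do not understand. The dataset is the same as in the paper and I used the author's pretrained model, so why is the difference so large? Could it be related to the method used to downsample to 16 kHz?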

performance scores

I evaluated the g_best model on VoiceBank+DEMAND and only got a PESQ score of 3.08.

Weights trained on DNS

Hi!

Fantastic work with the model and thank you for releasing the code!
I'd really appreciate if you could make available the weights after the model has been fine-tuned in speech enhancement on the DNS-dataset.

Thanks!

Use for Music?

Thank you for open sourcing your research, highly appreciated!

After thoroughly reading the paper I have the feeling that the entire approach could be almost readily used for not only speech enhancement but also music enhancement, e.g. from smartphone recordings.
The FFT window and hop size would need to be increased to make sense for 48 kHz, and maybe also the length of the audio snippets to capture more context. Also, PESQ and other metrics are geared towards speech, so I would rather look for something like FAD.

Do you see any other obstacles or maybe some valid reasons why this might be a bad idea?

Questions with regards to reproducing training and inference result

Dear authors,

Thank you for the great paper and for providing open source code. I would hope to clarify some detail with regards to your training and inference/evaluation.

inference/evaluation

I used your model checkpoint and the VB-demand dataset which you shared through google drive to perform evaluation on the VB-demand test set. Here are the results that I obtained:
pesq: 3.4957
csig: 4.65187
cbak: 3.86279
covl: 4.13774

The results seem to be slightly different from the results that you shared in #9, which are:
pesq: 3.4957
csig: 4.72751
cbak: 3.95033
covl: 4.22494

I am unsure what causes the differences in the csig, cbak, covl scores, and wonder if you may have any clues about it?
For your information, I used librosa to load the test audios at 16k sampling rate, and used pysepm.composite to compute these scores. My pysepm version is 0.1.

Training,

In section 3.1 of your interspeech paper, it was written that " The learning rate was set initially to 0.0005 and halved every 30 epoch". With regards to this statement, may I clarify if you stopped training after every 30 epochs, halved the learning rate in the config file and then resumed training?
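The schedule in the quoted sentence can be written as a closed-form function of the epoch, which means it does not in principle require stopping and resuming training. The sketch below only illustrates the quoted statement; the function name stepwise_lr is hypothetical, and whether the released training script implements the schedule this way is not stated here. In PyTorch, torch.optim.lr_scheduler.StepLR with step_size=30 and gamma=0.5 expresses the same rule.

```python
def stepwise_lr(epoch, initial_lr=0.0005, halve_every=30):
    """Learning rate at a given epoch when it is halved every halve_every epochs."""
    return initial_lr * 0.5 ** (epoch // halve_every)

# Usage: print the rate at the start of each 30-epoch block over 100 epochs.
for epoch in (0, 29, 30, 60, 90, 99):
    print(epoch, stepwise_lr(epoch))
```

Under this reading, epochs 0 to 29 use 0.0005, epochs 30 to 59 use 0.00025, epochs 60 to 89 use 0.000125, and epochs 90 to 99 use 0.0000625.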

May I also clarify if the checkpoint you provided is the best checkpoint or the last checkpoint during the 100 epochs of training?

Thank you for reading and I hope to hear back from you.

Incoherent dimensions in the self-attention module

Thank you for sharing your very interesting work.

I have a question about the self-attention used in the conformer block.
Before applying the time_conformer, you reshape the tensor to a $(b \times f, t, c)$ shape :

x = x.permute(0, 3, 2, 1).contiguous().view(b*f, t, c)

Then, next line, you apply the time conformer :

x = self.time_conformer(x) + x

In the conformer block, you use a MultiheadAttention module to compute the self-attention. However, this PyTorch module is initialized with batch_first=False (because that is the default parameter):

self.attn = nn.MultiheadAttention(dim, n_head, dropout=dropout)

So, the self-attention module is expecting a shape of $(L, N, E)$ where $L$ is the sequence length, $N$ is the batch size and $E$ is the dimension (as explained in the pytorch documentation here : https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html )

As the tensor x is of shape $(b \times f, t, c)$, it means that the self-attention will process $b \times f$ as the sequence length instead of using $t$. It would make more sense to initialize the MultiHeadAttention with the parameter batch_first=True. However, when I tried that, the results are not good.

Can you explain ?

Thank you very much

OutOfMemoryError: CUDA out of memory. Tried to allocate 15.25 GiB. GPU 0 has a total capacty of 23.69 GiB of which 5.50 GiB is free.

I have a 24 GB GPU but still get this error. I am trying to enhance my own audio with your best checkpoint. What is the problem, or does it need more GPU memory?

OutOfMemoryError: CUDA out of memory. Tried to allocate 15.25 GiB. GPU 0 has a total capacty of 23.69 GiB of which 5.50 GiB is free. Process 2674591 has 16.90 GiB memory in use. Including non-PyTorch memory, this process has 1.26 GiB memory in use.

Result is bad

When I ran "python inference.py --checkpoint_file best_ckpt/g_best" on my noisy.wav, the result was bad: the noise was still present. Did I make a mistake in my steps?

torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGSEGV

Hello,
thank you for providing the code.
However, I encountered the following issues while reproducing the code. May I ask if you are available to answer the following?
Traceback (most recent call last):
  File "train.py", line 312, in <module>
    main()
  File "train.py", line 306, in main
    mp.spawn(train, nprocs=h.num_gpus, args=(a, h,))
  File "/home/wrl/miniconda3/envs/python37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/wrl/miniconda3/envs/python37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/wrl/miniconda3/envs/python37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 136, in join
    signal_name=name
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGSEGV

NS and BE(SR) in one model design

Is it possible to change the multiplicative design of the mask for noise suppression/dereverb to a residual design like the band-width extension, e.g. replace the activation function of the last layer of the amplitude spectrum decoder with prelu or leakyrelu?

train

image
It stays at zero throughout training. What could be causing this?

Gradio Demo App on HuggingFace Spaces with ZeroGPU Support

Hello,

I've created a Gradio demo app for the MP-SENet model, hosted on HuggingFace Spaces with ZeroGPU support. It allows users to try out the model immediately in their browsers without requiring any local setup.

Additionally, a simple segment feature that splits long audio files into segments has been implemented for the app. For ZeroGPU (which uses an A100 with 40G memory under the hood), the maximum segment length is 10 seconds. When running locally, the segment length is limited to 3 seconds to prevent blowing up the memory on my MBP.
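The idea of processing long audio in fixed-length segments can be sketched as follows. This is not the demo app's code: enhance_fn is a hypothetical stand-in for a model call that returns an array of the same length as its input, and the segments are cut at hard boundaries without overlap, which may produce audible seams that an overlap-and-crossfade scheme would reduce.

```python
import numpy as np

def enhance_in_segments(audio, enhance_fn, sr=16000, segment_seconds=3.0):
    """Apply enhance_fn to consecutive segments and concatenate the results."""
    seg = int(sr * segment_seconds)
    # Each chunk is enhanced independently, so peak memory depends on the
    # segment length rather than on the full file length.
    chunks = [enhance_fn(audio[i:i + seg]) for i in range(0, len(audio), seg)]
    return np.concatenate(chunks) if chunks else audio[:0]

# Usage with an identity "model" on a hypothetical 60-second clip at 16 kHz.
seen_lengths = []

def identity_model(x):
    seen_lengths.append(len(x))
    return x

long_audio = np.arange(60 * 16000, dtype=np.float32)
enhanced = enhance_in_segments(long_audio, identity_model)
print(len(seen_lengths), "segments of", seen_lengths[0], "samples")
```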

The demo app is here: https://huggingface.co/spaces/JacobLinCool/MP-SENet
And the repository is here: https://github.com/JacobLinCool/MP-SENet-Gradio

Any feedback is welcome!

conformer structure

Hi,
Thanks for releasing the code for your work. I noticed that the paper says the conformers are borrowed from CMGAN. In the CMGAN code, the multi-head attention uses relative positional embedding, but I did not see that in your code. Is there any concern or reason for not including positional embedding in the conformer?
Thanks

Inferior results trained from scratch

Hello! Your paper and code are very enlightening to me, and I tried to train the model from scratch on the VCTK-DEMAND dataset to reproduce the results, but I found that the metrics are lower than those reported in the paper. PESQ, CSIG, CBAK and COVL are only about 3.39, 4.67, 3.84 and 4.14, respectively. I modified the following parts of the code:

  1. I changed segment_size to 16000 (1 s) in config.json and set split=False in the validation set configuration to suit my limited GPU (batch_size is 4 and the code runs on 2 x 2080Ti). In addition, I run inference.py on the CPU so that the test set does not need to be split.
  2. I only run validation and save the checkpoint once per epoch.

I'm not sure whether the above modifications affect the performance of the model.
Looking forward to your reply~

Wrong default path

parser.add_argument('--input_clean_wavs_dir', default='Voicebank+DEMAND/wavs_clean')

Note that the path should be "VoiceBank+DEMAND/wavs_clean" with a capital B, not "Voicebank+DEMAND/wavs_clean" with a lower-case b.

OutOfMemoryError

I have a 12 GB GPU but I get this error. It came up during training: initially the training was fine, but after 1000 steps this error occurred.

The sample rate is 16 kHz.

Steps : 985, Gen Loss: 1.028, Disc Loss: 0.007, Metric loss: 0.649, Magnitude Loss : 0.110, Phase Loss : 2.710, Complex Loss : 0.293, Time Loss : 0.123, s/b : 0.213
Steps : 990, Gen Loss: 0.493, Disc Loss: 0.002, Metric loss: 0.168, Magnitude Loss : 0.025, Phase Loss : 1.417, Complex Loss : 0.084, Time Loss : 0.097, s/b : 0.213
Steps : 995, Gen Loss: 0.779, Disc Loss: 0.001, Metric loss: 0.283, Magnitude Loss : 0.046, Phase Loss : 2.181, Complex Loss : 0.200, Time Loss : 0.146, s/b : 0.232
Steps : 1000, Gen Loss: 1.113, Disc Loss: 0.003, Metric loss: 0.666, Magnitude Loss : 0.134, Phase Loss : 2.843, Complex Loss : 0.368, Time Loss : 0.164, s/b : 0.206
Traceback (most recent call last):
File "/media/MP-SENetmain/train.py", line 309, in <module>
main()
File "/media/MP-SENetmain/train.py", line 305, in main
train(0, a, h)
File "/media/MP-SENetmain/train.py", line 233, in train
mag_g, pha_g, com_g = generator(noisy_mag.to(device), noisy_pha.to(device))
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/MP-SENetmain/models/generator.py", line 139, in forward
x = self.TSConformer[i](x)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/MP-SENetmain/models/generator.py", line 113, in forward
x = self.freq_conformer(x) + x
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/MP-SENetmain/models/conformer.py", line 73, in forward
x = x + self.ccm(x)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/MP-SENetmain/models/conformer.py", line 43, in forward
return self.ccm(x)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 263, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 7.17 GiB (GPU 0; 10.75 GiB total capacity; 150.87 MiB already allocated; 7.19 GiB free; 1.53 GiB reserved in total by PyTorch)

Question about phase-domain loss employment

Thank you very much for fantastic work and code release!

I tried your anti-wrapping loss and other phase-domain losses (e.g. L1(clean_phase, est_phase)) when training other networks in a naive way, simply adding them to the original losses with a scaling factor. However, this often causes the model parameters to end up as NaN. I'm wondering whether you have faced a similar situation and solved it. If so, could you please share some experience or tricks?
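For readers comparing the two kinds of phase loss mentioned here, the sketch below contrasts a plain L1 distance on raw phase values with an anti-wrapping distance. It assumes the anti-wrapping function has the form f(x) = |x - 2*pi*round(x / (2*pi))|, which maps any phase difference to its shortest distance on the circle; this form and the sample values are assumptions for illustration and should be checked against the paper before use.

```python
import numpy as np

def anti_wrapping(x):
    """Map a phase difference to its shortest angular distance, in [0, pi]."""
    return np.abs(x - 2.0 * np.pi * np.round(x / (2.0 * np.pi)))

# Two hypothetical phases that are close on the circle but far apart as numbers.
clean_phase = np.pi - 0.05
est_phase = -np.pi + 0.05

raw_l1 = np.abs(clean_phase - est_phase)            # about 2*pi - 0.1
wrapped_l1 = anti_wrapping(clean_phase - est_phase)  # about 0.1

print("plain L1 on raw phase:", raw_l1)
print("anti-wrapping distance:", wrapped_l1)
```

The plain L1 distance penalizes this nearly correct estimate almost maximally, which is one reason a naive phase loss can produce large and unstable gradients.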

Thank you!

Training details

Hi, your paper and code are excellent! I have learned a lot about speech enhancement from the paper, and I find your code to be very well-structured and clear. Thank you so much!

I have some questions:

  1. Have you gotten multi-GPU training working? It seems training gets stuck at https://github.com/yxlu-0102/MP-SENet/blob/main/train.py#L146-L160. If not, maybe I can help you with this issue.
  2. What are the batch size, GPU type and training time for your experiments?

Thanks in advance.

CUDA memory requirement

@yxlu-0102
I would like to know the minimum VRAM needed to run this model. I have a 2070 Super and am running into an OOM error.

    skip = torch.cat([x, skip], dim=1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.84 GiB (GPU 0; 8.00 GiB total capacity; 5.63 GiB already allocated; 455.00 MiB free; 5.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Code Release

I loved the results and the ideas from the paper, will there be a code release soon for this speech enhancement framework?

Fail to reproduce the paper result when training from scratch

Hi,
I tried to train the model from scratch, but it fails to reproduce the results on VoiceBank+DEMAND: the training is unstable and the results are bad. Can you give some advice?
Thanks.

The TensorBoard log is:
image

The only difference is the config, and the diff is:

 {
-    "batch_size": 4,
+    "batch_size": 8,
@@ -13,12 +13,12 @@
 
-    "segment_size": 32000,
+    "segment_size": 16000,
 
-    "num_workers": 4,
+    "num_workers": 8,

Performance with PCS

Hi again!
I'm wondering if you have considered evaluating MP-SENet with Perceptual Contrast Stretching? (PCS, see https://github.com/RoyChao19477/PCS)
I've seen it improve many models' performance on PESQ and COVL, so I'd be very interested in seeing how much the performance of MP-SENet improves as well.

Thank you!

Bandwidth extension checkpoint

Hi authors, thank you for your amazing work!

I was wondering: you mention in the full paper that MP-SENet can also be used for bandwidth extension. Do you have any plans to release the checkpoint / inference code for the bandwidth extension variant of MP-SENet? Thank you again for your work!
