yongxuustc / dnn-for-speech-enhancement

DNN-for-speech-enhancement

Cuda 32.88% C++ 51.10% C 6.42% Makefile 0.43% Perl 6.06% MATLAB 3.11%

dnn-for-speech-enhancement's Issues

How to normalize the input and target files?

Hi Dr. Xu,

Your paper mentions that the input and target files should be normalized to zero mean and unit variance.
I currently use the qnnorm command to prepare the ".fea_norm" file needed by the .pl file. Is that right?
However, in the Interface.cc file, I found that you use the following statements to normalize the input file:
dataori[2+j +i*(para->fea_dim +2)] -= mean[j];
dataori[2+j +i*(para->fea_dim +2)] *= dVar[j];
and the following statements to normalize the target file:
targori[2+j +i*(para->layersizes[numlayers-1] +2)] -= 0;
targori[2+j +i*(para->layersizes[numlayers-1] +2)] *= 1;
It seems that the target file is not actually normalized by these two statements.

Could you tell me how to normalize the target file?
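
For reference, the Interface.cc statements quoted above amount to a per-dimension subtract-then-scale. Below is a minimal Python sketch of that idea. It assumes that dVar holds the reciprocal of the standard deviation, which is not confirmed here, and the helper names mean_and_inv_std and normalize are hypothetical, not part of the repository.

```python
import math


def mean_and_inv_std(frames):
    """Per-dimension mean and reciprocal standard deviation of a list of frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    inv_std = []
    for j in range(dim):
        var = sum((f[j] - mean[j]) ** 2 for f in frames) / n
        # Guard against constant dimensions, where the variance is zero.
        inv_std.append(1.0 / math.sqrt(var) if var > 0.0 else 1.0)
    return mean, inv_std


def normalize(frames, mean, scale):
    """Apply x = (x - mean[j]) * scale[j] to every frame (assumed meaning of dVar)."""
    return [[(x - m) * s for x, m, s in zip(f, mean, scale)] for f in frames]


noisy = [[1.0, 10.0], [3.0, 30.0]]
mean, inv_std = mean_and_inv_std(noisy)
normalized = normalize(noisy, mean, inv_std)

# Subtracting 0 and multiplying by 1, as in the quoted target-file lines,
# leaves the data unchanged.
targets = [[0.5, -2.0]]
unchanged = normalize(targets, [0.0, 0.0], [1.0, 1.0])
```

To give the targets zero mean and unit variance as well, the same statistics could be computed on the clean features and passed in place of the constants 0 and 1. Whether the training code expects that is a question for the author.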

How to generate the "timit_noisy_SNR5.lsp" file?

Hi Dr. Xu,
How do I generate the corresponding my_own_noisy_speech.lsp file from my own noisy speech data for testing?

Could you share the pretrained model with me?

Hi Dr. Xu,

Would it be possible for you to share the pretrained model with me?

Can you share the source code of "Wav2LogSpec.exe"?

Dear Dr. Xu:
Can you share the source code of "Wav2LogSpec.exe"? It would help us a lot to know how you convert the .raw file to the .lsp file. And if possible, can you also share the source code of "LogSpec2raw_16bit_withoutXF.exe" used in the decoding process?

Any help from you would be greatly appreciated!
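
While the source of Wav2LogSpec.exe is not available here, the general idea of a log-power spectrum can be sketched in Python. Everything below is an assumption made for illustration: the Hamming window, the frame length and shift, the log floor, and the function names split_frames and log_power_spectrum. It does not reproduce the tool's actual framing or its .lsp file layout.

```python
import cmath
import math


def split_frames(signal, frame_len, frame_shift):
    """Cut a signal into overlapping frames, dropping any incomplete tail."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, frame_shift)]


def log_power_spectrum(frame):
    """Return log(|X[k]|^2) for bins k = 0 .. N/2 of one real-valued frame."""
    n = len(frame)
    # Hamming window: an assumed choice for this sketch.
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    bins = []
    for k in range(n // 2 + 1):
        # Direct DFT; fine for a demo, an FFT would be used in practice.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        bins.append(math.log(max(abs(acc) ** 2, 1e-12)))  # floor avoids log(0)
    return bins


# A 16-sample cosine with exactly 2 cycles has its strongest bin at k = 2.
tone = [math.cos(2.0 * math.pi * 2 * i / 16) for i in range(16)]
spectrum = log_power_spectrum(tone)
```

A frame of N samples yields N/2 + 1 bins, so the feature dimension follows directly from the chosen frame length.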

Where is the ReLU function in the training code?

Hi Dr. Xu,

In the step1_DNNenh_for16kHz.m file, I found that you use a ReLU function for decoding; however, I could not find a ReLU function in the training code. If I want to use ReLU as the activation function, should I add it myself?
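
As a point of comparison, here is a minimal Python sketch of the two activations and the derivatives that back-propagation would need for each. The function names are hypothetical and this is not the repository's CUDA code. The point is only that switching the hidden activation means changing both the forward function and its gradient, and using the same choice at training and decoding time.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(y):
    """Derivative of sigmoid expressed through its output y = sigmoid(x)."""
    return y * (1.0 - y)


def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def relu_grad(y):
    """Derivative of ReLU expressed through its output y = relu(x)."""
    return 1.0 if y > 0.0 else 0.0


hidden_sigmoid = [sigmoid(v) for v in (-2.0, 0.0, 2.0)]
hidden_relu = [relu(v) for v in (-2.0, 0.0, 2.0)]
```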

pfile tail is Not correct.

Hi Dr. Xu, I ran into a problem when using the training demo. Could you please give me some help?
According to "README.md", I used the QuickNet tools to generate the pfiles, including "fea.pfile", "targ.pfile" and "fea.norm". I used only two pairs of WAV files for testing, like this:

feacalc sa1.wav sa2.wav -output targ.pfile
feacalc sa1_car_snr1.wav sa2_car_snr1.wav -output fea.pfile
pfile_norm -i fea.pfile -o fea.norm

I then used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux. But all the log files show the message "pfile tail is Not correct.".
Is the way I prepared the pfiles wrong? How should I prepare my pfiles? I also don't quite understand the tips in "how_to_get_pfile.txt", and it seems two files, ".len" and ".scp", should be prepared first. What exactly is the format of the pfiles? Could you please give a small example?
I would be very grateful.

The "tails in target pfile and data pfile is not consistent" error

Hi Dr. Xu,

I now use two WAV files, "clean1.wav" and "clean2.wav", plus a pink noise file "pink.wav" to synthesize two training files, "train1.wav" and "train2.wav". I then follow the steps you provided to generate "train.pfile" and "clean.pfile", and use pfile_norm -i train.pfile -o train.norm to generate the .norm file.

I then used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux. But all the log files show the error message "tails in target pfile and data pfile is not consistent.".

Could you quickly point me in the right direction to fix this error?

step1_DNNenh_for16kHz.exe can't run

When I run step1_DNNenh_for16kHz.exe, it reports an error. My OS is Windows 7.

Is there a way to make this work on mac/ubuntu? Has it been tried out?

error

'perl' is not recognized as an internal or external command,
operable program or batch file.
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.

Error in readhtk_new (line 12)
nframes = fread(fid,1,'int32');

Error in step1_DNNenh_for16kHz (line 61)
[htkdata,nframes,sampPeriod,sampSize,paramKind]=readhtk_new(tline,'le');

I cleared the wav_lsp folder and kept the timit_noisy_SNR5.wav file to test, and then got the error above. Please help. Thanks.
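
The first line of the log says that perl was not found, so any step that relies on a Perl script presumably never ran, and the later fread call then received an invalid identifier for a file that could not be opened. That reading is an assumption based on the log alone. A small pre-flight check could be sketched in Python as follows; the helper name missing_prerequisites and the example paths are made up for illustration.

```python
import os
import shutil


def missing_prerequisites(tools, files):
    """Return human-readable problems for tools not on PATH and files not on disk."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append("tool not on PATH: " + tool)
    for path in files:
        if not os.path.isfile(path):
            problems.append("file not found: " + path)
    return problems


# Hypothetical names, chosen so that both checks fail in this demo.
report = missing_prerequisites(
    ["surely-not-an-installed-tool-xyz"],
    [os.path.join("no_such_dir_xyz", "timit_noisy_SNR5.lsp")],
)
for line in report:
    print(line)
```

Running such a check with "perl" and the expected intermediate files before starting the MATLAB script would show which step is missing.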

Can this program run without GPU?

I checked the makefile for this project and found that CUDA is a dependency. I don't have a GPU, only a CPU. Is there a more general version of this project?

The zip2 link for your 15 homegrown noises is not valid

I cannot download zip2.

How to change the code in the BP_GPU.cu file to make the output unit linear

Your paper mentions that "The type of the hidden units is sigmoid, and the output unit is linear." So I think I should change the following code in the BP_GPU.cu file
else{
DevSigmoid(streams[0],cur_layer_size, cur_layer_x, cur_layer_y);
}
to
else{
DevLinearOutCopy(streams[0],n_frames, cur_layer_units, cur_layer_x, cur_layer_y);
cudaMemcpy(dev[0].out,cur_layer_y,n_frames*cur_layer_units*sizeof(float),cudaMemcpyDeviceToDevice);
}
Am I right?
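
To make the "sigmoid hidden units, linear output unit" arrangement concrete, here is a minimal CPU sketch in Python of a forward pass that applies the sigmoid on every layer except the last. It illustrates the idea only. The names affine and forward and the toy weights are hypothetical, and nothing here reflects how BP_GPU.cu is actually organized.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(layers, x):
    """Sigmoid on hidden layers, identity (linear) on the output layer."""
    last = len(layers) - 1
    for idx, (weights, bias) in enumerate(layers):
        x = affine(weights, bias, x)
        if idx < last:
            x = [sigmoid(v) for v in x]
    return x


# Toy network: 2 inputs -> 1 sigmoid hidden unit -> 1 linear output unit.
layers = [
    ([[0.0, 0.0]], [0.0]),   # hidden pre-activation is 0, so sigmoid gives 0.5
    ([[2.0]], [-3.0]),       # output = 2 * 0.5 - 3
]
output = forward(layers, [1.0, 1.0])
```

The output here is negative, which a sigmoid output unit could never produce. That is why a linear output suits regression targets with zero mean. With a linear output and a squared-error loss, the output-layer error term is simply the difference between prediction and target, so the backward pass would need a matching change.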

Number of epochs

Hi Dr. Xu,
I am reimplementing your code in Python, using random initialization for the layers. I also read your papers, where I found that the number of epochs is 50 and the learning rate is 0.1 for the first 10 epochs, then decreased by 10% after each epoch.

Are these numbers correct?
I plotted the histogram of the log-power spectra for each bin and they looked roughly Gaussian, but I would also like to know why you say we should use standard normalization. Can't we use, for instance, min-max normalization?
Thanks for your help
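
The schedule described in this question can be written down in a few lines of Python. The numbers (50 epochs, 0.1 for the first 10 epochs, then a 10% reduction per epoch) are taken from the question itself and are not confirmed by the author. The function name learning_rate and the 1-based epoch convention are assumptions of this sketch.

```python
def learning_rate(epoch, base_lr=0.1, hold_epochs=10, decay=0.9):
    """Learning rate for a 1-based epoch: constant at first, then multiplied by decay each epoch."""
    if epoch < 1:
        raise ValueError("epoch is 1-based")
    if epoch <= hold_epochs:
        return base_lr
    return base_lr * decay ** (epoch - hold_epochs)


# Rates for all 50 epochs mentioned in the question.
schedule = [learning_rate(e) for e in range(1, 51)]
```

Under this reading, epoch 11 is the first epoch to use the reduced rate of 0.09. If the reduction is meant to start one epoch later, the exponent shifts by one.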

higher quality enhancement?

What if I want to train and test on 44.1kHz wav files, how do I modify the code to do that? It works for 16kHz, but I am interested in higher sampling rates for my applications. Thanks!

Is there a tool for speech enhancement based on Chinese training?

Drawing correspondence figure

There is a little mistake in your "le2be_for_all_files_func.m" file

Hi Dr. Xu,

Considering the HTK file format, I think the code in "le2be_for_all_files_func.m" should be modified like this:

function []=le2be_for_all_files_func(infile, outfile)
% infile='clean_FBI_22123A.08';
% outfile='clean_FBI_22123A.08_be';

fn = fopen(infile, 'r','ieee-le');
fid = fopen(outfile,'wb','ieee-be');
Y = fread(fn, 2, 'long');
fwrite(fid,Y,'long');
Y = fread(fn, 2, 'short');
fwrite(fid,Y,'short');
Y = fread(fn, inf, 'float');
fwrite(fid,Y,'float');
fclose(fn); % close the current file handle; otherwise MATLAB eventually reports that too many files are open
fclose(fid);

end
Following the steps you provide, I can get the .pfile now.
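
For readers who want to check the byte order outside MATLAB, the same conversion can be sketched in Python with the struct module. It relies on the standard 12-byte HTK header (two 32-bit integers for the sample count and sample period, then two 16-bit integers for the sample size and parameter kind) followed by 32-bit floats. The function name htk_le_to_be is hypothetical, and the sketch assumes uncompressed float data.

```python
import struct


def htk_le_to_be(data):
    """Convert the bytes of a little-endian HTK feature file to big-endian."""
    # Header: nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16).
    n_samples, samp_period, samp_size, parm_kind = struct.unpack("<iihh", data[:12])
    payload = data[12:]
    n_floats = len(payload) // 4
    values = struct.unpack("<%df" % n_floats, payload[:4 * n_floats])
    header = struct.pack(">iihh", n_samples, samp_period, samp_size, parm_kind)
    return header + struct.pack(">%df" % n_floats, *values)


# Tiny in-memory example: 2 frames of 2 floats each (8 bytes per frame).
little = struct.pack("<iihh", 2, 160000, 8, 9) + struct.pack("<4f", 1.0, -2.5, 0.0, 3.25)
big = htk_le_to_be(little)
header_fields = struct.unpack(">iihh", big[:12])
```

Reading the header with the wrong byte order typically yields an absurd frame count, which is a quick way to tell which order a given file uses.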

mclmcrrt710.dll not found

Hi Yong,

If I try to execute the .exe, I get this: The code execution cannot proceed because mclmcrrt710.dll was not found.

OS: Win 10 Pro

framesBeforeSent[] does not read correctly

Dear Dr. Xu:
I am using your model to train on my own data. However, it seems that framesBeforeSent[] is not read correctly. As I understand it, each entry in framesBeforeSent[] should be the number of frames before the corresponding sentence, and should match the cumulative sum of the numbers in the lens file. However, I get large numbers such as 1101260349 in framesBeforeSent[], although I only have 6379 frames. Because of that, the program runs into an endless loop in:
while(cur_chunk_frames >= para->traincache ){
next_st = cur_frame_id -(cur_chunk_frames - para->traincache);
if(next_st < total_frames){
chunk_frame_st[count_chunk] = next_st;
count_chunk++;
cur_chunk_frames = (cur_frame_id - next_st > para->fea_context -1)?(cur_frame_id - next_st - para->fea_context +1):0;
}
I suspect that framesBeforeSent is being misread because it causes the error "tails in target pfile and data pfile is not consistent". When I train with my noisy.pfile and clean.pfile, this error pops up. However, when I check my noisy.pfile and clean.pfile, the tails are the same.

I tried to fix this bug by changing the calculation of the offset in read_tail(fp_data, offset, total_sents, framesBeforeSent) but failed, since I am not very familiar with the data structure of the pfile. Can you help fix this bug?

Any help from you would be greatly appreciated!!
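
One way to investigate a value like 1101260349 is to compare it with what the index should contain and to look at its raw bit pattern. The Python sketch below does both. The sentence lengths are made up so that they sum to the 6379 frames mentioned above, and the helper names are hypothetical. Reinterpreted as a 32-bit IEEE-754 float, the value is roughly 20.5, which looks more like a feature value than a frame count. One possible explanation, not a confirmed diagnosis, is that the offset passed to read_tail points into the feature data rather than at the sentence index.

```python
import struct


def frames_before_sentences(lens):
    """Cumulative frame count before each sentence, with the grand total as the last entry."""
    offsets = [0]
    for n in lens:
        offsets.append(offsets[-1] + n)
    return offsets


def int32_bits_as_float(value):
    """Reinterpret the bit pattern of a signed 32-bit integer as an IEEE-754 float."""
    return struct.unpack(">f", struct.pack(">i", value))[0]


# Made-up sentence lengths that sum to the 6379 frames from the report.
offsets = frames_before_sentences([3000, 2000, 1379])

# The suspicious value from the report, viewed as a float instead of an integer.
suspicious = int32_bits_as_float(1101260349)
print(offsets, suspicious)
```

If the other bad entries also turn into plausible feature values under this reinterpretation, the header size or the per-frame record size used to compute the offset would be the first things to compare between the pfile writer and this reader.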
