dnn-for-speech-enhancement's People

Contributors

yongxuustc

dnn-for-speech-enhancement's Issues

How to normalize the input and target files?

Hi Dr. Xu,

Your paper mentions that the input and target files should be normalized to zero mean and unit variance.
Right now I use the qnnorm command to prepare the ".fea_norm" file needed by the .pl script. Is that correct?
However, in Interface.cc I found that you normalize the input file with the following statements:

```cpp
dataori[2+j +i*(para->fea_dim +2)] -= mean[j];
dataori[2+j +i*(para->fea_dim +2)] *= dVar[j];
```

and the following statements to normalize the target file:

```cpp
targori[2+j +i*(para->layersizes[numlayers-1] +2)] -= 0;
targori[2+j +i*(para->layersizes[numlayers-1] +2)] *= 1;
```

It seems the target file is not actually normalized by these two statements, since subtracting 0 and multiplying by 1 leave it unchanged.

Could you tell me how to normalize the target file?
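A minimal sketch of per-dimension zero-mean, unit-variance normalization, written as host-side C++ on a plain row-major buffer. The pfile rows in Interface.cc appear to carry two extra leading columns (hence the `+2` in the indexing), which this sketch omits. It assumes `dVar[j]` holds the inverse standard deviation, which is what the `-= mean[j]` / `*= dVar[j]` pair suggests but is not confirmed here. `NormStats`, `compute_stats`, `normalize` and `denormalize` are hypothetical names. If you do normalize the targets, one option is to compute separate statistics on the clean training targets and apply the inverse transform to the network output before resynthesis; whether the original recipe does this is for the author to confirm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-dimension statistics; inv_std plays the role assumed for dVar[j] (1 / std).
struct NormStats {
    std::vector<float> mean;
    std::vector<float> inv_std;
};

// frames is row-major: n_frames rows of `dim` floats each.
NormStats compute_stats(const std::vector<float>& frames, std::size_t dim) {
    assert(dim > 0 && !frames.empty() && frames.size() % dim == 0);
    const std::size_t n = frames.size() / dim;
    std::vector<double> sum(dim, 0.0), sum_sq(dim, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < dim; ++j) {
            const double x = frames[i * dim + j];
            sum[j] += x;
            sum_sq[j] += x * x;
        }
    NormStats s{std::vector<float>(dim, 0.0f), std::vector<float>(dim, 0.0f)};
    for (std::size_t j = 0; j < dim; ++j) {
        const double m = sum[j] / n;
        const double var = sum_sq[j] / n - m * m;
        s.mean[j] = static_cast<float>(m);
        // Guard against constant dimensions (zero variance).
        s.inv_std[j] = var > 1e-12 ? static_cast<float>(1.0 / std::sqrt(var)) : 1.0f;
    }
    return s;
}

// In place: x = (x - mean) * inv_std, the same form as the Interface.cc input code.
void normalize(std::vector<float>& frames, const NormStats& s) {
    const std::size_t dim = s.mean.size();
    assert(dim > 0 && frames.size() % dim == 0);
    for (std::size_t i = 0; i < frames.size(); ++i) {
        const std::size_t j = i % dim;
        frames[i] = (frames[i] - s.mean[j]) * s.inv_std[j];
    }
}

// Inverse transform, e.g. to map normalized network outputs back to log-power spectra.
void denormalize(std::vector<float>& frames, const NormStats& s) {
    const std::size_t dim = s.mean.size();
    assert(dim > 0 && frames.size() % dim == 0);
    for (std::size_t i = 0; i < frames.size(); ++i) {
        const std::size_t j = i % dim;
        frames[i] = frames[i] / s.inv_std[j] + s.mean[j];
    }
}
```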

Can you share the source code of "Wav2LogSpec.exe"?

Dear Dr. Xu:
Can you share the source code of "Wav2LogSpec.exe"? It would help us a lot to know how you convert the .raw files into .lsp files. If possible, could you also share the source code of "LogSpec2raw_16bit_withoutXF.exe", which is used in the decoding process?

Any help from you would be greatly appreciated!
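Until the original source is available, here is a hedged sketch of one common way to turn 16-bit PCM samples into log-power spectra: Hamming-windowed frames, a DFT, and the log of the squared magnitude. The 512-sample frame, 256-sample shift, window type and `1e-10` floor are assumptions for illustration, not the settings of Wav2LogSpec.exe, and `log_power_spectra` is a hypothetical name. A naive O(N^2) DFT keeps the sketch self-contained; a real tool would use an FFT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one vector of frame_len/2 + 1 log-power values per frame.
std::vector<std::vector<double>> log_power_spectra(const std::vector<int16_t>& pcm,
                                                   std::size_t frame_len = 512,
                                                   std::size_t shift = 256) {
    assert(frame_len > 1 && shift > 0);
    const double pi = std::acos(-1.0);
    const std::size_t bins = frame_len / 2 + 1;

    // Hamming window (an assumed choice).
    std::vector<double> window(frame_len);
    for (std::size_t n = 0; n < frame_len; ++n)
        window[n] = 0.54 - 0.46 * std::cos(2.0 * pi * n / (frame_len - 1));

    std::vector<std::vector<double>> out;
    for (std::size_t start = 0; start + frame_len <= pcm.size(); start += shift) {
        std::vector<double> lps(bins);
        for (std::size_t k = 0; k < bins; ++k) {
            double re = 0.0, im = 0.0;
            for (std::size_t n = 0; n < frame_len; ++n) {
                const double x = pcm[start + n] * window[n];
                const double angle = 2.0 * pi * k * n / frame_len;
                re += x * std::cos(angle);
                im -= x * std::sin(angle);
            }
            // Small floor avoids log(0) on digital silence.
            lps[k] = std::log(re * re + im * im + 1e-10);
        }
        out.push_back(lps);
    }
    return out;
}
```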

Where is the ReLU function in the training code?

Hi Dr. Xu,

In step1_DNNenh_for16kHz.m, I found that you use the ReLU function during decoding. However, I can't find a ReLU function in the training code. If I want to use ReLU as the activation function, should I add it myself?
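If you add ReLU yourself, both the forward activation and the derivative used in backpropagation have to change. The following is a host-side C++ sketch of that pair (a CUDA version would apply the same element-wise rules in a kernel); `Activation`, `apply_activation` and `activation_grad` are hypothetical names, not functions from the repository. The derivative is written in terms of the layer output y, which works for both sigmoid, y(1 - y), and ReLU, 1 where y > 0 and 0 otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical activation selector for hidden layers.
enum class Activation { Sigmoid, ReLU };

// Forward pass: apply the activation element-wise, in place.
void apply_activation(std::vector<float>& x, Activation act) {
    for (float& v : x) {
        if (act == Activation::ReLU)
            v = std::max(v, 0.0f);
        else
            v = 1.0f / (1.0f + std::exp(-v));
    }
}

// Backward pass: derivative with respect to the pre-activation, given the output y.
float activation_grad(float y, Activation act) {
    if (act == Activation::ReLU) return y > 0.0f ? 1.0f : 0.0f;
    return y * (1.0f - y);
}
```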

pfile tail is Not correct.

Hi Dr. Xu, I ran into a problem with the training demo. Could you please give me some help?
Following the "README.md", I used the QuickNet tools to generate the pfiles "fea.pfile" and "targ.pfile" and the normalization file "fea.norm". I only used two pairs of wav files for a test, like this:

```shell
feacalc sa1.wav sa2.wav -output targ.pfile
feacalc sa1_car_snr1.wav sa2_car_snr1.wav -output fea.pfile
pfile_norm -i fea.pfile -o fea.norm
```

Then I used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux, but every log file shows the message "pfile tail is Not correct."
Is the way I prepared the pfiles wrong? How should I prepare them? I don't quite understand the tips in "how_to_get_pfile.txt", which say that two files, ".len" and ".scp", should be prepared first. What exactly is the pfile format? Could you give a small example?
I am very grateful.
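Another issue in this thread describes framesBeforeSent[] as holding the number of frames before each sentence, consistent with the sum of the numbers in the lens file. Under that description the sentence table is a running sum of per-sentence frame counts. A minimal sketch, assuming the .len file contains one frame count per utterance (in the same order as the .scp list) and that the table ends with a trailing entry equal to the total frame count; both are assumptions, and `frames_before_sent` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical: reads one frame count per utterance (assumed .len layout) and returns
// offsets where entry i = frames before sentence i and the last entry = total frames.
std::vector<int32_t> frames_before_sent(std::istream& lens) {
    std::vector<int32_t> table{0};
    long long n = 0;
    while (lens >> n) {
        assert(n >= 0);
        table.push_back(static_cast<int32_t>(table.back() + n));
    }
    return table;
}
```

If the .len file also carries utterance names, the read loop would need to skip them; that layout detail is not stated here.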

the "tails in target pfile and data pfile is not consistent" error

Hi Dr. Xu,

I used two wave files, "clean1.wav" and "clean2.wav", plus a pink noise file, "pink.wav", to synthesize two training files, "train1.wav" and "train2.wav". Then I followed the steps you provided to generate "train.pfile" and "clean.pfile", and used `pfile_norm -i train.pfile -o train.norm` to generate the .norm file.

Then I used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux, but every log file shows the error message "tails in target pfile and data pfile is not consistent."

Could you quickly point me in the right direction for fixing this error?
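The error implies that the sentence tables of the two pfiles differ. That would happen if a noisy sentence and its clean counterpart yield a different number of frames (for example, if the mixed files are not trimmed to exactly the clean file's length) or if the two file lists are in a different order; these are plausible causes rather than a confirmed diagnosis. A small sketch that locates the first differing sentence, given both tables as offset arrays (entry i = frames before sentence i); `sentence_lengths` and `first_mismatch` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-sentence frame counts from an offset table.
std::vector<int32_t> sentence_lengths(const std::vector<int32_t>& tail) {
    std::vector<int32_t> lens;
    for (std::size_t i = 1; i < tail.size(); ++i) lens.push_back(tail[i] - tail[i - 1]);
    return lens;
}

// Index of the first sentence whose length differs between data and target,
// or -1 if the two tables agree.
long first_mismatch(const std::vector<int32_t>& data_tail, const std::vector<int32_t>& targ_tail) {
    const auto a = sentence_lengths(data_tail), b = sentence_lengths(targ_tail);
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return static_cast<long>(i);
    // Same prefix but a different number of sentences: the first extra sentence differs.
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```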

error

```
'perl' is not recognized as an internal or external command,
operable program or batch file.
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.

Error in readhtk_new (line 12)
nframes = fread(fid,1,'int32');

Error in step1_DNNenh_for16kHz (line 61)
[htkdata,nframes,sampPeriod,sampSize,paramKind]=readhtk_new(tline,'le');
```

I cleared the wav_lsp folder, kept the timit_noisy_SNR5.wav file for testing, and then got the error above. Please help. Thanks.

Can this program run without GPU?

I checked the makefile for this project and found that CUDA is a dependency. I don't have a GPU, only a CPU. Is there a more general version of this project that can run on a CPU?

How to change the code in BP_GPU.cu to make the output unit linear

Your paper mentions that "The type of the hidden units is sigmoid, and the output unit is linear." So I think I should change the following code in BP_GPU.cu:

```cuda
else{
    DevSigmoid(streams[0],cur_layer_size, cur_layer_x, cur_layer_y);
}
```

to:

```cuda
else{
    DevLinearOutCopy(streams[0],n_frames, cur_layer_units, cur_layer_x, cur_layer_y);
    cudaMemcpy(dev[0].out,cur_layer_y,n_frames*cur_layer_units*sizeof(float),cudaMemcpyDeviceToDevice);
}
```

Am I right?
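For reference, the intended behaviour can be checked against a plain host-side forward pass: sigmoid on every hidden layer and the identity on the last one. With a linear output and a mean-squared-error loss, the output-layer error signal is simply y - t, so the backward pass must also drop the sigmoid derivative at the output. This C++ sketch uses hypothetical `Layer` and `forward` names and does not reproduce the repository's CUDA kernels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One fully connected layer: y = W x + b, with W stored row-major (out x in).
struct Layer {
    std::size_t in, out;
    std::vector<float> W, b;
};

// Sigmoid on hidden layers, identity (linear) on the final layer.
std::vector<float> forward(const std::vector<Layer>& layers, std::vector<float> x) {
    for (std::size_t l = 0; l < layers.size(); ++l) {
        const Layer& L = layers[l];
        assert(x.size() == L.in && L.W.size() == L.in * L.out && L.b.size() == L.out);
        std::vector<float> y(L.b);
        for (std::size_t o = 0; o < L.out; ++o)
            for (std::size_t i = 0; i < L.in; ++i)
                y[o] += L.W[o * L.in + i] * x[i];
        const bool is_output = (l + 1 == layers.size());
        if (!is_output)
            for (float& v : y) v = 1.0f / (1.0f + std::exp(-v));
        // Output layer: y is left as is, i.e. a linear unit that can take any real value.
        x = std::move(y);
    }
    return x;
}
```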

Number of epochs

Hi Dr. Xu,
I am reimplementing your code in Python, using random initialization for the layers. I have also read your papers, and from them I gathered: 50 epochs, with a learning rate of 0.1 for the first 10 epochs, then decreased by 10% after each epoch.

  1. Are these numbers correct?
  2. I plotted the histogram of the log-power spectra for each frequency bin, and they look like Gaussian distributions. But why do you say we should use standard (zero-mean, unit-variance) normalization? Couldn't we use, for instance, max-min normalization?

Thanks for your help.
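The schedule as quoted in the question (0.1 for the first 10 epochs, then a 10% reduction after every epoch, 50 epochs in total) can be written as below; whether these are the exact values behind the published models is for the author to confirm. `learning_rate` is a hypothetical helper with the quoted numbers as defaults, and it reads "decreased by 10%" as multiplying the rate by 0.9 each epoch.

```cpp
#include <cassert>
#include <cmath>

// Learning rate for a 1-based epoch: constant for the first `constant_epochs`,
// then multiplied by (1 - decay) once per subsequent epoch.
double learning_rate(int epoch, double base_lr = 0.1, int constant_epochs = 10,
                     double decay = 0.1) {
    assert(epoch >= 1);
    if (epoch <= constant_epochs) return base_lr;
    return base_lr * std::pow(1.0 - decay, epoch - constant_epochs);
}
```

Under this reading the rate at epoch 50 is 0.1 × 0.9^40, roughly 0.0015.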

higher quality enhancement?

I want to train and test on 44.1 kHz wav files. How do I modify the code to do that? It works for 16 kHz, but I am interested in higher sampling rates for my applications. Thanks!
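One way to reason about the change is to keep the frame duration fixed and recompute every sample-count parameter from the new rate. The sketch assumes 32 ms frames with a 16 ms shift, which corresponds to 512/256 samples at 16 kHz; those durations are illustrative, not read from the repository, and `FrameConfig`/`frame_config` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct FrameConfig {
    std::size_t frame_len;  // samples per frame
    std::size_t shift;      // hop size in samples
    std::size_t fft_size;   // next power of two >= frame_len
    std::size_t bins;       // fft_size / 2 + 1 spectral bins per frame
};

FrameConfig frame_config(double sample_rate_hz, double frame_ms, double shift_ms) {
    assert(sample_rate_hz > 0 && frame_ms > 0 && shift_ms > 0);
    FrameConfig c{};
    c.frame_len = static_cast<std::size_t>(std::lround(sample_rate_hz * frame_ms / 1000.0));
    c.shift = static_cast<std::size_t>(std::lround(sample_rate_hz * shift_ms / 1000.0));
    c.fft_size = 1;
    while (c.fft_size < c.frame_len) c.fft_size *= 2;
    c.bins = c.fft_size / 2 + 1;
    return c;
}
```

Under these assumptions, 44.1 kHz gives 1411-sample frames, a 2048-point FFT and 1025 bins per frame instead of 257, so the DNN input and output layer sizes would presumably have to grow accordingly, along with any dimensions hard-coded in the feature extraction and decoding scripts.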

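There is a small mistake in your "le2be_for_all_files_func.m" file

Hi Dr. Xu,

Considering the HTK file format, I think the code in "le2be_for_all_files_func.m" should be modified like this:

```matlab
function []=le2be_for_all_files_func(infile, outfile)
% Convert an HTK feature file from little-endian to big-endian.
% infile='clean_FBI_22123A.08';
% outfile='clean_FBI_22123A.08_be';

fn = fopen(infile, 'r', 'ieee-le');
fid = fopen(outfile, 'wb', 'ieee-be');
if fn < 0 || fid < 0
    error('Cannot open %s or %s', infile, outfile);
end
% HTK header: nSamples and sampPeriod (4-byte ints), sampSize and parmKind (2-byte ints).
% 'int32' is used instead of 'long', whose size is platform-dependent in MATLAB.
Y = fread(fn, 2, 'int32');
fwrite(fid, Y, 'int32');
Y = fread(fn, 2, 'int16');
fwrite(fid, Y, 'int16');
% Feature data: 32-bit floats.
Y = fread(fn, inf, 'float32');
fwrite(fid, Y, 'float32');
fclose(fn);  % Close the file handles; otherwise MATLAB eventually reports too many open files.
fclose(fid);

end
```

Following the steps you provided, I can now get the .pfile.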
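The byte layout that the corrected function relies on is a 12-byte HTK header (nSamples and sampPeriod as 4-byte integers, sampSize and parmKind as 2-byte integers) followed by 4-byte float samples. Here is a C++ sketch of the same conversion on an in-memory file image, reversing each field's bytes; it assumes uncompressed float features, and `htk_swap_endianness` is a hypothetical name. Because it only reverses bytes, applying it twice restores the original, so the same function converts in either direction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverse the byte order of `count` consecutive `width`-byte words starting at `begin`.
static void swap_words(std::vector<uint8_t>& buf, std::size_t begin, std::size_t count,
                       std::size_t width) {
    for (std::size_t w = 0; w < count; ++w) {
        auto first = buf.begin() + static_cast<std::ptrdiff_t>(begin + w * width);
        std::reverse(first, first + static_cast<std::ptrdiff_t>(width));
    }
}

// Swap an HTK file image between little- and big-endian, in place.
void htk_swap_endianness(std::vector<uint8_t>& file) {
    assert(file.size() >= 12 && (file.size() - 12) % 4 == 0);
    swap_words(file, 0, 2, 4);                        // nSamples, sampPeriod
    swap_words(file, 8, 2, 2);                        // sampSize, parmKind
    swap_words(file, 12, (file.size() - 12) / 4, 4);  // float32 feature values
}
```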

mclmcrrt710.dll not found

Hi Yong,

When I try to run the .exe, I get this error: "The code execution cannot proceed because mclmcrrt710.dll was not found."

OS: Win 10 Pro

framesBeforeSent[] is not read correctly

Dear Dr. Xu:
I am using your model to train on my own data. However, it seems that framesBeforeSent[] does not contain the correct numbers. As I understand it, each entry of framesBeforeSent[] should be the number of frames before the corresponding sentence, consistent with the sum of the numbers in the lens file. However, I get huge values such as 1101260349 in framesBeforeSent[], although I only have 6379 frames. Because of that, the program runs into an endless loop in:

```cpp
while(cur_chunk_frames >= para->traincache ){
    next_st = cur_frame_id -(cur_chunk_frames - para->traincache);
    if(next_st < total_frames){
        chunk_frame_st[count_chunk] = next_st;
        count_chunk++;
        cur_chunk_frames = (cur_frame_id - next_st > para->fea_context -1)?(cur_frame_id - next_st - para->fea_context +1):0;
    }
}
```

I suspect framesBeforeSent[] is misread because it also triggers the error "tails in target pfile and data pfile is not consistent" when I train with my noisy.pfile and clean.pfile. However, when I check the two pfiles, their tails are identical.

I tried to fix this by changing how the offset is computed in read_tail(fp_data, offset, total_sents, framesBeforeSent), but failed because I am not familiar with the pfile data structure. Could you help fix this bug?

Any help from you would be greatly appreciated!!
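Reinterpreted as a 32-bit float, the bit pattern of 1101260349 (0x41A3E63D) is about 20.49, which looks more like a log-spectral feature value than a frame count; that hints, without proving, that the tail is being read from the wrong offset. Also, in the loop as quoted, cur_chunk_frames only changes inside the `if (next_st < total_frames)` branch, so offsets that push next_st past total_frames would keep the loop from terminating. A hedged sketch of a sanity check that could run before chunking, assuming the table has one entry per sentence plus a trailing total; `check_frames_before_sent` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns an empty string if the table looks sane, otherwise a description of the problem.
// Checks: starts at 0, never decreases, stays within total_frames, ends at total_frames.
std::string check_frames_before_sent(const std::vector<int32_t>& table, int64_t total_frames) {
    if (table.empty()) return "empty table";
    if (table.front() != 0) return "first entry is not 0";
    for (std::size_t i = 1; i < table.size(); ++i) {
        if (table[i] < table[i - 1]) return "entry " + std::to_string(i) + " decreases";
        if (table[i] > total_frames)
            return "entry " + std::to_string(i) + " exceeds total_frames";
    }
    if (table.back() != total_frames) return "last entry does not equal total_frames";
    return "";
}
```

Failing fast on a non-empty result turns a silent hang into an explicit error message.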
