jefflai108 / contrastive-predictive-coding-pytorch
Contrastive Predictive Coding for Automatic Speaker Verification
License: MIT License
@jefflai108 Could you provide the spk2idx file used when training spk_class.py?
Thanks for sharing the CPC code.
I read the code and found that the provided Dataset class reads .h5 files. From the open ASR website and the information in the paper, I can only download files with extension .flac or .txt.
Could you explain explicitly how you prepared your dataset?
In the calculation of the NCE loss, the softmax is not given an explicit dimension, and by default PyTorch uses dim=1 for 2D input.
The loss in the paper keeps the context c_t fixed and 'matches' this context against the actual values of z_t. By using dim=1 instead of dim=0, we instead compute the 'match' between a fixed z_t and the c_t generated by each example in the batch.
The softmax should be taken over the columns of the 8x8 matrix to capture the true loss function defined in the CPC paper.
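To make the dim question concrete, here is a small sketch (shapes are illustrative, not the repo's actual tensors) of what each choice normalizes over:

```python
import torch

torch.manual_seed(0)
batch = 8
# scores[i, j]: similarity between the prediction for example i
# and the actual encoding of example j
scores = torch.randn(batch, batch)

# dim=1 (PyTorch's default for 2D input): each ROW sums to 1,
# i.e. one prediction is normalized against all batch encodings
probs_rows = torch.softmax(scores, dim=1)
# dim=0: each COLUMN sums to 1, i.e. one encoding is normalized
# against the predictions from every example in the batch
probs_cols = torch.softmax(scores, dim=0)

assert torch.allclose(probs_rows.sum(dim=1), torch.ones(batch))
assert torch.allclose(probs_cols.sum(dim=0), torch.ones(batch))
```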
In the implementation, a random position in the sequence is used to compute the NCE loss, whereas the paper says the GRU output at every timestep is used to predict 12 timesteps into the future:
"The output of the GRU at every timestep is used as the context c from which we predict 12 timesteps in the future using the contrastive loss"
Was this decision made to reduce training time?
I see in your implementation that you feed the entire signal into the encoder, while the paper notes that each timestep should be inserted separately.
When you feed the entire signal into the encoder, you get overlapping features from the conv kernel (except in the case where the stride equals the kernel size).
Why did you implement it like that? Do you think it does not matter?
Thanks!
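For what it's worth, the overlap described above can be seen with a toy 1-D convolution (a sketch with made-up sizes, not the repo's actual encoder):

```python
import torch

# kernel_size > stride: adjacent output frames share input samples
conv = torch.nn.Conv1d(in_channels=1, out_channels=1,
                       kernel_size=10, stride=5)
x = torch.randn(1, 1, 100)  # (batch, channels, samples)
y = conv(x)

# output length = floor((100 - 10) / 5) + 1 = 19 frames,
# each overlapping its neighbor by 10 - 5 = 5 input samples
assert y.shape == (1, 1, 19)
```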
https://arxiv.org/pdf/1807.03748.pdf
If you look at equation 4 from the paper, the log softmax should be over N-1 negative samples and 1 positive sample. In your implementation, the N-1 negative samples are actually self.time_step-1 samples. Taking log_softmax over the batch seems wrong. We switched it to log_softmax over time, and training is more stable and accuracy has gone up on our toy dataset. However, that is only a partial fix.
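For reference, equation 4 of the paper written out directly for one positive sample and N-1 negatives (a hedged sketch with made-up tensors, not the repo's code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_neg = 256, 7
c = torch.randn(dim)             # context-based prediction, W_k c_t
z_pos = torch.randn(dim)         # true future encoding z_{t+k}
z_neg = torch.randn(n_neg, dim)  # N-1 negative samples

# scores for all N candidates: the positive first, then the negatives
scores = torch.cat([(c * z_pos).sum().unsqueeze(0), z_neg @ c])
# InfoNCE: negative log-probability of picking the positive (index 0)
loss = -F.log_softmax(scores, dim=0)[0]
assert loss.item() > 0
```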
Hi, thank you again for sharing this code.
I found something that might be wrong in validation.py.
When doing validation, the GRU hidden state is initialized again; this might cause the validation loss in the log to be higher than it actually is. And since it initializes the GRU hidden state every epoch, I think it might slightly impair performance.
NVM, I mis-read the equation. You are right.
At line 310, you have the following code:
output, hidden = self.gru(forward_seq, hidden) # output size e.g. 8*100*256
c_t = output[:,t_samples,:].view(batch, 256) # c_t e.g. size 8*256
So you are using the second-to-last timestep as c_t? Since the last timestep should be output[:,t_samples+1,:], or simply hidden.
As far as I understand from the original paper, c_t should be the last timestep. Am I missing anything here?
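A small check of the indexing involved (a sketch with illustrative shapes; for a single-layer unidirectional GRU, the final output step equals the hidden state):

```python
import torch

torch.manual_seed(0)
batch, seq_len, hid = 8, 100, 256
gru = torch.nn.GRU(hid, hid, batch_first=True)
forward_seq = torch.randn(batch, seq_len, hid)

output, hidden = gru(forward_seq)  # output: (8, 100, 256), hidden: (1, 8, 256)
t_samples = 42  # illustrative sampled position

c_t = output[:, t_samples, :]  # context at (0-indexed) step t_samples
# the truly last step is output[:, -1, :], which for a single-layer
# unidirectional GRU is exactly the returned hidden state
assert torch.allclose(output[:, -1, :], hidden[0])
```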
I had some trouble understanding the implementation of the InfoNCE loss function. I don't understand how torch.diag() can represent the InfoNCE loss.
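In case it helps others: when the scores between every prediction and every encoding in a batch are arranged in a square matrix, the diagonal entries are exactly the positive pairs (example i scored against its own future), while the off-diagonal entries serve as negatives. A sketch of the idea (not a line-for-line copy of model.py):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch = 8
pred = torch.randn(batch, 256)  # predicted futures, W_k c_t, one per example
z = torch.randn(batch, 256)     # actual future encodings z_{t+k}

total = torch.mm(z, pred.t())   # (8, 8): total[i, j] = <z_i, pred_j>
log_probs = F.log_softmax(total, dim=1)
# diagonal = log-probability assigned to each positive pair
nce = -torch.diag(log_probs).mean()
assert nce.item() > 0
```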
Hello @jefflai108,
I think the accuracy should also have batch*self.timestep as its denominator.
Possibly for other models too, although I did not check them.
Hi, thanks for sharing your implementation of CPC. I've been trying to run it out of the box but am having issues shaping the input data correctly. Is there another script that encodes the wav file directories into .h5?
Dear Jeff,
Thank you so much for providing this great repository! Sincerely appreciate your great implementation!
However, after reading all the closed issues and trying to initialize training, I am still a bit confused about the training and test datasets. I tried to run run.sh and the following error was reported:
May I request what might be the possible solution of this? Thank you so much for your clarification!
Sincerely,
Martin
The paper does not mention the use of Batch Normalization in the case of the audio task.
In the case of the vision task, it mentions that "We did not use Batch-Norm [38]."
Thank you for sharing your code; I have run into a problem.
When we use CPC, the output is [128, 256], but the MFCCs are [frame, 39].
As a result, I wonder how to combine them into [frame, 39 + 256] dimensions.
Thanks again
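One possible way to combine them, assuming the 128 CPC frames and the MFCC frames cover the same stretch of audio (my assumption, not something the repo prescribes), is to upsample the CPC embeddings to the MFCC frame rate and concatenate:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_frames = 2048
mfcc = torch.randn(n_frames, 39)  # frame-level MFCC features
cpc = torch.randn(128, 256)       # downsampled CPC embeddings

# upsample CPC embeddings to the MFCC frame rate by linear interpolation
# (repeat_interleave would also work if the rates divide evenly)
cpc_up = F.interpolate(cpc.t().unsqueeze(0), size=n_frames,
                       mode='linear', align_corners=False)[0].t()
combined = torch.cat([mfcc, cpc_up], dim=1)  # (n_frames, 39 + 256)
assert combined.shape == (n_frames, 295)
```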
I don't understand how self.softmax() is updated during training, because the accuracy is not backpropagated. How does it work?
In model.py line 113 : output2, hidden1 = self.gru2(forward_seq, hidden1)
perhaps it should be : output2, hidden2 = self.gru2(forward_seq, hidden1)
?
Meaning, should that correct variable be +=?
@jefflai108 Are the negative samples drawn from other examples in the batch at the same timestep t?
I saw that list files such as "LibriSpeech/list/train.txt" are required parameters for main.py. It seems such files are not provided by LibriSpeech officially. What is their format? Could you provide them, or the script to generate them?
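In case it is just a list of utterance IDs (an assumption on my part; the format main.py actually expects may differ), something like this could generate it from an extracted LibriSpeech tree:

```python
# Hypothetical sketch: ASSUMES the list files contain one LibriSpeech
# utterance ID per line, derived from the .flac filenames.
from pathlib import Path

def write_utterance_list(corpus_root, list_path):
    """Collect utterance IDs from .flac filenames and write one per line."""
    ids = sorted(p.stem for p in Path(corpus_root).rglob("*.flac"))
    out = Path(list_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(ids) + "\n")
    return ids

# e.g. write_utterance_list("LibriSpeech/train-clean-100",
#                           "LibriSpeech/list/train.txt")
```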