krishnadn / x-vector-pytorch

Implementation of the paper "Spoken Language Recognition using X-vectors" in PyTorch

x-vector-pytorch's Introduction

x-vector-pytorch

This repo contains the implementation of the paper "Spoken Language Recognition using X-vectors" in PyTorch.
Paper: https://danielpovey.com/files/2018_odyssey_xvector_lid.pdf
Tutorial: https://www.youtube.com/watch?v=8nZjiXEdMH0

Installation

I suggest installing Anaconda3 on your system. First, download Anaconda3 from https://docs.anaconda.com/anaconda/install/hashes/lin-3-64/

bash Anaconda3-2019.03-Linux-x86_64.sh

Clone the repo

git clone https://github.com/KrishnaDN/x-vector-pytorch.git

Once Anaconda3 is installed successfully, install the required packages using requirements.txt

pip install -r requirements.txt

Create manifest files for training and testing

This step creates training and testing files.

python datasets.py --processed_data /media/newhd/youtube_lid_data/download_data --meta_store_path meta/
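
For readers who want to see what a manifest builder of this kind might look like, here is a minimal sketch. It is not the code in datasets.py. It assumes one sub-folder of .wav files per language and a "path label" line format, and the function name build_manifests and the 80/10/10 split are hypothetical. Only the output file names (training.txt, validation.txt, testing.txt) come from the training command in this README.

```python
import os
import random


def build_manifests(data_root, out_dir, splits=(0.8, 0.1, 0.1), seed=0):
    """Write training.txt, validation.txt and testing.txt under out_dir.

    Assumed layout (hypothetical): data_root/<language>/<utterance>.wav
    Assumed line format (hypothetical): "<path> <integer label>"
    """
    # One integer label per language folder, in alphabetical order.
    languages = sorted(
        name for name in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, name))
    )
    label_of = {lang: idx for idx, lang in enumerate(languages)}

    # Collect (path, label) pairs for every .wav file.
    entries = []
    for lang in languages:
        lang_dir = os.path.join(data_root, lang)
        for name in sorted(os.listdir(lang_dir)):
            if name.lower().endswith(".wav"):
                entries.append((os.path.join(lang_dir, name), label_of[lang]))

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(entries)
    n_train = int(splits[0] * len(entries))
    n_val = int(splits[1] * len(entries))
    parts = {
        "training.txt": entries[:n_train],
        "validation.txt": entries[n_train:n_train + n_val],
        "testing.txt": entries[n_train + n_val:],
    }

    os.makedirs(out_dir, exist_ok=True)
    for filename, rows in parts.items():
        with open(os.path.join(out_dir, filename), "w") as handle:
            for path, label in rows:
                handle.write(f"{path} {label}\n")
    return label_of
```

Calling build_manifests("/path/to/data", "meta/") would return the language-to-label mapping, which is worth saving so predictions can be mapped back to language names.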

Training

This step starts training the X-vector model for language identification.

python training_xvector.py --training_filepath meta/training.txt --testing_filepath meta/testing.txt --validation_filepath meta/validation.txt \
                             --input_dim 40 --num_classes 8 --batch_size 32 --use_gpu True --num_epochs 100

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. For any queries, contact: [email protected]

License

MIT

x-vector-pytorch's Issues

Extracting x-vecs from custom dataset

Hi, I would like to extract x-vector features from a custom set of utterances. What part of the code can be used to do so?

Dataset does not work in 'valid' or 'test' mode

In 'train' mode you produce data frames of exactly [400,257].
In the other modes you produce [x,257] data frames, where x is the length of the magnitude/phase spectrogram.

    if mode=='train':
        randtime = np.random.randint(0, mag_T.shape[1]-spec_len)
        spec_mag = mag_T[:, randtime:randtime+spec_len]
    else:
        spec_mag = mag_T

In validation mode you get a list of tensors of different sizes, which naturally causes an error at:

features = torch.from_numpy(np.asarray([torch_tensor.numpy().T for torch_tensor in sample_batched[0]])).float()

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64

I am puzzled by your train/valid approach in the dataset. Could you fix the project and clarify it?
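
One possible way to avoid the ragged batch, sketched here as an assumption rather than as the repository's fix, is to give every mode a fixed number of frames: a random crop in 'train' mode, a centre crop otherwise, and tiling along time for utterances shorter than spec_len. The helper name fixed_length_spec is hypothetical. mag_T is assumed to have shape (frequency bins, frames), as in the snippet above.

```python
import numpy as np


def fixed_length_spec(mag_T, spec_len=400, mode="train", rng=None):
    """Return a (freq_bins, spec_len) slice of mag_T in every mode."""
    rng = rng or np.random.default_rng()
    n_frames = mag_T.shape[1]

    # Utterances shorter than spec_len are repeated along the time axis
    # until they are long enough to be cropped.
    if n_frames < spec_len:
        repeats = -(-spec_len // n_frames)  # ceiling division
        mag_T = np.tile(mag_T, (1, repeats))
        n_frames = mag_T.shape[1]

    max_start = n_frames - spec_len
    if mode == "train":
        # Random crop for augmentation; max_start may be 0.
        start = int(rng.integers(0, max_start + 1))
    else:
        # Deterministic centre crop for 'valid' and 'test'.
        start = max_start // 2
    return mag_T[:, start:start + spec_len]
```

With equal-sized arrays, np.asarray over the batch yields a regular float array instead of an object array, so torch.from_numpy no longer fails. The trade-off is that validation only sees spec_len frames per utterance; evaluating whole utterances would instead need a batch size of 1 or a padding-aware collate function.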

Access to dataset

Hi,
Can you share a drive link to your dataset? I don't have the dataset for the Indian languages.

Thanks,
Satish

Hi, nice work!
I have one minor question. In the cited paper, Snyder et al. use MFCC features.
Here, however, it seems you are using linear spectrograms instead. Is that on purpose?
Do they perform better than MFCCs for the DNN case?

strange results of training and validation

In the "aishell" and "Merged_Arabic_Corpus_of_Isolated_Words" datasets, the predictions look like random guesses during training, and validation always yields the same prediction. Have you encountered this problem? Thanks.
""" The validation results are as follows (left: label, right: prediction):
[25] [11]
[39] [11]
[41] [11]
[130] [11]
[2] [11]
"""
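
A quick way to confirm that the model has collapsed to a single class is to count the distinct predictions next to the accuracy. The sketch below uses the five label/prediction pairs listed above; the helper name summarize_predictions is hypothetical and not part of the repository.

```python
from collections import Counter


def summarize_predictions(labels, predictions):
    """Report accuracy and how concentrated the predictions are."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    counts = Counter(predictions)
    top_class, top_count = counts.most_common(1)[0]
    return {
        "accuracy": correct / len(labels),
        "distinct_predictions": len(counts),
        "most_common_class": top_class,
        "most_common_share": top_count / len(predictions),
    }


# Values copied from the validation output quoted in this issue.
labels = [25, 39, 41, 130, 2]
predictions = [11, 11, 11, 11, 11]
print(summarize_predictions(labels, predictions))
```

Here every prediction is class 11, so distinct_predictions is 1. Since the labels go up to at least 130, it may also be worth checking that --num_classes covers the full label range (the README example uses 8), although that is only a guess at the cause.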

Inference function

Dear sir,
Could you please provide an inference function? Thanks!

about forward function in x_vector.py

I wonder why the forward step returns tdnn1_out.
I suppose that line should be commented out, otherwise model training will not work?

def forward(self, inputs):
    tdnn1_out = self.tdnn1(inputs)
    return tdnn1_out  # the early return in question

Also, what is the difference from x_vector_Indian_LID.py?
What does _Indian_LID mean?

Thanks
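
For comparison, a complete x-vector-style forward pass usually runs the frame-level layers, pools mean and standard deviation over time, and then applies the segment-level layers. The sketch below is an illustration under assumptions, not the contents of x_vector.py: the class name TinyXVector, the layer sizes, and the (batch, time, features) input layout are all placeholders.

```python
import torch
import torch.nn as nn


class TinyXVector(nn.Module):
    """Illustrative x-vector-style network; all sizes are placeholders."""

    def __init__(self, input_dim=40, num_classes=8, hidden=64, emb_dim=32):
        super().__init__()
        # Frame-level layers: dilated 1-D convolutions over time play the
        # role of TDNN layers.
        self.frame_layers = nn.Sequential(
            nn.Conv1d(input_dim, hidden, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3), nn.ReLU(),
        )
        # Segment-level layers operate on the pooled statistics.
        self.segment = nn.Linear(2 * hidden, emb_dim)
        self.classifier = nn.Linear(emb_dim, num_classes)

    def forward(self, inputs):
        # inputs: (batch, time, input_dim) -> (batch, input_dim, time)
        frames = self.frame_layers(inputs.transpose(1, 2))
        # Statistics pooling: mean and standard deviation over time.
        stats = torch.cat([frames.mean(dim=2), frames.std(dim=2)], dim=1)
        embedding = self.segment(stats)
        logits = self.classifier(torch.relu(embedding))
        return logits, embedding


model = TinyXVector()
batch = torch.randn(4, 200, 40)  # 4 utterances, 200 frames, 40 features
logits, embedding = model(batch)
print(logits.shape, embedding.shape)
```

Returning the embedding alongside the logits is one simple design choice that also serves inference and x-vector extraction: the logits drive the classification loss, and the embedding is the fixed-size utterance representation.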

You use TRAIN dataset for validation.

In file training_xvector.py, line 46:

dataloader_val = DataLoader(dataset_train, batch_size=args.batch_size,shuffle=True,collate_fn=speech_collate)

You use dataset_train for validation. This is clearly a copy-paste error.
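
A corrected setup would pass a separate validation dataset to the validation loader. The sketch below uses small random TensorDataset objects as stand-ins so that it runs on its own; the name dataset_val is an assumption about what the script would call its validation dataset, and the repository's speech_collate function is left out.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in datasets (random features and labels), for illustration only.
dataset_train = TensorDataset(torch.randn(64, 40), torch.randint(0, 8, (64,)))
dataset_val = TensorDataset(torch.randn(16, 40), torch.randint(0, 8, (16,)))

# Training data is shuffled every epoch.
dataloader_train = DataLoader(dataset_train, batch_size=32, shuffle=True)
# Validation uses its own dataset; shuffling is unnecessary here.
dataloader_val = DataLoader(dataset_val, batch_size=32, shuffle=False)

n_val_samples = sum(features.shape[0] for features, _ in dataloader_val)
print(n_val_samples)
```

Validating on the training data makes validation accuracy track training accuracy, so overfitting goes unnoticed.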

question about dataset

Hi, I have a question. I run your code on Windows; how can I obtain your dataset? I see that your dataset is a .txt file.

How well does the model perform?

Roughly what accuracy can be reached on the 8 languages?