xuanjihe

followers: 34 following: 0 repos: 43 gists: 0

Type: User

xuanjihe's Projects

additive-margin-softmax

This is the implementation of the paper "Additive Margin Softmax for Face Verification"

air-asvspoof

Implementation of the paper "One-class Learning towards Generalized Voice Spoofing Detection"

assert

JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).

auto-tuning-spectral-clustering

This repo is for the SPL paper "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap"

automatic_speech_recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

circleloss

PyTorch implementation of the paper "Circle Loss: A Unified Perspective of Pair Similarity Optimization"

cmu-thesis

Code for Yun Wang's PhD Thesis: Polyphonic Sound Event Detection with Weak Labeling

dcase2018_pooling

Repo for our pooling approach on the DCASE2018 task4

deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in TensorFlow

factorized-tdnn

PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi

gradientreversal

Gradient Reversal Layer for Domain Adaptation

kaldi

This is now the official location of the Kaldi project.

learning_invariances_in_speech_recognition

In this work I investigate the speech command task by developing and analyzing deep learning models. State-of-the-art systems use convolutional neural networks (CNNs) because they naturally learn correlated representations, which suits speech. In particular, I develop several CNNs trained on the Google Speech Commands dataset and test them in different scenarios. A main problem in speech recognition is the variation in how different people pronounce words: one way to build a model that is invariant to this variability is to augment the dataset by perturbing the input. I study two kinds of augmentation: Vocal Tract Length Perturbation (VTLP) and Synchronous Overlap and Add (SOLA), which locally perturb the input in frequency and time, respectively. The models trained on augmented data outperform all the models trained on the original dataset in accuracy, precision, and recall. The design of the CNN also affects how well invariances are learned: the Inception architecture helps to learn features that are invariant to speech variability by using convolution kernels of different sizes. Intuitively, this is because the model can implicitly detect speech patterns of different lengths in the audio features.
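
As a rough illustration of the frequency perturbation mentioned above, the sketch below shows one commonly used piecewise-linear form of the VTLP frequency warp. It is not taken from this repository: the function name `vtlp_warp`, the 16 kHz sample rate, and the 4800 Hz boundary `f_hi` are assumptions chosen for the example. Frequencies below a boundary are scaled by the warp factor `alpha`, and the remaining band is mapped linearly so that the Nyquist frequency stays fixed.

```python
def vtlp_warp(freq_hz, alpha, sample_rate=16000, f_hi=4800.0):
    """Warp one frequency (Hz) with a piecewise-linear VTLP-style mapping.

    alpha       : warp factor, typically drawn close to 1.0 (assumed range, e.g. 0.9 to 1.1)
    sample_rate : sampling rate in Hz (assumed default for this sketch)
    f_hi        : boundary frequency in Hz (assumed default for this sketch)
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    nyquist = sample_rate / 2.0
    # Below this boundary the frequency axis is simply scaled by alpha.
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha
    # Above the boundary, use the straight line that joins
    # (boundary, boundary * alpha) to (nyquist, nyquist).
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


if __name__ == "__main__":
    # Show how a few frequencies move for a compressing and a stretching factor.
    for alpha in (0.9, 1.1):
        warped = [round(vtlp_warp(f, alpha), 1) for f in (500.0, 2000.0, 6000.0)]
        print(alpha, warped)
```

In practice a warp like this is applied to the centre frequencies of the filter bank (or to spectrogram bins) with a fresh `alpha` per utterance, so each training example sees a slightly different frequency axis.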

lplda

Local Pairwise Linear Discriminant Analysis

momentumcontrast.pytorch

Reproduction of Momentum Contrast for Unsupervised Visual Representation Learning

multi-channel-speech-extraction-using-dnn

A TensorFlow implementation of my paper "Combining beamforming and deep neural networks for multi-channel speech extraction"

netvlad

netVLAD implementation in TensorFlow

pix2pix-tensorflow

TensorFlow implementation of "Image-to-Image Translation Using Conditional Adversarial Networks".

prefetch_generator

Simple package that makes your generator work in a background thread

pyaudioanalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

pytorch_xvectors

Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196

qamface

PyTorch implementation of Quadratic Additive Angular Margin Loss for Face Recognition

robustfisherlda

Robust Fisher Linear Discriminant Analysis

scaper

A library for soundscape synthesis and augmentation

segan

Speech Enhancement Generative Adversarial Network in TensorFlow

self-attentive-emb-tf

Simple TensorFlow implementation of "A Structured Self-attentive Sentence Embedding" (ICLR 2017)

speaker_embedding_moco

speaker_verification

TensorFlow implementation of generalized end-to-end loss for speaker verification
