ddxk
Type: User
Attention mechanism for processing sequential data that considers the context for each timestamp.
A Keras implementation of YOLOv3 (Tensorflow backend)
Lingvo
A collection of algorithms for images, faces, OCR, and speech
Notes and lessons learned from practicing and competing on Kaggle, written in Colab
End-to-end ASR implementation with PyTorch.
PyTorch implementation of Octave convolution
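Speech recognition: MFCC feature processing with a CNN
Pinyin-to-Chinese-character conversion, a pinyin input method engine (pin yin -> 拼音)
AI in practice: Chinese edition of practicalAI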
This library provides common speech features for ASR including MFCCs and filterbank energies.
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Future Cup University AI Challenge
scikit-learn: machine learning in Python
SincNet is a neural architecture for efficiently processing raw audio samples.
Tensorflow implementation of generalized end-to-end loss for speaker verification
Chinese speech recognition
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step for any pattern recognition problem involving speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These traditional features, so to speak, will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model classifiers, K-nearest neighbor classifiers, Bayes classifiers, and Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
A curated list of Tencent open-source projects
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
This is a complete project for voiceprint recognition, or speaker recognition. Previously, I uploaded a classic VGG-based model for speaker recognition. That model simply uses softmax loss for training. During testing, however, we found that the model is not very reliable. For example, it can easily distinguish speakers within man-man and man-woman groups, but has difficulty with woman-woman groups. So we tried another method, called triplet-group, to retrain our model, using triplet loss as the loss for backpropagation. I then uploaded our core code and the training curves for the two training stages. Why do I refer to "two training stages"? To answer that, you need to understand the triplet-group method. You are very welcome to email me: [email protected]
Utterance-level Aggregation For Speaker Recognition In The Wild
It is a complete project of voiceprint recognition or speaker recognition.