Thesis Speaker Verification
Upon hearing speech, in addition to identifying its content, it is natural for us to ask: who is the speaker? The figure gives an overview of the speaker information in speech. Speaker information is embedded in speech, but it is often corrupted by channel effects to some degree.
As with any machine learning model, Automatic Speaker Recognition requires training data and testing data. Speaker Identification determines whether the speaker of a testing utterance matches any of the training utterances, and hence it is a closed-set problem. At its core, Speaker Recognition optimizes a Sequence-to-One mapping function. From the task perspective, it is supposedly easier than Sequence-to-Sequence tasks, since it produces only one output per sequence. From the data perspective, however, it is much harder: compared to automatic speech recognition or machine translation, which are Sequence-to-Sequence mappings, there is very little data for automatic speaker recognition.
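To make the Sequence-to-One view and the closed-set decision concrete, the following is a minimal sketch and not the method of this thesis. It assumes mean pooling as the mapping function and cosine similarity as the score. The names embed, cosine, identify, and the toy speakers "alice" and "bob" are hypothetical and introduced only for illustration.

```python
import math

def embed(frames):
    """Map a variable-length sequence of frames to one fixed-size vector.
    Mean pooling is an assumption made for this sketch."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(test_frames, enrolled):
    """Closed-set identification: always return one of the enrolled speakers."""
    test_vec = embed(test_frames)
    return max(enrolled, key=lambda name: cosine(test_vec, enrolled[name]))

# Toy 2-dimensional "features"; real systems would use MFCC or similar.
enrolled = {
    "alice": embed([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]),
    "bob": embed([[0.1, 1.0], [0.0, 0.8]]),
}
test_utterance = [[0.8, 0.2], [1.2, 0.1], [1.0, 0.3], [0.9, 0.0]]
print(identify(test_utterance, enrolled))  # prints alice
```

Note that the utterances have different lengths, yet each is reduced to a single vector, and the decision is one label per sequence.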
This constitutes a data set of 300 features for each LPC coefficient. For more information, see the rbfnn toolbox in MATLAB. Audio files are read with the audioread function.
This code is written in MATLAB R2017a for speaker recognition using LPC and MFCC features.
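The MATLAB code itself is not reproduced here. As a rough illustration of how LPC coefficients can be obtained from one frame of speech, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. It is written in Python rather than MATLAB, and the function names autocorrelation and lpc are hypothetical; it is not claimed to match the original implementation.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(signal, order):
    """Levinson-Durbin recursion (autocorrelation method).
    Returns (a, err), where s[n] is approximated by
    sum over k of a[k-1] * s[n-k], and err is the final prediction error."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# Toy frame: a decaying exponential, where each sample is half the previous one.
frame = [0.5 ** n for n in range(50)]
coeffs, err = lpc(frame, 2)
print(coeffs)
```

For this toy frame the first coefficient comes out close to 0.5 and the second close to 0, which matches the way the frame was generated.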
The recognition accuracy obtained with the two feature sets is compared, and the analysis shows that MFCC features perform well for speaker recognition.
Feature Processing extracts low-level feature descriptors from the speech waveforms, such as Mel-Frequency Cepstral Coefficients (MFCC), Filter Bank features, Perceptual Linear Predictive (PLP) Analysis, or bottleneck features.
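As a small illustration of one ingredient shared by MFCC and Filter Bank features, the sketch below places filter center frequencies uniformly on the mel scale. It assumes the common 2595 * log10(1 + f / 700) variant of the mel formula; other variants exist. The function names and the choice of 10 filters up to 8000 Hz are hypothetical.

```python
import math

def hz_to_mel(hz):
    """Convert Hz to mel (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Center frequencies in Hz of filters equally spaced on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (num_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(num_filters)]

centers = mel_center_frequencies(10, 0.0, 8000.0)
for c in centers:
    print(round(c, 1))
```

The centers are dense at low frequencies and sparse at high frequencies, which is the property that makes mel-based features roughly follow human pitch perception. A full MFCC pipeline would additionally need framing, windowing, a Fourier transform, a logarithm, and a discrete cosine transform, none of which are shown here.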
Clustering differentiates acoustic units so that they can be processed separately; it is commonly adopted in speaker recognition, for example with a Gaussian Mixture Model (GMM).
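To show how a GMM can score speech frames, the following sketch evaluates the average log-likelihood of a few frames under two diagonal-covariance mixtures. Each mixture stands in for a speaker model, which is an assumption of this example. The parameters are toy values chosen by hand and are not trained, and all function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """log of sum_k w_k * N(x | mean_k, var_k), computed with log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))

def average_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is (weights, means, variances)."""
    return sum(gmm_log_likelihood(f, *gmm) for f in frames) / len(frames)

# Two hand-set 2-component mixtures in 2 dimensions (toy values).
gmm_a = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
gmm_b = ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]])
frames = [[0.2, 0.1], [0.9, 1.1]]

score_a = average_log_likelihood(frames, gmm_a)
score_b = average_log_likelihood(frames, gmm_b)
print("a" if score_a > score_b else "b")  # prints a
```

Each mixture component plays the role of one cluster of acoustic units, and the mixture weights combine the per-cluster scores into a single score per frame.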