Monday, July 13, 2020

Feature Extraction from Speech

Feature extraction is the most important step in building a speech recognizer: after the speech signal has been converted into the frequency domain, it must be turned into a usable form, a sequence of feature vectors. Techniques such as MFCC (mel-frequency cepstral coefficients), PLP (perceptual linear prediction) and PLP-RASTA can be used for this purpose.
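To make "converting the speech signal into the frequency domain" concrete, here is a minimal NumPy sketch of the short-time analysis that MFCC and filter bank features start from. The signal is cut into overlapping frames, each frame is windowed, and its power spectrum is computed with an FFT. The 25 ms frame, 10 ms hop, Hamming window and 512-point FFT are assumed common choices, and `power_spectrum_frames` is a hypothetical helper, not part of any library.

```python
import numpy as np

def power_spectrum_frames(signal, fs, frame_ms=25.0, step_ms=10.0, nfft=512):
    """Split a 1-D signal into overlapping, windowed frames and return the
    per-frame power spectrum, shape (num_frames, nfft // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(round(fs * frame_ms / 1000.0))   # samples per frame
    frame_step = int(round(fs * step_ms / 1000.0))   # samples between frame starts
    # Number of frames needed to cover the whole signal.
    if len(signal) <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((len(signal) - frame_len) / frame_step))
    # Zero-pad so the last frame is complete.
    pad_len = (num_frames - 1) * frame_step + frame_len
    padded = np.concatenate([signal, np.zeros(pad_len - len(signal))])
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    frames = padded[idx] * np.hamming(frame_len)
    # Periodogram estimate of each frame's power spectrum.
    return np.abs(np.fft.rfft(frames, n=nfft)) ** 2 / nfft
```

For one second of a 1000 Hz tone sampled at 16 kHz, this yields 99 frames of 257 frequency bins, with the strongest bin at index 32 (1000 Hz / 31.25 Hz per bin).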


Example

In the following example, we extract features from a speech signal step by step in Python, using the MFCC technique.

Import the necessary packages, as shown here:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile  # reads WAV files into NumPy arrays
from python_speech_features import mfcc, logfbank  # MFCC and log mel filter bank features

Next, read the stored audio file, replacing the path below with the location of your own file. wavfile.read returns two values: the sampling frequency (in Hz) and the audio signal as a NumPy array.

frequency_sampling, audio_signal = wavfile.read("/Users/admin/audio_file.wav")
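wavfile.read returns the samples in the file's own data type (for example int16) and, for a stereo file, a 2-D array of shape (n_samples, n_channels). The feature functions expect a 1-D signal, so a small hypothetical helper such as `read_mono` below can be used instead; it assumes that averaging the channels is acceptable for your recording.

```python
import numpy as np
from scipy.io import wavfile

def read_mono(path):
    """Read a WAV file and return (sampling_rate, 1-D float64 signal)."""
    fs, data = wavfile.read(path)
    data = np.asarray(data, dtype=np.float64)
    # Multi-channel files come back as (n_samples, n_channels);
    # average the channels to obtain a single mono signal.
    if data.ndim == 2:
        data = data.mean(axis=1)
    return fs, data
```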

Here we keep only the first 15000 samples for analysis:

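# Keep the first 15000 samples. For a stereo file (2-D array), also select one channel, e.g. audio_signal[:15000, 0]
audio_signal = audio_signal[:15000]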
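How much audio those 15000 samples cover depends on the file's sampling rate, which the post does not state. The rates used below are only examples:

```python
def samples_to_seconds(num_samples, fs):
    """Duration in seconds covered by num_samples at sampling rate fs (Hz)."""
    return num_samples / fs

# The 15000-sample excerpt at two common sampling rates:
print(samples_to_seconds(15000, 16000))            # 0.9375
print(round(samples_to_seconds(15000, 44100), 3))  # 0.34
```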

Apply the MFCC technique to extract the MFCC features:

# Result: one row per analysis window, one column per cepstral coefficient
features_mfcc = mfcc(audio_signal, frequency_sampling)
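Under the hood, python_speech_features derives MFCCs from log mel filter bank energies: it applies a type-II DCT across the filters, keeps the first numcep coefficients (13 by default), applies a sinusoidal lifter (ceplifter, 22 by default) and, with appendEnergy=True (the default), replaces the first coefficient with the log of the frame energy. The hypothetical helper below sketches only the DCT and lifter steps, assuming an input of log filter bank energies shaped (num_windows, nfilt); it is an illustration, not the library's code.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_log_fbank(log_fbank, numcep=13, ceplifter=22):
    """Turn log mel filter bank energies (num_windows, nfilt) into
    cepstral coefficients (num_windows, numcep)."""
    # A type-II DCT along the filter axis decorrelates neighbouring filter
    # outputs; only the first numcep (low-quefrency) coefficients are kept.
    cep = dct(log_fbank, type=2, axis=1, norm="ortho")[:, :numcep]
    if ceplifter > 0:
        # Sinusoidal liftering boosts the higher-order coefficients.
        n = np.arange(cep.shape[1])
        cep = cep * (1 + (ceplifter / 2.0) * np.sin(np.pi * n / ceplifter))
    return cep
```

A constant row of log energies maps to a single nonzero coefficient (index 0), which shows why the DCT compacts smooth spectral envelopes into the first few cepstral values.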

Print the dimensions of the MFCC feature matrix, that is, the number of windows (frames) and the number of coefficients per window:

print('\nMFCC:\nNumber of windows =', features_mfcc.shape[0])
print('Length of each feature =', features_mfcc.shape[1])
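The number of windows printed above is fixed by how the signal is cut into frames. Assuming python_speech_features' default framing (winlen=0.025 s windows, winstep=0.01 s hop, lengths rounded half up to whole samples, and a zero-padded final frame), it can be predicted without computing any features; the length of each MFCC feature is numcep, 13 by default. `expected_num_windows` is a hypothetical helper for illustration.

```python
import math

def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves rounded up."""
    return int(math.floor(x + 0.5))

def expected_num_windows(num_samples, fs, winlen=0.025, winstep=0.01):
    """Predict the number of analysis windows for num_samples samples at fs Hz."""
    frame_len = round_half_up(winlen * fs)    # samples per window
    frame_step = round_half_up(winstep * fs)  # samples between window starts
    if num_samples <= frame_len:
        return 1  # a short signal still yields one (zero-padded) window
    # One window at the start plus enough hops to cover the remaining samples.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)

print(expected_num_windows(15000, 16000))  # 93
print(expected_num_windows(15000, 44100))  # 33
```

The sampling rates here are examples, not taken from the original audio file. The log filter bank output shares the same window count and has nfilt values per window (26 by default).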


Now, plot and visualize the MFCC features:

# Transpose so that rows are coefficients and columns are time windows
features_mfcc = features_mfcc.T
plt.matshow(features_mfcc)
plt.title('MFCC')
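matshow labels the horizontal axis with window indices. If you prefer time in seconds, or need to save the plot on a machine without a display, a sketch like the hypothetical `plot_features` below can be used instead. It assumes features shaped (num_windows, num_coeffs) and a known hop size in seconds (0.01 s is the library's default winstep), and it draws on a matplotlib Figure directly so no interactive backend is required.

```python
import numpy as np
from matplotlib.figure import Figure

def plot_features(features, winstep, title, path):
    """Save a heat map of features (num_windows, num_coeffs) with time in seconds."""
    num_windows, num_coeffs = features.shape
    fig = Figure()
    ax = fig.subplots()
    # Transpose so coefficients run vertically and time runs horizontally.
    img = ax.imshow(features.T, origin="lower", aspect="auto",
                    extent=[0, num_windows * winstep, 0, num_coeffs])
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Coefficient index")
    ax.set_title(title)
    fig.colorbar(img, ax=ax)
    fig.savefig(path)
```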


Next, extract the log filter bank features:

filterbank_features = logfbank(audio_signal, frequency_sampling)
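logfbank returns the logarithm of the energy captured by a bank of triangular filters spaced evenly on the mel scale. The sketch below shows how such a filter bank can be built and applied to a precomputed per-frame power spectrum. The 26 filters mirror the library's default nfilt; the 512-point FFT, the helper names and the omission of steps such as pre-emphasis are assumptions of this illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(nfilt, nfft, fs, lowfreq=0.0, highfreq=None):
    """Triangular mel filters, shape (nfilt, nfft // 2 + 1)."""
    highfreq = fs / 2.0 if highfreq is None else highfreq
    # nfilt + 2 points evenly spaced in mel give each filter its edges and centre.
    mels = np.linspace(hz_to_mel(lowfreq), hz_to_mel(highfreq), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for j in range(nfilt):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):   # rising edge
            fbank[j, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fbank[j, k] = (right - k) / (right - center)
    return fbank

def log_fbank_energies(power_frames, fbank):
    """Log energy per filter for each frame of a (num_frames, nfft // 2 + 1) power spectrum."""
    energies = power_frames @ fbank.T
    # Clamp to machine epsilon so silent frames do not produce log(0).
    return np.log(np.maximum(energies, np.finfo(np.float64).eps))
```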

Print the dimensions of the filter bank feature matrix:

print('\nFilter bank:\nNumber of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])


Finally, plot the filter bank features and display both figures:

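# Transpose so that rows are filters and columns are time windows
filterbank_features = filterbank_features.T
plt.matshow(filterbank_features)
plt.title('Filter bank')
plt.show()  # shows the MFCC and filter bank figures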

Running the steps above produces two plots: Figure 1 shows the MFCC features and Figure 2 shows the filter bank features.



The next post will discuss the recognition of spoken words.