Friday, July 10, 2020

Visualizing Audio Signals - Reading from a File and Working on it

This is the first step in building speech recognition system as it gives an understanding of how an audio signal is structured. Some common steps that can be followed to work with audio signals are as follows:

Recording

When you have to read the audio signal from a file, record it using a microphone.

Sampling

When recording with microphone, the signals are stored in a digitized form. But to work upon it, the machine needs them in the discrete numeric form. Hence, we should perform sampling at a certain frequency and convert the signal into the discrete numerical form. Choosing the high frequency for sampling implies that when humans listen to the signal, they feel it as a continuous audio signal.

Example

The following example shows a stepwise approach to analyze an audio signal, using Python, which is stored in a file. The frequency of this audio signal is 44,100 HZ.

Import the necessary packages as shown here:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile


Now, read the stored audio file. It will return two values: the sampling frequency and the audio signal. Provide the path of the audio file where it is stored, as shown here:

frequency_sampling, audio_signal = wavfile.read("/Users/admin/audio_file.wav")

Display the parameters like sampling frequency of the audio signal, data type of signal and its duration, using the commands shown:

print('\nSignal shape:', audio_signal.shape)
print('Signal Datatype:', audio_signal.dtype)
print('Signal duration:', round(audio_signal.shape[0] / float(frequency_sampling), 2), 'seconds')


This step involves normalizing the signal as shown below:

audio_signal = audio_signal / np.power(2, 15)

In this step, we are extracting the first 100 values from this signal to visualize. Use the following commands for this purpose:

audio_signal = audio_signal [:100]
time_axis = 1000 * np.arange(0, len(signal), 1) / float(frequency_sampling)

Now, visualize the signal using the commands given below:

plt.plot(time_axis, signal, color='blue')
plt.xlabel('Time (milliseconds)')
plt.ylabel('Amplitude')
plt.title('Input audio signal')
plt.show()


You would be able to see an output graph and data extracted for the above audio signal as shown in the image here:


Signal shape: (132300,)
Signal Datatype: int16
Signal duration: 3.0 seconds

This is how we read from a file. In the next post we'll discuss about characterizing an audio signal.
Share:

0 comments:

Post a Comment