Monday, July 6, 2020

Slicing and Extracting Statistics from Time Series Data

Slicing means retrieving only part of the time series data. In this example, we slice the data to keep only the years 1980 to 1990, using the following code:

import matplotlib.pyplot as plt
# Select only the observations dated 1980 through 1990 (both ends inclusive) and plot them
timeseries['1980':'1990'].plot()
plt.show()

Running this code plots only the 1980 to 1990 portion of the time series.
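Label-based slicing on a DatetimeIndex accepts partial date strings, and both endpoints are inclusive, so '1980':'1990' keeps every observation dated in 1990 as well. The minimal sketch below checks this on a small synthetic daily series; the data and the name `ts` are made up for illustration and are not the post's `timeseries`. It uses `.loc`, which pandas recommends for label-based selection.

```python
import numpy as np
import pandas as pd

# Synthetic daily series from 2000-01-01 through 2002-12-31 (illustrative data only)
idx = pd.date_range("2000-01-01", "2002-12-31", freq="D")
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Partial-string slicing: both endpoints are inclusive, so all of 2001 is kept
sliced = ts.loc["2000":"2001"]

print(sliced.index.min())  # 2000-01-01 00:00:00
print(sliced.index.max())  # 2001-12-31 00:00:00
print(len(sliced))         # 731 (366 days in leap year 2000 + 365 in 2001)
```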


Extracting Statistics from Time Series Data

When you need to draw important conclusions from data, you will often have to extract statistics from it. Mean, variance, correlation, maximum value, and minimum value are examples of such statistics. The following code shows how to extract some of them from a given time series:

Mean

Use the mean() function to find the mean:

timeseries.mean()

For the example data, the output is:

-0.11143128165238671

Maximum

Use the max() function to find the maximum:

timeseries.max()

For the example data, the output is:

3.4952999999999999

Minimum

Use the min() function to find the minimum:

timeseries.min()

For the example data, the output is:

-4.2656999999999998
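The statistics listed at the start of this section also include variance and correlation, which the examples above do not show. Both are available as pandas Series methods: `var()` returns the sample variance (`ddof=1` by default), `autocorr(lag)` returns the correlation of a series with a lagged copy of itself (a common way to measure correlation within a single time series), and `corr()` correlates two aligned series. This is a minimal sketch on a tiny made-up series, not the post's data.

```python
import pandas as pd

# Small illustrative series (made-up values)
ts = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
               index=pd.date_range("2020-01-01", periods=8, freq="D"))

# Sample variance (divides by n - 1) and population variance (divides by n)
sample_var = ts.var()
population_var = ts.var(ddof=0)  # 4.0 for these values
print(sample_var, population_var)

# Lag-1 autocorrelation: Pearson correlation between ts[t] and ts[t-1]
lag1 = ts.autocorr(lag=1)
print(lag1)

# Correlation between two aligned series; a linear transform gives a value close to 1.0
other = ts * 2 + 1
print(ts.corr(other))
```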

To calculate several summary statistics at once (count, mean, standard deviation, minimum, quartiles, and maximum), use the describe() function:

timeseries.describe()

For the example data, the output is:

count    817.000000
mean      -0.111431
std        1.003151
min       -4.265700
25%       -0.649430
50%       -0.042744
75%        0.475720
max        3.495300
dtype: float64
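describe() bundles several of the individual statistics shown above, so its rows should agree with the corresponding standalone methods. The sketch below checks that on a small made-up series and also shows the `percentiles` argument, which controls which quantiles appear in the summary (the median is always included). Note that the `std` row is the sample standard deviation, the same value `std()` returns by default.

```python
import pandas as pd

# Made-up values standing in for a time series
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0],
               index=pd.date_range("2020-01-01", periods=5, freq="D"))

summary = ts.describe()

# Each row of describe() matches the standalone method
assert summary["count"] == ts.count()
assert summary["mean"] == ts.mean()
assert summary["std"] == ts.std()
assert summary["min"] == ts.min()
assert summary["50%"] == ts.median()
assert summary["max"] == ts.max()

# Ask for the 10th and 90th percentiles instead of the quartiles
custom = ts.describe(percentiles=[0.1, 0.9])
print(custom)
```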

Re-sampling

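You can resample the data to a different time frequency. Resampling takes two parameters:
  1. Time period: the new frequency to convert to, such as "A" for annual
  2. Method: how the values within each period are aggregated, such as mean() or median()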
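To make the two parameters concrete, here is a minimal sketch on four made-up daily values that straddle a month boundary. It uses the month-start alias "MS" as the time period and mean() (then sum()) as the method; both the data and the choice of "MS" are illustrative assumptions, not taken from the post.

```python
import pandas as pd

# Four made-up daily observations: Jan 30, Jan 31, Feb 1, Feb 2
ts = pd.Series([1.0, 3.0, 5.0, 7.0],
               index=pd.date_range("2020-01-30", periods=4, freq="D"))

# Time period "MS": one bin per calendar month, labelled by the month's first day
# Method mean(): January -> (1 + 3) / 2 = 2.0, February -> (5 + 7) / 2 = 6.0
monthly_mean = ts.resample("MS").mean()

# Same bins, different method: January -> 4.0, February -> 12.0
monthly_sum = ts.resample("MS").sum()

print(monthly_mean)
print(monthly_sum)
```

Changing either parameter changes the result independently: the period decides which observations land in the same bin, and the method decides how each bin is reduced to one number.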

Re-sampling with mean()

You can use the following code to resample the data to annual frequency ("A") and aggregate each year with the mean() method:

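# "A" = annual bins labelled at year end (pandas 2.2+ warns and suggests "YE" instead)
timeseries_mm = timeseries.resample("A").mean()
timeseries_mm.plot(style='g--')  # green dashed line
plt.show()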

Running this code plots the annual means as a green dashed line.
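An annual mean produced by resampling is simply the mean of each calendar year's values. The sketch below checks that on three years of made-up daily data by comparing the resampled result with a plain group-by on the year. It passes the offset object `pd.offsets.YearEnd()` instead of the string "A"; that is an assumption made here so the example behaves the same across pandas versions, since newer releases prefer "YE" over "A".

```python
import numpy as np
import pandas as pd

# Three full years of made-up daily data
idx = pd.date_range("2018-01-01", "2020-12-31", freq="D")
ts = pd.Series(np.random.default_rng(0).normal(size=len(idx)), index=idx)

# Annual bins labelled at year end, aggregated with the mean
annual = ts.resample(pd.offsets.YearEnd()).mean()

# The same numbers computed by hand: group by calendar year and average
by_year = ts.groupby(ts.index.year).mean()

print(annual)
print(np.allclose(annual.to_numpy(), by_year.to_numpy()))  # True
```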


Re-sampling with median()

You can use the following code to resample the data using the median() method:

# Annual bins again, this time aggregated with the median
timeseries_mm = timeseries.resample("A").median()
timeseries_mm.plot()
plt.show()

Running this code plots the annual medians.
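The median is less sensitive than the mean to extreme values inside a period, which is a common reason to resample with median() instead. The sketch below shows the difference on six made-up observations, one of which is an outlier; monthly bins ("MS") are used only to keep the example small.

```python
import pandas as pd

# Made-up observations: three in January (100.0 is an outlier) and three in February
idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                      "2020-02-01", "2020-02-02", "2020-02-03"])
ts = pd.Series([1.0, 2.0, 100.0, 4.0, 5.0, 6.0], index=idx)

monthly_mean = ts.resample("MS").mean()      # January is pulled up to about 34.33
monthly_median = ts.resample("MS").median()  # January stays at 2.0

print(pd.DataFrame({"mean": monthly_mean, "median": monthly_median}))
```

Without outliers (February here) the two methods agree, so the choice mostly matters for noisy or spiky series.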


Rolling Mean

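You can use the following code to calculate the rolling (moving) mean, where each point is the average of a fixed-size window of consecutive observations; here the window holds 12 observations: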

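# Mean over a trailing window of 12 observations (center=False labels each window at its right edge)
timeseries.rolling(window=12, center=False).mean().plot(style='-g')
plt.show()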


Running this code plots the 12-observation rolling mean as a solid green line.
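Note that `window=12` counts observations, not a time unit; for monthly data it would span one year. The sketch below uses a window of 3 on six made-up values so the arithmetic is easy to follow. It shows the trailing window used in the post (`center=False`), a centered window, and `min_periods`, which controls how many observations a window needs before it produces a value (by default, the full window size).

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
               index=pd.date_range("2020-01-01", periods=6, freq="D"))

# Trailing window: each value is the mean of itself and the 2 previous values;
# the first window-1 positions are NaN because min_periods defaults to the window size
trailing = ts.rolling(window=3, center=False).mean()
# -> NaN, NaN, 2.0, 3.0, 4.0, 5.0

# Centered window: each value is the mean of itself and one neighbour on each side
centered = ts.rolling(window=3, center=True).mean()
# -> NaN, 2.0, 3.0, 4.0, 5.0, NaN

# Allow partial windows at the start instead of NaN
partial = ts.rolling(window=3, min_periods=1).mean()
# -> 1.0, 1.5, 2.0, 3.0, 4.0, 5.0
```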


That concludes this post. The next post covers analyzing sequential data with Hidden Markov Models (HMMs) in detail.

