Monday, May 13, 2019

Pandas - 35 (Data Visualization - Handling Date Values)

Date-time data is one of the most common sources of trouble in data analysis. Displaying it along an axis (normally the x-axis) can be problematic, especially when it comes to managing ticks. The following program illustrates the problem:

import datetime
import matplotlib.pyplot as plt

# Eight irregularly spaced observation dates
events = [
    datetime.date(2015, 1, 23), datetime.date(2015, 1, 28),
    datetime.date(2015, 2, 3), datetime.date(2015, 2, 21),
    datetime.date(2015, 3, 15), datetime.date(2015, 3, 24),
    datetime.date(2015, 4, 8), datetime.date(2015, 4, 24),
]

# One reading per date
readings = [12, 22, 25, 20, 18, 15, 17, 14]

plt.plot(events, readings)
plt.show()

The program above draws a line chart of eight points, with the date values shown on the x-axis as full dates (day, month, and year). The output of the program is shown below:

As the chart shows, automatic tick management, and tick labels in particular, can be a disaster. Dates expressed this way are hard to read, the time elapsed between one point and the next is not clear, and the labels overlap.
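
To see what "automatic management" means here, we can ask the axis which locator and formatter it picked on its own. The following is only a minimal sketch: it uses two of the dates from the program above, and it assumes a non-interactive backend (Agg) so that it runs without opening a window. The class names it prints may differ between Matplotlib versions and settings.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: no display needed; remove this line to view the figure
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# First and last dates from the program above, with their readings
events = [datetime.date(2015, 1, 23), datetime.date(2015, 4, 24)]
readings = [12, 14]

fig, ax = plt.subplots()
ax.plot(events, readings)

# Plotting date objects makes Matplotlib install date-aware tick helpers
locator = ax.xaxis.get_major_locator()
formatter = ax.xaxis.get_major_formatter()
print(type(locator).__name__)
print(type(formatter).__name__)
```

These automatically chosen objects are what we replace when we set our own locators and formatter.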

To manage dates, it is therefore advisable to define a time scale with appropriate objects. First we need to import matplotlib.dates, a module specialized for this type of data. Then we define the time scales, in this case one of days and one of months, with the DayLocator() and MonthLocator() classes. Formatting is also very important here: to avoid overlap or unnecessary references, we limit the tick labels to the essentials, which in this case is year-month. This format is passed as an argument to the DateFormatter() class.
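
Before touching the plot, we can try the format string on a single date. This is a small sketch, not part of the original program: the names sample and label are only for illustration. A DateFormatter object can be called with a Matplotlib date number, which mdates.date2num() produces from a date.

```python
import datetime

import matplotlib.dates as mdates

# The same year-month format string used for the tick labels
time_fmt = mdates.DateFormatter('%Y-%m')

# Illustrative sample date (one of the dates in our dataset)
sample = datetime.date(2015, 3, 15)

# Convert the date to Matplotlib's numeric representation, then format it
label = time_fmt(mdates.date2num(sample))
print(label)  # 2015-03
```

The day is dropped from the label, which is exactly what keeps the tick labels short.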

Once the two scales are defined, one for the days and one for the months, we can set two different kinds of ticks on the x-axis by calling the set_major_locator() and set_minor_locator() methods of the xaxis object. To set the text format of the month tick labels, we use the set_major_formatter() method.

Let's implement the changes in our previous program and improve the plot:

import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

months = mdates.MonthLocator()             # major ticks: one per month
days = mdates.DayLocator()                 # minor ticks: one per day
timeFmt = mdates.DateFormatter('%Y-%m')    # major tick labels as year-month

events = [
    datetime.date(2015, 1, 23), datetime.date(2015, 1, 28),
    datetime.date(2015, 2, 3), datetime.date(2015, 2, 21),
    datetime.date(2015, 3, 15), datetime.date(2015, 3, 24),
    datetime.date(2015, 4, 8), datetime.date(2015, 4, 24),
]

readings = [12, 22, 25, 20, 18, 15, 17, 14]

fig, ax = plt.subplots()
ax.plot(events, readings)
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(timeFmt)
ax.xaxis.set_minor_locator(days)
plt.show()

The output of the program is shown below:

As we can see in the chart, the tick labels on the x-axis now refer only to the months, making the plot more readable.
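
If several charts need the same treatment, the three calls can be wrapped in a small function. The helper below is only a sketch: the name use_month_ticks and its fmt parameter are hypothetical and not part of Matplotlib. It again assumes the Agg backend so that it runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: no display needed; remove this line to view the figure
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def use_month_ticks(ax, fmt='%Y-%m'):
    """Hypothetical helper: monthly major ticks, daily minor ticks, labels in fmt."""
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt))
    ax.xaxis.set_minor_locator(mdates.DayLocator())
    return ax


fig, ax = plt.subplots()
# First and last dates of the dataset, enough to exercise the helper
ax.plot([datetime.date(2015, 1, 23), datetime.date(2015, 4, 24)], [12, 14])
use_month_ticks(ax)
```

Passing a different format string, for example '%b %Y', would change only the label text and leave the tick positions as they are.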


Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!