Friday, September 27, 2019

Adding dates to our graph using the datetime module

In the previous post we drew a simple temperature data plot. Let’s add dates to our graph to make it more useful. The first date from the weather data file is in the second row of the file:

"USW00025333","SITKA AIRPORT, AK US","2018-07-01","0.25",,"62","50"

The data will be read in as a string, so we need a way to convert a string such as "2018-07-01" to an object representing that date. We can construct such an object using the strptime() method of the datetime class, which lives in the datetime module. Let’s see how strptime() works with the help of the following program, which parses the date '2019-09-26':

from datetime import datetime
first_date = datetime.strptime('2019-09-26', '%Y-%m-%d')
print(first_date)


We first import the datetime class from the datetime module. Then we call the method strptime() using the string containing the date we want to work with as its first argument. The second argument tells Python how the date is formatted. In this example, Python interprets '%Y-' to mean the part of the string before the first dash is a four-digit year; '%m-' means the part of the string before the second dash is a number representing the month; and '%d' means the last part of the string is the day of the month, from 1 to 31.

When we run the program we get the following:

2019-09-26 00:00:00
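To tie this back to the weather file, here is a small sketch that parses only the sample line quoted at the start of this post and converts its date field. The variable name sample_line is just an illustrative choice, and the snippet works on that single line rather than opening the real file:

```python
import csv
from datetime import datetime

# The sample line from the weather file, copied from the start of this post.
sample_line = '"USW00025333","SITKA AIRPORT, AK US","2018-07-01","0.25",,"62","50"'

# csv.reader accepts any iterable of lines, so a one-item list works here.
row = next(csv.reader([sample_line]))
print(repr(row[2]))        # the date field is still a plain string

# Convert the string to a datetime object.
first_date = datetime.strptime(row[2], '%Y-%m-%d')
print(first_date)          # midnight is filled in because no time was given
print(first_date.date())   # just the calendar date
print(first_date.year, first_date.month, first_date.day)
```

The 00:00:00 in the printed value appears because a datetime object always carries a time; when the format string contains no time codes, strptime() sets the time to midnight.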


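The strptime() method accepts a variety of format codes that tell it how to interpret the date. Some of the most common are %A (weekday name, such as Monday), %B (month name, such as January), %m (month as a number, 01 to 12), %d (day of the month, 01 to 31), %Y (four-digit year), %y (two-digit year), %H (hour in 24-hour format), %I (hour in 12-hour format), %p (AM or PM), %M (minutes, 00 to 59) and %S (seconds, 00 to 61).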
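As a rough illustration, the sketch below tries a few numeric format codes from the standard datetime documentation. The date strings are made-up examples, not values from the weather file, and the name-based codes such as %A and %B are left out because their results depend on the system locale:

```python
from datetime import datetime

# Four-digit year, month and day separated by dashes.
iso_style = datetime.strptime('2018-07-01', '%Y-%m-%d')

# Month/day/two-digit-year separated by slashes.
us_style = datetime.strptime('07/01/18', '%m/%d/%y')

# Date plus a time: %H is the 24-hour hour, %M minutes, %S seconds.
with_time = datetime.strptime('2018-07-01 14:30:05', '%Y-%m-%d %H:%M:%S')

print(iso_style == us_style)    # both strings describe the same day
print(with_time.hour, with_time.minute, with_time.second)

# strftime() goes the other way and turns a datetime into a string.
print(iso_style.strftime('%d.%m.%Y'))
```

If the string does not match the format, strptime() raises a ValueError, so the format string has to mirror the data exactly, separators included.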





Now let's improve our temperature data plot by extracting dates for the daily highs and passing those highs and dates to plot(). See the following program:

import csv
import matplotlib.pyplot as plt
from datetime import datetime
filename = 'data/sitka_weather_07-2018_simple.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
       
    # Get dates and high temperatures from this file.
    dates, highs = [], []
       
    for row in reader:
        current_date = datetime.strptime(row[2], '%Y-%m-%d')
        high = int(row[5])
        dates.append(current_date)
        highs.append(high)
      
# Plot the high temperatures.
# Note: Matplotlib 3.6 and later name this style 'seaborn-v0_8'.
plt.style.use('seaborn')
fig, ax = plt.subplots()
ax.plot(dates, highs, c='red')

# Format plot.
plt.title("Daily high temperatures, July 2018", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()

We create two empty lists to store the dates and high temperatures from the file. We then convert the data containing the date information (row[2]) to a datetime object and append it to dates. We pass the dates and the high temperature values to plot(). The call to fig.autofmt_xdate() draws the date labels diagonally to prevent them from overlapping. The figure below shows the improved graph:


If we add more data we can get a more complete picture of the weather in Sitka. The file sitka_weather_2018_simple.csv contains a full year’s worth of weather data for Sitka, so we'll use it now. Just copy this file into the same folder where you saved the previous data file. The following code generates a graph for the entire year’s weather:

import csv
import matplotlib.pyplot as plt
from datetime import datetime
filename = 'data/sitka_weather_2018_simple.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
       
    # Get dates and high temperatures from this file.
    dates, highs = [], []
       
    for row in reader:
        current_date = datetime.strptime(row[2], '%Y-%m-%d')
        high = int(row[5])
        dates.append(current_date)
        highs.append(high)       
       
# Plot the high temperatures.
# Note: Matplotlib 3.6 and later name this style 'seaborn-v0_8'.
plt.style.use('seaborn')
fig, ax = plt.subplots()
ax.plot(dates, highs, c='red')

# Format plot.
plt.title("Daily high temperatures - 2018", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()


We've only changed the data source file name and the plot title. The output is as shown below:


So far our plots have used only the high temperature values. To make the graph even more useful, we can include the low temperatures. To do so, we need to extract the low temperatures from the data file and then add them to our graph, as shown in the following program:

import csv
import matplotlib.pyplot as plt
from datetime import datetime
filename = 'data/sitka_weather_2018_simple.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
       
    # Get dates, and high and low temperatures from this file.
    dates, highs, lows = [], [], []
       
    for row in reader:
        current_date = datetime.strptime(row[2], '%Y-%m-%d')
        high = int(row[5])
        low = int(row[6])
        dates.append(current_date)
        highs.append(high)
        lows.append(low)
       
       
# Plot the high and low temperatures.
# Note: Matplotlib 3.6 and later name this style 'seaborn-v0_8'.
plt.style.use('seaborn')
fig, ax = plt.subplots()
ax.plot(dates, highs, c='red')
ax.plot(dates, lows, c='green')

# Format plot.
plt.title("Daily high and low temperatures - 2018", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()

In the program above we add an empty list, lows, to hold the low temperatures, and then extract and store the low temperature for each date from the seventh position in each row (row[6]). Next we add a call to plot() for the low temperatures and color these values green. Finally we update the title of the plot to mention the low temperatures. The output chart is shown below:


Our next step will be to add a finishing touch to the graph by using shading to show the range between each day’s high and low temperatures. To do so, we’ll use the fill_between() method, which takes a series of x-values and two series of y-values, and fills the space between the two y-value series as shown in the program below:

import csv
import matplotlib.pyplot as plt
from datetime import datetime
filename = 'data/sitka_weather_2018_simple.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
       
    # Get dates, and high and low temperatures from this file.
    dates, highs, lows = [], [], []
       
    for row in reader:
        current_date = datetime.strptime(row[2], '%Y-%m-%d')
        high = int(row[5])
        low = int(row[6])
        dates.append(current_date)
        highs.append(high)
        lows.append(low)
       
       
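# Plot the high and low temperatures.
# Note: Matplotlib 3.6 and later name this style 'seaborn-v0_8'.
plt.style.use('seaborn')
fig, ax = plt.subplots()
ax.plot(dates, highs, c='red', alpha=0.5)
ax.plot(dates, lows, c='green', alpha=0.5)
# Shade the region between the two series on the same axes.
ax.fill_between(dates, highs, lows, facecolor='blue', alpha=0.3)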

# Format plot.
plt.title("Daily high and low temperatures - 2018", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()

The alpha argument controls a color’s transparency. An alpha value of 0 is completely transparent, and 1 (the default) is completely opaque. By setting alpha to 0.5, we make the red and green plot lines appear lighter. Next we pass fill_between() the list dates for the x-values and then the two y-value series highs and lows. The facecolor argument determines the color of the shaded region; we give it a low alpha value of 0.3 so the filled region connects the two data series without distracting from the information they represent. The following figure shows the plot with the shaded region between the highs and lows:



The shading helps make the range between the two data sets immediately apparent.
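To get a feel for what those alpha values do, here is a rough sketch of standard alpha blending over an opaque background. The blend() helper is a hypothetical function written for this illustration, not part of Matplotlib, and it assumes a plain white background, so the exact colors on a styled plot will differ:

```python
def blend(foreground, background, alpha):
    """Mix two RGB colors (components from 0 to 1) using an alpha value."""
    return tuple(
        round(alpha * fg + (1 - alpha) * bg, 2)
        for fg, bg in zip(foreground, background)
    )

WHITE = (1.0, 1.0, 1.0)
BLUE = (0.0, 0.0, 1.0)
RED = (1.0, 0.0, 0.0)

# The shaded region: blue at alpha 0.3 over a white background.
shaded = blend(BLUE, WHITE, 0.3)

# The high-temperature line: red at alpha 0.5 over a white background.
line = blend(RED, WHITE, 0.5)

print(shaded)
print(line)
```

With an alpha of 0 the helper returns the background unchanged, and with an alpha of 1 it returns the foreground color, which matches the transparent and opaque extremes described above.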

I'll end this post here. In the next post we'll look at how missing data can raise exceptions that crash our programs.


