Tuesday, July 30, 2019

Data visualization using Matplotlib (Plotting a Simple Line Graph)

Data visualization is closely associated with data analysis, which uses code to explore the patterns and connections in a data set. A data set can be made up of a small list of numbers that fits in one line of code or it can be many gigabytes of data.

Data visualization involves exploring data through visual representations. When a representation of a data set is simple and visually appealing, its meaning becomes clear to viewers. People will see patterns and significance in your data sets that they never knew existed. With Python’s efficiency, we can quickly explore data sets made of millions of individual data points on just a laptop. The data points don’t have to be numbers, either; we can analyze nonnumerical data as well. Python is used for data-intensive work in genetics, climate research, political and economic analysis, and much more.

Data scientists have written an impressive array of visualization and analysis tools in Python. One of the most popular is Matplotlib, a mathematical plotting library. We’ve used Matplotlib in a previous post and will continue to use it to make simple plots, such as line graphs and scatter plots.

Plotting a Simple Line Graph

Our objective is to plot a simple line graph using Matplotlib, and then customize it to create a more informative data visualization. We’ll use the sequence of cube numbers 1, 8, 27, 64, 125 as the data for the graph. See the code below:

import matplotlib.pyplot as plt

cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()
ax.plot(cubic_values)
plt.show()

Let's walk through the code. The first line imports the pyplot module under the alias plt so we don’t have to type pyplot repeatedly. The pyplot module contains a number of functions that generate charts and plots.

Next, we create a list, cubic_values, which contains the data to be plotted. Following Matplotlib convention, we call the subplots() function, which can generate one or more plots in the same figure. The variable fig represents the entire figure, that is, the collection of plots that are generated. The variable ax represents a single plot in the figure and is the variable we’ll use most of the time; here we plot the data with its plot() method. Finally, the function plt.show() opens Matplotlib’s viewer and displays the plot, as shown in the figure below.
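
To make the roles of fig and ax more concrete, here is a minimal sketch that asks subplots() for two plots in one figure. The second data set (square_values) and the 1-row, 2-column layout are assumptions added purely for illustration; they are not part of the example above.

```python
import matplotlib.pyplot as plt

cubic_values = [1, 8, 27, 64, 125]
square_values = [1, 4, 9, 16, 25]  # hypothetical second data set, for illustration only

# Ask for 1 row and 2 columns of plots; axes is then an array of two plot objects.
fig, axes = plt.subplots(1, 2)

axes[0].plot(cubic_values)   # left plot
axes[1].plot(square_values)  # right plot

# The figure keeps track of every plot it contains.
print(len(fig.axes))  # 2

# plt.show()  # uncomment to open the viewer
```

With a single plot, as in the rest of this post, subplots() returns one plot object instead of an array, which is why we can write fig, ax = plt.subplots().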


The plot shows the cube values in increasing order, but the label type is too small and the line is a little thin to read easily. Let's customize these features of the visualization. See the code below:

import matplotlib.pyplot as plt

cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()
ax.plot(cubic_values, linewidth=3)

# Setting chart title and label axes
ax.set_title("Cube Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Cube of Value", fontsize=14)
plt.show()

The linewidth parameter controls the thickness of the line that plot() generates. The set_title() method sets a title for the chart. The fontsize parameters, which appear repeatedly throughout the code, control the size of the text in various elements on the chart.
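
One way to confirm what these parameters do is to read the settings back from the objects Matplotlib creates. The sketch below relies on plot() returning a list of line objects; the variable name line is our own choice, and plt.show() is left out because we only inspect the objects here.

```python
import matplotlib.pyplot as plt

cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()

# plot() returns a list of line objects, one per line drawn; we drew one.
(line,) = ax.plot(cubic_values, linewidth=3)
ax.set_title("Cube Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)

print(line.get_linewidth())           # thickness of the line, in points
print(ax.get_title())                 # text of the chart title
print(ax.title.get_fontsize())        # size of the title text, in points
print(ax.xaxis.label.get_fontsize())  # size of the x-axis label text, in points
```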

The set_xlabel() and set_ylabel() methods allow you to set a title for each of the axes, as shown in the figure below:

Adding the following call just before plt.show() styles the tick labels on both axes:

# Set size of tick labels.
ax.tick_params(axis='both', labelsize=14)
plt.show()


The tick_params() method styles the tick marks. Its axis argument selects which axes are affected; here axis='both' applies the change to both the x- and y-axes. The labelsize argument sets the font size of the tick labels, in our case to 14 (labelsize=14). The final output is shown below:
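
The axis argument also accepts 'x' or 'y' when only one axis should change. As a small sketch (choosing 'x' here is just for illustration), the following styles only the x-axis tick labels and then reads both font sizes back to compare them:

```python
import matplotlib.pyplot as plt

cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()
ax.plot(cubic_values, linewidth=3)

# Style only the x-axis tick labels; the y-axis keeps Matplotlib's default size.
ax.tick_params(axis='x', labelsize=14)

x_size = ax.get_xticklabels()[0].get_fontsize()
y_size = ax.get_yticklabels()[0].get_fontsize()
print(x_size, y_size)
```

The first number printed should be 14, while the second depends on the default settings of your Matplotlib installation.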

Did you notice that, according to the plot above, the cube of 4 is 125? Let's fix this. When you give plot() a single sequence of numbers, it assumes the first data point corresponds to an x-coordinate value of 0, but our first point corresponds to an x-value of 1. We can override the default behavior by giving plot() both the input and output values used to calculate the cubes. See the code below:

import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()
ax.plot(input_values, cubic_values, linewidth=3)

# Setting chart title and label axes
ax.set_title("Cube Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Cube of Value", fontsize=14)

# Set size of tick labels.
ax.tick_params(axis='both', labelsize=14)
plt.show()

Now plot() will graph the data correctly because we’ve provided both the input and output values, so it doesn’t have to assume how the output numbers were generated. The resulting plot, shown in the figure below, is correct.
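
To see the x-values that plot() assumes by default, we can inspect the lines it returns. This is a minimal sketch; the names default_line, explicit_line, default_x and explicit_x are our own, and both lines are drawn on the same plot only to keep the example short.

```python
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()

# Only y-values given: Matplotlib numbers the points starting from 0.
(default_line,) = ax.plot(cubic_values)
# Both x- and y-values given: the points sit at the x-values we supplied.
(explicit_line,) = ax.plot(input_values, cubic_values)

# Convert to plain Python floats so the printed lists are easy to read.
default_x = [float(v) for v in default_line.get_xdata()]
explicit_x = [float(v) for v in explicit_line.get_xdata()]

print(default_x)   # [0.0, 1.0, 2.0, 3.0, 4.0]
print(explicit_x)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

In the first case 125 is paired with an x-value of 4, which is exactly the mistake we saw in the earlier plot.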

We can specify numerous arguments when calling plot() and use a number of methods to customize our plots. We’ll continue to explore these customization options as we work with more interesting data sets in the coming posts.
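
As a small taste of those extra arguments, here is a sketch using three that plot() accepts: color, marker, and linestyle. The specific values are arbitrary picks for illustration, not recommendations.

```python
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
cubic_values = [1, 8, 27, 64, 125]
fig, ax = plt.subplots()

# color, marker, and linestyle are keyword arguments of plot();
# the values below are arbitrary choices for illustration.
(line,) = ax.plot(input_values, cubic_values, linewidth=3,
                  color='green', marker='o', linestyle='--')

ax.set_title("Cube Numbers", fontsize=24)

# plt.show()  # uncomment to open the viewer
```

This should draw a dashed green line with a round marker at each data point.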



