Wednesday, May 15, 2019

Pandas - 36 (Data Visualization- Chart Typology)

Choosing the proper type of chart is a fundamental decision. Remember that an excellent data analysis represented incorrectly can lead to a wrong interpretation of the experimental results. After getting familiar with the main graphic elements of a chart, it is time to see a series of examples covering different types of charts, starting from the most common ones, such as line charts, bar charts, and pie charts, up to some that are more sophisticated but still commonly used.

1. Line Charts

The line chart is the simplest of all the chart types. A line chart is a sequence of data points connected by a line. Each data point consists of a pair of values (x, y), which is placed in the chart according to the scales of the two axes (x and y).

The following program creates a line chart:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-2*np.pi,2*np.pi,0.01)
y = np.sin(3*x)/x
plt.plot(x,y)
plt.show()


We begin by plotting the points generated by a mathematical function. Consider the following function:

y = sin (3 * x) / x
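
One caveat about this function: it is undefined at x = 0, where numerator and denominator both vanish. The program above never hits that point only because the 0.01 step of np.arange() happens not to land exactly on zero. As a minimal sketch, assuming you want the curve to be defined everywhere, the hypothetical helper sin_nx_over_x() below (not part of the original program) rewrites the function in terms of NumPy's np.sinc(), which returns the limit value at zero:

```python
import numpy as np

def sin_nx_over_x(x, n=3):
    """Evaluate sin(n*x)/x, returning the limit value n at x = 0.

    Hypothetical helper for illustration only. It relies on the identity
    np.sinc(t) = sin(pi*t) / (pi*t), with np.sinc(0) defined as 1.
    """
    x = np.asarray(x, dtype=float)
    return n * np.sinc(n * x / np.pi)

# An odd number of samples puts the midpoint at x = 0 (up to rounding).
x = np.linspace(-2 * np.pi, 2 * np.pi, 1001)
y = sin_nx_over_x(x, n=3)

print(sin_nx_over_x(0.0))       # value at the former singularity
print(np.all(np.isfinite(y)))   # no NaN or inf anywhere in the curve
```

Away from zero the helper agrees with np.sin(n*x)/x, so it can be dropped into the plotting code without changing the curve.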

To create a sequence of data points, we need two NumPy arrays. First we create an array containing the x values for the x-axis. To define a sequence of increasing values we use the np.arange() function. Since the function is sinusoidal, we choose values that are multiples and submultiples of π (np.pi). Then, using this sequence of values, we obtain the y values by applying the np.sin() function directly to them and dividing by x. Finally we plot them by calling the plot() function. The output of the program is shown below:

Now we display a family of functions: y = sin (n * x) / x , varying the parameter n. See the following program:

x = np.arange(-2*np.pi,2*np.pi,0.01)
y = np.sin(3*x)/x
y2 = np.sin(2*x)/x
y3 = np.sin(4*x)/x
plt.plot(x,y)
plt.plot(x,y2)
plt.plot(x,y3)
plt.show()


The output of the program is shown below:

All the plots are represented on the same scale; that is, the data points of each series refer to the same x-axis and y-axis. This is because each call to the plot() function takes into account the previous calls to the same function, so the Figure accumulates the changes, keeping memory of the previous commands until it is displayed. Did you notice that a different color is automatically assigned to each line? We can also select the type of stroke, the color, and so on. As the third argument of the plot() function we can pass a string that combines a code for the color with a code for the line style. Another possibility is to use two separate kwargs: color to define the color, and linestyle to define the stroke.
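
To make the two styling options concrete, here is a minimal sketch that draws the same curve once with a format string and once with separate kwargs, then reads the resulting properties back from the line objects. It uses plt.subplots() and the Agg backend so that it runs without opening a window; both choices are assumptions made for this illustration, not something the programs in this post require.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import numpy as np

x = np.arange(-2 * np.pi, 2 * np.pi, 0.01)
y = np.sin(3 * x) / x

fig, ax = plt.subplots()

# Style given as a single format string: 'k' = black, '--' = dashed.
(line_fmt,) = ax.plot(x, y, 'k--')
# The same style given as two separate keyword arguments.
(line_kw,) = ax.plot(x, y, color='black', linestyle='--')
# No style given: each call takes the next color of the default cycle.
(line_auto1,) = ax.plot(x, y + 1)
(line_auto2,) = ax.plot(x, y + 2)

same_color = to_rgba(line_fmt.get_color()) == to_rgba(line_kw.get_color())
same_style = line_fmt.get_linestyle() == line_kw.get_linestyle()
print(same_color, same_style)
print(line_auto1.get_color(), line_auto2.get_color())
```

The format string is shorter to type, while the kwargs accept any color specification (such as a hex string), so the choice is mostly a matter of convenience.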

See the following program:
x = np.arange(-2*np.pi,2*np.pi,0.01)
y = np.sin(3*x)/x
y2 = np.sin(2*x)/x
y3 = np.sin(4*x)/x
plt.plot(x,y,'k--',linewidth=3)
plt.plot(x,y2,'m-.')
plt.plot(x,y3,color='#87a3cc',linestyle='--')
plt.show()

The output of the program is shown below:

As seen from the above chart we can define colors and line styles using character codes. Some common Color Codes are:

     Code    Color
     b       blue
     g       green
     r       red
     c       cyan
     m       magenta
     y       yellow
     k       black
     w       white
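
If you are curious what each code stands for numerically, matplotlib can translate it into an RGB triple. The following is a small sketch using matplotlib.colors.to_rgb(); the list of codes is simply the table above.

```python
from matplotlib.colors import to_rgb

# The single-letter color codes from the table above.
codes = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'w']

# Map each code to its (red, green, blue) components in the 0-1 range.
rgb_by_code = {code: to_rgb(code) for code in codes}

for code, rgb in rgb_by_code.items():
    print(code, rgb)
```

The same function accepts hex strings such as '#87a3cc', which is a handy way to check a custom color before using it in a plot.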

In the previous programs we defined a range from -2π to 2π on the x-axis, but by default the tick values are shown in plain numerical form. Therefore we need to replace the numerical values with multiples of π. We can also replace the ticks on the y-axis. To do all this, we use the xticks() and yticks() functions, passing two lists of values to each of them.
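
As a rough sketch of this two-list call, the snippet below places five ticks at multiples of π with plain-text labels and then reads them back by calling xticks() with no arguments. The label strings and the Agg backend (used so the snippet runs without a display) are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-2 * np.pi, 2 * np.pi, 0.01)
plt.plot(x, np.sin(x) / x)

# First list: where the ticks go. Second list: what each tick shows.
positions = [-2 * np.pi, -np.pi, 0, np.pi, 2 * np.pi]
labels = ['-2pi', '-pi', '0', '+pi', '+2pi']
plt.xticks(positions, labels)

plt.gcf().canvas.draw()      # render once so the label texts are populated
locs, texts = plt.xticks()   # no arguments: read the current ticks back
shown = [t.get_text() for t in texts]
print(locs)
print(shown)
```

Both lists must have the same length, since each label is paired with the position at the same index.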

The first list contains the positions where the ticks are to be placed, and the second contains the tick labels. In this particular case, we have to use strings in LaTeX format in order to display the symbol π correctly. We enclose each expression within two $ characters and add an r prefix to make the string a raw string. See the following program:

x = np.arange(-2*np.pi,2*np.pi,0.01)
y = np.sin(3*x)/x
y2 = np.sin(2*x)/x
y3 = np.sin(x)/x
plt.plot(x,y,color='b')
plt.plot(x,y2,color='r')
plt.plot(x,y3,color='g')
plt.xticks([-2*np.pi, -np.pi, 0, np.pi, 2*np.pi],
[r'$-2\pi$',r'$-\pi$',r'$0$',r'$+\pi$',r'$+2\pi$'])
plt.yticks([-1,0,1,2,3],
[r'$-1$',r'$0$',r'$+1$',r'$+2$',r'$+3$'])
plt.show()

The output of the program is a line chart showing Greek characters as shown below:

The charts we plotted so far always have the x-axis and y-axis placed at the edge of the figure (corresponding to the sides of the bounding border box). Another way of displaying axes is to have the two axes passing through the origin (0, 0), i.e., the two Cartesian axes.
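
The four sides of the bounding box are exposed on the Axes object as named "spines". As a minimal sketch of the idea, the snippet below lists those names, hides two sides, moves the other two to the origin, and reads the settings back. The parabola data and the Agg backend are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # illustrative data only

# The spines mapping has one entry per side of the bounding box.
spine_names = sorted(ax.spines.keys())
print(spine_names)

# Hide the sides that will not act as axes.
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
# Move the remaining two sides so they cross at the data point (0, 0).
ax.spines['bottom'].set_position(('data', 0))
ax.spines['left'].set_position(('data', 0))

bottom_pos = ax.spines['bottom'].get_position()
left_pos = ax.spines['left'].get_position()
top_alpha = ax.spines['top'].get_edgecolor()[3]  # 0 means fully transparent
print(bottom_pos, left_pos, top_alpha)
```

The ('data', 0) tuple means "place this spine at data coordinate 0", so the axes keep crossing at the origin even if the plotted range changes.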

To do this, we must first capture the Axes object through the gca() function. Then, through this object, we can select each of the four sides making up the bounding box by its position: right, left, bottom, and top. We hide the sides that do not match any axis (right and top) using the set_color() function and indicating none for the color. Then, the sides corresponding to the x- and y-axes are moved to pass through the origin (0,0) with the set_position() function. See the following program:

x = np.arange(-2*np.pi,2*np.pi,0.01)
y = np.sin(3*x)/x
y2 = np.sin(2*x)/x
y3 = np.sin(x)/x
plt.plot(x,y,color='b')
plt.plot(x,y2,color='r')
plt.plot(x,y3,color='g')
plt.xticks([-2*np.pi, -np.pi, 0, np.pi, 2*np.pi],
[r'$-2\pi$',r'$-\pi$',r'$0$',r'$+\pi$',r'$+2\pi$'])
plt.yticks([-1,0,1,2,3],
[r'$-1$',r'$0$',r'$+1$',r'$+2$',r'$+3$'])
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data',0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data',0))

plt.show()

The output of the program is shown below. It shows the chart with the two axes crossing in the middle of the figure, that is, at the origin of the Cartesian axes:

Sometimes it is very useful to be able to mark a particular point of the line with a notation, and optionally to add an arrow that better indicates the position of the point. For example, this notation may be a LaTeX expression, such as the formula for the limit of the function sin(x)/x as x tends to 0.

In this regard, matplotlib provides a function called annotate(), which is especially useful in these cases, even if the numerous kwargs needed to obtain a good result can make its settings quite complex. The first argument is the string to be displayed, containing the expression in LaTeX; then we can add the various kwargs. The point of the chart to annotate is indicated by a list containing its coordinates [x, y], passed to the xy kwarg. The offset of the text from the highlighted point is defined by the xytext kwarg, and the two are connected by a curved arrow whose characteristics are defined in the arrowprops kwarg. See the following program:

x = np.arange(-2*np.pi,2*np.pi,0.01)
y = np.sin(3*x)/x
y2 = np.sin(2*x)/x
y3 = np.sin(x)/x
plt.plot(x,y,color='b')
plt.plot(x,y2,color='r')
plt.plot(x,y3,color='g')
plt.xticks([-2*np.pi, -np.pi, 0, np.pi, 2*np.pi],
[r'$-2\pi$',r'$-\pi$',r'$0$',r'$+\pi$',r'$+2\pi$'])
plt.yticks([-1,0,1,2,3],
[r'$-1$',r'$0$',r'$+1$',r'$+2$',r'$+3$'])

plt.annotate(r'$\lim_{x\to 0}\frac{\sin(x)}{x}= 1$', xy=[0,1],
xycoords='data',xytext=[30,30],fontsize=16,textcoords='offset points',
arrowprops=dict(arrowstyle="->",connectionstyle="arc3,rad=.2"))

ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data',0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data',0))

plt.show()

 
The output of the program is shown below:

We get the chart with the mathematical notation of the limit, which is the point shown by the arrow.
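
To see how the pieces of annotate() fit together, here is a minimal sketch that first checks the limit numerically and then creates the same annotation on an empty Axes, reading back the annotated point and the text offset. The sample x values and the Agg backend are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Numerical check of the limit: sin(x)/x approaches 1 as x shrinks.
small_x = np.array([1e-1, 1e-3, 1e-6])
ratios = np.sin(small_x) / small_x
print(ratios)

fig, ax = plt.subplots()
ann = ax.annotate(
    r'$\lim_{x\to 0}\frac{\sin(x)}{x}= 1$',
    xy=(0, 1), xycoords='data',                   # the point being annotated
    xytext=(30, 30), textcoords='offset points',  # text offset from that point
    fontsize=16,
    arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2"),
)

# xy is the annotated point, xyann the text position (here an offset in points).
print(ann.xy, ann.xyann, ann.arrow_patch is not None)
```

Because textcoords is 'offset points', the label stays 30 points up and to the right of the point regardless of the axis limits.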

2.  Line Charts with pandas

Visualizing the data in a dataframe as a line chart is a very simple operation. We pass the dataframe as an argument to the plot() function, together with the x values, to obtain a multiseries line chart, as shown in the following program:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {'series1':[1,3,4,3,5],
'series2':[2,4,5,2,4],
'series3':[3,2,3,1,3]}

df = pd.DataFrame(data)
x = np.arange(5)
plt.axis([0,5,0,7])
plt.plot(x,df)
plt.legend(data, loc=2)
plt.show()


The output of the program, shown below, is a multiseries line chart displaying the data within a pandas dataframe.
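
As an alternative sketch, a dataframe also has its own plot() method, which draws one line per column and takes the legend labels from the column names. The snippet below reuses the same data; the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {'series1': [1, 3, 4, 3, 5],
        'series2': [2, 4, 5, 2, 4],
        'series3': [3, 2, 3, 1, 3]}
df = pd.DataFrame(data)

# DataFrame.plot() returns the matplotlib Axes it drew on.
ax = df.plot()
ax.axis([0, 5, 0, 7])  # same axis limits as the program above

line_labels = [line.get_label() for line in ax.get_lines()]
first_series = list(ax.get_lines()[0].get_ydata())
print(line_labels)
print(first_series)
```

Since the method returns a regular Axes object, everything shown earlier in this post (ticks, spines, annotations) can be applied to it in the same way.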

Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!