Pandas - 37 Data Visualization- Chart Typology (Histograms and other charts) ~ Python is easy to learn

Histogram

A histogram is commonly used in statistical studies to show how samples are distributed. It consists of adjacent rectangles erected on the x-axis, which is split into discrete intervals called bins; the area of each rectangle is proportional to the frequency of the occurrences in its bin.

pyplot provides a special function called hist() to draw a histogram. The hist() function also performs the calculation behind the histogram: it is sufficient to pass a series of sample values and the number of bins, and hist() divides the range of the samples into that many intervals (bins) and counts the occurrences in each one. In addition to drawing the chart, it returns the result as a tuple (n, bins, patches): the counts per bin, the edges of the bins, and the rectangles that were drawn. See the following program:

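import numpy as np
import matplotlib.pyplot as plt
# 100 random integers from 0 up to (but not including) 100
pop = np.random.randint(0,100,100)
n,bins,patches = plt.hist(pop,bins=20)
plt.show()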

In the above program we generate a population of 100 random integers from 0 to 99 using the random.randint() function (its upper bound, 100, is exclusive). Then we create the histogram of these samples by passing them as an argument to the hist() function. Here we want to divide the occurrences into 20 bins, so we set the bins kwarg to 20 (if it is not specified, the default value is 10 bins). The resulting histogram is the output of the program.
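
The tuple returned by hist() can also be inspected directly. The following is a minimal sketch and not part of the original program: it assumes a seeded NumPy generator (so that the run is repeatable) and the non-interactive Agg backend (so that it runs without a display), and it simply prints the sizes of the returned values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)         # seeded generator, for repeatability
pop = rng.integers(0, 100, size=100)   # 100 integers in the range 0..99

n, bins, patches = plt.hist(pop, bins=20)

print(len(n))         # number of counts: one per bin
print(len(bins))      # number of bin edges: one more than the number of bins
print(int(n.sum()))   # total of the counts: every sample falls in one bin
print(len(patches))   # number of rectangles drawn
```

Since bins holds the edges of the intervals, it always has one element more than n.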

Bar Charts

Vertical Bar Charts

A bar chart is very similar to a histogram, but in this case the x-axis does not reference numerical values but categories. A bar chart is drawn with the bar() function as shown in the following program:

index = [0,1,2,3,4]
values = [5,7,3,4,6]
plt.bar(index,values)
plt.show()

The output of the program is shown below:

We can see from the output that the indices are drawn on the x-axis under each bar: at its center in matplotlib 2.0 and later, where bars are centered on their x positions by default, or at its left edge in older releases. As each bar corresponds to a category, it is better to name the categories through tick labels, defined by a list of strings passed to the xticks() function. The first argument of xticks() is a list of the positions of these tick labels on the x-axis; since the bars are centered on their indices, we pass index itself. See the following program:

index = np.arange(5)
values = [5,7,3,4,6]
plt.bar(index,values)
# bars are centered on index by default, so the labels go at index
plt.xticks(index,['A','B','C','D','E'])
plt.show()

The output of the program is shown below:

We can refine the bar chart by passing further kwargs to the bar() function. For example, we can add the standard deviation of each bar through the yerr kwarg, which takes a list containing the standard deviations. This kwarg is usually combined with another kwarg called error_kw, which in turn accepts a dictionary of kwargs specialized for drawing error bars. Two of them used here are ecolor, which specifies the color of the error bars, and capsize, which defines the length of the transverse lines that mark the ends of the error bars.

Another kwarg that we can use is alpha, which indicates the degree of transparency of the colored bar. Alpha is a value ranging from 0 to 1. At 0 the bar is completely transparent; it becomes gradually more opaque as the value increases, until at 1 the color is fully represented.

The use of a legend is recommended, so in this case we use a kwarg called label to identify the series that we are representing. See the following program:

index = np.arange(5)
values = [5,7,3,4,6]
std1 = [0.8,1,0.4,0.9,1.3]
plt.title('A Bar Chart')

plt.bar(index,values,yerr=std1,error_kw={'ecolor':'0.1',
        'capsize':6},alpha=0.7,label='First')
# bars are centered on index by default (matplotlib 2.0 and later)
plt.xticks(index,['A','B','C','D','E'])
plt.legend(loc=2)
plt.show()

The output of the program is shown below:

Horizontal Bar Charts

Besides vertically oriented bar charts there are also bar charts oriented horizontally. This mode is implemented by a special function called barh(). The arguments and the kwargs valid for the bar() function remain the same for this function.

The only change that we have to take into account is that the roles of the axes are reversed: the error bars are now passed through xerr instead of yerr, and the category labels are set with yticks() instead of xticks(). See the following program:

index = np.arange(5)
values = [5,7,3,4,6]
std1 = [0.8,1,0.4,0.9,1.3]
plt.title('A Horizontal Bar Chart')

plt.barh(index,values,xerr=std1,error_kw={'ecolor':'0.1',
         'capsize':6},alpha=0.7,label='First')
# bars are centered on index by default (matplotlib 2.0 and later)
plt.yticks(index,['A','B','C','D','E'])

plt.legend(loc=5)
plt.show()

Now the categories are represented on the y-axis and the numerical values are shown on the x-axis, as shown in the output of the program:

Multiseries Bar Charts

We can use bar charts to display several series of values simultaneously. In this case, however, it is necessary to clarify how to structure a multiseries bar chart.

So far we have defined a sequence of indices, each corresponding to a bar, to be assigned to the x-axis. These indices represent categories. In this case, however, we have several bars that must share the same category.

One approach used to overcome this problem is to divide the space occupied by an index (for convenience its width is 1) into as many parts as there are bars sharing that index. It is advisable to leave some of that space empty, so that it serves as a gap separating one category from the next. See the following program:

index = np.arange(5)
values1 = [5,7,3,4,6]
values2 = [6,6,4,5,7]
values3 = [5,6,5,4,6]
bw = 0.3
plt.axis([0,5,0,8])
plt.title('A Multiseries Bar Chart',fontsize=20)
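# align='edge' puts the left edge of each bar at the given x position
plt.bar(index,values1,bw,color='b',align='edge')
plt.bar(index+bw,values2,bw,color='g',align='edge')
plt.bar(index+2*bw,values3,bw,color='r',align='edge')
plt.xticks(index+1.5*bw,['A','B','C','D','E'])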
plt.show()

The output of the program is shown below:
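
The slot-splitting arithmetic described earlier in this section generalizes to any number of series. The following is a minimal sketch of that calculation only; group_layout() is a hypothetical helper written for illustration and is not part of matplotlib, and the gap of 0.1 is an assumed value.

```python
import numpy as np

def group_layout(n_series, gap=0.1):
    """Split a category slot of width 1 among n_series bars.

    Hypothetical helper, for illustration only. `gap` is the part of
    the slot left empty to separate one category from the next.
    Returns the bar width and the left-edge offset of each bar,
    relative to the category index.
    """
    bw = (1.0 - gap) / n_series
    offsets = np.arange(n_series) * bw
    return bw, offsets

bw, offsets = group_layout(3)

# The tick label belongs in the middle of the group: halfway between
# the left edge of the first bar and the right edge of the last one.
tick_offset = (offsets[0] + offsets[-1] + bw) / 2

print(bw, offsets, tick_offset)
```

With three series this gives a bar width of about 0.3 and a tick offset equal to 1.5 times the bar width, provided the bars are drawn with their left edges at index plus offset (align='edge').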

To get a multiseries horizontal bar chart we replace the bar() function with the corresponding barh() function and the xticks() function with the yticks() function. We also need to swap the ranges of values covered by the two axes in the axis() function. See the following program:

index = np.arange(5)
values1 = [5,7,3,4,6]
values2 = [6,6,4,5,7]
values3 = [5,6,5,4,6]
bw = 0.3
plt.axis([0,8,0,5])
plt.title('A Multiseries Horizontal Bar Chart',fontsize=20)
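# align='edge' puts the bottom edge of each bar at the given y position
plt.barh(index,values1,bw,color='b',align='edge')
plt.barh(index+bw,values2,bw,color='g',align='edge')
plt.barh(index+2*bw,values3,bw,color='r',align='edge')
plt.yticks(index+1.5*bw,['A','B','C','D','E'])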
plt.show()

The output of the program is shown below:

The pandas library can also draw the contents of a dataframe object, such as the results of a data analysis, directly as a bar chart, using matplotlib behind the scenes. We call the plot() method of the dataframe object and set its kind kwarg to the type of chart we want to represent, which in this case is bar. See the following program:

import pandas as pd
import matplotlib.pyplot as plt
data = {'series1':[1,3,4,3,5],
        'series2':[2,4,5,2,4],
        'series3':[3,2,3,1,3]}

df = pd.DataFrame(data)
df.plot(kind='bar')
plt.show()

The output of the program is shown below:

If required we can extract portions of the dataframe as NumPy arrays and use them by passing them separately as arguments to the matplotlib functions.
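
As a minimal sketch of that second approach, the following code pulls one column out of the dataframe as a NumPy array and passes it to bar() itself. The choice of column, the color, and the Agg backend (used so that the sketch runs without a display) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {'series1': [1, 3, 4, 3, 5],
        'series2': [2, 4, 5, 2, 4],
        'series3': [3, 2, 3, 1, 3]}
df = pd.DataFrame(data)

values = df['series2'].to_numpy()   # one column as a NumPy array
index = np.arange(len(values))      # one bar position per row

bars = plt.bar(index, values, color='g')
# label each bar with the corresponding row label of the dataframe
plt.xticks(index, [str(label) for label in df.index])
```

From here on the chart is an ordinary matplotlib bar chart, so every kwarg of bar() described above is available.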

To get the horizontal bar chart, set barh as the value of the kind kwarg as shown in the program below:

data = {'series1':[1,3,4,3,5],
        'series2':[2,4,5,2,4],
        'series3':[3,2,3,1,3]}

df = pd.DataFrame(data)
df.plot(kind='barh')
plt.show()

The output of the program is shown below:

Multiseries Stacked Bar Charts

A multiseries stacked bar chart, as the name suggests, represents a multiseries bar chart in stacked form: the bars of each category are drawn on top of one another. This is especially useful when we want to show the total value obtained by the sum of all the bars.

We can transform a simple multiseries bar chart into a stacked one by adding the bottom kwarg to each bar() call after the first. The bottom of each series must be the sum of all the series drawn beneath it, so that each bar starts where the previous one ends. See the following program:

series1 = np.array([3,4,5,3])
series2 = np.array([1,2,2,5])
series3 = np.array([2,3,3,4])
index = np.arange(4)

plt.axis([-0.5,3.5,0,15])
plt.title('A Multiseries Stacked Bar Chart')
plt.bar(index,series1,color='r')
plt.bar(index,series2,color='b',bottom=series1)
plt.bar(index,series3,color='g',bottom=(series2+series1))
plt.xticks(index,['Jan18','Feb18','Mar18','Apr18'])
plt.show()

The output of the program is shown below:
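
When there are many series, writing each bottom by hand becomes tedious. The following is a minimal sketch, not part of the original program, that derives every bottom from a cumulative sum; it assumes the series are stored as the rows of a single array and uses the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# one row per series, one column per category
series = np.array([[3, 4, 5, 3],
                   [1, 2, 2, 5],
                   [2, 3, 3, 4]])
index = np.arange(series.shape[1])

tops = np.cumsum(series, axis=0)   # running total after each series
bottoms = tops - series            # height at which each series starts

for row, bottom, color in zip(series, bottoms, ['r', 'b', 'g']):
    plt.bar(index, row, color=color, bottom=bottom)

print(bottoms)
print(tops[-1])   # total height of each stacked bar
```

The first row of bottoms is all zeros, the second equals the first series, and the third equals the sum of the first two, which matches the values passed by hand in the program above.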

As before, in order to create the equivalent horizontal stacked bar chart we replace the bar() function with the barh() function and adapt the other parameters: the bottom kwarg becomes left, and the ranges in the axis() function are swapped. The xticks() function is also replaced with the yticks() function, because the labels of the categories must now be reported on the y-axis.

See the following program:

index = np.arange(4)
series1 = np.array([3,4,5,3])
series2 = np.array([1,2,2,5])
series3 = np.array([2,3,3,4])

plt.axis([0,15,-0.5,3.5])
plt.title('A Multiseries Horizontal Stacked Bar Chart')
plt.barh(index,series1,color='r')
plt.barh(index,series2,color='g',left=series1)
plt.barh(index,series3,color='b',left=(series1+series2))
plt.yticks(index,['Jan18','Feb18','Mar18','Apr18'])
plt.show()

The output of the program is shown below:

Apart from using colors, another way to distinguish the various series is to use hatches, which fill the bars with strokes drawn in different patterns.

To do this, we first set the color of the bar to white and then use the hatch kwarg to define the pattern. Each hatch is coded by one of these characters (/, \, |, -, +, x, o, O, ., *), corresponding to the pattern filling the bar. The more a symbol is replicated, the denser the lines forming the hatch will be. For example, /// is denser than //, which is denser than /. See the following program:

index = np.arange(4)
series1 = np.array([3,4,5,3])
series2 = np.array([1,2,2,5])
series3 = np.array([2,3,3,4])

plt.axis([0,15,-0.5,3.5])
plt.title('A Multiseries Horizontal Stacked Bar Chart')
# edgecolor='k' draws the outlines and the hatch strokes in black
plt.barh(index,series1,color='w',edgecolor='k',hatch='xx')
plt.barh(index,series2,color='w',edgecolor='k',hatch='///', left=series1)
plt.barh(index,series3,color='w',edgecolor='k',hatch='\\\\\\',left=(series1+series2))
plt.yticks(index,['Jan18','Feb18','Mar18','Apr18'])
plt.show()

The output of the program is shown below:

The values contained in a dataframe object can also be drawn as a stacked bar chart with the plot() function. We only need to add the stacked kwarg set to True, as shown in the following program:

data = {'series1':[1,3,4,3,5],
        'series2':[2,4,5,2,4],
        'series3':[3,2,3,1,3]}

df = pd.DataFrame(data)
df.plot(kind='bar', stacked=True)
plt.show()

The output of the program is shown below:

Another very useful representation is the bar chart for comparison, where two series of values sharing the same categories are compared by drawing their bars in opposite directions along the y-axis. To do this, we make the y values of one of the two series negative.

We can also give the two series different fill colors by setting a different value for each on a specific kwarg: facecolor.

See the following program:

x0 = np.arange(8)
y1 = np.array([1,3,4,6,4,3,2,1])
y2 = np.array([1,2,5,4,3,3,2,1])
plt.ylim(-7,7)
plt.bar(x0,y1,0.9,facecolor='r')
plt.bar(x0,-y2,0.9,facecolor='b')
plt.xticks(())
plt.grid(True)
# bars are centered on x by default, so the value labels go at x
for x, y in zip(x0, y1):
    plt.text(x, y + 0.05, '%d' % y, ha='center', va='bottom')

for x, y in zip(x0, y2):
    plt.text(x, -y - 0.05, '%d' % y, ha='center', va='top')

plt.show()

The output of the program is shown below:

The values contained within a dataframe object can also be represented as a pie chart, but a pie chart can represent only one series at a time. See the following program:

data = {'series1':[1,3,4,3,5],
        'series2':[2,4,5,2,4],
        'series3':[3,2,3,1,3]}

df = pd.DataFrame(data)
plt.title('Pie Charts with a pandas Dataframe')
df['series1'].plot(kind='pie',figsize=(6,6))
plt.show()

In the above program we display only the values of the first series by specifying df['series1']. We specify the type of chart we want through the kind kwarg of the plot() function, which in this case is pie. To keep the pie circular we also pass a square figure size through the figsize kwarg. The output of the program shows that the values in a pandas dataframe can be drawn directly as a pie chart:
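
The angle of each slice is proportional to the share of its value in the total of the series. As a minimal sketch of that calculation (it is not part of the original program and draws nothing), the shares for series1 can be computed like this:

```python
import pandas as pd

data = {'series1': [1, 3, 4, 3, 5],
        'series2': [2, 4, 5, 2, 4],
        'series3': [3, 2, 3, 1, 3]}
df = pd.DataFrame(data)

# percentage of the pie taken by each value of the first series
shares = df['series1'] / df['series1'].sum() * 100
print(shares.tolist())
```

The values of series1 add up to 16, so the third value (4) takes a quarter of the pie.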

Contour Plots

Contour plots are suitable for displaying three-dimensional surfaces through a contour map composed of closed curves, each joining the points on the surface that are located at the same level, that is, that have the same z value.

To implement these plots we first need a function z = f(x, y) that generates a three-dimensional surface. Then, once we have defined the ranges of x and y values that delimit the area of the map to be displayed, we calculate the z value for each pair (x, y) by applying the function f(x, y), which gives a matrix of z values. Finally, using the contour() function, we generate the contour map of the surface. It is often desirable to add a color map to the contour map: the areas delimited by the level curves are filled with a color gradient, which is done with the contourf() function as shown in the following program:

dx = 0.01; dy = 0.01
x = np.arange(-2.0,2.0,dx)
y = np.arange(-2.0,2.0,dy)

X,Y = np.meshgrid(x,y)
def f(x,y):
    return (1 - y**5 + x**5)*np.exp(-x**2-y**2)

C = plt.contour(X,Y,f(X,Y),8,colors='black')
plt.contourf(X,Y,f(X,Y),8)
plt.clabel(C, inline=1, fontsize=10)
plt.show()

The output of the program is shown below:
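
To see what the matrix of z values looks like, the same calculation can be run on a much coarser grid. The following is a minimal sketch: the function f is the one used in the program above, while the grid of 5 x values and 3 y values is an assumption chosen to keep the arrays small.

```python
import numpy as np

def f(x, y):
    return (1 - y**5 + x**5) * np.exp(-x**2 - y**2)

x = np.linspace(-2.0, 2.0, 5)   # 5 x values: -2, -1, 0, 1, 2
y = np.linspace(-2.0, 2.0, 3)   # 3 y values: -2, 0, 2

# meshgrid repeats x along the rows and y along the columns, so that
# X[i, j] = x[j] and Y[i, j] = y[i]
X, Y = np.meshgrid(x, y)
Z = f(X, Y)                     # one z value per (x, y) pair

print(X.shape, Y.shape, Z.shape)
print(Z[1, 2])                  # the value of f at x = 0, y = 0
```

Z has one row per y value and one column per x value, which is the layout that contour() and contourf() expect together with X and Y.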

We can choose among a large number of available color maps by specifying one with the cmap kwarg. A color scale at the side of the graph is often needed as a reference for reading the values; it is added by calling the colorbar() function at the end of the code. See the following program:

dx = 0.01; dy = 0.01
x = np.arange(-2.0,2.0,dx)
y = np.arange(-2.0,2.0,dy)

X,Y = np.meshgrid(x,y)
def f(x,y):
    return (1 - y**5 + x**5)*np.exp(-x**2-y**2)

C = plt.contour(X,Y,f(X,Y),8,colors='black')
plt.contourf(X,Y,f(X,Y),8,cmap=plt.cm.hot)
plt.clabel(C, inline=1, fontsize=10)
plt.colorbar()
plt.show()

The output of the program is shown below:

Polar Charts

Polar charts are characterized by a series of sectors that extend radially; each of these sectors occupies a certain angle. Thus we can display two different values by assigning them to the magnitudes that characterize the polar chart: the extension of the radius r and the angle θ occupied by the sector.

These are in fact the polar coordinates (r, θ), an alternative to Cartesian coordinates for locating points on a plane. We can say that the polar chart is a kind of chart that has characteristics of both the pie chart and the bar chart.

As in the pie chart, the angle of each sector gives the share of that category with respect to the total. As in the bar chart, the radial extension is the numerical value of that category.

So far we have used the standard set of colors, with single characters as the color codes (e.g., r to indicate red). In fact we can use any sequence of colors we want by defining a list of string values that contain RGB codes in the #rrggbb format.

Finally, to get a polar chart we use the bar() function on polar axes and pass it the list of the angles θ and the list of the radial extension of each sector. The following program shows how to create a polar chart:

N = 8
theta = np.arange(0.,2 * np.pi, 2 * np.pi / N)
radii = np.array([4,7,5,3,1,5,6,7])
plt.axes([0.025, 0.025, 0.95, 0.95], polar=True)
colors = np.array(['#4bb2c5', '#c5b47f', '#EAA228', '#579575',
'#839557', '#958c12', '#953579', '#4b5de4'])
bars = plt.bar(theta, radii, width=(2*np.pi/N), bottom=0.0,
color=colors)
plt.show()

The output of the program is shown below:
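
The list of angles divides the full circle into N equal sectors. The program above builds it with np.arange() and a floating-point step, for which rounding can in principle change the number of elements returned. The following minimal sketch shows an alternative, np.linspace() with endpoint=False, which always returns exactly N angles; it is a suggestion and not the method used in the original program.

```python
import numpy as np

N = 8
width = 2 * np.pi / N   # angle occupied by each sector

# N evenly spaced angles starting at 0 and stopping before 2*pi
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)

print(len(theta))
print(np.degrees(theta))   # the same angles expressed in degrees
```

The resulting theta can be passed to bar() in place of the one built with np.arange().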

We can also specify a sequence of colors as strings with their actual name rather than using the format #rrggbb as shown in the following program :

N = 8
theta = np.arange(0.,2 * np.pi, 2 * np.pi / N)
radii = np.array([4,7,5,3,1,5,6,7])
plt.axes([0.025, 0.025, 0.95, 0.95], polar=True)
colors = np.array(['lightgreen', 'darkred', 'navy', 'brown',
'violet', 'plum', 'yellow', 'darkgreen'])
bars = plt.bar(theta, radii, width=(2*np.pi/N), bottom=0.0,
color=colors)
plt.show()

The output of the program is shown below:

Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!

Python is easy to learn

Wednesday, May 15, 2019
