Tuesday, May 7, 2019

Pandas - 31 (Data Visualization- The matplotlib Library)

matplotlib is a Python library specializing in the development of two-dimensional charts (it also supports 3D charts). It is one of the most widely used tools for the graphical representation of data because of:

• Extreme simplicity in its use
• Gradual development and interactive data visualization
• Expressions and text in LaTeX
• Greater control over graphic elements
• Export to many formats, such as PNG, PDF, SVG, and EPS

matplotlib takes full advantage of the potential that the Python programming language offers. It is a graphics library that lets us programmatically manage the graphic elements that make up a chart, so that the graphical display can be controlled in its entirety. Because the graphical representation is programmed, it can be reproduced across multiple environments and regenerated easily when we make changes or when the data is updated. With regard to data analysis, matplotlib normally works together with other libraries such as NumPy and pandas, but many other libraries can be integrated without any problem.

The graphical representations produced in code with this library can be exported in the most common graphic formats (such as PNG and SVG) and then used in other applications, documentation, web pages, etc.
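As a small illustration of the export step described above, here is a minimal sketch that renders the same chart to PNG and SVG. It writes to in-memory buffers only so that it leaves no files behind; the buffer names are made up for this example, and a file name such as 'chart.png' could be passed to savefig() instead.

```python
import io
from matplotlib.figure import Figure

# Build a small chart without opening any window
fig = Figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot([1, 2, 3, 4])

# Export the same figure in two formats (in-memory buffers, hypothetical names)
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")

svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")

print(len(png_buffer.getvalue()), "bytes of PNG")
print(len(svg_buffer.getvalue()), "bytes of SVG")
```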

Installation

There are many options for installing the matplotlib library. If you use a distribution of packages such as Anaconda or Enthought Canopy, installing the matplotlib package is very simple. For example, with the conda package manager, enter the following:

conda install matplotlib

If you want to install the package directly, the commands vary depending on the operating system.

On Debian-Ubuntu Linux systems, use this:

sudo apt-get install python-matplotlib

On Fedora-Redhat Linux systems, use this:

sudo yum install python-matplotlib

On current releases of these distributions the Python 3 package is named python3-matplotlib.

On Windows or macOS, use pip to install matplotlib:

pip install matplotlib

The matplotlib Architecture

matplotlib provides a set of functions and tools that allow representation and manipulation of a Figure (the main object), along with all the internal objects of which it is composed. matplotlib not only deals with graphics but also provides tools for event handling and for animating graphics. As a result, it can produce interactive charts that respond to events such as a key press or a mouse movement.

The architecture of matplotlib is logically structured into three layers, which are placed at three different levels.



The communication is unidirectional, that is, each layer can communicate with the underlying layer, while the lower layers cannot communicate with the top ones. The three layers are as follows:

• Scripting
• Artist
• Backend

Backend Layer

In the matplotlib architecture, the layer that works at the lowest level is the Backend layer. This layer contains the matplotlib APIs, a set of classes that implement the graphic elements at a low level:

• FigureCanvas is the object that embodies the concept of drawing area.
• Renderer is the object that draws on FigureCanvas.
• Event is the object that handles user inputs (keyboard and mouse events).
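The relationship between these objects can be sketched in a few lines. This is only an illustration, assuming the Agg (raster) backend, which ships with matplotlib; other backends provide their own FigureCanvas and Renderer classes.

```python
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure()
canvas = FigureCanvasAgg(fig)   # the drawing area attached to the figure

ax = fig.add_subplot(1, 1, 1)
ax.plot([1, 2, 3, 4])

canvas.draw()                   # the renderer draws the figure onto the canvas
renderer = canvas.get_renderer()

print(type(canvas).__name__)    # name of the canvas class
print(type(renderer).__name__)  # name of the renderer class
```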

Artist Layer

The intermediate layer is called Artist. All the elements that make up a chart, such as the title, axis labels, markers, etc., are instances of the Artist object. Each of these instances plays its role within a hierarchical structure as shown below:




There are two Artist classes: primitive and composite.

• The primitive artists are individual objects that constitute the basic elements of a graphical representation in a plot, for example a Line2D, a geometric figure such as a Rectangle or Circle, or even pieces of text.

• The composite artists are those graphic elements present in a chart that are composed of several base elements, namely, the primitive artists. Composite artists are, for example, the Axis, Ticks, Axes, and Figure, as shown below:



When working at this level we often deal with objects high in the hierarchy, such as Figure, Axes, and Axis. So it is important to fully understand what these objects are and what role they play within the graphical representation. The diagram above shows the three main Artist objects (composite artists) that are generally used in all implementations performed at this level. These objects are:

• Figure is the object with the highest level in the hierarchy. It corresponds to the entire graphical representation and generally can contain many Axes.

• Axes is generally what you mean by a plot or chart. Each Axes object belongs to only one Figure and is characterized by two Axis artists (three in the three-dimensional case). Other objects, such as the title, the x label, and the y label, belong to this composite artist.

• Axis objects take into account the numerical values to be represented on Axes, define the limits, and manage the ticks (the marks on the axes) and tick labels (the label text shown at each tick). The position of the ticks is set by an object called a Locator, while the formatting of the tick labels is controlled by an object called a Formatter.
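The hierarchy described above can be explored directly in code. The following is a minimal sketch working at the Artist level; the title, the labels, and the tick spacing of 0.5 are arbitrary choices made for this example.

```python
from matplotlib.figure import Figure
from matplotlib.ticker import MultipleLocator, FormatStrFormatter

fig = Figure()                      # top of the hierarchy
ax = fig.add_subplot(1, 1, 1)       # an Axes that belongs to this Figure
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

# The title and the axis labels belong to the Axes
ax.set_title("Squares")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Each Axes owns an x Axis and a y Axis; the Locator positions the ticks
# and the Formatter controls the text of the tick labels
ax.xaxis.set_major_locator(MultipleLocator(0.5))
ax.xaxis.set_major_formatter(FormatStrFormatter("%.1f"))

print(fig.axes)   # list of the Axes contained in the Figure
```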

Scripting Layer (pyplot)

Artist classes and their related functions (the matplotlib API) are well suited to developers, especially those who work on web application servers or develop GUIs. But for purposes of calculation, and in particular for the analysis and visualization of data, the scripting layer is best. This layer consists of an interface called pyplot, which is a module of matplotlib.

There is another module, pylab, that is installed along with matplotlib. pylab combines the functionality of pyplot with the capabilities of NumPy in a single namespace, so we do not need to import NumPy separately. Also, if we import pylab, pyplot and NumPy functions can be called directly without any reference to a module (namespace), making the environment more similar to MATLAB. Note that the matplotlib documentation now discourages the use of pylab in favor of importing pyplot and NumPy explicitly.

The pyplot module provides the classic Python interface for programming the matplotlib library, has its own namespace, and requires NumPy to be imported separately. It is a collection of command-style functions that allow you to use matplotlib much like MATLAB. Each pyplot function operates on or makes some change to the Figure object, for example, creating the Figure itself, creating a plotting area, drawing a line, or decorating the plot with a label.

Pyplot also is stateful, in that it tracks the status of the current figure and its plotting area. The functions called act on the current figure.
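The stateful behavior can be observed with gcf() and gca(), which return the current figure and the current plotting area. The sketch below is only an illustration; the Agg backend line is an assumption added so that it runs without a display, and it can be dropped for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; omit for interactive use
import matplotlib.pyplot as plt

fig1 = plt.figure()          # fig1 becomes the current figure
plt.plot([1, 2, 3, 4])       # drawn on fig1

fig2 = plt.figure()          # now fig2 is the current figure
plt.plot([4, 3, 2, 1])
plt.title("Second figure")   # the title goes to fig2, not fig1

plt.figure(fig1.number)      # switch back to fig1 by its number
print(plt.gcf() is fig1)
```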

Now that we have enough information about the matplotlib library, let's use it to create a simple interactive chart. See the following program:

import matplotlib.pyplot as plt

plt.plot([1,2,3,4])  # y values to plot
plt.show()           # open the plotting window

First we import the pyplot module and give it the alias plt. No constructor call is needed: once the module is imported, plt and all its graphics functions are ready to use. We use the plot() function to pass the values to be plotted, here simply a sequence of integers. If you are curious to know what kind of object has been created, run print(plt.plot([1,2,3,4])). The output will be something like [<matplotlib.lines.Line2D at 0xa3eb438>], a list containing one Line2D object. This object is the line that connects the points included in the chart. To show the plot on screen we use the show() function.

The output of the program is shown below. It is simply a window, called the plotting window, with a toolbar and the plot represented within it.







If only a list of numbers or an array is passed to the plt.plot() function, matplotlib assumes it is the sequence of y values of the chart and associates them with the natural sequence of x values: 0, 1, 2, 3, and so on.
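This default can be checked by inspecting the Line2D object that plot() returns. The following is a small sketch; the Agg backend line is an assumption added so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; omit for interactive use
import matplotlib.pyplot as plt

lines = plt.plot([1, 2, 3, 4])   # plot() returns a list of Line2D objects
line = lines[0]

x_values = list(line.get_xdata())  # x values generated by matplotlib
y_values = list(line.get_ydata())  # the values we passed in

print(x_values)
print(y_values)
```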

Generally a plot represents value pairs (x, y), so to define a chart correctly, we must define two arrays, the first containing the values on the x-axis and the second containing the values on the y-axis. Moreover, the plot() function can accept a third argument, which describes the specifics of how you want the point to be represented on the chart.
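The third argument is a short format string that combines a color, a marker, and a line style. Here is a minimal sketch with two arbitrary format strings chosen for this example ('b--' for a blue dashed line and 'g^' for green triangles); the Agg backend line is an assumption added so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; omit for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

(dashed,) = plt.plot(x, y, "b--")    # blue dashed line, no markers
(triangles,) = plt.plot(x, y, "g^")  # green triangle markers, no connecting line

print(dashed.get_color(), dashed.get_linestyle())
print(triangles.get_color(), triangles.get_marker())
```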

Setting the Properties of the Plot

The plot is represented taking into account a default configuration of the plt.plot() function:

• The axes are scaled automatically to fit the range of the input data (matplotlib 2.0 and later add a small margin around it)
• There is neither a title nor axis labels
• There is no legend
• A blue line connecting the points is drawn

Let's change this representation to have a real plot in which each pair of values (x, y) is represented by a red dot. See the following program:

import matplotlib.pyplot as plt

plt.plot([1,2,3,4],[1,4,9,16],'ro')
plt.show()



The output of the program is shown below:

Now let's set some other properties, such as the range on both the x-axis and the y-axis and a title for the plot. See the following program:

import matplotlib.pyplot as plt

plt.axis([0,5,0,20])
plt.title('First pyplot')
plt.plot([1,2,3,4],[1,4,9,16],'ro')
plt.show()


We can define the range on both the x-axis and the y-axis by building a list [xmin, xmax, ymin, ymax] and passing it as an argument to the axis() function. So our program defines the range (0,5) for the x axis and (0,20) for the y axis. The title is set using the title() function.
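To confirm that the settings took effect, the limits and the title can be read back from the current plotting area. This is only a sketch; the Agg backend line is an assumption added so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; omit for interactive use
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16], "ro")
plt.axis([0, 5, 0, 20])        # [xmin, xmax, ymin, ymax]
plt.title("First pyplot")

limits = plt.axis()            # called without arguments, axis() returns the current limits
ax = plt.gca()                 # the current plotting area

print(limits)
print(ax.get_title())
```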

The output of the program is shown below:

Thus by setting properties we can make the plot more readable: the end points of the dataset now sit well inside the plot rather than at or near the edges, and the title of the plot is visible at the top.


Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!