Tuesday, April 30, 2019

Pandas - 26 (Permutation and Random Sampling)

Permutation is the random reordering of a series or of the rows of a dataframe. These operations are easy to do using the numpy.random.permutation() function. The following program shows how a permutation is performed:

import pandas as pd
import numpy as np
mydataframe = pd.DataFrame(np.arange(25).reshape(5,5))
print('\nThe original dataframe\n')
print(mydataframe)
new_order = np.random.permutation(5)
print('\nThe...
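
Since the excerpt above is cut off, here is a minimal, self-contained sketch of the same idea. The fixed seed, the use of take() to apply the new order, and the sample() call for random sampling are assumptions made for illustration; the original post may do this differently.

```python
import numpy as np
import pandas as pd

# Fixed seed so the run is reproducible (an assumption, not in the original post).
np.random.seed(0)

mydataframe = pd.DataFrame(np.arange(25).reshape(5, 5))

# A random reordering of the row positions 0..4.
new_order = np.random.permutation(5)

# take() returns the rows in the permuted order.
permuted = mydataframe.take(new_order)

# Random sampling: pick 3 distinct rows without replacement.
sample = mydataframe.sample(n=3, random_state=0)

print(permuted)
print(sample)
```

Every original row appears exactly once in permuted, only in a different order, while sample holds a random subset of the rows.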

Monday, April 29, 2019

Pandas - 25 (Discretization and Binning)

Discretization is a complex transformation process used in experimental cases to handle large quantities of data generated in sequence. To analyze such data, it is necessary to transform them into discrete categories, for example: 1. By dividing the range of values of the readings into smaller intervals and counting the occurrences or statistics in each. 2. Another...
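
The first approach (splitting the value range into intervals and counting occurrences) can be sketched with the pandas cut() function. The readings and the bin edges below are hypothetical values chosen only for illustration.

```python
import pandas as pd

# Hypothetical experimental readings in the range 0-100.
results = [12, 34, 67, 55, 28, 90, 74, 44, 56, 17]

# Hypothetical bin edges: (0, 25], (25, 50], (50, 75], (75, 100].
bins = [0, 25, 50, 75, 100]

# cut() assigns each reading to the interval that contains it.
cat = pd.cut(results, bins)

# Count how many readings fall into each interval, in interval order.
counts = pd.Series(cat).value_counts().sort_index()

print(cat.codes)
print(counts)
```

Here cat.codes gives the bin number of each reading, and counts holds the number of occurrences per bin.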

Sunday, April 28, 2019

Pandas - 24 (Data Transformation)

By now we have a decent background on how to prepare data for analysis and are ready to begin the second stage of data manipulation: data transformation. After we arrange the form of the data and their arrangement within the data structure, it is important to transform their values. We will see some common issues and the steps required to overcome them using functions of the pandas library. Some of these...

Friday, April 26, 2019

Pandas - 23 (Pivoting)

Pivoting is a common operation, just like data assembling. In the context of pivoting, we have two basic operations:
• Stacking: rotates or pivots the data structure, converting columns to rows
• Unstacking: converts rows into columns
Using the stack() function on the dataframe, we will get the pivoting of the columns in rows, thus producing a series. From this hierarchically indexed series,...
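
The two operations can be sketched as follows. The dataframe contents and the row and column labels are hypothetical, chosen only to make the result easy to read.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 dataframe with labeled rows and columns.
frame = pd.DataFrame(np.arange(9).reshape(3, 3),
                     index=['white', 'black', 'red'],
                     columns=['ball', 'pen', 'pencil'])

# stack() pivots the columns into rows, producing a series
# with a two-level (hierarchical) index: (row label, column label).
stacked = frame.stack()

# unstack() pivots the inner index level back into columns.
restored = stacked.unstack()

print(stacked)
print(restored)
```

Each value in stacked is addressed by a pair of labels, for example stacked['black', 'pen'], and restored has the same 3x3 layout as frame.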

Thursday, April 25, 2019

Pandas - 22 (Concatenating)

Concatenation is another type of data combination, and NumPy provides a concatenate() function to do this kind of operation with arrays. See the following program:

import pandas as pd
import numpy as np
array1 = np.arange(9).reshape((3,3))
print('Array 1\n')
print(array1)
array2 = np.arange(9).reshape((3,3))+6
print('\nArray 2\n')
print(array2)
print('\nConcatenated array axis=1\n')
print(np.concatenate([array1,array2],axis=1))
print('\nConcatenated...
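
As the program above is truncated, the following is a separate short sketch that reuses the same two arrays and concatenates them along both axes. The closing pd.concat() call on two small series is an assumption added to show the pandas counterpart; it does not come from the excerpt.

```python
import numpy as np
import pandas as pd

array1 = np.arange(9).reshape((3, 3))
array2 = np.arange(9).reshape((3, 3)) + 6

# axis=1 places the arrays side by side (3 rows, 6 columns).
by_cols = np.concatenate([array1, array2], axis=1)

# axis=0 stacks the arrays on top of each other (6 rows, 3 columns).
by_rows = np.concatenate([array1, array2], axis=0)

# pandas counterpart (illustrative): concat() joins series end to end.
ser1 = pd.Series([0.1, 0.2], index=[1, 2])
ser2 = pd.Series([0.3, 0.4], index=[3, 4])
combined = pd.concat([ser1, ser2])

print(by_cols)
print(by_rows)
print(combined)
```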

Wednesday, April 24, 2019

Pandas - 21 (Data Manipulation)

In the previous posts we learned how to acquire data from data sources such as databases and files. Once we have the data in the dataframe format, they are ready to be manipulated. It’s important to prepare the data so that they can be more easily subjected to analysis and manipulation. In particular, the data must be ready for visualization in the next phase. In the coming posts we'll...

Tuesday, April 23, 2019

Pandas - 20 (Interacting with Databases)

Data are usually stored in databases (SQL-based relational databases and NoSQL databases). To load data from SQL into a dataframe, pandas has some functions that simplify the process. The pandas.io.sql module provides a unified, database-independent interface that relies on SQLAlchemy. This interface simplifies the connection mode, since regardless of the DB, the commands will always be the same. To make...
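
A minimal round trip between a dataframe and a SQL table might look like the sketch below. To stay self-contained it uses an in-memory SQLite database through the standard sqlite3 module, which pandas accepts directly; for other databases one would typically pass a SQLAlchemy engine instead. The table name, column names, and values are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database (an assumption made to keep the sketch self-contained).
conn = sqlite3.connect(':memory:')

# Hypothetical data to store.
frame = pd.DataFrame({'name': ['white', 'black', 'red'],
                      'price': [1.5, 2.0, 3.25]})

# Write the dataframe to a table named 'items' (hypothetical name).
frame.to_sql('items', conn, index=False)

# Read the rows back with a SQL query; the result is a dataframe.
loaded = pd.read_sql_query(
    'SELECT name, price FROM items WHERE price > 1.75 ORDER BY price', conn)

conn.close()
print(loaded)
```

The same to_sql() and read_sql_query() calls work unchanged when conn is replaced by a connection to a different database, which is the point of the unified interface.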