Monday, March 25, 2019

Pandas -1 (Introduction)

pandas is an open-source Python library for specialized data analysis. It is currently the reference library that every professional using Python for statistical analysis and decision making needs to learn. It is built on top of the NumPy library and provides the tools needed for data processing, data extraction, and data manipulation.
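To make the NumPy connection concrete, here is a minimal sketch. It assumes the default NumPy-backed dtypes. A Series can be built from a NumPy array, it can return its values as a plain array, and NumPy functions work on it directly. The values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Build a labeled pandas Series from a plain NumPy array (values are hypothetical)
arr = np.array([1.5, 2.5, 3.5])
s = pd.Series(arr, index=["x", "y", "z"])

# The Series can hand its values back as a NumPy array
values = s.to_numpy()
print(type(values))   # <class 'numpy.ndarray'>
print(s.dtype)        # float64

# NumPy universal functions apply directly to a Series and keep its labels
print(np.sqrt(s))
```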

pandas introduces two ad hoc data structures for data analysis. They do not rely on the data structures built into Python or provided by other libraries. Instead, they are designed to work with relational or labeled data, which lets us manage data in much the same way as we would in SQL relational databases and Excel spreadsheets.
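"Labeled" means that each value is identified by a label (the index), not only by its position. The short sketch below uses two hypothetical Series of fruit counts. It shows label-based access and how arithmetic lines values up by label. Labels that exist in only one of the two objects produce a missing value (NaN).

```python
import pandas as pd

# Hypothetical data: values are identified by labels, not only by position
stock = pd.Series([5, 2, 8], index=["apples", "pears", "plums"])
sold = pd.Series([1, 3], index=["plums", "apples"])

# Access a value by its label
print(stock["pears"])   # 2

# Arithmetic aligns the two Series on their labels, not on their positions;
# "pears" is missing from `sold`, so its result is NaN
remaining = stock - sold
print(remaining)
```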

Data analysis relies on a set of basic operations, such as selecting, filtering, grouping, and sorting, that are normally performed on database tables and spreadsheets. pandas provides an extended set of functions and methods that perform these operations efficiently.
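As a preview, the sketch below runs a few of these table operations on a small DataFrame. The `sales` table and its columns are hypothetical. The SQL statements in the comments are only analogies that show which database operation each pandas call corresponds to.

```python
import pandas as pd

# Hypothetical sales table, laid out like a spreadsheet or a database table
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "product": ["A", "A", "B", "B", "B"],
    "units": [10, 4, 7, 3, 9],
})

# Filtering rows, similar to: SELECT * FROM sales WHERE units > 5
big_orders = sales[sales["units"] > 5]

# Grouping and aggregating, similar to:
# SELECT region, SUM(units) FROM sales GROUP BY region
units_by_region = sales.groupby("region")["units"].sum()

# Sorting, similar to: ORDER BY units DESC
ranked = sales.sort_values("units", ascending=False)

print(big_orders)
print(units_by_region)
print(ranked)
```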

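Now let's move on to installing pandas, which we'll do with Anaconda. Make sure Anaconda is installed on your system, then open the Anaconda Prompt and run the following. The full Anaconda distribution already includes pandas, so in the session below conda finds nothing new to install for pandas and only updates conda itself: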


(base) C:\Users\Python>conda install pandas
Solving environment: done

## Package Plan ##

  environment location: C:\Users\Python\Anaconda3

  added / updated specs:
    - pandas


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.6.8                |           py37_0         1.7 MB

The following packages will be UPDATED:

    conda: 4.5.12-py37_0 --> 4.6.8-py37_0

Proceed ([y]/n)? y


Downloading and Extracting Packages
conda-4.6.8          | 1.7 MB    | #################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(base) C:\Users\Python>


pandas can also be installed from PyPI with pip:

pip install pandas
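Whichever method you use, a quick check from Python tells you whether the interpreter can actually find pandas. This is a minimal sketch that uses the standard `importlib.util.find_spec` function and the `pd.__version__` attribute. The printed messages are only illustrative.

```python
import importlib.util

# Check whether pandas can be found by the current Python interpreter
if importlib.util.find_spec("pandas") is None:
    print("pandas is not installed: run 'conda install pandas' or 'pip install pandas'")
else:
    import pandas as pd
    # Print the installed version to confirm the import works
    print("pandas", pd.__version__, "is installed")
```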

Once the installation is complete, you can run pandas' own test suite to verify that the library works correctly. Older pandas releases used the nose module for this. Install it first (pip install nose), then start the test with the following command:

nosetests pandas

Since version 0.20, pandas runs its tests with pytest instead of nose. With a recent release, install pytest first (pip install pytest; newer versions also need hypothesis), then run the suite from Python with pandas.test(). Either way, the test takes several minutes and at the end shows a list of any problems encountered.

To use pandas in our programs, we need to import the module as

import pandas as pd

or

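from pandas import *

The first form is the standard convention: every pandas name is then reached through the short pd prefix. The second form copies all public pandas names into the current namespace. That makes it hard to tell where a name comes from, so it is generally discouraged.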
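Here is a short sketch of the aliased import in use. It also shows an explicit import of a single name, a common middle ground when you want to avoid the `pd.` prefix without using the star import.

```python
import pandas as pd
from pandas import Series  # import one name explicitly instead of using *

# With the alias, pandas names are reached through the "pd." prefix
s = pd.Series([1, 2, 3])
print(type(s))               # <class 'pandas.core.series.Series'>

# The explicitly imported name and pd.Series refer to the same class
print(Series is pd.Series)   # True
```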

Almost every operation performed during data analysis is built around two primary data structures: the Series and the DataFrame. A Series is designed to hold a one-dimensional sequence of data. A DataFrame is designed to hold two-dimensional, tabular data made of rows and labeled columns.
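A first look at both structures follows. The values and column names are made up for illustration. The sketch also shows the number of dimensions of each structure and that selecting one column of a DataFrame returns a Series.

```python
import pandas as pd

# A Series: a one-dimensional sequence of values with an index (labels)
s = pd.Series([12, -4, 7], index=["a", "b", "c"])

# A DataFrame: two-dimensional, tabular data with labeled rows and columns
df = pd.DataFrame({
    "color": ["blue", "green", "yellow"],
    "price": [1.2, 1.0, 3.3],
})

print(s)
print(df)
print(s.ndim, df.ndim)      # 1 2

# Selecting a single column of a DataFrame gives back a Series
print(type(df["price"]))    # <class 'pandas.core.series.Series'>
```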

These two structures are solid, robust tools for most applications, and many more complex data structures can still be reduced to one of these two simple cases.
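As one illustration of that reduction, consider three-dimensional data, here a hypothetical temperature per city, year, and month. It can be stored as a Series with a multi-level index (`pd.MultiIndex`). It can then be reshaped into an ordinary two-dimensional DataFrame with `unstack`. This is just one possible way of mapping such data, chosen for the sketch.

```python
import pandas as pd

# Hypothetical three-dimensional data: one temperature per (city, year, month)
index = pd.MultiIndex.from_product(
    [["Rome", "Oslo"], [2018, 2019], ["Jan", "Jul"]],
    names=["city", "year", "month"],
)
temps = pd.Series([8, 26, 9, 27, -4, 17, -3, 18], index=index)

# Stored as a one-dimensional Series with a three-level index...
print(temps)

# ...or viewed as a two-dimensional DataFrame, with months as columns
table = temps.unstack("month")
print(table)
```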

In the next post we'll start with the Series data structure. Till then keep learning and practicing Python as Python is easy to learn!





