Monday, December 9, 2019

Extracting Information from Data

The .info() method prints a concise summary of a DataFrame. The syntax is as follows:

squad_df.info()

You will have the following output:

<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, Manchester United to Swansea
Data columns (total 2 columns):
Position       20 non-null int64
Designation    20 non-null object
dtypes: int64(1), object(1)
memory usage: 35.7+ KB


The .info() method reports the key facts about the dataset: the number of rows and columns, the count of non-null values in each column, the data type of each column, and the memory used by the DataFrame.
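As a minimal sketch of how this summary can be produced and inspected, the example below builds a small stand-in DataFrame. The three rows and the team names are made up for illustration and are not the 20-row squad dataset from this post; only the column names Position and Designation are taken from the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for squad_df: three made-up rows, not the post's dataset.
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["A", "B", "C"]},
    index=["Team A", "Team B", "Team C"],
)

# .info() prints to the screen by default; passing a buffer captures the text instead.
buffer = io.StringIO()
squad_df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```

Capturing the summary in a buffer is optional. Calling squad_df.info() on its own prints the same text directly.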

The dataset you are using might contain missing values in some columns. You will need to learn how to address these, to help in cleaning the data for final presentation.

Why do you need to determine the datatype? 

Without this, you might struggle to interpret the data correctly. If, for example, you load a JSON file in which the integers are stored as strings, most numeric operations will not behave as expected: arithmetic on strings either raises an error or does something unintended, such as joining the values together. This is why .info() is useful. It tells you what kind of content is present in every column.
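The sketch below illustrates the problem with string-typed numbers and one common way to fix it. The data is hypothetical, and pd.to_numeric is just one option here (astype(int) would also work for clean values).

```python
import pandas as pd

# Hypothetical data: integers that arrived as strings, as can happen with JSON.
df = pd.DataFrame({"Position": ["1", "2", "3"]})

# Adding the column to itself joins the strings instead of summing numbers.
doubled_as_text = (df["Position"] + df["Position"]).tolist()
print(doubled_as_text)

# Convert the column to a numeric dtype, then repeat the same operation.
df["Position"] = pd.to_numeric(df["Position"])
doubled_as_numbers = (df["Position"] + df["Position"]).tolist()
print(doubled_as_numbers)
```

The first result is a list of concatenated strings, while the second is a list of actual sums, which is the kind of difference .info() helps you spot before it causes trouble.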

The .shape attribute is also useful: it returns a tuple giving the number of rows and columns in the dataset. For the example above:

squad_df.shape

Your output will be as follows:

(20, 2)

Note that .shape is an attribute, not a method, so it is written without parentheses. It returns a (rows, columns) tuple; in the example above, the squad DataFrame has 20 rows and 2 columns. As you work with different datasets, you will check .shape often, for instance to confirm how many rows remain after a cleaning or transformation step.
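Here is a short sketch of reading .shape, again using a made-up three-row stand-in rather than the real squad data. It also shows what happens if parentheses are added by mistake.

```python
import pandas as pd

# Hypothetical stand-in for squad_df with three made-up rows.
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["A", "B", "C"]}
)

# .shape is a plain tuple, so it can be unpacked directly.
rows, columns = squad_df.shape
print(rows, columns)

# Calling it like a method fails, because a tuple is not callable.
try:
    squad_df.shape()
    called_ok = True
except TypeError:
    called_ok = False
print(called_ok)
```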
