Monday, December 16, 2019

Describing Variables

There is much more information you can get from your DataFrames. You can get a summary of the continuous variables with the following syntax:

squad_df.describe()

This returns summary statistics for the numeric columns: the count, mean, standard deviation, minimum, quartiles, and maximum. This information is useful when you are unsure which kind of plot to use for visual representation. .describe() is a method that also works on a single categorical column, where it returns the number of non-null rows, the number of unique categories, the top (most frequent) category, and the frequency of that top category.
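Before turning to a single column, a small sketch of the numeric summary may help. The DataFrame below is hypothetical: its column names and values are made up for illustration and are not the squad_df data used in this post.

```python
import pandas as pd

# Hypothetical squad data; names and numbers are invented for illustration.
squad_df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee", "Eli"],
    "position": ["GK", "DF", "DF", "MF", "FW"],
    "age": [31, 24, 27, 22, 29],
    "goals": [0, 1, 2, 5, 9],
})

# By default, describe() summarizes only the numeric columns.
summary = squad_df.describe()
print(summary)

# Individual statistics can be read from the summary by row and column label.
mean_age = summary.loc["mean", "age"]
print(mean_age)
```

The text columns (name and position) are left out of this summary, which is why a categorical column has to be described separately.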


squad_df['position'].describe()

The syntax above will return an output in the following format:

count xx
unique xx
top xx
freq xx
Name: position, dtype: object


From this output we can read the number of non-null entries in the selected column (count), the number of distinct values it contains (unique), the most frequent value (top), and the number of times that top value appears (freq). To determine the frequency of all the values in the position column, you use the syntax below:

squad_df['position'].value_counts().head(10)
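To make the link between the two calls concrete, here is a minimal sketch on a hypothetical position column (the values are invented, not taken from the real dataset). It shows that the top and freq entries from describe() match the first row of value_counts().

```python
import pandas as pd

# Hypothetical positions; invented for illustration only.
squad_df = pd.DataFrame({
    "position": ["GK", "DF", "DF", "DF", "MF", "MF", "FW"],
})

# Summary of a categorical column: count, unique, top, freq.
position_summary = squad_df["position"].describe()
print(position_summary)

# Frequency of every value, sorted from most to least common.
position_counts = squad_df["position"].value_counts()
print(position_counts.head(10))

# The most frequent value and its count appear in both views.
top_value = position_counts.index[0]
top_count = position_counts.iloc[0]
print(top_value, top_count)
```

Calling .head(10) simply limits the listing to the ten most common values, which matters only when a column has many categories.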

You can also find out the relationship between different continuous variables using the .corr() method as shown below:

squad_df.corr()

The output is a correlation table with one row and one column for each numeric variable in your dataset. Every value lies between -1.0 and 1.0. Positive values show a positive correlation between two variables: one tends to rise as the other rises, and to fall as the other falls. Negative values show an inverse correlation: one variable tends to rise as the other falls. A perfect positive correlation is represented by 1.0 and a perfect inverse correlation by -1.0, while values near 0 indicate little or no linear relationship. The diagonal of the table is always 1.0, because each column is perfectly correlated with itself.
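As a rough illustration of how to read the table, the sketch below builds a hypothetical DataFrame (the caps and sprint_speed columns and all values are invented) in which one column rises exactly in step with age and another falls exactly in step with it. It selects the numeric columns before calling .corr(), since in recent pandas versions calling .corr() on a DataFrame that still contains text columns may raise an error.

```python
import pandas as pd

# Hypothetical data, constructed so the relationships are exactly linear.
squad_df = pd.DataFrame({
    "position": ["GK", "DF", "MF", "FW", "FW"],
    "age": [22, 25, 28, 31, 34],
    "caps": [5, 20, 35, 50, 65],            # rises with age
    "sprint_speed": [34, 33, 32, 31, 30],   # falls with age
})

# Keep only the numeric columns so the text column is not passed to corr().
numeric_df = squad_df.select_dtypes(include="number")

# Pairwise Pearson correlation between the numeric columns.
corr_table = numeric_df.corr()
print(corr_table.round(2))

# Look up a single pair by row and column label.
age_vs_caps = corr_table.loc["age", "caps"]
age_vs_speed = corr_table.loc["age", "sprint_speed"]
print(age_vs_caps, age_vs_speed)
```

Real data will rarely be this clean, so expect values somewhere between the extremes rather than exactly 1.0 or -1.0.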