Wednesday, December 11, 2019

Cleaning Data in a Column

We often come across datasets whose column names are inconsistent: they contain typos, stray spaces, and a mixture of upper- and lower-case letters.

Cleaning up these column names makes it easier to pick the correct column for your computations.


Using the squad_df DataFrame from the previous post, the following attribute displays the column names:


squad_df.columns

You will have the following output:

Index(['Position', 'Designation'])
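The previous post's data is not reproduced here, so the sketch below builds a small hypothetical squad_df (the player rows are invented for illustration). It shows that .columns returns a pandas Index, which list() turns into an ordinary Python list:

```python
import pandas as pd

# Hypothetical stand-in for the squad data from the previous post;
# the rows are made up purely for illustration.
squad_df = pd.DataFrame({
    "Position": ["Goalkeeper", "Defender", "Striker"],
    "Designation": ["Captain", "Vice-captain", "Player"],
})

# .columns is a pandas Index object; list() converts it to a plain list
print(list(squad_df.columns))  # ['Position', 'Designation']
```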

Once you have this information, you can use the .rename() method to rename some or all of the columns in your data. Since we do not want parentheses in our column names, we will rename the column accordingly.

Assuming the Designation column were named "Designation (Next Season)", you would rename it as follows:

squad_df.rename(columns={
    'Designation (Next Season)': 'Designation_next_season',
}, inplace=True)
squad_df.columns

Our output would look like this:

Index(['Position', 'Designation_next_season'])
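Here is a minimal sketch of the same rename that reassigns the result instead of using inplace=True, and that also shows the errors parameter of DataFrame.rename. The frame and the misspelled key "Designaton" are hypothetical. By default rename() silently skips keys that match no column, so a typo in the old name goes unnoticed unless errors="raise" is passed:

```python
import pandas as pd

# Hypothetical frame using the awkward column name from the example above
squad_df = pd.DataFrame({
    "Position": ["Goalkeeper", "Striker"],
    "Designation (Next Season)": ["Captain", "Player"],
})

# rename() returns a new DataFrame, so the result can simply be reassigned
squad_df = squad_df.rename(
    columns={"Designation (Next Season)": "Designation_next_season"},
    errors="raise",  # raise KeyError if the old name does not exist
)
print(list(squad_df.columns))  # ['Position', 'Designation_next_season']

# With the default errors="ignore", a misspelled key is skipped silently
unchanged = squad_df.rename(columns={"Designaton": "typo"})
print(list(unchanged.columns))  # ['Position', 'Designation_next_season']

# With errors="raise", the same typo is reported instead
try:
    squad_df.rename(columns={"Designaton": "typo"}, errors="raise")
except KeyError as exc:
    print("Rename failed:", exc)
```

Both styles end with the same column names here; reassigning simply makes the data flow easier to follow.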

You can also convert every column name from upper to lower case without typing each name individually. Instead of manually changing each item in the column list, a list comprehension does the work for you:

squad_df.columns = [col.lower() for col in squad_df]
squad_df.columns

You will have the following output:

Index(['position', 'designation_next_season'])
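As an alternative to the list comprehension, pandas offers vectorised string methods on an Index through .str, and rename() also accepts a function that is applied to every column name. A short sketch using a hypothetical frame with the column names from above:

```python
import pandas as pd

# Hypothetical frame with the column names from the example above
squad_df = pd.DataFrame({
    "Position": ["Goalkeeper"],
    "Designation_next_season": ["Captain"],
})

# Option 1: Index.str.lower() lower-cases every name without an explicit loop
lowered_index = squad_df.columns.str.lower()
print(list(lowered_index))  # ['position', 'designation_next_season']

# Option 2: pass a callable to rename(); it is applied to each column label
lowered_df = squad_df.rename(columns=str.lower)
print(list(lowered_df.columns))  # ['position', 'designation_next_season']
```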

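Over time, you will refer to columns many times in pandas, whether through dictionary-style keys such as squad_df['position'] or attribute access such as squad_df.position. To make your work easier, avoid special characters in column names and use lower-case names instead, with underscores in place of spaces. Attribute access in particular only works when the column name is a valid Python identifier.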
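The advice above can be bundled into one helper. The sketch below uses a hypothetical function, clean_column_name, that trims surrounding spaces, lower-cases the name, replaces every run of characters other than ASCII letters and digits with a single underscore, and strips leading or trailing underscores. The messy headers are invented for illustration:

```python
import re

import pandas as pd


def clean_column_name(name: str) -> str:
    """Hypothetical helper: lower-case a column name, turn each run of
    non-alphanumeric characters into one underscore, trim edge underscores."""
    name = name.strip().lower()
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")


# Hypothetical messy headers: stray spaces, mixed case, parentheses, a period
squad_df = pd.DataFrame(columns=[" Position ", "Designation (Next Season)", "Shirt No."])
squad_df.columns = [clean_column_name(col) for col in squad_df.columns]
print(list(squad_df.columns))  # ['position', 'designation_next_season', 'shirt_no']
```

One design caveat: the pattern only keeps ASCII letters and digits, so accented letters are replaced as well, and two different original names can end up identical after cleaning. Checking squad_df.columns.duplicated().any() afterwards is a cheap safeguard.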
