Monday, August 10, 2020

Subsetting Columns

Subset, filter, select multiple rows and columns from a pandas ...

If we want to use these techniques to subset only columns, we need Python's slicing syntax. When we subset columns, we want all the rows for the specified columns, so we need a way to select every row.

The Python slicing syntax uses a colon, :. A colon on its own refers to everything. So, if we want only certain columns using the loc or iloc syntax, we can write something like df.loc[:, [columns]]: the colon before the comma selects all rows, and the list after it selects the column(s).

# select all rows (:) and the 'year' and 'pop' columns by label
subset = df.loc[:, ['year', 'pop']]
print(subset.head())

subset = df.iloc[:, [2, 4, -1]]
print(subset.head())
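To try these two calls without loading a dataset, here is a minimal sketch on a tiny hand-built DataFrame. The values are hypothetical, and the column names and their order (country, continent, year, lifeExp, pop, gdpPercap) are an assumption, chosen so that 'year' and 'pop' exist and positions 2, 4 and -1 land on 'year', 'pop' and 'gdpPercap'.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration only.
df = pd.DataFrame({
    "country": ["A", "A", "B"],
    "continent": ["X", "X", "Y"],
    "year": [1952, 1957, 1952],
    "lifeExp": [40.0, 42.5, 55.1],
    "pop": [1000, 1100, 2000],
    "gdpPercap": [700.0, 750.0, 1200.0],
})

# loc: all rows (:), columns selected by label
by_label = df.loc[:, ["year", "pop"]]

# iloc: all rows (:), columns selected by integer position;
# -1 counts from the end, so it picks the last column
by_position = df.iloc[:, [2, 4, -1]]

print(list(by_label.columns))     # ['year', 'pop']
print(list(by_position.columns))  # ['year', 'pop', 'gdpPercap']
```

Both results keep every row of the frame, because the colon before the comma selects all of them.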

 

What if we don’t specify loc and iloc correctly? If I try the code shown below, can you predict the output?

subset = df.loc[:, [2, 4, -1]]
print(subset.head())

subset = df.iloc[:, ['year', 'pop']]
print(subset.head())

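Both snippets raise an error. loc selects by label, so the integers 2, 4 and -1 are looked up as column names and not found; iloc selects by integer position, so it rejects the string names. The exact exception type and message depend on your pandas version.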

subset = df.loc[:, [2, 4, -1]]
print(subset.head())

Output:

Traceback (most recent call last):
File "<ipython-input-1-719bcb04e3c1>", line 2, in <module>
subset = df.loc[:, [2, 4, -1]]
KeyError: 'None of [[2, 4, -1]] are in the [columns]'

subset = df.iloc[:, ['year', 'pop']]
print(subset.head())

Output:

Traceback (most recent call last):
File "<ipython-input-1-43f52fceab49>", line 2, in <module>
subset = df.iloc[:, ['year', 'pop']]
TypeError: cannot perform reduce with flexible type
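One way to avoid mixing up the two indexers is to translate between labels and positions through df.columns. The sketch below is illustrative only: it uses a hypothetical one-row DataFrame with an assumed column order (country, continent, year, lifeExp, pop, gdpPercap), reproduces both errors, then fixes them. Indexing df.columns with a list of positions turns positions into labels for loc, and Index.get_indexer turns labels into positions for iloc. The error check catches IndexError as well as TypeError because the exception iloc raises for string labels differs between pandas versions.

```python
import pandas as pd

# Hypothetical one-row frame with an assumed column order.
df = pd.DataFrame(
    [["A", "X", 1952, 40.0, 1000, 700.0]],
    columns=["country", "continent", "year", "lifeExp", "pop", "gdpPercap"],
)

def error_name(select):
    """Run a selection and return the exception class name, or None."""
    try:
        select()
    except (KeyError, IndexError, TypeError) as err:
        return type(err).__name__
    return None

# The two mistakes: positions given to loc, labels given to iloc.
loc_error = error_name(lambda: df.loc[:, [2, 4, -1]])
iloc_error = error_name(lambda: df.iloc[:, ["year", "pop"]])
print(loc_error, iloc_error)

# Fix 1: turn positions into labels, then use loc.
labels = df.columns[[2, 4, -1]]
fixed_loc = df.loc[:, labels]

# Fix 2: turn labels into positions, then use iloc.
positions = df.columns.get_indexer(["year", "pop"])
fixed_iloc = df.iloc[:, positions]

print(list(fixed_loc.columns))   # ['year', 'pop', 'gdpPercap']
print(list(fixed_iloc.columns))  # ['year', 'pop']
```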

Subsetting Columns by Range

We can use the built-in range function to create a sequence of integers in Python. We specify a start and a stop value, and Python generates the integers from the start up to, but not including, the stop. By default, consecutive integers are produced; an optional third argument sets the step. In Python 2, range returns a list, while xrange returns a lazy xrange object; in Python 3, range itself returns a lazy range object.

We already saw that we can subset columns using a list of integers. Since range returns a lazy range object rather than a list, we convert it to a list first. Note that range(5) produces five integers, 0 through 4.

# create a range of integers from 0 to 4 inclusive
small_range = list(range(5))
print(small_range) 

Output - [0, 1, 2, 3, 4]
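A small check in Python 3 shows what range actually returns: a range object, which is a lazy sequence that supports len, indexing and repeated iteration. A generator, by contrast, can be consumed only once. This sketch uses built-ins only.

```python
# range(start, stop, step) in Python 3 returns a lazy range object
r = range(0, 10, 3)
print(type(r).__name__)   # range
print(len(r), r[1])       # 4 3
print(list(r))            # [0, 3, 6, 9]
print(list(r))            # [0, 3, 6, 9]  (a range can be iterated again)

# A generator expression, by contrast, is exhausted after one pass
g = (i for i in range(0, 10, 3))
first_pass = list(g)
second_pass = list(g)
print(first_pass, second_pass)  # [0, 3, 6, 9] []
```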
 

# subset the dataframe with the range
subset = df.iloc[:, small_range]
print(subset.head())


# create a range from 3 to 5 inclusive
small_range = list(range(3, 6))
print(small_range)

Output: [3, 4, 5]

subset = df.iloc[:, small_range]
print(subset.head()) 

 

Note that range is inclusive on the left and exclusive on the right: range(3, 6) includes 3 but stops before 6.

# create a range from 0 to 5 inclusive, every other integer
small_range = list(range(0, 6, 2))
subset = df.iloc[:, small_range]
print(subset.head())
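To check which columns each range picks, here is a minimal sketch on a hypothetical one-row frame. The column order (country, continent, year, lifeExp, pop, gdpPercap) is an assumption for illustration; the real data may differ. Note that range(3, 6) yields exactly three columns, since the stop value is excluded.

```python
import pandas as pd

# Hypothetical one-row frame; only the column order matters here.
cols = ["country", "continent", "year", "lifeExp", "pop", "gdpPercap"]
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

first_five = df.iloc[:, list(range(5))]         # positions 0-4
middle = df.iloc[:, list(range(3, 6))]          # positions 3, 4, 5
every_other = df.iloc[:, list(range(0, 6, 2))]  # positions 0, 2, 4

print(list(first_five.columns))   # ['country', 'continent', 'year', 'lifeExp', 'pop']
print(list(middle.columns))       # ['lifeExp', 'pop', 'gdpPercap']
print(list(every_other.columns))  # ['country', 'year', 'pop']
```

Printing the selected column names like this is a cheap way to confirm a range is correct before applying it to a full dataset.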


In the next post, we'll discuss slicing columns.

