Now that we're able to load a simple data file, we want to inspect its contents. We could print the whole dataframe, but real datasets often have too many cells to make sense of all the printed information. Instead, the best way to look at our data is in parts, by examining various subsets. We already saw that the head method of a dataframe shows the first five rows. This is useful for checking that the data loaded properly and for getting a sense of each column's name and contents. Sometimes, however, we want to see only particular rows, columns, or values.
Subsetting Columns
If we want to examine multiple columns, we can specify them by names, positions, or ranges.
1. If we want only a specific column from our data, we can access the data using square brackets. See the following program:
# just get the country column and save it to its own variable
country_df = df['country']
# show the first 5 observations
print(country_df.head())
Output:
0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
Name: country, dtype: object
# show the last 5 observations
print(country_df.tail())
Output:
1699    Zimbabwe
1700    Zimbabwe
1701    Zimbabwe
1702    Zimbabwe
1703    Zimbabwe
Name: country, dtype: object
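One detail worth noting about the result above: selecting a single column with a plain string returns a pandas Series, not a dataframe, which is why the output ends with a Name/dtype line instead of a column header. The following is a minimal sketch of the difference. It assumes a tiny hand-built stand-in for the dataframe loaded earlier (the variable names country_series and country_frame are only for illustration):

```python
import pandas as pd

# Tiny stand-in for the dataframe loaded earlier (illustration only)
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Zimbabwe', 'Zimbabwe'],
    'continent': ['Asia', 'Asia', 'Africa', 'Africa'],
    'year': [1952, 1957, 2002, 2007],
})

# A plain string between the brackets gives back a Series
country_series = df['country']
print(type(country_series))

# A one-element list between the brackets gives back a one-column DataFrame
country_frame = df[['country']]
print(type(country_frame))
```

If a later step expects a dataframe rather than a Series, the one-element list form is probably the simpler choice.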
2. To specify multiple columns by the column name, we need to pass in a Python list between the square brackets. This may look a bit strange since there will be two sets of square brackets. See the following program:
# Looking at country, continent, and year
subset = df[['country', 'continent', 'year']]
print(subset.head())
Output:
       country continent  year
0  Afghanistan      Asia  1952
1  Afghanistan      Asia  1957
2  Afghanistan      Asia  1962
3  Afghanistan      Asia  1967
4  Afghanistan      Asia  1972
print(subset.tail())
Output:
       country continent  year
1699  Zimbabwe    Africa  1987
1700  Zimbabwe    Africa  1992
1701  Zimbabwe    Africa  1997
1702  Zimbabwe    Africa  2002
1703  Zimbabwe    Africa  2007
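The double square brackets may be easier to read once we see that the inner pair is just an ordinary Python list. The sketch below stores that list in a variable first. It assumes a small stand-in dataframe rather than the full dataset, and the names cols and reordered are hypothetical:

```python
import pandas as pd

# Tiny stand-in for the dataframe loaded earlier (illustration only)
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Zimbabwe', 'Zimbabwe'],
    'continent': ['Asia', 'Asia', 'Africa', 'Africa'],
    'year': [1952, 1957, 2002, 2007],
})

# The list of column names can live in its own variable
cols = ['country', 'continent', 'year']
subset = df[cols]  # same result as df[['country', 'continent', 'year']]

# The columns come back in the order given in the list
reordered = df[['year', 'country']]
print(reordered.columns.tolist())
```

Keeping the column names in a variable can be handy when the same subset is needed in several places.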
We can also print the entire subset dataframe. I won't do that here, as it would take up an unnecessary amount of space, but feel free to try it yourself.
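If you do try it, keep in mind that with default display settings pandas usually abbreviates a long dataframe when printing it, showing only the first and last rows. One way to render every row is the to_string method. This is a rough sketch on a made-up 100-row frame (demo and row_id are hypothetical names, not part of our dataset):

```python
import pandas as pd

# Hypothetical 100-row frame, just to have something long enough to be abbreviated
demo = pd.DataFrame({'row_id': range(100)})

# With default settings, print(demo) abbreviates the middle rows
short_text = str(demo)

# to_string() renders every row regardless of the display settings
full_text = demo.to_string()
print(full_text)
```

For a subset as large as ours this produces a lot of output, so it is probably best reserved for small frames.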
In the next post we'll deal with Subsetting Columns by Index Position.