A DataFrame is an enhanced two-dimensional array. Like Series, DataFrames support custom indices (for both rows and columns) and offer additional operations and capabilities that make them more convenient for many data-science-oriented tasks. DataFrames also support missing data. Each column in a DataFrame is a Series. The Series representing the columns may contain different element types, as you’ll soon see when we discuss loading datasets into DataFrames.
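As a quick preview of the last two points, here is a minimal sketch of a DataFrame whose columns hold different element types and whose data has a missing value. The column names and values are made up for illustration and are not part of the grades example that follows.

```python
import pandas as pd

# Hypothetical data: one text column, one numeric column with a
# missing value (None), and one Boolean column.
mixed = pd.DataFrame({'name': ['Wally', 'Eva', 'Sam'],
                      'grade': [87.5, None, 94.0],
                      'passed': [True, True, True]})

# Each column is a Series with its own element type.
print(mixed.dtypes)

# isna marks the positions where data is missing.
print(mixed['grade'].isna())
```

Pandas stores the missing grade as NaN, so the grade column remains a floating-point column.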
Let’s create a DataFrame from a dictionary that represents student grades on three exams:
In [1]: import pandas as pd
In [2]: grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
   ...:                'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
   ...:                'Bob': [83, 65, 85]}
   ...:
In [3]: grades = pd.DataFrame(grades_dict)
In [4]: grades
Out[4]:
   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85
Pandas displays DataFrames in tabular format with the indices left aligned in the index column and the remaining columns’ values right aligned. The dictionary’s keys become the column names and the values associated with each key become the element values in the corresponding column. Shortly, we’ll show how to “flip” the rows and columns. By default, the row indices are auto-generated integers starting from 0.
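To confirm how the dictionary maps onto the DataFrame, you can inspect the columns, index and shape attributes. The following is a small sketch that rebuilds the same grades_dict so it can run on its own:

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict)

# The dictionary's keys, in insertion order, are the column names.
print(list(grades.columns))

# With no index argument, the row indices are the integers 0, 1, 2.
print(list(grades.index))

# shape is a (rows, columns) tuple: three exams for five students.
print(grades.shape)
```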
We could have specified custom indices with the index keyword argument when we created the DataFrame, as in:
pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
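As a self-contained sketch of that call, the snippet below builds the DataFrame with the index keyword argument and checks that the row labels are the ones supplied. The variable name labeled_grades is only for illustration.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

# Supply the row labels at creation time instead of assigning them later.
labeled_grades = pd.DataFrame(grades_dict,
                              index=['Test1', 'Test2', 'Test3'])

print(list(labeled_grades.index))
print(labeled_grades)
```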
Let’s use the index attribute to change the DataFrame’s indices from sequential integers to labels:
In [5]: grades.index = ['Test1', 'Test2', 'Test3']
In [6]: grades
Out[6]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85
When specifying the indices, you must provide a one-dimensional collection that has the same number of elements as there are rows in the DataFrame; otherwise, a ValueError occurs. Series also provides an index attribute for changing an existing Series’ indices.
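The following sketch illustrates both points using a smaller, two-student DataFrame: assigning too few labels raises a ValueError, and a Series accepts new labels through its own index attribute. The exact wording of the error message may vary between pandas versions, so the code prints whatever message pandas supplies.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]})

raised = False
try:
    # Only two labels for three rows, so pandas rejects the assignment.
    grades.index = ['Test1', 'Test2']
except ValueError as error:
    raised = True
    print(f'ValueError: {error}')

# A Series' indices can be replaced the same way.
wally = pd.Series([87, 96, 70])
wally.index = ['Test1', 'Test2', 'Test3']
print(wally)
```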
One benefit of pandas is that you can quickly and conveniently look at your data in many different ways, including selecting portions of the data. Let’s start by getting Eva’s grades by name, which displays her column as a Series:
In [7]: grades['Eva']
Out[7]:
Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64
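Because the selected column is a Series, everything you can do with a Series applies to it. As a brief sketch, the code below checks the result's type and name, then looks up one of Eva's grades by its row label:

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

eva = grades['Eva']

# The column comes back as a Series named after the column.
print(type(eva).__name__)
print(eva.name)

# The Series keeps the DataFrame's row labels as its indices.
print(eva['Test2'])
```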
If a DataFrame’s column-name strings are valid Python identifiers, you can use them as attributes. Let’s get Sam’s grades with the Sam attribute:
In [8]: grades.Sam
Out[8]:
Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64
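Attribute access is a convenience with limits, so square brackets remain the more general form. The sketch below uses two hypothetical columns that are not in the grades example: 'Mary Ann' is not a valid Python identifier, and 'count' has the same name as an existing DataFrame method.

```python
import pandas as pd

# Hypothetical columns chosen to show where attribute access falls short.
scores = pd.DataFrame({'Sam': [94, 77, 90],
                       'Mary Ann': [88, 91, 79],
                       'count': [1, 2, 3]})

# A valid identifier works both ways and yields the same Series.
print(scores.Sam.equals(scores['Sam']))

# A name containing a space can be selected only with square brackets.
print(scores['Mary Ann'])

# scores.count is DataFrame's count method, not the 'count' column,
# so the column itself must be selected with square brackets.
print(callable(scores.count))
print(scores['count'])
```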
The next post will focus on Selecting Rows via the loc and iloc Attributes.