Hierarchical indexing (a MultiIndex in pandas) allows us to have multiple index levels on a single axis. It lets us work with data in more than two dimensions while still using a two-dimensional structure (or a one-dimensional series).
Let’s write a program that creates a series whose index is made of two arrays, that is, a structure with two levels:
import pandas as pd
import numpy as np
ser = pd.Series(np.random.rand(8),
                index=[['white','white','white','blue','blue','red','red','red'],
                       ['up','down','right','up','down','up','down','left']])
print('\nOriginal series\n')
print(ser)
print('\nIndex of series\n')
print(ser.index)
The output of the program is shown below (the values come from np.random.rand(), so yours will differ):
Original series
white  up       0.072994
       down     0.934169
       right    0.224944
blue   up       0.368885
       down     0.502154
red    up       0.714370
       down     0.900322
       left     0.626923
dtype: float64
Index of series
MultiIndex(levels=[['blue', 'red', 'white'], ['down', 'left', 'right', 'up']],
           codes=[[2, 2, 2, 0, 0, 1, 1, 1], [3, 0, 2, 3, 0, 3, 0, 1]])
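As a sketch of what pandas does with the nested lists, the same two-level index can also be built explicitly with pd.MultiIndex.from_arrays() or pd.MultiIndex.from_tuples(), which lets us name the levels at the same time. The level names 'color' and 'move', and the fixed values used in place of random ones, are our own choices for illustration. The levels and codes attributes are the two parts of the printed MultiIndex above.

```python
import numpy as np
import pandas as pd

colors = ['white', 'white', 'white', 'blue', 'blue', 'red', 'red', 'red']
moves = ['up', 'down', 'right', 'up', 'down', 'up', 'down', 'left']
values = np.arange(8, dtype=float)  # fixed values instead of random ones

# 1. Nested lists, as in the program above (the levels get no names)
implicit = pd.Series(values, index=[colors, moves])

# 2. The same labels built from two parallel arrays, with illustrative level names
by_arrays = pd.MultiIndex.from_arrays([colors, moves], names=['color', 'move'])

# 3. The same labels built from (outer, inner) pairs
by_tuples = pd.MultiIndex.from_tuples(list(zip(colors, moves)), names=['color', 'move'])

named = pd.Series(values, index=by_arrays)

print(named.index.nlevels)            # number of levels in the index
print(list(named.index.levels[0]))    # sorted unique labels of the first level
print(named.index.codes[0].tolist())  # position of each row's label within levels[0]
```

The codes list matches the first codes array in the printed MultiIndex: each entry is a position in the sorted list of first-level labels.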
The output shows a series whose index has two levels. (Recent pandas versions print the MultiIndex as a list of (outer, inner) tuples rather than as levels and codes.) Hierarchical indexing makes selecting subsets of values simple: we can select all values for a given label of the first level, all values for a given label of the second level, or a single value by specifying both labels, as the following program shows:
import pandas as pd
import numpy as np
ser = pd.Series(np.random.rand(8),
                index=[['white','white','white','blue','blue','red','red','red'],
                       ['up','down','right','up','down','up','down','left']])
print('\nValues for a given value of the first index\n')
print(ser['white'])
print('\nValues for a given value of the second index\n')
print(ser[:,'up'])
print('\nA specific value by specifying both indexes\n')
print(ser['white','up'])
The output of the program is shown below:
Values for a given value of the first index
up 0.829193
down 0.066195
right 0.403016
dtype: float64
Values for a given value of the second index
white 0.829193
blue 0.148104
red 0.558666
dtype: float64
A specific value by specifying both indexes
0.8291926932314726
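The same three selections can also be written with the label-based .loc accessor and with the xs() (cross-section) method, which names the level to match explicitly. This is a minimal sketch; fixed values replace the random ones so the results are predictable, and the variable names are ours:

```python
import numpy as np
import pandas as pd

# Same two-level index as in the post, with fixed values 0.0 .. 7.0
ser = pd.Series(
    np.arange(8, dtype=float),
    index=[['white', 'white', 'white', 'blue', 'blue', 'red', 'red', 'red'],
           ['up', 'down', 'right', 'up', 'down', 'up', 'down', 'left']],
)

by_first = ser.loc['white']          # rows whose first-level label is 'white'
by_second = ser.xs('up', level=1)    # rows whose second-level label is 'up'
single = ser.loc[('white', 'up')]    # one value, addressed by both labels

print(by_first)
print(by_second)
print(single)
```

xs() drops the level it matched, so by_second is indexed by the colors only.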
Hierarchical indexing plays a central role in reshaping data and in group-based operations such as building a pivot table. In the following program we use the unstack() and stack() functions. The unstack() function converts a series with a two-level index into a dataframe, turning the second level of the index into columns; combinations that do not occur in the series are filled with NaN. The stack() function performs the reverse operation, moving the columns back into the inner level of the index and returning a series. With the default settings used here, stack() also drops the NaN cells, so the result holds only the eight original values. See the following program:
import pandas as pd
import numpy as np
ser = pd.Series(np.random.rand(8),
                index=[['white','white','white','blue','blue','red','red','red'],
                       ['up','down','right','up','down','up','down','left']])
f1 = ser.unstack()
print('\nConverting the series with a hierarchical index to a simple dataframe\n')
print(f1)
print('\nConverting a dataframe to a series\n')
print(f1.stack())
The output of the program is shown below:
Converting the series with a hierarchical index to a simple dataframe
           down      left     right        up
blue   0.656229       NaN       NaN  0.063722
red    0.828567  0.735408       NaN  0.118687
white  0.938660       NaN  0.120901  0.687507
Converting a dataframe to a series
blue   down     0.656229
       up       0.063722
red    down     0.828567
       left     0.735408
       up       0.118687
white  down     0.938660
       right    0.120901
       up       0.687507
dtype: float64
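unstack() is not limited to the inner level. In this sketch (fixed values instead of random ones, variable names of our choosing), the level argument selects which level becomes the columns, and fill_value replaces the NaN produced for missing combinations:

```python
import numpy as np
import pandas as pd

# Same index as in the post; fixed values make the result predictable
ser = pd.Series(
    np.arange(8, dtype=float),
    index=[['white', 'white', 'white', 'blue', 'blue', 'red', 'red', 'red'],
           ['up', 'down', 'right', 'up', 'down', 'up', 'down', 'left']],
)

inner_as_columns = ser.unstack()          # default: the last (second) level becomes columns
outer_as_columns = ser.unstack(level=0)   # the first level becomes columns instead
filled = ser.unstack(fill_value=0.0)      # missing combinations get 0.0 instead of NaN

print(inner_as_columns)
print(outer_as_columns)
print(filled)
```

The 3 x 4 grid has 12 cells but the series has only 8 values, so the default unstack() leaves 4 NaN cells.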
With a dataframe we can define a hierarchical index for both the rows and the columns. To do so, pass an array of arrays to the index and columns options when the dataframe is created, as shown in the following program:
import pandas as pd
import numpy as np
frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
index=[['red','blue','yellow','white'],['up','down','up','down']],
columns=[['ball','pen','pencil','paper'],[1,2,1,2]])
print('\nThe dataframe\n')
print(frame1)
The output of the program is shown below:
The dataframe
             ball pen pencil paper
                1   2      1     2
red    up       0   1      2     3
blue   down     4   5      6     7
yellow up       8   9     10    11
white  down    12  13     14    15
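The same dataframe can be built from explicit pd.MultiIndex objects, which also lets us name the levels up front. The sketch below uses the level names that the following programs assign ('colors', 'status', 'objects', 'id') and shows how a column group, a row group and a single cell are selected; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_arrays(
    [['red', 'blue', 'yellow', 'white'], ['up', 'down', 'up', 'down']],
    names=['colors', 'status'])
cols = pd.MultiIndex.from_arrays(
    [['ball', 'pen', 'pencil', 'paper'], [1, 2, 1, 2]],
    names=['objects', 'id'])
frame = pd.DataFrame(np.arange(16).reshape((4, 4)), index=rows, columns=cols)

pen = frame['pen']                                   # all columns under 'pen', indexed by id
yellow_row = frame.loc['yellow']                     # all rows under 'yellow', indexed by status
cell = frame.loc[('yellow', 'up'), ('pencil', 1)]    # one cell: full row key and full column key

print(pen)
print(yellow_row)
print(cell)
```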
Sometimes we need to rearrange the order of the levels on an axis, or to sort the data by the values of a specific level. The first task is done with the swaplevel() function, which accepts the names (or positions) of the two levels to interchange and returns a new object with those levels swapped, leaving the data unmodified. Because the program refers to the levels by name, it first assigns names to them through the names attribute of the index and of the columns:
import pandas as pd
import numpy as np
frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
index=[['red','blue','yellow','white'],['up','down','up','down']],
columns=[['ball','pen','pencil','paper'],[1,2,1,2]])
frame1.columns.names = ['objects','id']
frame1.index.names = ['colors','status']
print('\nThe dataframe\n')
print(frame1)
print('\nUsing swaplevel()\n')
print(frame1.swaplevel('colors','status'))
The output of the program is shown below:
The dataframe
objects       ball pen pencil paper
id               1   2      1     2
colors status
red    up        0   1      2     3
blue   down      4   5      6     7
yellow up        8   9     10    11
white  down     12  13     14    15
Using swaplevel()
objects       ball pen pencil paper
id               1   2      1     2
status colors
up     red       0   1      2     3
down   blue      4   5      6     7
up     yellow    8   9     10    11
down   white    12  13     14    15
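swaplevel() also accepts level positions instead of names, and its axis argument swaps the column levels instead of the row levels. For more than two levels, reorder_levels() takes the complete new order. A short sketch on the same frame1 (the result variable names are ours):

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=[['red', 'blue', 'yellow', 'white'], ['up', 'down', 'up', 'down']],
                      columns=[['ball', 'pen', 'pencil', 'paper'], [1, 2, 1, 2]])
frame1.columns.names = ['objects', 'id']
frame1.index.names = ['colors', 'status']

by_position = frame1.swaplevel(0, 1)                       # row levels given by position
cols_swapped = frame1.swaplevel('objects', 'id', axis=1)   # swap the column levels instead
reordered = frame1.reorder_levels(['status', 'colors'])    # full new order of the row levels

print(by_position)
print(cols_swapped)
```

In every case only the labels move; the underlying 4 x 4 array is unchanged.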
To sort the data by the values of a single level, use the sort_index() function and pass that level through its level parameter; the remaining levels are then used to break ties. This is used in the following program:
import pandas as pd
import numpy as np
frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
index=[['red','blue','yellow','white'],['up','down','up','down']],
columns=[['ball','pen','pencil','paper'],[1,2,1,2]])
frame1.columns.names = ['objects','id']
frame1.index.names = ['colors','status']
print('\nThe dataframe\n')
print(frame1)
print('\nUsing sort_index()\n')
print(frame1.sort_index(level='colors'))
The output of the program is shown below:
The dataframe
objects       ball pen pencil paper
id               1   2      1     2
colors status
red    up        0   1      2     3
blue   down      4   5      6     7
yellow up        8   9     10    11
white  down     12  13     14    15
Using sort_index()
objects       ball pen pencil paper
id               1   2      1     2
colors status
blue   down      4   5      6     7
red    up        0   1      2     3
white  down     12  13     14    15
yellow up        8   9     10    11
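Because sort_index() sorts by the chosen level first, combining it with swaplevel() is a common way to group rows by what was the inner level. A sketch on the same frame1, which also shows the ascending parameter; the result variable names are illustrative:

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=[['red', 'blue', 'yellow', 'white'], ['up', 'down', 'up', 'down']],
                      columns=[['ball', 'pen', 'pencil', 'paper'], [1, 2, 1, 2]])
frame1.columns.names = ['objects', 'id']
frame1.index.names = ['colors', 'status']

# Move 'status' to the outer position, then sort by it (colors break the ties)
grouped_by_status = frame1.swaplevel('colors', 'status').sort_index(level='status')

# Sort by colors in reverse alphabetical order
descending = frame1.sort_index(level='colors', ascending=False)

print(grouped_by_status)
print(descending)
```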
Many descriptive and summary statistics can be computed for a single level of a hierarchical index. Older versions of pandas exposed this through a level option on methods such as sum() (for example frame1.sum(level='colors')); that option was deprecated in pandas 1.3 and removed in pandas 2.0. The portable way is to group by the level with groupby(level=...) and then apply the statistic.
In the following program we compute a summary statistic at row level by grouping on the level name (level='colors'); sort=False keeps the groups in their original order, matching the output below:
import pandas as pd
import numpy as np
frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
index=[['red','blue','yellow','white'],['up','down','up','down']],
columns=[['ball','pen','pencil','paper'],[1,2,1,2]])
frame1.columns.names = ['objects','id']
frame1.index.names = ['colors','status']
print('\nThe dataframe\n')
print(frame1)
print('\nSummary Statistic by Level\n')
# groupby(level=...) replaces the removed sum(level=...) option
print(frame1.groupby(level='colors', sort=False).sum())
The output of the program is shown below:
The dataframe
objects       ball pen pencil paper
id               1   2      1     2
colors status
red    up        0   1      2     3
blue   down      4   5      6     7
yellow up        8   9     10    11
white  down     12  13     14    15
Summary Statistic by Level
objects ball pen pencil paper
id         1   2      1     2
colors
red        0   1      2     3
blue       4   5      6     7
yellow     8   9     10    11
white     12  13     14    15
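In frame1 every color occurs in exactly one row, so the per-color sums simply reproduce the original rows. Grouping on status, which has two rows per value, shows an actual aggregation. A sketch using the groupby(level=...) approach; mean() is used here only as a second example of a statistic, and the variable names are ours:

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=[['red', 'blue', 'yellow', 'white'], ['up', 'down', 'up', 'down']],
                      columns=[['ball', 'pen', 'pencil', 'paper'], [1, 2, 1, 2]])
frame1.columns.names = ['objects', 'id']
frame1.index.names = ['colors', 'status']

# 'down' collects the blue and white rows, 'up' collects the red and yellow rows
totals = frame1.groupby(level='status').sum()
means = frame1.groupby(level='status').mean()

print(totals)
print(means)
```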
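In the next program we compute a statistic for a level of the columns, id, so the grouping must run along the second axis (axis 1). Older pandas versions accepted frame1.sum(level='id', axis=1); since the level option is gone and groupby(axis=1) has been deprecated since pandas 2.1, we transpose the dataframe, group its rows by id, sum them, and transpose the result back:
import pandas as pd
import numpy as np
frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
index=[['red','blue','yellow','white'],['up','down','up','down']],
columns=[['ball','pen','pencil','paper'],[1,2,1,2]])
frame1.columns.names = ['objects','id']
frame1.index.names = ['colors','status']
print('\nThe dataframe\n')
print(frame1)
print('\nA statistic for a level=id\n')
# Transpose so the column levels become row levels, group by 'id', sum, then transpose back
print(frame1.T.groupby(level='id', sort=False).sum().T)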
The output of the program is shown below:
The dataframe
objects       ball pen pencil paper
id               1   2      1     2
colors status
red    up        0   1      2     3
blue   down      4   5      6     7
yellow up        8   9     10    11
white  down     12  13     14    15
A statistic for a level=id
id             1   2
colors status
red    up      2   4
blue   down   10  12
yellow up     18  20
white  down   26  28
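The row and column groupings can be combined. As a sketch, the following sums frame1 within each status and then within each id, giving a two-by-two summary; it uses the transpose approach for the columns, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=[['red', 'blue', 'yellow', 'white'], ['up', 'down', 'up', 'down']],
                      columns=[['ball', 'pen', 'pencil', 'paper'], [1, 2, 1, 2]])
frame1.columns.names = ['objects', 'id']
frame1.index.names = ['colors', 'status']

# Step 1: collapse the rows to one per status value
by_status = frame1.groupby(level='status').sum()

# Step 2: collapse the columns to one per id value (transpose, group, transpose back)
two_way = by_status.T.groupby(level='id').sum().T

print(two_way)
```

Each cell adds up four of the original sixteen numbers, so the four cells together still sum to the total of frame1.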
Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!