Monday, April 1, 2019

Pandas - 5 (The Index Objects)


Much of what makes the series and the dataframe so useful comes from the Index object that is built into these data structures. Index objects hold the labels on the axes and other metadata, such as the name of each axis. We have already seen in the post

https://pythoniseasytolearn.blogspot.com/2019/03/pandas-2-series-i.html

how an array of labels is converted into an Index object when you pass it through the index option of the constructor. As a refresher, the following program converts an array of labels into an Index object:

import pandas as pd

ser = pd.Series([5,0,3,8,4], index=['red','blue','yellow','white','green'])
print(ser.index)


The output of the program is shown below:

Index(['red', 'blue', 'yellow', 'white', 'green'], dtype='object')


Remember that Index objects are immutable: once created, their contents cannot be changed in place. This makes it safe to share them between data structures. Each Index object also has a number of methods and properties that are useful when you need to know the values it contains.
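To make the immutability concrete, here is a minimal sketch that reuses the series from above. The label 'black' is only an illustrative value, not something from the earlier post. It tries to overwrite one label in place, which pandas should reject with a TypeError, and then shows that assigning a complete new list of labels to ser.index is still allowed, because that replaces the Index object instead of modifying it.

```python
import pandas as pd

ser = pd.Series([5, 0, 3, 8, 4], index=['red', 'blue', 'yellow', 'white', 'green'])

# Trying to overwrite a single label in place is rejected by pandas.
try:
    ser.index[0] = 'black'  # 'black' is just an illustrative label
    mutated = True
except TypeError as err:
    mutated = False
    print('TypeError:', err)

# Assigning a whole new set of labels is allowed: it swaps in a new Index
# object rather than changing the existing one.
ser.index = ['black', 'blue', 'yellow', 'white', 'green']
print(ser.index)
```

The length of the new list has to match the length of the series, otherwise pandas raises an error.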

1.  Methods on Index

There are some methods that return information about the index of a data structure. For example, idxmin() and idxmax() are two Series methods that return, respectively, the index label of the lowest value and the index label of the highest value. See the following program:

import pandas as pd

ser = pd.Series([5,0,3,8,4], index=['red','blue','yellow','white','green'])
print(ser.index)

# idxmin() returns the label of the smallest value
print('\nThe index with the lowest value:\n')
print(ser.idxmin())
# idxmax() returns the label of the largest value
print('\nThe index with the highest value:\n')
print(ser.idxmax())


The output of the program is shown below:

Index(['red', 'blue', 'yellow', 'white', 'green'], dtype='object')

The index with the lowest value:

blue

The index with the highest value:

white
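One way to see what idxmin() and idxmax() are doing is to find the position of the extreme value yourself and then look that position up in the Index. The sketch below is only an illustration of the idea, not how pandas implements it internally; the variable names are my own.

```python
import pandas as pd

ser = pd.Series([5, 0, 3, 8, 4], index=['red', 'blue', 'yellow', 'white', 'green'])

# Positions of the smallest and largest values in the underlying array.
pos_min = ser.to_numpy().argmin()
pos_max = ser.to_numpy().argmax()

# Looking those positions up in the Index gives the matching labels.
label_min = ser.index[pos_min]
label_max = ser.index[pos_max]

print(label_min, label_max)  # blue white
```

For this series the result agrees with ser.idxmin() and ser.idxmax().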



2.  Index with Duplicate Labels

Usually the labels of an index within a single data structure are unique. Although many functions require this condition in order to run, pandas does not enforce it. If several values correspond to the same label, selecting that label returns a series instead of a single element. See the following program:

import pandas as pd

ser = pd.Series(range(6), index=['white','white','blue','green','green','yellow'])
print(ser)

print('\nThe index with white label:\n')
print(ser['white'])
print('\nThe index with green label:\n')
print(ser['green'])


The output of the program is shown below:

white     0
white     1
blue      2
green     3
green     4
yellow    5
dtype: int64

The index with white label:

white    0
white    1
dtype: int64

The index with green label:

green    3
green    4
dtype: int64
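The difference matters in practice because the type of the result depends on whether the label is repeated. A small sketch, using the same series, that compares a label that appears once with a label that appears twice:

```python
import pandas as pd

ser = pd.Series(range(6), index=['white', 'white', 'blue', 'green', 'green', 'yellow'])

unique_hit = ser['blue']      # label appears once: a single value
duplicate_hit = ser['white']  # label appears twice: a series

print(isinstance(unique_hit, pd.Series))     # False
print(isinstance(duplicate_hit, pd.Series))  # True
print(len(duplicate_hit))                    # 2
```

Code that expects a single value from a label lookup can therefore break as soon as a duplicate label slips into the index.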



The same logic applies to the dataframe: selecting a duplicated index label returns a dataframe. With small data structures it is easy to spot duplicate labels, but this becomes harder as the structure grows. For such cases pandas provides the is_unique attribute of Index objects. It is True when every label in the index is unique and False when at least one label is duplicated, for both series and dataframes. See the following program:

import pandas as pd

ser = pd.Series(range(6), index=['white','white','blue','green','green','yellow'])
print(ser)
print('\n')
print('\nThe indexes with duplicate labels inside series:\n')
print(ser.index.is_unique)
print('\n')
my_dict = {
     'name' : ["a", "b", "c", "d", "e","f", "g"],
     'age' : [20,27, 35, 55, 18, 21, 35],
     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

# No index is passed, so pandas assigns the default labels 0..6
df = pd.DataFrame(my_dict)

print(df)

print('\nThe indexes with duplicate labels inside dataframe:\n')
print(df.index.is_unique)



The output of the program is shown below:

white     0
white     1
blue      2
green     3
green     4
yellow    5
dtype: int64

The indexes with duplicate labels inside series:

False


  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The indexes with duplicate labels inside dataframe:

True
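The dataframe above uses the default index, so it has no duplicates. To see the dataframe case mentioned earlier, here is a hedged sketch with made-up data (the column names qty and price and their values are purely hypothetical) whose row index repeats the label 'white'. It also uses the duplicated() method of the Index to find out which labels are repeated, which is_unique alone does not tell you.

```python
import pandas as pd

# Hypothetical data: the row label 'white' appears twice.
df = pd.DataFrame(
    {'qty': [1, 2, 3, 4], 'price': [10.0, 12.5, 9.0, 7.5]},
    index=['white', 'white', 'blue', 'green'],
)

# Selecting a repeated row label returns a dataframe, not a single row.
white_rows = df.loc['white']
print(white_rows)

# is_unique only says whether duplicates exist.
print(df.index.is_unique)  # False

# duplicated() flags every position that repeats an earlier label,
# so filtering the index with it lists the offending labels.
dup_labels = df.index[df.index.duplicated()].unique()
print(list(dup_labels))  # ['white']
```

On a large structure, checking is_unique first and calling duplicated() only when it is False keeps the check cheap.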



Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!