Sunday, March 31, 2019

Pandas -4 (The dataframe - 2)


In the previous post we learned how to access the various elements that make up a dataframe. Now we'll follow the same logic to add or change the values in it.

1. Assigning Values

Within the dataframe structure, the row labels are held in the index attribute and the column names in the columns attribute. Each of these two substructures can also be given a label of its own through its name attribute, which helps to identify it. See the following program:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}

df = pd.DataFrame(my_dict)
df.index.name = 'id'
df.columns.name = 'item'
print('The dataframe:\n')
print(df)


The output of the program is shown below:

The dataframe:

item name  age designation
id
0       a   20          VP
1       b   27         CEO
2       c   35         CFO
3       d   55          VP
4       e   18          VP
5       f   21         CEO
6       g   35          MD
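Both labels can also be set in a single call. The following is only a minimal sketch on a smaller, made-up dataframe, assuming a pandas version in which rename_axis() accepts the index and columns keywords:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 27, 35]})

# rename_axis returns a new dataframe with both axis labels set in one call
labelled = df.rename_axis(index='id', columns='item')

print(labelled.index.name)    # label of the index
print(labelled.columns.name)  # label of the columns
print(df.index.name)          # the original dataframe keeps no label
```

Because rename_axis() returns a new object, the original df stays unlabelled unless the result is assigned back to it.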


2. Adding a new column

We can add a new column by indexing the dataframe with a new column name and assigning a value to it. See the following program:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}

df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed']='Yes'
print('The dataframe:\n')
print(df)


The output of the program is shown below:

The dataframe:

S   name  age designation Employed
No.
0      a   20          VP      Yes
1      b   27         CEO      Yes
2      c   35         CFO      Yes
3      d   55          VP      Yes
4      e   18          VP      Yes
5      f   21         CEO      Yes
6      g   35          MD      Yes


We can see from the output that there is a new column called Employed, with the value Yes replicated for each of its elements.
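A new column does not have to hold a constant. As a small sketch on a shortened, made-up dataframe (the column name age_in_5_years is only an illustration), the value can also be computed from an existing column:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 27, 35]})

# A scalar is replicated for every row
df['Employed'] = 'Yes'

# A new column can also be computed from an existing one
df['age_in_5_years'] = df['age'] + 5

print(df)
```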


3. Updating a column


To update the contents of a column, we assign a list or an array with one value per row, as shown in the following program:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}

df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed']=['Yes','No','Yes','No','Yes','No','No']
print('The dataframe:\n')
print(df)


The output of the program is shown below:

The dataframe:

S   name  age designation Employed
No.
0      a   20          VP      Yes
1      b   27         CEO       No
2      c   35         CFO      Yes
3      d   55          VP       No
4      e   18          VP      Yes
5      f   21         CEO       No
6      g   35          MD       No
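The list must supply exactly one value per row. The sketch below uses a smaller, made-up dataframe to show what happens otherwise. The flag mismatch_rejected is only a helper name for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 27, 35]})

try:
    # Only two values for three rows
    df['Employed'] = ['Yes', 'No']
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print('Assignment rejected:', err)
```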


To fill an entire column with a predetermined sequence we can use the np.arange() function. The following program creates a new column EMPID and assigns the Ids:

import pandas as pd
import numpy as np


my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}

df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed']=['Yes','No','Yes','No','Yes','No','No']
Ids = pd.Series(np.arange(1,8))
df['EMPID']=Ids
print('The dataframe:\n')
print(df)


The output of the program is shown below:


The dataframe:

S   name  age designation Employed  EMPID
No.
0      a   20          VP      Yes      1
1      b   27         CEO       No      2
2      c   35         CFO      Yes      3
3      d   55          VP       No      4
4      e   18          VP      Yes      5
5      f   21         CEO       No      6
6      g   35          MD       No      7
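One detail worth keeping in mind is that a Series is matched to the dataframe by index label, not by position. A minimal sketch on a made-up three-row dataframe (the column names shifted and by_position are only illustrations):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'name': ['a', 'b', 'c']})

# Default index 0, 1, 2 matches the dataframe, so every row gets a value
df['EMPID'] = pd.Series(np.arange(1, 4))

# This Series is labelled 1, 2, 3; row 0 has no match and receives NaN
df['shifted'] = pd.Series(np.arange(1, 4), index=[1, 2, 3])

# A plain array carries no labels and is assigned by position
df['by_position'] = np.arange(1, 4)

print(df)
```

If the dataframe has a custom index, assigning the plain array avoids any surprise from label alignment.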


To change a single value, just select the item and give it the new value. See the following program:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}

df = pd.DataFrame(my_dict)

print('The dataframe:\n')
print(df)
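# .loc addresses the cell in one step; chained indexing such as
# df['name'][2] = ... can end up writing to a temporary copy
df.loc[2, 'name'] = 'Veevaeck'
print('\nThe modified dataframe:\n')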
print(df)


The output of the program is shown below:

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The modified dataframe:

       name  age designation
0         a   20          VP
1         b   27         CEO
2  Veevaeck   35         CFO
3         d   55          VP
4         e   18          VP
5         f   21         CEO
6         g   35          MD


As seen in the output, we have changed the name of the third item from c to Veevaeck.
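A single cell can be addressed by its row label and column name in one step. The following is a minimal sketch on a smaller, made-up dataframe, using the loc and at accessors:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 27, 35]})

# .loc[row_label, column_label] selects one cell for assignment
df.loc[2, 'name'] = 'Veevaeck'

# .at is a scalar-only accessor for the same kind of lookup
df.at[0, 'age'] = 21

print(df)
```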

4. Determine the membership of dataframe objects

The isin() method can be used on a dataframe to check membership. It returns a dataframe of Boolean values, where True marks the values that are found in the given list. If we pass that Boolean dataframe as a condition, we get a new dataframe of the same shape that keeps only the values satisfying the condition and shows NaN everywhere else. See the following program:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}


df = pd.DataFrame(my_dict)

print('The dataframe:\n')
print(df)
print('\nChecking membership:\n')
print(df.isin(['c','CFO']))
print('\nDataframe containing only the values that satisfy the condition:\n')
print(df[df.isin(['c','CFO'])])


The output of the program is shown below: 

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

Checking membership:

    name    age  designation
0  False  False        False
1  False  False        False
2   True  False         True
3  False  False        False
4  False  False        False
5  False  False        False
6  False  False        False

Dataframe containing only the values that satisfy the condition:


  name  age designation
0  NaN  NaN         NaN
1  NaN  NaN         NaN
2    c  NaN         CFO
3  NaN  NaN         NaN
4  NaN  NaN         NaN
5  NaN  NaN         NaN
6  NaN  NaN         NaN
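When the goal is to keep whole rows rather than individual cells, isin() can be applied to a single column instead. A small sketch on a shortened, made-up dataframe (the names mask and officers are only illustrations):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'designation': ['VP', 'CEO', 'CFO', 'VP'],
})

# isin on a single column returns a Boolean Series, one value per row
mask = df['designation'].isin(['CEO', 'CFO'])

# Indexing with a Boolean Series keeps whole rows instead of masking cells
officers = df[mask]

print(officers)
```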


5. Deleting a column

To delete an entire column and all its contents, use the del statement as shown in the following program:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}


df = pd.DataFrame(my_dict)

print('The dataframe:\n')
print(df)

del df['age']

print('\nThe modified dataframe:\n')
print(df)

 
The output of the program is shown below:

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The modified dataframe:

  name designation
0    a          VP
1    b         CEO
2    c         CFO
3    d          VP
4    e          VP
5    f         CEO
6    g          MD


As seen from the output, the age column has been deleted from the dataframe.
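Besides del, pandas offers two methods that remove columns. The sketch below, on a smaller made-up dataframe, shows drop(), which returns a new dataframe, and pop(), which removes the column in place and hands it back:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b'],
    'age': [20, 27],
    'designation': ['VP', 'CEO'],
})

# drop returns a new dataframe and leaves df unchanged
without_age = df.drop(columns=['age'])

# pop removes the column from df and returns it as a Series
designations = df.pop('designation')

print(without_age.columns.tolist())
print(df.columns.tolist())
print(designations.tolist())
```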

6. Filtering a column

We can filter a dataframe by applying a condition to it. The following program gets all employees whose designation is VP:

import pandas as pd

my_dict = {

     'name' : ["a", "b", "c", "d", "e","f", "g"],

     'age' : [20,27, 35, 55, 18, 21, 35],

     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]

}


df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)

print('\nThe filtered dataframe:\n')
print(df[df['designation']=='VP'])


The output of the program is shown below: 

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The filtered dataframe:

  name  age designation
0    a   20          VP
3    d   55          VP
4    e   18          VP
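Several conditions can be combined in one filter. A minimal sketch on the same data, in which the variable names young_vps and ceo_or_md are only illustrations:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"],
})

# Each comparison yields a Boolean Series; wrap each one in parentheses
# and combine them with & (and) or | (or)
young_vps = df[(df['designation'] == 'VP') & (df['age'] < 30)]
ceo_or_md = df[(df['designation'] == 'CEO') | (df['designation'] == 'MD')]

print(young_vps)
print(ceo_or_md)
```

The parentheses are needed because & and | bind more tightly than the comparison operators in Python.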



7. DataFrame from Nested dict

When a nested dict is passed directly as an argument to the DataFrame() constructor, pandas treats the external keys as column names and the internal keys as labels for the index.

During the interpretation of the nested structure, it is possible that not all fields will find a match. pandas compensates for this inconsistency by filling the missing values with NaN.

See the following program: 

import pandas as pd

my_dict = {

    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
    'blue': {2011: 17, 2012: 27, 2013: 18}

}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df) 



The output of the program is shown below: 

The dataframe:

       red  white  blue
2011   NaN     13    17
2012  22.0     22    27
2013  33.0     16    18
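The NaN also explains why the red column is printed as 22.0 and 33.0: a column containing NaN is stored as float. The sketch below, on a two-colour version of the same dict, checks this and also shows DataFrame.from_dict() with orient='index', which treats the external keys as row labels instead (the name by_colour is only an illustration):

```python
import pandas as pd

my_dict = {
    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
}

df = pd.DataFrame(my_dict)

# The missing red/2011 entry is NaN, which forces the red column to float
print(df.dtypes)
print(df.isna())

# orient='index' uses the external keys as row labels instead of column names
by_colour = pd.DataFrame.from_dict(my_dict, orient='index')
print(by_colour)
```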


8. Transposition of a Dataframe

We can get the transposition of a dataframe, in which the rows become columns and the columns become rows, through its T attribute. See the following program:

import pandas as pd

my_dict = {

    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
    'blue': {2011: 17, 2012: 27, 2013: 18}

}


df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)

print('\nTransposed dataframe:\n')
print(df.T)


The output of the program is shown below: 

The dataframe:

       red  white  blue
2011   NaN     13    17
2012  22.0     22    27
2013  33.0     16    18

Transposed dataframe:

       2011  2012  2013
red     NaN  22.0  33.0
white  13.0  22.0  16.0
blue   17.0  27.0  18.0
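To make the swap of rows and columns concrete, here is a small sketch on a two-colour version of the data that compares shapes, labels and one value before and after transposing:

```python
import pandas as pd

df = pd.DataFrame({
    'white': {2011: 13, 2012: 22, 2013: 16},
    'blue': {2011: 17, 2012: 27, 2013: 18},
})

transposed = df.T

# Three rows by two columns becomes two rows by three columns
print(df.shape, transposed.shape)

# Row labels and column labels swap places
print(transposed.index.tolist())
print(transposed.columns.tolist())

# A value at [row, column] is found at [column, row] after transposing
print(df.loc[2012, 'blue'], transposed.loc['blue', 2012])
```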


Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!