Pandas -4 (The dataframe - 2) ~ Python is easy to learn

In the previous post we learned how to access the various elements that make up a dataframe. Now we'll follow the same logic to add or change the values in it.

1. Assigning Values

Within the dataframe structure, the array of row labels is held in the index attribute, and the row containing the column names is held in the columns attribute. We can also give each of these two substructures a label of its own, using its name attribute, so that they are easy to identify in the printed output. See the following program:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)
df.index.name = 'id'
df.columns.name = 'item'
print('The dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

item name  age designation
id
0       a   20          VP
1       b   27         CEO
2       c   35         CFO
3       d   55          VP
4       e   18          VP
5       f   21         CEO
6       g   35          MD
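
As a small aside, the two labels can be read back from the same name attributes, and pandas also provides rename_axis() to set both in one call. The sketch below assumes a shortened version of the dataframe used above; the variable name labelled is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c"],
    'age': [20, 27, 35],
})

# Set the labels through the name attributes, as in the program above
df.index.name = 'id'
df.columns.name = 'item'

# The labels can be read back from the same attributes
print(df.index.name)
print(df.columns.name)

# rename_axis() returns a new dataframe with both labels set,
# leaving the labels of the original dataframe as they were
labelled = df.rename_axis(index='No.', columns='S')
print(labelled.index.name, labelled.columns.name)
print(df.index.name, df.columns.name)
```

Assigning to the name attributes changes the dataframe in place, while rename_axis() is convenient when the original labels should be preserved.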

2. Adding a new column

We can add a new column by indexing the dataframe with a column name that does not exist yet and assigning a value to it. See the following program:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed'] = 'Yes'
print('The dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

S   name  age designation Employed
No.
0      a   20          VP      Yes
1      b   27         CEO      Yes
2      c   35         CFO      Yes
3      d   55          VP      Yes
4      e   18          VP      Yes
5      f   21         CEO      Yes
6      g   35          MD      Yes

As the output shows, there is a new column called Employed, with the value Yes repeated for every row.
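
The same assignment syntax also accepts a value computed from an existing column. The following is a minimal sketch on a shortened dataframe; the column name age_next_year is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c"],
    'age': [20, 27, 35],
})

# A single scalar is repeated for every row
df['Employed'] = 'Yes'

# A new column can also be computed from an existing one
# ('age_next_year' is a hypothetical column name)
df['age_next_year'] = df['age'] + 1

print(df)
```

New columns are appended on the right, in the order in which they are created.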

3. Updating a column

To update the contents of a column, we assign a list or array that holds one value per row. The same syntax creates the column if it does not exist yet, as in the following program:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed'] = ['Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'No']
print('The dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

S   name  age designation Employed
No.
0      a   20          VP      Yes
1      b   27         CEO       No
2      c   35         CFO      Yes
3      d   55          VP       No
4      e   18          VP      Yes
5      f   21         CEO       No
6      g   35          MD       No
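
One detail worth checking when assigning a list is its length: it has to match the number of rows. The sketch below, on a shortened three-row dataframe, shows that a list of the wrong length is rejected with a ValueError; the variable name length_error is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ["a", "b", "c"], 'age': [20, 27, 35]})

# Three rows, three values: the assignment succeeds
df['Employed'] = ['Yes', 'No', 'Yes']

# Three rows, two values: pandas raises ValueError
try:
    df['Employed'] = ['Yes', 'No']
    length_error = None
except ValueError as exc:
    length_error = exc

print(length_error)
print(df)
```

The failed assignment leaves the column with the values from the first, successful assignment.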

To fill an entire column with a predetermined sequence we can use the np.arange() function. In the following program we create a new column EMPID and assign it the ids 1 to 7:

import pandas as pd
import numpy as np

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed'] = ['Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'No']
Ids = pd.Series(np.arange(1, 8))
df['EMPID'] = Ids
print('The dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

S   name  age designation Employed  EMPID
No.
0      a   20          VP      Yes      1
1      b   27         CEO       No      2
2      c   35         CFO      Yes      3
3      d   55          VP       No      4
4      e   18          VP      Yes      5
5      f   21         CEO       No      6
6      g   35          MD       No      7
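
The program above works because the Series built from np.arange() has the default index 0 to 6, the same labels as the dataframe. When a Series is assigned to a column, pandas matches the values on index labels rather than on position. The sketch below illustrates the difference on a three-row dataframe; the column names by_position and by_label and the variable shifted are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ["a", "b", "c"]})   # index labels 0, 1, 2

# A plain array is assigned by position
df['by_position'] = np.arange(1, 4)

# A Series is matched on index labels; here the labels are 1, 2, 3,
# so row 0 finds no match (NaN) and the value at label 3 is dropped
shifted = pd.Series(np.arange(1, 4), index=[1, 2, 3])
df['by_label'] = shifted

print(df)
```

If the ids should simply follow the row order, assigning the array itself avoids any surprise from mismatched labels.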

To change a single value, just select the item and give it the new value. See the following program:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
# Select the cell with loc (row label, column name); the chained form
# df['name'][2] = ... may fail to modify the original dataframe
df.loc[2, 'name'] = 'Veevaeck'
print('\nThe modified dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The modified dataframe:

       name  age designation
0         a   20          VP
1         b   27         CEO
2  Veevaeck   35         CFO
3         d   55          VP
4         e   18          VP
5         f   21         CEO
6         g   35          MD

As the output shows, the name in the third row has changed from c to Veevaeck.
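
For single-value updates, pandas offers the label-based accessors loc and at, which take the row label and the column name in one indexing step. The pandas documentation recommends them over a chained selection such as df['name'][2] = ..., which may end up modifying a temporary copy. A minimal sketch on a shortened dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c"],
    'age': [20, 27, 35],
})

# loc takes the row label and the column name together
df.loc[2, 'name'] = 'Veevaeck'

# at does the same for exactly one cell
df.at[0, 'age'] = 21

print(df)
```

loc also accepts slices and lists of labels, so it can update several cells at once, whereas at is restricted to one cell.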

4. Determine the membership of dataframe objects

The isin() method checks whether each element of a dataframe belongs to a given collection of values. It returns a dataframe of Boolean values, where True marks the elements that are members. If we use that result as a condition to index the original dataframe, we get a new dataframe in which only the matching values are kept and every other element is replaced by NaN. See the following program:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)

print('The dataframe:\n')
print(df)
print('\nChecking membership:\n')
print(df.isin(['c','CFO']))
print('\nDataframe containing only the values that satisfy the condition:\n')
print(df[df.isin(['c','CFO'])])

The output of the program is shown below:

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

Checking membership:

    name    age  designation
0  False  False        False
1  False  False        False
2   True  False         True
3  False  False        False
4  False  False        False
5  False  False        False
6  False  False        False

Dataframe containing only the values that satisfy the condition:

  name  age designation
0  NaN  NaN         NaN
1  NaN  NaN         NaN
2    c  NaN         CFO
3  NaN  NaN         NaN
4  NaN  NaN         NaN
5  NaN  NaN         NaN
6  NaN  NaN         NaN
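
In practice isin() is often applied to a single column, which gives one Boolean value per row and so selects whole rows instead of producing NaN. The sketch below assumes a shortened version of the dataframe; the variable names mask, selected and matches are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c", "d"],
    'age': [20, 27, 35, 55],
    'designation': ["VP", "CEO", "CFO", "VP"],
})

# isin() on one column gives a Boolean Series, one value per row
mask = df['designation'].isin(['CEO', 'CFO'])

# Indexing with that Series keeps whole rows and introduces no NaN
selected = df[mask]
print(selected)

# With a dataframe-wide mask, dropna(how='all') removes the rows
# in which nothing matched
matches = df[df.isin(['c', 'CFO'])].dropna(how='all')
print(matches)
```

The first form answers "which rows have one of these designations", the second "where in the whole table do these values occur".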

5. Deleting a column

To delete an entire column and all its contents, use the del statement as shown in the following program:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)

print('The dataframe:\n')
print(df)

del df['age']

print('\nThe modified dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The modified dataframe:

  name designation
0    a          VP
1    b         CEO
2    c         CFO
3    d          VP
4    e          VP
5    f         CEO
6    g          MD

As the output shows, the age column has been deleted from the dataframe.
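
The del statement changes the dataframe in place. Two related pandas methods may be handy as well: drop() returns a new dataframe without the column, and pop() removes the column in place and hands it back. A minimal sketch on a shortened dataframe, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c"],
    'age': [20, 27, 35],
    'designation': ["VP", "CEO", "CFO"],
})

# drop() returns a new dataframe; df itself still has three columns
without_age = df.drop(columns=['age'])

# pop() removes the column from df in place and returns it as a Series
designations = df.pop('designation')

print(without_age.columns.tolist())
print(df.columns.tolist())
print(designations.tolist())
```

drop() is the safer choice when the original dataframe is still needed afterwards.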

6. Filtering a column

We can filter a dataframe by applying a condition to one of its columns and keeping only the rows for which the condition is true. The following program gets all employees whose designation is VP:

import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)

print('\nThe filtered dataframe:\n')
print(df[df['designation']=='VP'])

The output of the program is shown below:

The dataframe:

  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD

The filtered dataframe:

  name  age designation
0    a   20          VP
3    d   55          VP
4    e   18          VP
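
Conditions can also be combined. The sketch below uses the same data and shows the element-wise operators & (and) and ~ (not); the variable names young_vps and not_vps and the age limit of 30 are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"],
})

# Each condition needs its own parentheses; & means "and", | means "or"
young_vps = df[(df['designation'] == 'VP') & (df['age'] < 30)]
print(young_vps)

# ~ negates a condition: everyone who is not a VP
not_vps = df[~(df['designation'] == 'VP')]
print(not_vps)
```

The parentheses matter because & and | bind more tightly than comparison operators such as == and <.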

7. DataFrame from Nested dict

When a nested dict is passed directly as an argument to the DataFrame() constructor, pandas treats the external keys as column names and the internal keys as labels for the index.

While interpreting the nested structure, pandas may find that some inner keys are missing from some of the outer entries. It compensates for this inconsistency by filling the missing positions with NaN.
See the following program:

import pandas as pd

my_dict = {
    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
    'blue': {2011: 17, 2012: 27, 2013: 18}
}

df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)

The output of the program is shown below:

The dataframe:

       red  white  blue
2011   NaN     13    17
2012  22.0     22    27
2013  33.0     16    18
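
The output above prints the red column as 22.0 and 33.0 because a column that contains NaN is stored as floating point. The following sketch, using only the red and white entries, counts the missing values and fills them; the variable names missing and filled and the fill value 0 are illustrative choices.

```python
import pandas as pd

my_dict = {
    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
}
df = pd.DataFrame(my_dict)

# Count the missing values in each column
missing = df.isna().sum()
print(missing)

# The column that received a NaN is stored as float,
# the complete column stays integer
print(df.dtypes)

# fillna() returns a copy with the gaps replaced by a chosen value
filled = df.fillna(0)
print(filled.loc[2011, 'red'])
```

Whether 0 is a sensible replacement depends on the data; leaving the NaN in place is often the more honest choice.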

8. Transposition of a Dataframe

We can get the transpose of a dataframe, in which the columns become rows and the rows become columns, through its T attribute. See the following program:

import pandas as pd

my_dict = {
    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
    'blue': {2011: 17, 2012: 27, 2013: 18}
}

df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)

print('\nTransposed dataframe:\n')
print(df.T)

The output of the program is shown below:

The dataframe:

       red  white  blue
2011   NaN     13    17
2012  22.0     22    27
2013  33.0     16    18

Transposed dataframe:

       2011  2012  2013
red     NaN  22.0  33.0
white  13.0  22.0  16.0
blue   17.0  27.0  18.0
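
The effect of T is easier to see on a dataframe that is not square. The sketch below keeps only the red and white entries, so three rows and two columns turn into two rows and three columns; the variable name transposed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'red': {2012: 22, 2013: 33},
    'white': {2011: 13, 2012: 22, 2013: 16},
})

transposed = df.T

# Three rows and two columns become two rows and three columns
print(df.shape, transposed.shape)

# The old column names are now the row labels
print(transposed.index.tolist())

# A value is found at the swapped position
print(df.loc[2011, 'white'], transposed.loc['white', 2011])
```

T returns a new dataframe, so df itself keeps its original orientation.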

Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!

Sunday, March 31, 2019
