In the previous post we learned how to access the various elements that make up a dataframe. Now we'll follow the same logic to add or change the values in it.
1. Assigning Values
Within the dataframe structure, the row labels are stored in the index attribute and the column labels in the columns attribute. Each of these two substructures can also be given a label of its own through its name attribute, which makes them easier to identify. See the following program:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
df.index.name = 'id'
df.columns.name = 'item'
print('The dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
item name age designation
id
0 a 20 VP
1 b 27 CEO
2 c 35 CFO
3 d 55 VP
4 e 18 VP
5 f 21 CEO
6 g 35 MD
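The two labels can also be set in a single call. The following is a minimal sketch, assuming a smaller dataframe than the one above; it uses rename_axis(), which returns a new dataframe and leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [20, 27, 35]})

# rename_axis returns a new dataframe with both axis labels set;
# the original df keeps its unnamed index and columns.
labelled = df.rename_axis(index="id", columns="item")

print(labelled.index.name)    # label of the index
print(labelled.columns.name)  # label of the columns
print(df.index.name)          # still unset on the original
```

Assigning to df.index.name and df.columns.name, as in the program above, changes the dataframe in place instead.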
2. Adding a new column
We can add a new column by assigning a value to a column name that does not yet exist in the dataframe. See the following program:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed']='Yes'
print('The dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
S name age designation Employed
No.
0 a 20 VP Yes
1 b 27 CEO Yes
2 c 35 CFO Yes
3 d 55 VP Yes
4 e 18 VP Yes
5 f 21 CEO Yes
6 g 35 MD Yes
As we can see from the output, there is a new column called Employed with the value Yes repeated for each of its rows.
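A new column does not have to hold a constant. As a small sketch, the value can also be an expression built from an existing column; the column name age_in_5_years is made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [20, 27, 35]})

# Scalar assignment: the same value is repeated for every row.
df["Employed"] = "Yes"

# Expression assignment: computed element by element from 'age'.
# 'age_in_5_years' is a hypothetical column name used only here.
df["age_in_5_years"] = df["age"] + 5

print(df)
```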
3. Updating a column
To update the contents of a column, we assign a list (or array) containing one value per row, as shown in the following program:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed']=['Yes','No','Yes','No','Yes','No','No']
print('The dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
S name age designation Employed
No.
0 a 20 VP Yes
1 b 27 CEO No
2 c 35 CFO Yes
3 d 55 VP No
4 e 18 VP Yes
5 f 21 CEO No
6 g 35 MD No
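The list has to contain exactly one value per row. The sketch below, which assumes a three-row dataframe, shows what happens when the lengths do not match: pandas raises a ValueError instead of filling the gap.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [20, 27, 35]})

error_message = None
try:
    # Only two values for three rows, so the assignment is rejected.
    df["Employed"] = ["Yes", "No"]
except ValueError as exc:
    error_message = str(exc)
    print("Assignment rejected:", exc)
```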
To fill an entire column with a predetermined sequence, we can use the np.arange() function. In the following program we create a new column EMPID and assign the Ids 1 to 7 to it:
import pandas as pd
import numpy as np
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
df.index.name = 'No.'
df.columns.name = 'S'
df['Employed']=['Yes','No','Yes','No','Yes','No','No']
Ids = pd.Series(np.arange(1,8))
df['EMPID']=Ids
print('The dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
S name age designation Employed EMPID
No.
0 a 20 VP Yes 1
1 b 27 CEO No 2
2 c 35 CFO Yes 3
3 d 55 VP No 4
4 e 18 VP Yes 5
5 f 21 CEO No 6
6 g 35 MD No 7
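One detail is worth keeping in mind here: a Series is matched to the dataframe by index label, not by position. The program above works because both use the default labels 0 to 6. The following sketch assumes a dataframe with a custom index (10, 20, 30) to show the difference; the column names EMPID_series and EMPID_array are chosen only for this illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"]}, index=[10, 20, 30])

# This Series gets the default labels 0, 1, 2, which do not match 10, 20, 30,
# so after alignment every row of the new column is NaN.
ids = pd.Series(np.arange(1, 4))
df["EMPID_series"] = ids

# A plain NumPy array has no labels, so it is assigned by position.
df["EMPID_array"] = np.arange(1, 4)

print(df)
```

If the index is not the default one, passing the array directly (or building the Series with index=df.index) avoids the surprise.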
To change a single value, just select the item and give it the new value. See the following program:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
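# Set a single cell with .loc; chained indexing such as df['name'][2] = ...
# can trigger a warning and, in recent pandas versions, may leave df unchanged.
df.loc[2, 'name'] = 'Veevaeck'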
print('The modified dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
name age designation
0 a 20 VP
1 b 27 CEO
2 c 35 CFO
3 d 55 VP
4 e 18 VP
5 f 21 CEO
6 g 35 MD
The modified dataframe:
name age designation
0 a 20 VP
1 b 27 CEO
2 Veevaeck 35 CFO
3 d 55 VP
4 e 18 VP
5 f 21 CEO
6 g 35 MD
As seen in the output, we have changed the name of the third item from c to Veevaeck.
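pandas offers a few accessors for setting a single cell. Here is a short sketch on a three-row dataframe, with replacement values made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [20, 27, 35]})

df.loc[2, "name"] = "Veevaeck"  # label-based: row label 2, column 'name'
df.at[0, "age"] = 21            # label-based, for a single scalar value
df.iloc[1, 0] = "B"             # position-based: second row, first column

print(df)
```

With the default index the row labels and row positions happen to coincide; once the index is customised, .loc and .at follow the labels while .iloc follows the positions.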
4. Determine the membership of dataframe objects
The isin() function can be used with dataframe objects to check membership. It returns a dataframe of Boolean values, where True marks each value that is found in the given list. If we pass this Boolean dataframe back as a condition, we get a dataframe of the same shape in which only the values that satisfy the condition are kept and all the others are replaced by NaN. See the following program:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
print('\nChecking membership:\n')
print(df.isin(['c','CFO']))
print('\nDataframe containing only the values that satisfy the condition:\n')
print(df[df.isin(['c','CFO'])])
The output of the program is shown below:
The dataframe:
name age designation
0 a 20 VP
1 b 27 CEO
2 c 35 CFO
3 d 55 VP
4 e 18 VP
5 f 21 CEO
6 g 35 MD
Checking membership:
name age designation
0 False False False
1 False False False
2 True False True
3 False False False
4 False False False
5 False False False
6 False False False
Dataframe containing only the values that satisfy the condition:
name age designation
0 NaN NaN NaN
1 NaN NaN NaN
2 c NaN CFO
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
6 NaN NaN NaN
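Often we want whole rows rather than a NaN-padded dataframe. As a sketch, assuming we only care about the designation column, isin() can be applied to that single column and the resulting Boolean Series used to select rows:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "designation": ["VP", "CEO", "CFO", "VP"],
})

# isin() on one column returns a Boolean Series, one entry per row.
mask = df["designation"].isin(["CEO", "CFO"])

# Indexing with that Series keeps only the rows where it is True.
selected = df[mask]
print(selected)
```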
5. Deleting a column
To delete an entire column and all its contents, use the del statement as shown in the following program:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
del df['age']
print('\nThe modified dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
name age designation
0 a 20 VP
1 b 27 CEO
2 c 35 CFO
3 d 55 VP
4 e 18 VP
5 f 21 CEO
6 g 35 MD
The modified dataframe:
name designation
0 a VP
1 b CEO
2 c CFO
3 d VP
4 e VP
5 f CEO
6 g MD
As seen from the output, the age column has been deleted from the dataframe.
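There are two other common ways to remove a column, sketched below on a two-row dataframe: drop() returns a new dataframe and leaves the original alone, while pop() removes the column in place and hands it back.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [20, 27],
    "designation": ["VP", "CEO"],
})

# drop() returns a copy without the column; df itself still has 'age'.
without_age = df.drop(columns=["age"])

# pop() removes the column from df and returns it as a Series.
ages = df.pop("age")

print(without_age)
print(ages)
```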
6. Filtering a column
We can filter a dataframe by applying a condition to it. The following program gets all employees whose designation is VP:
import pandas as pd
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
print('\nThe filtered dataframe:\n')
print(df[df['designation']=='VP'])
The output of the program is shown below:
The dataframe:
name age designation
0 a 20 VP
1 b 27 CEO
2 c 35 CFO
3 d 55 VP
4 e 18 VP
5 f 21 CEO
6 g 35 MD
The filtered dataframe:
name age designation
0 a 20 VP
3 d 55 VP
4 e 18 VP
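Conditions can also be combined. The sketch below reuses the same data; the age threshold of 30 is picked only for the example. Note that pandas needs & and | (not and / or), with each condition wrapped in parentheses:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g"],
    "age": [20, 27, 35, 55, 18, 21, 35],
    "designation": ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"],
})

# Both conditions must hold: VPs younger than 30.
young_vps = df[(df["designation"] == "VP") & (df["age"] < 30)]

# Either condition may hold: CEOs or MDs.
ceo_or_md = df[(df["designation"] == "CEO") | (df["designation"] == "MD")]

print(young_vps)
print(ceo_or_md)
```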
7. DataFrame from Nested dict
When a nested dict is passed directly as an argument to the DataFrame() constructor, pandas interprets the external keys as column names and the internal keys as labels for the index.
During the interpretation of the nested structure, it is possible that not all fields will find a match. pandas compensates for this inconsistency by inserting NaN in place of the missing values.
See the following program:
import pandas as pd
my_dict = {
'red': { 2012: 22, 2013: 33 },
'white': { 2011: 13, 2012: 22, 2013: 16},
'blue': {2011: 17, 2012: 27, 2013: 18}
}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
The output of the program is shown below:
The dataframe:
red white blue
2011 NaN 13 17
2012 22.0 22 27
2013 33.0 16 18
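Once the dataframe is built, the gaps can be located or filled. The following sketch uses a two-column version of the dict; replacing the missing value with 0 is an assumption made for the example, not a general recommendation:

```python
import pandas as pd

my_dict = {
    "red": {2012: 22, 2013: 33},
    "white": {2011: 13, 2012: 22, 2013: 16},
}
df = pd.DataFrame(my_dict)

# isna() marks the cells that pandas had to fill with NaN.
missing = df.isna()

# fillna() replaces them; astype(int) restores whole numbers in 'red'.
filled = df.fillna(0).astype(int)

print(missing)
print(filled)
```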
8. Transposition of a Dataframe
We can get the transposition of the dataframe, in which the columns become rows and the rows become columns, by using its T attribute. See the following program:
import pandas as pd
my_dict = {
'red': { 2012: 22, 2013: 33 },
'white': { 2011: 13, 2012: 22, 2013: 16},
'blue': {2011: 17, 2012: 27, 2013: 18}
}
df = pd.DataFrame(my_dict)
print('The dataframe:\n')
print(df)
print('\nTransposed dataframe:\n')
print(df.T)
The output of the program is shown below:
The dataframe:
red white blue
2011 NaN 13 17
2012 22.0 22 27
2013 33.0 16 18
Transposed dataframe:
2011 2012 2013
red NaN 22.0 33.0
white 13.0 22.0 16.0
blue 17.0 27.0 18.0
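Notice that white and blue are printed as whole numbers before the transposition and as 13.0, 17.0 and so on afterwards. A likely explanation is that each column holds a single data type, and after transposing every new column mixes values from red (floats, because of the NaN) with values from the other colours. The sketch below, on a two-colour version of the data, checks the types before and after:

```python
import pandas as pd

df = pd.DataFrame({
    "red": {2012: 22, 2013: 33},
    "white": {2011: 13, 2012: 22, 2013: 16},
})

print(df.dtypes)   # 'red' is floating point, 'white' is integer

t = df.T           # rows become columns and columns become rows
print(t.dtypes)    # every transposed column is floating point
```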
Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!