Tuesday, April 16, 2019

Pandas - 16 (Reading and Writing Data on Microsoft Excel Files)

pandas provides specific functions for Excel spreadsheet format. We have seen that the I/O API provides two functions to this purpose:

• to_excel()
• read_excel()

The read_excel() function can read Excel 2003 (.xls) files and Excel 2007 (.xlsx) files. This is possible due to the integration of the internal module xlrd. You may have to install xlrd module before using these functions (pip install xlrd)

Let's create an excel document in our working directory and the content in sheet 1 and sheet 2 should be:

sheet 1


white
redgreenblack
a12231718
b22161918
c14232221

sheet 2


yellow
purpleblueorange
A11164422
B20222344
C30313732

To read the data contained in the XLS file and convert it into a dataframe, we only have to use the read_excel() function. See the following program :

import pandas as pd
import numpy as np

print('default\n')
print(pd.read_excel('mydata10.xlsx'))
print('\nUsing index for Sheet1\n')
print(pd.read_excel('mydata10.xlsx',0))
print('\nUsing name for Sheet1\n')
print(pd.read_excel('mydata10.xlsx','Sheet1'))
print('\nUsing name for Sheet2\n')
print(pd.read_excel('mydata10.xlsx','Sheet2'))
print('\nUsing index for Sheet2\n')
print(pd.read_excel('mydata10.xlsx',1))


By default, the returned dataframe is composed of the data tabulated in the first spreadsheets. To load the data in the second spreadsheet, we must specify the name of the sheet Sheet2 or the number of the sheet (index) [1] just as the second argument. The output of the program is shown below:

default

  Unnamed: 0  white  red  green  black
0                  a     12    23     17     18
1                  b     22   16     19     18
2                  c     14   23     22     21

Using index for Sheet1

  Unnamed: 0  white  red  green  black
0                       a     12   23     17     18
1                       b     22   16     19     18
2                       c     14   23     22     21

Using name for Sheet1

  Unnamed: 0  white  red  green  black
0                  a     12   23     17     18
1                  b     22   16     19     18
2                  c     14   23     22     21

Using name for Sheet2

  Unnamed: 0  yellow  purple  blue  orange
0                 A      11      16       44      22
1                 B      20      22       23      44
2                 C      30      31       37      32

Using index for Sheet2

  Unnamed: 0  yellow  purple  blue  orange
0                 A      11      16        44      22
1                 B      20      22        23      44
2                 C      30      31        37      32
------------------
(program exited with code: 0)

Press any key to continue . . . 


Now let's convert a dataframe into a spreadsheet on Excel. See the following program :

import pandas as pd
import numpy as np

frame = pd.DataFrame(np.random.random((4,4)),
                     index = ['exp1','exp2','exp3','exp4'],
                     columns = ['Jan2015','Fab2015','Mar2015','Apr2005'])

frame.to_excel('mydata11.xlsx')

print(pd.read_excel('mydata11.xlsx'))



In our program we first create a dataframe. Then using the to_excel() we convert our created dataframe into a spreadsheet whose name is passed as an argument to the to_excel('mydata11.xlsx') . Finally we read the newly created excel file using read_excel().

The output of the program is shown below:

  Unnamed: 0       Jan2015    Fab2015   Mar2015   Apr2005
0                 exp1  0.802199  0.669568  0.667242  0.938849
1                 exp2  0.326955  0.125371  0.909584  0.350017
2                 exp3  0.667992  0.707551  0.187137  0.244859
3                 exp4  0.139591  0.473944  0.945452  0.814224
------------------
(program exited with code: 0)

Press any key to continue . . .



Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!
Share:

0 comments:

Post a Comment