Tuesday, December 18, 2018

Openpyxl module (work with Excel spreadsheets)

The openpyxl module allows Python program to read and modify Excel files. Python does not come with OpenPyXL, so you’ll have to install it. To install, open the command prompt and use the pip installer as follows:

C:\Users\Python>pip install openpyxl
Collecting openpyxl
  Downloading https://files.pythonhosted.org/packages/08/8a/509eb6f58672288da9a5
884e1cc7e90819bc8dbef501161c4b40a6a4e46b/openpyxl-2.5.12.tar.gz (173kB)
    100% |████████████████████████████████| 174kB 3.4MB/s
Collecting jdcal (from openpyxl)
  Downloading https://files.pythonhosted.org/packages/a0/38/dcf83532480f25284f3e
f13f8ed63e03c58a65c9d3ba2a6a894ed9497207/jdcal-1.4-py2.py3-none-any.whl
Collecting et_xmlfile (from openpyxl)
  Downloading https://files.pythonhosted.org/packages/22/28/a99c42aea746e18382ad
9fb36f64c1c1f04216f41797f2f0fa567da11388/et_xmlfile-1.0.1.tar.gz
Installing collected packages: jdcal, et-xmlfile, openpyxl
  Running setup.py install for et-xmlfile ... done
  Running setup.py install for openpyxl ... done
Successfully installed et-xmlfile-1.0.1 jdcal-1.4 openpyxl-2.5.12

C:\Users\Python>

After openpyxl is installed it can be imported into the Python programs which can then work with the Excel spreadsheets. I have created an excel spreadsheet 'Data.xlsx' and stored in my program directory. This file will be used in the programs for this post. We can also create Excel spreadsheet through program. Let's create one

1. Creating Excel spreadsheet

This program will create a new xlsx file with name myData.xlsx which will be stored in the your program directory. See the code below:

from openpyxl import Workbook

book = Workbook()

sheet = book.active

sheet['A1'] = 10
sheet['A2'] = 20
sheet['A3'] = 30

sheet['B1'] = 'Arnie'
sheet['B2'] = 'Barnie'
sheet['B3'] = 'Charlie'

sheet['C1'] = 10
sheet['C2'] = 20
sheet['C3'] = 30


book.save("myData.xlsx") 

Run this program and check your program directory, you will notice a new Excel file 'myData.xlsx' is created with the provided data.

Our program started by importing the Workbook class which  is the container for all other parts of the document. A workbook is always created with at least one worksheet.

The second line book = Workbook() created a new workbook and then we get the reference of sheet by using the openpyxl.workbook.Workbook.active() property in the third line sheet1 = book.active.

Then we write numerical data to cells A1, A2 ,A3 of column A, B1,B2,B3 of column B and C1,C2,C3 of column C. After the data is entered we write the contents to the myData.xlsx file with the save() method.

You must be wondering if we can create create new worksheets  in our Excel document through a program? Yes! we can.

We create new worksheets by using the openpyxl.workbook.Workbook.create_sheet() method as shown below:

from openpyxl import Workbook

import time

book = Workbook()
sheet = book.create_sheet()
sheet = book.create_sheet()
sheet = book.create_sheet()
book.save("mysheet1.xlsx")


Run this program and check your program directory, you will notice a new Excel file 'mysheet1.xlsx' is created..

Sheets are given a name automatically when they are created. They are numbered in sequence (Sheet, Sheet1, Sheet2,...). You can change this name at any time with the title property as shown below:

sheet.title = 'Name'

Let's revise our code and change the names of the worksheets. See the code below:

from openpyxl import Workbook

import time

book = Workbook()
sheet = book.create_sheet()
sheet.title = 'Name'
sheet = book.create_sheet()
sheet.title = 'Age'
sheet = book.create_sheet()
sheet.title = 'Salary'

book.save("mysheet1.xlsx")



Run this program and check your program directory, you will notice that the Excel file 'mysheet1.xlsx' has sheets with the specified titles, sheet, Name,Age,Salary.

The background color of the tab holding this title is white by default. We can change this providing an RRGGBB color code to the sheet_properties.tabColor property as shown below:

from openpyxl import Workbook
import time

book = Workbook()
sheet = book.create_sheet()
sheet.title = 'Name'
sheet.sheet_properties.tabColor = "1072BA"

sheet = book.create_sheet()
sheet.title = 'Age'
sheet.sheet_properties.tabColor = "458b00"

sheet = book.create_sheet()
sheet.title = 'Salary'
sheet.sheet_properties.tabColor = "daa520"

book.save("mysheet1.xlsx")


Run this program and check your program directory, you will notice that the Excel file 'mysheet1.xlsx' has sheets with the specified titles, sheet, Name,Age,Salary in different colors as specified in the program as shown below:




2. Using the sheets

In case if the spreadsheet has a decent number of worksheets and we want the names of these worksheets we can get these names with the help of the following code:


import openpyxl

book = openpyxl.load_workbook('mysheet1.xlsx')

print(book.get_sheet_names())


Run this program and see the output, it should be:

['Sheet', 'Name', 'Age', 'Salary']

------------------
(program exited with code: 0)

Press any key to continue . . . 


The output shows all the worksheets in our mysheet1.xlsx spreadsheet.  Remember that mysheet1.xlsx needs to be in the current working directory in order for you to work with it.
The openpyxl.load_workbook() function takes in the filename and returns a value of the workbook data type. This Workbook object represents the Excel file, a bit like how a File object represents an opened text file. This object is then used to call the get_sheet_names() method which shows all the worksheets in our mysheet1.xlsx spreadsheet.


It is also possible to get the workbook’s active sheet, which is the sheet that’s on top when the workbook is opened in Excel. Let's see how this is implemented through a program:

import openpyxl

book = openpyxl.load_workbook('mysheet1.xlsx')

print(book.get_active_sheet())


Each sheet is represented by a Worksheet object and using this we can call the get_active_sheet() method of a Workbook object to get the workbook’s active sheet. Run this program and see the output, it should be:

<Worksheet "Sheet">
------------------
(program exited with code: 0)

Press any key to continue . . .

3. Working with the data in spreadsheet

After having access to the worksheet, the next task is to access the cells and if required, modify their content.

a. Accessing a single cell 

Cells can be accessed directly as keys of the worksheet. See the program below:

import openpyxl

book = openpyxl.load_workbook('myData.xlsx')

mysheet = book.get_sheet_by_name('Sheet')

c1 = mysheet['B1']

print(c1.value)


Run this program and see the output, it should be:

Arnie
------------------
(program exited with code: 0)

Press any key to continue . . .


Workbook object represents the Excel file which is used to call the get_sheet_by_name() method with a specified worksheet name 'Sheet'. This is stored in mysheet variable which is then used to access a particular cell whose reference is stored in the variable c1. Finally using the value attribute we get the cell's data.

What if I try to access a cell which does not exist, say c4 = mysheet['B4']. This will return the cell at A4 or create one if it does not exist yet. Values can be directly assigned as shown below:

mysheet['B4'] = 'Veevaeck'

There is also the openpyxl.worksheet.Worksheet.cell() method which can be used to access a cell as shown below:

cell2 = mysheet.cell(row = 2, column = 2)

This is useful as specifying a column by letter can be tricky to program, especially because after column Z, the columns start by using two letters: AA, AB,AC, and so on. As an alternative, you can also get a cell using the sheet’s cell() method and passing integers for its row and column keyword arguments. The first row or column integer is 1, not 0.

Using the cell() method and its keyword arguments, we can even write a for loop to print the values of a series of cells. Let's make a program and implement what we have discussed just now:

import openpyxl

book = openpyxl.load_workbook('myData.xlsx')

mysheet = book.get_sheet_by_name('Sheet')

c1 = mysheet['B1']

print(c1.value)

c4 = mysheet['B4']

mysheet['B4'] = 'Veevaeck'

print(c4.value+'\n')

cell2 = mysheet.cell(row = 2, column = 2)

print(cell2.value+'\n')

for i in range(1,5):
   
    print(i,mysheet.cell(row = i,column = 2).value)


Run this program and see the output, it should be:

Arnie
Veevaeck

Barnie

1 Arnie
2 Barnie
3 Charlie
4 Veevaeck
------------------
(program exited with code: 0)

Press any key to continue . . .


The first output is already discussed. In the code c4 = mysheet['B4'] we get a cell with no data hence mysheet['B4'] = 'Veevaeck' assigns data to c4 which is then printed. Next we wrote a for loop to print all the data in the column 2.

We can also determine the size of the sheet with the Worksheet object’s max_row and max_column property as shown in the program below:

import openpyxl

book = openpyxl.load_workbook('myData.xlsx')

mysheet = book.get_sheet_by_name('Sheet')

print(mysheet.max_row)
print(mysheet.max_column)

Run this program and see the output, it should be:

3
3
------------------
(program exited with code: 0)

Press any key to continue . . . 


The return type is always an integer value.


B. Appending values to a sheet


Using the append() method, we can append a group of values at the bottom of the current sheet. See the code below:

import openpyxl

wb = openpyxl.load_workbook('myData.xlsx')
sheet = wb.get_active_sheet()

rows = (
    (88, 46, 57),
    (89, 38, 12),
    (23, 59, 78),
    (56, 21, 98),
    (24, 18, 43),
    (34, 15, 67)
)

for row in rows:
    sheet.append(row)

wb.save('myData.xlsx')
   
In this program we are appending three columns of data into the current sheet myData.xlsx. The data is stored in a 'rows' which is a tuple of tuples. Using a for loop we go through the container row by row and insert the data row with the append() method. After running this program open 'myData.xlsx' spreadsheet and see that the data is appended.   


C. Accessing cells from the sheets


Ranges of cells can be accessed using slicing . See the program below:

import openpyxl

book = openpyxl.load_workbook('myData.xlsx')

mysheet = book.active

cell_range = mysheet['A1':'B6']

for c1, c2 in cell_range:
    print("{0:8} {1:8}".format(c1.value, c2.value))


In the example, we read data from two columns using a range operation.

In this line, cell_range = mysheet['A1':'B6'] we read data from cells A1 - B6. The format() function is used for neat output of data on the console as shown below:

     100 Arnie
     200 Barnie
     300 Charlie
     400 Veevaeck
     500 Satya
     600 Naga
------------------
(program exited with code: 0)

Press any key to continue . . .


D. Iterating by rows 

The iter_rows() method return cells from the worksheet as rows. See the program below:

from openpyxl import Workbook

book = Workbook()
sheet = book.active

rows = (
    (88, 46, 57),
    (89, 38, 12),
    (23, 59, 78),
    (56, 21, 98),
    (24, 18, 43),
    (34, 15, 67)
)

for row in rows:
    sheet.append(row)
   
for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=3):
    for cell in row:
        print(cell.value, end=" ")
    print()   

book.save('iteration_rows.xlsx')


The program iterates over data row by row based on the boundaries we have provided for the iteration.See the output below:

88 46 57
89 38 12
23 59 78
56 21 98
24 18 43
34 15 67
------------------
(program exited with code: 0)

Press any key to continue . . .

E. Iterating by columns 

The iter_cols() method return cells from the worksheet as columns. See the program below:

from openpyxl import Workbook

book = Workbook()
sheet = book.active

rows = (
    (88, 46, 57),
    (89, 38, 12),
    (23, 59, 78),
    (56, 21, 98),
    (24, 18, 43),
    (34, 15, 67)
)

for row in rows:
    sheet.append(row)
   
for row in sheet.iter_cols(min_row=1, min_col=1, max_row=6, max_col=3):
    for cell in row:
        print(cell.value, end=" ")
    print()   

book.save('iteration_col.xlsx')


The example iterates over data column by column based on the boundaries we have provided for the iteration. See the output below:

88 89 23 56 24 34
46 38 59 21 18 15
57 12 78 98 43 67
------------------
(program exited with code: 0)

Press any key to continue . . .

F. Converting Between Column Letters and Numbers

To convert from letters to numbers, call the openpyxl.cell.column_index_from _string() function. To convert from numbers to letters, call the openpyxl.cell.get_column_letter() function. See the program below for the implementation of these functions:

import openpyxl
from openpyxl.utils import get_column_letter, column_index_from_string


wb = openpyxl.load_workbook('myData.xlsx')
sheet = wb.get_sheet_by_name('Sheet')

print(get_column_letter(sheet.max_column))
print(column_index_from_string('A'))


See the output below:

C
1
------------------
(program exited with code: 0)

Press any key to continue . . . 


G. Getting Rows and Columns from the Sheets

We can slice Worksheet objects to get all the Cell objects in a row, column, or rectangular area of the spreadsheet. Then loop over all the cells in the slice and print it's value. See the program below:

import openpyxl

wb = openpyxl.load_workbook('myData.xlsx')
sheet = wb.get_sheet_by_name('Sheet')

for row_Cell_Objects in sheet['A1':'C3']:
    for cell_Obj in row_Cell_Objects:
        print(cell_Obj.coordinate, cell_Obj.value)
    print('--- END OF ROW ---')

   
In our program we want the Cell objects in the rectangular area from A1 to C3, and we get a Generator object containing the Cell objects in that area. So overall, our slice of the sheet contains all the Cell objects in the area from A1 to C3, starting from the top-left cell and ending with the bottom right cell. To print the values of each cell in the area, we use two for loops. The outer for loop goes over each row in the slice. Then, for each row, the nested for loop goes through each cell in that row.

The output of this program is shown below:

A1 100
B1 Arnie
C1 10
--- END OF ROW ---
A2 200
B2 Barnie
C2 20
--- END OF ROW ---
A3 300
B3 Charlie
C3 30
--- END OF ROW ---
------------------
(program exited with code: 0)

Press any key to continue . . .



To access the values of cells in a particular row or column, we can also use a Worksheet object’s rows and columns attribute as shown in the program below:

import openpyxl

wb = openpyxl.load_workbook('myData.xlsx')
sheet = wb.get_active_sheet()

my_data = list(sheet.columns)[1]

for cell_Obj in my_data:
   
    print(cell_Obj.value)


The columns attribute returns a list containing the Cell objects in a particular column. to get the list that represents column B, we use list (sheet.columns)[1]. To get the list containing the Cell objects in column A, you’d use list(sheet.columns)[0]. Once we have a list representing one row or
column, we used a for loop through its Cell objects and print their values. The output of this program is shown below:

Arnie
Barnie
Charlie
Veevaeck
Satya
Naga
Raju
------------------
(program exited with code: 0)

Press any key to continue . . .



The Openpyxl module has more methods and properties but we have covered the mostly used ones. Practice more and more as this module is widely used in data retrieval. Here we end today's post. Till we meet next keep learning Python as Python is easy to learn!


   




   

























































Share:

0 comments:

Post a Comment