Tuesday, December 25, 2018

CSV module


CSV, “comma-separated values”, files are simplified spreadsheets stored as plaintext files. Python’s csv module contains parameters and method to parse CSV files. Each line in a CSV file represents a row in the spreadsheet, and commas separate the cells in the row. CSV files are widely supported by many types of programs, can be viewed in text editors (including IDLE’s file editor), and are a straightforward way to represent spreadsheet data.



CSV files are just text files of comma-separated values. Not every comma in a CSV file represents the boundary between two cells. CSV files also have their own set of escape characters to llow commas and other characters to be included as part of the values. Let’s start using the csv module for reading and

writing CSV files.



Reading data from a CSV file



To read data from a CSV file with the csv module we first create a Reader object which allows us to iterate over lines in the CSV file. Let’s create a CSV file fruits.csv as shown below:



20/12/2018 13:34,Apples,73

21/12/2018 3:41,Guava,85

22/12/2018 12:46,Pears,14

23/12/2018 8:59,Oranges,52

24/12/2018 2:07,Grapes,152

25/12/2018 18:10,Bananas,23

26/12/2018 2:40,Pineapple,98



Now we’ll read data from this file with the help of the following program:



import csv

myFile = open('fruits.csv')

myReader = csv.reader(myFile)

myData = list(myReader)

print(myData)



The output of the above program is shown below:



[['20/12/2018 13:34', 'Apples', '73'], ['21/12/2018 3:41', 'Guava', '85'], ['22/12/2018 12:46', 'Pears', '14'], ['23/12/2018 8:59', 'Oranges', '52'], ['24/12/2018 2:07', 'Grapes', '152'], ['25/12/2018 18:10', 'Bananas', '23'], ['26/12/2018 2:40', 'Pineapple', '98']]



------------------

(program exited with code: 0)



Press any key to continue . . .



The CSV module comes with Python, so no installation is required and we can import it in to our programs. To read a CSV file with the csv module, first open it using the open() function and store it in a File object (myFile). Next we pass myFile file object to the csv.reader() function which returns a Reader object. In order to access the values in the Reader object we convert it to a plain Python list by passing it to list() and store the list in a variable myData. Finally we print the CSV file as a list of lists as shown in the output file.



We can access the value at a particular row and column with the expression myData[row][col], here row is the index of one of the lists in myData, and col is the index of the item you want from that list. Let’s modify our program to print the value at a particular row and column. See the code below:



import csv

myFile = open('fruits.csv')

myReader = csv.reader(myFile)

myData = list(myReader)

print(myData[0][0])

print(myData[0][1])

print(myData[0][2])

print(myData[1][1])

print(myData[1][2])

print(myData[2][1])



The output of the above program is shown below:



20/12/2018 13:34

Apples

73

Guava

85

Pears





------------------

(program exited with code: 0)



Press any key to continue . . .





myData [0][0] goes into the first list and gives us the first string, myData [0][2] goes into the first list and gives us the third string, and so on. This approach is suitable for CSV files with small amount of data, so for large CSV files, we can use the Reader object in a for loop and print the data in CSV file. See the code below:



import csv

myFile = open('fruits.csv')

myReader = csv.reader(myFile)



for row in myReader:

                print('Row #' + str(myReader.line_num) + ' ' + str(row))



The output of the above program is shown below:



Row #1 ['20/12/2018 13:34', 'Apples', '73']

Row #2 ['21/12/2018 3:41', 'Guava', '85']

Row #3 ['22/12/2018 12:46', 'Pears', '14']

Row #4 ['23/12/2018 8:59', 'Oranges', '52']

Row #5 ['24/12/2018 2:07', 'Grapes', '152']

Row #6 ['25/12/2018 18:10', 'Bananas', '23']

Row #7 ['26/12/2018 2:40', 'Pineapple', '98']





------------------

(program exited with code: 0)



Press any key to continue . . .



Each row in a Reader object is a list of values, with each value representing a cell. Thus we loop through the rows in the Reader object and using the print() function we print the number of the current row and the

contents of the row. To get the row number, use the Reader object’s line_num variable, which contains the number of the current line.



Write data to a CSV file



To write data to a CSV file we require a Writer object which can be created using the csv.writer() function. See the following program to create a CSV file and write data to this file:



import csv

newFile = open('breakfast.csv', 'w', newline='')



myWriter = csv.writer(newFile)

myWriter.writerow(['bread', 'eggs', 'bacon', 'ham'])

myWriter.writerow(['scambled eggs', 'toast', 'juice', 'sandwich'])



newFile.close()



First we create an empty file and open it in write mode using the open() function. This will

create the object we can then pass to csv.writer() to create a Writer object. On Windows, we also need to pass a blank string for the open() function’s newline keyword argument. The writerow() method for Writer objects takes a list argument. Each value in the list is placed in its own cell in the output CSV file. The

return value of writerow() is the number of characters written to the file for that row (including newline characters).



When we run this program a new CSV file, 'breakfast.csv',  is created in your program directory which contains the data we wrote through the program.



The next program can be used to read the content of 'breakfast.csv' file:



import csv

myFile = open('breakfast.csv')

myReader = csv.reader(myFile)



for row in myReader:

                print('Row #' + str(myReader.line_num) + ' ' + str(row))



The output of the above program is shown below:



Row #1 ['bread', 'eggs', 'bacon', 'ham']

Row #2 ['scambled eggs', 'toast', 'juice', 'sandwich']



------------------

(program exited with code: 0)



Press any key to continue . . .



In case you want to separate cells with a tab character instead of a comma and you want the rows to be double-spaced , the following program can be used:



import csv

newFile = open('breakfast1.tsv', 'w', newline='')

myWriter = csv.writer(newFile,delimiter='\t', lineterminator='\n\n')

myWriter.writerow(['bread', 'eggs', 'bacon', 'ham'])

myWriter.writerow(['scambled eggs', 'toast', 'juice', 'sandwich'])

myWriter.writerow(['fried eggs', 'bread', 'juice', 'salad'])

newFile.close()



This program changes the delimiter and line terminator characters in our file. The delimiter is the character that appears between cells on a row. By default, the delimiter for a CSV file is a comma. The line terminator is the character that comes at the end of a row. By default, the line terminator is a newline. You can change characters to different values by using the delimiter and lineterminator keyword arguments with csv.writer(). We’re using the file extension .tsv, for tab-separated values.



Passing delimeter='\t' and lineterminator='\n\n'  changes the character between cells to a tab and the character between rows to two newlines. We then call writerow() three times to give us three rows. Open the 'breakfast1.tsv' file and notice that our cells are separated by tabs.
Share:

0 comments:

Post a Comment