Monday, November 26, 2018

File Handling in Python

We usually use variables to store data while our program is running, but if we need our data to persist even after the program has finished, we need to save it to a file. We can think of a file's contents as a single string value, potentially gigabytes in size, so an incredible amount of data is available in text files. Text files can contain weather data, traffic data, socioeconomic data, literary works, and more.
A file has two key properties: a filename (usually written as one word) and a path. The path specifies the location of a file on the computer. For example, there is a file on my computer with the filename euler.txt in the path F:\Python_Code\examples. The part of the filename after the last period is called the file’s extension and tells you a file’s type. euler.txt is a text document.
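
As a minimal sketch of these terms, the standard pathlib module can split a path into its filename, extension, and folder. The path is the example location used in this post; PureWindowsPath is chosen here only so that the snippet behaves the same on any operating system, and nothing is read from disk.

```python
from pathlib import PureWindowsPath

# Example location from this post; the file itself is not opened here.
file_path = PureWindowsPath(r"F:\Python_Code\examples\euler.txt")

print(file_path.name)    # filename: euler.txt
print(file_path.suffix)  # extension: .txt
print(file_path.parent)  # folder: F:\Python_Code\examples
```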

Reading from a file

Reading from a file is useful in data analysis applications and in any situation where we want to analyze or modify information stored in a file. When working with the content of a text file, the first step is to read the file into memory. We can either read the entire contents of a file or work through the file one line at a time.

1. Reading an entire file
Let's start with a file that contains Euler's number e truncated to 50 decimal places, with 10 decimal places per line:

2.7182818284
   5904523536
   0287471352
   6624977572
   4709369995


I've saved this file as euler.txt in F:\Python_Code\examples, where I store all my programs. Here's a program, euler_reader.py, that opens this file, reads it, and prints the contents of the file to the screen:
with open('euler.txt') as file_object:
   
    file_content = file_object.read()
   
    print(file_content)


The output of the program is shown below:

2.7182818284
   5904523536
   0287471352
   6624977572
   4709369995


To do any work with a file, even just printing its contents, you first need to open the file to access it. This is done with the open() function, which needs one argument: the name of the file you want to open. When given a bare filename, Python looks for the file in the current working directory, which is normally the directory where the program is stored when you run the program from there.

In this example, euler_reader.py is run from the directory where it is stored, so Python looks for euler.txt there. The open() function returns an object representing the file. Here, open('euler.txt') returns an object representing euler.txt. Python stores this object in file_object.
The keyword with closes the file once access to it is no longer needed. Notice how we call open() in this program but not close(). We could open and close the file by calling open() and close() ourselves, but a bug in our program might prevent the close() statement from being executed, so the file might never be closed. Improperly closed files can cause data to be lost or corrupted. Also, if we call close() too early in our program, we might try to work with a closed file (a file we can't access), which leads to more errors. It's not always easy to know exactly when to close a file, but with the structure shown here, Python will figure that out for us. Thus, just open the file and work with it as desired, trusting that Python will close it automatically when the with block ends.
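
To make the comparison concrete, here is a minimal sketch of the manual open()/close() pattern next to the with version. The file demo.txt and its content are assumptions made up for illustration; it is created in a temporary folder so the snippet runs anywhere.

```python
import os
import tempfile

# Create a throwaway file so the sketch runs anywhere (hypothetical content).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as file_object:
    file_object.write("2.7182818284\n")

# Manual version: close() sits in finally so it runs even if read() fails.
manual_file = open(demo_path)
try:
    manual_content = manual_file.read()
finally:
    manual_file.close()

# with version: the file is closed as soon as the block ends.
with open(demo_path) as file_object:
    with_content = file_object.read()

print(manual_file.closed, file_object.closed)  # True True
```

Both versions read the same text, but the with version needs no explicit close() and no try/finally.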

Once we have a file object representing euler.txt, we use the read() method in the second line of our program to read the entire contents of the file and store it as one long string in file_content. When we print the value of file_content, we get the entire text file back. The only difference between this output and the original file is the extra blank line at the end of the output. The blank line appears because the text in the file ends with a newline character and print() adds a newline of its own. (read() returns an empty string only when it is called again after the end of the file has been reached.) If you want to remove the extra blank line, you can use rstrip() in the print statement.
The revised code and its output are shown below:

with open('euler.txt') as file_object:
   
    file_content = file_object.read()
   
    print(file_content.rstrip())
 

Output:

2.7182818284
   5904523536
   0287471352
   6624977572
   4709369995
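
To see where the extra blank line comes from, it can help to print the repr() of the string that read() returns. This is a minimal sketch on a two-line sample file; the file name sample.txt and the assumption that the file ends with a newline are mine, not taken from the original program.

```python
import os
import tempfile

# Recreate a two-line sample file (assumption: it ends with a newline).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as file_object:
    file_object.write("2.7182818284\n   5904523536\n")

with open(sample_path) as file_object:
    file_content = file_object.read()
    at_end = file_object.read()  # a second read() at the end of the file

print(repr(file_content))           # the trailing '\n' is part of the string
print(repr(at_end))                 # '' because nothing is left to read
print(repr(file_content.rstrip()))  # trailing newline removed
```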

By default, when we pass a bare filename to open(), Python looks in the current working directory, which is usually the directory where the file that's currently being executed (that is, your .py program file) is stored. If the file is stored in a different directory, we need to provide a file path, which tells Python to look in a specific location on your system.

Let us create a file Names.txt in F:/Python_Code/examples/Data. Because Data is inside examples, the folder where our programs are stored, we can use a relative file path to open a file from Data. A relative file path tells Python to look for a given location relative to the current working directory, which here is the directory where the currently running program file is stored. See the code below:

with open('Data/Names.txt') as file_object:
   
    file_content = file_object.read()
   
    print(file_content.rstrip())

The line with open('Data/Names.txt') as file_object: tells Python to look for Names.txt in the folder Data, which it expects to find inside the examples folder, which in turn is inside the Python_Code folder.

We can also tell Python exactly where the file is on the computer, regardless of where the program that's being executed is stored. This is called an absolute file path. Use an absolute path when a relative path doesn't work. For instance, I have a folder Text_files at F:/Text_files, and in that folder I have a file positions.txt. To access positions.txt I provide the absolute file path, which in this case is F:/Text_files/positions.txt.

To print the content of positions.txt the code is shown below:

with open('F:/Text_files/positions.txt') as file_object:
   
    file_content = file_object.read()
   
    print(file_content.rstrip())


When run, the program prints the contents of positions.txt to the screen.

Because absolute paths are usually longer than relative paths, it's helpful to store them in a variable and then pass that variable to open().

For example, assign my_path = 'F:/Text_files/positions.txt' and then pass my_path to open() without quotes, so that Python uses the variable rather than looking for a file literally named my_path:

with open(my_path) as file_object:

The revised program is as shown below:

my_path = 'F:/Text_files/positions.txt'

with open(my_path) as file_object:
   
    file_content = file_object.read()
   
    print(file_content.rstrip())
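
The difference between the two kinds of path can be checked with the standard pathlib module. This is a minimal sketch that only inspects the path strings and does not open any file; Data/Names.txt is the relative path used earlier in this post.

```python
from pathlib import Path

relative_path = Path("Data/Names.txt")  # relative path from this post

# A relative path is resolved against the current working directory.
resolved_path = Path.cwd() / relative_path

print(relative_path.is_absolute())  # False
print(resolved_path.is_absolute())  # True
print(resolved_path)                # depends on where the program is run
```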


2. Reading line by line from a file 

We can use a for loop on the file object to examine the lines of a file one at a time. To read each line from our file euler.txt, we can use the following program:

my_path = 'euler.txt'

with open(my_path) as file_object:
   
    for line in file_object:
       
        print(line) 

   
After we call open(), an object representing the file and its contents is stored in the variable file_object. To examine the file’s contents, we work through each line in the file by looping over the file object. The output is shown below:

2.7182818284

   5904523536

   0287471352

   6624977572

   4709369995


Note that the blank lines appear because an invisible newline character is at the end of each line in the text file. The print() function adds its own newline each time we call it, so we end up with two newline characters at the end of each line: one from the file and one from print(). Using rstrip() on each line in the print statement eliminates these extra blank lines. The revised program and its output are shown below:

my_path = 'euler.txt'

with open(my_path) as file_object:
   
    for line in file_object:
       
        print(line.rstrip())


Output:

2.7182818284
   5904523536
   0287471352
   6624977572
   4709369995
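
As an alternative to calling rstrip() on every line, the whole file can be read and split with the string method splitlines(), which drops the newline characters. This is a minimal sketch on a throwaway two-line copy of euler.txt; the file name euler_sample.txt is an assumption made for illustration.

```python
import os
import tempfile

# Throwaway copy of the first lines of euler.txt so the sketch runs anywhere.
sample_path = os.path.join(tempfile.mkdtemp(), "euler_sample.txt")
with open(sample_path, "w") as file_object:
    file_object.write("2.7182818284\n   5904523536\n")

with open(sample_path) as file_object:
    raw_lines = list(file_object)  # looping keeps each '\n'

with open(sample_path) as file_object:
    clean_lines = file_object.read().splitlines()  # each '\n' is dropped

print(raw_lines)    # ['2.7182818284\n', '   5904523536\n']
print(clean_lines)  # ['2.7182818284', '   5904523536']
```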

A point to be noted is that when we use with, the file object returned by open() is only available inside the with block that contains it. So if we want to retain access to a file’s contents outside the with block, we can store the file’s lines in a list inside the block and then work with that list.  In this way we can process parts of the file immediately and postpone some processing for later in the program.

The revised program using this approach and its output are shown below:
my_path = 'euler.txt'

with open(my_path) as file_object:
   
    lines = file_object.readlines()
   
for line in lines:
       
    print(line.rstrip())

Output:

2.7182818284
   5904523536
   0287471352
   6624977572
   4709369995

If you want to print the value of Euler's number on one line, here is the code:
my_path = 'euler.txt'

with open(my_path) as file_object:
   
    lines = file_object.readlines()
   
myline = ''

for line in lines:
   
    myline+=line.rstrip()
       
print(myline)

The output is shown below:

2.7182818284   5904523536   0287471352   6624977572   4709369995

The output contains the whitespace that was on the left side of the digits in each line, but we can get rid of that by using strip() instead of rstrip():

my_path = 'euler.txt'

with open(my_path) as file_object:
   
    lines = file_object.readlines()
   
myline = ''

for line in lines:
   
    myline+=line.strip()
       
print(myline)
   
The output without the extra whitespace is shown below:

2.71828182845904523536028747135266249775724709369995
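
The same result can be sketched more compactly with the string method join(), which avoids building the string with += inside a loop. The file is recreated in a temporary folder from the five lines shown at the top of this post, so the snippet is self-contained; the variable names are my own.

```python
import os
import tempfile

# Recreate euler.txt as shown at the top of this post (throwaway location).
euler_lines = ["2.7182818284", "   5904523536", "   0287471352",
               "   6624977572", "   4709369995"]
sample_path = os.path.join(tempfile.mkdtemp(), "euler.txt")
with open(sample_path, "w") as file_object:
    file_object.write("\n".join(euler_lines) + "\n")

with open(sample_path) as file_object:
    # strip() removes the leading spaces and the trailing newline of each line.
    e_digits = "".join(line.strip() for line in file_object)

print(e_digits)
print(len(e_digits))  # 52: "2", ".", and 50 decimal places
```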

We can also check whether a file contains particular data. For example, to check whether the name Tom is present in the file Names.txt, we can do the following:
my_path = 'Data/Names.txt'

with open(my_path) as file_object:
   
    lines = file_object.readlines()
   
myline = ''

for line in lines:
   
    myline += line
       
myname = input("Enter your name to check its existence in the file\n")

if myname in myline:
   
    print("\nThe name :" + myname + " is present in the file")
   
else:
   
    print("\nThe name :" + myname + " is not present in the file")   
   
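When run, the program asks for a name and reports whether it is present in the file. Note that the in operator checks for a substring of the file's text, so part of a name also counts as present.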
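
Because in on the whole text of the file matches substrings, a short name is also found inside a longer one. Here is a minimal sketch of an exact, per-line comparison instead. The names written to the sample file are hypothetical, since the real contents of Names.txt are not shown in this post.

```python
import os
import tempfile

# Hypothetical contents for Names.txt; the real file is not shown in this post.
sample_path = os.path.join(tempfile.mkdtemp(), "Names.txt")
with open(sample_path, "w") as file_object:
    file_object.write("Tommy\nAnita\nRahul\n")

with open(sample_path) as file_object:
    text = file_object.read()

# One entry per line, with surrounding whitespace removed.
names = [line.strip() for line in text.splitlines()]

print("Tom" in text)     # True: "Tom" is a substring of "Tommy"
print("Tom" in names)    # False: no line is exactly "Tom"
print("Anita" in names)  # True
```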

Writing to a file 

After learning how to read from a file, it's time to learn how to write to a file in order to save data. When we write text to a file, the output will still be available after we close the terminal containing our program’s output. We can examine output after a program finishes running, and can share the output files with others as well. It is also possible to write programs that read the text back into memory and work with it again later.

1. Writing to an empty file

To write text to a file, we call open() with a second argument telling Python that we want to write to the file. To see how this works, let's write a simple program that stores a city name in a file instead of printing it to the screen:

filename = 'cities.txt'

with open(filename,'w') as file_object:
   
    file_object.write("Hyderabad")

The call to open() in this example has two arguments: the name of the file we want to open and the mode in which we want to open it. Thus open(filename,'w') tells Python to open the file cities.txt in write mode. It is possible to open a file in read mode ('r'), write mode ('w'), append mode ('a'), or a mode that allows you to read and write to the file ('r+'). If you omit the mode argument, Python opens the file in read-only mode by default.

The open() function automatically creates the file you’re writing to if it doesn’t already exist. However, be careful opening a file in write mode ('w') because if the file does exist, Python will erase the file before returning the file object.

Finally, we use the write() method on the file object to write a string to the file. This program has no terminal output, but if you open the file you'll see the string "Hyderabad" written in it.
The write() method doesn't add any newlines to the text we write. So if we write more than one line without including newline characters, our file may not look the way we want it to. For example, let's add more cities to our cities.txt file as shown below:

filename = 'cities.txt'

with open(filename,'w') as file_object:
   
    file_object.write("Hyderabad")
    file_object.write("Mumbai")
    file_object.write("Delhi")
    file_object.write("Chennai")
    file_object.write("Kolkata")
    file_object.write("Bengaluru")
   
The content of the file will be "HyderabadMumbaiDelhiChennaiKolkataBengaluru", which is not what we wanted. Adding a newline character to each write() call fixes this. See the revised code below:
filename = 'cities.txt'

with open(filename,'w') as file_object:
   
    file_object.write("Hyderabad\n")
    file_object.write("Mumbai\n")
    file_object.write("Delhi\n")
    file_object.write("Chennai\n")
    file_object.write("Kolkata\n")
    file_object.write("Bengaluru\n")

Now if we open the cities.txt file, the content will be as shown below:
Hyderabad
Mumbai
Delhi
Chennai
Kolkata
Bengaluru
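
When the values are already in a list, the repeated write() calls can be replaced by a loop. This is a minimal sketch rather than the program used above; the list name cities is my own, and the file is written to a temporary folder so that an existing cities.txt is not touched.

```python
import os
import tempfile

cities = ["Hyderabad", "Mumbai", "Delhi", "Chennai", "Kolkata", "Bengaluru"]

# Throwaway location so the sketch does not touch an existing cities.txt.
filename = os.path.join(tempfile.mkdtemp(), "cities.txt")

with open(filename, "w") as file_object:
    for city in cities:
        file_object.write(city + "\n")  # write() does not add the newline itself

with open(filename) as file_object:
    saved_cities = file_object.read().splitlines()

print(saved_cities == cities)  # True
```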


2. Appending to a file

In case you want to add more cities to the file cities.txt without erasing the existing content, open the file using append mode. See the code below:
filename = 'cities.txt'

with open(filename,'a') as file_object:
   
    file_object.write("Jabalpur\n")

Now if we open the cities.txt file, the content will be as shown below:
Hyderabad
Mumbai
Delhi
Chennai
Kolkata
Bengaluru
Jabalpur
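
To compare write mode and append mode side by side, here is a minimal sketch that uses a throwaway file in a temporary folder. The variable names after_append and after_rewrite are my own.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "cities.txt")  # throwaway file

with open(filename, "w") as file_object:  # write mode: start from empty
    file_object.write("Hyderabad\n")

with open(filename, "a") as file_object:  # append mode: keep, then add
    file_object.write("Jabalpur\n")

with open(filename) as file_object:
    after_append = file_object.read()

with open(filename, "w") as file_object:  # write mode again: content erased
    file_object.write("Mumbai\n")

with open(filename) as file_object:
    after_rewrite = file_object.read()

print(repr(after_append))   # 'Hyderabad\nJabalpur\n'
print(repr(after_rewrite))  # 'Mumbai\n'
```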


In append mode, the original contents of the file are kept and the new content is added at the end. This brings us to the end of today's discussion on file handling. As an exercise, create a file, say pizzas, containing your favorite pizzas using open() and write(). Then read the content of the file and print the result on the screen. Till we meet next, keep practicing and learning Python as Python is....



   
   










