Thursday, April 11, 2019

Pandas - 12 (Reading TXT Files in Parts)

It is often necessary to read a file in portions, for example when processing large files or when only part of a file is of interest. Reading in portions lets us iterate over the data piece by piece and avoids parsing the entire file.

In the following program we read only a portion of the file mydata1.csv (used in the previous post) by specifying how many rows to parse (nrows) and which line to skip (skiprows):

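import pandas as pd

# skiprows=[2] skips the third line of the file (line indices start at 0),
# nrows=3 reads only three rows, header=None means the file has no header row
frame1 = pd.read_csv('mydata1.csv', skiprows=[2], nrows=3, header=None)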

print('\nThe dataframe\n')
print(frame1)



The output of the program is shown below: 

The dataframe

   0  1  2  3       4
0  1  5  2  3     pen
1  2  7  8  5  pencil
2  2  2  8  3  eraser
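To make the effect of skiprows and nrows easier to see without the original file, here is a minimal self-contained sketch. The CSV text is invented sample data held in memory (it is not the contents of mydata1.csv), and the names csv_text and frame are only for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data, NOT the contents of mydata1.csv.
csv_text = """1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
"""

# skiprows=[2] drops the third line of the input (line indices start at 0),
# nrows=3 then keeps only the first three of the remaining rows,
# header=None tells pandas that the data has no header row.
frame = pd.read_csv(io.StringIO(csv_text), skiprows=[2], nrows=3, header=None)

print(frame)
```

With this sample data the 'horse' line is skipped and the 'mouse' line is never read, so the resulting dataframe holds the cat, dog and duck rows.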


In the next program we split the file into portions (chunks) and carry out a specific operation on each one, iterating over the file portion by portion. See the following program:

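import pandas as pd

# Collect one sum per chunk in a list
# (Series.set_value() was removed in pandas 1.0, so it is not used here)
sums = []

# chunksize=3 makes read_csv return an iterator of dataframes with up to 3 rows each
pieces = pd.read_csv('mydata.csv', chunksize=3)

for piece in pieces:
    sums.append(piece['yellow'].sum())

# Build the series from the collected sums
out = pd.Series(sums)

print('\nThe series\n')
print(out)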


The above program sums the values of the 'yellow' column over every three rows and stores these sums in a series. The output of the program is shown below:

The series

0    16
1    10
dtype: int64
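The same chunk-by-chunk idea can be tried without any file on disk. The sketch below uses invented in-memory data (the column names and values are assumptions, not the contents of mydata.csv) and also shows that the last chunk may hold fewer rows than chunksize.

```python
import io
import pandas as pd

# Hypothetical sample data; column names and values are invented.
csv_text = """white,red,blue,yellow
1,5,2,3
2,7,8,5
3,3,6,7
2,2,8,3
4,4,2,1
"""

chunk_sums = []

# Five data rows with chunksize=3 give two chunks: one of 3 rows, one of 2 rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    chunk_sums.append(chunk["yellow"].sum())

out = pd.Series(chunk_sums)
total = out.sum()  # the per-chunk sums add up to the sum of the whole column

print(out)
print("Total:", total)
```

Because each chunk is processed and then discarded, only three rows need to be in memory at any time, which is the main reason to read large files this way.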



Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!