Tuesday, March 5, 2019

NumPy library - 9 (Reading and Writing Array Data on Files)

Reading data from a file is a very common operation in data analysis: the datasets to be analyzed are usually large, so managing them manually is not advisable and often not even possible. NumPy provides a set of functions that allow data analysts to save the results of their calculations in a text or binary file. Similarly, NumPy allows us to read data stored in a file and convert it into an array.

Let's look into some of these functions:

1. The save() function

To save an array we call the save() function and pass it two arguments: the name of the file and the array. Suppose we have an array 'darray' that contains the results of our data analysis processing. We can save it as shown in the following program:

import numpy as np

darray = np.random.random(9).reshape(3,3)

print('The result of analysis is in this array : \n')
print(darray)

np.save('result_DA',darray)

print('\nThe array is saved in your working directory')



The file result_DA.npy will be stored in your working directory; the .npy extension is added to the file name automatically. The program itself only prints the created array and a confirmation message. Since the values are random, they will differ on each run:

The result of analysis is in this array :

[[0.27533037 0.50859006 0.97846682]
 [0.35034065 0.47055893 0.11045708]
 [0.55527203 0.37309907 0.31417643]]

The array is saved in your working directory
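The automatic .npy extension mentioned above can be checked with a small sketch. The temporary directory and the `target` and `saved_files` names are assumptions made for this illustration only, so that the example leaves nothing behind in your working directory:

```python
import os
import tempfile

import numpy as np

# Work in a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    darray = np.arange(9, dtype=float).reshape(3, 3)

    # No extension is given here; np.save() appends '.npy' on its own.
    target = os.path.join(tmp_dir, 'result_DA')
    np.save(target, darray)

    # List what was actually written to the directory.
    saved_files = os.listdir(tmp_dir)
    print(saved_files)
```

The listing should contain a single file named result_DA.npy, even though the name passed to save() had no extension.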


2. The load() function

The data saved in the .npy file by the save() function can be recovered using NumPy's load() function, which takes the filename as its argument. Thus, to access the data stored in result_DA.npy we can use the load() function as shown below:

import numpy as np

darray_data = np.load('result_DA.npy')

print('The content of data analysis array : \n')
print(darray_data)


The output of the program is shown below:

The content of data analysis array :

[[0.27533037 0.50859006 0.97846682]
 [0.35034065 0.47055893 0.11045708]
 [0.55527203 0.37309907 0.31417643]]
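As a rough check that save() and load() really are inverse operations, the following sketch writes an array and reads it back. The temporary directory and the names `original`, `restored` and `same_values` are assumptions chosen for this example:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    original = np.random.random(9).reshape(3, 3)
    path = os.path.join(tmp_dir, 'result_DA.npy')

    # Write the array, then read it straight back.
    np.save(path, original)
    restored = np.load(path)

    # The .npy format stores the shape and dtype along with the values.
    same_values = np.array_equal(original, restored)
    print(same_values, restored.shape, restored.dtype)
```

The restored array should match the original in values, shape and data type, which is what makes the binary .npy format convenient for intermediate results.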


3. The genfromtxt() function

This function reads data from a text file (.txt, .csv) and inserts the values into an array. In this example we pass it three arguments: the name of the file containing the data, the character that separates the values from each other (here, a comma), and whether the data contain column headers.

Let's read data from the newcities.txt file, which is in my working directory. Its content is shown below:

id,value1,value2,value3
1,123,1.4,23
2,110,0.5,18
3,164,2.1,19


The following program uses the genfromtxt() function:

import numpy as np

cities_data = np.genfromtxt('newcities.txt',delimiter=',', names=True)

print('The content of file : \n')
print(cities_data)
print(cities_data.dtype)


The output of the program is shown below:

The content of file :

[(1., 123., 1.4, 23.) (2., 110., 0.5, 18.) (3., 164., 2.1, 19.)]
[('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')]


We get a structured array in which the column headings have become the field names. The genfromtxt() function implicitly performs two loops: the first loop reads the file one line at a time, and the second loop splits each line into values and converts them, appending the resulting elements to the array. One advantage of this function is that it can handle missing data.
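To experiment with the structured array without creating a file, here is a minimal sketch that feeds the same content to genfromtxt() from memory. Using io.StringIO in place of newcities.txt is an assumption made for this example; the `text` variable is only illustrative:

```python
from io import StringIO

import numpy as np

# Same content as newcities.txt, held in memory instead of on disk.
text = StringIO(
    "id,value1,value2,value3\n"
    "1,123,1.4,23\n"
    "2,110,0.5,18\n"
    "3,164,2.1,19\n"
)

cities_data = np.genfromtxt(text, delimiter=',', names=True)

print(cities_data.dtype.names)  # field names taken from the header line
print(cities_data.shape)        # one record per data line
print(cities_data['value2'])    # a whole column, selected by field name
```

Note that the result is one-dimensional: each element is a record with four named fields, rather than a row of a 3x4 matrix.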

Let's remove some data from the newcities.txt file and save the result as newcities1.txt. Running the previous program on this new file should produce output like this:

The content of file :

[(1., 123., 1.4, 23.) (2., 110., nan, 18.) (3.,  nan, 2.1, 19.)]
[('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')]




As we can see from the output, the genfromtxt() function replaces the blanks in the file with nan values. The column headings contained in the file can be used as labels to extract the values by column, while numerical indexes extract the data corresponding to the rows, as shown in the following program:

import numpy as np

cities_data = np.genfromtxt('newcities1.txt',delimiter=',', names=True)

print('\nUsing indexes to extract the values by column : \n')
print(cities_data['id'])
print('\nUsing numerical indexes to extract data corresponding to the rows : \n')
print(cities_data[0])


The output of the program is shown below:


Using indexes to extract the values by column :

[1. 2. 3.]

Using numerical indexes to extract data corresponding to the rows :

(1., 123., 1.4, 23.)


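Once missing entries have become nan, they usually need some care in later calculations. The sketch below is one possible way to deal with them. The in-memory text is an assumed reconstruction of newcities1.txt, inferred from the output shown earlier, and the names `missing_value1` and `mean_value2` are illustrative:

```python
from io import StringIO

import numpy as np

# Assumed content of newcities1.txt: two values have been blanked out.
text = StringIO(
    "id,value1,value2,value3\n"
    "1,123,1.4,23\n"
    "2,110,,18\n"
    "3,,2.1,19\n"
)

cities_data = np.genfromtxt(text, delimiter=',', names=True)

# Boolean mask marking the records whose 'value1' field is missing.
missing_value1 = np.isnan(cities_data['value1'])
print(cities_data['id'][missing_value1])

# np.nanmean() skips nan entries instead of returning nan.
mean_value2 = np.nanmean(cities_data['value2'])
print(mean_value2)
```

Here np.isnan() locates the gaps and np.nanmean() averages only the values that are present. A plain np.mean() on the same column would return nan.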


Here I will end this post. We have learned about the main aspects of the NumPy library and become familiar with a range of its features and concepts. In the next post, we will begin to look at a new library called SciPy.

So until we meet next, keep practicing and learning Python as Python is easy to learn!









