Monday, April 22, 2019

Pandas - 19 (Pickling with pandas)

Pickling is the process of converting a Python object hierarchy into a stream of bytes. The byte stream can be stored or transmitted, and later rebuilt by the receiver into an object that retains all the original features.

The pickle module implements a powerful algorithm for serializing and deserializing Python data structures. In Python 2 there was also a separate module called cPickle, an optimized version of pickle written in C that could be up to 1,000 times faster. The interfaces of the two modules are almost the same. In Python 3, which the programs in this post use, the C implementation is built into pickle and is used automatically when available, so you simply import pickle.

Serialize a Python Object with pickle

The data format used by the pickle module is specific to Python. The oldest format, protocol 0, is an ASCII representation that can be read in a text editor. In Python 3 the default is a more compact binary protocol, so the serialized bytes are not meant to be read by a human. The following program shows how to perform serialization and deserialization of data using the pickle module:

import pickle

# A plain dictionary to serialize
data = {'color': ['white', 'red'], 'value': [5, 7]}

# dumps() returns the pickled object as a bytes string
pickled_data = pickle.dumps(data)

print('\nSerialized data\n')
print(pickled_data)

# loads() rebuilds the original object from the bytes
restored = pickle.loads(pickled_data)
print('\nDe-Serialized data\n')
print(restored)



We first create a dictionary named data and then serialize it with the dumps() function of the pickle module, which takes the object to serialize as its argument and returns a bytes object. To see how the dict was serialized, we print the contents of the pickled_data variable.

Once data has been serialized, it can easily be written to a file or sent over a socket, pipe, etc.
After transmission, the original object can be reconstructed (deserialization) with the loads() function of the pickle module, which takes pickled_data as an argument.
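
As a minimal sketch of the "written to a file" case, the snippet below uses pickle.dump() and pickle.load() on a file opened in binary mode. The file name data.pkl and the temporary directory are assumptions made only for illustration.

```python
import os
import pickle
import tempfile

data = {'color': ['white', 'red'], 'value': [5, 7]}

# 'data.pkl' is a hypothetical file name, placed in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'data.pkl')

# Pickle files must be opened in binary mode ('wb' / 'rb')
with open(path, 'wb') as f:
    pickle.dump(data, f)

# Read the bytes back and rebuild the dictionary
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == data)
```

The rebuilt dictionary compares equal to the original but is a separate object in memory.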

The output of the program is shown below (the exact bytes depend on the Python version and the pickle protocol in use):

Serialized data

b'\x80\x03}q\x00(X\x05\x00\x00\x00colorq\x01]q\x02(X\x05\x00\x00\x00whiteq\x03X\x03\x00\x00\x00redq\x04eX\x05\x00\x00\x00valueq\x05]q\x06(K\x05K\x07eu.'

De-Serialized data

{'color': ['white', 'red'], 'value': [5, 7]}
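
The serialized bytes printed above start with \x80, which marks a binary protocol. As a rough sketch of the difference between formats, the snippet below pickles the same dictionary with protocol 0 (the older ASCII-based format) and with the highest protocol available in the running interpreter. The variable names are only illustrative.

```python
import pickle

data = {'color': ['white', 'red'], 'value': [5, 7]}

# Protocol 0 is the original text-based format
text_like = pickle.dumps(data, protocol=0)

# HIGHEST_PROTOCOL selects the newest binary format the interpreter supports
binary = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(text_like)
print(binary)

# Both byte strings rebuild the same dictionary
print(pickle.loads(text_like) == pickle.loads(binary) == data)
```

Whichever protocol was used to write the data, loads() detects it automatically, so the reader does not need to specify it.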


Pickling with pandas

Pickling (and unpickling) with the pandas library is much easier. There is no need to import the pickle module yourself; the whole operation is performed implicitly by pandas. The file that pandas writes is in a binary format, not ASCII.

The following program shows how to perform pickling in pandas:

import pandas as pd
import numpy as np

# Build a 4x4 DataFrame with labelled rows
frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['up', 'down', 'left', 'right'])

# Write the DataFrame to a pickle file, then read it back
frame.to_pickle('frame.pkl')
print(pd.read_pickle('frame.pkl'))


First we create a DataFrame, which is then written to disk with to_pickle(). This creates a new file called frame.pkl in the working directory that contains all the information about the frame DataFrame.

To open a PKL file and read its contents we use read_pickle('frame.pkl'), which takes the file to be read as its argument. The output of the program is shown below:



        0   1   2   3
up      0   1   2   3
down    4   5   6   7
left    8   9  10  11
right  12  13  14  15
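
To check that nothing is lost in the round trip, one possible sketch compares the reloaded DataFrame with the original using DataFrame.equals(). The temporary directory used for frame.pkl here is an assumption, chosen so the example does not touch the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['up', 'down', 'left', 'right'])

# Write to a temporary location (an illustrative choice, not required)
path = os.path.join(tempfile.mkdtemp(), 'frame.pkl')
frame.to_pickle(path)

# Read the file back into a new DataFrame
restored = pd.read_pickle(path)

# equals() compares values, index, columns and dtypes
print(restored.equals(frame))
print(list(restored.index))
```

Unlike a CSV round trip, the row labels and column dtypes come back without any extra parsing options.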

As you may have noticed, all the details of pickling and unpickling are hidden from the pandas user, which makes the job as easy and understandable as possible for those who deal specifically with data analysis. A key caution when using this format is to make sure that the file you open comes from a trusted source: the pickle format was not designed to be protected against erroneous or maliciously constructed data, and loading a pickle can execute arbitrary code.
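
To make the caution concrete, here is a deliberately harmless sketch of why untrusted pickles are risky. The Demo class is hypothetical; its __reduce__ method tells pickle to "rebuild" the object by calling a function, and the unpickler calls it without asking. Here the function is the built-in len(), but a malicious file could name any importable callable.

```python
import pickle

class Demo:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # Instruct pickle to rebuild this object by calling len('payload')
        return (len, ('payload',))

payload = pickle.dumps(Demo())

# Loading the bytes calls len('payload') instead of creating a Demo object
result = pickle.loads(payload)
print(result)
```

The same applies to pd.read_pickle(), since it unpickles the file contents, so only open PKL files that you created yourself or received from a source you trust.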

Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!