Sunday, April 21, 2019

Pandas - 18 (Reading and Writing Data in HDF5 Format)

HDF stands for Hierarchical Data Format. An HDF5 file is organized as a hierarchy of nodes, so a single file can hold multiple datasets, and the libraries discussed in this post are concerned with reading and writing such files.

The format is very efficient, especially for saving huge amounts of data. Unlike simpler binary formats, HDF5 supports on-the-fly compression, which takes advantage of repetitive patterns in the data to reduce the file size.
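
To get a feel for the effect of compression, here is a minimal sketch that writes the same dataframe twice, once without and once with compression, and compares the file sizes. The data, the file names, and the key 'data' are made up for illustration; complevel and complib are the pandas options that switch compression on, and the exact sizes will vary from system to system.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data: one highly repetitive column, so compression has something to exploit.
frame = pd.DataFrame({"value": np.zeros(200_000)})

# Hypothetical scratch files in a temporary directory.
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "plain.h5")
packed_path = os.path.join(tmpdir, "packed.h5")

# Same data written twice: once uncompressed, once with zlib at level 9.
frame.to_hdf(plain_path, key="data", mode="w", format="table")
frame.to_hdf(packed_path, key="data", mode="w", format="table",
             complevel=9, complib="zlib")

plain_size = os.path.getsize(plain_path)
packed_size = os.path.getsize(packed_path)
print("uncompressed:", plain_size, "bytes")
print("compressed:  ", packed_size, "bytes")

# Reading is transparent: no compression options are needed to load the data.
restored = pd.read_hdf(packed_path, key="data")
print("rows read back:", len(restored))
```

With data this repetitive the compressed file should come out much smaller; on data with little repetition the gain would be far less pronounced.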

At present, the two main choices in Python are PyTables and h5py. The two libraries differ in several respects, so which one to choose depends very much on your needs.

h5py provides a direct, high-level interface to the HDF5 APIs, while PyTables abstracts many of the details of HDF5 to provide more flexible data containers, indexed tables, querying capabilities, and other support for computations.

pandas has a dict-like class called HDFStore, which uses PyTables to store pandas objects. So before working with the HDF5 format, you must import the HDFStore class (it is also available as pd.HDFStore):

from pandas.io.pytables import HDFStore

Now let's write a program to store the data of a dataframe in an .h5 file:

import pandas as pd
import numpy as np
from pandas.io.pytables import HDFStore

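frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['white', 'black', 'red', 'blue'],
                     columns=['up', 'down', 'right', 'left'])

# Open (or create) the HDF5 file and store the dataframe under the label 'obj1'
store = HDFStore('mydata12.h5')
store['obj1'] = frame

print('dataframe\n')
print(frame)

# A second object can be stored in the same file under a different label
store['obj2'] = frame
print('\nMultiple data structures in a single file\n')
print(store)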

First we create a dataframe, then create an HDF5 file named mydata12.h5 and store the dataframe in it. Next we store a second data structure in the same HDF5 file, giving each of them its own label. The output of the program is shown below:

dataframe

       up  down  right  left
white   0     1      2     3
black   4     5      6     7
red     8     9     10    11
blue   12    13     14    15

Multiple data structures in a single file

<class 'pandas.io.pytables.HDFStore'>
File path: mydata12.h5

Closing remaining open files:mydata12.h5...done
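
The "Closing remaining open files" line shows up because the program never closes the store itself. As a rough sketch, the same steps can be wrapped in a with block, which closes the file on exit, and the dict-like side of HDFStore can be tried out along the way. The scratch file name here is hypothetical and is used only so that mydata12.h5 is left alone.

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['white', 'black', 'red', 'blue'],
                     columns=['up', 'down', 'right', 'left'])

# Hypothetical scratch file so the example does not touch mydata12.h5.
path = os.path.join(tempfile.mkdtemp(), 'scratch.h5')

# The with block closes the file on exit, even if an error occurs inside it.
with pd.HDFStore(path, mode='w') as store:
    store['obj1'] = frame
    store['obj2'] = frame

    keys = sorted(store.keys())   # labels are reported with a leading '/'
    print(keys)

    has_obj1 = 'obj1' in store    # membership test, as with a dict
    del store['obj2']             # remove one object from the file
    remaining = store.keys()

print(has_obj1, remaining)
print(store.is_open)              # the store is closed once the block ends
```

Closing the store explicitly with store.close() would work as well; the with block just makes it harder to forget.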


Given an HDF5 file containing several data structures, the objects inside it can be retrieved in the following way:

import pandas as pd
import numpy as np
from pandas.io.pytables import HDFStore

frame = pd.DataFrame(np.arange(16).reshape(4,4),
                    index=['white','black','red','blue'],
                    columns=['up','down','right','left'])
store = HDFStore('mydata12.h5')

# Store the dataframe under the label 'obj2', then read it back by the same label
store['obj2'] = frame
print('\nCalling objects inside the HDF5 file\n')
print(store['obj2'])

The output of the program is shown below:

Calling objects inside the HDF5 file

       up  down  right  left
white   0     1      2     3
black   4     5      6     7
red     8     9     10    11
blue   12    13     14    15
Closing remaining open files:mydata12.h5...done
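
Reading a whole object back by its label is only one option. The querying capabilities of PyTables mentioned earlier can also be reached through HDFStore, provided the object is stored in the table format. The following is a small sketch under that assumption; the file name query.h5 and the condition 'up > 4' are chosen purely for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['white', 'black', 'red', 'blue'],
                     columns=['up', 'down', 'right', 'left'])

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'query.h5')

with pd.HDFStore(path, mode='w') as store:
    # format='table' stores a queryable PyTables table;
    # data_columns=True makes every column usable in a where clause.
    store.put('obj1', frame, format='table', data_columns=True)

    # Select only the rows that satisfy the condition.
    subset = store.select('obj1', where='up > 4')

print(subset)

# pd.read_hdf opens the file, reads one object, and closes the file again.
whole = pd.read_hdf(path, key='obj1')
print(whole.shape)
```

Objects stored with the plain store['label'] = frame assignment use the default fixed format, which is quick to write but cannot be queried with a where clause.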


Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!