Wednesday, April 17, 2019

Pandas - 17 (Reading and Writing Data on JSON Data)

pandas provide the read_json() and to_json() functions to read and write on JSON Data. In the following program we'll define a dataframe and then call the to_json() function on it, passing as an argument the name of the file that you want to create. After our JSON file is created we'll use the read_json() method to read it by passing the filename as an argument. See the following program :

import pandas as pd
import numpy as np

frame = pd.DataFrame(np.arange(16).reshape(4,4),
                    index=['white','black','red','blue'],
                    columns=['up','down','right','left'])
frame.to_json('frame.json')

print(pd.read_json('frame.json'))

The output of the program is shown below:

            up  down  right  left
black    4     5         6       7
blue     12   13       14     15
red       8     9         10     11
white   0     1          2     3

------------------
(program exited with code: 0)

Press any key to continue . . .

The file frame.json, created in the working directory, contain the dataframe data translated into JSON format. If you open it's content should be:

{"up":{"white":0,"black":4,"red":8,"blue":12},"down":{"white":1,"black":5,"red":9,"blue":13},"right":{"white":2,"black":6,"red":10,"blue":14},"left":{"white":3,"black":7,"red":11,"blue":15}}

In the previous example the JSON data were in tabular form (since the file frame.json comes from a dataframe). Generally, the JSON files do not have a tabular structure. Thus, we will need to convert the structure dict file into tabular form. This process is called normalization. The library pandas provides a function, json_normalize(), which convert a dict or a list in a table.

Let's create a JSON file with content as shown below and save it in our working directory as books.json:

[{"writer": "Vivek Swami",
"nationality": "India",
"books": [
{"title": "Python with Vee", "price": 23.56},
{"title": "Python Fundamentals", "price": 50.70},
{"title": "The NumPy library", "price": 12.30}
]
},
{"writer": "Veevaeck Swami",
"nationality": "India",
"books": [
{"title": "Python is easy to learn", "price": 28.60},
{"title": "HTML5", "price": 31.35},
{"title": "Python for Dummies", "price": 28.00}
]
}]


Now let's try to read the books.json file using the read_json(). See the following program :

import pandas as pd
import numpy as np

print(pd.read_json('books.json'))

The output of the program is shown below:

                                                                books  ...          writer
0  [{'title': 'Python with Vee', 'price': 23.56},...  ...     Vivek Swami
1  [{'title': 'Python is easy to learn', 'price':...     ...  Veevaeck Swami

[2 rows x 3 columns]

------------------
(program exited with code: 0)

Press any key to continue . . .

As you can see in the output, as the file structure is no longer tabular, but more complex, the approach with the read_json() function is no longer valid. To get the data in tabular form from this structure we need to follow a different approach which is shown in our next program:

import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize

file = open('books.json','r')
text = file.read()
text = pd.io.json.loads(text)

print(json_normalize(text,'books'))

We first load the contents of the JSON file and convert it into a string. In order to extract a table
that contains all the books we used the json_normalize() function with the string variable text as the first argument and the books key as the second argument. The output of the program is shown below:

     price                    title
0  23.56        Python with Vee
1  50.70        Python Fundamentals
2  12.30        The NumPy library
3  28.60       Python is easy to learn
4  31.35       HTML5
5  28.00       Python for Dummies

------------------
(program exited with code: 0)

Press any key to continue . . .

The function will read the contents of all the elements that have books as the key. All properties will be converted into nested column names while the corresponding values will fill the dataframe. For the indexes, the function assigns a sequence of increasing numbers.

We can add the values of other keys as an argument to the json_normalize() to retrieve more data from the file. See the following program :

import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize

file = open('books.json','r')
text = file.read()
text = pd.io.json.loads(text)

print(json_normalize(text,'books',['nationality','writer']))

The output of the program is shown below which shows a dataframe from a starting tree structure:

    price                    title                         nationality          writer
0  23.56          Python with Vee              India            Vivek Swami
1  50.70          Python Fundamentals      India            Vivek Swami
2  12.30         The NumPy library          India            Vivek Swami
3  28.60         Python is easy to learn     India            Veevaeck Swami
4  31.35         HTML5                            India            Veevaeck Swami
5  28.00         Python for Dummies       India             Veevaeck Swami

------------------
(program exited with code: 0)

Press any key to continue . . .

See the following program :The output of the program is shown below:Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!
Share:

0 comments:

Post a Comment