In the previous post we used datasets stored in the CSV format. In this post we'll work with data stored in JSON format. First we'll download a dataset representing recent earthquakes from around the world. Then we'll make a map showing the location of these earthquakes and how significant each one was. Because the data is stored in the JSON format, we'll work with it using the json module. Using Plotly's beginner-friendly mapping tool for location-based data, we'll create visualizations that clearly show the global distribution of earthquakes.
Download the file eq_data_1_day_m1.json from https://earthquake.usgs.gov/earthquakes/feed/ and save it to the folder where you're storing the project files. This file includes data for all earthquakes with a magnitude of M1 or greater that took place in the last 24 hours.
When we open eq_data_1_day_m1.json, we'll see that it's very dense and hard to read:
{"type":"FeatureCollection","metadata":{"generated":1550361461000,...
{"type":"Feature","properties":{"mag":1.2,"place":"11km NNE of Nor...
{"type":"Feature","properties":{"mag":4.3,"place":"69km NNW of Ayn...
{"type":"Feature","properties":{"mag":3.6,"place":"126km SSE of Co...
{"type":"Feature","properties":{"mag":2.1,"place":"21km NNW of Teh...
{"type":"Feature","properties":{"mag":4,"place":"57km SSW of Kakto...
--snip--
This file is formatted more for machines than it is for humans. But we can see that the file contains some dictionaries, as well as information that we’re interested in, such as earthquake magnitudes and
locations. The json module provides a variety of tools for exploring and working with JSON data. Some of these tools will help us reformat the file so we can look at the raw data more easily before we begin to work with it programmatically.
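For a quick look at a structure without writing a new file, json.dumps() can pretty-print it directly to a string. Here's a minimal sketch using a small hand-built dictionary standing in for a slice of the earthquake data:

```python
import json

# A toy structure standing in for part of the earthquake data.
data = {"type": "FeatureCollection", "metadata": {"count": 158}}

# json.dumps() returns a formatted string; indent=4 makes the
# nesting visible, just as indent=4 does for json.dump().
readable = json.dumps(data, indent=4)
print(readable)
```

This is handy for inspecting a small piece of the data interactively; for the full file we'll write the formatted output to disk instead.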
In the following program we'll load the data and display it in a format that’s easier to read. This is a long data file, so instead of printing it, we’ll rewrite the data to a new file. Then we can open that file and scroll back and forth easily through the data. See the code below:
import json

# Explore the structure of the data.
filename = 'data/eq_data_1_day_m1.json'
with open(filename) as f:
    all_eq_data = json.load(f)

readable_file = 'data/readable_eq_data.json'
with open(readable_file, 'w') as f:
    json.dump(all_eq_data, f, indent=4)
We first import the json module so we can load the data properly from the file, and then store the entire dataset in all_eq_data. The json.load() function converts the data into a Python dictionary. Next we create a file to write this same data to in a more readable format. The json.dump() function takes a JSON data object and a file object, and writes the data to that file. The indent=4 argument tells dump() to format the data using indentation that matches the data's structure. When you look in your data directory and open the file readable_eq_data.json, here's what you'll see:
{
"type": "FeatureCollection",
"metadata": {
"generated": 1550361461000,
"url": "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_day.geojson",
"title": "USGS Magnitude 1.0+ Earthquakes, Past Day",
"status": 200,
"api": "1.7.0",
"count": 158
},
"features": [
{
"type": "Feature",
---snip---
"bbox": [
-176.7088,
-30.7399,
-1.16,
164.5151,
69.5346,
249.4
]
}
The first part of the file includes a section with the key "metadata". This tells us when the data file was generated and where we can find the data online. It also gives us a human-readable title and the number of earthquakes included in this file. In this 24-hour period, 158 earthquakes were recorded. This GeoJSON file has a structure that's helpful for location-based data. The information is stored in a list associated with the key "features".
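As a sketch of how those metadata values can be read, here's an example using a small hand-built dictionary in place of the full file, so the snippet runs on its own:

```python
# A minimal sample mimicking the top-level structure of the file.
sample_data = {
    "metadata": {
        "title": "USGS Magnitude 1.0+ Earthquakes, Past Day",
        "count": 158,
    },
    "features": [],
}

# The metadata values are ordinary nested dictionary lookups.
title = sample_data['metadata']['title']
count = sample_data['metadata']['count']
print(title)
print(count)
```

With the real file, you'd replace sample_data with the all_eq_data dictionary returned by json.load().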
Because this file contains earthquake data, the data is in list form, where every item in the list corresponds to a single earthquake. This structure might look confusing, but it's quite powerful. It allows geologists to store as much information as they need about each earthquake in a dictionary, and then place all those dictionaries into one big list.
Here is the dictionary representing a single earthquake:
"features": [
{
"type": "Feature",
"properties": {
"mag": 0.96,
"place": "8km NE of Aguanga, CA",
"time": 1550360775470,
"updated": 1550360993593,
"tz": -480,
"url": "https://earthquake.usgs.gov/earthquakes/eventpage/ci37532978",
"detail": "https://earthquake.usgs.gov/earthquakes/feed/v1.0/detail/ci37532978.geojson",
"felt": null,
"cdi": null,
"mmi": null,
"alert": null,
"status": "automatic",
"tsunami": 0,
"sig": 14,
"net": "ci",
"code": "37532978",
"ids": ",ci37532978,",
"sources": ",ci,",
"types": ",geoserve,nearby-cities,origin,phase-data,",
"nst": 32,
"dmin": 0.02648,
"rms": 0.15,
"gap": 37,
"magType": "ml",
"type": "earthquake",
"title": "M 1.0 - 8km NE of Aguanga, CA"
},
"geometry": {
"type": "Point",
"coordinates": [
-116.7941667,
33.4863333,
3.22
]
},
"id": "ci37532978"
},
The key "properties" contains a lot of information about each earthquake. We’re mainly interested in the magnitude of each quake, which is associated with the key "mag". We’re also interested in the title of each earthquake, which provides a nice summary of its magnitude and location.
The key "geometry" helps us understand where the earthquake occurred. We’ll need this information to map each event. We can find the longitude and the latitude for each earthquake in a list associated
with the key "coordinates".
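Putting those pieces together, here's a sketch of pulling the values we care about from a single earthquake dictionary. The sample below is trimmed to just the keys we use, so it runs on its own:

```python
# A single feature dictionary, trimmed to the keys we care about.
eq_dict = {
    "properties": {"mag": 0.96, "title": "M 1.0 - 8km NE of Aguanga, CA"},
    "geometry": {
        "type": "Point",
        "coordinates": [-116.7941667, 33.4863333, 3.22],
    },
}

mag = eq_dict['properties']['mag']
title = eq_dict['properties']['title']
# GeoJSON stores longitude first, then latitude.
lon, lat = eq_dict['geometry']['coordinates'][:2]
print(f"{title}: magnitude {mag} at ({lat}, {lon})")
```

Note the ordering: in GeoJSON the longitude comes before the latitude in the "coordinates" list, which is the reverse of the lat/lon order many mapping tools expect.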
This file contains more nesting than we'd typically use in our own code, so don't worry if it looks confusing: Python will handle most of the complexity. We'll only be working with one or two nesting levels at a time. We'll start by pulling out a dictionary for each earthquake recorded in the 24-hour period.
The following program makes a list that contains all the information about every earthquake that occurred:
import json

# Explore the structure of the data.
filename = 'data/eq_data_1_day_m1.json'
with open(filename) as f:
    all_eq_data = json.load(f)

all_eq_dicts = all_eq_data['features']
print(len(all_eq_dicts))
We take the data associated with the key 'features' and store it in all_eq_dicts. We know this file contains records about 158 earthquakes, and the output shown below verifies that we’ve captured all of the earthquakes in the file:
158
------------------
(program exited with code: 0)
Press any key to continue . . .
Using the list containing data about each earthquake, we can loop through it and extract any information we want. The next program pulls the magnitude of each earthquake. Add the code shown below to the previous program:
mags = []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']
    mags.append(mag)

print(mags[:10])
We make an empty list to store the magnitudes, and then loop through the list all_eq_dicts. Inside this loop, each earthquake is represented by the dictionary eq_dict. Each earthquake's magnitude is stored in the 'properties' section of this dictionary under the key 'mag'. We store each magnitude in the variable mag, and then append it to the list mags.
We print the first 10 magnitudes, so we can see whether we’re getting the correct data. The output is shown below:
[0.96, 1.2, 4.3, 3.6, 2.1, 4, 1.06, 2.3, 4.9, 1.8]
------------------
(program exited with code: 0)
Press any key to continue . . .
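One caveat worth knowing: because the feed is generated automatically, some events may have a null magnitude, which json.load() converts to None. Whether your particular download contains any is not guaranteed either way, so as a defensive sketch, the loop can skip those entries (the sample list here is hand-built for illustration):

```python
# Hand-built sample: a hypothetical feed where one event has no
# magnitude yet (null in the JSON becomes None in Python).
all_eq_dicts = [
    {"properties": {"mag": 1.2}},
    {"properties": {"mag": None}},
    {"properties": {"mag": 4.3}},
]

mags = []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']
    if mag is not None:
        mags.append(mag)

print(mags)  # the None entry is skipped
```

If your file loads cleanly without this check, you don't need it; it simply keeps the program from carrying None values into later calculations.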
Now we’ll pull the location data for each earthquake, and then we can make a map of the earthquakes. The location data is stored under the key "geometry". Inside the geometry dictionary is a "coordinates" key, and the first two values in this list are the longitude and latitude. The following program pulls the location data:
mags, lons, lats = [], [], []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']
    lon = eq_dict['geometry']['coordinates'][0]
    lat = eq_dict['geometry']['coordinates'][1]
    mags.append(mag)
    lons.append(lon)
    lats.append(lat)

print(mags[:10])
print(lons[:5])
print(lats[:5])
We make empty lists for the longitudes and latitudes. The code eq_dict['geometry'] accesses the dictionary representing the geometry element of the earthquake. The second key, 'coordinates', pulls the list of values associated with 'coordinates'. Finally, index 0 asks for the first value in the list of coordinates, which corresponds to an earthquake's longitude. We print the first 10 magnitudes and the first five longitudes and latitudes; the following output shows that we're pulling the correct data:
[0.96, 1.2, 4.3, 3.6, 2.1, 4, 1.06, 2.3, 4.9, 1.8]
[-116.7941667, -148.9865, -74.2343, -161.6801, -118.5316667]
[33.4863333, 64.6673, -12.1025, 54.2232, 35.3098333]
------------------
(program exited with code: 0)
Press any key to continue . . .
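Because mags, lons, and lats are parallel lists (the items at the same index all describe the same earthquake), we can already answer simple questions about the data. As a sketch, here's how to find the strongest quake in a set of sample values; the magnitudes match the output above, and the titles (apart from the first, which appears in the file excerpt) are invented for illustration:

```python
# Parallel lists, as built by the loop above (sample values; the
# titles after the first are hypothetical).
mags = [0.96, 1.2, 4.3, 3.6, 2.1]
titles = [
    "M 1.0 - 8km NE of Aguanga, CA",
    "M 1.2 - 11km NNE of North Nenana, Alaska",
    "M 4.3 - 69km NNW of Ayna, Peru",
    "M 3.6 - 126km SSE of Cold Bay, Alaska",
    "M 2.1 - 21km NNW of Tehachapi, CA",
]

# Because the lists are parallel, the index of the largest magnitude
# also locates the matching title.
strongest_index = mags.index(max(mags))
print(titles[strongest_index])  # prints "M 4.3 - 69km NNW of Ayna, Peru"
```

In your own run, you'd build the titles list inside the same loop that fills mags, by appending eq_dict['properties']['title'].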
That wraps up this post. In the next post we'll use this data to map each earthquake.