Wednesday, July 10, 2019

Third BigData program - Looking for the Most Polluted City in the World on an Hourly Basis

In the previous two programs we used the Medicare datasets, but there is another public dataset on BigQuery, OpenAQ, which contains air-quality measurements from 47 countries around the world. Because this dataset is updated hourly, we can use it to find the most polluted cities in the world on an hourly basis. The following program fetches the 1,000 highest PM10 readings recorded since April 1, 2017 and prints the top three, which gives us the three worst readings by location:

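import pandas as pd  # pandas must be installed for to_dataframe()
from google.cloud import bigquery

# sample query from:
# Highest PM10 readings since 2017-04-01, largest value first
QUERY = """
SELECT location, city, country, value, timestamp
FROM `bigquery-public-data.openaq.global_air_quality`
WHERE pollutant = "pm10" AND timestamp > "2017-04-01"
ORDER BY value DESC
LIMIT 1000
"""

# Authenticate with the service account key file used in the earlier posts
client = bigquery.Client.from_service_account_json(
    'MedicareProject2-1223283ef413.json')

# Run the query and load the result into a pandas DataFrame
query_job = client.query(QUERY)
df = query_job.to_dataframe()

# Show the three highest readings
print(df.head(3))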

When we run this program we get the following list as the result:

   location        city           country  value    timestamp
0  Dilovası        Kocaeli        TR       5243.00  2019-06-25 12:00:00+00:00
1  Bukhiin urguu   Ulaanbaatar    MN       1428.00  2019-01-21 17:00:00+00:00
2  Chaiten Norte   Chaiten Norte  CL        999.83  2019-04-24 11:00:00+00:00
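
One thing to keep in mind is that the query ranks individual readings, not cities, so one city with several monitoring stations could take more than one of the top spots, and the readings can be months apart. Below is a minimal sketch of how we might reduce a DataFrame like df to one row per city and, optionally, to the most recent hour of data. The helper worst_cities and its window parameter are my own names for illustration, and the sample rows are made up so that the snippet runs without a BigQuery account.

```python
import pandas as pd


def worst_cities(df, n=3, window=None):
    """Return the n cities with the highest reading, one row per city.

    Hypothetical helper, not part of the original program. `window` is
    an optional pandas-style duration such as "1h"; when given, only
    readings newer than (latest timestamp - window) are considered.
    """
    if window is not None:
        # Keep only readings inside the window ending at the newest timestamp.
        cutoff = df["timestamp"].max() - pd.Timedelta(window)
        df = df[df["timestamp"] > cutoff]
    # idxmax returns the row label of each city's highest value.
    worst_rows = df.loc[df.groupby(["city", "country"])["value"].idxmax()]
    ranked = worst_rows.sort_values("value", ascending=False)
    return ranked.head(n).reset_index(drop=True)


# Made-up readings for illustration only (not real OpenAQ data).
sample = pd.DataFrame({
    "location": ["Station A", "Station B", "Station C", "Station D"],
    "city": ["Alpha", "Alpha", "Beta", "Gamma"],
    "country": ["AA", "AA", "BB", "CC"],
    "value": [300.0, 250.0, 120.0, 80.0],
    "timestamp": pd.to_datetime(
        ["2019-07-10 09:00", "2019-07-10 10:00",
         "2019-07-10 10:00", "2019-07-09 10:00"],
        utc=True,
    ),
})

top = worst_cities(sample, n=3)                  # one row per city, all data
recent = worst_cities(sample, n=3, window="1h")  # newest hour of data only
print(top)
print(recent)
```

With the real data we would call worst_cities(df, window="1h") right after to_dataframe(). Note that the window is measured from the newest timestamp in the DataFrame, not from the current time, so it only reflects the current hour if the query result itself contains fresh readings.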

This program is just meant to give an idea of how we can use Python with BigData in the real world. There is still a lot to explore in the field of BigData, which we'll do in future posts. I'm ending today's post here and will be back again with more on Machine Learning in Healthcare. So till we meet again, keep practicing and learning Python, as Python is easy to learn!