Wednesday, May 3, 2023

Converting Unstructured Data to Structured Form

As a data scientist, you not only need to fetch the data but also analyze it. Storing the data in a structured form simplifies this task. In this section, we will learn how to convert the data fetched from MongoDB into a structured format.

Storing into a Dataframe

The find function returns a cursor over the documents in a MongoDB collection, and each document comes back as a Python dictionary. You can load these documents directly into a dataframe. First, let’s fetch 100 MongoDB documents, sorted by _id in descending order, and then store them in a dataframe:

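import pandas as pd
import pymongo

# table is assumed to be the pymongo collection object defined earlier
samples=table.find().sort("_id",pymongo.DESCENDING).limit(100)

# Materialize the cursor into a list of dictionaries before building the dataframe
df=pd.DataFrame(list(samples))

df.head()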


The dataframe is far easier to read than the raw dictionaries returned by find: each document becomes a row and each field becomes a column.
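To see how documents map onto rows and columns without needing a running MongoDB server, here is a minimal sketch that uses a hand-written list of dictionaries in place of the cursor. The documents, field names, and the variable names docs and df_demo are made up for illustration; real documents would normally carry an ObjectId in _id rather than an integer.

```python
import pandas as pd

# Hypothetical documents, shaped like the dictionaries a MongoDB cursor yields
docs = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ben"},  # this document has no "city" field
    {"_id": 3, "name": "Chen", "city": "Taipei", "age": 31},
]

# Each dictionary becomes a row; the columns are the union of all keys
df_demo = pd.DataFrame(docs)

# Fields missing from a document show up as NaN in that row
print(df_demo)
```

Because MongoDB documents in one collection need not share the same fields, it is worth checking for missing values (for example with df_demo.isna().sum()) before analyzing the data.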

Writing to a File


Pandas dataframes can be exported directly to CSV, Excel, or a SQL database. Let us store this data in a CSV file:

df.to_csv('StructuredData.csv',index=False)
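The index=False argument is what keeps the dataframe's row labels out of the file. The short sketch below illustrates the difference on a small made-up dataframe (df_demo and its columns are hypothetical); it calls to_csv without a path so that the CSV text is returned as a string instead of being written to disk.

```python
import pandas as pd

# Small made-up dataframe standing in for the MongoDB data
df_demo = pd.DataFrame({"name": ["Asha", "Ben"], "score": [91, 78]})

# With no path argument, to_csv returns the CSV text as a string
without_index = df_demo.to_csv(index=False)
with_index = df_demo.to_csv()  # default: row labels are written as a first column

print(without_index)
print(with_index)
```

With the default setting, the header row starts with an empty column name and every data row starts with its row label, which usually is not wanted when the file is meant for other tools.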

Similarly, you can use the to_sql function to export the data into a SQL database.
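As a rough sketch of what that could look like, the example below writes a small made-up dataframe to an in-memory SQLite database and reads it back. The table name samples, the dataframe df_demo, and the choice of SQLite are assumptions made for illustration; other databases would typically go through a SQLAlchemy connection instead.

```python
import sqlite3

import pandas as pd

# Small made-up dataframe standing in for the MongoDB data
df_demo = pd.DataFrame({"name": ["Asha", "Ben"], "score": [91, 78]})

# In-memory SQLite database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Write the dataframe to a table named "samples" (hypothetical name),
# replacing the table if it already exists and skipping the row labels
df_demo.to_sql("samples", conn, index=False, if_exists="replace")

# Read the table back to confirm the export worked
round_trip = pd.read_sql("SELECT * FROM samples", conn)
conn.close()

print(round_trip)
```

Note that values such as MongoDB's ObjectId in the _id column may need to be converted to strings first (for example with df["_id"].astype(str)), since SQL drivers generally do not know how to store them.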