Friday, June 17, 2022

Categories of Data

Programmers divide data into three main categories: unstructured, structured, and semi-structured. In a data processing pipeline, the source data is typically unstructured; from this, you form structured or semi-structured datasets for further processing. Some pipelines, however, use structured data from the start. For example, an application processing geographical locations might receive structured data directly from GPS sensors. Let's explore the three main categories of data as well as time series data, a special type of data that can be structured or semi-structured. This post focuses on the first category: unstructured data.

Unstructured Data

Unstructured data is data with no predefined organizational system, or schema. This is the most widespread form of data, with common examples including images, videos, audio, and natural language text. To illustrate, consider the following financial statement from a pharmaceutical company:

GoodComp shares soared as much as 8.2% on 2021-01-07 after the company announced positive early-stage trial results for its vaccine.

This text is considered unstructured data because the information found in it isn’t organized with a predefined schema. Instead, the facts (company, price change, date, and cause) appear wherever the author's phrasing happens to put them. You could rewrite this statement in any number of ways while still conveying the same information. For example:

Following the January 7, 2021, release of positive results from its vaccine trial, which is still in its early stages, shares in GoodComp rose by 8.2%.

Despite its lack of structure, unstructured data may contain important information, which you can extract and convert to structured or semi-structured data through appropriate transformation and analysis steps.
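
As a rough illustration of such an extraction step, here is a minimal Python sketch that pulls three facts out of the first version of the statement using regular expressions. The function name extract_facts, the field names, and the patterns are assumptions made for this example; they are not a general-purpose method, and real pipelines typically rely on more robust natural language processing tools.

```python
import re

statement = (
    "GoodComp shares soared as much as 8.2% on 2021-01-07 after the company "
    "announced positive early-stage trial results for its vaccine."
)

def extract_facts(text):
    """Pull a company name, percentage change, and ISO date from one sentence.

    The patterns are hypothetical and tied to this particular phrasing.
    """
    company = re.search(r"^(\w+) shares", text)
    change = re.search(r"(\d+(?:\.\d+)?)%", text)
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return {
        "company": company.group(1) if company else None,
        "change_pct": float(change.group(1)) if change else None,
        "date": date.group(0) if date else None,
    }

record = extract_facts(statement)
print(record)
# {'company': 'GoodComp', 'change_pct': 8.2, 'date': '2021-01-07'}
```

Run against the reworded statement, the same patterns would still find the 8.2% figure but miss the company and the date, because the wording and the date format differ. That fragility is exactly what the absence of a schema means in practice.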

For example, image recognition tools first convert the collection of pixels within an image into a dataset of a predefined format and then analyze this data to identify content in the image. Similarly, the following section will show a few ways in which the data extracted from our financial statement could be structured.
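
To make the pixel example a little more concrete, the sketch below turns a tiny grid of pixel intensities into records that all share the same fields. The 2x3 image, the function name pixels_to_rows, and the field names are made up for illustration; actual image recognition tools use far richer representations than this.

```python
# A hypothetical 2x3 grayscale image: each number is a pixel intensity (0-255).
pixels = [
    [0, 128, 255],
    [64, 192, 32],
]

def pixels_to_rows(image):
    """Flatten a 2D pixel grid into records with a fixed set of fields."""
    return [
        {"row": r, "col": c, "intensity": value}
        for r, line in enumerate(image)
        for c, value in enumerate(line)
    ]

rows = pixels_to_rows(pixels)
print(rows[0])    # {'row': 0, 'col': 0, 'intensity': 0}
print(len(rows))  # 6
```

Every record now has the same three fields, so later steps can process the data without knowing anything about the original image layout.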

