Friday, July 8, 2022

Remaining steps in the data processing pipeline

Analysis

Analysis is the key step in the data processing pipeline. Here you interpret the raw data, enabling you to draw conclusions that aren’t immediately apparent.

Continuing with our sentiment analysis example, you might want to study the sentiment toward a company over a specified period in relation to that company’s stock price. Or you might compare stock market index figures, such as those of the S&P 500, with the sentiment expressed in a broad sample of news articles over the same period. The following fragment illustrates what such a dataset might look like, with S&P 500 values shown alongside the overall sentiment of each day’s news:

Date        News_sentiment  S&P_500
---------------------------------------
2021-04-16  0.281074        4185.47
2021-04-19  0.284052        4163.26
2021-04-20  0.262421        4134.94
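To build a table like the one above, the sentiment scores and the index values, which typically come from different sources, have to be matched up by date. Here is a minimal sketch in plain Python. It assumes both series are available as dictionaries keyed by ISO-format date strings. The names `news_sentiment`, `sp500`, and `join_on_date` are hypothetical, and the values are the three rows shown above.

```python
# Minimal sketch: build the combined dataset by joining two per-date series.
# Assumption: each series is a dict keyed by ISO date strings ("YYYY-MM-DD");
# the dict and function names are hypothetical.
news_sentiment = {
    "2021-04-16": 0.281074,
    "2021-04-19": 0.284052,
    "2021-04-20": 0.262421,
}
sp500 = {
    "2021-04-16": 4185.47,
    "2021-04-19": 4163.26,
    "2021-04-20": 4134.94,
}


def join_on_date(left, right):
    """Inner join: keep only dates present in both mappings, in date order."""
    # ISO date strings sort lexicographically in chronological order.
    common_dates = sorted(left.keys() & right.keys())
    return [(d, left[d], right[d]) for d in common_dates]


rows = join_on_date(news_sentiment, sp500)

print(f"{'Date':<12}{'News_sentiment':<16}S&P_500")
for date, sentiment, price in rows:
    print(f"{date:<12}{sentiment:<16}{price}")
```

An inner join (keeping only dates present in both series) suits this case because the stock market doesn’t trade on weekends or holidays, while news is published every day. That’s why the table jumps from Friday, April 16 to Monday, April 19. With pandas, `DataFrame.merge` with `how="inner"` does the same job on DataFrames.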

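Since both the sentiment figures and the stock figures are numeric, you might plot the two series on the same chart for visual analysis. Because their magnitudes differ greatly (around 0.28 for sentiment versus more than 4,000 for the index), each series usually gets its own y-axis, or both are rescaled to a common range first. The chart below plots both series together:

[Figure: daily news sentiment and S&P 500 values plotted on the same chart]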
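One simple way to put both series on a single y-axis is min-max scaling, which maps each series linearly onto the range 0 to 1 so their shapes can be compared directly. This is a sketch of that technique, not necessarily how the chart was produced. The function name `min_max_scale` is chosen for illustration, and the input values come from the table above.

```python
# Minimal sketch: rescale two series to [0, 1] so they can share one y-axis.
# Input values are the three rows from the sample table.
sentiment = [0.281074, 0.284052, 0.262421]
sp500 = [4185.47, 4163.26, 4134.94]


def min_max_scale(values):
    """Linearly map the smallest value to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a flat series has no range to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


sentiment_scaled = min_max_scale(sentiment)
sp500_scaled = min_max_scale(sp500)

for s, p in zip(sentiment_scaled, sp500_scaled):
    print(f"{s:.3f}  {p:.3f}")
```

The alternative is to keep the original units and give each series its own y-axis, for example with Matplotlib’s `Axes.twinx()`. Scaling discards the absolute values, so it’s best suited to comparing the direction and timing of movements.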


Visual analysis is one of the most commonly used and efficient methods for interpreting data.
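Visual impressions can be complemented with a number. A common choice is the Pearson correlation coefficient, which ranges from -1 to 1 and measures how closely two series move together linearly. Below is a minimal plain-Python sketch. `pearson` is a hypothetical helper, and the inputs are the three rows from the table, far too few to support any real conclusion.

```python
import math

# Minimal sketch: Pearson correlation between daily sentiment and the index.
sentiment = [0.281074, 0.284052, 0.262421]
sp500 = [4185.47, 4163.26, 4134.94]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


r = pearson(sentiment, sp500)
print(f"Pearson r = {r:.3f}")
```

On Python 3.10 and later, `statistics.correlation()` computes the same coefficient. Keep in mind that a correlation, however strong, says nothing about whether sentiment drives prices or the other way around.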

Storage

In most cases, you’ll need to store the results generated during the data analysis process to make them available for later use. Your storage options typically include files and databases. The latter is preferable if you anticipate frequent reuse of your data.
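Both storage options can be sketched with Python’s standard library: `csv` for a flat file and `sqlite3` for a lightweight database. The file name, table name, and column names below are hypothetical. The in-memory database is used only to keep the example self-contained; pass a file path to `sqlite3.connect()` to keep the data between runs.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Minimal sketch: persist analysis results to a CSV file and a SQLite table.
# File, table, and column names are hypothetical.
rows = [
    ("2021-04-16", 0.281074, 4185.47),
    ("2021-04-19", 0.284052, 4163.26),
    ("2021-04-20", 0.262421, 4134.94),
]
header = ("date", "news_sentiment", "sp500")


def save_csv(path, header, rows):
    """Write a header line followed by one line per row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def save_sqlite(conn, rows):
    """Create the results table if needed, then insert (or overwrite) the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "date TEXT PRIMARY KEY, news_sentiment REAL, sp500 REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?, ?)", rows)
    conn.commit()


# File storage: write the CSV into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sentiment_sp500.csv"
    save_csv(csv_path, header, rows)
    with open(csv_path, newline="") as f:
        csv_lines = list(csv.reader(f))  # every field comes back as a string

# Database storage: ":memory:" keeps the demo self-contained.
conn = sqlite3.connect(":memory:")
save_sqlite(conn, rows)
# A database lets later code query just the rows it needs.
low_sentiment = conn.execute(
    "SELECT date FROM results WHERE news_sentiment < ? ORDER BY date", (0.27,)
).fetchall()
conn.close()

print(csv_lines[0], low_sentiment)
```

The query at the end hints at why a database pays off for frequently reused data: later code can fetch just the rows it needs instead of rereading and filtering the whole file.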
