Thursday, June 20, 2019

The Data Analysis Process

The Data Analysis Process consists of several steps in which the raw data are transformed and processed in order to produce data visualizations and make predictions based on the collected data. Thus data analysis is schematized as a process chain consisting of the following sequence of stages:

• Problem definition
• Data extraction
• Data preparation - Data cleaning
• Data preparation - Data transformation
• Data exploration and visualization
• Predictive modeling
• Model validation/test
• Deploy - Visualization and interpretation of results
• Deploy - Deployment of the solution

The object of study of data analysis is basically the data. The data then will be the key player in all processes of data analysis. The data constitute the raw material to be processed, and due to their processing and analysis, it is possible to extract a variety of information in order to increase the level of knowledge of the system under study, that is, one from which the data came.



The purpose of data analysis is to extract information that is not easily deducible but that, when understood, leads to the possibility of carrying out studies on the mechanisms of the systems that have produced them, thus allowing you to forecast possible responses of these systems and their evolution in time.

Starting from a simple methodical approach on data protection, data analysis has become a real discipline, leading to the development of real methodologies generating models. The model is in fact the translation into a mathematical form of a system placed under study. Once there is a mathematical or logical form that can describe system responses under different levels of precision, you can then make predictions about its development or response to certain inputs. Thus the aim of data analysis is not the model, but the quality of its predictive power.

The predictive power of a model depends not only on the quality of the modeling techniques but also on the ability to choose a good dataset upon which to build the entire data analysis process. So the search for data, their extraction, and their subsequent preparation, while representing preliminary activities of an analysis, also belong to data analysis itself, because of their importance in the success of the results.


In parallel to all stages of processing of data analysis, various methods of data visualization have been developed. In fact, to understand the data, both individually and in terms of the role they play in the entire dataset, there is no better system than to develop the techniques of graphic representation capable of transforming information, sometimes implicitly hidden, in figures, which help you more easily understand their meaning. Over the years lots of display modes have been developed for
different modes of data display: the charts.



At the end of the data analysis process, you will have a model and a set of graphical displays and then you will be able to predict the responses of the system under study; after that, you will move to the test phase. The model will be tested using another set of data for which you know the system response. These data are, however, not used to define the predictive model. Depending on the ability of the model to replicate real observed responses, you will have an error calculation and knowledge of the validity of the model and its operating limits.

These results can be compared with any other models to understand if the newly created one is more efficient than the existing ones. Once you have assessed that, you can move to the last phase of data analysis—deployment. This consists of implementing the results produced by the analysis, namely, implementing the decisions to be taken based on the predictions generated by the model and the associated risks.


Share:

0 comments:

Post a Comment