Tuesday, July 20, 2021

Understanding data analysis

In today's data-driven world, data analysis supports effective decision-making in business and government operations. Data analysis is the activity of inspecting, preprocessing, exploring, describing, and visualizing a dataset. Its main objective is to discover the information needed to make decisions. It offers multiple approaches, tools, and techniques, all of which can be applied to diverse domains such as business, social science, and fundamental science.

Let's look at some of the core data analysis libraries of the Python ecosystem:

NumPy: This is short for Numerical Python. It is the foundational library for numerical computing in Python, providing multidimensional arrays, matrices, and methods for efficient mathematical computation.

SciPy: This is a scientific computing library that builds on NumPy and provides routines for scientific, mathematical, and engineering tasks such as optimization, integration, interpolation, and statistics.

Pandas: This is a data exploration and manipulation library that offers tabular data structures such as DataFrames and various methods for data analysis and manipulation.

Scikit-learn: The name comes from "SciKit" (SciPy Toolkit), because the project started as an add-on toolkit for SciPy. It is a machine learning library that offers a variety of supervised and unsupervised algorithms for tasks such as regression, classification, dimensionality reduction, cluster analysis, and anomaly detection.

Matplotlib: This is a core data visualization library and is the base for several other visualization libraries in Python, such as Seaborn. It offers 2D plots, graphs, charts, and figures for data exploration, as well as 3D plots through its mplot3d toolkit. It is built on top of NumPy.

Seaborn: This is based on Matplotlib and offers a high-level interface for drawing well-organized statistical plots with little code.

Plotly: This is a data visualization library that offers high-quality, interactive graphs, such as scatter charts, line charts, bar charts, histograms, boxplots, heatmaps, and subplots.
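To make the roles of the first few libraries concrete, here is a minimal sketch that uses NumPy for array arithmetic and pandas for a tabular view of the same data. The height and weight values are made up for illustration, and the variable names are our own choices, not part of either library.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
weights_kg = np.array([50.0, 60.0, 70.0, 80.0])

# NumPy: arithmetic is applied to whole arrays at once, no explicit loop.
bmi = weights_kg / (heights_cm / 100.0) ** 2

# pandas: put the arrays into a labeled, tabular DataFrame.
df = pd.DataFrame(
    {"height_cm": heights_cm, "weight_kg": weights_kg, "bmi": bmi}
)

# Column-wise summary and a label-based row lookup.
mean_height = df["height_cm"].mean()
tallest = df.loc[df["height_cm"].idxmax()]

print(df.describe())
print("Mean height (cm):", mean_height)
print("Weight of the tallest person (kg):", tallest["weight_kg"])
```

The same DataFrame could then be passed to Matplotlib, Seaborn, or Plotly for plotting, or to Scikit-learn for modeling.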

The standard process of data analysis

Data analysis refers to investigating the data, finding meaningful insights in it, and drawing conclusions. The main goal of this process is to collect, filter, clean, transform, explore, describe, and visualize the data, and then communicate the resulting insights so that they can inform decisions. Generally, the data analysis process consists of the following phases:

1. Collecting Data: Collect and gather data from several sources.
2. Preprocessing Data: Filter, clean, and transform the data into the required format.
3. Analyzing and Finding Insights: Explore, describe, and visualize the data to find insights and draw conclusions.
4. Interpreting Insights: Understand the insights and find the impact each variable has on the system.
5. Storytelling: Communicate your results in the form of a story so that a layman can understand them. 
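The five phases above can be sketched as a tiny pandas pipeline. This is only an illustration under stated assumptions: the CSV text is invented sample data, and the function names (collect, preprocess, analyze, interpret, tell_story) are hypothetical helpers written for this post, not part of pandas.

```python
import io

import pandas as pd

# Hypothetical raw data with one missing value and one duplicate row.
RAW_CSV = """region,ad_spend,sales
north,10,100
north,20,200
south,30,310
south,,400
south,30,310
"""


def collect() -> pd.DataFrame:
    """Phase 1: gather data (here, from an in-memory CSV string)."""
    return pd.read_csv(io.StringIO(RAW_CSV))


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Phase 2: drop rows with missing values and exact duplicates."""
    return df.dropna().drop_duplicates().reset_index(drop=True)


def analyze(df: pd.DataFrame) -> pd.Series:
    """Phase 3: describe the data, here as mean sales per region."""
    return df.groupby("region")["sales"].mean()


def interpret(df: pd.DataFrame) -> float:
    """Phase 4: estimate how strongly ad spend relates to sales."""
    return df["ad_spend"].corr(df["sales"])


def tell_story(mean_sales: pd.Series, corr: float) -> str:
    """Phase 5: state the result in plain language."""
    top = mean_sales.idxmax()
    return (
        f"The {top} region has the highest average sales, and higher ad "
        f"spend goes together with higher sales (correlation {corr:.2f})."
    )


raw = collect()
clean = preprocess(raw)
mean_sales = analyze(clean)
corr = interpret(clean)
story = tell_story(mean_sales, corr)
print(story)
```

In a real project each phase is far larger, and the correlation in phase 4 only indicates an association, not a cause, but the order of the steps stays the same.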

In the next post, we will discuss the KDD (Knowledge Discovery in Databases) process.
