Friday, July 22, 2022

Combinatorial Explosion

At the beginning of a data analysis task, it is tempting to visualize the pairwise relationships among all the numeric features in the given dataset. This is often a necessary step in exploratory data analysis and can reveal significant insights about the general pattern...
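The explosion comes from simple counting: with n numeric features there are n(n - 1)/2 distinct pairs, so the number of pairwise plots grows roughly with the square of n. Here is a minimal sketch of that count using only the Python standard library. The feature names are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

def pair_count(n_features: int) -> int:
    """Number of distinct unordered pairs among n features: n * (n - 1) / 2."""
    return comb(n_features, 2)

# Hypothetical feature names, used only to list the pairs a pairwise plot would need.
features = ["age", "income", "height", "weight"]
pairs = list(combinations(features, 2))
print(len(pairs))  # 6

# The count grows quadratically: 5 -> 10, 10 -> 45, 50 -> 1225, 100 -> 4950.
for n in (5, 10, 50, 100):
    print(n, pair_count(n))
```

Doubling the number of features roughly quadruples the number of plots. This is why plotting every pair stops being practical on wide datasets.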

Wednesday, July 20, 2022

Iterating Over a pandas DataFrame

Usually we are given a large pandas DataFrame and asked to check some relationships between the fields in its columns, row by row. It could be a logical operation or a sophisticated mathematical transformation on the raw data. Essentially, it is a simple case of iterating over the rows of the DataFrame and doing some processing at each iteration. However, it may not be that simple...
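Here is a minimal sketch of the row-by-row approach next to a vectorized alternative. It assumes a small hypothetical DataFrame with `price` and `quantity` columns. `DataFrame.itertuples()` yields one namedtuple per row and is generally faster than `DataFrame.iterrows()`. An operation written against whole columns is usually faster still.

```python
import pandas as pd

# Hypothetical data: each row is an order with a unit price and a quantity.
df = pd.DataFrame({
    "price": [2.5, 10.0, 4.0],
    "quantity": [4, 1, 3],
})

# Row-by-row: itertuples() yields namedtuples whose fields match the column names.
totals_loop = []
for row in df.itertuples(index=False):
    totals_loop.append(row.price * row.quantity)
df["total_loop"] = totals_loop

# Vectorized: the same computation expressed on entire columns at once.
df["total"] = df["price"] * df["quantity"]

# A per-row logical check, also vectorized: flag orders above a threshold.
df["large_order"] = df["total"] > 10

print(df)
```

With these assumptions, the vectorized form is the idiomatic choice for large frames. `itertuples()` is a reasonable fallback when the per-row logic cannot be expressed as column operations.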

Monday, July 18, 2022

A Typical Data Science Pipeline

Data science is a vast and dynamic field. In the modern business and technology space, data science has become a truly transformative force. Every kind of industry and socio-economic field, from healthcare to transportation and from online retail to on-demand music, uses...

Friday, July 15, 2022

SQL tables from databases

Spreadsheets share many features with databases, but they are not quite the same. A table returned by an SQL query against a database can resemble a spreadsheet. Not surprisingly, spreadsheets can be used to import large amounts of data into a database, and a table can be exported from the database...
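A query result is itself a table: a header of column names followed by rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database and a hypothetical `products` table. It writes the query result out as CSV text, a format spreadsheet applications can open.

```python
import csv
import io
import sqlite3

# In-memory database with a hypothetical "products" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 3.0), ("stapler", 7.25)],
)

# The result of a query is a table: named columns (from cursor.description) plus rows.
cursor = conn.execute("SELECT name, price FROM products ORDER BY price")
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Export the result as CSV text, which a spreadsheet application can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
conn.close()
```

The reverse direction works similarly. You can read a CSV file with `csv.reader` and pass the rows to `executemany` with an `INSERT` statement.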

Wednesday, July 13, 2022

Spreadsheets

A spreadsheet is a computer application whose purpose is to organize, analyze, and store data in tabular form. Spreadsheets are nothing more than the digital evolution of paper worksheets. Accountants once collected all the data in large ledgers full of printouts, from which they extracted the...

Monday, July 11, 2022

Tabular form of data

We know that data must be processed before it can be structured in tabular form. The pandas library's own data structures follow this same arrangement of individual values into rows and columns. The question, then, is: why this data structure? The tabular format has always been the most widely used method for arranging and organizing data. Whether for historical reasons or for a natural predisposition...
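To make the row-and-column arrangement concrete, here is a minimal sketch with a pandas `DataFrame` built from a few hypothetical records. Each record becomes a row, and each field becomes a column.

```python
import pandas as pd

# Hypothetical records: each dict is one observation (a row), each key one variable (a column).
records = [
    {"city": "Rome", "year": 2020, "visitors": 120},
    {"city": "Paris", "year": 2020, "visitors": 150},
    {"city": "Rome", "year": 2021, "visitors": 135},
]
df = pd.DataFrame(records)

print(df.shape)          # (3, 3): 3 rows, 3 columns
print(list(df.columns))  # ['city', 'year', 'visitors']

# Rows and columns can be addressed independently.
print(df.loc[0])             # the first row, as a Series
print(df["visitors"].sum())  # a column-wise operation: 405
```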

Friday, July 8, 2022

Remaining steps in the data processing pipeline

Analysis

Analysis is the key step in the data processing pipeline. Here you interpret the raw data, enabling you to draw conclusions that aren’t immediately apparent. Continuing with our sentiment analysis example, you might want to study the sentiment toward a company over a specified period in relation...
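Here is a minimal sketch of the "over a specified period" part of that analysis, using only the standard library. It assumes sentiment scores have already been computed for each piece of text. The dates and scores below are hypothetical, on an assumed scale of -1 to 1. The sketch averages the scores within a date range and also month by month.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sentiment scores for one company: (publication date, score in [-1, 1]).
scores = [
    (date(2022, 5, 3), 0.4),
    (date(2022, 5, 20), -0.2),
    (date(2022, 6, 1), 0.6),
    (date(2022, 6, 15), 0.8),
    (date(2022, 7, 2), -0.5),
]

def average_in_period(data, start, end):
    """Mean score for entries with start <= day <= end; None if the period is empty."""
    selected = [s for d, s in data if start <= d <= end]
    return mean(selected) if selected else None

def monthly_average(data):
    """Group scores by (year, month) and average each group, in chronological order."""
    groups = defaultdict(list)
    for d, s in data:
        groups[(d.year, d.month)].append(s)
    return {key: mean(vals) for key, vals in sorted(groups.items())}

print(average_in_period(scores, date(2022, 5, 1), date(2022, 6, 30)))
print(monthly_average(scores))
```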

Wednesday, July 6, 2022

The Data Processing Pipeline

Let's take a conceptual look at the steps involved in data processing, also known as the data processing pipeline. The usual steps applied to the data are:

- Acquisition
- Cleansing
- Transformation
- Analysis
- Storage

You'll notice that these steps aren’t always clear-cut. In some applications you’ll be able to combine multiple steps into one or omit some steps altogether. Now, let's explore these further.

Acquisition

Before...
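The five steps can be sketched as small functions chained together. This is a minimal, hypothetical illustration that uses an in-memory string as the data source and a JSON string as the storage target. A real pipeline would read from files, APIs, or databases, and could merge or skip stages as the text notes.

```python
import json

# Hypothetical raw input: comma-separated records, one of them missing a value.
RAW = "alice,34\nbob,\ncarol,29\n"

def acquire(source: str) -> list[str]:
    """Acquisition: read raw records (here, the lines of an in-memory string)."""
    return source.splitlines()

def cleanse(lines: list[str]) -> list[list[str]]:
    """Cleansing: split fields and drop records with empty values."""
    records = [line.split(",") for line in lines]
    return [r for r in records if all(field.strip() for field in r)]

def transform(records: list[list[str]]) -> list[dict]:
    """Transformation: convert each record to a typed dictionary."""
    return [{"name": name, "age": int(age)} for name, age in records]

def analyze(rows: list[dict]) -> dict:
    """Analysis: derive a summary that is not visible in the raw lines."""
    ages = [r["age"] for r in rows]
    return {"count": len(ages), "mean_age": sum(ages) / len(ages)}

def store(result: dict) -> str:
    """Storage: serialize the result (a real pipeline might write to a file or database)."""
    return json.dumps(result)

output = store(analyze(transform(cleanse(acquire(RAW)))))
print(output)  # {"count": 2, "mean_age": 31.5}
```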

Monday, July 4, 2022

Files

Files may contain structured, semistructured, and unstructured data. Python’s built-in open() function allows you to open a file so you can use its data within your script. However, depending on the format of the data (for example, CSV, JSON, or XML), you may need to import a corresponding library to be able to perform read, write, and/or append operations on it. Plaintext files don’t require a library...
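Here is a minimal sketch of the difference. A plaintext file needs nothing beyond `open()`, while CSV and JSON files are usually parsed with the standard library's `csv` and `json` modules. The file names and contents are hypothetical, and everything is written to a temporary directory.

```python
import csv
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Plaintext: open() alone is enough to write, append, and read.
    txt_path = os.path.join(tmp, "notes.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("first line\n")
    with open(txt_path, "a", encoding="utf-8") as f:  # "a" appends to the end of the file
        f.write("second line\n")
    with open(txt_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

    # CSV: open() provides the file object; the csv module parses the rows.
    csv_path = os.path.join(tmp, "people.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["name", "age"], ["alice", "34"]])
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # JSON: the json module converts between text and Python objects.
    json_path = os.path.join(tmp, "config.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"debug": True, "retries": 3}, f)
    with open(json_path, encoding="utf-8") as f:
        config = json.load(f)

print(lines)   # ['first line', 'second line']
print(rows)    # [{'name': 'alice', 'age': '34'}]
print(config)  # {'debug': True, 'retries': 3}
```

For XML, the standard library's `xml.etree.ElementTree` module plays the same role.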

Friday, July 1, 2022

Databases

Another common source of data is a relational database, a structure that provides a mechanism to efficiently store, access, and manipulate your structured data. You fetch data from, or send data to, tables in the database using Structured Query Language (SQL) requests. For instance, the following request issued to an employees table in the database retrieves the list of only those programmers...
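The post's own query is cut off in this excerpt. The sketch below is a hypothetical stand-in built with Python's `sqlite3` module and an in-memory database. The `employees` columns (`name`, `role`, `salary`) and the data are assumptions. The sketch shows both directions: fetching only the programmers with a `WHERE` clause, and sending a new row with `INSERT`.

```python
import sqlite3

# Hypothetical schema and data; the original post's table layout is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, role TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "programmer", 90000), ("Bob", "designer", 70000), ("Cid", "programmer", 85000)],
)

# Fetch only the programmers; the "?" placeholder lets sqlite3 bind the value safely.
programmers = conn.execute(
    "SELECT name, salary FROM employees WHERE role = ? ORDER BY name",
    ("programmer",),
).fetchall()
print(programmers)  # [('Ann', 90000), ('Cid', 85000)]

# Sending data works the same way, through an INSERT statement.
with conn:  # the connection context manager commits the transaction on success
    conn.execute("INSERT INTO employees VALUES (?, ?, ?)", ("Dee", "programmer", 95000))
count = conn.execute("SELECT COUNT(*) FROM employees WHERE role = 'programmer'").fetchone()[0]
print(count)  # 3
conn.close()
```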