At the beginning of a data analysis task, we are tempted to visualize the pairwise interrelationships between all kinds of numeric features that are present in the given dataset. This is often a necessary step for exploratory data analysis and can reveal significant insights about the general pattern...
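As a rough illustration of the pairwise view described above, the sketch below computes a correlation matrix over the numeric columns of a small, made-up DataFrame. The column names and values are hypothetical and do not come from the original post; a plot such as `pandas.plotting.scatter_matrix` would be the visual counterpart.

```python
import pandas as pd

# Hypothetical toy dataset; the column names and values are illustrative only.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 69, 80, 92],
    "age_years": [30, 25, 41, 35, 28],
})

# Keep only numeric columns, then compute the pairwise Pearson correlations.
numeric = df.select_dtypes("number")
corr = numeric.corr()

print(corr.round(2))
```

Each cell of `corr` holds the correlation between one pair of columns, so the matrix is symmetric with ones on the diagonal.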
Friday, July 22, 2022
Wednesday, July 20, 2022
Iterating Over a pandas DataFrame
Often we are given a large pandas DataFrame and asked to check relationships between fields in its columns, row by row. The check could be a logical operation or a sophisticated mathematical transformation on the raw data. Essentially, it is a simple case of iterating over the rows of the DataFrame and doing some processing at each iteration. However, it may not be that simple...
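A minimal sketch of the two common approaches, assuming a small hypothetical DataFrame with `price` and `quantity` columns (these names are not from the original post): an explicit row loop with `itertuples`, and the equivalent column-wise (vectorized) expression.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "quantity": [3, 1, 4]})

# Row-by-row: itertuples yields one namedtuple per row.
totals_loop = [row.price * row.quantity for row in df.itertuples(index=False)]

# Vectorized: the same arithmetic applied to whole columns at once.
df["total"] = df["price"] * df["quantity"]

# A logical check per row, also expressed column-wise.
df["is_large"] = df["total"] > 25

print(totals_loop)
print(df)
```

Both approaches give the same totals here; on large DataFrames the column-wise form is usually much faster because it avoids a Python-level loop.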
Monday, July 18, 2022
A Typical Data Science Pipeline
Data science is a vast and dynamic field. In the modern business and technology space, the discipline of data science has assumed the role of a truly transformative force. Every kind of industry and socio-economic field from healthcare to transportation and from online retail to on-demand music uses...
Friday, July 15, 2022
SQL tables from databases
Spreadsheets share many features with databases, but they are not quite the same. A table returned by an SQL query against a database can somewhat resemble a spreadsheet. Not surprisingly, spreadsheets can be used to import large amounts of data into a database and a table can be exported from the database...
Wednesday, July 13, 2022
Spreadsheets
A spreadsheet is an application on a computer whose purpose is to organize, analyze, and store data in tabular form. Spreadsheets are nothing more than the digital evolution of paper worksheets. Accountants once collected all the data in large ledgers full of printouts, from which they extracted the...
Monday, July 11, 2022
Tabular form of data
We know that data must be processed in order to be structured in tabular form. The pandas library likewise provides data structures that arrange individual data items in this form. Now the question is, why this data structure? The tabular format has always been the most widely used way to arrange and organize data. Whether for historical reasons or for a natural predisposition...
Friday, July 8, 2022
Remaining steps in the data processing pipeline
Analysis: Analysis is the key step in the data processing pipeline. Here you interpret the raw data, enabling you to draw conclusions that aren’t immediately apparent. Continuing with our sentiment analysis example, you might want to study the sentiment toward a company over a specified period in relation...
Wednesday, July 6, 2022
The Data Processing Pipeline
Let's take a conceptual look at the steps involved in data processing, also known as the data processing pipeline. The usual steps applied to the data are: acquisition, cleansing, transformation, analysis, and storage. You'll notice that these steps aren’t always clear-cut. In some applications you’ll be able to combine multiple steps into one or omit some steps altogether. Now, let's explore these further. Acquisition: Before...
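The five steps can be sketched as a chain of small functions. This is only a toy outline under stated assumptions: the function names, the sample records, and the word-count "analysis" are all hypothetical and stand in for whatever each step does in a real application.

```python
import json
from collections import Counter

def acquire():
    # Acquisition: stand-in for reading from an API, a file, or a database.
    return ["  Great product! ", "", "terrible support", "GREAT value"]

def cleanse(records):
    # Cleansing: trim whitespace and drop empty records.
    return [r.strip() for r in records if r.strip()]

def transform(records):
    # Transformation: lowercase each record and split it into tokens.
    return [r.lower().split() for r in records]

def analyze(token_lists):
    # Analysis: count how often each token occurs across all records.
    return Counter(tok for toks in token_lists for tok in toks)

def store(counts):
    # Storage: serialize the result; a real pipeline might write to disk or a database.
    return json.dumps(counts, sort_keys=True)

result = store(analyze(transform(cleanse(acquire()))))
print(result)
```

Because each step is a separate function, steps can be merged or skipped, which matches the point that the boundaries between them are not always clear-cut.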
Monday, July 4, 2022
Files
Files may contain structured, semistructured, and unstructured data. Python’s built-in open() function allows you to open a file so you can use its data within your script. However, depending on the format of the data (for example, CSV, JSON, or XML), you may need to import a corresponding library to be able to perform read, write, and/or append operations on it. Plaintext files don’t require a library...
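A short sketch of the difference, using only the standard library: plain text needs nothing beyond open(), while CSV and JSON go through the csv and json modules. The file names and contents are made up for illustration, and everything is written to a temporary directory.

```python
import csv
import json
import os
import tempfile

workdir = tempfile.mkdtemp()  # scratch directory for the hypothetical files

# Plain text: open() alone covers write, append, and read.
text_path = os.path.join(workdir, "notes.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")
with open(text_path, "a", encoding="utf-8") as f:
    f.write("third line\n")
with open(text_path, encoding="utf-8") as f:
    lines = f.read().splitlines()

# CSV: the csv module handles delimiters and quoting.
csv_path = os.path.join(workdir, "scores.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "score"])
    writer.writerow(["Ada", 90])
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))  # values come back as strings

# JSON: the json module converts between text and Python objects.
json_path = os.path.join(workdir, "record.json")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump({"name": "Ada", "tags": ["a", "b"]}, f)
with open(json_path, encoding="utf-8") as f:
    record = json.load(f)

print(lines, rows, record)
```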
Friday, July 1, 2022
Databases
Another common source of data is a relational database, a structure that provides a mechanism to efficiently store, access, and manipulate your structured data. You fetch from or send a portion of data to tables in the database using a Structured Query Language (SQL) request. For instance, the following request issued to an employees table in the database retrieves the list of only those programmers...
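The excerpt's own query is cut off, so the example below does not reproduce it. It is a self-contained sketch using Python's built-in sqlite3 module with an in-memory database; the `employees` schema, the sample rows, and the filter on `job` are all invented for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; not taken from the original post.
conn.execute("CREATE TABLE employees (name TEXT, job TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ada", "programmer", 5000),
        ("Grace", "manager", 6000),
        ("Linus", "programmer", 4500),
    ],
)

# Fetch only a portion of the table with a parameterized SQL request.
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE job = ? ORDER BY name",
    ("programmer",),
).fetchall()
conn.close()

print(rows)
```

Passing the filter value as a parameter (the `?` placeholder) rather than formatting it into the SQL string avoids quoting mistakes and SQL injection.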