Let’s plot the sine function over the interval from 0 to 4π. The main plotting function in matplotlib, plot, does not plot functions per se; it plots (x, y) data sets. As we shall see, we can instruct plot either to just draw points or dots at each data point, or we can instruct it to draw...
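Because plot() works on data rather than formulas, you first sample the sine function at discrete x values. The minimal sketch below assumes NumPy is installed alongside matplotlib and draws the same 33 samples twice: once as markers only, and once joined by straight line segments. The number of points and the output filename are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# plot() draws (x, y) data, so sample sin(x) at discrete points first
x = np.linspace(0, 4 * np.pi, 33)  # 33 evenly spaced points from 0 to 4π
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, "o", label="data points only")        # markers, no connecting line
ax.plot(x, y, "-", label="points joined by lines")  # straight segments between points
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()
fig.savefig("sine.png")
```

With few sample points the "joined" curve looks visibly jagged, which is a direct consequence of plot() connecting data points rather than evaluating a function.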
Tuesday, December 31, 2019
Monday, December 30, 2019
Introducing New Elements to a Plot
Charts are supposed to make your data visually appealing. To do this, it is important to choose the correct chart for the data you need to represent, because not every chart suits every kind of data. Basic lines and markers alone will not be enough to make your charts appealing. You...
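As a minimal sketch of adding elements beyond basic lines and markers, the example below gives a line chart a title, axis labels, a dotted grid, and a free-floating text label. The sales figures, the label text, and the output filename are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 11, 18, 21, 19]  # hypothetical data for illustration

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o", linestyle="-", color="tab:blue")
ax.set_title("Monthly sales (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.grid(True, linestyle=":")            # light dotted grid to help read values
ax.text(5, 21.5, "peak", ha="center")   # text placed in data coordinates
fig.savefig("sales.png")
```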
Sunday, December 29, 2019
How to Create a Chart
Before you begin, import pyplot into your programming environment under the name plt, as shown below:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
When you enter this code, you will have created a Line2D object (plt.plot() returns a list containing it). In this case, the object is a line representing the trend in the data you plot within...
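The sketch below, which assumes a non-interactive backend so it runs without a display, checks what plt.plot() returns: a list holding one Line2D per plotted series. With a single list argument, the values become the y data and x defaults to 0, 1, 2, ...

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# With a single list, plot() treats the values as y and uses 0, 1, 2, ... as x
lines = plt.plot([1, 2, 3, 4])
line = lines[0]

print(type(lines).__name__, len(lines))        # list 1
print(isinstance(line, Line2D))                 # True
print([float(v) for v in line.get_xdata()])     # [0.0, 1.0, 2.0, 3.0]
```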
Saturday, December 28, 2019
Display Tools in Matplotlib
There are different display tools you can use to help viewers understand a plot the first time they see it. Legends and annotations serve this purpose. Legends identify the different data series within your plot; to add one, you call the matplotlib function legend().
Annotations, on the other hand, highlight important points in the plot. Annotations are added using the matplotlib function...
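A minimal sketch of both tools follows. It labels two series so that legend() can list them, and uses pyplot's annotate() to point an arrow at one feature of the curve. Using annotate() is an assumption of this sketch, and the text and arrow positions are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")  # the label is what legend() displays
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()                            # one entry per labelled series

# Point an arrow at the maximum of sin(x), which occurs at x = π/2
ax.annotate("max of sin(x)", xy=(np.pi / 2, 1.0), xytext=(2.5, 1.3),
            arrowprops=dict(arrowstyle="->"))
ax.set_ylim(-1.5, 1.5)
fig.savefig("legend_annotate.png")
```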
Friday, December 27, 2019
Scatter Plots
The role of a scatter plot is to reveal the relationship between two variables displayed in a coordinate system. Each data point is positioned according to its values for the two variables. From the scatter plot, you can tell whether or not there is a relationship between the variables.
When studying...
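As a minimal sketch, the example below draws a scatter plot of two hypothetical variables (hours studied and exam score, with invented values) and then computes the Pearson correlation coefficient with NumPy's corrcoef() to put a number on the relationship the plot suggests. The correlation step is an addition for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical paired measurements: hours studied vs. exam score
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 64, 70, 74, 79, 85])

fig, ax = plt.subplots()
ax.scatter(hours, score)  # one dot per (hours, score) pair
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
fig.savefig("scatter.png")

# Pearson r: +1 is a perfect positive linear relationship, 0 is none, -1 perfect negative
r = np.corrcoef(hours, score)[0, 1]
print(f"r = {r:.3f}")
```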
Thursday, December 26, 2019
Logarithmic Plots (Log Plots)
A logarithmic plot is essentially a basic plot drawn on a logarithmic scale. The difference from a normal linear scale is that equal distances along a logarithmic axis represent equal ratios, or orders of magnitude (for example 1, 10, 100, 1000), rather than equal differences. There are two types of log plots: the log-log plot and the semi-log plot.
The log-log plot has logarithm...
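The sketch below contrasts the two types using pyplot's semilogy() and loglog() methods on a pair of Axes. The functions plotted are arbitrary examples, chosen because each becomes a straight line on its own scale: an exponential on a semi-log plot, and a power law on a log-log plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 100, 200)
fig, (ax_semi, ax_loglog) = plt.subplots(1, 2, figsize=(9, 4))

# Semi-log: only the y-axis is logarithmic, so exponential growth is a straight line
ax_semi.semilogy(x, np.exp(x / 10))
ax_semi.set_title("semilogy: y = exp(x/10)")

# Log-log: both axes are logarithmic, so a power law is a straight line
ax_loglog.loglog(x, x ** 2)
ax_loglog.set_title("loglog: y = x**2")
fig.savefig("log_plots.png")
```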
Wednesday, December 25, 2019
Basic Matplotlib Plots
A simple plot
To create a plot in matplotlib, you use the plot() function from the matplotlib.pyplot subpackage. It gives you a basic plot of your x-axis and y-axis variables. Optionally, you can also pass format parameters to set the line style you are using. To determine the...
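A minimal sketch of a basic plot with a format string follows; the data are arbitrary. In the format string "r--", r selects the color red and -- selects a dashed line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# The optional third argument is a format string: "r--" means red, dashed line
lines = plt.plot(x, y, "r--")
plt.xlabel("x")
plt.ylabel("y = x**2")
plt.savefig("simple_plot.png")
```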
Tuesday, December 24, 2019
Fundamentals of Matplotlib
Let's look at some of the important concepts you will come across in Matplotlib, and their meanings or roles:
Axis – This represents a number line, and is used to determine the graph limits.
Axes – An Axes is what we think of as a plot. A single figure can hold as many axes as ...
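To make the distinction concrete, here is a minimal sketch of one Figure holding two Axes, where the limits of the first Axes' x-axis and y-axis are set explicitly. The data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One Figure holding two Axes (two separate plots) side by side
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot([0, 1, 2], [0, 1, 4])
ax2.plot([0, 1, 2], [4, 1, 0])

# Each Axes has its own x-axis and y-axis; limits are set per Axes
ax1.set_xlim(0, 2)
ax1.set_ylim(0, 5)
fig.savefig("two_axes.png")
```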
Monday, December 23, 2019
Data Visualization with Matplotlib
Data visualization is one of the first things you have to do when you analyze data. The moment you glance at some data, your mind forms a rough idea of how you want it to look when you map it on a graph.
Matplotlib might seem rather complex at first, but with basic coding knowledge,...
Saturday, December 21, 2019
How to Avoid Data Contamination
From empty data fields to duplicated data and invalid addresses, there are many ways you can end up with contaminated data. Having looked at the possible causes and at methods of cleaning data, it is important for an expert in your position to put measures in place to prevent data contamination in the...
Thursday, December 19, 2019
How to Clean Data
Having gone through the procedures described in the previous post and identified unclean data, your next challenge is to clean it so that you use accurate data for analysis.
You have five possible alternatives for handling such a situation:
● Data imputation
If you are unable to find the necessary...
Identify Inaccurate Data
More often than not, you need to make a judgment call to determine whether the data you are accessing is accurate.
As you go through data, you must make a logical decision based on what you see. The following are some factors you should think about:
● Study the range
First, check the range of data....
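A minimal sketch of a range check with pandas, assuming a hypothetical age column: look at the minimum and maximum first, then flag values outside a plausible range. The 0 to 120 bounds are an assumption chosen for this example.

```python
import pandas as pd

# Hypothetical survey ages, including two suspicious entries
df = pd.DataFrame({"age": [34, 29, 41, -3, 38, 250, 27]})

print(df["age"].min(), df["age"].max())  # -3 250

# Flag values outside a plausible range (between() includes both bounds by default)
in_range = df["age"].between(0, 120)
print(df[~in_range])
```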
Wednesday, December 18, 2019
Data Cleaning
Data cleaning is one of the most important procedures you should learn in data analysis. You will constantly be working with different sets of data, and their accuracy or completeness is never guaranteed. For this reason, you should learn how to handle such data and make sure the incompleteness...
Tuesday, December 17, 2019
Data Manipulation
By this point, you know how to draw summaries from the data in your possession. Beyond this, you should learn how to slice, select, and extract data from your DataFrame. I mentioned earlier that DataFrames and Series share many similarities, especially in the methods used on them. However,...
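A minimal sketch of the three operations with pandas follows. squad_df here is a small hypothetical stand-in (four rows, with invented team names and designations) for the dataset used in other posts; loc selects by label, iloc selects by integer position, and a boolean mask extracts the rows that meet a condition.

```python
import pandas as pd

# Hypothetical stand-in for squad_df: team names as the index
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3, 4],
     "Designation": ["Champions", "Runners-up", "Third", "Fourth"]},  # invented labels
    index=["Team A", "Team B", "Team C", "Team D"],
)

col = squad_df["Position"]                 # select one column -> Series
sub = squad_df[["Position"]]               # list of columns -> DataFrame
row = squad_df.loc["Team B"]               # select a row by index label
first_two = squad_df.iloc[0:2]             # slice rows by integer position
top = squad_df[squad_df["Position"] <= 2]  # extract rows matching a condition
print(top)
```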
Monday, December 16, 2019
Describing Variables
There is much more information you can get from your DataFrames. You can get a summary of the continuous variables using the following syntax:
squad_df.describe()
This returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns. This information is useful when you are uncertain about which kind of plot to use for visual representation. .describe() is a useful method because it returns the...
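The sketch below runs .describe() on a small hypothetical DataFrame. By default only numeric columns are summarized, so the text column is left out of the result.

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                   "Designation": ["a", "b", "c", "d", "e"]})

summary = df.describe()     # numeric columns only, by default
print(summary)
print(list(summary.index))  # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```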
Friday, December 13, 2019
Data Imputation
Imputation is a cleaning process that allows you to keep valuable data in your DataFrames, even when some values are null. This is important in situations where eliminating rows that contain null values would remove a lot of data from your dataset. Instead of losing whole rows, you can replace each null value with the median or mean of its column.
Using the example above, and assuming a new...
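Here is a minimal sketch on a hypothetical goals column with two missing values, filling the nulls once with the column mean and once with the median. fillna(), mean(), and median() are standard pandas methods; the data are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values
df = pd.DataFrame({"goals": [10, np.nan, 14, 8, np.nan, 30]})

# Replace each null with the column mean (mean() skips NaN by default)
mean_filled = df["goals"].fillna(df["goals"].mean())
# Or with the median, which is less sensitive to outliers such as the 30
median_filled = df["goals"].fillna(df["goals"].median())

print(mean_filled.tolist())    # [10.0, 15.5, 14.0, 8.0, 15.5, 30.0]
print(median_filled.tolist())  # [10.0, 12.0, 14.0, 8.0, 12.0, 30.0]
```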
Thursday, December 12, 2019
Computation with Missing Values
One thing you can be certain about as a data analyst is that you will not always come across complete sets of data. Because data is collected by different people, who might not use the conventions you prefer, you can always expect to run into some challenges with missing values in datasets.
In Python, you will encounter None, or np.nan in NumPy, whenever you come across such types of...
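The sketch below shows how the two libraries compute with missing values: plain NumPy arithmetic propagates NaN, NumPy's nan-aware functions skip it, and pandas skips NaN by default unless you pass skipna=False. The values are arbitrary.

```python
import numpy as np
import pandas as pd

values = np.array([1.0, np.nan, 3.0])
print(np.sum(values))     # nan: NaN propagates through NumPy arithmetic
print(np.nansum(values))  # 4.0: the nan-aware variant skips missing values

s = pd.Series([1, None, 3])  # pandas stores None in a numeric Series as NaN
print(s.isnull().sum())      # 1
print(s.sum())               # 4.0: pandas skips NaN by default (skipna=True)
print(s.sum(skipna=False))   # nan
```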
Wednesday, December 11, 2019
Cleaning Data in a Column
We often come across datasets whose column names are inconsistent: they contain typos, stray spaces, and a mixture of upper- and lowercase letters.
Cleaning up these columns will make it easier for you to choose the correct column for your computations.
In the example shown in the previous post, the syntax below will help us print the column names:
squad_df.columns
You will have the following output:
Index...
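A minimal sketch of cleaning column names with pandas string methods on the columns Index, assuming hypothetical messy names. A typo cannot be fixed by a generic rule, so the example renames it explicitly with rename().

```python
import pandas as pd

# Hypothetical messy column names: stray spaces, mixed case, a typo
df = pd.DataFrame(columns=[" Position", "designation ", "Goals Scored", "Pionts"])

# Normalise: strip surrounding spaces, lowercase, replace inner spaces with "_"
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Fix known typos explicitly
df = df.rename(columns={"pionts": "points"})
print(list(df.columns))  # ['position', 'designation', 'goals_scored', 'points']
```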
Tuesday, December 10, 2019
Dealing with Duplicates
The example we used in the previous post does not have any duplicate rows, so to learn how to identify duplicates and keep our computations accurate, we can append the squad DataFrame to itself, doubling it, as shown:
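temp_df = squad_df.append(squad_df)
temp_df.shape
(DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions, temp_df = pd.concat([squad_df, squad_df]) gives the same result.)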
Our output will be as follows:
(40, 2)
The append() method copies the...
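A minimal sketch of finding and removing the duplicates, using a hypothetical three-row stand-in for squad_df. It uses pd.concat() because DataFrame.append() no longer exists in pandas 2.0. duplicated() marks every repeat of an earlier row, and drop_duplicates() keeps the first occurrence by default; both compare column values and ignore the index.

```python
import pandas as pd

# Hypothetical stand-in for squad_df (values and team names are invented)
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["x", "y", "z"]},
    index=["Team A", "Team B", "Team C"],
)

temp_df = pd.concat([squad_df, squad_df])  # doubled, like append() in the post
print(temp_df.shape)               # (6, 2)
print(temp_df.duplicated().sum())  # 3: the second copy of each row is flagged
deduped = temp_df.drop_duplicates()
print(deduped.shape)               # (3, 2)
```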
Monday, December 9, 2019
Extracting Information from Data
The .info() method gives you an overview of the structure of your dataset. The syntax is as follows:
squad_df.info()
You will have the following output:
<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, Manchester United to Swansea
Data columns (total 2 columns):
Position       20 non-null int64
Designation    20 non-null object
dtypes: int64(1), object(1)
memory usage: 35.7+ KB
The .info() command...
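If you need the report as text, or want the same details in a form you can compute with, the sketch below captures the info() output in a buffer and reads the column dtypes and non-null counts directly. squad_df is a small hypothetical stand-in for the post's dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for squad_df (values and team names are invented)
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["x", "y", "z"]},
    index=["Team A", "Team B", "Team C"],
)

# info() prints by default; pass a buffer to capture the report as a string
buf = io.StringIO()
squad_df.info(buf=buf)
report = buf.getvalue()
print(report)

# Programmatic equivalents of the per-column details in the report
print(squad_df.dtypes)         # dtype of each column
print(squad_df.notna().sum())  # non-null count per column
```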
Friday, December 6, 2019
Obtaining Data from SQL Databases
Before you begin, make sure you can connect to your database with the relevant Python library. Once the connection is established, you can pass a query to pandas. You need SQLite to establish a connection with your database, from which you will then create a DataFrame using a SELECT query as follows:
import...
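As a minimal sketch under stated assumptions, the example below uses Python's built-in sqlite3 module with a hypothetical in-memory database and a made-up squad table, then loads the result of a SELECT query into a DataFrame with pd.read_sql_query(). Against a real database file, you would pass its path to sqlite3.connect() and skip the table setup.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database standing in for a real database file
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE squad (team TEXT, position INTEGER)")
conn.executemany("INSERT INTO squad VALUES (?, ?)",
                 [("Team A", 1), ("Team B", 2), ("Team C", 3)])
conn.commit()

# Run a SELECT query and load the result straight into a DataFrame
df = pd.read_sql_query("SELECT team, position FROM squad ORDER BY position", conn)
conn.close()
print(df)
```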