Tuesday, December 31, 2019

Plotting the sine function

Let’s plot the sine function over the interval from 0 to 4π. The main plotting function in matplotlib, plot, does not plot functions per se; it plots (x, y) data sets. As we shall see, we can instruct plot either to just draw points or dots at each data point, or we can instruct it to draw...
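A minimal sketch of the idea, assuming the interval is 0 to 4π and an arbitrary 200 sample points: the sine function is first turned into (x, y) arrays, then drawn once as a connected line and once as isolated markers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample the interval [0, 4*pi]; 200 points is an arbitrary choice.
x = np.linspace(0, 4 * np.pi, 200)
y = np.sin(x)

fig, ax = plt.subplots()
(curve,) = ax.plot(x, y)                   # default: data points joined by line segments
(dots,) = ax.plot(x[::10], y[::10], "o")   # "o" draws a marker at every 10th data point
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
# Call plt.show() or fig.savefig("sine.png") to view the result.
```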

Monday, December 30, 2019

Introducing New Elements to a Plot

Charts are supposed to make your data visually appealing. To do this, you need to choose the right chart for the data you want to represent, because not every chart suits every kind of data. Basic lines and markers alone will not be enough to make a chart appealing. You...

Sunday, December 29, 2019

How to Create a Chart

Before you begin, import pyplot into your programming environment under the name plt, then call plot: import matplotlib.pyplot as plt, followed by plt.plot([1,2,3,4]). When you enter this code, you will have created a Line2D object. An object in this case is a linear representation of the trends you will plot within...
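As a short sketch of what that call hands back: plot() returns a list of Line2D objects, one per curve, and with a single list of values it uses 0, 1, 2, ... as the x coordinates. The variable names here are illustrative only.

```python
import matplotlib.pyplot as plt

# With a single list, plot() treats the values as y and uses 0, 1, 2, ... as x.
lines = plt.plot([1, 2, 3, 4])
line = lines[0]                  # plot() returns a list of Line2D objects
print(type(line).__name__)
print(line.get_xdata(), line.get_ydata())
```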

Saturday, December 28, 2019

Display Tools in Matplotlib

There are different display tools you can use to help you understand a plot the first time you see it. Legends and annotations serve this purpose. Legends identify the different data series within your plot. To add one, call the matplotlib function legend(). Annotations, on the other hand, help identify the important points in the plot. Annotations are called using the matplotlib function...
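A rough sketch of both tools on one plot. The excerpt names legend(); the use of annotate() for the annotation is an assumption about what the truncated sentence goes on to name, and the data and label text are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")   # label is the text legend() displays
ax.plot(x, np.cos(x), label="cos(x)")
legend = ax.legend()

# Point an arrow at the maximum of sin(x), which sits at x = pi/2.
note = ax.annotate(
    "maximum of sin(x)",
    xy=(np.pi / 2, 1.0),              # the point being annotated
    xytext=(3.0, 0.8),                # where the text is placed
    arrowprops={"arrowstyle": "->"},
)
```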

Friday, December 27, 2019

Scatter Plots

The role of a scatter plot is to reveal the relationship between two variables displayed in a coordinate system. Each data point is positioned according to its values for the two variables. From the scatter plot, you can tell whether there is a relationship between the variables or not. When studying...
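To make this concrete, here is a small sketch with synthetic data. The variables hours and score are hypothetical, generated so that they are linearly related with some noise; the correlation coefficient is added only to put a number on what the scatter shows.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                  # fixed seed so the sketch is repeatable
hours = rng.uniform(0, 10, size=50)             # hypothetical variable 1
score = 5 * hours + rng.normal(0, 5, size=50)   # hypothetical variable 2: linear trend plus noise

fig, ax = plt.subplots()
points = ax.scatter(hours, score)               # one marker per (hours, score) pair
ax.set_xlabel("hours")
ax.set_ylabel("score")

# The correlation coefficient summarises the strength of the linear relationship.
r = np.corrcoef(hours, score)[0, 1]
print(f"correlation: {r:.2f}")
```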

Thursday, December 26, 2019

Logarithmic Plots (Log Plots)

A logarithmic plot is essentially a basic plot drawn on a logarithmic scale. Unlike a normal linear scale, the intervals on a logarithmic scale are spaced by order of magnitude. We have two different types of log plots: the log-log plot and the semi-log plot. The log-log plot has logarithm...
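The two kinds of log plot can be sketched side by side. The power-law data is an arbitrary choice for illustration; semilogy() and loglog() are the matplotlib calls used here.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.logspace(0, 3, 50)   # 50 points from 10**0 to 10**3
y = x ** 2                  # a power law, which appears as a straight line on log-log axes

fig, (ax_semi, ax_loglog) = plt.subplots(1, 2)
ax_semi.semilogy(x, y)      # semi-log: only the y-axis is logarithmic
ax_semi.set_title("semi-log")
ax_loglog.loglog(x, y)      # log-log: both axes are logarithmic
ax_loglog.set_title("log-log")
```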

Wednesday, December 25, 2019

Basic Matplotlib Plots

A simple plot. To plot with matplotlib, you use the plot() function from the matplotlib.pyplot subpackage. This gives you a basic plot of the x-axis and y-axis variables. You can also pass format parameters to set the line style you are using. To determine the...
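A brief sketch of format parameters, using made-up data. Each format string combines an optional colour letter, marker symbol, and line style.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
(dashed,) = ax.plot(x, y, "r--")                  # red dashed line, no markers
(squares,) = ax.plot(x, x, "bs")                  # blue square markers, no connecting line
(both,) = ax.plot(x, [v + 5 for v in y], "g^-")   # green triangles joined by a solid line
```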

Tuesday, December 24, 2019

Fundamentals of Matplotlib

Let’s look at some of the important concepts that you will come across and use in Matplotlib, and their meanings or roles: Axis – This represents a number line, and is used to determine the graph limits. Axes – These represent what we construe as plots. A single figure can hold as many axes as ...
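The distinction between these terms can be sketched in a few lines: one figure, two Axes (plots), and the limits of each Axis set explicitly. The data values are arbitrary.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2)    # one Figure holding two Axes, i.e. two plots
left, right = axes
left.plot([0, 1, 2], [0, 1, 4])
right.plot([0, 1, 2], [4, 1, 0])

# Each Axes owns an x Axis and a y Axis; set_xlim/set_ylim fix their limits.
left.set_xlim(0, 2)
left.set_ylim(0, 5)
print(len(fig.axes), left.get_xlim(), left.get_ylim())
```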

Monday, December 23, 2019

Data Visualization with Matplotlib

Data visualization is one of the first things you have to perform before you analyze data. The moment you have a glance at some data, your mind creates a rough idea of the way you want it to look when you map it on a graph. Matplotlib might seem rather complex at first, but with basic coding knowledge,...

Saturday, December 21, 2019

How to Avoid Data Contamination

From empty data fields to data duplication and invalid addresses, there are many ways you can end up with contaminated data. Having looked at possible causes and methods of cleaning data, it is important for someone in your position to put measures in place to prevent data contamination in the...

Thursday, December 19, 2019

How to Clean Data

Having gone through the procedures described in the previous post and identified unclean data, your next challenge is how to clean it and use accurate data for analysis. You have five possible alternatives for handling such a situation: ● Data imputation If you are unable to find the necessary...

Identify Inaccurate Data

More often than not, you need to make a judgement call to determine whether the data you are accessing is accurate or not. As you go through data, you must make a logical decision based on what you see. The following are some factors you should think about: ● Study the range First, check the range of data....

Wednesday, December 18, 2019

Data Cleaning

Data cleaning is one of the most important procedures you should learn in data analysis. You will constantly be working with different sets of data, and neither their accuracy nor their completeness is ever guaranteed. For this reason, you should learn how to handle such data and make sure the incompleteness...

Tuesday, December 17, 2019

Data Manipulation

By this point, you are aware of how to draw summaries from the data in your possession. Beyond this, you should learn how to slice, select, and extract data from your DataFrame. I mentioned earlier that DataFrames and Series share many similarities, especially in the methods used on them. However,...
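A small sketch of selecting and slicing, under stated assumptions: the squad_df built here is a hypothetical stand-in with invented team names and values, since the original data is not shown in this excerpt.

```python
import pandas as pd

# Hypothetical stand-in for squad_df: team names as the index.
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["A", "B", "C"]},
    index=["Team A", "Team B", "Team C"],
)

column = squad_df["Position"]     # a single column comes back as a Series
frame = squad_df[["Position"]]    # a list of columns comes back as a DataFrame
row = squad_df.loc["Team B"]      # select a row by its index label
first_two = squad_df.iloc[0:2]    # select rows by integer position
print(type(column).__name__, type(frame).__name__)
print(row["Designation"], len(first_two))
```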

Monday, December 16, 2019

Describing Variables

There is so much more information you can get from your DataFrames. A summary of the continuous variables can be obtained using the following syntax: squad_df.describe() This will return information about the continuous (numeric) columns. This information is useful when you are uncertain about the kind of plot to use for visual representation. .describe() is a useful method because it returns the...
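A minimal sketch of describe(), again using a hypothetical stand-in for squad_df with invented values. By default only numeric columns are summarised; calling describe() on a text column gives a different set of statistics.

```python
import pandas as pd

# Hypothetical stand-in for squad_df; the values are invented for illustration.
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3, 4], "Designation": ["A", "B", "A", "C"]},
    index=["Team A", "Team B", "Team C", "Team D"],
)

summary = squad_df.describe()                # numeric columns only by default
print(summary)
print(squad_df["Designation"].describe())    # count, unique, top, freq for a text column
```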

Friday, December 13, 2019

Data Imputation

Imputation is a cleaning process that allows you to maintain valuable data in your DataFrames, even if they have null values. This is important in situations where eliminating rows that contain null values might eliminate a lot of data from your dataset. Instead of losing all values, you can use the median or mean of the column in place of the null value. Using the example above, and assuming a new...
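Mean imputation can be sketched as follows. The goals column and its values are hypothetical; the same pattern works with median() in place of mean().

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries.
df = pd.DataFrame({"goals": [10.0, np.nan, 30.0, np.nan, 50.0]})

mean_value = df["goals"].mean()    # NaN is skipped: (10 + 30 + 50) / 3 = 30
df["goals_imputed"] = df["goals"].fillna(mean_value)
print(df)
```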

Thursday, December 12, 2019

Computation with Missing Values

One thing you can be certain about as a data analyst is that you will not always come across complete sets of data. Since data is collected by different people, they might not use the same conventions you prefer. Therefore, you can always expect to bump into some challenges with missing values in datasets. You will encounter None in Python, or np.nan in NumPy, whenever you come across such types of...
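A short sketch of how missing values behave in computations, using an invented Series. pandas skips NaN in its reductions by default, while plain NumPy propagates it unless a NaN-aware function is used.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, None, 3.0, np.nan])   # None is stored as NaN in a float Series

missing = values.isnull().sum()                # number of missing entries
pandas_total = values.sum()                    # pandas skips NaN by default
numpy_total = np.sum(values.to_numpy())        # plain NumPy propagates NaN
nan_aware_total = np.nansum(values.to_numpy()) # NaN-aware NumPy variant
print(missing, pandas_total, numpy_total, nan_aware_total)
```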

Wednesday, December 11, 2019

Cleaning Data in a Column

We often come across datasets whose column names are inconsistent, with typos, stray spaces, and a mixture of upper- and lower-case words. Cleaning up these columns will make it easier for you to choose the correct column for your computations. In the example shown in the previous post, the syntax below will help us print the column names: squad_df.columns You will have the following output: Index...
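One possible way to tidy such headers, shown on a hypothetical frame whose messy column names are invented for illustration: strip whitespace, lower-case everything, replace inner spaces, then fix typos by explicit renaming.

```python
import pandas as pd

# Hypothetical frame with messy headers: stray spaces, mixed case, and a typo.
df = pd.DataFrame(columns=[" Position", "DESIGNATION ", "Tema Name"])

df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df = df.rename(columns={"tema_name": "team_name"})   # typos need an explicit rename
print(list(df.columns))
```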

Tuesday, December 10, 2019

Dealing with Duplicates

The example we used in the previous post does not have any duplicate rows, so we need to create some in order to learn how to identify duplicates and ensure that we perform accurate computations. Using that example, we can append the squad DataFrame to itself to double it as shown: temp_df = squad_df.append(squad_df), followed by temp_df.shape Our output will be as follows: (40, 2) The append() method copies the...
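A sketch of the same doubling trick followed by duplicate detection. Note that DataFrame.append() was removed in pandas 2.0, so this version uses pd.concat() instead; the three-row squad_df is a hypothetical stand-in, so the shapes differ from the (40, 2) quoted above.

```python
import pandas as pd

# Hypothetical stand-in for squad_df.
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["A", "B", "C"]},
    index=["Team A", "Team B", "Team C"],
)

# pd.concat() stacks the frame on top of itself, as append() did in older pandas.
temp_df = pd.concat([squad_df, squad_df])
print(temp_df.shape)                       # twice as many rows as squad_df

n_duplicates = temp_df.duplicated().sum()  # rows that repeat an earlier row
deduped = temp_df.drop_duplicates()        # keeps the first occurrence of each row
print(n_duplicates, deduped.shape)
```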

Monday, December 9, 2019

Extracting Information from Data

The .info() command will help you derive information from your data sets. The syntax is as follows: squad_df.info() You will have the following output:

<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, Manchester United to Swansea
Data columns (total 2 columns):
Position 20 non-null int64
Designation 20 non-null object
dtypes: int64(1), object(1)
memory usage: 35.7+ KB

The .info() command...
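A small sketch of calling info() on a hypothetical three-row stand-in for squad_df. Because info() prints rather than returns its report, the buf argument is used here to capture the text; the exact layout of the report varies between pandas versions.

```python
import io
import pandas as pd

# Hypothetical stand-in for squad_df.
squad_df = pd.DataFrame(
    {"Position": [1, 2, 3], "Designation": ["A", "B", "C"]},
    index=["Team A", "Team B", "Team C"],
)

buffer = io.StringIO()
squad_df.info(buf=buffer)      # info() normally prints; buf= captures the text instead
report = buffer.getvalue()
print(report)
```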

Friday, December 6, 2019

Obtaining Data from SQL Databases

Before you begin, check that the Python library in question can connect to your database. Once the connection is established, you can then pass a query to Pandas. You need SQLite to establish a connection with your database, from which you will then create a DataFrame using the SELECT query as follows: import...
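A self-contained sketch of the workflow, assuming Python's built-in sqlite3 module and pandas' read_sql_query(). The in-memory database, the squad table, and its columns are all hypothetical, created here only so the example runs without an existing database file.

```python
import sqlite3
import pandas as pd

# In-memory database so the sketch needs no file; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE squad (team TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO squad VALUES (?, ?)",
    [("Team A", 1), ("Team B", 2)],
)
conn.commit()

# Run a SELECT query and load the result into a DataFrame indexed by team.
df = pd.read_sql_query("SELECT * FROM squad", conn, index_col="team")
conn.close()
print(df)
```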