Monday, January 31, 2022

Combining biotechnology and machine learning

In recent years, scientific advances in the field, boosted by applications of machine learning and other predictive technologies, have led to major accomplishments such as the discovery of novel treatments, faster and more accurate diagnostic tests, greener manufacturing methods,...

Friday, January 28, 2022

Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays applies the operation element-wise:

    In [51]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])
    In [52]: arr
    Out[52]:
    array([[...
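As a minimal sketch of the element-wise behavior described above, the following reuses the array from the excerpt. The second array `other` and the result names are illustrative assumptions, not part of the original post.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
other = np.array([[10., 20., 30.], [40., 50., 60.]])  # hypothetical second array

product = arr * arr        # each element is multiplied by itself
difference = other - arr   # subtraction pairs up elements at the same position

print(product)
print(difference)
```

No explicit loop is written; NumPy applies each operation to every pair of matching elements.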

Thursday, January 27, 2022

The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements. To...
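To illustrate operating on a whole block of data with scalar-like syntax, here is a small sketch. The values in `data` are made up for the example.

```python
import numpy as np

data = np.array([[1.5, -0.1, 3.0], [0.0, -3.0, 6.5]])  # illustrative values

scaled = data * 10      # every element is multiplied by the scalar 10
doubled = data + data   # every element is added to itself

print(scaled)
print(doubled)
```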

Wednesday, January 26, 2022

Removing Items from a NumPy Array

To delete an item from an array, you can use NumPy's delete() function. You pass the existing array and the index of the item to be deleted to delete(). The following script deletes the item at index 1 (the second item) from my_array.

    import numpy as np

    my_array = np.array(["Red", "Green", "Orange"])
    print(my_array)
    print("After deletion")
    updated_array = np.delete(my_array, 1)
    print(updated_array)

The output...

Tuesday, January 25, 2022

Adding Items in a NumPy Array

To add items to a NumPy array, you can use the append() function from the NumPy module. You pass the original array and the item that you want to append to append(), which returns a new array containing the new items appended to the end of the original array's contents. The following script adds a text item “Yellow” to an existing array with three...
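Since the original script is cut off here, the following is only a sketch of the same idea. The three starting values in `my_array` are an assumption borrowed from the deletion example, not necessarily what the post used.

```python
import numpy as np

my_array = np.array(["Red", "Green", "Orange"])  # assumed starting values
extended_array = np.append(my_array, "Yellow")   # returns a new array

print(my_array)        # the original array is left unchanged
print(extended_array)
```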

Sunday, January 23, 2022

Printing NumPy Arrays

Depending on the dimensions, there are various ways to display NumPy arrays. The simplest way to print a NumPy array is to pass it to the print() function, as you have already seen in the previous posts. An example is given below:

    my_array = np.array([10, 12, 14, 16, 20, 25])
    print(my_array)

Output:

    [10 12 14 16 20 25]

You can also use loops to display items in a NumPy array. It is a good idea...
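The loop-based approach mentioned above might look like the sketch below. The two-dimensional array `grid` is a hypothetical example added to show nested iteration.

```python
import numpy as np

my_array = np.array([10, 12, 14, 16, 20, 25])
for item in my_array:
    print(item)  # one item per line

grid = np.array([[1, 2], [3, 4]])  # hypothetical 2-D array
for row in grid:          # iterating a 2-D array yields its rows
    for value in row:     # iterating a row yields its items
        print(value)
```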

Saturday, January 22, 2022

Creating NumPy Arrays

Depending on the type of data you need inside your NumPy array, you can create it in different ways.

1. Using the array() Function - To create a NumPy array, you can pass a list to the array() function of the NumPy module, as shown below:

    nums_list = [10, 12, 14, 16, 20]
    nums_array = np.array(nums_list)
    type(nums_array)

Output:

    numpy.ndarray

You can also create a multi-dimensional NumPy array....
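A multi-dimensional array can be built the same way from a nested list. This is a minimal sketch; the values in `rows` are illustrative.

```python
import numpy as np

rows = [[10, 12, 14], [16, 20, 25]]  # illustrative nested list
nums_2d = np.array(rows)

print(nums_2d.shape)  # (2, 3): two rows, three columns
print(nums_2d.ndim)   # 2
```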

Friday, January 21, 2022

NumPy Array 2

We can convert the data type of a NumPy array to another data type via the astype() method. To do so, you pass the target data type to astype(). For instance, the following script converts the array you created in the previous script (previous post) to the datetime data type. You can see that “M” is passed as a parameter value to the astype() method. “M” stands for the datetime...
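The array from the previous post is not shown here, so the sketch below uses hypothetical date strings. It also spells out the target type as "datetime64[D]" (day resolution) instead of the short code “M” used in the post; treat that substitution as an assumption made for explicitness.

```python
import numpy as np

date_strings = np.array(["2021-12-10", "2022-01-21"])  # hypothetical input
dates = date_strings.astype("datetime64[D]")  # strings -> day-resolution datetimes

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)  # the same method converts numeric types

print(dates.dtype, floats.dtype)
```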

Thursday, January 20, 2022

NumPy Arrays

The main data structure in the NumPy library is the NumPy array, which is an extremely fast and memory-efficient data structure. The NumPy array is much faster than the common Python list and provides vectorized matrix operations. Let us see the different data types that you can store in a NumPy array, the different ways to create the NumPy arrays, how you can access items in a NumPy array, and how...

Inferential statistics

Inferential statistics deals with inferring or deducing things from the sample data we have in order to make statements about the population as a whole. When we're looking to state our conclusions, we have to be mindful of whether we conducted an observational study or an experiment. With an observational study, the independent variable is not under the control of the researchers, and so we are observing...

Wednesday, January 19, 2022

Tuesday, January 18, 2022

Ice cream shop sales prediction

Say our favorite ice cream shop has asked us to help predict how many ice creams they can expect to sell on a given day. They are convinced that the temperature outside has a strong influence on their sales, so they have collected data on the number of ice creams sold at a given temperature. We agree...

Sunday, January 16, 2022

Anscombe's quartet

There is a very interesting dataset illustrating how careful we must be when only using summary statistics and correlation coefficients to describe our data. It also shows us that plotting is not optional. Anscombe's quartet is a collection of four different datasets that have identical summary statistics...
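As a rough check of this claim, the sketch below compares the first two of the four datasets, using the values as they are commonly published for Anscombe's quartet. The variable names are illustrative.

```python
import numpy as np

# Datasets I and II share the same x values.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for name, y in (("I", y1), ("II", y2)):
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation between x and y
    print(name, round(y.mean(), 2), round(y.std(ddof=1), 2), round(r, 2))
```

The printed mean, standard deviation, and correlation nearly coincide, even though a scatter plot of each dataset looks quite different.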

Multivariate statistics

With multivariate statistics, we seek to quantify relationships between variables and attempt to make predictions for future behavior. The covariance is a statistic for quantifying the relationship between variables by showing how one variable changes with respect to another (also referred to as their...
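A minimal sketch of computing the covariance with NumPy follows. The data are made up so that `y` is exactly twice `x`; the correlation coefficient is included only for comparison.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up values
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # exactly 2 * x

cov_matrix = np.cov(x, y)         # 2x2 sample covariance matrix
cov_xy = cov_matrix[0, 1]         # off-diagonal entry: covariance of x and y
corr_xy = np.corrcoef(x, y)[0, 1]

print(cov_xy, corr_xy)
```

A positive covariance indicates that the two variables tend to increase together; its magnitude depends on the units of the data.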

Friday, January 14, 2022

Common distributions

While there are many probability distributions, each with specific use cases, there are some that we will come across often. The Gaussian, or normal, looks like a bell curve and is parameterized by its mean (μ) and standard deviation (σ). The standard normal (Z) has a mean of 0 and a standard deviation...
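To connect a normal distribution with given μ and σ to the standard normal, one can standardize samples by subtracting the mean and dividing by the standard deviation. The sketch below assumes illustrative parameters (μ = 50, σ = 5) and a fixed random seed.

```python
import numpy as np

rng = np.random.default_rng(0)                     # seeded for repeatability
samples = rng.normal(loc=50, scale=5, size=10_000)  # assumed mu=50, sigma=5

# Standardize: subtract the sample mean, divide by the sample standard deviation.
z = (samples - samples.mean()) / samples.std()

print(round(z.mean(), 6), round(z.std(), 6))
```

After standardizing, the values have a mean of 0 and a standard deviation of 1, up to floating-point error.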

Kernel density estimate

KDEs are similar to histograms, except rather than creating bins for the data, they draw a smoothed curve, which is an estimate of the distribution's probability density function (PDF). The PDF is for continuous variables and tells us how probability is distributed over the values. Higher values for...
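One common way to build such a smoothed curve is to place a Gaussian bump on every data point and average them. The sketch below is a hand-rolled illustration of that idea, not the method of any particular plotting library; the function name `gaussian_kde_sketch`, the sample values, and the bandwidth are all assumptions.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average a Gaussian kernel centered on each sample, evaluated on grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

samples = np.array([1.0, 1.5, 2.0, 4.0, 4.2])  # made-up data
grid = np.linspace(-4.0, 10.0, 1401)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.5)

# A density estimate should integrate to roughly 1 over its support.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve, and a larger one gives a smoother curve.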

Thursday, January 13, 2022

Summarizing data

We have seen many examples of descriptive statistics that we can use to summarize our data by its center and dispersion; in practice, looking at the 5-number summary and visualizing the distribution prove to be helpful first steps before diving into some of the other aforementioned metrics. The 5-number...
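A quick way to get a 5-number summary (minimum, first quartile, median, third quartile, maximum) is sketched below with made-up data. It relies on NumPy's default linear interpolation for percentiles.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # made-up values

# Minimum, Q1, median, Q3, maximum in one call.
summary = np.percentile(data, [0, 25, 50, 75, 100])
print(summary)
```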

Wednesday, January 12, 2022

Interquartile range

As mentioned earlier, the median is the 50th percentile or the 2nd quartile (Q2). Percentiles and quartiles are both quantiles: values that divide data into equal groups, each containing the same percentage of the total data. Percentiles divide the data into 100 parts, while quartiles do so into four (25%,...
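The interquartile range named in the title is the distance between the third and first quartiles. A small sketch with made-up data, using NumPy's default percentile interpolation:

```python
import numpy as np

data = np.array([3, 5, 7, 9, 11, 13, 15, 17, 19])  # made-up values

q1, q2, q3 = np.percentile(data, [25, 50, 75])  # the three quartiles
iqr = q3 - q1                                   # spread of the middle 50%

print(q1, q2, q3, iqr)
```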

Tuesday, January 11, 2022

Standard deviation

We can use the standard deviation to see how far from the mean data points are on average. A small standard deviation means that values are close to the mean, while a large standard deviation means that values are dispersed more widely. This is tied to how we would imagine the distribution curve: the...
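To make the contrast concrete, the sketch below compares two made-up datasets that share a mean but differ in spread. It uses the population standard deviation, which is NumPy's default (`ddof=0`).

```python
import numpy as np

tight = np.array([4.0, 5.0, 5.0, 6.0])    # values close to the mean
spread = np.array([0.0, 2.0, 8.0, 10.0])  # values far from the mean

print(tight.mean(), spread.mean())  # both means are 5.0
print(tight.std(), spread.std())    # the second is much larger
```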

Monday, January 10, 2022

Range and variance

The range is the distance between the smallest value (minimum) and the largest value (maximum). The units of the range will be the same units as our data. Therefore, unless two distributions of data are in the same units and measuring the same thing, we can't compare their ranges and say one is more...
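Both statistics from the title can be computed in a few lines. This is a sketch on made-up data; it shows the population variance (`ddof=0`) next to the sample variance (`ddof=1`), since which one applies depends on context.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values

data_range = data.max() - data.min()  # same units as the data
pop_variance = data.var()             # divides by n
sample_variance = data.var(ddof=1)    # divides by n - 1

print(data_range, pop_variance, sample_variance)
```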

Thursday, January 6, 2022

Median and mode

Unlike the mean, the median is robust to outliers. Consider income in the US; incomes in the top 1% are much higher than those of the rest of the population, so they pull the mean upward and distort the perception of the average person's income. However, the median will be more representative of the average income...
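The effect of a single outlier on the two statistics can be seen in a small sketch. The income figures below are invented for illustration and are not real data.

```python
import numpy as np

incomes = np.array([30.0, 35.0, 40.0, 45.0, 50.0])  # invented values
with_outlier = np.append(incomes, 1000.0)           # add one extreme income

print(incomes.mean(), np.median(incomes))            # 40.0 40.0
print(with_outlier.mean(), np.median(with_outlier))  # 200.0 42.5
```

One extreme value moves the mean from 40 to 200, while the median only shifts from 40 to 42.5.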

Wednesday, January 5, 2022

Descriptive statistics

Let us begin our discussion of descriptive statistics with univariate statistics; univariate simply means that these statistics are calculated from one (uni) variable. Everything in this section can be extended to the whole dataset, but the statistics will be calculated per variable we are recording...

Tuesday, January 4, 2022

Statistical foundations for data analysis

When we want to make observations about the data we are analyzing, we often, if not always, turn to statistics in some fashion. The data we have is referred to as the sample, which was observed from (and is a subset of) the population. Two broad categories of statistics are descriptive and inferential statistics. With descriptive statistics, as the name implies, we are looking to describe the sample....