Monday, January 10, 2022

Range and variance

The range is the distance between the smallest value (minimum) and the largest value (maximum). The units of the range will be the same units as our data. Therefore, unless two distributions of data are in the same units and measuring the same thing, we can't compare their ranges and say one is more dispersed than the other:


Just from the definition of the range, we can see why it wouldn't always be the  best way to measure the spread of our data. It gives us upper and lower bounds on what we have in the data; however, if we have any outliers in our data, the range will be rendered useless.

Another problem with the range is that it doesn't tell us how the data is dispersed around its center; it really only tells us how dispersed the entire dataset is. This brings us to the variance. The variance describes how far apart observations are spread out from their average value (the mean). The population variance is denoted as σ (pronounced sigma-squared), and the sample variance is written as s . It is calculated as the average squared distance from the mean. Note that the distances must be squared so that distances below the mean don't cancel out those above the mean.

If we want the sample variance to be an unbiased estimator of the population variance, we divide by n - 1 instead of n to account for using the sample mean instead of the population mean; this is called Bessel's correction. Most statistical tools will give us the sample variance by default, since it is very rare that we would have data for the entire population:



The variance gives us a statistic with squared units. This means that if we started with data on income in dollars ($), then our variance would be in dollars squared ($*$  ). This isn't really useful when we're trying to see how this describes the data; we can use the magnitude (size) itself to see how spread out something is (large values = large spread), but beyond that, we need a measure of spread with units that are the same as our data. For this purpose, we use the standard deviation. This we will see in the next post


Share:

0 comments:

Post a Comment