As a simple rule of thumb: discrete data is a set of values that can be counted, whereas continuous data must be measured. Discrete data can “reasonably” fit in a drop-down list of values, but there is no exact cutoff for making such a determination. One person might consider a list of 500 values discrete, whereas another person might consider it continuous.
For example, the list of provinces of Canada and the list of states of the United States are discrete data values, but is the same true for the list of countries in the world (roughly 200) or the list of languages in the world (more than 7,000)?
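The drop-down rule of thumb can be sketched as a simple count of distinct values. This is only an illustration: the function name looks_discrete and the cutoff of 50 are assumptions made for this example, not a standard, and a different cutoff would classify the same feature differently.

```python
def looks_discrete(values, max_distinct=50):
    """Return True if the feature has at most max_distinct distinct values.

    max_distinct is an arbitrary cutoff chosen for illustration; there is
    no exact value that separates discrete from continuous features.
    """
    return len(set(values)) <= max_distinct


# A handful of repeated category labels: few distinct values.
provinces = ["Ontario", "Quebec", "Alberta", "Ontario", "Quebec"]

# 1,000 distinct numeric readings: too many for a drop-down list.
readings = [i * 0.1 for i in range(1000)]

print(looks_discrete(provinces))                    # True
print(looks_discrete(readings))                     # False
print(looks_discrete(readings, max_distinct=1000))  # True with a looser cutoff
```

The last call shows the subjectivity described above: the same feature flips from continuous to discrete when the cutoff changes.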
On the other hand, values for temperature, humidity, and barometric pressure are considered continuous. Currency is also treated as continuous, even though consecutive values differ by a fixed minimum amount. The smallest unit of U.S. currency is one cent (a penny), which is 1/100th of a dollar (accounting-based measurements use the “mill”, which is 1/1,000th of a dollar).
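Continuous data types can have subtle differences. For example, someone who is 200 centimeters tall is twice as tall as someone who is 100 centimeters tall; the same is true for 100 kilograms versus 50 kilograms. However, temperature is different: 80 degrees Fahrenheit is not twice as hot as 40 degrees Fahrenheit. The reason is that zero centimeters and zero kilograms represent a true absence of height or weight, whereas zero degrees Fahrenheit is an arbitrary point on the scale, so differences between Fahrenheit values are meaningful but ratios are not.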
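One way to see the temperature point is to convert both values to Kelvin, a scale whose zero is absolute zero, and compare the ratios. The snippet below is a minimal sketch; the helper name fahrenheit_to_kelvin is introduced here for illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


# Ratio of the raw Fahrenheit numbers.
ratio_f = 80.0 / 40.0

# Ratio of the same two temperatures on an absolute scale.
ratio_k = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(40.0)

print(ratio_f)            # 2.0
print(round(ratio_k, 2))  # 1.08
```

On the absolute scale the warmer temperature is only about 8 percent higher, which is why the factor of two in the Fahrenheit numbers carries no physical meaning.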
Furthermore, keep in mind that the meaning of the word “continuous” in mathematics is not necessarily the same as “continuous” in machine learning. In mathematics, a continuous function (let’s say in the 2D Euclidean plane) can have an uncountably infinite number of values. In machine learning, by contrast, a feature in a dataset that can have more values than can be “reasonably” displayed in a drop-down list is treated as though it’s a continuous variable.
For instance, values for stock prices are discrete: they must differ by at least a penny (or some other minimal unit of currency), which is to say, it’s meaningless to say that a stock price changes by one-millionth of a penny. However, since there are “so many” possible stock values, a stock price is treated as a continuous variable. The same comments apply to recorded values of car mileage, ambient temperature, barometric pressure, and so forth, because each is stored at some finite precision.
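To make “so many” concrete, the sketch below counts how many distinct prices exist in a range when prices move in one-cent steps. The price range of $10.00 to $500.00 and the function name count_possible_prices are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Assumed minimum price increment: one cent.
TICK = Decimal("0.01")


def count_possible_prices(low, high, tick=TICK):
    """Count the distinct prices from low to high inclusive, in steps of tick."""
    return int((Decimal(high) - Decimal(low)) / tick) + 1


# Hypothetical price range for a single stock.
print(count_possible_prices("10.00", "500.00"))  # 49001
```

The set of possible prices is finite and countable, so it is discrete in the mathematical sense, yet tens of thousands of values could not reasonably fit in a drop-down list.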