Difference between the dotted and solid portions of the regression line ~ Python is easy to learn

When we make predictions using the solid portion of the line, we are using interpolation, meaning that we will be predicting ice cream sales for temperatures the regression was created on. On the other hand, if we try to predict how many ice creams will be sold at 45°C, it is called extrapolation (the dotted portion of the line), since we didn't have any temperatures this high when we ran the regression. Extrapolation can be very dangerous as many trends don't continue indefinitely. People may decide not to leave their houses because it is so hot. This means that instead of selling the predicted 39.54 ice creams, they would sell zero.

When working with time series, our terminology is a little different: we often look to forecast future values based on past values. Forecasting is a type of prediction for time series. Before we try to model the time series, however, we will often use a process called time series decomposition to split the time series into components, which can be combined in an additive or multiplicative fashion and may be used as parts of a model.

The trend component describes the behavior of the time series in the long term without accounting for seasonal or cyclical effects. Using the trend, we can make broad statements about the time series in the long run, such as the population of Earth is increasing or the value of a stock is stagnating. The seasonality component explains the systematic and calendar-related movements of a time series. For example, the number of ice cream trucks on the streets of New York City is high in the summer and drops to nothing in the winter; this pattern repeats every year, regardless of whether the actual amount each summer is the same. Lastly, the cyclical component accounts for anything else unexplained or irregular with the time series; this could be something such as a hurricane driving the number of ice cream trucks down in the short term because it isn't safe to be outside. This component is difficult to anticipate with a forecast due to its unexpected nature.

We can use Python to decompose the time series into trend, seasonality, and noise or residuals. The cyclical component is captured in the noise (random, unpredictable data); after we remove the trend and seasonality from the time series, what we are left with is the residual:

When building models to forecast time series, some common methods include exponential smoothing and ARIMA-family models. ARIMA stands for autoregressive (AR), integrated (I), moving average (MA). Autoregressive models take advantage of the fact that an observation at time t is correlated to a previous observation, for example, at time t - 1. Later, we will look at some techniques for determining whether a time series is autoregressive; note that not all time series are. The integrated component concerns the differenced data, or the change in the data from one time to another. For example, if we were concerned with a lag (distance between times) of 1, the differenced data would be the value at time t subtracted by the value at time t - 1. Lastly, the moving average component uses a sliding window to average the last x observations, where x is the length of the sliding window. If, for example, we have a 3-period moving average, by the time we have all of the data up to time 5, our moving average calculation only uses time periods 3, 4, and 5 to forecast time 6.

The moving average puts equal weight on each time period in the past involved in the calculation. In practice, this isn't always a realistic expectation of our data. Sometimes, all past values are important, but they vary in their influence on future data points. For these cases, we can use exponential smoothing, which allows us to put more weight on more recent values and less weight on values further away from what we are predicting.

Note that we aren't limited to predicting numbers; in fact, depending on the data, our predictions could be categorical in nature—things such as determining which flavor of ice cream will sell the most on a given day or whether an email is spam or not. This type of prediction will also be discussed in future. Our topic of discussion for the next post will be inferential statistics.

Python is easy to learn

Wednesday, January 19, 2022

Difference between the dotted and solid portions of the regression line

0 comments:

Post a Comment