Prediction with the linear model Flashcards
week 4
The aim of prediction
The aim of prediction is to estimate the value of a numeric target variable for new observations.
What do we call the variable we're estimating?
The variable we're estimating is called the target, outcome, or response variable.
What do we call the variables we use to do the estimation?
The variables used for the estimation are called the predictors, covariates, or features.
training data
We typically have a training data set containing both the target and the predictors, which we use to build a model for prediction.
testing data
We then have a testing data set which contains only the predictors; the goal is to estimate the target for these observations.
For testing purposes we often set aside some labelled data as an artificial test set or validation data set, so that we can check how well our model is performing by comparing the predicted target against the actual target.
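A minimal NumPy sketch of holding out a validation set. The data, the column meanings, and the 80/20 split ratio are all made-up choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                      # predictors (illustrative)
y = 3 * X[:, 0] - X[:, 1] + rng.normal(size=n)   # target (illustrative)

# Shuffle the row indices, then hold out the last 20% as a validation set
idx = rng.permutation(n)
split = int(0.8 * n)
train_idx, val_idx = idx[:split], idx[split:]

X_train, y_train = X[train_idx], y[train_idx]    # used to fit the model
X_val, y_val = X[val_idx], y[val_idx]            # used only to check predictions
```

After fitting on the training part, predictions on `X_val` can be compared against the known `y_val` to gauge performance.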
Methods of prediction
Linear regression models
Regression trees
Random Forests
(Simple!) Neural networks
binary indicator variables
To handle factors, we introduce binary indicator variables (taking the value zero or one), and use these to
code each level of a factor
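A sketch of coding a factor with binary indicators, using a made-up three-level colour factor and dropping one level as the baseline:

```python
import numpy as np

# Hypothetical factor with three levels
colour = np.array(["red", "green", "blue", "green", "red"])
levels = ["red", "green", "blue"]

# One 0/1 indicator column per non-baseline level ("red" is the baseline)
indicators = np.column_stack(
    [(colour == lev).astype(int) for lev in levels[1:]]
)
# Row i is (1, 0) for green, (0, 1) for blue, and (0, 0) for the baseline red
```

Dropping the baseline level avoids making the indicator columns collinear with the intercept in a linear model.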
Prediction intervals
Prediction intervals give a range of values associated with a specified level of confidence, e.g. being 95% sure that the predicted value lies in the range (a, b).
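A rough sketch of a 95% prediction interval for simple linear regression, using only NumPy and a normal approximation (z = 1.96) rather than the exact t quantile; the data and the new point x0 are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2 + 0.5 * x + rng.normal(scale=1.0, size=50)

# Fit y = b0 + b1*x by least squares (polyfit returns highest degree first)
b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)
s2 = resid @ resid / (len(x) - 2)     # residual variance estimate

x0 = 5.0                               # new point to predict at
yhat = b0 + b1 * x0
# Standard error for a *new observation*: the leading "1 +" term is what
# makes a prediction interval wider than a confidence interval for the mean
se_pred = np.sqrt(
    s2 * (1 + 1 / len(x) + (x0 - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum())
)
lo, hi = yhat - 1.96 * se_pred, yhat + 1.96 * se_pred   # approx 95% PI
```

Dropping the "1 +" inside the square root gives the (narrower) confidence interval for the mean response at x0 instead.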
difference between prediction and confidence intervals
A prediction interval is wider (less certain) than the corresponding confidence interval. A prediction interval covers a single future observation, whereas a confidence interval covers the mean response. Loosely: a prediction interval is about a future individual value, whereas a confidence interval is about a parameter estimated from the data already observed.
Bias
Bias is a systematic error in prediction: it is consistent rather than random. In the shooting analogy, even if you shoot many times, your shots always miss in the same way.
Variance
Variance is a measure of the variability of a prediction: how much your shots vary from each other. Sometimes a shot goes too far left, sometimes too far right, and sometimes it's just right; variance measures how much the shots differ from one another.
Decomposition of the MSE
Mean squared error can be decomposed into squared bias and variance components
Mean squared error
The overall accuracy of a prediction can be measured by its mean squared error (MSE).
Mean squared error can be decomposed into variance and squared bias components.
Accepting some bias will be advantageous if it results in a more substantial decrease in variance.
In practice we will want to use a prediction model that gets the right balance between prediction variance
and bias so as to minimize the prediction MSE.
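A small simulation sketch of the decomposition MSE = bias² + variance, using a made-up shrinkage estimator of a known mean (shrinking trades some bias for lower variance):

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean = 4.0

# Estimator: sample mean shrunk by a factor 0.8 -> biased, but lower variance
estimates = np.array([
    0.8 * rng.normal(true_mean, 2.0, size=10).mean()
    for _ in range(20000)
])

mse = np.mean((estimates - true_mean) ** 2)
bias_sq = (estimates.mean() - true_mean) ** 2   # squared bias component
variance = estimates.var()                      # variance component
# mse == bias_sq + variance (an exact algebraic identity for these sample
# quantities, mirroring the population-level decomposition)
```

Here the squared bias is about (0.8·4 − 4)² = 0.64; accepting it is worthwhile only if the variance reduction from shrinking outweighs it.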
The situation where the mean squared error (MSE) on the test data is much higher than the training MSE is a common phenomenon in machine learning. It is typically attributed to the model's inability to generalize to unseen data.
Overfitting: During training, the model tries to capture all the patterns and nuances present in the training data, including noise. If the model becomes too complex or is trained for too long, it may start to memorize the training data instead of learning general patterns. This phenomenon is called overfitting. As a result, the model performs well on the training data (low training MSE) but poorly on new, unseen data (high test MSE).
Generalization: When the model encounters new data during testing, it may encounter patterns or variations that it hasn’t seen before. If the model hasn’t learned to generalize well from the training data, it may struggle to make accurate predictions on this new data. This leads to higher MSE on the test data compared to the training data.
Data Distribution: Sometimes, the distribution of the test data may be different from the distribution of the training data. If the model is trained on one type of data but tested on another type, its performance may suffer. This emphasizes the importance of having representative training and test datasets.
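The overfitting story above can be sketched with polynomial fits to noisy data: a high-degree polynomial drives the training MSE down while the test MSE grows. The degrees, sample sizes, and sine-shaped truth are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=15)
x_test = np.linspace(0.01, 0.99, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=200)

def train_test_mse(deg):
    """Fit a degree-`deg` polynomial on the training data; return both MSEs."""
    coef = np.polyfit(x_train, y_train, deg)
    tr = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    te = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return tr, te

tr3, te3 = train_test_mse(3)     # moderate complexity
tr14, te14 = train_test_mse(14)  # degree 14 can interpolate all 15 points
# Expect: tr14 < tr3 (memorizes the training noise) but te14 > te3
```

The degree-14 fit is the overfitting case: near-zero training error, large test error, i.e. low bias but high variance.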
The Bias-Variance trade-off
The bias-variance tradeoff implies that as we increase the complexity of a model, its variance increases and its bias decreases. Conversely, as we decrease the model's complexity, its variance decreases, but its bias increases.