8.2 Predictive Analytics: Model Evaluation & Bias-Variance Trade-Off Flashcards
What is the primary goal of supervised learning methods?
To minimize prediction error, i.e., to make predictions on new data as accurate as possible.
How do classification methods evaluate model accuracy?
By calculating the percentage of correct predictions.
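
A minimal sketch of that calculation; the label arrays here are illustrative, not from the flashcards:

```python
import numpy as np

# Hypothetical true labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

# Accuracy = fraction of predictions that match the true labels
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.0%}")  # 75% (6 of 8 correct)
```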
How do regression models measure error?
By calculating the difference between predicted and actual values.
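
A sketch of two common ways to summarize those differences (the values are made up for illustration):

```python
import numpy as np

# Hypothetical actual values and model predictions
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred   = np.array([2.5, 5.5, 6.0, 9.5])

residuals = y_actual - y_pred             # per-point differences
mae = np.mean(np.abs(residuals))          # mean absolute error
mse = np.mean(residuals ** 2)             # mean squared error
print(f"MAE: {mae:.3f}, MSE: {mse:.3f}")  # MAE: 0.625, MSE: 0.438
```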
Why is evaluating unsupervised learning models difficult?
We don’t have labeled data, so we don’t know the correct number of groups or classifications.
What is the purpose of a train-test split in machine learning?
To evaluate how well a model generalizes to unseen data and prevent overly optimistic projections.
What percentage of data is typically used for training vs. testing?
80% for training, 20% for testing.
What happens at deployment after the train-test split?
The chosen model is retrained on 100% of the data, applying what was learned during model evaluation.
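
A minimal sketch of this workflow with scikit-learn; the synthetic dataset and the logistic-regression model are illustrative stand-ins:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=500, random_state=0)

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Evaluate generalization on the unseen 20%
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")

# At deployment, refit the chosen model on 100% of the data
final_model = LogisticRegression(max_iter=1000).fit(X, y)
```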
What is the formula for model error?
Model Error = Bias² + Variance + Irreducible Error
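
For squared-error loss this decomposition is conventionally written as follows, where $f$ is the true function, $\hat{f}$ the trained model, and $\sigma^2$ the irreducible noise variance:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} + \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{Variance}} + \sigma^2
$$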
What is irreducible error in a model?
Noise in the data, or limits in how fully the data can represent the underlying patterns; this error cannot be reduced by any model, no matter how well it is trained.
How is bias measured in a model?
By how far predictions are from actual values; high bias means the model oversimplifies patterns.
How is variance measured in a model?
By how much predictions fluctuate around their average when the model is trained on different samples of data; high variance means the model is overly sensitive to its particular training set.
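
A sketch of estimating both quantities empirically: train the same model class on many freshly drawn training sets and measure how its prediction at one fixed point behaves. The sine-curve data and cubic model are illustrative assumptions:

```python
import numpy as np

def true_f(x):
    return np.sin(x)          # the underlying pattern

rng = np.random.default_rng(0)
x0 = 1.5                      # fixed query point

preds = []
for _ in range(200):
    # Draw a fresh noisy training set each round
    x = rng.uniform(0, 2 * np.pi, 30)
    y = true_f(x) + rng.normal(0, 0.3, 30)
    coefs = np.polyfit(x, y, 3)            # fit a cubic model
    preds.append(np.polyval(coefs, x0))

preds = np.array(preds)
bias = preds.mean() - true_f(x0)   # how far the average prediction sits from truth
variance = preds.var()             # how much predictions fluctuate across training sets
print(f"Bias: {bias:.3f}, Variance: {variance:.3f}")
```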
What is underfitting?
When a model hasn’t sufficiently learned patterns from the data, leading to high bias and low variance.
What is overfitting?
When a model fits too tightly to the training data, causing low bias and high variance.
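
A sketch contrasting the two failure modes on the same noisy data: a degree-1 polynomial is too simple for a curved pattern (underfit), while a degree-15 polynomial drives training error toward zero by chasing the noise (overfit). The data and degrees are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 20))
y = np.sin(x) + rng.normal(0, 0.2, 20)

for deg, label in [(1, "underfit"), (3, "balanced"), (15, "overfit")]:
    coefs = np.polyfit(x, y, deg)
    train_mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    print(f"degree {deg:>2} ({label}): training MSE = {train_mse:.4f}")
```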
What is the ideal bias-variance trade-off?
- Low bias and low variance.
- A model complex enough to capture patterns but not overly sensitive to small changes in the data.
How do training error and variance relate?
- As model complexity increases, training error decreases but variance increases; very low training error can therefore signal high variance.
How do error rates indicate underfitting and overfitting?
- Underfit model: High error in both training and testing data.
- Overfit model: Low error in training but high error in testing.
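
A sketch of that diagnostic on synthetic data: compare training and test MSE as model complexity grows. High/high flags underfitting; low-train/high-test flags overfitting. The polynomial-regression pipeline and degrees are illustrative assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 2 * np.pi, (80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:>2}: train MSE = {tr:.3f}, test MSE = {te:.3f}")
```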