Lecture 2 Flashcards
Terminology Mapping: Number of observations, Data set size, Variables, Dependent variable, Coefficient.
Number of observations (Statistics) = Number of samples (Machine Learning)
Data set size (Statistics) = Sample size (Machine Learning)
Variables (Statistics) = Features (Machine Learning)
Dependent variable (Statistics) = Label (Machine Learning)
Coefficient (Statistics) = Weight (Machine Learning)
What is the purpose of splitting data into training and test sets?
Training set: Used to train the ML model.
Test set: Used to evaluate the model’s performance on unseen data.
Why split? To prevent overfitting, where a model memorizes training data but fails to generalize.
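A minimal sketch of such a split, assuming scikit-learn's `train_test_split` (the toy data here is not from the lecture):

```python
# Hold out 20% of the data as an unseen test set.
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 observations, 2 features (toy data)
y = np.arange(10)                  # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# 8 observations for training, 2 held out for evaluation.
```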
Why is shuffle=False important for time-series data splitting?
Time-series data relies on temporal order.
Setting shuffle=False preserves the sequence, ensuring future data isn’t used to predict past events.
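With scikit-learn this looks as follows (toy series invented for illustration): `shuffle=False` makes the split a simple cut, so the training set is strictly earlier than the test set.

```python
from sklearn.model_selection import train_test_split
import numpy as np

# Toy daily series: index order encodes time.
y = np.arange(10)

# shuffle=False keeps the first 80% as training and the last 20% as test,
# so the model never trains on data from the "future" of the test period.
y_train, y_test = train_test_split(y, test_size=0.2, shuffle=False)
```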
Distinguish classification and regression.
Classification: Predicts a discrete class label (e.g., spam/not spam).
Regression: Predicts a continuous value (e.g., stock price).
Name models that work for both classification and regression.
- K-Nearest Neighbors
- Decision Trees
- Support Vector Machines (SVMs)
- Ensemble Methods
- ANNs (including deep neural networks)
Exceptions: Linear/Logistic Regression are task-specific.
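As a sketch of the dual-use idea, decision trees in scikit-learn expose a classifier and a regressor built on the same model family (toy data assumed for illustration):

```python
# The same model family (decision trees) handles both tasks
# via two separate scikit-learn classes.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])

clf = DecisionTreeClassifier().fit(X, [0, 0, 1, 1])          # discrete labels
reg = DecisionTreeRegressor().fit(X, [0.1, 0.9, 2.1, 2.9])   # continuous targets

class_pred = clf.predict([[2.5]])[0]   # a class label
value_pred = reg.predict([[2.5]])[0]   # a continuous value
```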
List common supervised ML algorithms.
- Linear Regression
- Regularized Regression (Ridge, LASSO, Elastic Net)
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machines (SVMs)
- Naive Bayes Classifiers
- CART (Classification and Regression Trees)
- ANN-Based models
Explain the No Free Lunch Theorem.
No single ML algorithm works best for all problems.
Performance depends on assumptions about the data.
Every model is a simplification of reality, so its built-in assumptions can fail on a given dataset: model –> simplification –> assumptions –> possible failure.
What are the two steps to train a linear regression model?
- Define a loss function: Residual Sum of Squares (RSS)
- Minimize the loss: Adjust coefficients to fit the data.
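The two steps above can be sketched numerically (toy data generated from y = 2x + 1, an assumption for this sketch; the minimization uses the closed-form least-squares solution):

```python
import numpy as np

# Toy data: y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

# Step 1: define the loss - the Residual Sum of Squares.
def rss(beta0, beta1):
    residuals = y - (beta0 + beta1 * X.ravel())
    return np.sum(residuals ** 2)

# Step 2: minimize it. Least squares gives the closed-form solution.
A = np.hstack([np.ones_like(X), X])           # design matrix with intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # beta = [intercept, slope]
```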
Strengths and weaknesses of linear regression.
- Strengths: Simple, interpretable, no hyperparameters to tune.
- Weaknesses: Can overfit when there are many features, assumes a linear relationship, sensitive to multicollinearity.
What is the goal of regularized regression?
Add penalties (L1/L2) to coefficients to reduce overfitting.
Types:
* Ridge (L2): Shrinks coefficients toward zero.
* LASSO (L1): Sets some coefficients to zero (feature selection).
* Elastic Net: Combines L1 and L2.
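All three are available in scikit-learn; a sketch on a deliberately simple orthogonal toy design where only features 0 and 3 matter (`alpha` is scikit-learn's name for λ, and `fit_intercept=False` keeps the toy algebra simple):

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet
import numpy as np

# Toy design: 50 rows, 5 orthogonal features; true coefficients [3, 0, 0, 1.5, 0].
X = np.tile(np.eye(5), (10, 1))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0])

ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)  # L2: shrinks coefficients
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)  # L1: soft-thresholds to zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5,
                  fit_intercept=False).fit(X, y)         # mix of both penalties
```

On this design Ridge shrinks the active coefficient 3.0 toward zero (to 30/11) without eliminating it, while LASSO subtracts a fixed threshold, keeping the null coefficients at exactly zero.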
Ridge regression minimizes which objective function?
RSS + λ ∑ βj² (sum over j = 1, …, p)
λ controls penalty strength.
Larger λ → simpler model.
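The λ → simplicity relationship can be checked directly; a sketch assuming scikit-learn's `Ridge`, where `alpha` plays the role of λ (random toy data invented for illustration):

```python
from sklearn.linear_model import Ridge
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=30)

# Larger alpha (i.e., larger λ) shrinks the coefficients harder.
small_penalty = np.abs(Ridge(alpha=0.01).fit(X, y).coef_).sum()
large_penalty = np.abs(Ridge(alpha=100.0).fit(X, y).coef_).sum()
```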
How does LASSO differ from Ridge?
LASSO uses L1 penalty (∑|βj|), forcing some coefficients to exactly zero.
Enables automatic feature selection.
What is Elastic Net?
Combines L1 and L2 penalties.
Requires tuning two parameters: λ (strength) and α (L1/L2 mix).
Why is logistic regression a classification algorithm?
Logistic regression is a classification algorithm because it uses the logistic (sigmoid) function to predict class probabilities, then applies a threshold (e.g., 0.5) to assign each observation a category.
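A sketch of the probability-then-threshold step, assuming scikit-learn's `LogisticRegression` (toy separable data invented for illustration):

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[4.5]])[0, 1]   # P(class 1) from the logistic function
label = clf.predict([[4.5]])[0]            # proba thresholded at 0.5
```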
Strengths and weaknesses of logistic regression.
- Strengths: Easy to implement, interpretable, works well for linearly separable data.
- Weaknesses: Overfits when there are many features, cannot model complex (non-linear) relationships between X and y, and handles multicollinearity poorly.
What makes KNN a lazy learner?
No explicit training phase; memorizes the entire dataset.
Predictions rely on distance metrics (e.g., Euclidean) to find nearest neighbors.
How does KNN handle classification vs. regression?
- Classification: Majority vote of neighbors.
- Regression: Average (or weighted average) of the target values of the K nearest neighbors.
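Both behaviors can be sketched with scikit-learn's neighbor classes (toy 1-D data invented for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

# Classification: majority vote among the 3 nearest neighbors.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, [0, 0, 0, 1, 1, 1])
# Regression: mean target of the 3 nearest neighbors.
reg = KNeighborsRegressor(n_neighbors=3).fit(
    X, [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
)

label = clf.predict([[11.0]])[0]   # neighbors 10, 11, 12 all vote class 1
value = reg.predict([[11.0]])[0]   # mean of 10, 11, 12
```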
What are KNN’s main weaknesses?
- Slow predictions with large datasets.
- Requires feature scaling.
- Performs poorly on sparse data.
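Because KNN is distance-based, a feature on a much larger scale dominates the distance; the usual remedy is to standardize features before the neighbor search, sketched here with a scikit-learn pipeline (toy two-feature data invented for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Feature 1 ranges over ~1 unit, feature 2 over thousands of units.
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [1.5, 9000.0], [2.5, 8000.0]])
y = np.array([0, 0, 1, 1])

# StandardScaler puts both features on a comparable scale before KNN.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
knn.fit(X, y)
pred = knn.predict([[1.2, 8500.0]])[0]
```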
Key parameters for KNN.
- Number of neighbors (k): Small values (3-5) often work.
- Distance metric: Default is Euclidean.