Lecture 2 Flashcards
Terminology Mapping: Number of observations, Data set size, Variables, Dependent variable, Coefficient.
Number of observations (Statistics) = Number of samples (Machine Learning)
Data set size (Statistics) = Sample size (Machine Learning)
Variables (Statistics) = Features (Machine Learning)
Dependent variable (Statistics) = Label (Machine Learning)
Coefficient (Statistics) = Weight (Machine Learning)
What is the purpose of splitting data into training and test sets?
Training set: Used to train the ML model.
Test set: Used to evaluate the model’s performance on unseen data.
Why split? To prevent overfitting, where a model memorizes training data but fails to generalize.
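A minimal sketch of such a split, assuming scikit-learn's `train_test_split` (the toy data here is not from the lecture):

```python
# Hold out 20% of the data as an unseen test set.
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 observations, 2 features (toy data)
y = np.arange(10)                  # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# 8 observations for training, 2 held out for evaluation.
```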
Why is shuffle=False important for time-series data splitting?
Time-series data relies on temporal order.
Setting shuffle=False preserves the sequence, ensuring future data isn’t used to predict past events.
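With scikit-learn this looks as follows (toy series invented for illustration): `shuffle=False` makes the split a simple cut, so the training set is strictly earlier than the test set.

```python
from sklearn.model_selection import train_test_split
import numpy as np

# Toy daily series: index order encodes time.
y = np.arange(10)

# shuffle=False keeps the first 80% as training and the last 20% as test,
# so the model never trains on data from the "future" of the test period.
y_train, y_test = train_test_split(y, test_size=0.2, shuffle=False)
```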
Distinguish classification and regression.
Classification: Predicts a discrete class label (e.g., spam/not spam).
Regression: Predicts a continuous value (e.g., stock price).
Name models that work for both classification and regression.
- K-Nearest Neighbors
- Decision Trees
- Support Vector Machines (SVMs)
- Ensemble Methods
- ANNs (including deep neural networks)
Exceptions: Linear/Logistic Regression are task-specific.
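As a sketch of the dual-use idea, decision trees in scikit-learn expose a classifier and a regressor built on the same model family (toy data assumed for illustration):

```python
# The same model family (decision trees) handles both tasks
# via two separate scikit-learn classes.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])

clf = DecisionTreeClassifier().fit(X, [0, 0, 1, 1])          # discrete labels
reg = DecisionTreeRegressor().fit(X, [0.1, 0.9, 2.1, 2.9])   # continuous targets

class_pred = clf.predict([[2.5]])[0]   # a class label
value_pred = reg.predict([[2.5]])[0]   # a continuous value
```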
List common supervised ML algorithms.
- Linear Regression
- Regularized Regression (Ridge, LASSO, Elastic Net)
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machines (SVMs)
- Naive Bayes Classifiers
- CART (Classification and Regression Trees)
- ANN-Based models
Explain the No Free Lunch Theorem.
No single ML algorithm works best for all problems.
Performance depends on assumptions about the data.
Every model is a simplification of reality, so its built-in assumptions can fail on a given dataset: model –> simplification –> assumptions –> possible failure.
What are the two steps to train a linear regression model?
- Define a loss function: Residual Sum of Squares (RSS)
- Minimize the loss: Adjust coefficients to fit the data.
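The two steps above can be sketched numerically (toy data generated from y = 2x + 1, an assumption for this sketch; the minimization uses the closed-form least-squares solution):

```python
import numpy as np

# Toy data: y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

# Step 1: define the loss - the Residual Sum of Squares.
def rss(beta0, beta1):
    residuals = y - (beta0 + beta1 * X.ravel())
    return np.sum(residuals ** 2)

# Step 2: minimize it. Least squares gives the closed-form solution.
A = np.hstack([np.ones_like(X), X])           # design matrix with intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # beta = [intercept, slope]
```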
Strengths and weaknesses of linear regression.
- Strengths: Simple, interpretable, no hyperparameters to tune.
- Weaknesses: Can overfit when there are many features, assumes a linear relationship, sensitive to multicollinearity.
What is the goal of regularized regression?
Add penalties (L1/L2) to coefficients to reduce overfitting.
Types:
* Ridge (L2): Shrinks coefficients toward zero.
* LASSO (L1): Sets some coefficients to zero (feature selection).
* Elastic Net: Combines L1 and L2.
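All three are available in scikit-learn; a sketch on a deliberately simple orthogonal toy design where only features 0 and 3 matter (`alpha` is scikit-learn's name for λ, and `fit_intercept=False` keeps the toy algebra simple):

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet
import numpy as np

# Toy design: 50 rows, 5 orthogonal features; true coefficients [3, 0, 0, 1.5, 0].
X = np.tile(np.eye(5), (10, 1))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0])

ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)  # L2: shrinks coefficients
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)  # L1: soft-thresholds to zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5,
                  fit_intercept=False).fit(X, y)         # mix of both penalties
```

On this design Ridge shrinks the active coefficient 3.0 toward zero (to 30/11) without eliminating it, while LASSO subtracts a fixed threshold, keeping the null coefficients at exactly zero.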
Ridge regression minimizes which objective function?
RSS + λ ∑ βj² (sum over j = 1, …, p)
λ controls penalty strength.
Larger λ → simpler model.
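The λ → simplicity relationship can be checked directly; a sketch assuming scikit-learn's `Ridge`, where `alpha` plays the role of λ (random toy data invented for illustration):

```python
from sklearn.linear_model import Ridge
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=30)

# Larger alpha (i.e., larger λ) shrinks the coefficients harder.
small_penalty = np.abs(Ridge(alpha=0.01).fit(X, y).coef_).sum()
large_penalty = np.abs(Ridge(alpha=100.0).fit(X, y).coef_).sum()
```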
How does LASSO differ from Ridge?
LASSO uses L1 penalty (∑|βj|), forcing some coefficients to exactly zero.
Enables automatic feature selection.
What is Elastic Net?
Combines L1 and L2 penalties.
Requires tuning two parameters: λ (strength) and α (L1/L2 mix).
Why is logistic regression a classification algorithm?
Logistic regression is a classification algorithm because it uses the logistic (sigmoid) function to predict class probabilities, then applies a threshold (e.g., 0.5) to assign each observation a category.
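A sketch of the probability-then-threshold step, assuming scikit-learn's `LogisticRegression` (toy separable data invented for illustration):

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[4.5]])[0, 1]   # P(class 1) from the logistic function
label = clf.predict([[4.5]])[0]            # proba thresholded at 0.5
```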
Strengths and weaknesses of logistic regression.
- Strengths: Easy to implement, interpretable, works well for linearly separable data.
- Weaknesses: Overfits when there are many features, cannot model complex (non-linear) relationships between X and y, and handles multicollinearity poorly.
What makes KNN a lazy learner?
No explicit training phase; memorizes the entire dataset.
Predictions rely on distance metrics (e.g., Euclidean) to find nearest neighbors.
How does KNN handle classification vs. regression?
- Classification: Majority vote of neighbors.
- Regression: Average (or weighted average) of the target values of the K nearest neighbors.
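Both behaviors can be sketched with scikit-learn's neighbor classes (toy 1-D data invented for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

# Classification: majority vote among the 3 nearest neighbors.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, [0, 0, 0, 1, 1, 1])
# Regression: mean target of the 3 nearest neighbors.
reg = KNeighborsRegressor(n_neighbors=3).fit(
    X, [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
)

label = clf.predict([[11.0]])[0]   # neighbors 10, 11, 12 all vote class 1
value = reg.predict([[11.0]])[0]   # mean of 10, 11, 12
```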
What are KNN’s main weaknesses?
- Slow predictions with large datasets.
- Requires feature scaling.
- Performs poorly on sparse data.
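Because KNN is distance-based, a feature on a much larger scale dominates the distance; the usual remedy is to standardize features before the neighbor search, sketched here with a scikit-learn pipeline (toy two-feature data invented for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Feature 1 ranges over ~1 unit, feature 2 over thousands of units.
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [1.5, 9000.0], [2.5, 8000.0]])
y = np.array([0, 0, 1, 1])

# StandardScaler puts both features on a comparable scale before KNN.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
knn.fit(X, y)
pred = knn.predict([[1.2, 8500.0]])[0]
```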
Key parameters for KNN.
- Number of neighbors (k): Small values (3-5) often work.
- Distance metric: Default is Euclidean.