Topic 1-4 Flashcards

1
Q

What is statistical learning?

A

Statistical learning refers to a set of approaches for estimating the relationship between a response variable Y and predictors X. This relationship is modeled as Y=f(X)+ϵ, where f(X) is the unknown function to be estimated, and ϵ is the error term representing noise.
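The Y = f(X) + ϵ setup can be sketched in a few lines of numpy. The linear f, sample size, and noise level below are all made up for illustration; the point is only that a fitted model recovers an estimate of the unknown f from noisy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true relationship: f(X) = 2 + 3X, plus irreducible noise epsilon
n = 200
X = rng.uniform(0, 10, n)
eps = rng.normal(0, 1, n)          # the error term epsilon
Y = 2 + 3 * X + eps                # Y = f(X) + eps

# Estimate f with a least-squares line (np.polyfit returns highest degree first)
slope, intercept = np.polyfit(X, Y, 1)
# slope and intercept should land close to the true values 3 and 2
```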

2
Q

What are the names of X and Y?

A

Y: Response variable (output or dependent variable)
X: Predictors (inputs, features, or independent variables).

3
Q

Pros/cons of the parametric and non-parametric methods:

A

Parametric Methods:
Pros: Simple and interpretable; require fewer observations.
Cons: May fail if the assumed model is far from the true f.
Non-parametric Methods:
Pros: Can model complex shapes of f; more flexible.
Cons: Require more data; less interpretable.

4
Q

For the best prediction performance, which of our tested statistical learning methods should we choose?

A

Select the method that achieves the lowest test MSE by balancing bias and variance.

5
Q

What is the relationship between test performance and model flexibility?

A

Test error decreases initially with flexibility (reducing bias) but increases beyond a certain point due to variance (overfitting).
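This U-shape can be sketched by fitting polynomials of increasing flexibility to simulated data. The quadratic true function, sample sizes, and seed below are all hypothetical; degree 1 typically underfits (bias), degree 10 typically overfits (variance), and degree 2 sits near the bottom of the U.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                            # hypothetical true function (quadratic)
    return x**2 - 2 * x

n_train, n_test = 30, 200
x_train = rng.uniform(-3, 3, n_train)
x_test = rng.uniform(-3, 3, n_test)
y_train = f(x_train) + rng.normal(0, 1, n_train)
y_test = f(x_test) + rng.normal(0, 1, n_test)

test_mse = {}
for degree in (1, 2, 10):            # low, appropriate, and high flexibility
    coefs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coefs, x_test)
    test_mse[degree] = np.mean((pred - y_test) ** 2)

# Typically test_mse[2] is the smallest of the three
```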

6
Q

What is the RSE (residual standard error) useful for? And what is the formula?

A

For quickly seeing how far a prediction typically is from the actual value; it is measured in the units of Y. So if RSE = 400, a prediction is typically off by about 400 units.

RSE = sqrt(RSS / (n - 2))

7
Q

What are the 2 p-values associated with multicollinearity?

A

The F-test and the t-test.
The F-test checks whether there is any non-zero parameter in the regression; if there is, H0 (all βj = 0) is rejected.
The t-test gives the number of standard errors a parameter is from 0; if a parameter is many standard errors from 0, it is likely valuable for predicting the response variable.

8
Q

Difference between Qualitative and Quantitative predictors

A

Qualitative: categorical values with no numeric scale (e.g., race, gender).
Quantitative: numeric values (e.g., income, age).

9
Q

With different coding schemes for Qualitative variables, what happens to the regression lines when interaction effects are implemented into the scheme?

A

They are no longer parallel to one another.
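A small numpy sketch (with made-up coefficients) of why this happens: once the interaction term x*d enters the model, the slope for the d = 1 group differs from the d = 0 group, so the two fitted lines cannot be parallel.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: two groups coded by a 0/1 dummy d, with different slopes
n = 100
x = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)               # qualitative predictor, dummy-coded
y = 1 + 2 * x + 3 * d + 1.5 * x * d + rng.normal(0, 1, n)

# Design matrix with an interaction term x*d
X = np.column_stack([np.ones(n), x, d, x * d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

slope_group0 = beta[1]                  # slope when d = 0
slope_group1 = beta[1] + beta[3]        # slope when d = 1: lines not parallel
```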

10
Q

What are assumptions of a simple linear regression?

A

Linearity: The relationship between X and Y is linear.
Independence: Observations are independent.
Homoscedasticity: Constant variance of errors (ϵ).
Normality: Errors (ϵ) are normally distributed.

11
Q

What are the five potential problems with a regression model (model diagnostics)?

A
  • Non-linearity
  • Non-constant variance of error terms
  • Outliers
  • High leverage points
  • Collinearity
12
Q

What is collinearity?

A

Two predictors are very closely related to one another, so it's difficult to determine how they separately affect the response.

13
Q

Why is there a need for model diagnostics?

A

To see if any of the assumptions of the model are violated

14
Q

What are RSS, RSE, and Training MSE?

A

RSS (Residual Sum of Squares): The sum of squared deviations of predicted values from actual values.
RSE (Residual Standard Error): An estimate of the standard deviation of ϵ, computed from RSS (in simple linear regression, sqrt(RSS / (n - 2))).
Training MSE: RSS divided by the number of observations (RSS / n). It is a measure of training error.
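The three quantities are easy to compute side by side. A numpy sketch on hypothetical simple-regression data (true noise sd = 3, so RSE should land near 3):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical simple linear regression data with noise sd = 3
n = 50
x = rng.uniform(0, 10, n)
y = 5 + 2 * x + rng.normal(0, 3, n)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

rss = np.sum(residuals ** 2)            # residual sum of squares
rse = np.sqrt(rss / (n - 2))            # estimates the sd of epsilon
train_mse = rss / n                     # training mean squared error
```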

15
Q

How to assess coefficient accuracy?

A

Standard errors (SE) of coefficients and their p-values indicate accuracy.

16
Q

How can you check if the model fitting is good?

A

RSS and RSE

17
Q

What are assumptions of a multiple linear regression?

A

Linearity: The relationship between X and Y is linear.
Independence: Observations are independent.
Homoscedasticity: Constant variance of errors (ϵ).
Normality: Errors (ϵ) are normally distributed.
No multicollinearity among predictors.

Inclusion of all relevant predictors.

18
Q

What are solutions to the 5 problems a regression model might have?
- Non-linearity
- Non-constant variance of error terms
- Outliers
- High leverage points
- Collinearity

A
  • Use nonlinear transformations of predictors
  • Transform the response variable Y with a nonlinear function such as log(Y)
  • Check the studentized residuals plot and remove outliers
  • Limit the values of X
  • Use VIF to see the severity of collinearity and drop one of the collinear predictors
19
Q

What are the assumptions of LDA?

A

The predictor variable is normally distributed under each response class

If there is more than one predictor, the predictors follow a multivariate normal distribution

The predictors have the same variance in every class (a covariance matrix shared across classes)

20
Q

Why is the Bayes classifier the gold standard to compare other classifiers against?

A

Because it uses the true conditional distribution of Y given X, it achieves the lowest possible test error rate. In the real world we do not know the true distribution of our predictors, so the Bayes classifier is purely theoretical.

21
Q

What are the assumptions of Naive Bayes?

A

Within each class, the predictors are independent. This means that for each class the joint density factorizes into a product of per-predictor densities: fk(x) = fk1(x1) × fk2(x2) × ... × fkp(xp), where k indexes the class.
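A tiny stdlib sketch of this factorization, assuming (hypothetically) Gaussian per-predictor densities with made-up means and standard deviations for one class k:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical per-predictor parameters (mean, sd) for class k, predictors x1 and x2
params_k = [(0.0, 1.0), (5.0, 2.0)]

def class_density(x, params):
    """f_k(x) = f_k1(x1) * f_k2(x2) * ... under the independence assumption."""
    density = 1.0
    for xj, (mu, sigma) in zip(x, params):
        density *= gaussian_pdf(xj, mu, sigma)
    return density
```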

22
Q

What is the main difference between LDA and QDA?

A

LDA assumes the predictor has a class-specific mean and a variance shared across classes. QDA assumes the predictor has a class-specific mean and a class-specific variance.

23
Q

What are the performance measures for all classifiers?

A

Accuracy = (TN +TP) / (N + P)
Error rate = (FN +FP) / (N + P)
Sensitivity = TP / P
Specificity = TN / N
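These four formulas map directly to code. A small sketch computing them from confusion-matrix counts (the example counts are made up):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard performance measures from confusion-matrix counts."""
    p = tp + fn                  # actual positives
    n = tn + fp                  # actual negatives
    return {
        "accuracy": (tn + tp) / (n + p),
        "error_rate": (fn + fp) / (n + p),
        "sensitivity": tp / p,   # true positive rate
        "specificity": tn / n,   # true negative rate
    }

# Hypothetical example: 40 TP, 45 TN, 5 FP, 10 FN
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```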

24
Q

What are the 2 extra performance measures for LDA and QDA? (Explain them in your head as well)

A

They are called:
- ROC (receiver operating characteristic) curve
- AUC (area under the curve)

Both are based on a curve with the true positive rate (sensitivity = TP/P) on the Y-axis and the false positive rate (1 - specificity) on the X-axis.
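The AUC has an equivalent rank interpretation: it is the probability that a randomly chosen positive scores higher than a randomly chosen negative. A numpy sketch using that pairwise formulation (rather than integrating the ROC curve):

```python
import numpy as np

def auc(scores, labels):
    """AUC = P(random positive scores higher than random negative); ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1.0; random guessing is about 0.5
```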

25
Q

How to find the optimal K-value with KNN?

A

Cross validation

26
Q

What are the three CV methods?

A
  • K-fold CV
  • Leave-one-out CV (LOOCV)
  • Validation set approach
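The three methods differ only in how they split the observation indices. A numpy sketch on a hypothetical dataset of 12 observations:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
indices = rng.permutation(n)

# Validation set approach: one random train/validation split
split = n // 2
train_idx, val_idx = indices[:split], indices[split:]

# K-fold CV: k folds, each used once as the validation set
k = 3
folds = np.array_split(indices, k)      # 3 folds of 4 observations

# LOOCV: n folds of a single observation each (K-fold with k = n)
loo_folds = np.array_split(indices, n)
```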
27
Q

What are the two Scenarios for Resampling Methods?

A

Model assessment: Estimating the test error rate when a test set is unavailable.
Model selection: Selecting the model with an appropriate level of flexibility.

28
Q

List the pros and cons of each CV method

A

Validation Set:
Pros: Simple and quick.
Cons: High variance due to randomness in splitting data; training on fewer observations can lead to overestimation of the test error.
LOOCV:
Pros: Uses nearly all the data for training; reduces bias.
Cons: Computationally intensive; requires refitting the model n times.
K-Fold Cross Validation:
Pros: Balances bias and variance well; computationally efficient compared to LOOCV.
Cons: Still computationally expensive with very large datasets.