SRM Chapter 1 Flashcards

1
Q

Response/Output/Dependent Variable

A
  • Y
  • Variable we want to predict using other (explanatory) variables
2
Q

Explanatory/Input/Independent Variable

A
  • Xj’s
  • Variables we use to predict the dependent variable
  • We want to study the relationship between these and the dependent variable
  • Aka predictor, feature
3
Q

Count Variable

A
  • Quantitative
  • Variable that takes on non-negative integers (discrete)
4
Q

Continuous Variable

A
  • Quantitative
  • Takes on continuous values within an interval
5
Q

Categorical Variable

A
  • Qualitative
  • Takes on different categories
  • Aka class, level
  • Each category is given a number
6
Q

Nominal Variable

A
  • Categorical variable that has no logical order
  • The numbers don’t have any meaning, they just differentiate/label the categories
  • e.g. seasons numbered 1-4 alphabetically (the numbers don’t correspond to any natural ordering of the seasons)
7
Q

Ordinal Variable

A
  • Categorical variable that has a logical order
  • Numbers used to label the categories have meaning/there is an order
  • e.g. seasons numbered 1-4 in order of the calendar year (there is a meaning to the order)
8
Q

Notation: j

A
  • Denotes the specific predictor (xj), if there is more than one
  • Up to p predictors
9
Q

Notation: i

A
  • For a predictor xj, denotes the specific observation of that predictor
  • Up to n observations
10
Q

Supervised Learning

A
  • We have a y (dependent variable)
  • Focus on predicting y based on x’s (predictors)
11
Q

Unsupervised Learning

A
  • No y (dependent variable)
  • Focus is not on predicting but on finding and explaining patterns/relationships between the x’s and across observations
12
Q

Regression Problem

A
  • Y is quantitative
13
Q

Classification Problem

A
  • Y is qualitative (categorical)
14
Q

Parametric

A
  • A functional form of f is specified
  • i.e. the relationship between the x’s (predictors) and y (the dependent variable) can be expressed as a function
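For example, MLR specifies the linear form (a standard illustration, not part of the card):

$$f(X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$

so fitting the model reduces to estimating the parameters $\beta_0, \dots, \beta_p$.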
15
Q

Non-Parametric

A
  • No functional form of f is specified
  • i.e. we don’t assume a specific equation that describes the relationship between the x’s and y
  • f-hat is algorithmic rather than functional because there are no parameters to estimate
  • Needs a lot of observations
16
Q

Supervised Models (13)

A
  1. SLR (Simple Linear Regression)
  2. MLR (Multiple Linear Regression)
  3. (GLM) Generalized Linear Model
  4. Ridge
  5. Lasso
  6. Weighted Least Squares
  7. Partial Least Squares
  8. KNN (K-Nearest Neighbours)
  9. Decision Trees
  10. Bagging
  11. Random Forest
  12. Boosting
  13. PCR (Principal Components Regression)
17
Q

Unsupervised Models (2)

A
  1. Cluster Analysis
  2. PCA (Principal Components Analysis)
18
Q

Parametric Models (8)

A
  1. SLR (Simple Linear Regression)
  2. MLR (Multiple Linear Regression)
  3. (GLM) Generalized Linear Model
  4. Ridge
  5. Lasso
  6. Weighted Least Squares
  7. Partial Least Squares
  8. PCR (Principal Components Regression)
19
Q

Non-Parametric Models (5)

A
  1. KNN (K-Nearest Neighbours)
  2. Decision Trees
  3. Bagging
  4. Random Forest
  5. Boosting
20
Q

Training Data

A
  • Data (observations) used to train/formulate f-hat
21
Q

f vs f-hat

A

For the relationship between y (dependent variable) and x’s (its predictors):
- f is the function itself (the relationship itself, that we don’t necessarily know)
- f-hat is our estimation of this function

22
Q

e

A
  • Error term variable
  • Expected value of 0
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Two components of an observation from the response variable

A
  1. Systematic - expected value of the response variable (our function f)
  2. Random - error term
  • Aka signal plus noise
24
Q

Signal Plus Noise

A
  • Each observation of Y is made up of two parts:
  • Systematic (our function f)
  • Random (error term e)
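In symbols (standard notation for cards 23-24):

$$Y = f(X) + \varepsilon, \qquad \mathrm{E}[\varepsilon] = 0$$

where $f(X)$ is the systematic part (signal) and $\varepsilon$ is the random part (noise).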
25
Q

Bayes Classifier

A
  • The best possible decision function
  • Assigns each observation to its most probable category given the predictor values
  • When the Bayes classifier is used, the test error rate is minimized
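In symbols, the Bayes classifier assigns an observation with predictor values $x_0$ to the most probable category:

$$\hat{y} = \arg\max_j \ \Pr(Y = j \mid X = x_0)$$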
26
Q

Decision Function

A
  • Function (f) for classification problems that decides which category Y (dependent variable) belongs to
27
Q

Objectives to supervised learning (2)

A
  1. Prediction - predicting values of y based on x’s
  2. Inference - understanding the impact of changes in x’s on the value of y
28
Q

Flexibility

A
  • Describes how closely f-hat can follow the data
  • Related to prediction (a more flexible model can fit the training data more accurately)
  • Rougher fit = more flexible f-hat
  • Smoother fit = less flexible f-hat
29
Q

Interpretability

A
  • Ability to understand what the model is doing (components, parameters)
  • Related to inference (easier to explain the specifics of the relationship between the x’s and y if we understand what the model is doing)
30
Q

Flexibility: Rougher Fit

A
  • More flexible f-hat
  • Often more parameters
31
Q

Flexibility: Smoother Fit

A
  • Less flexible f-hat
  • Often fewer parameters (simpler function)
32
Q

Flexibility vs Interpretability

A
  • Inverse relationship
  • As flexibility increases, f-hat can follow the training data more closely, but more parameters means the model might be harder to understand/interpret
33
Q

Flexibility vs Accuracy

A
  • More flexibility doesn’t always mean more accurate predictions in general
  • It means more accurate predictions on the training data only
34
Q

MSE

A
  • Mean squared error
  • Measures error in regression models
  • We want this number to be small (smaller MSE means more accurate)
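In symbols (the standard regression formula):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2$$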
35
Q

Training MSE vs Flexibility of f-hat

A
  • Inverse relationship
  • Training MSE decreases as flexibility of f-hat increases
36
Q

Overfitting

A
  • Happens when f-hat fits the training data too closely
  • Won’t carry over well to new data (test data) so predictions on the test data won’t be as accurate
  • Often happens when f-hat is too flexible (modelled too closely to the training data)
  • Too rough fit, too flexible
37
Q

Underfitting

A
  • f-hat is not flexible enough to capture the relationship between y and the x’s
  • Too smooth fit, not flexible enough
38
Q

Training vs Test MSE

A
  • Training MSE is not always a good indicator of model accuracy because minimizing the training MSE only means that accuracy is maximized on the training data, not the testing data.
  • So, test MSE is a better indicator of model accuracy
39
Q

Training MSE

A
  • Mean squared error based on the training data
  • Goes down as flexibility increases
  • Not the best indicator of model accuracy because based only on the training data
40
Q

Test MSE

A
  • Mean squared error based on the test data (new observations not used to train f-hat)
  • This makes it a better indicator of model accuracy
  • U-shaped as flexibility increases
  • Not flexible enough means that it’s too smooth of a fit (underfitting); the relationship between x’s and y is not captured enough
  • Too flexible means that it’s too rough of a fit (overfitting); f-hat is too closely fitted to the training data but on the test data accuracy declines
  • So the best test MSE is usually produced by a moderately flexible model
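A minimal numpy-only sketch of the pattern described in cards 35-40, using polynomial degree as a stand-in for flexibility (the simulated data, noise level, and degrees are illustrative assumptions, not from the cards):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "signal plus noise" data: y = f(x) + e (f and the noise level are made up)
def f(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 50)
y_train = f(x_train) + rng.normal(0, 0.3, 50)
x_test = rng.uniform(0, 1, 50)
y_test = f(x_test) + rng.normal(0, 0.3, 50)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# Higher polynomial degree = rougher, more flexible f-hat
for degree in (1, 3, 8, 15):
    coeffs = np.polyfit(x_train, y_train, degree)           # f-hat fit on training data only
    train_mse = mse(y_train, np.polyval(coeffs, x_train))   # keeps falling as degree grows
    test_mse = mse(y_test, np.polyval(coeffs, x_test))      # typically U-shaped in degree
    print(f"degree {degree:2d}: training MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

Training MSE keeps falling as the fit gets more flexible, while test MSE typically turns back up once the fit becomes too rough (overfitting).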
41
Q

Bias-Variance Tradeoff

A
  • We want both variance and bias to be low
  • Increasing flexibility increases variance though it decreases bias.
  • Decreasing flexibility decreases variance, but it increases bias.
42
Q

Irreducible error

A
  • Variance in y (dependent variable) that can’t be explained even by the best possible f-hat
  • Equal to Var(e), the variance of the error term
43
Q

Reducible error

A
  • Var(f-hat) + (Bias(f-hat))^2
  • The variance in y that can be reduced by choosing the best model
  • Want to balance: want low variance and low bias though there is a tradeoff between the two
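Putting cards 42-43 together, the expected test MSE at a point $x_0$ decomposes as (standard result):

$$\mathrm{E}\bigl[(y_0 - \hat{f}(x_0))^2\bigr] = \underbrace{\mathrm{Var}\bigl(\hat{f}(x_0)\bigr) + \bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2}_{\text{reducible}} + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{irreducible}}$$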
44
Q

Variance

A
  • How f-hat changes when different training data is used
  • Want this to be low (little variability between sets of training data)
  • Bigger variance means f-hat changes more depending on the training data used
45
Q

Bias

A
  • How close f-hat is to the actual shape of f
  • Want this to be low (close)
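In symbols, for cards 44-45 (the expectations are taken over different training sets):

$$\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr) = \mathrm{E}\bigl[\hat{f}(x_0)\bigr] - f(x_0), \qquad \mathrm{Var}\bigl(\hat{f}(x_0)\bigr) = \mathrm{E}\Bigl[\bigl(\hat{f}(x_0) - \mathrm{E}[\hat{f}(x_0)]\bigr)^2\Bigr]$$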
46
Q

Flexibility-Variance-Squared Bias Relationship

A
  • F low - V low - B high
  • As flexibility decreases, variance also decreases but bias increases (underfitting)
  • F high - V high - B low
  • As flexibility increases, variance also increases but bias decreases
47
Q

Flexibility-Variance Relationship

A
  • As flexibility increases, so does variance
  • Because as flexibility increases, the model gets more specifically fit to that particular set of training data, so there is more variance in the shape of f-hat when using different training data.
48
Q

Flexibility-Bias Relationship

A
  • As flexibility increases, bias decreases
  • By increasing flexibility we are able to get f-hat closer to the actual shape of f, which means squared bias decreases.
  • Bias grows when f-hat is not flexible enough (too simple) to capture the patterns and shape of f (underfitting).
49
Q

Test Error Rate

A
  • Measure for classification model error
  • Uses I (indicator function): 1 if the prediction is incorrect, 0 otherwise
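In symbols, over $n$ test observations:

$$\text{test error rate} = \frac{1}{n} \sum_{i=1}^{n} I\bigl(y_i \neq \hat{y}_i\bigr)$$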
50
Q

Bayes Error Rate

A
  • The test error rate obtained when the Bayes classifier is used as y-hat in the indicator function
  • This is the minimum possible test error rate, so the Bayes classifier is the best decision function
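In symbols (the expectation averages over the distribution of X):

$$\text{Bayes error rate} = 1 - \mathrm{E}\Bigl[\max_j \Pr(Y = j \mid X)\Bigr]$$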
51
Q

k-Nearest Neighbours Steps

A
  1. Find the location of the observation in the domain of X1,…,Xp. This is the centre.
  2. Identify the k nearest training observations to the centre.
  3. The most frequent category of the k training observations is the prediction y-hat.
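A minimal Python sketch of these three steps (the function name and the toy data are my own illustrative choices; Euclidean distance is used, as the next card notes):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k):
    """Predict the category of a new observation x0 using k-nearest neighbours."""
    # Step 1: x0 is the centre of the neighbourhood in the domain of X1,...,Xp.
    # Step 2: find the k training observations closest to the centre (Euclidean distance).
    distances = np.sqrt(((X_train - x0) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    # Step 3: the most frequent category among those k observations is y-hat.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up example: two clusters of training points
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.1], [2.9, 3.3]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # expected: "A"
```

A smaller k makes the prediction depend on only a few points (more flexible, cards 53-54); a larger k averages over a wider neighbourhood (less flexible).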
52
Q

Distance used for k-Nearest Neighbours Method

A
  • Euclidean distance
53
Q

k-Nearest Neighbours: Size of k

A
  • k too large: observations are too far away from the centre of the neighbourhood so predictions are too general.
  • k too small: predictions are unstable/volatile (dependent on just a few observations).
  • Want a middle-sized k because of the bias-variance tradeoff.
54
Q

k-Nearest Neighbours: k vs. Flexibility Relationship

A
  • k is inversely related to flexibility.
  • A small k means y-hat is very dependent on a small number of observations, so flexibility is high (very tailored to those few observations).
  • A large k means y-hat is very generalized, so flexibility is low.
55
Q

Smooth fit = ? flexibility

A

Less flexibility

56
Q

Rough fit = ? flexibility

A

More flexibility