SRM Chapter 1 Flashcards
1
Q
Response/Output/Dependent Variable
A
- Y
- Variable we want to predict using other (explanatory) variables
2
Q
Explanatory/Input/Independent Variable
A
- Xj’s
- Variables we use to predict the dependent variable
- We want to study the relationship between these and the dependent variable
- Aka predictor, feature
3
Q
Count Variable
A
- Quantitative
- Variable that takes on non-negative integers (discrete)
4
Q
Continuous Variable
A
- Quantitative
- Takes on continuous values within an interval
5
Q
Categorical Variable
A
- Qualitative
- Takes on different categories
- Aka class, level
- Each category is given a number
6
Q
Nominal Variable
A
- Categorical variable that has no logical order
- The numbers don’t have any meaning, they just differentiate/label the categories
- e.g. seasons numbered 1-4 alphabetically (the numbers do not correspond to anything meaningful about the seasons themselves)
7
Q
Ordinal Variable
A
- Categorical variable that has a logical order
- Numbers used to label the categories have meaning/there is an order
- e.g. seasons numbered 1-4 in order of the calendar year (there is a meaning to the order)
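The two season examples above can be sketched in code. This is only an illustration: the season names, the choice to start the calendar year with winter, and the variable names are assumptions, not part of the original cards.

```python
# Hypothetical season labels, used only for illustration.
seasons = ["Winter", "Spring", "Summer", "Fall"]

# Nominal coding: numbers assigned alphabetically, so they only label the categories.
nominal_codes = {name: k for k, name in enumerate(sorted(seasons), start=1)}

# Ordinal coding: numbers follow the calendar year (assumed here to start in winter),
# so the order of the numbers carries meaning.
calendar_order = ["Winter", "Spring", "Summer", "Fall"]
ordinal_codes = {name: k for k, name in enumerate(calendar_order, start=1)}

print(nominal_codes)
print(ordinal_codes)
```

Comparing two nominal codes (e.g. "is 1 less than 4?") says nothing about the seasons, while the same comparison on the ordinal codes says which season comes earlier in the year.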
8
Q
Notation: j
A
- Denotes the specific predictor (xj), if there is more than one
- Up to p predictors
9
Q
Notation: i
A
- For a predictor xj, denotes the specific observation of that predictor
- Up to n observations
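As a rough sketch of the i and j notation, the data can be pictured as a table with n rows (observations) and p columns (predictors). The numbers and the helper `x` below are hypothetical and exist only to show the indexing.

```python
# Hypothetical data: n = 3 observations (rows), p = 2 predictors (columns).
X = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
n = len(X)     # number of observations
p = len(X[0])  # number of predictors

def x(i, j):
    """Return x_ij: observation i of predictor j, using 1-based indices as in the notation."""
    return X[i - 1][j - 1]

print(n, p, x(2, 1))
```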
10
Q
Supervised Learning
A
- We have a y (dependent variable)
- Focus on predicting y based on x’s (predictors)
11
Q
Unsupervised Learning
A
- No y (dependent variable)
- Focus is not on predicting but on finding and explaining patterns/relationships between the x’s and across observations
12
Q
Regression Problem
A
- Y is quantitative
13
Q
Classification Problem
A
- Y is qualitative (categorical)
14
Q
Parametric
A
- A functional form of f is specified
- i.e. the relationship between the x’s (predictors) and y (the dependent variable) is assumed to follow a specific function, whose parameters are then estimated
15
Q
Non-Parametric
A
- There is no specified functional form of f
- i.e. no particular function is assumed to describe the relationship between the x’s and y (the relationship still exists, but its shape is not fixed in advance)
- f-hat is algorithmic rather than functional because there are no parameters of an assumed form to estimate
- Need a lot of observations
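The contrast between the two approaches can be sketched with one model of each kind: simple linear regression (parametric) and k-nearest neighbours (non-parametric). The training data, function names, and the choice k = 2 are assumptions made for illustration only.

```python
# Hypothetical training data with a single predictor, for illustration only.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.0, 4.0, 6.0, 8.0, 10.0]

def fit_slr(xs, ys):
    """Parametric: assume f(x) = b0 + b1*x and estimate b0, b1 by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def knn_predict(xs, ys, x0, k=2):
    """Non-parametric: average the responses of the k training points nearest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

b0, b1 = fit_slr(x_train, y_train)       # two estimated parameters define f-hat
slr_pred = b0 + b1 * 2.5                 # prediction from the fitted function
knn_pred = knn_predict(x_train, y_train, 2.5, k=2)  # prediction from a procedure
print(slr_pred, knn_pred)
```

The parametric fit is fully summarised by b0 and b1, whereas the KNN prediction needs the training observations themselves every time, which is one reason non-parametric methods tend to need many observations.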
16
Q
Supervised Models (13)
A
- SLR (Simple Linear Regression)
- MLR (Multiple Linear Regression)
- GLM (Generalized Linear Model)
- Ridge
- Lasso
- Weighted Least Squares
- Partial Least Squares
- KNN (K-Nearest Neighbours)
- Decision Trees
- Bagging
- Random Forest
- Boosting
- PCR (Principal Components Regression)
17
Q
Unsupervised Models (2)
A
- Cluster Analysis
- PCA (Principal Components Analysis)
18
Q
Parametric Models (8)
A
- SLR (Simple Linear Regression)
- MLR (Multiple Linear Regression)
- GLM (Generalized Linear Model)
- Ridge
- Lasso
- Weighted Least Squares
- Partial Least Squares
- PCR (Principal Components Regression)
19
Q
Non-Parametric Models (5)
A
- KNN (K-Nearest Neighbours)
- Decision Trees
- Bagging
- Random Forest
- Boosting
20
Q
Training Data
A
- Data (observations) used to train/formulate f-hat
21
Q
f vs f-hat
A
For the relationship between y (dependent variable) and x’s (its predictors):
- f is the function itself (the relationship itself, that we don’t necessarily know)
- f-hat is our estimation of this function
22
Q
e
A
- Random error term (often written ε), as in y = f(x) + e
- Expected value of 0
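A minimal simulation can tie together f, the error term, and the observed y. Everything here is an assumption for illustration: the "true" f, the normal distribution of the error, the sample size, and the seed. In practice f is unknown and only y and the x’s are observed.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def f(x):
    """Hypothetical 'true' relationship; in practice f is unknown."""
    return 3.0 + 2.0 * x

xs = [rng.uniform(0, 10) for _ in range(10_000)]
eps = [rng.gauss(0, 1) for _ in xs]        # error term with expected value 0
ys = [f(x) + e for x, e in zip(xs, eps)]   # observed response: y = f(x) + error

# The sample mean of the errors is close to 0, but not exactly 0 in a finite sample.
mean_eps = sum(eps) / len(eps)
print(mean_eps)
```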