Module 6 Flashcards
Describe the supervised learning problem
- Outcome measurement Y (dependent variable, response/target)
- Vector of p predictor measurements X (inputs, regressors, covariates, independent variables)
What are X and Y in regression/classification problems
Regression problem
- Y is quantitative (e.g. price, blood pressure)
Classification problem
- Y takes values in a finite, unordered set (classes, e.g. true/false)
- We have training data: observed instances (x1, y1), …, (xn, yn) of these measurements
List objectives of supervised learning (AUA)
- Accurately predict unseen test cases
- Understand which inputs affect the outcome and how
- Assess the quality of our predictions and inferences
Describe unsupervised learning
- No outcome variables, just a set of predictors/features measured on a set of samples
- Objective is fuzzier: find groups of samples that behave similarly (see the clustering sketch below)
- Harder to tell how well you're doing
- useful for pre-processing in supervised learning
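A minimal clustering sketch of this idea, assuming scikit-learn and synthetic two-feature data (both are illustrative choices, not from the source); KMeans is just one way to find groups of samples when no outcome variable is available.
```python
# Sketch of unsupervised learning: group samples with no outcome variable.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two blobs of samples measured on p = 2 features; no Y is given.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(5.0, 1.0, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5])        # cluster assignment for the first samples
print(kmeans.cluster_centers_)   # the two group centres found
```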
Describe Statistical Learning vs ML
ML is a subset of AI
SL is a subfield of stats
ML has a greater emphasis on large-scale applications and prediction accuracy
SL emphasizes models and their interpretability, precision, and uncertainty
Describe the regression function
- f(x) = E(Y | X = x), the conditional expectation of Y given X = x
- Is also defined for a vector X: f(x) = f(x1, x2, x3) = E(Y | X1 = x1, X2 = x2, X3 = x3)
- Is the ideal/optimal predictor of Y with regard to mean-squared prediction error: it minimizes that error among all functions of x
- ε = Y − f(x) is the irreducible error: even knowing f(x) exactly, prediction errors remain because of the distribution of Y values at each x
- Mean-squared prediction error = reducible error + irreducible error:
E[(Y − f̂(X))^2 | X = x] = [f(x) − f̂(x)]^2 + Var(ε), where the first term is reducible and the second irreducible
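A small sketch of this decomposition under assumed choices: a known true f, an imperfect estimate f̂, and a known noise variance, all invented purely for illustration.
```python
# Sketch of the error decomposition at a single point x0.
# Reducible part: [f(x0) - fhat(x0)]^2; irreducible part: Var(eps).
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(x)           # assumed true regression function
fhat = lambda x: 0.9 * np.sin(x)  # an imperfect estimate of f
sigma2 = 0.25                     # Var(eps), the irreducible error

x0 = 1.0
y = f(x0) + rng.normal(0, np.sqrt(sigma2), size=100_000)  # draws of Y | X = x0
mse = np.mean((y - fhat(x0)) ** 2)
print(mse, (f(x0) - fhat(x0)) ** 2 + sigma2)  # the two numbers should be close
```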
Describe the nearest neighbor
- f̂(x) = Ave(Y | X ∈ N(x)), where N(x) is a neighbourhood of x
- Can be good when p is small (p ≤ 4) and the sample size N is large
- Can be lousy when p is large, due to the curse of dimensionality: nearest neighbours tend to be far away in high dimensions (see the sketch below)
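A minimal nearest-neighbour regression sketch, assuming scikit-learn's KNeighborsRegressor and synthetic data; the prediction is just the average of y over the 10 nearest training points.
```python
# Sketch of nearest-neighbour regression: fhat(x) = Ave(Y | X in N(x)).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)

knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)  # N(x) = 10 nearest points
print(knn.predict([[5.0]]))  # local average of y near x = 5
```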
Describe the linear model
f(x) = β0 + β1X1 + β2X2 + … + βpXp
- Parametric Model
- specified in terms of p + 1 parameters
- Almost never correct, but a good and interpretable approximation to the unknown true function
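A least-squares sketch of fitting the p + 1 parameters, using numpy on synthetic data (the coefficients and noise level are made up for illustration).
```python
# Sketch of the parametric linear model f(x) = b0 + b1*x1 + ... + bp*xp,
# fit by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(0, 0.1, size=n)

X1 = np.column_stack([np.ones(n), X])              # add an intercept column
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)  # the p + 1 parameters
print(beta_hat)  # roughly [2.0, 1.0, -0.5, 0.25]
```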
List trade-offs of the linear model (PGP)
- Prediction accuracy vs interpretability: linear models are easy to interpret
- Good fit vs over/under-fit
- Parsimony vs black box: prefer a simple model with fewer variables
Describe assessing model accuracy
Compute the average squared prediction error over Te (fresh test data) rather than Tr (training data), to avoid bias towards overfit models (see the sketch below).
- MSE_Te = Ave_{i∈Te}[y_i − f̂(x_i)]^2
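A sketch of this comparison, with an invented data set split in half; the straight-line model is chosen only to keep the example short.
```python
# Sketch: average squared prediction error on held-out test data (Te)
# versus the training data (Tr) that the model was fit on.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 200)
y = 1 + 2 * x + rng.normal(0, 0.2, 200)
x_tr, y_tr, x_te, y_te = x[:100], y[:100], x[100:], y[100:]

coef = np.polyfit(x_tr, y_tr, deg=1)        # fit on training data only
fhat = np.poly1d(coef)
mse_tr = np.mean((y_tr - fhat(x_tr)) ** 2)  # MSE_Tr: optimistic
mse_te = np.mean((y_te - fhat(x_te)) ** 2)  # MSE_Te: the honest estimate
print(mse_tr, mse_te)
```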
Describe Bias Variance Trade-off
- As the flexibility of f̂ increases, its variance increases and its bias decreases
- Choosing the flexibility based on average test error amounts to a bias-variance trade-off
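A sketch of the trade-off on assumed synthetic data, using polynomial degree as the flexibility knob; the exact numbers will vary, but training MSE keeps dropping while test MSE eventually rises once the fit becomes too flexible.
```python
# Sketch of the bias-variance trade-off: sweep flexibility (polynomial degree)
# and compare training MSE with test MSE.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 200)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, 200)
x_tr, y_tr, x_te, y_te = x[:100], y[:100], x[100:], y[100:]

for deg in (1, 3, 6, 10):
    fhat = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    print(deg,
          np.mean((y_tr - fhat(x_tr)) ** 2),   # training MSE: keeps falling
          np.mean((y_te - fhat(x_te)) ** 2))   # test MSE: flattens, then worsens
```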
Describe Classification Problem (BAU)
- Response variable Y is qualitative
- Goals are to:
1) Build a classifier that assigns a class label from C to a future unlabeled observation X
2) Assess uncertainty in each classification
3) Understand the roles of different predictors among X
Is there an ideal C(X)?
- Let p_k(x) = Pr(Y = k | X = x), k = 1, 2, …, K. These are the conditional class probabilities at x
The Bayes optimal classifier at x is
C(x) = j if p_j(x) = max{p_1(x), p_2(x), …, p_K(x)}
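A sketch of the Bayes rule when the conditional class probabilities are known; the logistic form of p_1(x) below is an assumption made purely for illustration.
```python
# Sketch of the Bayes-optimal classifier when p_k(x) is known:
# C(x) = argmax_k Pr(Y = k | X = x).
import numpy as np

def p_k(x):
    # Assumed conditional class probabilities for K = 2 classes.
    p1 = 1.0 / (1.0 + np.exp(-(x - 2.0)))  # Pr(Y = 1 | X = x)
    return np.array([1.0 - p1, p1])        # [p_0(x), p_1(x)]

def bayes_classifier(x):
    return int(np.argmax(p_k(x)))          # pick the most probable class

print(bayes_classifier(0.0), bayes_classifier(5.0))  # -> 0 1
```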
Classification details (MBS)
- Measure Performance through misclassification rate
Err_Te = Ave_{i∈Te} I[y_i ≠ Ĉ(x_i)]
- The Bayes classifier has the smallest error
- SVM builds structured models for C(x)
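A tiny sketch of the misclassification-rate computation; the labels and predictions are made up.
```python
# Sketch: test misclassification rate Err_Te = Ave I[y_i != Chat(x_i)].
import numpy as np

y_te = np.array([0, 1, 1, 0, 1, 0])   # true test labels (illustrative)
y_hat = np.array([0, 1, 0, 0, 1, 1])  # a classifier's predictions
err_te = np.mean(y_te != y_hat)       # fraction of test cases misclassified
print(err_te)                         # -> 0.333...
```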
Describe Tree based models
- for regression and classification
- involve stratifying or segmenting predictor space into a number of simple regions
- Since the splitting rules used to segment the predictor space can be summarized in a tree, these are known as decision tree methods
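A short sketch of a fitted classification tree, assuming scikit-learn; the printed output shows the splitting rules that segment the (synthetic) predictor space into simple regions.
```python
# Sketch of a tree-based classifier: the fitted tree stratifies the
# predictor space into rectangular regions via successive splits.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # simple illustrative labelling rule

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))                 # the splitting rules, printed as a tree
```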