Chapter 2 - Statistical Learning Flashcards
3 important facts about ε from Y = f(X) + ε
ε is the random error term; it is assumed to be independent of X and to have mean 0 (E(ε) = 0).
define prediction
prediction is when the set of input variables is known but the response cannot be easily obtained. since E(ε) = 0, the formula becomes Yhat = fhat(X). fhat is a black box in prediction, because we don't care about its form as long as it predicts Y accurately. The difference between Yhat and Y comes from reducible error (fhat is an imperfect representation of f) and irreducible error (Y also contains ε, which no estimate of f can remove).
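A minimal numpy sketch (the simulated f, fhat, and noise level are illustrative assumptions, not from the text): even the true f leaves irreducible error ≈ Var(ε), while an imperfect fhat adds reducible error on top.

```python
import numpy as np

# Minimal simulation of Y = f(X) + eps with an imperfect estimate fhat.
# The MSE of the true f is roughly Var(eps) (irreducible error);
# the MSE of fhat is larger by the reducible error.
rng = np.random.default_rng(0)

f = lambda x: np.sin(x)                       # assumed "true" f (illustration only)
fhat = lambda x: 0.9 * np.sin(0.8 * x)        # a deliberately imperfect estimate

x = rng.uniform(-2, 2, size=100_000)
y = f(x) + rng.normal(0.0, 0.5, size=x.size)  # eps ~ N(0, 0.25), independent of X

mse_true_f = np.mean((y - f(x)) ** 2)         # ~ 0.25 = Var(eps): irreducible
mse_fhat = np.mean((y - fhat(x)) ** 2)        # irreducible + reducible error
print(mse_true_f, mse_fhat)
```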
define inference
inference is when we try to understand how the response is affected by changes in X. in this scenario we care about estimating f, but not necessarily about predicting Y. f is not a black box: we need to know its exact form. typical inference questions: which predictors are associated with the response? what is the relationship between the response and each predictor? can the relationship be summarized by a linear equation, or is it more complex?
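A minimal sketch of an inference-style question (the simulated data and coefficient interpretation are illustrative assumptions; a real analysis would also look at standard errors):

```python
import numpy as np

# Which predictors are associated with the response? Simulate data where only
# X1 matters, fit a linear model by least squares, and inspect the coefficients.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))                  # two candidate predictors X1, X2
y = 3.0 * X[:, 0] + rng.normal(size=n)       # only X1 affects the response

A = np.column_stack([np.ones(n), X])         # design matrix with an intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("intercept, beta1, beta2:", coef)      # beta1 ~ 3, beta2 ~ 0
```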
explain the tradeoff between solving for prediction and solving for inference
inference = simpler models that are easier to interpret, even though their predictions may be less accurate. prediction = more complex models that are harder to interpret but make better predictions.
what are the classes of statistical learning methods.
parametric and non-parametric.
facts about parametric statistical learning methods
parametric methods make an assumption about the form of f before fitting (e.g., f is linear), then select a procedure (e.g., OLS) to fit that model to the training data. this is easier because estimating coefficients for an assumed form is simpler than finding a completely new f. unfortunately, the assumed form is usually not exactly right; we can make the model more flexible, but that can lead to over-fitting. in short, parametric methods reduce the problem of estimating f to estimating a few coefficients.
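A hedged sketch of the parametric idea (the simulated data and the linear assumption are illustrative, not from the text): assume f(X) = b0 + b1·X, so estimating f reduces to estimating two coefficients by OLS, even though the assumed form is not exactly right here.

```python
import numpy as np

# Assume f(X) = b0 + b1*X; estimating f reduces to estimating two coefficients
# by OLS. The true f here is exponential, so the assumed linear form is not
# exactly right, but the fitting problem is still just two numbers.
rng = np.random.default_rng(2)
x = rng.uniform(0, 3, size=200)
y = np.exp(0.5 * x) + rng.normal(0.0, 0.2, size=x.size)   # simulated training data

A = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"fhat(X) = {b0:.2f} + {b1:.2f} * X")
```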
why do we want to estimate f in Y = f(x) + e?
prediction or inference
facts about nonparametric statistical learning methods
no assumption is made about the form of f, which allows a wide range of possible shapes for f. the drawback of nonparametric methods is that they require a large number of observations to obtain an accurate approximation of f.
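A hedged sketch of a nonparametric fit (k-nearest-neighbor regression on made-up data; the choice k = 20 is arbitrary): no functional form is assumed, fhat(x0) is just a local average, and accuracy depends on having many observations near x0.

```python
import numpy as np

# k-nearest-neighbor regression: no assumed form for f; fhat(x0) is the average
# response of the k training points closest to x0.
rng = np.random.default_rng(3)
x_train = rng.uniform(-3, 3, size=1000)
y_train = np.sin(x_train) + rng.normal(0.0, 0.3, size=x_train.size)

def knn_regress(x0, k=20):
    nearest = np.argsort(np.abs(x_train - x0))[:k]   # indices of the k closest points
    return y_train[nearest].mean()

print(knn_regress(1.0), np.sin(1.0))   # local-average estimate vs. true f(1.0)
```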
unsupervised learning
start from a data matrix with no response variable; the goals are to find meaningful relationships between variables (correlation), find low-dimensional representations of the data (PCA), and find meaningful groupings of observations (clustering).
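A minimal PCA sketch (simulated data matrix; PCA computed via the SVD of the centered data, which is one standard way to do it): a low-dimensional representation found without any response variable.

```python
import numpy as np

# PCA via the SVD of the centered data matrix.
rng = np.random.default_rng(4)
latent = rng.normal(size=(300, 1))                            # one underlying factor
X = latent @ np.array([[2.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(300, 3))

Xc = X - X.mean(axis=0)                                       # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(s**2 / np.sum(s**2))        # proportion of variance explained per component
scores = Xc @ Vt[0]               # 1-D scores capturing most of the 3-D structure
```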
supervised learning
input variables + an output variable. if the output variable is quantitative this is a regression problem; if it is qualitative (categorical) it is a classification problem. our goal is to learn f (the true function relating X to Y) using the training set: Y = f(X) + ε.
prediction error
the goal in supervised learning is to minimize prediction error. for regression problems this is usually measured by the MSE = E(Y - fhat(X))^2. since we cannot compute the true (test) MSE, in practice we compute the training MSE, which tends to understate the test error.
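A tiny sketch of computing the training MSE (the observed and fitted values are made-up numbers):

```python
import numpy as np

# Training MSE: mean squared difference between observed responses and fitted values.
y = np.array([3.1, 2.8, 4.0, 5.2])       # observed responses (made-up numbers)
y_hat = np.array([3.0, 3.0, 3.9, 5.0])   # fitted values fhat(x_i)
print(np.mean((y - y_hat) ** 2))
```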
Bias Variance Decomposition
MSE = E(Y - fhat(X))^2 = Var(fhat(X)) + [Bias(fhat(X))]^2 + Var(ε). Var(ε) is the irreducible error; the other two terms are the variance of the estimate fhat(X) and its squared bias. Both variance and squared bias are always nonnegative. More flexibility generally means higher variance and lower bias (the goal is to minimize both sources of error simultaneously). You can only compute bias and variance exactly if you know the true f. Variance measures how much the estimate fhat(x0) changes when we sample new training data.
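A hedged simulation sketch of the decomposition (the true f, noise level, and the deliberately inflexible linear fit are all assumptions): because f is known in the simulation, we can estimate Var(fhat(x0)) and Bias² directly by refitting on many fresh training sets.

```python
import numpy as np

# Estimate the variance and squared bias of fhat(x0) by refitting on many
# fresh training sets drawn from a known model Y = f(X) + eps.
rng = np.random.default_rng(5)
f = lambda x: np.sin(2 * x)              # assumed true f (known only in simulation)
x0, sigma, n = 1.0, 0.3, 50

fits = []
for _ in range(2000):                    # new training data on every round
    x = rng.uniform(0, np.pi, size=n)
    y = f(x) + rng.normal(0.0, sigma, size=n)
    A = np.column_stack([np.ones_like(x), x])    # deliberately inflexible linear fit
    b0, b1 = np.linalg.lstsq(A, y, rcond=None)[0]
    fits.append(b0 + b1 * x0)

fits = np.array(fits)
variance = fits.var()                    # how much fhat(x0) changes across training sets
bias_sq = (fits.mean() - f(x0)) ** 2     # systematic error of the assumed linear form
print(variance, bias_sq, sigma**2)       # last term is the irreducible error Var(eps)
```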
Classification problems
The output takes values in a discrete set, so Y is not necessarily real-valued and we use different notation. instead of MSE we use the training error rate (i.e., the misclassification rate): (1/n) Σ I(y_i ≠ yhat_i).
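A tiny sketch of the training error rate (made-up labels and predictions):

```python
import numpy as np

# Training error rate: fraction of training observations the classifier gets wrong.
y = np.array(["blue", "blue", "yellow", "blue", "yellow"])
y_hat = np.array(["blue", "yellow", "yellow", "blue", "yellow"])
print(np.mean(y != y_hat))   # 1 mistake out of 5 -> 0.2
```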
Bayes Classifier
The test error rate is minimized by a simple classifier that assigns each observation to the most likely class, given its predictor values: yhat_i = argmax_j P(Y = j | X = x_i). The error rate of the Bayes classifier (the Bayes error rate) is 1 - E(max_j P(Y = j | X)). it is the decision rule you would use if you knew the true conditional distribution of Y given X (e.g., if you had an infinite amount of data).
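A hedged sketch of the Bayes classifier (two equally likely Gaussian classes at means ±1 are an assumed toy setup, not from the text): assign x to the class with the larger posterior and estimate the Bayes error rate by Monte Carlo.

```python
import numpy as np

# Two equally likely classes with X | Y=0 ~ N(-1, 1) and X | Y=1 ~ N(1, 1).
# The Bayes classifier assigns x to the class with the larger posterior;
# its error rate (the Bayes error rate) is estimated here by Monte Carlo.
rng = np.random.default_rng(6)

def density(x, mu, sd=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

def bayes_classifier(x):                 # argmax_j P(Y = j | X = x), equal priors
    return np.where(density(x, -1.0) > density(x, 1.0), 0, 1)

labels = rng.integers(0, 2, size=200_000)
x = rng.normal(np.where(labels == 0, -1.0, 1.0), 1.0)
print(np.mean(bayes_classifier(x) != labels))   # ~ 0.159, the Bayes error rate here
```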
KNN (comment on decision boundary shape wrt K)
imagine our blue and yellow classification problem (the purple dashed line is the Bayes boundary, available because the distribution of (X, Y) is known). to assign a color to a point x0, look at its K nearest neighbors and take a majority vote; variants vote over all points within a certain radius or weight the votes by distance. KNN has a decision boundary, and the higher the K, the smoother (less flexible) the decision boundary.
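A hedged KNN sketch (simulated two-class data; the test point and values of k are arbitrary choices): classify x0 by majority vote among its k nearest neighbors; larger k smooths the boundary.

```python
import numpy as np

# KNN classification: assign x0 the majority class among its k nearest training
# points. Larger k averages over more neighbors, which smooths the decision boundary.
rng = np.random.default_rng(7)
n = 200
labels = rng.integers(0, 2, size=n)              # 0 = blue, 1 = yellow
centers = np.where(labels == 0, -1.0, 1.0)[:, None]
X = rng.normal(loc=centers, scale=1.0, size=(n, 2))

def knn_predict(x0, k):
    dist = np.linalg.norm(X - x0, axis=1)        # Euclidean distance to every training point
    nearest = np.argsort(dist)[:k]               # indices of the k closest points
    return np.bincount(labels[nearest]).argmax() # majority vote among the neighbors

print(knn_predict(np.array([0.2, -0.3]), k=1))
print(knn_predict(np.array([0.2, -0.3]), k=25))  # larger k -> smoother, less flexible boundary
```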