Week 2 Flashcards

1
Q

X1

A

feature, input, or predictor.

2
Q

Y1

A

response or target that we wish to predict.

3
Q

linear model

A

Y = f(X) + ϵ with f linear in X; an important example of a parametric model.

4
Q

ϵ

A

measurement errors and other discrepancies.

5
Q

f(X)

A

We can understand which components of
X = (X1, X2, ..., Xp) are important in explaining Y, and
which are irrelevant.

6
Q

expected value

A

the average of Y when X = x: E(Y | X = x)

7
Q

regression function.

A
f(x) = E(Y | X = x). Also defined for a vector X, e.g.
f(x) = f(x1, x2, x3) = E(Y | X1 = x1, X2 = x2, X3 = x3)
8
Q

g

A

any candidate function for predicting Y from X; its accuracy is measured by the mean-squared prediction error E[(Y - g(X))² | X = x].

9
Q

ideal or optimal predictor of Y

A

f(x) = E(Y | X = x): the function that minimizes the mean-squared prediction error E[(Y - g(X))² | X = x] over all functions g.

10
Q

irreducible error

A

ϵ = Y - f(x)

11
Q

estimate f̂(x) of f(x)

A
E[(Y - f̂(X))² | X = x] = [f(x) - f̂(x)]² + Var(ϵ)
where [f(x) - f̂(x)]² is the reducible error and Var(ϵ) is the irreducible error.
12
Q

Nearest neighbor averaging

A

f̂(x) = Ave(Y | X ∈ N(x)): estimate f(x) by averaging the y-values of observations whose x is near the target point. Such methods can be lousy when p is large.
Reason: the curse of dimensionality (nearest neighbors tend to be far away in high dimensions).
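As a sketch of the idea in one dimension (the toy data and the name nn_average are illustrative, not from the cards):

```python
import numpy as np

# Toy 1-D data: y = sin(x) plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

def nn_average(x0, x, y, k=10):
    """Estimate f(x0) by averaging the y-values of the k nearest x's."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

# Near x0 = pi/2 the estimate should be close to sin(pi/2) = 1.
print(nn_average(np.pi / 2, x, y))
```

With p = 1 and dense data this works well; the curse of dimensionality is that in high dimensions the k nearest neighbors are no longer "local".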

13
Q

average squared prediction error

over Training data

A

MSE_Tr = Ave_{i∈Tr}[yi - f̂(xi)]²

14
Q

average squared prediction error

over Test data

A

MSE_Te = Ave_{i∈Te}[yi - f̂(xi)]²
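A minimal sketch contrasting the two quantities (the synthetic cubic truth and the degree-5 polynomial fit are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """True model: y = x^3 + noise."""
    x = rng.uniform(-2, 2, n)
    return x, x**3 + rng.normal(0, 1, n)

x_tr, y_tr = make_data(100)  # training data
x_te, y_te = make_data(100)  # fresh test data

# Fit a fairly flexible degree-5 polynomial by least squares.
coefs = np.polyfit(x_tr, y_tr, deg=5)

mse_tr = np.mean((y_tr - np.polyval(coefs, x_tr))**2)  # MSE_Tr
mse_te = np.mean((y_te - np.polyval(coefs, x_te))**2)  # MSE_Te
print(mse_tr, mse_te)
```

MSE_Tr is computed on the same data used to fit f̂, so it tends to understate the error on fresh data; MSE_Te is the honest measure.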

15
Q

Bias

A

Bias(f̂(x0)) = E[f̂(x0)] - f(x0).

16
Q

Flexibility of ^ f increases,

A

its variance increases, and its bias decreases. So choosing the flexibility based on average test error amounts to a bias-variance trade-off.

17
Q

Classification:

A

Typically we measure the performance of Ĉ(x) using the
misclassification error rate:
Err_Te = Ave_{i∈Te} I[yi ≠ Ĉ(xi)]
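For example, with hypothetical label vectors the error rate is just the fraction of mismatches:

```python
import numpy as np

# Hypothetical true and predicted class labels on a test set.
y_te = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_hat = np.array([0, 1, 0, 0, 1, 1, 1, 1])

# Err_Te = Ave I[y_i != C_hat(x_i)]: mean of the 0/1 mismatch indicators.
err_te = np.mean(y_te != y_hat)
print(err_te)  # 2 mismatches out of 8 -> 0.25
```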

18
Q

Linear regression

A

is a simple approach to supervised
learning. It assumes that the dependence of Y on X is linear:

Y = β0 + β1X + ϵ

19
Q

β0 and β1

A

two unknown constants that represent
the intercept and slope, also known as coefficients or
parameters.

20
Q

residual

A

ei = yi - ŷi

21
Q

residual sum of squares

A

RSS = e1² + e2² + … + en²

22
Q

least squares

A

the estimation approach that chooses β̂0 and β̂1 to minimize the RSS.
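A minimal sketch of the closed-form least-squares estimates for simple linear regression (the toy data are my own):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates:
#   beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   beta0_hat = ybar - beta1_hat * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

# RSS at the minimizing values.
rss = np.sum((y - (b0 + b1 * x))**2)
print(b0, b1, rss)
```

Any other choice of intercept and slope would give a larger RSS on this data.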

23
Q

standard error

A

The standard error of an estimator reflects how it varies under repeated sampling. Standard errors can be used to compute confidence intervals.

24
Q

confidence intervals

A

defined as a range of values such that with 95% probability, the range will contain the true unknown value of the parameter.

25
Q

H0

A

There is no relationship between X and Y

versus the alternative hypothesis

26
Q

Ha

A

There is some relationship between X and Y

27
Q

t-statistic

A

To test the null hypothesis, we compute t = (β̂1 - 0) / SE(β̂1)
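A sketch of computing the t-statistic by hand (toy data; the estimate SE(β̂1) = sqrt(σ̂² / Σ(xi - x̄)²) with σ̂² = RSS/(n - 2) is the standard formula, not stated on the card):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2])
n = len(x)

# Least-squares fit.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# SE(beta1_hat) = sqrt(sigma_hat^2 / Sxx), sigma_hat^2 = RSS / (n - 2).
se_b1 = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((x - x.mean())**2))

t = (b1 - 0) / se_b1  # large |t| is evidence against H0: beta1 = 0
print(b1, se_b1, t)
```

Here the strong linear trend gives a large t, so H0 would be rejected.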

28
Q

p-value.

A

it is easy to compute the probability of observing any value equal to |t| or larger.

29
Q

RSS

A

residual sum-of-squares

30
Q

TSS

A

total sum of squares.