Week 2 Flashcards

1
Q

X1

A

feature, input, or predictor.

2
Q

Y1

A

response or target that we wish to predict.

3
Q

linear model

A

Y = f(X) + ϵ with f linear in X; an important example of a parametric model.

4
Q

ϵ

A

measurement errors and other discrepancies.

5
Q

f(X)

A

We can understand which components of
X = (X1, X2, ..., Xp) are important in explaining Y, and
which are irrelevant.

6
Q

expected value

A

the average of Y when X = x: E(Y | X = x)

7
Q

regression function.

A
f(x) = E(Y | X = x). Also defined for a vector X, e.g.
f(x) = f(x1, x2, x3) = E(Y | X1 = x1, X2 = x2, X3 = x3)
8
Q

g

A

any candidate function for predicting Y from X; its accuracy is measured by the mean-squared prediction error E[(Y - g(X))² | X = x].

9
Q

ideal or optimal predictor of Y

A

f(x) = E(Y | X = x): the function that minimizes the mean-squared prediction error E[(Y - g(X))² | X = x] over all functions g.

10
Q

irreducible error

A

ϵ = Y - f(x)

11
Q

estimate f̂(x) of f(x)

A
E[(Y - f̂(X))² | X = x] = [f(x) - f̂(x)]² + Var(ϵ)
where [f(x) - f̂(x)]² is the reducible error and Var(ϵ) is the irreducible error.
12
Q

Nearest neighbor averaging

A

f̂(x) = Ave(Y | X ∈ N(x)): estimate f(x) by averaging the y-values of observations whose x is near the target point. Such methods can be lousy when p is large.
Reason: the curse of dimensionality (nearest neighbors tend to be far away in high dimensions).
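As a sketch of the idea in one dimension (the toy data and the name nn_average are illustrative, not from the cards):

```python
import numpy as np

# Toy 1-D data: y = sin(x) plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

def nn_average(x0, x, y, k=10):
    """Estimate f(x0) by averaging the y-values of the k nearest x's."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

# Near x0 = pi/2 the estimate should be close to sin(pi/2) = 1.
print(nn_average(np.pi / 2, x, y))
```

With p = 1 and dense data this works well; the curse of dimensionality is that in high dimensions the k nearest neighbors are no longer "local".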

13
Q

average squared prediction error

over Training data

A

MSE_Tr = Ave_{i∈Tr}[yi - f̂(xi)]²

14
Q

average squared prediction error

over Test data

A

MSE_Te = Ave_{i∈Te}[yi - f̂(xi)]²
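A minimal sketch contrasting the two quantities (the synthetic cubic truth and the degree-5 polynomial fit are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """True model: y = x^3 + noise."""
    x = rng.uniform(-2, 2, n)
    return x, x**3 + rng.normal(0, 1, n)

x_tr, y_tr = make_data(100)  # training data
x_te, y_te = make_data(100)  # fresh test data

# Fit a fairly flexible degree-5 polynomial by least squares.
coefs = np.polyfit(x_tr, y_tr, deg=5)

mse_tr = np.mean((y_tr - np.polyval(coefs, x_tr))**2)  # MSE_Tr
mse_te = np.mean((y_te - np.polyval(coefs, x_te))**2)  # MSE_Te
print(mse_tr, mse_te)
```

MSE_Tr is computed on the same data used to fit f̂, so it tends to understate the error on fresh data; MSE_Te is the honest measure.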

15
Q

Bias

A

Bias(f̂(x0)) = E[f̂(x0)] - f(x0).

16
Q

Flexibility of ^ f increases,

A

its variance increases, and its bias decreases. So choosing the flexibility based on average test error amounts to a bias-variance trade-off.

17
Q

Classification:

A

Typically we measure the performance of Ĉ(x) using the
misclassification error rate:
Err_Te = Ave_{i∈Te} I[yi ≠ Ĉ(xi)]
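For example, with hypothetical label vectors the error rate is just the fraction of mismatches:

```python
import numpy as np

# Hypothetical true and predicted class labels on a test set.
y_te = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_hat = np.array([0, 1, 0, 0, 1, 1, 1, 1])

# Err_Te = Ave I[y_i != C_hat(x_i)]: mean of the 0/1 mismatch indicators.
err_te = np.mean(y_te != y_hat)
print(err_te)  # 2 mismatches out of 8 -> 0.25
```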

18
Q

Linear regression

A

is a simple approach to supervised
learning. It assumes that the dependence of Y on X is linear:

Y = β0 + β1X + ϵ

19
Q

β0 and β1

A

two unknown constants that represent
the intercept and slope, also known as coefficients or
parameters.

20
Q

residual

A

ei = yi - ŷi

21
Q

residual sum of squares

A

RSS = e1² + e2² + … + en²

22
Q

least squares

A

the estimation approach that chooses β̂0 and β̂1 to minimize the RSS.
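A minimal sketch of the closed-form least-squares estimates for simple linear regression (the toy data are my own):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates:
#   beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   beta0_hat = ybar - beta1_hat * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

# RSS at the minimizing values.
rss = np.sum((y - (b0 + b1 * x))**2)
print(b0, b1, rss)
```

Any other choice of intercept and slope would give a larger RSS on this data.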

23
Q

standard error

A

The standard error of an estimator reflects how it varies under repeated sampling. Standard errors can be used to compute confidence intervals.

24
Q

confidence intervals

A

defined as a range of values such that with 95% probability, the range will contain the true unknown value of the parameter.

25
Q

H0

A

There is no relationship between X and Y

versus the alternative hypothesis

26
Q

Ha

A

There is some relationship between X and Y

27
Q

t-statistic

A

To test the null hypothesis, we compute t = (β̂1 - 0) / SE(β̂1)
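A sketch of computing the t-statistic by hand (toy data; the estimate SE(β̂1) = sqrt(σ̂² / Σ(xi - x̄)²) with σ̂² = RSS/(n - 2) is the standard formula, not stated on the card):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2])
n = len(x)

# Least-squares fit.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# SE(beta1_hat) = sqrt(sigma_hat^2 / Sxx), sigma_hat^2 = RSS / (n - 2).
se_b1 = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((x - x.mean())**2))

t = (b1 - 0) / se_b1  # large |t| is evidence against H0: beta1 = 0
print(b1, se_b1, t)
```

Here the strong linear trend gives a large t, so H0 would be rejected.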

28
Q

p-value.

A

it is easy to compute the probability of observing any value equal to |t| or larger.

29
Q

RSS

A

residual sum-of-squares

30
Q

TSS

A

total sum of squares.