Week 2 Flashcards
X
feature, input, or predictor.
Y
response or target that we wish to predict.
linear model
Y = f(X) + ϵ with f linear in X; an important example of a parametric structure.
ϵ
measurement errors and other discrepancies.
f(X)
We can understand which components of X = (X₁, X₂, …, Xₚ) are important in explaining Y, and which are irrelevant.
expected value
The average of Y when X takes the value x: E(Y | X = x).
regression function
f(x) = E(Y | X = x). Also defined for a vector X; e.g. f(x) = f(x₁, x₂, x₃) = E(Y | X₁ = x₁, X₂ = x₂, X₃ = x₃).
mean-squared prediction error
E[(Y - g(X))² | X = x]. The regression function f(x) = E(Y | X = x) minimizes this over all functions g, making it the ideal or optimal predictor of Y.
irreducible error
ϵ = Y - f(x)
estimate f̂(x) of f(x)
E[(Y - f̂(X))² | X = x] = [f(x) - f̂(x)]² + Var(ϵ), where [f(x) - f̂(x)]² is the reducible error and Var(ϵ) is the irreducible error.
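A quick way to see the decomposition (a sketch, assuming ϵ has mean zero and is independent of X, with f̂ treated as fixed):

E[(Y - f̂(X))² | X = x]
  = E[(f(x) + ϵ - f̂(x))²]                        (substituting Y = f(X) + ϵ)
  = [f(x) - f̂(x)]² + 2[f(x) - f̂(x)]E[ϵ] + E[ϵ²]
  = [f(x) - f̂(x)]² + Var(ϵ)                      (since E[ϵ] = 0, E[ϵ²] = Var(ϵ))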
Nearest neighbor averaging
Estimate f(x) by averaging the responses yᵢ whose xᵢ lie in a neighborhood of x. Such methods can be lousy when p is large.
Reason: the curse of dimensionality; nearest neighbors tend to be far away in high dimensions.
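A minimal sketch of nearest-neighbor averaging in one dimension, on made-up data; taking the neighborhood to be the k closest training points (and all names here) are assumptions of the demo, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, 100)                    # p = 1 predictor
y_train = np.sin(x_train) + rng.normal(0, 0.3, 100)  # y = f(x) + noise

def knn_average(x0, x_train, y_train, k=5):
    """Estimate f(x0) by averaging the y-values of the k nearest x's."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

print(knn_average(2.0, x_train, y_train))  # should be near sin(2.0) ~ 0.91
```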
average squared prediction error over training data
MSE_Tr = Ave_{i ∈ Tr} [yᵢ - f̂(xᵢ)]²
average squared prediction error over test data
MSE_Te = Ave_{i ∈ Te} [yᵢ - f̂(xᵢ)]²
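A short sketch of computing the two quantities, assuming simulated data and a cubic polynomial as a stand-in for f̂:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.3, 200)

x_tr, y_tr = x[:100], y[:100]   # training data
x_te, y_te = x[100:], y[100:]   # test data

f_hat = np.poly1d(np.polyfit(x_tr, y_tr, deg=3))  # fit on training data only

mse_tr = np.mean((y_tr - f_hat(x_tr)) ** 2)  # average over i in Tr
mse_te = np.mean((y_te - f_hat(x_te)) ** 2)  # average over i in Te
print(mse_tr, mse_te)  # MSE_Te is typically the larger of the two
```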
Bias
Bias(f̂(x₀)) = E[f̂(x₀)] - f(x₀). As the flexibility of f̂ increases, its variance increases and its bias decreases; so choosing the flexibility based on average test error amounts to a bias-variance trade-off.
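A simulation sketch of the trade-off, assuming a known true f (here sin) so that bias and variance at a point x₀ can be estimated over repeated training sets; polynomial degree stands in for flexibility:

```python
import numpy as np

rng = np.random.default_rng(2)
f, x0 = np.sin, 3.0        # true regression function and evaluation point
n, reps = 100, 500

for deg in (1, 3, 5):      # increasing flexibility
    preds = []
    for _ in range(reps):  # repeated training sets
        x = rng.uniform(0, 10, n)
        y = f(x) + rng.normal(0, 0.3, n)
        preds.append(np.poly1d(np.polyfit(x, y, deg))(x0))
    preds = np.array(preds)
    bias = preds.mean() - f(x0)  # E[f-hat(x0)] - f(x0)
    print(f"degree {deg}: bias^2 = {bias**2:.4f}, variance = {preds.var():.4f}")
```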
Classification
Typically we measure the performance of Ĉ(x) using the misclassification error rate:
Err_Te = Ave_{i ∈ Te} I[yᵢ ≠ Ĉ(xᵢ)]
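A small sketch of the error rate with hypothetical labels and predictions:

```python
import numpy as np

y_te  = np.array([0, 1, 1, 0, 1, 0, 0, 1])  # hypothetical test labels
c_hat = np.array([0, 1, 0, 0, 1, 1, 0, 1])  # hypothetical classifier output

# Err_Te = Ave_{i in Te} I[y_i != C-hat(x_i)]
err_te = np.mean(y_te != c_hat)
print(err_te)  # 0.25: two of the eight test points are misclassified
```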
Linear regression
A simple approach to supervised learning; it assumes that the dependence of Y on X is linear. With a single predictor X:
Y = β₀ + β₁X + ϵ
β₀ and β₁
Two unknown constants that represent the intercept and slope, also known as coefficients or parameters.
residual
eᵢ = yᵢ - ŷᵢ, the difference between the observed and the predicted response.
residual sum of squares
RSS = e₁² + e₂² + ⋯ + eₙ²
least squares
Estimation approach that chooses β̂₀ and β̂₁ to minimize the RSS.
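A sketch of the closed-form least squares estimates for simple linear regression, on simulated data (the true coefficients 2.0 and 0.5 are assumptions of the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # true beta0 = 2, beta1 = 0.5

# Closed-form least squares estimates:
#   beta1-hat = sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2
#   beta0-hat = ybar - beta1-hat * xbar
xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

residuals = y - (beta0_hat + beta1_hat * x)  # e_i = y_i - yhat_i
rss = np.sum(residuals ** 2)                 # RSS = e_1^2 + ... + e_n^2
print(beta0_hat, beta1_hat, rss)
```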
standard error
The standard error of an estimator reflects how it varies under repeated sampling. Standard errors can be used to compute confidence intervals.
confidence intervals
Defined as a range of values such that, with 95% probability, the range will contain the true unknown value of the parameter; for β₁ this is approximately β̂₁ ± 2·SE(β̂₁).
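A sketch of computing SE(β̂₁) and the approximate 95% interval β̂₁ ± 2·SE(β̂₁), continuing the simulated-data setup above:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)

xbar = x.mean()
beta1_hat = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
beta0_hat = y.mean() - beta1_hat * xbar
resid = y - (beta0_hat + beta1_hat * x)

n = len(x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)  # estimate of Var(eps): RSS/(n-2)
se_beta1 = np.sqrt(sigma2_hat / np.sum((x - xbar) ** 2))

# Approximate 95% confidence interval: beta1-hat +/- 2 * SE(beta1-hat)
lo, hi = beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1
print(f"beta1_hat = {beta1_hat:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```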
H₀
There is no relationship between X and Y (i.e., β₁ = 0),
versus the alternative hypothesis
Hₐ
There is some relationship between X and Y (i.e., β₁ ≠ 0).
t-statistic
To test the null hypothesis we compute t = (β̂₁ - 0) / SE(β̂₁), which measures how many standard errors β̂₁ is from zero.
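A sketch of the test on simulated data, using scipy for the two-sided p-value (computed from a t distribution with n - 2 degrees of freedom):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)

xbar = x.mean()
beta1_hat = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
beta0_hat = y.mean() - beta1_hat * xbar
resid = y - (beta0_hat + beta1_hat * x)
se_beta1 = np.sqrt(np.sum(resid ** 2) / (len(x) - 2) / np.sum((x - xbar) ** 2))

t = (beta1_hat - 0) / se_beta1  # t-statistic for H0: beta1 = 0
# Two-sided p-value: probability of |t| or larger under t_{n-2}.
p = 2 * stats.t.sf(abs(t), df=len(x) - 2)
print(f"t = {t:.2f}, p-value = {p:.2g}")
```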
p-value
The probability of observing any value equal to |t| or larger, assuming β₁ = 0; it is easy to compute using statistical software.
RSS
residual sum of squares: RSS = Σᵢ (yᵢ - ŷᵢ)²
TSS
total sum of squares: TSS = Σᵢ (yᵢ - ȳ)²
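A tiny sketch of the two sums with hypothetical observed and fitted values:

```python
import numpy as np

# Hypothetical observed responses and fitted values from some regression.
y     = np.array([3.1, 4.0, 5.2, 6.1, 6.8])
y_hat = np.array([3.0, 4.2, 5.0, 6.0, 7.0])

rss = np.sum((y - y_hat) ** 2)     # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
print(rss, tss)
```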