empirical risk Flashcards

1
Q

simple linear regression model

A

H(x) = w0 + w1x

2
Q

slope in simple linear regression model

A

w1

3
Q

intercept in simple linear regression model

A

w0

4
Q

loss function

A

quantifies how bad a prediction is for a single data point

5
Q

if our prediction is close to the actual value

A

we should have low loss

6
Q

if our prediction is far from the actual value

A

we should have high loss

7
Q

error

A

difference between actual and predicted values: yi - H(xi)

8
Q

squared loss function

A

computes (actual - predicted)^2

9
Q

squared loss for the constant model

A

Lsq(yi, h) = (yi - h)^2

10
Q

another term for average squared loss

A

mean squared error

11
Q

best prediction, h*

A

the h that minimizes the empirical risk Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

12
Q

constant model

A

H(x) = h

13
Q

simple linear regression

A

H(x) = w0 + w1x

14
Q

how do we find h* that minimizes Rsq(h)

A

using calculus

15
Q

minimize Rsq(h)

A
  1. take its derivative with respect to h
  2. set it equal to 0
  3. solve for the resulting h*
  4. perform a second derivative test to ensure we found a minimum
16
Q

derivative of Rsq(h)

A

dRsq/dh = (-2/n) Σ(i=1 to n) (yi - h)

17
Q

Mean minimizes…

A

mean squared error

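A quick numerical check of this fact (a NumPy sketch; the dataset y below is made up for illustration):

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
y = np.array([1.0, 3.0, 7.0, 9.0])

def mse(h, y):
    """Empirical risk under squared loss: Rsq(h) = (1/n) Σ (yi - h)^2."""
    return np.mean((y - h) ** 2)

# Evaluate Rsq over a grid of candidate constant predictions h.
hs = np.linspace(y.min(), y.max(), 10001)
best_h = hs[np.argmin([mse(h, y) for h in hs])]
print(best_h, y.mean())  # the grid minimizer matches the mean
```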
18
Q

absolute loss

A

Labs(yi, H(xi)) = |yi - H(xi)|

19
Q

average absolute loss

A

Rabs(h) = (1/n) Σ(i=1 to n) |yi - h|

20
Q

to minimize mean absolute error

A
  1. take its derivative with respect to h
  2. set it equal to 0
  3. solve for the resulting h*
  4. perform a second derivative test to ensure we found a minimum
21
Q

derivative of |yi - h|

A

|yi - h| is piecewise: its derivative is -1 when h < yi, +1 when h > yi, and undefined at h = yi

22
Q

derivative of Rabs(h)

A

d/dh [(1/n) Σ(i=1 to n) |yi - h|] = (1/n) [#(h > yi) - #(h < yi)]

23
Q

median minimizes

A

mean absolute error

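The same kind of numerical check works here (a NumPy sketch; the data y is made up, with n odd so the minimizer is unique):

```python
import numpy as np

# Hypothetical data (n odd, so the minimizer is unique).
y = np.array([2.0, 3.0, 5.0, 11.0, 20.0])

def mae(h, y):
    """Empirical risk under absolute loss: Rabs(h) = (1/n) Σ |yi - h|."""
    return np.mean(np.abs(y - h))

hs = np.linspace(y.min(), y.max(), 10001)
best_h = hs[np.argmin([mae(h, y) for h in hs])]
print(best_h, np.median(y))  # the grid minimizer is (approximately) the median
```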
24
Q

best constant prediction in terms of mean absolute error

A

median
1. when n is odd, answer is unique
2. when n is even, any number between the middle two data points also minimizes mean absolute error
3. when n is even, define the median to be the mean of the middle two data points

25
Q

process for minimizing average loss

A

empirical risk minimization

26
Q

another name for “average loss”

A

empirical risk

27
Q

corresponding empirical risk when using the squared loss function

A

Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

28
Q

if L(yi, h) is any loss function the corresponding empirical risk is

A

R(h) = (1/n) Σ(i=1 to n) L(yi, h)

29
Q

Modeling recipe

A
  1. choose a model
  2. choose a loss function
  3. minimize average loss to find optimal model parameters
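The recipe above can be sketched numerically (a NumPy sketch with made-up data), showing how the choice of loss changes the optimal constant:

```python
import numpy as np

# Hypothetical data with one outlier, to make the mean/median contrast visible.
y = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

# Step 1: choose a model -- here the constant model H(x) = h.
# Step 2: choose a loss function.
losses = {
    "squared":  lambda y, h: (y - h) ** 2,
    "absolute": lambda y, h: np.abs(y - h),
}

# Step 3: minimize average loss (empirical risk) over candidate h values.
hs = np.linspace(y.min(), y.max(), 20001)
minimizers = {name: hs[np.argmin([np.mean(L(y, h)) for h in hs])]
              for name, L in losses.items()}
print(minimizers)  # squared -> ~mean (3.6); absolute -> ~median (2.0)
```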
30
Q

empirical risk minimization

A

formal name for the process of minimizing average loss

31
Q

corresponding empirical risk to the squared loss Lsq(yi, h) = (yi - h)^2

A

Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

32
Q

For the mean

A

sum of distances below = sum of distances above

33
Q

Mean is the point where

A

Σ(i=1 to n) (yi - h) = 0

34
Q

Median is the point where

A

#(yi < h) = #(yi > h)

35
Q

Lp loss

A

Lp(yi, h) = |yi - h|^p

36
Q

Corresponding empirical risk to Lp

A

Rp(h) = (1/n) Σ(i=1 to n) |yi - h|^p

37
Q

midrange minimizes

A

mean L∞ (infinity) loss

38
Q

As p –> infinity,

A

the minimizer of mean Lp loss approaches the midpoint of the minimum and maximum values in the dataset, i.e., the midrange
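This limit can be checked numerically (a NumPy sketch; the data y and the chosen p values are illustrative):

```python
import numpy as np

# Hypothetical data; midrange = (min + max) / 2 = 5.0 here.
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
midrange = (y.min() + y.max()) / 2

hs = np.linspace(y.min(), y.max(), 20001)
mins = {}
for p in [1, 2, 8, 64]:
    risks = [np.mean(np.abs(y - h) ** p) for h in hs]
    mins[p] = hs[np.argmin(risks)]
print(mins)  # the minimizer drifts toward the midrange as p grows
```

Note that p = 1 recovers the median and p = 2 recovers the mean, consistent with the earlier cards.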

39
Q

The general form of empirical risk for any loss function

A

R(h) = (1/n) Σ(i=1 to n) L(yi, h)

40
Q

input h* that minimizes R(h) is…

A

some measure of the center of the dataset

41
Q

minimum output R(h*) represents

A

some measure of the spread or variation in the dataset

42
Q

Minimum value of Rsq(h)

A

Rsq(h*) = Rsq(Mean(y1, y2, …, yn))
= (1/n) Σ(i=1 to n) (yi - Mean(y1, y2, …, yn))^2

43
Q

Variance

A

the minimum value of Rsq(h) is the mean squared deviation from the mean; it measures the squared distance of each data point from the mean, on average
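A one-line check that the minimum of Rsq(h) equals the variance (a NumPy sketch with made-up data; np.var uses the same 1/n convention by default):

```python
import numpy as np

# Hypothetical data.
y = np.array([4.0, 7.0, 7.0, 10.0, 12.0])

def r_sq(h, y):
    """Rsq(h) = (1/n) Σ (yi - h)^2."""
    return np.mean((y - h) ** 2)

# The minimum is attained at the mean, and its value is the variance.
min_risk = r_sq(y.mean(), y)
print(min_risk, np.var(y))  # both are the 1/n mean squared deviation
```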

44
Q

standard deviation

A

square root of variance

45
Q

empirical risk for absolute loss

A

Rabs(h) = (1/n) Σ(i=1 to n) |yi - h|

46
Q

Rabs(h) is minimized when

A

h* = Median(y1, y2,… yn)

47
Q

Minimum value of Rabs(h) is…

A

mean absolute deviation from the median:
(1/n) Σ(i=1 to n) |yi - Median(y1, y2, …, yn)|

48
Q

empirical risk for 0-1 Loss

A

R0,1(h) = (1/n) Σ(i=1 to n) { 0 if yi = h, 1 if yi ≠ h }

proportion (between 0 and 1) of data points not equal to h

49
Q

R0,1(h) is minimized when

A

h* = Mode(y1,y2…yn)

50
Q

the minimum value of R0,1(h)

A

proportion of data points not equal to mode
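This pair of facts can be verified directly (a sketch using NumPy and the standard library; the data y is made up):

```python
import numpy as np
from collections import Counter

# Hypothetical categorical-style data; the mode is 3.
y = np.array([1, 2, 2, 3, 3, 3, 5])

def r_01(h, y):
    """R0,1(h): proportion of data points not equal to h."""
    return np.mean(y != h)

mode = Counter(y.tolist()).most_common(1)[0][0]
# The mode attains the smallest 0-1 risk among all candidate values.
risks = {v: r_01(v, y) for v in set(y.tolist())}
print(mode, risks[mode])  # 3 is wrong for 4 of the 7 points, so the risk is 4/7
```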

51
Q

simple linear regression model

A

H(x) = w0 + w1x

52
Q

when using squared loss

A

h* = Mean(y1, y2… yn)
Rsq(h*) = Variance(y1, y2, … yn)

53
Q

When using absolute loss

A

h* = Median(y1, y2… yn)
Rabs(h*) = MAD from median

54
Q

R0,1(h) is minimized when

A

h* = Mode(y1,y2,… yn)
so therefore R0,1(h*) is the proportion of data points not equal to the mode

55
Q

minimum value of R0,1(h) is the

A

proportion of data points not equal to mode

56
Q

a higher minimum value of R0,1(h*) means

A

less of the data is clustered at the mode

57
Q

hypothesis function

A

H, takes in an x as input and returns a predicted y

58
Q

parameters define

A

the relationship between the input and output of a hypothesis function

59
Q

Since linear hypothesis functions are of the form H(x) = w0 + w1x, we can re-write Rsq

A

Rsq(w0, w1) = (1/n) Σ(i=1 to n) (yi - (w0 + w1xi))^2

60
Q

Minimize mean squared error

A

  1. take partial derivatives with respect to each variable
  2. set all partial derivatives to 0
  3. solve the resulting system of equations
  4. ensure that you’ve found a minimum, rather than a maximum or saddle point

61
Q

We have a system of two equations and two unknowns (w0 and w1)
(-2/n) Σ(i=1 to n) (yi - (w0 + w1xi)) = 0

(-2/n) Σ(i=1 to n) (yi - (w0 + w1xi)) xi = 0

A

solve for w0 in the first equation; the result, w0*, is the best intercept
plug w0* into the second equation and solve for w1*
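Solving that system gives the familiar closed-form slope and intercept, which can be cross-checked numerically (a NumPy sketch; the x and y data are made up):

```python
import numpy as np

# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Solving the two normal equations yields the usual closed forms:
#   w1* = Σ (xi - x_bar)(yi - y_bar) / Σ (xi - x_bar)^2
#   w0* = y_bar - w1* * x_bar
x_bar, y_bar = x.mean(), y.mean()
w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
w0 = y_bar - w1 * x_bar

# Cross-check against NumPy's least-squares degree-1 fit.
w1_np, w0_np = np.polyfit(x, y, 1)
print(w0, w1)
```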

62
Q

correlation

A

linear association, pattern that looks like a line

63
Q

association

A

any pattern

64
Q

correlation coefficient ,r

A

measure of the strength of the linear association between two variables x and y; measures how tightly clustered a scatter plot is around a straight line; ranges between -1 and 1

65
Q

correlation coefficient, r is defined

A

average of the product of x and y when both are in standard units

66
Q

slope: w1* = r(σy / σx)

A

units of y per units of x
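The definition of r and the slope formula can be sketched together (NumPy; the data and the standard_units helper are illustrative):

```python
import numpy as np

# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

def standard_units(v):
    # Population std (1/n), consistent with the 1/n used in the risks above.
    return (v - v.mean()) / v.std()

# r = average of the product of x and y when both are in standard units.
r = np.mean(standard_units(x) * standard_units(y))

# slope w1* = r * (sigma_y / sigma_x); intercept w0* = y_bar - w1* * x_bar
w1 = r * (y.std() / x.std())
w0 = y.mean() - w1 * x.mean()
print(r, w1, w0)
```

This reproduces the same w0* and w1* as solving the normal equations directly.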
