Lecture 2 Flashcards
What is x?
feature or independent variable
What is y?
target or dependent variable
What do we use x for?
to predict y
How is the ith observation denoted?
(x_i, y_i)
What is H?
the hypothesis function
What does the hypothesis function do?
takes in an x as input and returns a predicted y
What are parameters?
define the relationship between the input and output of a hypothesis function
What is the constant model?
H(x) = h
How many parameters does the constant model have?
one: h
How do we calculate the mean?
add all the numbers in our set and divide by n, the count of numbers
How do we calculate the median?
sort our numbers in ascending order and take the middle number (for an even count, the mean of the two middle numbers)
What are both the mean and the median?
they are summary statistics
What are summary statistics?
they summarize a collection of numbers with a single number
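A minimal sketch of these two summary statistics in Python (the sample dataset is illustrative):

```python
# Compute the mean and median of a small dataset.
data = [3, 1, 4, 1, 5]

# Mean: add all the numbers and divide by n.
mean = sum(data) / len(data)

# Median: sort ascending and take the middle number
# (average of the two middle numbers when n is even).
s = sorted(data)
n = len(s)
if n % 2 == 1:
    median = s[n // 2]
else:
    median = (s[n // 2 - 1] + s[n // 2]) / 2

print(mean)    # 2.8
print(median)  # 3
```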
What is a loss function?
quantifies how bad a prediction is for a single data point
What can we say about our loss if our prediction is close to the actual value?
that we should have low loss
What can we say about our loss if our prediction is far from the actual value?
that we should have high loss
What is error?
the difference between actual and predicted values
What does y_i - H(x_i) mean?
actual - predicted
What does y_i stand for?
actual
What does H(x_i) stand for?
predicted
What is the squared loss function?
L_sq(y_i, H(x_i)) = (y_i - H(x_i))^2
How can we simplify the squared loss function for the constant model?
L_sq(y_i, h) = (y_i - h)^2
What is the average of squared losses?
a single number that describes the quality of our predictions across our entire dataset
What is another term for the average squared loss?
mean squared error (MSE)
What does L stand for?
loss for a single point
What does R stand for?
average loss over all points; empirical risk
What is the notation for mean squared error?
R_sq(h)
What is the summation notation for mean squared error?
R_sq(h) = 1/n Σ_{i=1}^{n} (y_i - h)^2
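The mean squared error of a constant prediction can be sketched directly from the summation (function and dataset names here are illustrative):

```python
# Mean squared error R_sq(h) of a constant prediction h over a dataset y:
# average the squared loss (y_i - h)^2 across all n points.
def mse(h, y):
    return sum((y_i - h) ** 2 for y_i in y) / len(y)

y = [2, 4, 6]
print(mse(4, y))  # (4 + 0 + 4) / 3 ≈ 2.667
```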
What is the first step of minimizing our loss?
take its derivative with respect to h
What does h* stand for?
the best prediction
What is the second step of minimizing our loss?
set it equal to 0
What is the third step of minimizing our loss?
solve for the resulting h*
What is the final step of minimizing our loss?
perform a second derivative test to ensure we found a minimum
What is the derivative of x^n?
d/dx x^n = nx^(n-1)
What is the derivative of f(g(x))?
d/dx (f(g(x))) = f'(g(x)) * g'(x)
What is the derivative of (y_i - h)^2 with respect to h?
2(h - y_i)
What is the total derivative of R_sq(h)?
-2/n Σ_{i=1}^{n} (y_i - h)
What is h* of R_sq(h)?
1/n Σ_{i=1}^{n} y_i; the mean
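This result can be checked numerically: the MSE at the mean should be no larger than at any nearby candidate h (the dataset below is illustrative):

```python
# Verify numerically that h* = mean(y) minimizes mean squared error.
def mse(h, y):
    return sum((y_i - h) ** 2 for y_i in y) / len(y)

y = [2, 4, 6, 8]
h_star = sum(y) / len(y)  # the mean, 5.0

# Compare R_sq at the mean against a sweep of nearby h values.
candidates = [h_star + d / 10 for d in range(-20, 21)]
assert all(mse(h_star, y) <= mse(h, y) for h in candidates)
print(h_star)  # 5.0
```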
What shape is R_sq(h)?
convex
If R_sq(h) opens upwards, what is h*?
it is a minimum
What is the best constant prediction in terms of the mean squared error?
the mean
What is h* the solution to?
an optimization problem
What is the first step of the modeling recipe?
choose a model
What is the second step of the modeling recipe?
choose a loss function
What is the third step of the modeling recipe?
minimize average loss to find optimal model parameters
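The three-step recipe can be sketched end to end for the constant model with squared loss (the dataset and a coarse grid check are illustrative; the closed-form optimum comes from the derivation above):

```python
# The modeling recipe for the constant model with squared loss.
y = [1, 3, 5, 7, 9]

# Step 1: choose a model — the constant model H(x) = h.
# Step 2: choose a loss function — squared loss (y_i - h)^2,
# averaged into R_sq(h) over the dataset.
def avg_loss(h):
    return sum((y_i - h) ** 2 for y_i in y) / len(y)

# Step 3: minimize average loss. Calculus gives the closed-form
# optimum h* = mean(y); confirm it agrees with a grid search.
h_star = sum(y) / len(y)
grid = [i / 100 for i in range(0, 1001)]  # h from 0.00 to 10.00
best_on_grid = min(grid, key=avg_loss)
print(h_star, best_on_grid)  # 5.0 5.0
```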