Week 2 Flashcards
What is linear regression with multiple variables?
It is also known as "multivariate linear regression".
Notation: n
Number of features.
Notation: x^(i)
Input (features) of the i-th training example.
Notation: x_j^(i)
Value of feature j in the i-th training example.
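A small sketch (assumed NumPy code, made-up numbers) of how this notation maps onto a design matrix:

    import numpy as np

    # Hypothetical data: m = 3 training examples, n = 2 features.
    # Row i holds x^(i), the features of the i-th training example.
    X = np.array([[2104, 5],
                  [1416, 3],
                  [1534, 3]])

    m, n = X.shape     # m = 3 examples, n = 2 features
    x_2 = X[1]         # x^(2): the 2nd training example (NumPy is 0-indexed)
    x_1_2 = X[1, 0]    # x_1^(2): value of feature 1 in the 2nd example -> 1416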
What is the hypothesis function?
h_theta(x) = theta^T x = theta_0 x_0 + theta_1 x_1 + … + theta_n x_n (theta transposed times x), with the convention x_0 = 1.
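A minimal sketch of the hypothesis in NumPy, with made-up theta values and x_0 = 1 prepended to the features:

    import numpy as np

    def hypothesis(theta, x):
        # h_theta(x) = theta^T x; x_0 = 1 carries the intercept theta_0.
        return theta @ x

    theta = np.array([80.0, 0.1, 50.0])  # assumed values for theta_0..theta_2
    x = np.array([1.0, 1416.0, 3.0])     # x_0 = 1, then the actual features
    print(hypothesis(theta, x))          # 80 + 0.1*1416 + 50*3 = 371.6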
Is theta a vector?
Yes, an (n+1)-dimensional vector: (theta_0, theta_1, …, theta_n).
What are parameters?
The entries theta_0, theta_1, …, theta_n of the vector theta.
How do you update the thetas?
theta_j := theta_j - alpha * dJ(theta)/dtheta_j, updated simultaneously for every j = 0, …, n.
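A vectorized sketch of one such update for linear regression, assuming X already includes the x_0 = 1 column:

    import numpy as np

    def gradient_descent_step(theta, X, y, alpha):
        # Simultaneous update for all j, using
        # dJ/dtheta_j = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i).
        m = len(y)
        gradient = X.T @ (X @ theta - y) / m
        return theta - alpha * gradient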
What is feature scaling?
Feature scaling is normalizing the features so they are all on a similar scale.
What does feature scaling do?
Feature scaling speeds up gradient descent by putting the input features in roughly the same range.
What does theta do on smaller ranges?
Theta descends quickly on smaller ranges.
What does theta do on larger ranges?
Theta descends slowly on larger ranges, so it oscillates inefficiently down to the optimum when the variables are very uneven.
Into what ranges should the input variables ideally fall?
-1<=x_i<=1 or -0.5<=x_i<=0.5
What is the definition of feature scaling?
Feature scaling involves dividing the input values by the range of the input variable, resulting in a new range of just one.
What is the definition of mean normalization?
Mean normalization involves subtracting the average value of an input variable from the values of that input variable, resulting in a new average of just zero for that variable.
How do you implement both feature scaling and mean normalization?
x_i := (x_i - mu_i) / s_i
where mu_i is the average of all the values for feature i and s_i is either the range of values (max - min) or the standard deviation.
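A sketch of that formula in NumPy, applied per feature (column); the function name is mine:

    import numpy as np

    def scale_features(X):
        # x_i := (x_i - mu_i) / s_i for each feature i
        mu = X.mean(axis=0)                # mu_i: average of feature i
        s = X.max(axis=0) - X.min(axis=0)  # s_i: range; X.std(axis=0) also works
        return (X - mu) / s, mu, s

Returning mu and s lets you apply the identical transform to new examples at prediction time.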
How do you make sure gradient descent is working correctly?
Plot J(theta) over the number of iterations; it should decrease after every iteration.
What is an example of automatic convergence test?
Declare convergence if J(theta) decreases by less than 10^-3 in one iteration.
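A minimal sketch of that test inside a training loop, assuming cost and step functions such as those sketched above:

    def train(theta, X, y, alpha, cost, step, epsilon=1e-3):
        # Stop once J(theta) decreases by less than epsilon in one iteration.
        J_prev = cost(theta, X, y)
        while True:
            theta = step(theta, X, y, alpha)
            J = cost(theta, X, y)
            if J_prev - J < epsilon:
                return theta  # note: also stops if J went up (alpha too large)
            J_prev = J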
What are signs that gradient descent is not working?
The cost goes up.
The cost goes down and then up over time.
The cost fluctuates up and down.
These usually mean alpha is too large, or that there is a bug in the code.
What happens if you have a sufficiently small alpha?
The cost should decrease on every iteration.
What happens if alpha is too small?
Gradient descent can be too slow to converge.
What is an example of how to choose alpha?
Try values spaced by roughly a factor of ten: …, 0.001, 0.01, 0.1, 1, …
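A self-contained sketch of such a sweep on made-up data; the largest alpha makes the cost blow up, illustrating the "too large" failure mode:

    import numpy as np

    def cost(theta, X, y):
        # J(theta) = (1/(2m)) * sum_i (h_theta(x^(i)) - y^(i))^2
        m = len(y)
        return ((X @ theta - y) ** 2).sum() / (2 * m)

    X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])  # x_0 = 1 column included
    y = np.array([1.0, 2.0, 3.0])

    for alpha in [0.001, 0.01, 0.1, 1.0]:
        theta = np.zeros(2)
        for _ in range(100):
            theta -= alpha * X.T @ (X @ theta - y) / len(y)
        print(alpha, cost(theta, X, y))  # alpha = 1.0 diverges on this data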
What is an example of choosing features efficiently?
Combining 2 input features into 1, e.g. combining the frontage and depth of a housing lot into a single feature, area = frontage * depth.
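A tiny sketch of that housing example with hypothetical numbers:

    import numpy as np

    frontage = np.array([50.0, 30.0, 40.0])  # hypothetical lot frontages
    depth = np.array([100.0, 60.0, 80.0])    # hypothetical lot depths

    # Replace the two features with one: area = frontage * depth.
    area = frontage * depth
    X = area.reshape(-1, 1)                  # new single-feature design matrix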
What happens if you simplify the hypothesis?
You MAY get a better model.