Week 2 Flashcards

1
Q

What is linear regression with multiple variables?

A

Linear regression with more than one input variable (feature); also known as “multivariate linear regression”.

2
Q

Notation: n

A

Number of features.

3
Q

Notation: x^(i)

A

The input (features) of the ith training example.

4
Q

Notation: x_j^(i)

A

The value of feature j in the ith training example.

5
Q

What is the hypothesis function?

A

h_theta(x) = theta^T * x (theta transposed times x), i.e. theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n, with x_0 = 1.

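A minimal NumPy sketch of this hypothesis (the variable names and values are illustrative, not from the course):

```python
import numpy as np

def hypothesis(theta, x):
    """h_theta(x) = theta^T x for one example; assumes x[0] == 1 (the bias entry x_0)."""
    return theta @ x  # inner product of the two vectors

theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2 (made-up values)
x = np.array([1.0, 4.0, 5.0])      # x_0 = 1, followed by two feature values
print(hypothesis(theta, x))        # 1*1 + 2*4 + 3*5 = 24.0
```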

6
Q

Is theta a vector?

A

Yes; an (n+1)-dimensional vector (theta_0 through theta_n).

7
Q

What are parameters?

A

The thetas (theta_0, theta_1, ..., theta_n), i.e. the entries of the vector theta.

8
Q

How do you update the thetas?

A

theta_j := theta_j - alpha * dJ(theta)/dtheta_j (updated simultaneously for every j)

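A sketch of one batch update for linear regression, assuming the squared-error cost J(theta) = (1/2m) * sum((X*theta - y)^2), whose gradient is (1/m) * X^T * (X*theta - y); all names are illustrative:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j.

    X: (m, n+1) design matrix with a leading column of ones.
    y: (m,) target vector.  alpha: learning rate.
    """
    m = len(y)
    gradient = X.T @ (X @ theta - y) / m  # dJ(theta)/dtheta_j for all j at once
    return theta - alpha * gradient
```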

9
Q

What is feature scaling?

A

Feature scaling is normalizing the input features so they are on similar scales.

10
Q

What does feature scaling do?

A

Feature scaling speeds up gradient descent by putting the input features in roughly the same range.

11
Q

What does theta do on smaller ranges?

A

Theta descends quickly on smaller ranges.

12
Q

What does theta do on larger ranges?

A

Theta descends slowly on larger ranges.

13
Q

Into what ranges should the input variables ideally be put?

A

-1<=x_i<=1 or -0.5<=x_i<=0.5

14
Q

What is the definition of feature scaling?

A

Feature scaling involves dividing the input values by the range of the input variable, resulting in a new range of just one.

15
Q

What is the definition of mean normalization?

A

Mean normalization involves subtracting the average value of an input variable from the values for that input variable, resulting in a new average for the input variable of just zero.

16
Q

How do you implement both feature scaling and mean normalization?

A

x_i := (x_i - mu_i) / s_i

where mu_i is the average of all the values for feature i, and s_i is either the range of values (max - min) or the standard deviation.

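A per-column sketch of that formula, using the standard deviation for s_i (the data values are made up):

```python
import numpy as np

def normalize_features(X):
    """Return (X - mu) / s column-wise, plus mu and s for reuse on new inputs."""
    mu = X.mean(axis=0)  # mu_i: average of each feature
    s = X.std(axis=0)    # s_i: standard deviation (the range max - min also works)
    return (X - mu) / s, mu, s

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])  # e.g. house sizes and bedroom counts
X_norm, mu, s = normalize_features(X)
```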

17
Q

How do you make sure gradient descent is working correctly?

A

J(theta) should decrease after every iteration.

18
Q

What is an example of automatic convergence test?

A

Declare convergence if J(theta) decreases by less than 10^-3 in one iteration.
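
A sketch of that test, assuming cost_history holds J(theta) for each iteration so far:

```python
def has_converged(cost_history, epsilon=1e-3):
    """True once J(theta) decreased by less than epsilon in the last iteration."""
    return len(cost_history) >= 2 and cost_history[-2] - cost_history[-1] < epsilon
```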

19
Q

What are examples where gradient descent is not working?

A

When the cost goes up.
When the cost goes down and then up over time.
When the cost fluctuates up and down.
These usually indicate that alpha is too large or that there is a bug in the code.

20
Q

What happens if you have a sufficiently small alpha?

A

Cost should decrease for every iteration.

21
Q

What happens if alpha is too small?

A

Gradient descent can be too slow to converge.

22
Q

What is an example of how to choose alpha?

A

Try values spaced by roughly a factor of ten, e.g. …, 0.001, 0.01, 0.1, 1, …

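A sketch of that search, running a short gradient descent for each candidate alpha and keeping the J(theta) history to compare (all names are illustrative):

```python
import numpy as np

def compute_cost(theta, X, y):
    """Squared-error cost J(theta) = (1/2m) * sum((X*theta - y)^2)."""
    errors = X @ theta - y
    return errors @ errors / (2 * len(y))

def try_alphas(X, y, alphas=(0.001, 0.01, 0.1, 1.0), iters=50):
    """Run gradient descent with each alpha; inspect or plot the histories afterwards."""
    m = len(y)
    histories = {}
    for alpha in alphas:
        theta = np.zeros(X.shape[1])
        costs = []
        for _ in range(iters):
            theta = theta - alpha * (X.T @ (X @ theta - y)) / m  # one update step
            costs.append(compute_cost(theta, X, y))
        histories[alpha] = costs
    return histories
```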

23
Q

What is an example of choosing features efficiently?

A

Combining two input features into one (e.g., combining frontage and depth into a single area feature).

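A one-line sketch of such a combination, using the lectures' frontage-and-depth example (the array values are made up):

```python
import numpy as np

frontage = np.array([50.0, 40.0, 70.0])  # lot width
depth = np.array([100.0, 80.0, 120.0])   # lot depth
area = frontage * depth                  # one combined feature replacing the two
```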

24
Q

What happens if you simplify the hypothesis?

A

You MAY get a better model.

25
Q

What is polynomial regression?

A

Regression whose hypothesis function is a polynomial equation.

26
Q

What are the types of polynomial equation?

A

Quadratic, cubic, square root, etc.

27
Q

What do you do if the x values get too large?

A

Apply feature scaling. This matters especially for polynomial features such as x^2 and x^3, whose ranges grow very quickly.

28
Q

How do you change the curve of the hypothesis function?

A

We can change the behavior or curve of the hypothesis function by making it a quadratic, cubic, or square root function (or any other form).

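A sketch of building such features from a single input x, so that ordinary linear regression fits a curve (names are illustrative; as card 27 notes, feature scaling matters here because x^2 and x^3 grow quickly):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # a single raw feature, e.g. house size

# Columns 1, x, x^2, x^3 give a cubic hypothesis; 1, x, sqrt(x) a square-root one.
X_cubic = np.column_stack([np.ones_like(x), x, x**2, x**3])
X_sqrt = np.column_stack([np.ones_like(x), x, np.sqrt(x)])
```
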
29
Q

What is a normal equation?

A

A method to solve for theta analytically.

30
Q

How does the normal equation work?

A

In the 1-D case (theta in R): J(theta) = a*theta^2 + b*theta + c. Take the derivative dJ(theta)/dtheta, set it to zero, and solve for theta.

31
Q

What is an example of the normal equation?

A

(Worked example in the linked screenshot:)
https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/dykma6dwEea3qApInhZCFg_333df5f11086fee19c4fb81bc34d5125_Screenshot-2016-11-10-10.06.16.png?expiry=1601596800000&hmac=D8XVZ5vYo0V7wO9slGqUCSfFiE275RCAMU2mVjjcgUY

32
Q

What is the normal equation method about?

A

The normal equation method minimizes J by explicitly taking its derivatives with respect to the theta_j's and setting them to zero. This allows us to find the optimal theta without iteration.

33
Q

What is the normal equation formula?

A

theta = (X^T * X)^-1 * X^T * y

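A sketch of this formula in NumPy; np.linalg.pinv is used rather than a plain inverse so it also behaves when X^T*X is non-invertible (see cards 37 and 38):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T * X)^-1 * X^T * y, computed with the pseudo-inverse."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```
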
34
Q

What does the normal equation do?

A

Finds the optimal value of theta.

35
Q

What are the advantages and disadvantages of gradient descent and the normal equation?

A

Gradient descent: you need to choose alpha and it takes many iterations to find the optimal theta, but it works well even when n is large. Normal equation: no alpha to choose and no need to iterate, but you must compute (X^T * X)^-1, which is slow when n is very large.

36
Q

How many features is considered too large for the normal equation?

A

When n reaches or exceeds roughly 10,000, unless the computer is really fast at computing.

37
Q

What happens if X^T*X is non-invertible?

A

You can delete a feature that is linearly dependent on another, or delete one or more features when there are too many.

38
Q

If X^T*X is non-invertible, the common causes might be having...

A

1. Redundant features (linearly dependent).
2. Too many features (e.g., m <= n). In this case, delete some features or use regularization.