Week 2 Flashcards

1
Q

What is linear regression with multiple variables?

A

Linear regression with more than one input feature. It is also known as "multivariate linear regression."

2
Q

Notation: n

A

Number of features.

3
Q

Notation: x^(i)

A

Input (features) of the ith training example.

4
Q

Notation: x_j^(i)

A

Value of feature j in the ith training example.

5
Q

What is the hypothesis function?

A

h_theta(x) = theta^T x (theta transposed times x)

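A minimal numpy sketch of this hypothesis (Python purely for illustration; all values and names here are hypothetical):

```python
import numpy as np

# One training example with 2 features, plus the bias term x_0 = 1.
x = np.array([1.0, 2104.0, 3.0])     # [x_0, x_1, x_2]
theta = np.array([80.0, 0.1, 50.0])  # parameter vector [theta_0, theta_1, theta_2]

# h_theta(x) = theta^T x
h = theta @ x
print(h)  # the predicted value for this example
```
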
6
Q

Is theta a vector?

A

Yes. With n features, theta is an (n+1)-dimensional vector, since it includes theta_0 for the bias term x_0 = 1.

7
Q

What are parameters?

A

The theta values; collectively, the vector theta that the learning algorithm fits to the training data.

8
Q

How do you update the thetas?

A

theta_j := theta_j - alpha * ∂J(theta)/∂theta_j (updating every theta_j simultaneously)

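A vectorized sketch of this update rule for linear regression, assuming X already carries a leading column of ones (data and names are illustrative):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j.

    For the squared-error cost J(theta) = (1/2m) * sum((X@theta - y)^2),
    the partial derivative with respect to theta_j is
    (1/m) * sum((X@theta - y) * x_j), written below in vector form.
    """
    m = len(y)
    grad = (X.T @ (X @ theta - y)) / m
    return theta - alpha * grad

# Tiny illustrative data set; the first column is the bias x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.zeros(2)
for _ in range(1000):
    theta = gradient_descent_step(theta, X, y, alpha=0.1)
print(theta)  # approaches [0, 1], since y = x exactly
```
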
9
Q

What is feature scaling?

A

Feature scaling is normalizing the input features so they lie in similar ranges.

10
Q

What does feature scaling do?

A

Feature scaling speeds up gradient descent by putting the inputs in roughly the same range.

11
Q

What does theta do on smaller ranges?

A

Theta descends quickly on smaller ranges.

12
Q

What does theta do on larger ranges?

A

Theta descends slowly on larger ranges.

13
Q

What ranges should the input variables ideally be in?

A

-1 <= x_i <= 1, or -0.5 <= x_i <= 0.5

14
Q

What is the definition of feature scaling?

A

Feature scaling involves dividing the input values by the range of the input variable, resulting in a new range of just one.

15
Q

What is the definition of mean normalization?

A

Mean normalization involves subtracting the average value of an input variable from the values for that input variable, resulting in a new average of zero for that input variable.

16
Q

How do you implement both feature scaling and mean normalization?

A

x_i := (x_i - mu_i) / s_i

Where mu_i is the average of all the values for feature i, and s_i is either the range of values (max - min) or the standard deviation.
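
A sketch of this in numpy, using the standard deviation for s_i (names are illustrative):

```python
import numpy as np

def normalize_features(X):
    """Mean-normalize and scale every column: (x_i - mu_i) / s_i.

    s_i here is the standard deviation; X.max(axis=0) - X.min(axis=0)
    would give the range-based variant instead.
    """
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

# Hypothetical features with very different ranges (e.g. size, bedrooms).
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]])
X_norm, mu, s = normalize_features(X)
print(X_norm.mean(axis=0))  # ~0 per column after mean normalization
print(X_norm.std(axis=0))   # ~1 per column after scaling
```

Keeping mu and s around lets you normalize new examples the same way before making predictions.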

17
Q

How do you make sure gradient descent is working correctly?

A

J(theta) should decrease after every iteration.

18
Q

What is an example of automatic convergence test?

A

Declare convergence if J(theta) decreases by less than 10^-3 in one iteration.
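
A sketch of such a test over a recorded cost history, using the 10^-3 threshold from this card (names are illustrative):

```python
def has_converged(J_history, epsilon=1e-3):
    """Declare convergence once J(theta) dropped by less than epsilon."""
    return len(J_history) >= 2 and J_history[-2] - J_history[-1] < epsilon

print(has_converged([10.0, 5.0, 4.9995]))  # True: the last decrease is below 10^-3
```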

19
Q

What are examples where gradient descent is not working?

A

When the cost goes up over iterations.
When the cost goes down, then up, repeatedly over time.
When the cost fluctuates.
These symptoms usually mean alpha is too large or that there is a bug in the code.

20
Q

What happens if you have a sufficiently small alpha?

A

The cost should decrease on every iteration.

21
Q

What happens if alpha is too small?

A

Gradient descent can be too slow to converge.

22
Q

What is an example to choose alpha?

A

Try values spaced on a log scale: …, 0.001, 0.01, 0.1, 1, …
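
A sketch of trying these rates in turn; with this illustrative data the largest alpha visibly diverges, which is exactly the symptom the earlier cards describe:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column included
y = np.array([1.0, 2.0, 3.0])
m = len(y)

for alpha in [0.001, 0.01, 0.1, 1.0]:
    theta = np.zeros(2)
    for _ in range(100):
        theta -= alpha * (X.T @ (X @ theta - y)) / m  # gradient descent step
    J = ((X @ theta - y) ** 2).sum() / (2 * m)
    print(f"alpha={alpha}: J={J:.6g}")  # too small: slow progress; too large: J blows up
```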

23
Q

What is an example of choosing features efficiently?

A

Combining two input features into one, e.g. multiplying a plot's frontage and depth into a single area feature.

24
Q

What happens if you simplify the hypothesis?

A

You may get a better model.

25
Q

What is polynomial regression?

A

Regression whose hypothesis function is a polynomial in the input features.

26
Q

What are the types of polynomial equation?

A

Quadratic, cubic, square root, etc.

27
Q

What do you do if the x values get too large?

A

You use feature scaling. For example, if x ranges up to 1,000, then x^2 ranges up to 10^6 and x^3 up to 10^9.

28
Q

How do you change the curve of the hypothesis function?

A

We can change the behavior or curve of the hypothesis function by making it a quadratic, cubic, square root, or any other form.
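
A sketch of building such polynomial features, including the feature scaling the earlier card says these blown-up ranges require (values are illustrative):

```python
import numpy as np

x = np.array([100.0, 200.0, 300.0, 400.0])  # e.g. a single size feature

# x_1 = x, x_2 = x^2, x_3 = x^3 lets the linear hypothesis fit a cubic curve.
X_poly = np.column_stack([x, x**2, x**3])

# Scale: without this, x^3 reaches 6.4e7 while x stays below 400.
X_poly = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)

# Prepend the bias column x_0 = 1.
X_poly = np.column_stack([np.ones(len(x)), X_poly])
print(X_poly)
```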

29
Q

What is a normal equation?

A

A method to solve for theta analytically.

30
Q

How does the normal equation work?

A

If theta is a scalar (theta ∈ R):
J(theta) = a*theta^2 + b*theta + c
Set dJ(theta)/dtheta = 2a*theta + b = 0
Solve for theta.

31
Q

What is an example of normal equation?

A

https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/dykma6dwEea3qApInhZCFg_333df5f11086fee19c4fb81bc34d5125_Screenshot-2016-11-10-10.06.16.png?expiry=1601596800000&hmac=D8XVZ5vYo0V7wO9slGqUCSfFiE275RCAMU2mVjjcgUY

32
Q

What is the normal equation method about?

A

The normal equation method minimizes J by explicitly taking its derivatives with respect to the theta_j's and setting them to zero. This allows us to find the optimal theta without iteration.

33
Q

What is the formula for the normal equation?

A

theta = (X^T X)^-1 X^T y
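
A one-line numpy sketch of this formula (data is illustrative; pinv rather than a plain inverse also covers the noninvertible X^T X case from the later cards):

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column included
y = np.array([1.0, 2.0, 3.0])

# theta = (X^T X)^-1 X^T y; pinv keeps working when X^T X is noninvertible.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)  # [0, 1] here, since y = x exactly
```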

34
Q

What does the normal equation do?

A

Finds the optimal value of theta directly.

35
Q

What are the advantages and disadvantages of gradient descent and normal equation?

A

With gradient descent, you need to choose alpha and it takes many iterations to find the optimal theta, but it works well even when n is large. The normal equation needs no alpha and no iteration, but it has to compute (X^T X)^-1, which is slow when n is very large.

36
Q

How many features is considered too large for the normal equation?

A

When n reaches or exceeds roughly 10,000, unless the computer is very fast: computing (X^T X)^-1 costs on the order of O(n^3).

37
Q

What happens if X^TX is non-invertible?

A

You can delete a feature that is linearly dependent on another, or delete one or more features when there are too many.

38
Q

If X^T X is noninvertible, the common causes might be having…

A
1. Redundant features (linearly dependent).

2. Too many features (e.g. m <= n). In this case, delete some features or use regularization.