Week 2 Flashcards
What is linear regression with multiple variables?
It is also known as "multivariate linear regression".
Notation: n
Number of features.
Notation: x^(i)
Input (features) of the i-th training example.
Notation: x_j^(i)
Value of feature j in the i-th training example.
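A small sketch (assumed NumPy code, made-up numbers) of how this notation maps onto a design matrix:

    import numpy as np

    # Hypothetical data: m = 3 training examples, n = 2 features.
    # Row i holds x^(i), the features of the i-th training example.
    X = np.array([[2104, 5],
                  [1416, 3],
                  [1534, 3]])

    m, n = X.shape     # m = 3 examples, n = 2 features
    x_2 = X[1]         # x^(2): the 2nd training example (NumPy is 0-indexed)
    x_1_2 = X[1, 0]    # x_1^(2): value of feature 1 in the 2nd example -> 1416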
What is the hypothesis function?
h_theta(x) = theta^T x = theta_0 x_0 + theta_1 x_1 + … + theta_n x_n (theta transposed times x), with the convention x_0 = 1.
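A minimal sketch of the hypothesis in NumPy, with made-up theta values and x_0 = 1 prepended to the features:

    import numpy as np

    def hypothesis(theta, x):
        # h_theta(x) = theta^T x; x_0 = 1 carries the intercept theta_0.
        return theta @ x

    theta = np.array([80.0, 0.1, 50.0])  # assumed values for theta_0..theta_2
    x = np.array([1.0, 1416.0, 3.0])     # x_0 = 1, then the actual features
    print(hypothesis(theta, x))          # 80 + 0.1*1416 + 50*3 = 371.6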
Is theta a vector?
Yes, an (n+1)-dimensional vector: (theta_0, theta_1, …, theta_n).
What are parameters?
The entries theta_0, theta_1, …, theta_n of the vector theta.
How do you update the thetas?
theta_j := theta_j - alpha * dJ(theta)/dtheta_j, updated simultaneously for every j = 0, …, n.
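A vectorized sketch of one such update for linear regression, assuming X already includes the x_0 = 1 column:

    import numpy as np

    def gradient_descent_step(theta, X, y, alpha):
        # Simultaneous update for all j, using
        # dJ/dtheta_j = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i).
        m = len(y)
        gradient = X.T @ (X @ theta - y) / m
        return theta - alpha * gradient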
What is feature scaling?
Feature scaling is normalizing the features so they are all on a similar scale.
What does feature scaling do?
Feature scaling speeds up gradient descent by putting the input features in roughly the same range.
What does theta do on smaller ranges?
Theta descends quickly on smaller ranges.
What does theta do on larger ranges?
Theta descends slowly on larger ranges, so it oscillates inefficiently down to the optimum when the variables are very uneven.
Into what ranges should the input variables ideally fall?
-1<=x_i<=1 or -0.5<=x_i<=0.5
What is the definition of feature scaling?
Feature scaling involves dividing the input values by the range of the input variable, resulting in a new range of just one.
What is the definition of mean normalization?
Mean normalization involves subtracting the average value of an input variable from the values of that input variable, resulting in a new average of just zero for that variable.
How do you implement both feature scaling and mean normalization?
x_i := (x_i - mu_i) / s_i
where mu_i is the average of all the values for feature i and s_i is either the range of values (max - min) or the standard deviation.
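A sketch of that formula in NumPy, applied per feature (column); the function name is mine:

    import numpy as np

    def scale_features(X):
        # x_i := (x_i - mu_i) / s_i for each feature i
        mu = X.mean(axis=0)                # mu_i: average of feature i
        s = X.max(axis=0) - X.min(axis=0)  # s_i: range; X.std(axis=0) also works
        return (X - mu) / s, mu, s

Returning mu and s lets you apply the identical transform to new examples at prediction time.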
How do you make sure gradient descent is working correctly?
Plot J(theta) over the number of iterations; it should decrease after every iteration.
What is an example of automatic convergence test?
Declare convergence if J(theta) decreases by less than 10^-3 in one iteration.
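A minimal sketch of that test inside a training loop, assuming cost and step functions such as those sketched above:

    def train(theta, X, y, alpha, cost, step, epsilon=1e-3):
        # Stop once J(theta) decreases by less than epsilon in one iteration.
        J_prev = cost(theta, X, y)
        while True:
            theta = step(theta, X, y, alpha)
            J = cost(theta, X, y)
            if J_prev - J < epsilon:
                return theta  # note: also stops if J went up (alpha too large)
            J_prev = J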
What are signs that gradient descent is not working?
The cost goes up.
The cost goes down and then up over time.
The cost fluctuates up and down.
These usually mean alpha is too large, or that there is a bug in the code.
What happens if you have a sufficiently small alpha?
The cost should decrease on every iteration.
What happens if alpha is too small?
Gradient descent can be too slow to converge.
What is an example of how to choose alpha?
Try values spaced by roughly a factor of ten: …, 0.001, 0.01, 0.1, 1, …
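A self-contained sketch of such a sweep on made-up data; the largest alpha makes the cost blow up, illustrating the "too large" failure mode:

    import numpy as np

    def cost(theta, X, y):
        # J(theta) = (1/(2m)) * sum_i (h_theta(x^(i)) - y^(i))^2
        m = len(y)
        return ((X @ theta - y) ** 2).sum() / (2 * m)

    X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])  # x_0 = 1 column included
    y = np.array([1.0, 2.0, 3.0])

    for alpha in [0.001, 0.01, 0.1, 1.0]:
        theta = np.zeros(2)
        for _ in range(100):
            theta -= alpha * X.T @ (X @ theta - y) / len(y)
        print(alpha, cost(theta, X, y))  # alpha = 1.0 diverges on this data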
What is an example of choosing features efficiently?
Combining 2 input features into 1, e.g. combining the frontage and depth of a housing lot into a single feature, area = frontage * depth.
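A tiny sketch of that housing example with hypothetical numbers:

    import numpy as np

    frontage = np.array([50.0, 30.0, 40.0])  # hypothetical lot frontages
    depth = np.array([100.0, 60.0, 80.0])    # hypothetical lot depths

    # Replace the two features with one: area = frontage * depth.
    area = frontage * depth
    X = area.reshape(-1, 1)                  # new single-feature design matrix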
What happens if you simplify the hypothesis?
You MAY get a better model.