Supervised Learning: Regression and Classification Flashcards

1
Q

What are the two common types of supervised learning?

A

Regression and Classification

2
Q

Which of these is a type of unsupervised learning:
A.) Regression
B.) Classification
C.) Clustering

A

C.) Clustering

3
Q

For linear regression, the model is f_w,b(x) = wx + b. Which of the following are the inputs, or features, that are fed into the model and with which the model is expected to make a prediction?

A.) m
B.) x
C.) w and b
D.) (x,y)

A

B.) x
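A minimal NumPy sketch of this model; the values of w and b here are made-up illustrations, not learned parameters:

```python
import numpy as np

def predict(x, w, b):
    """Linear regression model f_w,b(x) = w*x + b.
    x may be a scalar or a NumPy array of feature values."""
    return w * x + b

# With w=2 and b=1, the prediction for x=3 is 2*3 + 1 = 7
print(predict(3, w=2, b=1))  # 7
```

The model takes only x as input; w and b are parameters the learning algorithm adjusts.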

4
Q

For linear regression, if you find parameters w and b so that J(w,b) is very close to zero, what can you conclude?

A.) The selected values of the parameters w and b cause the algorithm to fit the training set really poorly.

B.) This is never possible – there must be a bug in the code.

C.) The selected values of the parameters w and b cause the algorithm to fit the training set really well.

A

C.) The selected values of the parameters w and b cause the algorithm to fit the training set really well.
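A quick sketch of the squared-error cost; the training set below is a made-up example that happens to be exactly linear, so the true parameters drive J to zero:

```python
import numpy as np

def cost(w, b, x, y):
    """Squared-error cost J(w,b) = (1/2m) * sum((f(x_i) - y_i)^2)."""
    m = len(x)
    f = w * x + b
    return np.sum((f - y) ** 2) / (2 * m)

# Training data that follows y = 2x + 1 exactly
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

print(cost(2.0, 1.0, x, y))  # 0.0 -- a perfect fit makes J zero
print(cost(0.0, 0.0, x, y))  # large -- a poor fit makes J large
```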

5
Q

Gradient descent is an algorithm for finding values of parameters w and b that minimize the cost function J. When the derivative of J(w,b) is a negative number, what happens to w after one update step?
A.) w increases.

B.) It is not possible to tell if w will increase or decrease.

C.) w decreases.

D.) w stays the same

A

A.) w increases. The learning rate is always a positive number, so if you take w minus a negative number, you end up with a new value for w that is larger (more positive).
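One update step in code; the starting value of w and the derivative value are arbitrary illustrations:

```python
# Gradient descent update: w := w - alpha * dJ/dw
w = 5.0
alpha = 0.1    # learning rate, always positive
dj_dw = -2.0   # a negative derivative (illustrative value)

w_new = w - alpha * dj_dw
print(w_new)   # 5.2 -- subtracting a negative number increases w
```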

6
Q

Which of the following are the potential benefits of vectorization? Please choose the best option.
A.) It makes your code run faster
B.) It can make your code shorter
C.) It allows your code to run more easily on parallel compute hardware
D.) All of the above

A

D.) All of the above
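A small sketch contrasting the two styles; the feature and weight values are arbitrary:

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# Unvectorized: an explicit loop over the n features
f_loop = b
for j in range(len(w)):
    f_loop += w[j] * x[j]

# Vectorized: one call, shorter code, and NumPy can use
# parallel hardware (SIMD instructions) under the hood
f_vec = np.dot(w, x) + b

print(f_loop, f_vec)  # both compute the same prediction
```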

7
Q

True/False? To make gradient descent converge about twice as fast, a technique that almost always works is to double the learning rate alpha.

A

False; Doubling the learning rate may result in a learning rate that is too large and cause gradient descent to fail to find the optimal values for the parameters w and b.
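A toy illustration of why a larger alpha can fail, using J(w) = w², whose derivative is 2w (this simple cost function is an assumption for the demo, not from the cards above):

```python
# Gradient descent on J(w) = w^2, which has its minimum at w = 0.
def run(alpha, steps=20, w=1.0):
    for _ in range(steps):
        w = w - alpha * 2 * w  # dJ/dw = 2w
    return w

print(run(0.1))  # small alpha: w shrinks toward the minimum at 0
print(run(1.1))  # alpha too large: each step overshoots and |w| grows
```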

8
Q

Of the circumstances below, for which one is feature scaling particularly helpful?

A.) Feature scaling is helpful when one feature is much larger (or smaller) than another feature.
B.) Feature scaling is helpful when all the features in the original data (before scaling is applied) range from 0 to 1.

A

A.)

9
Q

You are helping a grocery store predict its revenue, and have data on its items sold per week, and price per item. What could be a useful engineered feature?

A.) For each product, calculate the number of items sold times price per item.
B.) For each product, calculate the number of items sold divided by the price per item.

A

A.) This feature can be interpreted as the revenue generated for each product.
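The engineered feature in code; the sales and price numbers are invented placeholder data:

```python
items_sold = [120, 45, 80]       # items sold per week (made-up data)
price_per_item = [2.5, 10.0, 4.0]

# Engineered feature: items sold * price = revenue per product
revenue = [n * p for n, p in zip(items_sold, price_per_item)]
print(revenue)  # [300.0, 450.0, 320.0]
```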

10
Q

True/False? With polynomial regression, the predicted value f_w,b(x) does not necessarily have to be a straight-line (linear) function of the input feature x.

A

True; A polynomial function can be non-linear. This can potentially help the model to fit the training data better.
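A sketch of how engineered polynomial features produce a curved prediction; the weights here are arbitrary illustrations:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Engineered features x and x^2: the model stays linear in the
# parameters, but f(x) = w1*x + w2*x^2 + b curves as a function of x.
X_poly = np.column_stack([x, x ** 2])
w = np.array([0.5, 2.0])
b = 1.0

f = X_poly @ w + b
print(f)  # increases faster and faster -- not a straight line in x
```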

11
Q

Which of the following is a valid step used during feature scaling?

A.) Subtract the mean (average) from each value and then divide by the (max - min).
B.) Add the mean (average) to each value and then divide by the (max - min).

A

A.) This is called mean normalization.
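Mean normalization in code, on made-up feature values:

```python
import numpy as np

x = np.array([100.0, 200.0, 300.0, 400.0])

# Mean normalization: subtract the mean, divide by (max - min).
# The scaled values are centered near zero.
x_scaled = (x - x.mean()) / (x.max() - x.min())
print(x_scaled)  # [-0.5, -0.1667, 0.1667, 0.5] (approximately)
```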

12
Q

Which is an example of a classification task?

A.) Based on the size of each tumor, determine if each tumor is malignant (cancerous) or not.
B.) Based on a patient’s blood pressure, determine how much blood pressure medication (a dosage measured in milligrams) the patient should be prescribed.
C.) Based on a patient’s age and blood pressure, determine how much blood pressure medication (measured in milligrams) the patient should be prescribed.

A

A.)

13
Q

Given the sigmoid function, if z is a large positive number, then:

A.) g(z) is near one (1)

B.) g(z) will be near 0.5

C.) g(z) is near negative one (-1)

D.) g(z) will be near zero (0)

A

A.)
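The sigmoid function in code, showing its behavior at large positive, zero, and large negative inputs:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10))   # near 1 for large positive z
print(sigmoid(0))    # exactly 0.5 at z = 0
print(sigmoid(-10))  # near 0 for large negative z
```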

14
Q

A cat photo classification model predicts 1 if it’s a cat, and 0 if it’s not a cat. For a particular photograph, the logistic regression model outputs g(z) (a number between 0 and 1). Which of these would be a reasonable criteria to decide whether to predict if it’s a cat?
A.) Predict it is a cat if g(z) < 0.5
B.) Predict it is a cat if g(z) = 0.5
C.) Predict it is a cat if g(z) < 0.7
D.) Predict it is a cat if g(z) >= 0.5

A

D.) Think of g(z) as the probability that the photo is of a cat. When this number is at or above the threshold of 0.5, predict that it is a cat.
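The decision rule in code; the probability values passed in are illustrative:

```python
def predict_cat(g_z, threshold=0.5):
    """Predict 1 (cat) when the model's output probability g(z)
    is at or above the threshold, otherwise predict 0 (not cat)."""
    return 1 if g_z >= threshold else 0

print(predict_cat(0.8))  # 1 -- likely a cat
print(predict_cat(0.3))  # 0 -- likely not a cat
print(predict_cat(0.5))  # 1 -- at the threshold, predict cat
```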

15
Q

True/False? No matter what features you use (including if you use polynomial features), the decision boundary learned by logistic regression will be a linear decision boundary.

A

False; The decision boundary can also be non-linear.

16
Q

“Cost” and “loss” have distinct meanings. Which one applies to a single training example?
A.) Loss
B.) Cost
C.) Both
D.) Neither

A

A.) Loss; loss is calculated on a single training example. It is worth noting that this definition is not universal.

17
Q

Which of the following two statements about gradient descent for logistic regression is more accurate?

A.) The update steps are identical to the update steps for linear regression.
B.) The update steps look like the update steps for linear regression, but the definition of f_w,b(x^i) is different.

A

B.) For logistic regression, f_w,b(x^i) is the sigmoid function instead of a straight line.
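A sketch of one gradient descent step for both models, showing that only the definition of f changes; the single-feature setup and starting parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(w, b, x, y, alpha, model="logistic"):
    """One gradient descent update for a single-feature model.
    The update formulas are identical in form for both models;
    only f_w,b(x) differs."""
    if model == "linear":
        f = w * x + b               # f_w,b(x) = wx + b
    else:
        f = sigmoid(w * x + b)      # f_w,b(x) = g(wx + b)
    m = len(x)
    dj_dw = np.sum((f - y) * x) / m
    dj_db = np.sum(f - y) / m
    return w - alpha * dj_dw, b - alpha * dj_db

# A tiny made-up binary dataset
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
w1, b1 = step(0.0, 0.0, x, y, alpha=0.1)
print(w1, b1)  # w moves away from 0 toward a better fit
```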

18
Q

Which of the following can address overfitting?

A.) Apply regularization
B.) Remove a random set of training examples
C.) Select a subset of the more relevant features.
D.) Collect more training data

A

A.) Apply regularization: regularization is used to reduce overfitting.
C.) Select a subset of the more relevant features: if the model trains on the more relevant features, and not on the less useful features, it may generalize better to new examples.
D.) Collect more training data: if the model trains on more data, it may generalize better to new examples.

19
Q

Suppose you have a regularized linear regression model. If you increase the regularization parameter λ, what do you expect to happen to the parameters w_1, w_2, …, w_n?

A.) This will reduce the size of the parameters w_1, w_2, …, w_n.
B.) This will increase the size of the parameters w_1, w_2, …, w_n.

A

A.) Regularization reduces overfitting by reducing the size of the parameters w_1, w_2, …, w_n.
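A toy demonstration of the shrinking effect; the dataset, step counts, and λ values are illustrative assumptions, and the model is simplified to a single weight with no intercept:

```python
import numpy as np

def fit_w(lam, x, y, alpha=0.1, steps=1000, w=0.0):
    """Fit w for f(x) = w*x by gradient descent on the regularized
    cost J(w) = (1/2m)*sum((w*x_i - y_i)^2) + (lam/2m)*w^2.
    The regularization term adds (lam/m)*w to the gradient."""
    m = len(x)
    for _ in range(steps):
        f = w * x
        dj_dw = np.sum((f - y) * x) / m + (lam / m) * w
        w = w - alpha * dj_dw
    return w

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x  # true relationship: y = 2x

print(fit_w(lam=0.0, x=x, y=y))   # close to 2.0 with no regularization
print(fit_w(lam=10.0, x=x, y=y))  # smaller: larger lambda shrinks w
```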