Supervised Learning: Regression and Classification Flashcards

1
Q

What are the two common types of supervised learning?

A

Regression and Classification

2
Q

Which of these is a type of unsupervised learning:
A.) Regression
B.) Classification
C.) Clustering

A

C.) Clustering

3
Q

For linear regression, the model is f_w,b(x) = wx + b. Which of the following are the inputs, or features, that are fed into the model and with which the model is expected to make a prediction?

A.) m
B.) x
C.) w and b
D.) (x,y)

A

B.) x
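A minimal NumPy sketch of this model; the values of w and b here are made-up illustrations, not learned parameters:

```python
import numpy as np

def predict(x, w, b):
    """Linear regression model f_w,b(x) = w*x + b.
    x may be a scalar or a NumPy array of feature values."""
    return w * x + b

# With w=2 and b=1, the prediction for x=3 is 2*3 + 1 = 7
print(predict(3, w=2, b=1))  # 7
```

The model takes only x as input; w and b are parameters the learning algorithm adjusts.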

4
Q

For linear regression, if you find parameters w and b so that J(w,b) is very close to zero, what can you conclude?

A.) The selected values of the parameters w and b cause the algorithm to fit the training set really poorly.

B.) This is never possible – there must be a bug in the code.

C.) The selected values of the parameters w and b cause the algorithm to fit the training set really well.

A

C.) The selected values of the parameters w and b cause the algorithm to fit the training set really well.
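A quick sketch of the squared-error cost; the training set below is a made-up example that happens to be exactly linear, so the true parameters drive J to zero:

```python
import numpy as np

def cost(w, b, x, y):
    """Squared-error cost J(w,b) = (1/2m) * sum((f(x_i) - y_i)^2)."""
    m = len(x)
    f = w * x + b
    return np.sum((f - y) ** 2) / (2 * m)

# Training data that follows y = 2x + 1 exactly
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

print(cost(2.0, 1.0, x, y))  # 0.0 -- a perfect fit makes J zero
print(cost(0.0, 0.0, x, y))  # large -- a poor fit makes J large
```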

5
Q

Gradient descent is an algorithm for finding values of parameters w and b that minimize the cost function J. When the derivative of J(w,b) is a negative number, what happens to w after one update step?
A.) w increases.

B.) It is not possible to tell if w will increase or decrease.

C.) w decreases.

D.) w stays the same

A

A.) w increases. The learning rate is always a positive number, so if you take w minus a negative number, you end up with a new value for w that is larger (more positive).
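One update step in code; the starting value of w and the derivative value are arbitrary illustrations:

```python
# Gradient descent update: w := w - alpha * dJ/dw
w = 5.0
alpha = 0.1    # learning rate, always positive
dj_dw = -2.0   # a negative derivative (illustrative value)

w_new = w - alpha * dj_dw
print(w_new)   # 5.2 -- subtracting a negative number increases w
```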

6
Q

Which of the following are the potential benefits of vectorization? Please choose the best option.
A.) It makes your code run faster
B.) It can make your code shorter
C.) It allows your code to run more easily on parallel compute hardware
D.) All of the above

A

D.) All of the above
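A small sketch contrasting the two styles; the feature and weight values are arbitrary:

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# Unvectorized: an explicit loop over the n features
f_loop = b
for j in range(len(w)):
    f_loop += w[j] * x[j]

# Vectorized: one call, shorter code, and NumPy can use
# parallel hardware (SIMD instructions) under the hood
f_vec = np.dot(w, x) + b

print(f_loop, f_vec)  # both compute the same prediction
```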

7
Q

True/False? To make gradient descent converge about twice as fast, a technique that almost always works is to double the learning rate alpha.

A

False; Doubling the learning rate may result in a learning rate that is too large and cause gradient descent to fail to find the optimal values for the parameters w and b.
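A toy illustration of why a larger alpha can fail, using J(w) = w², whose derivative is 2w (this simple cost function is an assumption for the demo, not from the cards above):

```python
# Gradient descent on J(w) = w^2, which has its minimum at w = 0.
def run(alpha, steps=20, w=1.0):
    for _ in range(steps):
        w = w - alpha * 2 * w  # dJ/dw = 2w
    return w

print(run(0.1))  # small alpha: w shrinks toward the minimum at 0
print(run(1.1))  # alpha too large: each step overshoots and |w| grows
```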

8
Q

Of the circumstances below, for which one is feature scaling particularly helpful?

A.) Feature scaling is helpful when one feature is much larger (or smaller) than another feature.
B.) Feature scaling is helpful when all the features in the original data (before scaling is applied) range from 0 to 1.

A

A.)

9
Q

You are helping a grocery store predict its revenue, and have data on its items sold per week, and price per item. What could be a useful engineered feature?

A.) For each product, calculate the number of items sold times price per item.
B.) For each product, calculate the number of items sold divided by the price per item.

A

A.) This feature can be interpreted as the revenue generated for each product.
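The engineered feature in code; the sales and price numbers are invented placeholder data:

```python
items_sold = [120, 45, 80]       # items sold per week (made-up data)
price_per_item = [2.5, 10.0, 4.0]

# Engineered feature: items sold * price = revenue per product
revenue = [n * p for n, p in zip(items_sold, price_per_item)]
print(revenue)  # [300.0, 450.0, 320.0]
```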

10
Q

True/False? With polynomial regression, the predicted value f_w,b(x) does not necessarily have to be a straight-line (linear) function of the input feature x.

A

True; A polynomial function can be non-linear. This can potentially help the model to fit the training data better.
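A sketch of how engineered polynomial features produce a curved prediction; the weights here are arbitrary illustrations:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Engineered features x and x^2: the model stays linear in the
# parameters, but f(x) = w1*x + w2*x^2 + b curves as a function of x.
X_poly = np.column_stack([x, x ** 2])
w = np.array([0.5, 2.0])
b = 1.0

f = X_poly @ w + b
print(f)  # increases faster and faster -- not a straight line in x
```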

11
Q

Which of the following is a valid step used during feature scaling?

A.) Subtract the mean (average) from each value and then divide by the (max - min).
B.) Add the mean (average) to each value and then divide by the (max - min).

A

A.) This is called mean normalization.
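Mean normalization in code, on made-up feature values:

```python
import numpy as np

x = np.array([100.0, 200.0, 300.0, 400.0])

# Mean normalization: subtract the mean, divide by (max - min).
# The scaled values are centered near zero.
x_scaled = (x - x.mean()) / (x.max() - x.min())
print(x_scaled)  # [-0.5, -0.1667, 0.1667, 0.5] (approximately)
```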

12
Q

Which is an example of a classification task?

A.) Based on the size of each tumor, determine if each tumor is malignant (cancerous) or not.
B.) Based on a patient’s blood pressure, determine how much blood pressure medication (a dosage measured in milligrams) the patient should be prescribed.
C.) Based on a patient’s age and blood pressure, determine how much blood pressure medication (measured in milligrams) the patient should be prescribed.

A

A.)

13
Q

Given the sigmoid function, if z is a large positive number, then:

A.) g(z) is near one (1)

B.) g(z) will be near 0.5

C.) g(z) is near negative one (-1)

D.) g(z) will be near zero (0)

A

A.)
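The sigmoid function in code, showing its behavior at large positive, zero, and large negative inputs:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10))   # near 1 for large positive z
print(sigmoid(0))    # exactly 0.5 at z = 0
print(sigmoid(-10))  # near 0 for large negative z
```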

14
Q

A cat photo classification model predicts 1 if it’s a cat, and 0 if it’s not a cat. For a particular photograph, the logistic regression model outputs g(z) (a number between 0 and 1). Which of these would be a reasonable criteria to decide whether to predict if it’s a cat?
A.) Predict it is a cat if g(z) < 0.5
B.) Predict it is a cat if g(z) = 0.5
C.) Predict it is a cat if g(z) < 0.7
D.) Predict it is a cat if g(z) >= 0.5

A

D.) Think of g(z) as the probability that the photo is of a cat. When this number is at or above the threshold of 0.5, predict that it is a cat.
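The decision rule in code; the probability values passed in are illustrative:

```python
def predict_cat(g_z, threshold=0.5):
    """Predict 1 (cat) when the model's output probability g(z)
    is at or above the threshold, otherwise predict 0 (not cat)."""
    return 1 if g_z >= threshold else 0

print(predict_cat(0.8))  # 1 -- likely a cat
print(predict_cat(0.3))  # 0 -- likely not a cat
print(predict_cat(0.5))  # 1 -- at the threshold, predict cat
```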

15
Q

True/False? No matter what features you use (including if you use polynomial features), the decision boundary learned by logistic regression will be a linear decision boundary.

A

False; The decision boundary can also be non-linear.

16
Q

“Cost” and “loss” have distinct meanings. Which one applies to a single training example?
A.) Loss
B.) Cost
C.) Both
D.) Neither

A

A.) Loss; loss is calculated on a single training example. It is worth noting that this definition is not universal.

17
Q

Which of the following two statements about gradient descent for logistic regression is more accurate?

A.) The update steps are identical to the update steps for linear regression.
B.) The update steps look like the update steps for linear regression, but the definition of f_w,b(x^i) is different.

A

B.) For logistic regression, f_w,b(x^i) is the sigmoid function instead of a straight line.
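A sketch of one gradient descent step for both models, showing that only the definition of f changes; the single-feature setup and starting parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(w, b, x, y, alpha, model="logistic"):
    """One gradient descent update for a single-feature model.
    The update formulas are identical in form for both models;
    only f_w,b(x) differs."""
    if model == "linear":
        f = w * x + b               # f_w,b(x) = wx + b
    else:
        f = sigmoid(w * x + b)      # f_w,b(x) = g(wx + b)
    m = len(x)
    dj_dw = np.sum((f - y) * x) / m
    dj_db = np.sum(f - y) / m
    return w - alpha * dj_dw, b - alpha * dj_db

# A tiny made-up binary dataset
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
w1, b1 = step(0.0, 0.0, x, y, alpha=0.1)
print(w1, b1)  # w moves away from 0 toward a better fit
```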

18
Q

Which of the following can address overfitting?

A.) Apply regularization
B.) Remove a random set of training examples
C.) Select a subset of the more relevant features.
D.) Collect more training data

A

A.) Apply regularization: regularization is used to reduce overfitting.
C.) Select a subset of the more relevant features: if the model trains on the more relevant features, and not on the less useful features, it may generalize better to new examples.
D.) Collect more training data: if the model trains on more data, it may generalize better to new examples.

19
Q

Suppose you have a regularized linear regression model. If you increase the regularization parameter λ, what do you expect to happen to the parameters w_1, w_2, …, w_n?

A.) This will reduce the size of the parameters w_1, w_2, …, w_n.
B.) This will increase the size of the parameters w_1, w_2, …, w_n.

A

A.) Regularization reduces overfitting by reducing the size of the parameters w_1, w_2, …, w_n.
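A toy demonstration of the shrinking effect; the dataset, step counts, and λ values are illustrative assumptions, and the model is simplified to a single weight with no intercept:

```python
import numpy as np

def fit_w(lam, x, y, alpha=0.1, steps=1000, w=0.0):
    """Fit w for f(x) = w*x by gradient descent on the regularized
    cost J(w) = (1/2m)*sum((w*x_i - y_i)^2) + (lam/2m)*w^2.
    The regularization term adds (lam/m)*w to the gradient."""
    m = len(x)
    for _ in range(steps):
        f = w * x
        dj_dw = np.sum((f - y) * x) / m + (lam / m) * w
        w = w - alpha * dj_dw
    return w

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x  # true relationship: y = 2x

print(fit_w(lam=0.0, x=x, y=y))   # close to 2.0 with no regularization
print(fit_w(lam=10.0, x=x, y=y))  # smaller: larger lambda shrinks w
```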