Supervised Learning: Regression and Classification Flashcards
What are the two common types of supervised learning?
Regression and Classification
Which of these is a type of unsupervised learning?
A.) Regression
B.) Classification
C.) Clustering
C.) Clustering
For linear regression, the model is f_w,b(x) = wx + b. Which of the following are the inputs, or features, that are fed into the model and with which the model is expected to make a prediction?
A.) m
B.) x
C.) w and b
D.) (x,y)
B.) x
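A minimal NumPy sketch of the prediction step (the values of w, b, and x are made up for illustration):
```python
import numpy as np

# Hypothetical values: w and b are the learned parameters,
# x holds the input features the model predicts from.
w, b = 200.0, 100.0
x = np.array([1.0, 1.5, 2.0])  # e.g., house sizes in 1000s of sq ft

f_wb = w * x + b  # f_w,b(x) = wx + b
print(f_wb)       # one prediction per input: [300. 400. 500.]
```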
For linear regression, if you find parameters w and b such that J(w,b) is very close to zero, what can you conclude?
A.) The selected values of the parameters w and b cause the algorithm to fit the training set really poorly.
B.) This is never possible – there must be a bug in the code.
C.) The selected values of the parameters w and b cause the algorithm to fit the training set really well.
C.) The selected values of the parameters w and b cause the algorithm to fit the training set really well.
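A sketch of the squared error cost J(w,b), assuming NumPy arrays for the training set (values are illustrative):
```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost: J(w,b) = (1/2m) * sum((w*x + b - y)^2)."""
    m = x.shape[0]
    err = w * x + b - y
    return (err @ err) / (2 * m)

# If the line passes through every training point, J is 0: a perfect fit.
x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])
print(compute_cost(x, y, w=200.0, b=100.0))  # 0.0
```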
Gradient descent is an algorithm for finding values of parameters w and b that minimize the cost function J. When the derivative of J(w,b) is a negative number, what happens to w after one update step?
A.) w increases.
B.) It is not possible to tell if w will increase or decrease.
C.) w decreases
D.) w stays the same
A.) w increases. The learning rate is always a positive number, so when the derivative is negative you subtract a negative number from w, leaving a new value of w that is larger (more positive).
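A one-step sketch of the update w := w - alpha * (dJ/dw), with a made-up negative derivative, showing why w increases:
```python
alpha = 0.01           # learning rate, always positive
w = 1.0
dj_dw = -4.0           # a negative derivative of J at the current w

w = w - alpha * dj_dw  # subtracting a negative number increases w
print(w)               # 1.04 > 1.0
```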
Which of the following are the potential benefits of vectorization? Please choose the best option.
A.) It makes your code run faster
B.) It can make your code shorter
C.) It allows your code to run more easily on parallel compute hardware
D.) All of the above
D.) All of the above
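A quick comparison of a Python loop against the vectorized NumPy dot product; the vectorized form is shorter and runs far faster on parallel-friendly hardware:
```python
import numpy as np

w = np.random.rand(1_000_000)
x = np.random.rand(1_000_000)

# Unvectorized: explicit Python loop over every element
f = 0.0
for j in range(w.shape[0]):
    f += w[j] * x[j]

# Vectorized: one call to an optimized routine
f_vec = np.dot(w, x)

print(np.isclose(f, f_vec))  # True: same result, much faster
```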
True/False? To make gradient descent converge about twice as fast, a technique that almost always works is to double the learning rate alpha.
False; doubling the learning rate may make it too large, causing gradient descent to overshoot the minimum and fail to find the optimal values for the parameters w and b.
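A small demonstration using J(w) = w^2, whose derivative is 2w: a modest learning rate converges toward the minimum at w = 0, while a learning rate that is too large overshoots and diverges:
```python
def run_gd(alpha, steps=5, w=1.0):
    for _ in range(steps):
        w = w - alpha * 2 * w  # gradient of J(w) = w^2 is 2w
    return w

print(run_gd(alpha=0.1))  # ~0.33: shrinking toward the minimum
print(run_gd(alpha=1.5))  # -32.0: |w| grows every step (divergence)
```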
Of the circumstances below, for which one is feature scaling particularly helpful?
A.) Feature scaling is helpful when one feature is much larger (or smaller) than another feature.
B.) Feature scaling is helpful when all the features in the original data (before scaling is applied) range from 0 to 1.
A.) Feature scaling is helpful when one feature is much larger (or smaller) than another; rescaling them to comparable ranges helps gradient descent converge much faster.
You are helping a grocery store predict its revenue, and have data on the number of items sold per week and the price per item. What could be a useful engineered feature?
A.) For each product, calculate the number of items sold times price per item.
B.) For each product, calculate the number of items sold divided by the price per item.
A.) This feature can be interpreted as the revenue generated for each product.
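A sketch of the engineered feature, assuming hypothetical per-product arrays:
```python
import numpy as np

items_sold = np.array([120, 45, 300])  # items sold per week (hypothetical)
price = np.array([2.50, 10.00, 1.25])  # price per item in dollars

# Engineered feature: estimated weekly revenue per product
revenue = items_sold * price
print(revenue)  # [300. 450. 375.]
```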
True/False? With polynomial regression, the predicted value f_w,b(x) does not necessarily have to be a straight-line (linear) function of the input feature x.
True; A polynomial function can be non-linear. This can potentially help the model to fit the training data better.
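A sketch of polynomial features: the model stays linear in its parameters but becomes a curve in x (the weights are made up):
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
X_poly = np.c_[x, x**2, x**3]  # features: x, x^2, x^3

w = np.array([1.0, 0.5, 0.1])  # hypothetical learned weights
b = 2.0
f = X_poly @ w + b             # a cubic curve in x, not a straight line
print(f)                       # [ 3.6  6.8 12.2]
```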
Which of the following is a valid step used during feature scaling?
A.) Subtract the mean (average) from each value and then divide by the (max - min).
B.) Add the mean (average) to each value and then divide by (max - min).
A.) This is called mean normalization.
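A sketch of mean normalization on a made-up feature column:
```python
import numpy as np

x = np.array([200.0, 600.0, 1000.0])  # hypothetical raw feature values

# Subtract the mean, then divide by the range (max - min)
x_norm = (x - x.mean()) / (x.max() - x.min())
print(x_norm)  # [-0.5  0.   0.5]: values now centered near zero
```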
Which is an example of a classification task?
A.) Based on the size of each tumor, determine if each tumor is malignant (cancerous) or not.
B.) Based on a patient’s blood pressure, determine how much blood pressure medication (a dosage measured in milligrams) the patient should be prescribed.
C.) Based on a patient’s age and blood pressure, determine how much blood pressure medication (measured in milligrams) the patient should be prescribed.
A.) Classification predicts one of a small set of discrete categories (here, malignant or not); predicting a medication dosage in milligrams is a regression task.
Given the sigmoid function, if z is a large positive number, then:
A.) g(z) is near one (1)
B.) g(z) will be near 0.5
C.) g(z) is near negative one (-1)
D.) g(z) will be near zero (0)
A.) For large positive z, e^(-z) approaches 0, so g(z) = 1 / (1 + e^(-z)) approaches 1.
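A sketch of the sigmoid g(z) = 1 / (1 + e^(-z)) and its behavior at the extremes:
```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(100.0))   # ~1.0 for large positive z
print(sigmoid(0.0))     # 0.5 at z = 0
print(sigmoid(-100.0))  # ~0.0 for large negative z
```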
A cat photo classification model predicts 1 if it’s a cat, and 0 if it’s not a cat. For a particular photograph, the logistic regression model outputs g(z) (a number between 0 and 1). Which of these would be a reasonable criterion to decide whether to predict if it’s a cat?
A.) Predict it is a cat if g(z) < 0.5
B.) Predict it is a cat if g(z) = 0.5
C.) Predict it is a cat if g(z) < 0.7
D.) Predict it is a cat if g(z) >= 0.5
D.) Think of g(z) as the probability that the photo is of a cat. When this number is at or above the threshold of 0.5, predict that it is a cat.
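A sketch of the 0.5 threshold rule (the model outputs are made up):
```python
import numpy as np

g_z = np.array([0.1, 0.5, 0.8])        # model outputs P(cat) for 3 photos

prediction = (g_z >= 0.5).astype(int)  # predict 1 (cat) when g(z) >= 0.5
print(prediction)                      # [0 1 1]
```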
True/False? No matter what features you use (including if you use polynomial features), the decision boundary learned by logistic regression will be a linear decision boundary.
False; when polynomial features are used, the decision boundary learned by logistic regression can be non-linear (for example, a circle).
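A sketch of such a non-linear boundary: with squared features and hand-picked weights, the boundary z = 0 is the unit circle:
```python
def predict(x1, x2):
    # z = w1*x1^2 + w2*x2^2 + b with w1 = w2 = 1 and b = -1:
    # the decision boundary z = 0 is the circle x1^2 + x2^2 = 1
    z = x1**2 + x2**2 - 1.0
    return int(z >= 0)

print(predict(0.2, 0.2))  # 0: inside the circle
print(predict(1.0, 1.0))  # 1: outside the circle
```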