Module 3 Flashcards
Consider 5 points on a line with coordinates and labels as follows:
- Point A: x = 10, label = “+”
- Point B: x = 25, label = “-”
- Point C: x = 30, label = “+”
- Point D: x = 36, label = “+”
- Point E: x = 37, label = “-”
Using the nearest neighbor algorithm with k = 3 to assign a label, what would be the label for point F with x = 32.50?
a. “+”
b. “-”
a. “+”
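The vote behind this answer can be checked with a minimal sketch. The helper name `knn_label` is hypothetical and only illustrates majority voting among the k closest points on a line; it is not a library API.

```python
from collections import Counter

def knn_label(points, query, k):
    """Return the majority label among the k points closest to query (1-D)."""
    # Sort by absolute distance to the query and keep the k nearest points.
    nearest = sorted(points, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Points A to E from the card above.
points = [(10, "+"), (25, "-"), (30, "+"), (36, "+"), (37, "-")]

# Nearest three to x = 32.5 are C (2.5 away), D (3.5) and E (4.5): two "+", one "-".
print(knn_label(points, 32.5, k=3))  # prints +
```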
Logistic regression is used
a. to predict a binary variable from continuous or binary variables.
b. to predict a continuous variable from binary variables.
c. to predict any categorical variable from several other categorical variables.
d. to predict a continuous variable from binary or continuous variables.
a. to predict a binary variable from continuous or binary variables.
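As a rough illustration of how a binary outcome comes out of continuous or binary inputs, the sketch below passes a weighted sum through the logistic (sigmoid) function and thresholds the result at 0.5. The weights, bias, and function names are made-up values for illustration, not fitted parameters.

```python
import math

def predict_proba(x, weights, bias):
    """Probability of class 1: the sigmoid of a weighted sum of the features."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(x, weights, bias, threshold=0.5):
    """Turn the probability into a binary label (0 or 1)."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical model: one continuous feature and one binary feature.
weights = [0.8, -0.5]
bias = -1.0

# z = -1.0 + 0.8 * 3.0 - 0.5 * 1 = 0.9, so the probability is above 0.5.
print(predict_label([3.0, 1], weights, bias))  # prints 1
```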
Both k-NN and logistic regression are supervised machine learning algorithms.
True
False
True
Which of the following is true about data scaling?
a. required in both logistic regression and k-NN
b. optional in both logistic regression and k-NN
c. optional in logistic regression, required in k-NN
d. required in logistic regression, optional in k-NN
c. optional in logistic regression, required in k-NN
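One way to see why k-NN depends on scaling: distances are dominated by whichever feature has the largest numeric range. The sketch below uses invented (age, income) rows and a hypothetical `min_max_scale` helper; the nearest neighbor changes once both features are rescaled to [0, 1].

```python
import math

def min_max_scale(rows, reference):
    """Rescale each column of rows to [0, 1] using the min/max of reference."""
    columns = list(zip(*reference))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest_index(train, query):
    """Index of the training row with the smallest Euclidean distance to query."""
    return min(range(len(train)), key=lambda i: math.dist(train[i], query))

# Invented data: (age in years, income in dollars).
train = [(25, 50000), (60, 50500)]
query = (26, 50300)

# Unscaled, the income column dominates and the second row looks closest.
print(nearest_index(train, query))  # prints 1

# After scaling with the training min/max, the similar-age first row is closest.
scaled_train = min_max_scale(train, train)
scaled_query = min_max_scale([query], train)[0]
print(nearest_index(scaled_train, scaled_query))  # prints 0
```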
True or false? Increasing the number of neighbors k always increases the accuracy.
False
Consider 4 points on a line with coordinates and labels as follows:
- Point A: x = 10, label = “+”
- Point B: x = 29, label = “-”
- Point C: x = 30, label = “+”
- Point D: x = 31, label = “+”
True or false? These points are linearly separable.
True
False
False
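On a line, a linear separator is a single threshold, so the points are separable only if the labels switch at most once when sorted by x. A minimal sketch of that check follows; the function name is hypothetical and it assumes all x values are distinct.

```python
def linearly_separable_1d(points):
    """True if one threshold on x can split the two labels (distinct x assumed)."""
    labels = [label for _, label in sorted(points)]
    # Count how many times the label changes while walking left to right.
    changes = sum(a != b for a, b in zip(labels, labels[1:]))
    return changes <= 1

# Points A to D from the card above: the labels read +, -, +, + from left to right.
points = [(10, "+"), (29, "-"), (30, "+"), (31, "+")]
print(linearly_separable_1d(points))  # prints False
```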
Which of the following is the Euclidean distance between the two data points A(4,2) and B(10,10)?
Answers:
a. 8
b. 10
c. 9
d. 11
b. 10
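The arithmetic is sqrt((10 - 4)^2 + (10 - 2)^2) = sqrt(36 + 64) = 10. A short check using Python's standard `math.hypot`:

```python
import math

a = (4, 2)
b = (10, 10)

# Euclidean distance: square root of the sum of squared coordinate differences.
distance = math.hypot(b[0] - a[0], b[1] - a[1])
print(distance)  # prints 10.0
```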
Which of the following is assumed by logistic regression?
Answers:
a. There is no dependent variable.
b. The dependent variable is divided into two equal sub-categories.
c. The dependent variable is continuous.
d. The dependent variable consists of two categories.
d. The dependent variable consists of two categories.
Suppose you have data which is normally distributed with mean 10 and standard deviation 2. Approximately what percent of the data will fall within the range 8 to 12?
About 95%
About 68%
About 99.7%
100%
About 68%
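The range 8 to 12 is one standard deviation on either side of the mean, which is where the 68% figure comes from. A quick check with the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=2)

# Probability mass between 8 and 12, i.e. within one standard deviation of the mean.
within_one_sd = dist.cdf(12) - dist.cdf(8)
print(round(within_one_sd, 2))  # prints 0.68
```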
You record the number of deliveries at your workplace every workday for a few months, and you find the average number of deliveries per day is 3. Which discrete probability distribution would be the most appropriate for calculating the probability that there are 4 deliveries at your workplace on a given day?
Answers:
Binomial
Uniform
Poisson
Bernoulli
Poisson
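For reference, the Poisson probability of exactly k events with average rate λ is e^(-λ) λ^k / k!. The sketch below evaluates it for the card's numbers (λ = 3, k = 4); the function name `poisson_pmf` is just an illustrative helper.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability of exactly 4 deliveries when the daily average is 3 (about 0.168).
p_four = poisson_pmf(4, 3)
print(round(p_four, 3))
```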