Module 3 Flashcards

Question 1

Q

Consider 5 points on a line with coordinates and labels as follows:

Point A: x = 10, label = “+”
Point B: x = 25, label = “-”
Point C: x = 30, label = “+”
Point D: x = 36, label = “+”
Point E: x = 37, label = “-”

Using the k-nearest neighbor algorithm with k = 3, what label would be assigned to point F with x = 32.50?

a.
“+”

b.
“-”

Answer

A

a. “+”
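
The reasoning behind this answer can be checked with a minimal sketch of 1-D k-NN voting. The function name `knn_label` is an illustrative assumption, not part of the course material; distance here is simply the absolute difference in x.

```python
from collections import Counter

def knn_label(train, x_new, k):
    """Return the majority label among the k training points closest to x_new."""
    # Sort training points by absolute distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda p: abs(p[0] - x_new))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Points A-E from the question as (x, label) pairs.
points = [(10, "+"), (25, "-"), (30, "+"), (36, "+"), (37, "-")]
print(knn_label(points, 32.5, k=3))  # prints "+"
```

The three nearest points to F are C (distance 2.5), D (3.5), and E (4.5), which gives two “+” votes against one “-” vote.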

Question 2

Q

Logistic regression is used

a.
to predict a binary variable from continuous or binary variables.

b.
to predict a continuous variable from binary variables.

c.
to predict any categorical variable from several other categorical variables.

d.
to predict a continuous variable from binary or continuous variables.

Answer

A

a.
to predict a binary variable from continuous or binary variables.

Question 3

Q

Both k-NN and logistic regression are supervised machine learning algorithms.

True
False

Question 4

Q

Which of the following is true about data scaling?

a.
required in both logistic regression and k-NN

b.
optional in both logistic regression and k-NN

c.
optional in logistic regression, required in k-NN

d.
required in logistic regression, optional in k-NN

Answer

A

c.
optional in logistic regression, required in k-NN
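
To see why k-NN is sensitive to feature scale, here is a small sketch in which the nearest neighbor changes once the features are rescaled. All values (ages, incomes, and the scale factors) are hypothetical and chosen only for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical (age in years, income in dollars) values, made up for illustration.
query = (30, 50_000)
p1 = (31, 52_000)  # similar age, somewhat different income
p2 = (60, 50_500)  # very different age, almost the same income

# Assumed per-feature standard deviations used to rescale each feature.
scale = (10, 10_000)

def rescale(p):
    """Divide each feature by its assumed standard deviation."""
    return tuple(v / s for v, s in zip(p, scale))

# On raw values, income dominates the distance because its numbers are larger.
raw_nearest = min([p1, p2], key=lambda p: euclidean(p, query))
# After rescaling, both features contribute on a comparable footing.
scaled_nearest = min([p1, p2], key=lambda p: euclidean(rescale(p), rescale(query)))

print(raw_nearest)     # (60, 50500)
print(scaled_nearest)  # (31, 52000)
```

Logistic regression learns a separate coefficient per feature, so it can absorb differences in scale. k-NN has no such coefficients, so the distance itself has to be made comparable across features.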

Question 5

Q

True or false? Increasing the number of neighbors k always increases the accuracy.

Question 6

Q

Consider 4 points on a line with coordinates and labels as follows:

Point A: x = 10, label = “+”
Point B: x = 29, label = “-”
Point C: x = 30, label = “+”
Point D: x = 31, label = “+”

True or false? These points are linearly separable.
True
False
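
One way to reason about this card: on a line, two classes are linearly separable only if a single threshold splits them, which means the labels, read in order of x, change at most once. A minimal sketch of that check follows. The helper name `separable_1d` is an assumption for illustration, and it assumes all x values are distinct.

```python
def separable_1d(points):
    """Return True if one threshold on x can split the two labels."""
    # Read the labels from left to right along the line.
    labels = [label for _, label in sorted(points)]
    # Count how many times the label flips between neighboring points.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1

# Points A-D from the question as (x, label) pairs.
points = [(10, "+"), (29, "-"), (30, "+"), (31, "+")]
print(separable_1d(points))  # prints False
```

Here the labels read “+”, “-”, “+”, “+”, so the single “-” at x = 29 sits between “+” points and no single threshold isolates it.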

Question 7

Q

Which of the following is the Euclidean distance between the two data points A(4,2) and B(10,10)?

Answers:
a.
8

b.
10

c.
9

d.
11
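
The distance can be worked out from the formula d = sqrt((x2 - x1)^2 + (y2 - y1)^2). A short sketch, with an illustrative helper name `euclidean_distance`:

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A(4, 2) and B(10, 10): differences are 6 and 8, so d = sqrt(36 + 64).
d = euclidean_distance((4, 2), (10, 10))
print(d)  # prints 10.0
```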

Question 8

Q

Which of the following is assumed by logistic regression?

Answers:
a.
There is no dependent variable.

b.
The dependent variable is divided into two equal sub-categories.

c.
The dependent variable is continuous.

d.
The dependent variable consists of two categories.

Answer

A

d.
The dependent variable consists of two categories.

Question 9

Q

Suppose you have data which is normally distributed with mean 10 and standard deviation 2. Approximately what percent of the data will fall within the range 8 to 12?

About 95%

About 68%

About 99.7%

100%

Answer

A

About 68%
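
The range 8 to 12 is exactly one standard deviation on either side of the mean. As a rough check, the probability can be computed from the normal CDF using only the standard library. The helper `normal_cdf` is an illustrative name, not from the course material.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Probability that a value falls between 8 and 12 when mean = 10 and sd = 2.
p = normal_cdf(12, 10, 2) - normal_cdf(8, 10, 2)
print(round(p, 4))  # prints 0.6827
```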

Question 10

Q

You record the number of deliveries at your workplace every workday for a few months, and you find that the average number of deliveries per day is 3. Which discrete probability distribution would be the most appropriate for calculating the probability that there are 4 deliveries at your workplace on a given day?

Answers:
Binomial

Uniform

Poisson

Bernoulli

Answer

A

Poisson
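
For a count of events per day with a known average rate, the Poisson probability is P(X = k) = e^(-λ) λ^k / k!. A minimal sketch for this card, with λ = 3 and k = 4 (the function name `poisson_pmf` is an assumption for illustration):

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability of exactly 4 deliveries on a day that averages 3.
p4 = poisson_pmf(4, 3)
print(round(p4, 3))  # prints 0.168
```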