Module 12: Naive Bayes and Perceptrons Flashcards
T/F
The Naive Bayes model is "naive" because it assumes that the features are conditionally independent of each other, given the class.
True
Write down the form of the joint probability model P(X1, X2, X3, Y) for this data using the Naive Bayes assumption. (Y is the class)
P(Y) P(X1|Y) P(X2|Y) P(X3|Y)
In Naive Bayes, the features are conditionally independent of each other, given the class.
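A minimal sketch of how this factorization might be evaluated in code; the probability tables and the three binary features below are illustrative placeholders, not values from the course:

```python
# P(Y) for a binary class (illustrative numbers)
prior = {True: 0.6, False: 0.4}

# P(Xi | Y) for three binary features, indexed as cond[i][y][x]
cond = [
    {True: {True: 0.8, False: 0.2}, False: {True: 0.3, False: 0.7}},  # P(X1|Y)
    {True: {True: 0.5, False: 0.5}, False: {True: 0.9, False: 0.1}},  # P(X2|Y)
    {True: {True: 0.1, False: 0.9}, False: {True: 0.4, False: 0.6}},  # P(X3|Y)
]

def joint(y, x1, x2, x3):
    """P(X1, X2, X3, Y) under the Naive Bayes assumption."""
    p = prior[y]
    for table, x in zip(cond, (x1, x2, x3)):
        p *= table[y][x]  # each feature contributes independently, given Y
    return p

# Classification: pick the class with the larger joint probability
# (equivalent to the posterior argmax, since P(X1, X2, X3) is shared).
x = (True, False, True)
print(max(prior, key=lambda y: joint(y, *x)))
```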
T/F
In the perceptron training algorithm, weights are updated after every training instance.
False
Training is error-driven - weights are only updated when the predicted label is incorrect.
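A minimal sketch of that error-driven update rule, assuming labels in {-1, +1} and a constant bias feature already appended to each input:

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Basic perceptron training: weights change only on a mistake."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i >= 0 else -1
            if pred != y_i:      # update ONLY when the prediction is wrong
                w += y_i * x_i   # nudge w toward (or away from) x_i
    return w
```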
Consider a two-dimensional data distribution where points belonging to class A are arranged around the origin in a circle with radius rA, and points belonging to class B are arranged around the origin in a circle with radius rB, where rA < rB.
When applied to this classification problem, the perceptron algorithm
can separate the two classes if we use feature augmentation.
If we augment the input with the features x^2 and y^2, and a bias term, the problem becomes linearly separable. Decaying the learning rate will ensure that the weight vector converges, but since the unaugmented data is not linearly separable, this will still fail to separate the classes. Changing the order in which the instances are fed into the perceptron will not allow the perceptron to find a separating boundary.
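A small sketch of the augmentation; the radii (1 and 2) and the separating weight vector w = [0, 0, 1, 1, -2.5] are illustrative choices, not values from the question:

```python
import numpy as np

# Generate points on two concentric circles: class A at r=1, class B at r=2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
r = np.where(rng.random(100) < 0.5, 1.0, 2.0)
x, y = r * np.cos(theta), r * np.sin(theta)
labels = np.where(r == 1.0, -1, 1)

# Augmented features: [x, y, x^2, y^2, 1]. In this space the circle
# x^2 + y^2 = 2.5 becomes the hyperplane w . f = 0 with w = [0, 0, 1, 1, -2.5].
X_aug = np.column_stack([x, y, x**2, y**2, np.ones_like(x)])
w = np.array([0.0, 0.0, 1.0, 1.0, -2.5])
print(np.all(np.sign(X_aug @ w) == labels))  # True: now linearly separable
```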
For which of the following datasets would a single perceptron be unable to find a separating boundary (assuming we do not add additional features to the data):
Class A: {(0,0), (1,1)}; Class B: {(0,1), (1,0)}
The correct answer corresponds to the XOR problem, which is not linearly separable. All other options are linearly separable.
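A quick sketch that runs a basic perceptron on this XOR dataset; whatever weights it ends with, at least one point remains misclassified:

```python
import numpy as np

# XOR dataset with a bias feature appended; class A -> -1, class B -> +1.
X = np.array([[0, 0, 1], [1, 1, 1], [0, 1, 1], [1, 0, 1]])
y = np.array([-1, -1, 1, 1])

w = np.zeros(3)
for _ in range(1000):
    for x_i, y_i in zip(X, y):
        pred = 1 if w @ x_i >= 0 else -1
        if pred != y_i:
            w += y_i * x_i  # weights keep cycling; no fixed point exists

mistakes = sum((1 if w @ x_i >= 0 else -1) != y_i for x_i, y_i in zip(X, y))
print(mistakes)  # > 0 for any final w: XOR is not linearly separable
```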
T/F
Given a non-random dataset, the perceptron algorithm guarantees that the weight vector will converge after a finite number of training iterations.
False
This is only true if the dataset is linearly separable. If it is not, the weight vector may oscillate indefinitely. The oscillation can be damped by techniques such as learning-rate decay, but the basic perceptron algorithm itself is not guaranteed to converge.
The inference procedure for the perceptron classification algorithm is best summarized as:
Compute a weighted sum, then apply a threshold.
First, we compute the dot product of the weight vector and the input vector (that is, a weighted sum of the input features), and then threshold this value to output either true or false. If you picked the "datum separatus" option, you should consider transferring to Hogwarts.
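A minimal sketch of that inference step; the weight and input values here are illustrative:

```python
import numpy as np

def predict(w, x, threshold=0.0):
    """Perceptron inference: weighted sum, then threshold."""
    activation = np.dot(w, x)        # weighted sum of the input features
    return activation >= threshold   # threshold step -> True / False

w = np.array([0.5, -1.0, 2.0])
print(predict(w, np.array([1.0, 0.0, 1.0])))  # True
```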