CSCI 343 Quiz 2 Flashcards
K-nearest neighbor (KNN) is a ? learner
lazy
a lazy learner has no…
no training phase and no explicit model; it simply stores the training data
in a typical machine learning algorithm, ? takes more time; in a KNN algorithm, ? takes more time
typical: training takes time
KNN: testing/prediction takes time
KNN uses a
similarity (distance) measure, e.g., Euclidean distance
the parameter k for KNN is usually
odd & small (odd helps avoid tie votes in binary classification)
K-nearest neighbor works by
finding the k closest training samples and taking a majority vote of their labels; a distance-weighted vote/average can also be used (sketched below)
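A minimal from-scratch sketch of this idea, using plain Python with Euclidean distance and an unweighted majority vote; the toy data is hypothetical:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training sample (no model is built)
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the k closest labels
    k_labels = [label for _, label in dists[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Hypothetical toy data: two 2-D classes
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, query=(2, 2), k=3))  # -> "A"
```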
the parameter k for KNN typically defaults to
5
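For reference, scikit-learn's KNeighborsClassifier uses exactly this default (n_neighbors=5). A minimal usage sketch with hypothetical data, which also shows the lazy-learner behavior from the earlier cards:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ["A", "A", "A", "B", "B", "B"]

clf = KNeighborsClassifier()  # n_neighbors defaults to 5
clf.fit(X, y)                 # "training" just stores the data (lazy learner)
print(clf.predict([[2, 2]]))  # distance computation happens here, at prediction time
```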
Naive Bayes Classification can be used for (examples)
spam classification, medical diagnosis, weather prediction
Naive Bayes: given ? predict ?
given features x1, x2, …, xn, predict label Y
conditional probability
the probability that event B will occur given that event A has already occurred; written P(B|A)
if A and B are independent, the conditional probability of B given A is
P(B)
if A and B are NOT independent, the probability of both events occurring is
P(A,B) = P(A) * P(B|A)
the conditional probability of B given A (NOT independent) is
P(B|A) = P(A,B) / P(A)
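A small numeric check of these two formulas, using hypothetical joint counts:

```python
# Hypothetical counts over 100 trials: A occurs 40 times; A and B occur together 10 times
p_A = 40 / 100          # P(A)   = 0.40
p_A_and_B = 10 / 100    # P(A,B) = 0.10

# Conditional probability of B given A (dependent case): P(B|A) = P(A,B) / P(A)
p_B_given_A = p_A_and_B / p_A
print(p_B_given_A)      # 0.25

# Rearranged, this is the chain rule: P(A,B) = P(A) * P(B|A)
assert abs(p_A * p_B_given_A - p_A_and_B) < 1e-12
```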
Bayes Rule written out
P(Y | X1, …, Xn) = [P(X1, …, Xn | Y) * P(Y)] / P(X1, …, Xn)
Bayes Rule in simpler terms
(likelihood * prior) / normalization constant
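A worked instance of the rule with hypothetical spam-filter numbers, for a single binary feature X = "message contains the word free":

```python
# Hypothetical values
p_spam = 0.30                # prior P(Y=spam)
p_free_given_spam = 0.60     # likelihood P(X=1 | Y=spam)
p_free_given_ham = 0.05      # likelihood P(X=1 | Y=ham)

# Normalization constant P(X=1), via the law of total probability
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes Rule: posterior = likelihood * prior / normalization constant
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # ~0.837
```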
likelihood in Bayes Rule
the probability of observing the features X1, …, Xn given that the label is Y; P(X1, …, Xn | Y)
prior in Bayes Rule
P(Y), the probability of each label before seeing the features; often assumed to be a uniform distribution
how many parameters are required to specify the prior for the digit recognition example?
one – Y is binary in this example, so P(Y) is specified by a single parameter
how many parameters are required to specify the likelihood for the digit recognition example?
2(2^900 - 1): each image has 900 binary pixels, so the joint distribution over X1, …, X900 has 2^900 outcomes; since the probabilities must sum to 1, that is 2^900 - 1 free parameters for each of the 2 values of Y
the problem with explicitly modeling P(X1,…,Xn | Y) is
there are far too many parameters: we would run out of space and time, and would need an enormous amount of training data to estimate them
Naive Bayes Assumption
assume that all features are independent given the class label Y
Naive Bayes Model equation
P(X1,…,Xn | Y) = the product from i = 1 to n of P(Xi | Y)
How to use the Naive Bayes Model:
assume conditional independence, so the likelihood becomes the product of the per-feature probabilities P(Xi | Y); predict the Y that maximizes prior * likelihood (see the sketch below)
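A minimal sketch of that decision rule for binary features; the probability tables are hypothetical, and log-probabilities are used to avoid underflow when there are many features:

```python
import math

# Hypothetical learned parameters: priors P(Y) and per-feature P(Xi=1 | Y)
prior = {"spam": 0.3, "ham": 0.7}
p_xi_given_y = {
    "spam": [0.6, 0.4, 0.2],   # P(X1=1|spam), P(X2=1|spam), P(X3=1|spam)
    "ham":  [0.05, 0.1, 0.3],
}

def nb_predict(x):
    """Pick the label maximizing log P(Y) + sum_i log P(Xi | Y)."""
    best_label, best_score = None, -math.inf
    for y, p in prior.items():
        score = math.log(p)
        for xi, p1 in zip(x, p_xi_given_y[y]):
            score += math.log(p1 if xi == 1 else 1 - p1)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

print(nb_predict([1, 1, 0]))  # -> "spam"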
Why is the Naive Bayes Model useful?
reduces the number of parameters from exponential to linear in the number of features (for the digit example, 2 * 900 instead of 2(2^900 - 1))
How to train in Naive Bayes
estimate the prior P(Y=v) as the fraction of records with Y=v & estimate P(Xi=u | Y=v) as the fraction of records with Y=v for which Xi=u
In practice, some counts can be zero. Fix this by adding “virtual” counts. This is called
smoothing
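A sketch of the counting-based training from two cards up, with add-one (Laplace) smoothing folded in; the training records are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical training records: (binary feature vector, label)
records = [([1, 0], "spam"), ([1, 1], "spam"),
           ([0, 0], "ham"), ([0, 1], "ham"), ([0, 0], "ham")]

label_counts = Counter(y for _, y in records)
n = len(records)

# Prior: fraction of records with Y=v
prior = {y: c / n for y, c in label_counts.items()}

# Count how often each feature is 1 within each class
ones = defaultdict(Counter)
for x, y in records:
    for i, xi in enumerate(x):
        ones[y][i] += xi

# Likelihood with add-one smoothing: (count(Xi=1, Y=v) + 1) / (count(Y=v) + 2)
# The "virtual" counts keep P(Xi=1|Y=v) from ever being exactly 0 or 1
n_features = len(records[0][0])
p_xi1 = {
    y: [(ones[y][i] + 1) / (label_counts[y] + 2) for i in range(n_features)]
    for y in label_counts
}
print(prior)  # {'spam': 0.4, 'ham': 0.6}
print(p_xi1)  # e.g., P(X1=1|spam) = (2+1)/(2+2) = 0.75
```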
What’s nice about Naive Bayes is that it returns probabilities, which tell us ?
how confident the algorithm is