Building a feature-based Classifier Flashcards
What is a classifier? What is its goal?
An agent that takes some input data and predicts which class the data belongs to. Its goal is to make as few mistakes as possible and to represent its belief in each possible class.
How do we extract a simple set of features?
average each MFCC across all time segments
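A minimal sketch of this averaging, assuming the MFCCs are stored as a (time segments x coefficients) array (the shapes and values here are illustrative):

```python
import numpy as np

# Stand-in MFCC matrix: 100 time segments x 13 coefficients
mfccs = np.arange(100 * 13, dtype=float).reshape(100, 13)

# Average each coefficient across all time segments,
# collapsing the matrix to a single 13-dimensional feature vector
feature_vector = mfccs.mean(axis=0)
print(feature_vector.shape)  # (13,)
```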
What approach is used to build a classifier?
Bayes' theorem: compute p(Class | feature x observed) for each class and pick the most probable.
Give Bayes Theorem
p(a | b) = p(b | a)p(a) / [p(b | a)p(a) + p(b | c)p(c) + …]
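A worked numeric instance of Bayes' theorem for two classes (the likelihood and prior values are illustrative, not from the deck):

```python
# Posterior for class C1 given feature x:
# p(C1 | x) = p(x | C1) p(C1) / [p(x | C1) p(C1) + p(x | C2) p(C2)]
likelihoods = [0.6, 0.2]   # p(x | C1), p(x | C2) - made-up values
priors = [0.5, 0.5]        # p(C1), p(C2)

# The denominator (evidence) sums likelihood * prior over all classes
evidence = sum(l * p for l, p in zip(likelihoods, priors))
posterior_c1 = likelihoods[0] * priors[0] / evidence
print(posterior_c1)  # 0.75
```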
What are p(x | C_1) and p(x | C_2) ?
probability density functions
What is a probability density function p(x)? What is the mean? What is the variance?
p(x) dx gives the probability that the variable x lies in the interval [x, x + dx]
mean = integral from -infinity to infinity of x p(x) dx
variance = standard deviation^2 = integral from -infinity to infinity of (x - mean)^2 p(x) dx
What is the probability density function of the normal distribution?
p(x) = (1 / sqrt(2 pi stddev^2)) e^(-(x - mean)^2 / (2 stddev^2))
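The normal density can be written directly from that formula; a small sketch (function name is my own):

```python
import math

def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    return (1.0 / math.sqrt(2 * math.pi * std**2)
            * math.exp(-(x - mean)**2 / (2 * std**2)))

# At the mean the exponent is 0, so the density is 1 / sqrt(2 pi std^2)
print(normal_pdf(0.0, 0.0, 1.0))  # ~0.3989
```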
How do we approximate p(x | C_1) and p(x | C_2)?
normal distributions fitted to some training data
How do we fit a normal distribution to data? How do we calculate these?
set its mean and variance equal to the empirical mean and variance of the data.
m = (1/N) sum from n=1 to N of x(n)
s^2 = (1/N) sum from n=1 to N of (x(n) - m)^2
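Computing the empirical mean and variance for a small made-up sample:

```python
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
n = len(data)

m = sum(data) / n                        # empirical mean
s2 = sum((x - m)**2 for x in data) / n   # empirical variance

print(m, s2)  # 5.0 4.0
```

The fitted normal distribution then has mean m and standard deviation sqrt(s2).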
What is the probability of the feature vector x interpreted to mean? How can we think of the probability?
p(x) = p(x_1 and x_2 and … and x_d)
The probability can be thought of as a function with d arguments
How do we create a classifier from multiple features? What is this known as?
Bayes theorem still: p(c1 | x) = p(x | c1)p(c1) / [p(x | c1)p(c1) + p(x | c2)p(c2)]
we calculate p(x | c1) and p(x | c2) as:
p(x | c1) = p(x1 | c1)p(x2 | c1)…p(xd | c1)
p(x | c2) = p(x1 | c2)p(x2 | c2)…p(xd | c2)
Naive Bayes classifier
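A minimal Gaussian naive Bayes sketch putting the pieces above together: fit a 1-D normal per feature per class, then multiply the per-feature densities and the prior. All data, function names, and class labels are illustrative:

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(samples):
    """Empirical per-feature mean and variance for a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    vars_ = [sum((s[j] - means[j])**2 for s in samples) / n for j in range(d)]
    return means, vars_

def classify(x, params, priors):
    """Return the class maximising p(x1|Ck)...p(xd|Ck) p(Ck)."""
    best, best_score = None, -1.0
    for k, (means, vars_) in params.items():
        score = priors[k]
        for j in range(len(x)):
            score *= normal_pdf(x[j], means[j], vars_[j])
        if score > best_score:
            best, best_score = k, score
    return best

# Made-up 2-feature training data for two classes
params = {
    "C1": fit([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]),
    "C2": fit([[4.0, 6.0], [4.2, 5.8], [3.8, 6.2]]),
}
priors = {"C1": 0.5, "C2": 0.5}

print(classify([1.1, 2.1], params, priors))  # "C1"
```

Multiplying many small densities can underflow in practice, which is why real implementations usually sum log-probabilities instead.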
Why is a naive Bayes classifier naive?
Because of the assumption that features are conditionally independent given knowledge of the class