3. Generative models for discrete data Flashcards
<b>The beta-binomial</b>
Likelihood
- what are the sufficient statistics?
- likelihood in the beta-binomial model
p. 75
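A sketch of the standard answers for this card, assuming the book's notation (N_1 heads and N_0 tails out of N Bernoulli trials):

```latex
p(\mathcal{D}\mid\theta) = \theta^{N_1}(1-\theta)^{N_0},
\qquad N_1=\sum_{i=1}^{N}\mathbb{1}(x_i=1),\quad N_0=N-N_1,
```

so the counts (N_1, N_0) are the sufficient statistics.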
<b>The beta-binomial</b>
Prior
- what’s the conjugate prior?
- what’s the conjugate prior of the Bernoulli (or Binomial) distribution?
- what are the parameters of the prior called?
- exercise 3.15
- exercise 3.16
- hyperparameters of the uniform prior in the beta-binomial model
p. 76
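A sketch, assuming the usual hyperparameter names a and b: the conjugate prior of the Bernoulli/Binomial likelihood is the Beta distribution,

```latex
\mathrm{Beta}(\theta\mid a,b) \propto \theta^{a-1}(1-\theta)^{b-1},
```

and a = b = 1 gives the uniform prior.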
<b>The beta-binomial</b>
Posterior
- posterior in the beta-binomial model
- what are pseudo counts?
- what is the equivalent sample size?
- MAP estimate
- MLE
- when does MAP = MLE?
- posterior mean
- posterior variance
p. 77
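A sketch of the standard results (prior Beta(a, b); counts N_1, N_0; N = N_1 + N_0):

```latex
p(\theta\mid\mathcal{D}) = \mathrm{Beta}(\theta\mid a+N_1,\; b+N_0),\qquad
\hat\theta_{\mathrm{MAP}} = \frac{a+N_1-1}{a+b+N-2},\qquad
\hat\theta_{\mathrm{MLE}} = \frac{N_1}{N},
```

```latex
\mathbb{E}[\theta\mid\mathcal{D}] = \frac{a+N_1}{a+b+N},\qquad
\mathrm{var}[\theta\mid\mathcal{D}] = \frac{(a+N_1)(b+N_0)}{(a+b+N)^2\,(a+b+N+1)} .
```

Here a and b act as pseudo counts, a + b is the equivalent sample size, and MAP = MLE under the uniform prior a = b = 1.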
<b>The beta-binomial</b>
Posterior predictive distribution
- p(x|D)
- add-one smoothing
- beta-binomial distribution (def, mean, var)
p. 79
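A sketch of the standard answers (the uniform prior a = b = 1 yields add-one / Laplace smoothing):

```latex
p(\tilde{x}=1\mid\mathcal{D}) = \frac{a+N_1}{a+b+N}
\;\xrightarrow{\;a=b=1\;}\; \frac{N_1+1}{N+2},
```

```latex
\mathrm{Bb}(x\mid a,b,M) = \binom{M}{x}\frac{B(x+a,\,M-x+b)}{B(a,b)},\qquad
\mathbb{E}[x]=\frac{Ma}{a+b},\qquad
\mathrm{var}[x]=\frac{Mab\,(a+b+M)}{(a+b)^2(a+b+1)} .
```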
<b>The Dirichlet-multinomial model</b>
Likelihood
p. 81
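A sketch (N_k is the number of observations in category k):

```latex
p(\mathcal{D}\mid\boldsymbol{\theta}) = \prod_{k=1}^{K}\theta_k^{N_k},
\qquad N_k=\sum_{i=1}^{N}\mathbb{1}(x_i=k) .
```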
<b>The Dirichlet-multinomial model</b>
Prior
p. 81
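A sketch: the conjugate prior of the multinomial/categorical likelihood is the Dirichlet,

```latex
\mathrm{Dir}(\boldsymbol{\theta}\mid\boldsymbol{\alpha}) =
\frac{1}{B(\boldsymbol{\alpha})}\prod_{k=1}^{K}\theta_k^{\alpha_k-1}\,
\mathbb{1}(\boldsymbol{\theta}\in S_K),
```

where S_K is the probability simplex.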
<b>The Dirichlet-multinomial model</b>
Posterior
- MAP and MLE
p. 81
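A sketch of the standard results, with alpha_0 = sum_k alpha_k:

```latex
p(\boldsymbol{\theta}\mid\mathcal{D}) =
\mathrm{Dir}(\boldsymbol{\theta}\mid \alpha_1+N_1,\dots,\alpha_K+N_K),\qquad
\hat\theta_k^{\mathrm{MAP}} = \frac{N_k+\alpha_k-1}{N+\alpha_0-K},\qquad
\hat\theta_k^{\mathrm{MLE}} = \frac{N_k}{N} .
```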
<b>The Dirichlet-multinomial model</b>
Posterior predictive
p. 83
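A sketch:

```latex
p(\tilde{X}=j\mid\mathcal{D}) = \frac{\alpha_j+N_j}{\alpha_0+N},
\qquad \alpha_0=\sum_k\alpha_k,
```

which reduces to add-one (Laplace) smoothing when alpha_k = 1 for all k.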
<b>Naive Bayes</b>
- NBC definition
- binary, categorical, and real-valued features
p. 84
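A sketch of the class-conditional factorization that defines the naive Bayes classifier:

```latex
p(\mathbf{x}\mid y=c,\boldsymbol{\theta}) =
\prod_{j=1}^{D} p(x_j\mid y=c,\boldsymbol{\theta}_{jc}),
```

with Ber(x_j | theta_jc) for binary features, Cat(x_j | theta_jc) for categorical features, and a Gaussian N(x_j | mu_jc, sigma_jc^2) for real-valued features.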
<b>Naive Bayes</b>
Model fitting
- log p(D|theta)
- MLE
- Bayesian naive Bayes (BNBC)
p. 85
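A minimal Python sketch of MLE fitting for the binary-feature case (counting only; the function and variable names are illustrative, not from the book). Setting pseudo_count = 1 gives the add-one-smoothed Bayesian estimates instead of the plain MLE:

```python
import numpy as np

def fit_bernoulli_nb(X, y, pseudo_count=0.0):
    """Fit a Bernoulli naive Bayes model by counting.

    X: (N, D) binary feature matrix; y: (N,) integer class labels.
    pseudo_count = 0 gives the MLE pi_c = N_c/N, theta_jc = N_jc/N_c;
    pseudo_count = 1 gives the add-one-smoothed (posterior mean) estimates.
    """
    classes = np.unique(y)
    N, D = X.shape
    pi = np.zeros(len(classes))           # class priors
    theta = np.zeros((len(classes), D))   # per-class feature probabilities
    for ci, c in enumerate(classes):
        Xc = X[y == c]
        Nc = Xc.shape[0]
        pi[ci] = Nc / N
        theta[ci] = (Xc.sum(axis=0) + pseudo_count) / (Nc + 2 * pseudo_count)
    return classes, pi, theta
```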
<b>Naive Bayes</b>
Using the model for prediction
- p(y=c|x,D)
- special case if the posterior is Dirichlet
- what if the posterior is approximated by a single point?
p. 87
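A sketch, assuming a Dir(alpha) prior on pi and Beta(beta_1, beta_0) priors on each theta_jc, so that integrating out the parameters amounts to plugging in posterior means:

```latex
p(y=c\mid\mathbf{x},\mathcal{D}) \propto
\bar\pi_c \prod_{j=1}^{D}
\bar\theta_{jc}^{\,\mathbb{1}(x_j=1)}\,(1-\bar\theta_{jc})^{\,\mathbb{1}(x_j=0)},
\qquad
\bar\pi_c=\frac{N_c+\alpha_c}{N+\alpha_0},\quad
\bar\theta_{jc}=\frac{N_{jc}+\beta_1}{N_c+\beta_0+\beta_1} .
```

If the posterior is instead approximated by a single point (the plug-in approximation), the posterior means are simply replaced by point estimates such as the MLE or MAP.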
<b>Naive Bayes</b>
The log-sum-exp trick
p. 88
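A minimal Python sketch of the trick: subtract the maximum before exponentiating so the largest term becomes exp(0) = 1 and nothing under- or overflows.

```python
import numpy as np

def log_sum_exp(b):
    """Compute log(sum_c exp(b_c)) stably.

    Uses log sum_c exp(b_c) = B + log sum_c exp(b_c - B) with B = max_c b_c.
    """
    b = np.asarray(b, dtype=float)
    B = b.max()
    return B + np.log(np.exp(b - B).sum())

# Example: log joint probabilities that would underflow if exponentiated naively.
log_joint = np.array([-1000.0, -1001.0, -1002.0])
log_evidence = log_sum_exp(log_joint)         # finite (about -999.59)
posterior = np.exp(log_joint - log_evidence)  # normalised class posterior
```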
<b>Naive Bayes</b>
Feature selection using mutual information
p. 89
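A sketch of the quantity being ranked; the second line is the form it takes under the binary naive Bayes model (notation assumed: theta_j = sum_c pi_c theta_jc):

```latex
I(X_j;Y) = \sum_{x_j}\sum_{y} p(x_j,y)\,\log\frac{p(x_j,y)}{p(x_j)\,p(y)},
```

```latex
I_j = \sum_{c}\left[\theta_{jc}\,\pi_c\log\frac{\theta_{jc}}{\theta_j}
+(1-\theta_{jc})\,\pi_c\log\frac{1-\theta_{jc}}{1-\theta_j}\right],
\qquad \theta_j=\sum_c\pi_c\,\theta_{jc},
```

and the K features with the highest I_j are kept.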
<b>Naive Bayes</b>
Classifying documents using bag of words
- Bernoulli product model (binary independence model)
- interpretation of x_ij and theta_jc
- how to adapt the model to use the number of occurrences of each word
- the burstiness phenomenon
- Dirichlet Compound Multinomial (DCM)
- what is a Pólya urn?
p. 90
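A sketch of the two models contrasted on this card, where x_ij is the presence (first line) or the count (second line) of word j in document i, and theta_jc is the corresponding word probability for class c:

```latex
p(\mathbf{x}_i\mid y_i=c,\boldsymbol{\theta}) =
\prod_{j=1}^{D}\mathrm{Ber}(x_{ij}\mid\theta_{jc}) =
\prod_{j=1}^{D}\theta_{jc}^{x_{ij}}(1-\theta_{jc})^{1-x_{ij}},
```

```latex
p(\mathbf{x}_i\mid y_i=c,\boldsymbol{\theta}) =
\mathrm{Mu}(\mathbf{x}_i\mid N_i,\boldsymbol{\theta}_c) =
\frac{N_i!}{\prod_{j} x_{ij}!}\prod_{j=1}^{D}\theta_{jc}^{x_{ij}},
\qquad N_i=\sum_j x_{ij} .
```

The DCM replaces theta_c by a Dirichlet and integrates it out, which captures burstiness: once a word has occurred, it becomes more likely to occur again, exactly as in a Pólya urn.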