#2 Flashcards
Intrinsic evaluation
define a metric and check which system scores best on it
Extrinsic evaluation
check how much performance improves when you use the output as an input to a larger system
issues with NB Classifier
a missing feature
out-of-vocabulary words
stop words
how you can deal with a missing feature in the dataset
smoothing
stop words
very frequent, uninformative words
smoothing
add 1 to the count of every feature, so that an unseen feature does not get zero probability
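A minimal sketch of add-one smoothing for word likelihoods, assuming a fixed vocabulary and a list of tokens seen in one class. The function name `smoothed_likelihoods`, the `alpha` parameter, and the toy data are illustrative assumptions, not part of the deck.

```python
from collections import Counter

def smoothed_likelihoods(tokens, vocabulary, alpha=1):
    """Estimate P(word | class) with add-alpha smoothing (alpha=1 is add-one)."""
    counts = Counter(tokens)
    # Denominator: total in-vocabulary count plus alpha for every vocabulary word.
    total = sum(counts[word] for word in vocabulary) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}

# Toy data (hypothetical): "boring" never occurs in the positive class.
vocab = ["good", "bad", "boring"]
positive_tokens = ["good", "good", "bad"]
probs = smoothed_likelihoods(positive_tokens, vocab)
```

Without smoothing, "boring" would get probability 0 and wipe out the whole product of likelihoods; with add-one it gets a small non-zero share and the probabilities still sum to 1.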
generative classifier
learn a model of how the data is generated
discriminative classifier
learn which features best predict a certain class
How to prevent overfitting
by fine-tuning, k-fold cross-validation
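A rough sketch of how k-fold cross-validation splits the data, using only index lists. The helper `k_fold_splits` is a hypothetical name; in practice a library routine would usually do this, and shuffling is left out here for simplicity.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

splits = list(k_fold_splits(n_samples=10, k=3))
```

Each sample lands in the validation set exactly once, so every score is measured on data the model was not trained on.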
Prior probability
represents your initial belief about the likelihood of the hypothesis
likelihood
how probable the observed instance is if it was generated by a given class
Posterior probability
represents your updated belief about the likelihood of the hypothesis after taking into account the evidence
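The three cards above fit together in Bayes' rule. As a sketch, writing c for the class (the hypothesis) and d for the observed document (the evidence), both symbols being notation assumed here:

```latex
\underbrace{P(c \mid d)}_{\text{posterior}}
  = \frac{\overbrace{P(d \mid c)}^{\text{likelihood}} \; \overbrace{P(c)}^{\text{prior}}}{P(d)}
```

Since P(d) is the same for every class, a classifier only needs to compare the product of likelihood and prior across classes.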
How NB treats a text document
Bag of Words
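A small sketch of the bag-of-words view, assuming simple lowercasing and whitespace tokenization (real tokenizers are more careful). The function name `bag_of_words` and the example sentence are illustrative.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document as word counts, ignoring word order."""
    return Counter(text.lower().split())

bag = bag_of_words("the movie was good the plot was not")
# Reordering the words gives the same bag: position information is discarded.
same_bag = bag_of_words("not good was the plot the movie was")
```

Both calls produce identical counts, which is why a bag-of-words model cannot tell "good, not bad" from "bad, not good".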
Advantage of Rule systems
- Robust
- No need for a large dataset
Disadvantage of Rule systems
- Expensive to write
- Require domain knowledge
- Rigid when dealing with ambiguity