#2 Flashcards
Intrinsic evaluation
define a metric and check which system performs best on it
Extrinsic evaluation
check how much performance improves when you use the system's output as input to a larger system
issues with NB Classifier
- a missing feature
- out-of-vocabulary words
- stop words
how can you deal with a missing feature in the dataset?
smoothing
stop words
very frequent, uninformative words
smoothing
add 1 to every feature count so that no probability is zero (add-one / Laplace smoothing)
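A minimal sketch of add-one smoothing for word likelihoods (the class counts and vocabulary size are made-up examples):

```python
from collections import Counter

def smoothed_likelihood(word, class_word_counts, vocab_size):
    """P(word | class) with add-one smoothing: +1 to every count, +V to the total."""
    total = sum(class_word_counts.values())
    return (class_word_counts[word] + 1) / (total + vocab_size)

# hypothetical counts for one class
counts = Counter({"great": 3, "movie": 2})
vocab_size = 3  # vocabulary: great, movie, boring
print(smoothed_likelihood("boring", counts, vocab_size))  # (0 + 1) / (5 + 3) = 0.125
```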
generative classifier
learn a model of how the data is generated
discriminative classifier
learn which features best predict a certain class
How to prevent overfitting
by fine-tuning and K-fold cross-validation
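A minimal sketch of K-fold cross-validation for a Naive Bayes text classifier with scikit-learn (the toy corpus, labels, and choice of 3 folds are assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# hypothetical labelled corpus: 1 = positive, 0 = negative
texts = ["great movie", "boring plot", "great acting",
         "boring and slow", "what a great film", "slow boring scenes"]
labels = [1, 0, 1, 0, 1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())
# each fold is held out once for validation while the remaining folds are used for training
scores = cross_val_score(model, texts, labels, cv=3)
print(scores.mean())
```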
Prior probability
represents your initial belief about the likelihood of the hypothesis
likelihood
the probability of the observed evidence given that the instance was generated by a given class
Posterior probability
represents your updated belief about the likelihood of the hypothesis after taking into account the evidence
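The three quantities are tied together by Bayes' rule, where h is the hypothesis (class) and e is the observed evidence (features):

```latex
\underbrace{P(h \mid e)}_{\text{posterior}}
  = \frac{\overbrace{P(e \mid h)}^{\text{likelihood}} \; \overbrace{P(h)}^{\text{prior}}}{P(e)}
```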
How NB treats a text document
Bag of Words
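A minimal sketch of a bag-of-words representation: word order is thrown away and only the count of each word is kept (the sentence is a made-up example):

```python
from collections import Counter

tokens = "the movie was great , the acting was great".split()
bag = Counter(tokens)
print(bag)  # e.g. Counter({'the': 2, 'was': 2, 'great': 2, 'movie': 1, ',': 1, 'acting': 1})
```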
Advantage of Rule systems
- Robust
- no need of large dataset
Disadvantage of Rule systems
- Expensive to write
- Require domain knowledge
- Rigid when dealing with ambiguity
4 terms of learning
generalize (understand new information),
infer (use it in different situations),
analogize (interpret the results),
adapt (the system adapts quickly when the environment changes)
Text simplification
mapping a text into another, simpler text that preserves its meaning
main difference between intrinsic and extrinsic evaluation
- the key question is what your algorithm was trained to do
Intrinsic: how well the system does the task it was trained to do
Extrinsic: how well the system performs when its output is used as a component in a larger system
Does bag of words capture co-occurrence?
No, bag of words only keeps word counts and ignores word order and co-occurrence
Why is the NB classifier "naive"?
because it assumes that the features are conditionally independent of each other, given the class
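In formula form, for a class c and features x_1, ..., x_n, the joint likelihood is assumed to factorize into per-feature likelihoods:

```latex
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
```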
out-of-vocabulary words
words that appear in the test set but not in the training set
how to treat OOV words?
- ignore them
- add a dedicated unknown-word feature in the training set (sketched below)
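A minimal sketch of the second option, using a hypothetical UNK feature: any test word that was not seen during training is mapped to UNK before its likelihood is looked up.

```python
def map_oov(tokens, vocab):
    """Replace tokens that are not in the training vocabulary with 'UNK'."""
    return [tok if tok in vocab else "UNK" for tok in tokens]

train_vocab = {"great", "movie", "boring", "UNK"}  # 'UNK' gets its own counts at training time
print(map_oov("great soundtrack , boring movie".split(), train_vocab))
# ['great', 'UNK', 'UNK', 'boring', 'movie']
```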