Lecture 6 - Naïve Bayes Flashcards
Classification & clustering
Both result in a categorisation of records into one or more classes based on their values
Classification
- Trains a model that can assign new records to one of the classes
- Assumes the existence of predefined classes
Clustering
- Divides the records into clusters
- Records with high similarity reside in the same cluster, and records in different clusters are dissimilar
Example of classification & clustering of e-mails
Classification: does e-mail go to inbox or spam?
Clustering: e-mail to work, friends or family folder?
Why do we need classification?
- Organising documents is hard work
- Route email messages into folders
- Route help-desk inquiries to correct staff
- Place documents in predefined categories/topic hierarchy
- Decide on (predefined) user interests/skills/…
- User modelling
- Instead of using a human-authored expert system, let the computer induce rules or models from log data
Classification: Learning & applying
Classification Techniques
- Naïve Bayes
- Nearest Neighbor
- Decision Trees
- Support Vector Machines
- Logistic regression
- Deep learning
- Ensemble classification
- …
Naïve Bayes example
Naïve Bayes Classifier
Bayes' theorem
Bayes' rule is a standard formula for inverting conditional probabilities
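For reference, the rule can be written out as follows. The symbols are a notational assumption: y stands for the class label and x for the observed feature values of a record.

```latex
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}
```

Read this way, the rule turns P(x | y), which can be estimated from labelled training records, into P(y | x), which is the quantity a classifier needs for a new record.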
Naïve Bayes Assumption
Naïve conditional independence: assume that all features are independent of one another given the class label y
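Under this assumption the class score factorises into the prior P(y) times one factor P(x_i | y) per feature. The sketch below illustrates that with raw counts on categorical features. It is a minimal illustration, not the lecture's own implementation: the function names (train, score, classify) and the toy e-mail data are hypothetical.

```python
from collections import Counter, defaultdict

def train(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] = number of matching records
    value_counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def score(record, label, class_counts, value_counts):
    """Unnormalised P(y) * product of P(x_i | y), estimated from raw counts."""
    total = sum(class_counts.values())
    result = class_counts[label] / total  # prior P(y)
    for i, value in enumerate(record):
        # Independence assumption: each feature contributes its own factor.
        result *= value_counts[(i, label)][value] / class_counts[label]
    return result

def classify(record, class_counts, value_counts):
    """Return the class with the highest unnormalised score."""
    return max(
        class_counts,
        key=lambda label: score(record, label, class_counts, value_counts),
    )

# Hypothetical toy data: (contains_link, known_sender) -> folder
records = [("yes", "no"), ("yes", "no"), ("no", "yes"), ("no", "yes"), ("yes", "yes")]
labels = ["spam", "spam", "inbox", "inbox", "inbox"]

class_counts, value_counts = train(records, labels)
prediction = classify(("yes", "no"), class_counts, value_counts)
```

The denominator P(x) is left out on purpose: it is the same for every class, so it does not change which class gets the highest score.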
Laplace Smoothing
- A probability of zero is problematic, because it wipes out the information in all the other probabilities it is multiplied with
Solution:
Laplace Smoothing, or Correction, or Estimator
- Incorporates a small-sample correction in every probability computation
- Adds a small count to the numerator and a correspondingly adjusted count to the denominator of each estimate
- Thus, no probability will be zero
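The correction can be sketched as follows. The flashcard does not give the exact formula, so this assumes the common additive form (count + alpha) / (class_total + alpha * n_values); the function name, its parameters and the counts are hypothetical.

```python
def smoothed_probability(count, class_total, n_values, alpha=1.0):
    """Estimate P(value | class) with additive (Laplace) smoothing.

    count       -- records of the class that have this feature value
    class_total -- all records of the class
    n_values    -- number of distinct values the feature can take
    alpha       -- pseudo-count added per value (1.0 is the classic choice)
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical counts: a value never seen among 3 records of a class,
# for a feature with 2 possible values.
unsmoothed = 0 / 3
smoothed = smoothed_probability(0, 3, 2)  # (0 + 1) / (3 + 2)
```

Adding alpha * n_values to the denominator keeps the smoothed estimates for all values of a feature summing to one.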
Lecture Summary
Naïve Bayes is not so Naïve:
- Its beauty is in its simplicity
- Ability to handle categorical variables directly
- Computationally efficient
- Good classification performance, especially when the number of predictors is very large
Negative aspects:
- Requires a very large number of records to obtain good results
- Independence assumption may not hold for some attributes