Lecture 6 - Naïve Bayes Flashcards

1
Q

Classification & clustering

A

Both result in a categorisation of records into one or more classes based on their values

2
Q

Classification

A
  • Trains a model that allows classifying new records into one of the classes
  • Assumes the existence of predefined classes
3
Q

Clustering

A
  • Divides the records into clusters
  • Records within a cluster are highly similar to one another, while records in different clusters are dissimilar
4
Q

Example of classification & clustering of e-mails

A

Classification: does e-mail go to inbox or spam?

Clustering: e-mail to work, friends or family folder?

5
Q

Why do we need classification?

A
  • Organising documents is hard work
    • Route email messages into folders
    • Route help-desk inquiries to correct staff
    • Place documents in predefined categories/topic hierarchy
  • Deciding about (predefined) user interests/skills/…
    • User modelling
    • Instead of using a human-authored expert system, let the computer induce rules or models from log data
6
Q

Classification: Learning & applying

A
  • Learning: a classification model is induced from training records whose class labels are already known
  • Applying: the trained model assigns one of the predefined classes to each new, unlabelled record (sketched below)
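
A minimal sketch of the two phases, assuming scikit-learn's MultinomialNB and toy word-count data (all names and numbers here are illustrative):

    from sklearn.naive_bayes import MultinomialNB

    # Learning phase: induce a model from records with known class labels.
    X_train = [[3, 0, 1],          # per-e-mail word counts (toy data)
               [0, 2, 4],
               [4, 1, 0]]
    y_train = ["spam", "inbox", "spam"]

    model = MultinomialNB()
    model.fit(X_train, y_train)

    # Applying phase: the trained model classifies a new, unlabelled record.
    X_new = [[2, 0, 1]]
    print(model.predict(X_new))    # e.g. ['spam']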
7
Q

Classification Techniques

A
  • Naïve Bayes
  • Nearest Neighbor
  • Decision Trees
  • Support Vector Machines
  • Logistic regression
  • Deep learning
  • Ensemble classification
8
Q

Naïve Bayes example

A
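
A minimal worked toy sketch (hypothetical word counts, not the lecture's original figures): classify the one-word message "free" as spam or non-spam.

    # Toy training data: word counts in spam vs. ham e-mails (illustrative).
    spam_counts = {"free": 4, "meeting": 1}
    ham_counts  = {"free": 1, "meeting": 4}
    p_spam, p_ham = 0.5, 0.5                 # class priors

    def likelihood(word, counts):
        """P(word | class), estimated from the class's word counts."""
        return counts.get(word, 0) / sum(counts.values())

    # Unnormalised posterior scores for the message "free":
    score_spam = p_spam * likelihood("free", spam_counts)   # 0.5 * 4/5 = 0.4
    score_ham  = p_ham  * likelihood("free", ham_counts)    # 0.5 * 1/5 = 0.1

    # Normalise: P(spam | "free") = 0.4 / (0.4 + 0.1) = 0.8
    print(score_spam / (score_spam + score_ham))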
9
Q

Naïve Bayes Classifier

A

Applies Bayes theorem, together with the naïve independence assumption, to assign a new record to its most probable class given the observed attribute values.
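In standard notation, the decision rule picks the class with the highest (unnormalised) posterior:

ŷ = argmax over classes y of  P(y) · P(x1 | y) · … · P(xn | y)

where x1 … xn are the record's attribute values.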
10
Q

Bayes theorem

A

Bayes rule is a standard formula for inverting conditional probabilities
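In its standard form, for a class y and observed evidence x:

P(y | x) = P(x | y) · P(y) / P(x)

i.e. the posterior equals the likelihood times the prior, normalised by the probability of the evidence.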

11
Q

Naïve Bayes Assumption

A

Naïve conditional independence: assume that all features are mutually independent, given the class label y
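
Under this assumption the class-conditional likelihood factorises into per-feature terms that are easy to estimate from counts:

P(x1, …, xn | y) = P(x1 | y) · P(x2 | y) · … · P(xn | y)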

12
Q

Laplace Smoothing

A
  • Having a probability of zero is problematic because, inside the product, it wipes out all information in the other probabilities

Solution:

Laplace Smoothing, or Correction, or Estimator

  • Incorporates a small-sample correction into every probability estimate
  • Adds 1 to the count in the numerator and the number of possible attribute values to the denominator (see the sketch below)
  • Thus, no probability estimate will be zero
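
A minimal sketch of the corrected estimate, assuming add-one smoothing over an attribute with k possible values (the function name and numbers are illustrative):

    def laplace_estimate(count, total, k):
        """Add-one (Laplace) smoothed estimate of P(value | class).

        count: training records in the class with this attribute value
        total: training records in the class
        k:     number of distinct values the attribute can take
        """
        return (count + 1) / (total + k)

    # Unsmoothed, 0/10 = 0.0 would zero out the whole product of probabilities;
    # smoothed, (0 + 1) / (10 + 3) ≈ 0.077 keeps the estimate non-zero.
    print(laplace_estimate(0, 10, 3))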
13
Q

Lecture Summary

A

Naïve Bayes is not so Naïve:

  • Its beauty is in its simplicity
  • Ability to handle categorical variables directly
  • Computationally efficient
  • Good classification performance, especially when the number of predictors is very large

Negative aspects:

  • Requires a very large number of records to obtain good results
  • Independence assumption may not hold for some attributes