Chapter 19 Naive Bayes Flashcards
WHAT ARE THE ASSUMPTIONS OF NAÏVE BAYES AND GAUSSIAN NAÏVE BAYES? P93, P95
The attributes don’t interact: each attribute is assumed to be conditionally independent of the others given the class value. (Nevertheless, the approach performs surprisingly well on data where this assumption does not hold.)
Real-valued attributes follow a Gaussian distribution (for Gaussian Naïve Bayes)
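As a compact restatement of the two assumptions, they can be written as below. The notation is an assumption made here for illustration and is not taken from the flashcards: y is a class value, x_1 to x_n are the attributes, and mu and sigma are the mean and standard deviation of attribute i within class y.

```latex
% Independence assumption: the class score factorises over the attributes
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)

% Gaussian assumption: each real-valued attribute is normal within a class
p(x_i \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,y}}
  \exp\!\left( -\frac{(x_i - \mu_{i,y})^2}{2\,\sigma_{i,y}^2} \right)
```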
WHAT IS THE PURPOSE OF DEVELOPING GAUSSIAN NAÏVE BAYES? P95
To extend Naïve Bayes to real-valued attributes
HOW CAN WE PREPARE DATA FOR NAÏVE BAYES? P96
Categorical Inputs: Naive Bayes assumes label attributes, such as binary, categorical or nominal attributes.
Gaussian Inputs: If the input variables are real-valued, a Gaussian distribution is assumed, in which case the algorithm will perform better if the univariate distributions of your data are Gaussian or near-Gaussian. This may require removing outliers (e.g. values that are more than 3 or 4 standard deviations from the mean).
Classification Problems: Naive Bayes is a classification algorithm suitable for binary and multiclass classification.
Log Probabilities: The calculation of the likelihood of different class values involves multiplying a lot of small numbers together. This can lead to an underflow of numerical precision. As such it is good practice to use a log transform of the probabilities to avoid this underflow.
Kernel Functions: Rather than assuming a Gaussian distribution for numerical input values, more complex distributions can be used such as a variety of kernel density functions.
Update Probabilities: When new data becomes available, you can simply update the probabilities of your model. This can be helpful if the data changes frequently.
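The log-probability advice above can be illustrated with a minimal Python sketch. The prior and the 100 conditional probabilities are made-up values chosen only to trigger underflow; they do not come from any real dataset.

```python
import math

# Hypothetical values: one class prior and 100 small conditional
# probabilities, chosen only to demonstrate underflow.
prior = 0.5
likelihoods = [1e-5] * 100

# Direct product: the true value (5e-501) is smaller than any
# positive float, so the running product collapses to 0.0.
product = prior
for p in likelihoods:
    product *= p

# Log-space score: add the logs instead of multiplying the probabilities.
log_score = math.log(prior) + sum(math.log(p) for p in likelihoods)

print(product)    # the product has underflowed
print(log_score)  # the log score is still a finite negative number
```

Because the logarithm is monotonic, comparing log scores across classes selects the same class as comparing the raw products would.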
WHICH QUANTITIES ARE CALCULATED FROM THE DATASET FOR THE NAÏVE BAYES MODEL? P99
Class Probabilities.
Conditional Probabilities.
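A rough sketch of how these two quantities might be counted from categorical data is given below. The function name `fit_naive_bayes`, the toy rows and the attribute values are all hypothetical, and no smoothing is applied, so unseen attribute values simply have no entry.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows):
    """rows is a list of (attribute_tuple, class_label) pairs."""
    class_counts = Counter(label for _, label in rows)
    total = len(rows)

    # Class probabilities: P(class) = count(class) / number of rows
    class_probs = {c: n / total for c, n in class_counts.items()}

    # value_counts[(attribute_index, label)][value] = number of rows
    value_counts = defaultdict(Counter)
    for attrs, label in rows:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1

    # Conditional probabilities: P(value | class) for each attribute
    cond_probs = {
        (i, value, label): count / class_counts[label]
        for (i, label), counter in value_counts.items()
        for value, count in counter.items()
    }
    return class_probs, cond_probs

# Hypothetical toy dataset: (weather, car) -> class
rows = [
    (("sunny", "working"), "go-out"),
    (("sunny", "working"), "go-out"),
    (("rainy", "working"), "go-out"),
    (("rainy", "broken"), "stay-home"),
    (("sunny", "broken"), "stay-home"),
]
class_probs, cond_probs = fit_naive_bayes(rows)
print(class_probs["go-out"])               # P(go-out)
print(cond_probs[(0, "sunny", "go-out")])  # P(weather=sunny | go-out)
```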
WHAT DOES GAUSSIAN NAÏVE BAYES USE FOR PREDICTING PROBABILITIES? P104
The Gaussian Probability Density Function (PDF), which estimates the likelihood (strictly, a probability density) of a value given the mean and standard deviation of the distribution from which it came.
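A minimal Python sketch of the Gaussian PDF follows. The function name `gaussian_pdf` and the example inputs are assumptions for illustration; in a Gaussian Naïve Bayes model the mean and standard deviation would be estimated per attribute and per class from the training data.

```python
import math

def gaussian_pdf(x, mean, stdev):
    """Density of a normal distribution (mean, stdev) evaluated at x."""
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (math.sqrt(2 * math.pi) * stdev)

# Density at the mean of a standard-deviation-1 Gaussian: 1 / sqrt(2 * pi)
at_mean = gaussian_pdf(1.0, 1.0, 1.0)

# With a narrow distribution the density can exceed 1, which is why the
# result is a density used as a likelihood rather than a true probability.
narrow = gaussian_pdf(0.0, 0.0, 0.1)

print(at_mean, narrow)
```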