Chapter 5: Naive Bayes Flashcards
1
Q
Formal Definition of Classification
A
- Naive Bayes = a form of classification
- Classification: Given a database D = {x1, x2, …, xn} of tuples (items, records) and a set of classes C = {C1, C2, …, Cm}, the classification problem is to define a mapping f: D → C where each xi is assigned to one class. A class Cj contains precisely those tuples mapped to it; that is,
Cj = {xi | f(xi) = Cj, 1 ≤ i ≤ n, and xi ∈ D} (see the small code example below).
- Logistic regression can also be used for classification.
- Prediction is similar, but usually implies a mapping to numeric values instead of a class Cj
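A tiny illustration of the mapping f: D → C in code; the customer names and risk classes below are made up purely for illustration:

```python
# Illustrative only: hypothetical customers and loan-risk classes.
D = ["customer_1", "customer_2", "customer_3"]   # database of tuples
C = ["low", "medium", "high"]                    # set of classes

def f(x):
    """A (hard-coded) classifier: maps each tuple to exactly one class."""
    return {"customer_1": "low", "customer_2": "high", "customer_3": "low"}[x]

# The class C_j contains precisely those tuples mapped to it:
C_low = [x for x in D if f(x) == "low"]          # ['customer_1', 'customer_3']
```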
2
Q
Example Applications
A
- Determine if a bank customer applying for a loan is a low, medium, or high risk customer.
- Churn prediction (typically a classification task): does the customer leave or not?
- Determine if a sequence of credit card purchases indicates questionable behavior.
- Identify customers that may be willing to purchase particular insurance policies.
- Identify patterns of gene expression that indicate the patient has cancer.
- Identify spam mail etc.
3
Q
Algorithms for Classification
A
- (Logistic) Regression
- Rudimentary Rules (e.g., 1R)
- Statistical Modeling (e.g., Naïve Bayes)
- Decision Trees: Divide and Conquer
- Classification Rules (e.g. PRISM)
- Instance-Based Learning (e.g. kNN)
- Support Vector Machines
4
Q
1-Rule (1R)
A
- Generate a one-level decision tree
- One attribute → easy to explain
- Basic idea (see the code sketch after this card):
- Rules "testing" a single attribute
- Classify according to frequency in training data
- Evaluate error rate for each attribute
- Choose the best attribute
- That's all!
- Performs quite well, even compared to much more sophisticated algorithms!
- Often the underlying structure of data is quite simple
"Very simple classification rules perform well on most commonly used datasets" (Holte, 1993)
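A minimal Python sketch of the 1R procedure described above; the function and variable names are my own, so treat it as an illustration rather than the original pseudocode:

```python
from collections import Counter, defaultdict

def one_r(instances, class_index):
    """1R: build one-level rules for each attribute, keep the attribute with the fewest errors.

    instances: list of tuples of nominal attribute values; class_index: position of the class label.
    Returns (best_attribute_index, rules), where rules maps attribute value -> predicted class.
    """
    n_attrs = len(instances[0])
    best = None  # (error_count, attribute_index, rules)
    for a in range(n_attrs):
        if a == class_index:
            continue
        # Count class frequencies for each value of attribute a
        counts = defaultdict(Counter)
        for row in instances:
            counts[row[a]][row[class_index]] += 1
        # Rule: each attribute value predicts its most frequent class
        rules = {value: cls.most_common(1)[0][0] for value, cls in counts.items()}
        # Errors: instances of a value that do not belong to that value's majority class
        errors = sum(sum(cls.values()) - cls.most_common(1)[0][1] for cls in counts.values())
        if best is None or errors < best[0]:
            best = (errors, a, rules)
    return best[1], best[2]
```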
5
Q
Apply 1R on weather data
A
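A usage sketch only: it feeds a few illustrative weather-style rows (not the full textbook dataset) into the one_r sketch from the previous card.

```python
# Illustrative rows (outlook, windy, play) -- not the complete weather dataset.
weather = [
    ("sunny", "false", "no"),
    ("sunny", "true", "no"),
    ("overcast", "false", "yes"),
    ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
]

attr, rules = one_r(weather, class_index=2)   # one_r as sketched in the previous card
print(attr, rules)  # e.g. attribute 0 (outlook) with rules such as {'sunny': 'no', 'overcast': 'yes', ...}
```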

6
Q
Other Features of 1R
A
- Missing Values
- Include a "dummy" attribute value "missing"
- Numeric Values
- Discretization:
- Sort training data by attribute value
- Split range of numeric temperature values into categories
- Threshold values 64.5, 66.5, 70.5, 72, 77.5, 80.5, 84 separate the eight categories
- The threshold at 72 can be removed and replaced by 73.5 to get a bigger class
- Problem of too many temperature classes → define a minimum class size
- Discretization:
- Merge adjacent partitions having the same majority class (see the sketch below)
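A simplified sketch of this discretization idea, assuming nominal class labels and a chosen minimum partition size; details such as how the first partition is handled are my own simplifications:

```python
from collections import Counter

def discretize(values, labels, min_size=3):
    """Simplified 1R-style discretization sketch:
    1) sort by value, 2) split where the class label changes,
    3) absorb partitions smaller than min_size into their left neighbour,
    4) merge adjacent partitions that share the same majority class.
    Returns the threshold values between the resulting partitions."""
    pairs = sorted(zip(values, labels))

    parts = [[pairs[0]]]                       # steps 1-2
    for p in pairs[1:]:
        if p[1] != parts[-1][-1][1]:
            parts.append([])
        parts[-1].append(p)

    sized = [parts[0]]                         # step 3
    for part in parts[1:]:
        if len(part) < min_size:
            sized[-1].extend(part)
        else:
            sized.append(part)

    def majority(part):
        return Counter(lbl for _, lbl in part).most_common(1)[0][0]

    merged = [sized[0]]                        # step 4
    for part in sized[1:]:
        if majority(part) == majority(merged[-1]):
            merged[-1].extend(part)
        else:
            merged.append(part)

    # A threshold sits halfway between neighbouring partitions
    return [(a[-1][0] + b[0][0]) / 2 for a, b in zip(merged, merged[1:])]
```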

7
Q
Naïve Bayes Classifier
A
- 1R uses only a single attribute for classification
- Naive Bayes classifier allows all attributes to contribute equally
- Assumes
- All attributes equally important
- All attributes independent
- This means that knowledge about the value of a particular attribute doesn't tell us anything about the value of another attribute
- Although based on assumptions that are almost never correct, this scheme works well in practice! (See the code sketch below.)
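A compact sketch of a categorical Naive Bayes classifier built on exactly these assumptions, using relative-frequency estimates and no smoothing; the function and variable names are my own:

```python
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Count class frequencies and, per (attribute, class), attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)             # key: (attribute_index, class)
    for row, y in zip(instances, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
    return class_counts, value_counts

def predict_nb(row, class_counts, value_counts):
    """Score each class h with P(h) * product of P(a_i | h); return the best-scoring class.
    No smoothing: an unseen attribute value zeroes out the product."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for h, n_h in class_counts.items():
        score = n_h / total                          # prior P(h)
        for i, v in enumerate(row):
            score *= value_counts[(i, h)][v] / n_h   # conditional P(a_i | h)
        if score > best_score:
            best_class, best_score = h, score
    return best_class
```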
8
Q
Bayes Theorem: Some Notation
A
- Let P(e) represent the prior or unconditional probability that proposition e is true.
- Example: Let e represent that a customer is a high credit risk. P(e) = 0.1 means that there is a 10% chance a given customer is a high credit risk.
- Probabilities of events change when we know something about the world
- The notation P(e|h) is used to represent the conditional or posterior probability of e
- Read "the probability of e given that all we know is h."
- P(e = high risk | h = unemployed) = 0.60
- The notation P(E) is used to represent the probability distribution of all possible values of a random variable
- P(Risk) = <0.7, 0.2, 0.1>
9
Q
Conditional Probability and Bayes Rule
A
- The product rule for conditional probabilities
- P(e|h) = P(e ∩ h) / P(h)
- P(e ∩ h) = P(e|h)P(h) = P(h|e)P(e) (product rule)
- P(e ∩ h) = P(e)P(h) (for independent random variables)
- Bayes' rule relates conditional probabilities (a small numeric illustration follows below)
- P(e ∩ h) = P(e|h)P(h)
- P(e ∩ h) = P(h|e)P(e)
- P(h|e) = P(e|h)P(h) / P(e)
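A quick numeric illustration of the rule; the numbers are assumed purely to show the arithmetic:

```python
# Assumed, illustrative numbers only.
p_h = 0.10           # P(h):   prior, e.g. "customer is a high credit risk"
p_e_given_h = 0.60   # P(e|h): how often the evidence appears when h is true
p_e = 0.20           # P(e):   overall probability of the evidence

p_h_given_e = p_e_given_h * p_h / p_e    # Bayes' rule: P(h|e) = P(e|h)P(h) / P(e)
print(round(p_h_given_e, 2))             # 0.3 -- the evidence tripled the prior belief
```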

10
Q
Bayes theorem
A
- Bayes' theorem: P(h|e) = P(e|h)P(h) / P(e)
- P(h): prior probability of the hypothesis h
- P(e): probability of the evidence e
- P(e|h): likelihood of the evidence given the hypothesis
- P(h|e): posterior probability of the hypothesis given the evidence
11
Q
Bayes cancer example
A
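A worked version using the numbers from a common textbook variant of this example; they are assumptions and may differ from the slide:

```python
# Assumed numbers from a common textbook variant of the cancer-test example.
p_cancer = 0.008                # prior P(cancer)
p_pos_given_cancer = 0.98       # P(+ | cancer)
p_pos_given_no_cancer = 0.03    # P(+ | no cancer)

# Evidence P(+) via the law of total probability
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)

# Bayes' rule: even after a positive test, cancer remains fairly unlikely
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))     # ~0.209, i.e. about 21%
```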

12
Q
Frequency Tables
A
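A sketch of how such per-attribute frequency tables can be built from training data; the structure and the placeholder attribute names are my own:

```python
from collections import Counter, defaultdict

def frequency_tables(instances, labels):
    """For each attribute, count how often each value co-occurs with each class.
    Result: tables[attribute_index][value][class] -> count."""
    tables = defaultdict(lambda: defaultdict(Counter))
    for row, y in zip(instances, labels):
        for i, value in enumerate(row):
            tables[i][value][y] += 1
    return tables

# e.g. tables[0]["sunny"]["no"] would then hold the count of days with outlook=sunny and play=no
# ("outlook"/"sunny"/"play" are placeholder names for the weather data attributes)
```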

13
Q
Naive Bayes β Probabilities
A
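The counts in the frequency tables become relative-frequency estimates; a sketch that assumes the tables structure from the previous card and a Counter of class labels:

```python
def prior(class_counts, cls):
    """P(class) estimated as count(class) / total number of training instances."""
    return class_counts[cls] / sum(class_counts.values())

def conditional(tables, class_counts, attribute, value, cls):
    """P(attribute = value | class) estimated as count(value, class) / count(class)."""
    return tables[attribute][value][cls] / class_counts[cls]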

14
Q
Predicting a New Day
A
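For a new day, multiply the prior and the conditional probabilities of its attribute values for each class, then normalise; a sketch reusing the same table structure (the example values in the comments are placeholders):

```python
def predict_day(new_day, tables, class_counts):
    """Score each class h as P(h) * product of P(a_i | h) over the new day's values,
    then normalise the scores so they can be read as probabilities."""
    total = sum(class_counts.values())
    scores = {}
    for h, n_h in class_counts.items():
        score = n_h / total                            # prior
        for i, value in enumerate(new_day):
            score *= tables[i][value][h] / n_h         # conditional per attribute
        scores[h] = score
    norm = sum(scores.values()) or 1.0                 # guard against all-zero scores
    return {h: s / norm for h, s in scores.items()}

# e.g. predict_day(("sunny", "cool", "high", "true"), tables, class_counts)
# might return something like {"yes": 0.2, "no": 0.8} -- numbers purely illustrative
```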

15
Q
Naive Bayes - Summary
A
- Want to classify a new instance (a1, a2, …, an) into a finite number of categories from the set H.
- Choose the most likely classification using
- Bayes' theorem: MAP (maximum a posteriori) classification
- The assumptions are strong: all attributes are treated equally, so be careful which attributes to include. An attribute without any relation to the class can corrupt the prediction.
- Assign the most probable category hMAP given (a1, a2, …, an), i.e. hMAP = argmax over h in H of P(h) · P(a1|h) · … · P(an|h)
- "Naive Bayes" because the attributes are treated as independent; only then can the probabilities simply be multiplied