Classification Flashcards
What are the two types of classification methods: build the model first, or build a model for each query?
Eager builds the model (e.g. a tree) first. Building the model takes a while; processing queries doesn’t.
Lazy doesn’t build a model up front. Queries take longer. Lazy adapts better to new data.
What is entropy for a 50/50 split?
What is entropy for a 25/25/25/25 split (one attribute)?
What is entropy for a split into two partitions that are each 50/50?
50/50: 1 bit
25/25/25/25: 2 bits
(50/50)(50/50): 1 bit (size-weighted average of the two partitions)
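To check the three answers above, here is a minimal sketch of Shannon entropy in bits. The function names entropy and weighted_entropy are just illustrative, and class distributions are assumed to be given as raw counts.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    # Empty classes are skipped: 0 * log2(0) is treated as 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(partitions):
    """Entropy after a split: average over partitions, weighted by partition size."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)

print(entropy([5, 5]))                     # 1.0
print(entropy([25, 25, 25, 25]))           # 2.0
print(weighted_entropy([[5, 5], [5, 5]]))  # 1.0
```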
Information Gain and Gain Ratio
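Gain: old entropy - new entropy (new entropy is the size-weighted average over the partitions)
Ratio: gain / split information, where split information is the entropy of the partition sizes themselves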
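A small sketch of both measures, assuming the usual C4.5-style definition where gain ratio is information gain divided by split information (the entropy of the partition sizes). The counts are made up for illustration: a clean two-way split versus an eight-way split that gives every record its own partition.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, partitions):
    """Old entropy minus the size-weighted entropy of the partitions."""
    total = sum(parent)
    new = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - new

def gain_ratio(parent, partitions):
    """Information gain divided by split information."""
    split_info = entropy([sum(p) for p in partitions])
    if split_info == 0:
        return 0.0  # only one non-empty partition, so nothing was split
    return information_gain(parent, partitions) / split_info

parent = [4, 4]                          # 4 records of each class
two_way = [[4, 0], [0, 4]]               # two pure partitions
eight_way = [[1, 0]] * 4 + [[0, 1]] * 4  # one record per partition

print(information_gain(parent, two_way), information_gain(parent, eight_way))
print(gain_ratio(parent, two_way), gain_ratio(parent, eight_way))
```

Both splits get the same information gain, but the eight-way split has a much lower gain ratio because its split information is 3 bits instead of 1.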
What is Gini for 50/50 split?
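Gini = 1 - sum of squared class proportions. It is 0 for a pure node and approaches 1 as the split gets worse (the max is 1 - 1/k for k classes).
50/50: .5
.25,.25,.25,.25: .75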
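The Gini numbers above can be reproduced with a short sketch, assuming the standard definition of one minus the sum of squared class proportions. The function name gini is illustrative.

```python
def gini(counts):
    """Gini impurity of a class distribution given as counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([5, 5]))        # 0.5
print(gini([1, 1, 1, 1]))  # 0.75
print(gini([10, 0]))       # 0.0 (pure node)
```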
Three ways to turn a numerical value into an ordinal value. (What’s this called?)
Discretizing:
1) Equal Width: every bin spans the same range of values
2) Equal Depth: every bin holds the same number of values
3) Distance-based ???
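A rough sketch of the first two methods, with made-up ages as input. The function names and the choice to return a bin index per value are assumptions for illustration; the distance-based method is not covered here.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins that each span the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value would compute to bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_depth_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

ages = [1, 2, 3, 4, 50, 100]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 0, 1, 2]
print(equal_depth_bins(ages, 3))  # [0, 0, 1, 1, 2, 2]
```

Equal width lets outliers leave most bins nearly empty; equal depth keeps the bins balanced but the ranges uneven.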
What’s an entropy-based discretization method?
Done recursively. Of all possible splits in your data, find the split that makes entropy lowest. Repeat on either side of the split…
Keep going as long as information gain is greater than a threshold.
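The recursion above might look something like this. It is a sketch, not a reference implementation: the input format (a list of (value, label) pairs), the min_gain threshold name, and placing each cut at the midpoint between neighbors are all assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(points):
    """points is sorted by value. Return (gain, index) of the best split, index None if none helps."""
    labels = [label for _, label in points]
    base = entropy(labels)
    best = (0.0, None)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot cut between two identical values
        left, right = labels[:i], labels[i:]
        new = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - new > best[0]:
            best = (base - new, i)
    return best

def discretize(points, min_gain=0.1):
    """Recursively collect cut points while the information gain exceeds min_gain."""
    points = sorted(points)
    gain, i = best_split(points)
    if i is None or gain <= min_gain:
        return []
    cut = (points[i - 1][0] + points[i][0]) / 2
    return discretize(points[:i], min_gain) + [cut] + discretize(points[i:], min_gain)

data = [(1, "N"), (2, "N"), (3, "N"), (10, "Y"), (11, "Y"), (12, "Y")]
print(discretize(data))  # [6.5]
```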
How do you pre-prune or post-prune a decision tree?
Pre-pruning: stop splitting a node once information gain falls below a threshold.
Post-pruning: sub-tree replacement or sub-tree raising, done after the full tree is built.
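3 methods for dividing training and test data
Holdout: divide a dataset into 2/3 training, 1/3 test.
Random Sampling: repeat holdout several times with different random splits and average the results.
Cross Validation: split the data into k folds; each fold is the test set once while the other k-1 folds are the training set.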
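A minimal sketch of holdout and k-fold splitting, using only the standard library. The function names, the fixed seed, and dealing records into folds round-robin are illustrative choices, not the only way to do it.

```python
import random

def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the records and cut it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(records, k):
    """Yield (train, test) pairs; every record lands in the test set exactly once."""
    for fold in range(k):
        test = records[fold::k]
        train = [r for i, r in enumerate(records) if i % k != fold]
        yield train, test

train, test = holdout_split(list(range(9)))
print(len(train), len(test))  # 6 3

for train, test in k_fold_splits(list(range(10)), 5):
    print(test)
```

Calling holdout_split several times with different seeds and averaging the resulting scores gives repeated random subsampling.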
Describe a method to test your classifier that penalizes mispredictions
Cost matrix: a confusion matrix where a cost is assigned to each quadrant. In the cancer screening example, TP = -1, FP = 1, FN = 100, TN = 0.
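Here is a sketch of scoring predictions with the cancer screening costs above. The two prediction lists are made up for illustration; both are 75% accurate, but they differ a lot in cost.

```python
# Costs from the cancer screening example: a missed cancer (FN) is very expensive.
COSTS = {"TP": -1, "FP": 1, "FN": 100, "TN": 0}

def total_cost(actual, predicted, costs=COSTS):
    """Sum the cost of every prediction; actual and predicted are lists of booleans."""
    total = 0
    for a, p in zip(actual, predicted):
        if a and p:
            total += costs["TP"]
        elif p:
            total += costs["FP"]  # predicted positive, actually negative
        elif a:
            total += costs["FN"]  # predicted negative, actually positive
        else:
            total += costs["TN"]
    return total

actual = [True, True, False, False]
misses_one = [True, False, False, False]  # one false negative
over_flags = [True, True, True, False]    # one false positive

print(total_cost(actual, misses_one))  # 99
print(total_cost(actual, over_flags))  # -1
```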
What’s the rule-based eager method?
Sequential Covering Algorithm.
Ponder: “As rules grow, certainty increases, coverage decreases.”
What weakness of trees does the Sequential Covering algorithm really beat?
Subtree duplication.
Two attributes of a good rule set
Mutually exclusive and exhaustive:
Exclusive: no entry will match two rules.
Exhaustive: any inbound query will hit at least one rule.
What two attributes do you rank rules on?
Coverage: fraction of all records that satisfy the antecedent.
Accuracy: fraction of the covered records that also satisfy the consequent.
If X, then Y.
Lots of X’s: high coverage. Most of the X’s are also Y: good accuracy.
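A quick sketch of both measures for a rule "If X, then Y", assuming accuracy means the fraction of covered records that also satisfy Y. The weather records and the rule are made up for illustration.

```python
def coverage_and_accuracy(records, antecedent, consequent):
    """Return (coverage, accuracy) of the rule 'if antecedent then consequent'."""
    covered = [r for r in records if antecedent(r)]
    correct = [r for r in covered if consequent(r)]
    coverage = len(covered) / len(records)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical data set.
records = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]

# Rule: if outlook is sunny, then play is no.
cov, acc = coverage_and_accuracy(
    records,
    antecedent=lambda r: r["outlook"] == "sunny",
    consequent=lambda r: r["play"] == "no",
)
print(cov, acc)  # coverage 3/5, accuracy 2/3
```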
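Naive Bayes probability summary
For each attribute value d_i in the query, you need P(d_i | X), where X is the class. So if 3 of the 9 play-golf days were sunny, P(sunny | play) = 3/9 = 1/3.
Multiply all of those together, times P(X), and divide by P(d1, d2, d3). The denominator is the same for every class, so you can skip it when you only need to compare classes.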
Remind yourself how to handle zeros in data
Laplace (add-one) smoothing: add one to the count of each ordinal value, so a value never seen with a class doesn’t zero out the whole product.
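The two Naive Bayes cards above can be sketched together. This is a minimal version for categorical attributes with add-one smoothing; the names train_nb and posterior and the tiny data set are made up for illustration.

```python
from collections import Counter, defaultdict

def train_nb(records, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for rec, lab in zip(records, labels):
        for j, v in enumerate(rec):
            value_counts[(j, lab)][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains

def posterior(model, rec):
    """Return P(class | rec) for every class, normalized to sum to 1."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    scores = {}
    for c, cc in class_counts.items():
        score = cc / n  # prior P(class)
        for j, v in enumerate(rec):
            # Add-one smoothing: +1 on the count, + number of distinct values on the total.
            score *= (value_counts[(j, c)][v] + 1) / (cc + len(domains[j]))
        scores[c] = score
    total = sum(scores.values())  # plays the role of the shared denominator
    return {c: s / total for c, s in scores.items()}

records = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
model = train_nb(records, labels)
print(posterior(model, ("sunny",)))
```

Without smoothing, "yes" would get probability 0 for a sunny day because no sunny day was ever labeled "yes"; with it, "yes" keeps a small nonzero probability.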
SVM: Support Vector Machine basics
This is the one that tries to find a hyperplane that separates the classes, ideally with the widest margin between them.
What are four applications of classification analysis?
Explaining why. Credit approval. Target marketing (likely to buy). Is a medical treatment effective? Medical diagnosis.
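What is KNN classification?
K-Nearest Neighbors. Classify a query by majority vote of the k closest training records. Closeness needs a distance measure: I’m betting it is cosine of attribute vectors, though Euclidean distance is the common default.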
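A bare-bones KNN sketch, assuming numeric attribute vectors and Euclidean distance (cosine similarity or another measure could be swapped in). The points and labels are made up; math.dist needs Python 3.8 or newer.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (vector, label). Return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_classify(train, (0.5, 0.5)))  # a
print(knn_classify(train, (5.5, 5.5)))  # b
```

Note there is no training step: all the work happens at query time, which is what makes KNN lazy.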
List all the lazy and non-lazy methods you can think of.
Lazy: KNN, range query (similar to KNN).
Eager: Support vector machine
Tree Generation
Sequential Covering Algorithm
Naive Bayes (the probability tables are computed up front, so it is usually counted as eager)
Supervised versus Unsupervised learning
Supervised (like tree generation): training data has labels or classes. This is classification.
Unsupervised: class labels are unknown; try to discover classes/labels. This is clustering.
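What is an advantage of Gain Ratio over Information Gain?
Gain ratio penalizes splits into many small partitions (the split-on-first-name example: nearly every record gets its own partition, so plain information gain looks perfect).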
What is a class-based discretization process?
Place breakpoints in numeric data.
1) Place breakpoints between numeric values where the class changes.
2) Set a minimum number of values to have in each interval. Place breakpoints where the majority class changes. (YNYYY) (NNYYY) (NYYN)
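A sketch of the first variant only (a breakpoint wherever the class changes between neighboring values). The input format, the midpoint placement, and the sample temperatures are assumptions; the minimum-size variant would additionally merge short runs before cutting.

```python
def class_change_breakpoints(points):
    """points is a list of (value, label). Cut midway between neighbors whose labels differ."""
    points = sorted(points)
    cuts = []
    for (v1, c1), (v2, c2) in zip(points, points[1:]):
        if c1 != c2 and v1 != v2:  # cannot cut between identical values
            cuts.append((v1 + v2) / 2)
    return cuts

temps = [(64, "Y"), (65, "N"), (68, "Y"), (69, "Y"), (70, "Y")]
print(class_change_breakpoints(temps))  # [64.5, 66.5]
```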