Classification - Part 2 Flashcards
What aspects are important for model evaluation?
- Central question is: How good is a model at classifying unseen records?
- There are Metrics for Model Evaluation (How to measure the performance of a model?)
- There are Methods for Model Evaluation (How to obtain reliable estimates?)
What is the focus of metrics for model evaluation?
- They focus on the predictive capability of a model (rather than, e.g., how fast it classifies records)
What is the confusion matrix?
- It counts the correct and false classifications
- Counts are the basis for calculating different performance metrics

| Actual \ Predicted | Y | N |
|---|---|---|
| Y | TP | FN |
| N | FP | TN |
In the case of credit card fraud, both FN (missed fraud) and FP (legitimate transactions flagged as fraud) would be unsatisfactory.
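A minimal sketch in Python of how the four counts can be obtained from predictions, assuming binary labels encoded as 1 = positive and 0 = negative (the accuracy and error rate from the next cards are included as a usage example):

```python
# Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

tp, fn, fp, tn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct / all predictions = 3/5
error_rate = 1 - accuracy                    # 2/5
```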
What is the formula for accuracy?
(TP + TN) / (TP + TN + FP + FN)
correct predictions / all predictions
What is the formula for error rate?
1 - accuracy
Describe the class imbalance problem
- Sometimes, classes have very unequal frequency (fraud detection: 98% of transactions OK, 2% fraud)
- The class of interest is commonly called the positive class, the remaining class(es) the negative class
Example: # negative examples = 9990, # positive examples = 10
-> If the model predicts all records as negative, its accuracy is 99.9%
--> Accuracy is misleading because the model does not detect a single positive example
How can you mitigate the class imbalance problem?
- Use performance metrics that are biased towards the positive class by ignoring TN
- Precision
- Recall
What is the precision performance metric?
- Number of correctly classified positive examples divided by number of predicted positive examples
p = TP / ( TP + FP)
Question: How many examples that are classified positive are actually positive?
-> Low precision means many false alarms (many FP)
What is the recall performance metric?
- Number of correctly classified positive examples divided by the actual positive examples
r = TP / (TP + FN)
Question: Which fraction of all positive examples is classified correctly?
-> Detection rate
In which cases are precision and recall problematic on their own?
- Cases where the count of FP or FN is 0, so one of the two metrics looks perfect in isolation
-> p = 100%, r = 1% for the following confusion matrix:

| Actual \ Predicted | Y | N |
|---|---|---|
| Y | 1 | 99 |
| N | 0 | 1000 |

-> No negative example is classified wrongly (FP = 0), but only 1 of 100 positive examples is classified correctly
Consequence:
We need a measure that
1. combines precision and recall and
2. is large if both values are large
Explain the F1-Measure
- Combines precision and recall into one measure
- It is the harmonic mean of precision and recall
- Tends to be closer to the smaller of p and r
- Thus, p and r must be large for a large F1
Formula:
(2pr) / (p + r)
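A minimal sketch of the three metrics computed directly from the confusion-matrix counts; the example numbers are the ones from the imbalanced matrix shown a few cards above:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return precision, recall, f1

# TP=1, FN=99, FP=0 (the p = 100%, r = 1% example):
p, r, f1 = precision_recall_f1(tp=1, fp=0, fn=99)
# p = 1.0, r = 0.01, f1 ≈ 0.02 -> F1 stays close to the smaller of the two
```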
What does a low (permissive) threshold mean in the F1-measure graph?
- Low precision, high recall
What does a restrictive (high) threshold mean in the F1-measure graph?
- High precision, low recall
What alternative performance metric can be used if you have domain knowledge?
- Cost-Sensitive Model Evaluation
What is a ROC curve?
- A graphical approach that displays the trade-off between the detection rate (true positive rate, i.e. recall) and the false alarm rate (false positive rate)
- ROC curves visualize the true positive rate and false positive rate in relation to the algorithm's confidence scores
How is a ROC curve drawn?
- Sort classifications according to confidence scores
- Scan over all classifications:
- right prediction: draw one step up
- wrong prediction: draw one step to the right
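A minimal sketch of this drawing procedure, assuming each classification is given as a (confidence, correct) pair; scaling the step sizes so both axes end at 1 is omitted:

```python
def roc_steps(classifications):
    """classifications: list of (confidence, correct) pairs, correct is a bool."""
    ranked = sorted(classifications, key=lambda c: c[0], reverse=True)
    x, y = 0, 0
    points = [(x, y)]
    for _, correct in ranked:
        if correct:
            y += 1   # right prediction: one step up
        else:
            x += 1   # wrong prediction: one step to the right
        points.append((x, y))
    return points

points = roc_steps([(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)])
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3)] -- the steeper, the better
```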
How do you interpret a ROC curve?
- The steeper the better
- Random guessing results in the diagonal
- A decent classification model should result in a curve above the diagonal
What must be considered to obtain a reliable estimate of the generalization performance (methods for model evaluation)?
- Never test a model on data that was used for training
- That would not result in a reliable estimate of the performance on unseen data
- Keep training and test set strictly separate
- Which labeled records should be used for training and which for testing?
What data set splitting approaches do you know?
- Holdout Method
- Random Subsampling
- Cross Validation
What does the learning curve describe?
- How accuracy changes with growing training set size
-> If model performance is low, get more training data (rather use labeled data for training than for testing)
Problem: Labeling additional data is often expensive due to manual effort
Describe the Holdout method.
- Reserves a certain amount of the labeled data for testing and uses the remainder for training (e.g. 20% for testing, 80% for training)
- For unbalanced datasets it is not representative (few or no records of minority class in training or test set)
-> Use stratified sampling instead of simple random sampling
What is stratified sampling?
- Sample each class independently so that records of the minority class are present in each sample (training and test set)
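A minimal sketch of a stratified holdout split with scikit-learn; the toy dataset is only an illustration, and the stratify argument performs the per-class sampling:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced toy dataset: roughly 5% positive examples.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=42)

# 20% for testing, 80% for training; stratify=y keeps the class proportions
# in both partitions, so the minority class is present in each of them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```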
Describe the random subsampling method
- Makes holdout more reliable by repeating the process with different subsamples (both training and test)
- Each iteration random selection of records for training
- Averages performance of iterations
Problem:
- Some outliers might always end up in the test sets
- Records that are important for learning might always end up in the test sets, so the model can never learn from them
Explain cross validation
- Avoids overlapping test sets
Approach:
1. Split data into k subsets of equal size
2. Each subset is used for testing once and the remainder for training (every record is used for testing once)
What are some common practical approaches for cross validation?
- Use stratified sampling to generate subsets
- k = 10, because experience shows it delivers an accurate estimate while still using as much data as possible for training
- Very computationally intensive, especially in combination with random forests
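A minimal sketch of stratified 10-fold cross-validation with scikit-learn; the decision tree and the F1 scoring are just example choices, not prescribed by the lecture:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, weights=[0.95], random_state=42)

# k = 10 stratified folds: every record is used for testing exactly once.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y,
                         cv=cv, scoring="f1")
print(scores.mean())  # average performance over the 10 folds
```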
When should you prefer the holdout method over cross-validation?
- Labeled dataset is large (>5000 examples) and
- long computation time is an issue or exact replicability of results matters (e.g. for data science competitions)
Which performance metrics should be used by default?
- Accuracy
Which performance metrics should be used if the interesting class is infrequent?
- Precision
- Recall
- and F1
How can you increase the model's performance?
- If the dataset is imbalanced, balance it, e.g. by oversampling the positive examples (see the sketch after this list)
- Optimize hyperparameters of the learning algorithm
- avoid overfitting
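A minimal sketch of random oversampling with NumPy, assuming labels 0/1 where the positive class 1 is the minority; libraries such as imbalanced-learn provide more elaborate resampling methods:

```python
import numpy as np

def random_oversample(X, y, positive_label=1, random_state=42):
    """Duplicate random positive examples until both classes are equally frequent.
    X, y: NumPy arrays. Apply only to the training set, never to the test set."""
    rng = np.random.default_rng(random_state)
    pos_idx = np.where(y == positive_label)[0]
    neg_idx = np.where(y != positive_label)[0]
    extra = rng.choice(pos_idx, size=len(neg_idx) - len(pos_idx), replace=True)
    keep = np.concatenate([neg_idx, pos_idx, extra])
    return X[keep], y[keep]
```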
What is rule-based classification?
- It is an eager learning approach which delivers explainable results
- Classify records by using a collection of “if.. then…” rules
Rule-based classifier = set of classification rules
What is a classification rule?
Classification rule: Condition -> y
condition: conjunction of attribute tests
y: class label
What is rule coverage?
- Fraction of all records that satisfy the condition of a rule
- Fraction of all records that are covered by the rule
What is accuracy of a rule?
- Fraction of covered records that are classified correctly
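A minimal sketch of both measures, assuming a rule is given as a Python predicate (its condition) plus a predicted class, and records are dicts carrying their class under a hypothetical 'label' key:

```python
def coverage_and_accuracy(records, condition, predicted_class):
    """records: list of dicts with a 'label' key; condition: record -> bool."""
    covered = [r for r in records if condition(r)]
    coverage = len(covered) / len(records)
    correct = [r for r in covered if r["label"] == predicted_class]
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

records = [
    {"status": "single", "income": 125, "label": "no"},
    {"status": "married", "income": 100, "label": "no"},
    {"status": "single", "income": 70, "label": "yes"},
    {"status": "married", "income": 120, "label": "no"},
]
# Rule: (status = single) -> yes
cov, acc = coverage_and_accuracy(records, lambda r: r["status"] == "single", "yes")
# cov = 0.5 (2 of 4 records satisfy the condition), acc = 0.5 (1 of the 2 is 'yes')
```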
What are the characteristics of rule-based classifiers?
Mutually Exclusive Rule Set:
- the rules contained in the classifier are independent of each other
- every record is covered by at most one rule
Exhaustive Rule Set:
- classifier has exhaustive coverage if it accounts for every possible combination of attribute values
- each record is covered by at least one rule
How can you fix a rule set that is not mutually exclusive?
Solution 1: Ordered Rules
- by accuracy
- classify according to highest-ranked rule
Solution 2: Voting
- all matching rules vote and assign majority class label
- votes may be weighted by rule quality (accuracy)
How can you fix a rule set that is not exhaustive?
- Add default rule: () -> Y
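A minimal sketch of classification with an ordered rule list plus a default rule; rules are hypothetical (condition, class label) pairs, ranked e.g. by accuracy:

```python
def classify(record, ordered_rules, default_class):
    """ordered_rules: list of (condition, class_label), highest-ranked first."""
    for condition, label in ordered_rules:
        if condition(record):
            return label      # the highest-ranked matching rule decides
    return default_class      # the default rule () -> Y fires if nothing matches

rules = [
    (lambda r: r["income"] > 110, "no"),
    (lambda r: r["status"] == "single", "yes"),
]
print(classify({"status": "married", "income": 90}, rules, default_class="no"))  # 'no'
```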
What methods for rule-based classifiers exist?
- Direct Method
- Extract rules directly from data
- e.g. RIPPER
- Indirect Method
- Extract rules from other classification models
- e.g. C4.5rules
Explain the indirect method to derive rules from a decision tree
- Generate a rule for every path from the root to one of the leaf nodes
- Rule set contains as much information as the tree
-> Generated rules are mutually exclusive and exhaustive!
Explain the indirect method: C4.5 rules
It applies rule simplification to the rule set. This makes the rule set no longer mutually exclusive. Thus, need to apply ordered rule set or voting schemes.
Approach:
- extract rules from an unpruned decision tree
- for each rule r:
- consider alternative rules r' obtained by removing one of the conjuncts
- compare the pessimistic error rate of r against all r'
- prune (replace r by the best r') if one of the r' has a lower pessimistic error rate
- repeat until the generalization error can no longer be improved
Explain the direct method: RIPPER
- learns an ordered rule set from the training data
- approach depends on 2-class or multi-class problem
Explain RIPPER for 2-class problem
- Choose the less frequent class as positive class and the other as negative class
- learn rules for the positive class
- negative class will be default class
Explain RIPPER for multi-class problem
- Order classes according to increasing frequency
- Learn the rule set for the smallest class first, treating the rest as the negative class
- repeat with the next smallest class as the positive class
How does RIPPER use sequential covering?
- Start from an empty rule list
- Grow a rule that covers as many positive examples as possible
- Remove the training records covered by the rule
- Repeat steps 2 and 3 until the stopping criterion is met
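A minimal pseudocode-style sketch of this loop; grow_rule and stopping_criterion are hypothetical placeholders for the steps explained in the following cards, and rules reuse the (condition, class label) representation from above:

```python
def sequential_covering(records, positive_class, grow_rule, stopping_criterion):
    rule_list = []                        # 1. start from an empty rule list
    remaining = list(records)
    while remaining and not stopping_criterion(rule_list, remaining):
        # 2. grow a rule (condition, label) that covers many positive examples
        condition, label = grow_rule(remaining, positive_class)
        rule_list.append((condition, label))
        # 3. remove all training records covered by the new rule
        remaining = [r for r in remaining if not condition(r)]
    return rule_list                      # 4. repeat until the stopping criterion is met
```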
What are the aspects of sequential covering?
- Rule Growing
- Rule Pruning
- Instance Elimination
- Stopping Criterion
Explain rule growing within the RIPPER algorithm
- Start with an empty rule: {} -> class
- Step by step, add the conjunct that maximizes FOIL's information gain measure, i.e. that
a. improves the accuracy of the rule and
b. keeps the rule covering many examples
Goal: Prefer rules with high accuracy and high support count
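A minimal sketch of FOIL's information gain for scoring a candidate conjunct; p0/n0 denote the positive/negative records covered by the rule before adding the conjunct, p1/n1 after (notation chosen here for illustration):

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of extending a rule with one more conjunct."""
    if p1 == 0:
        return float("-inf")  # the extended rule covers no positive examples
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# The gain is high if the extended rule still covers many positives (high p1)
# and the fraction of positives among the covered records increases.
print(foil_gain(p0=100, n0=400, p1=30, n1=10))
```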
Explain what rule pruning is within the RIPPER algorithm
- Because of the stopping criterion, the learned rule is likely to overfit the data
-> The rule is pruned afterwards using a validation dataset (similar to post-pruning of decision trees)
How does the rule pruning procedure in RIPPER work?
- Remove one of the conjuncts in the rule
- compare error rate on validation dataset before and after pruning
- if error improves, prune the conjunct
Measure for pruning:
v = (p - n) / (p + n)
p: # positive examples covered by the rule in validation set
n: # negative examples covered by the rule in the validation set
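A minimal sketch of the pruning decision based on this metric; counting p and n on the validation set is assumed to happen elsewhere:

```python
def pruning_metric(p, n):
    """v = (p - n) / (p + n) on the validation set; larger is better."""
    return (p - n) / (p + n) if (p + n) > 0 else float("-inf")

# Prune the conjunct if the shortened rule scores at least as well on validation data.
def should_prune(p_before, n_before, p_after, n_after):
    return pruning_metric(p_after, n_after) >= pruning_metric(p_before, n_before)
```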
What is the goal of Rule Pruning in RIPPER?
Goal: Decrease generalization error of the rule
Why do we need to remove positive instances in the RIPPER algorithm?
- Otherwise the next rule is identical to the previous rule
Why do we remove negative instances in the RIPPER algorithm?
- To prevent underestimating the accuracy of a rule
What is the stopping criterion to add new rules to the rule set for RIPPER?
- error rate of new rule on validation set must not exceed 50%
- the minimum description length must not increase by more than d bits
What are the advantages of rule-based classifiers?
- Easy to interpret for humans (eager learning)
- Performance comparable to decision trees
- Fast classification of unseen records
- Well suited to handle imbalanced data sets, because they learn rules for minority class first