Chapter 8 Flashcards

1
Q

supervised learning (classification)

A
the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
new data is classified based on the training set
2
Q

Unsupervised learning (clustering)

A
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
3
Q

Classification

A
predicts categorical class labels (discrete or nominal)
classifies data based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data
4
Q

Numeric prediction

A
models continuous-valued functions, i.e., predicts unknown or missing values
typical applications: credit/loan approval, medical diagnosis, fraud detection, web page categorization
5
Q

Learning step (model construction)

A

describing a set of predetermined classes
each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
the set of tuples used for model construction is the training set

6
Q

Classification step (model usage)

A

for classifying future or unknown objects

7
Q

Estimate accuracy

A

the known label of each test sample is compared with the classification result from the model
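
As a concrete illustration of the learning step, the classification step, and the accuracy estimate together (a minimal sketch, assuming scikit-learn and its bundled iris data; none of this code comes from the cards):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out a test set that is independent of the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Learning step: construct the model from the training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Estimate accuracy: compare the known test labels with the
# classification result from the model.
print(accuracy_score(y_test, model.predict(X_test)))
```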

8
Q

accuracy rate

A

is the percentage of test set samples that are correctly classified by the model

9
Q

test set

A

independent of the training set (otherwise overfitting)

10
Q

classify new data

A

if the accuracy is acceptable

11
Q

validation test set

A

if the test set is used to select models

12
Q

decision tree induction

A

the learning of decision trees from class-labeled training tuples

13
Q

internal node

A

denotes a test on an attribute

14
Q

branch

A

represents an outcome of the test

15
Q

leaf node or terminal node

A

holds a class label

16
Q

root node

A

topmost node in a tree

17
Q

D (data partition)

A

initially, it is the complete set of training tuples and their associated class labels

18
Q

attribute_list

A

a list of attributes describing the tuples

19
Q

attribute_selection_method

A

a heuristic procedure for selecting the attribute that best discriminates the given tuples according to class
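
These three parameters are the inputs of the basic induction procedure; below is a condensed, illustrative sketch of it, assuming tuples are represented as attribute dicts and the selection measure is passed in as a function (the names mirror the cards, but the code is not the textbook's):

```python
from collections import Counter

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    """D is a data partition: a list of (attribute-dict, class-label) pairs.
    Returns a nested dict; each leaf holds a class label."""
    labels = [c for _, c in D]
    # Pure partition: all tuples belong to the same class -> leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test -> leaf labeled with the majority class.
    if not attribute_list:
        return Counter(labels).most_common(1)[0][1]
    # Heuristic step: pick the attribute that best discriminates by class.
    best = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    tree = {best: {}}
    # One branch per outcome (known value) of the test on `best`.
    for v in {t[best] for t, _ in D}:
        Dv = [(t, c) for t, c in D if t[best] == v]
        tree[best][v] = generate_decision_tree(Dv, remaining,
                                               attribute_selection_method)
    return tree
```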

20
Q

gini index

A

enforces the resulting tree to be binary

21
Q

information gain

A

allows multiway splits

22
Q

determine splitting criterion

A

ideally, the resulting partitions at each branch are as pure as possible

23
Q

pure partition

A

when all the tuples in it belong to the same class

24
Q

discrete-valued partitioning

A

a separate branch/partition is created for each known value of the attribute

25
Q

continuous-valued partition

A
there is a split point resulting in two partitions: the tuples at or below the split point, and the tuples above it
26
Q

discrete-valued and a binary tree must be produced

A
a yes/no test of the form "does the attribute value belong to a given subset?" is asked, and each outcome forms one of the two partitions/branches
27
Q

attribute selection measures

A
splitting rules
28
Q

The splitting attribute

A
the attribute having the best score for the measure (maximized or minimized, depending on the measure) is chosen for the given tuples
29
Q

entropy

A
the state of disorder, confusion, surprise, uncertainty, and disorganization (by the second law of thermodynamics, entropy increases over time)
30
Q

The more heterogeneous the event, the more the uncertainty

A
the less heterogeneous (more homogeneous) the event, the less the uncertainty
31
Q

x-axis

A
the probability of the event
32
Q

y-axis

A
the heterogeneity or impurity, denoted by H(X), the entropy (impurity measure)
33
Q

uncertainty changes depending on the likelihood of an event

A
Pr(X) = 0: no uncertainty; Pr(X) = 0.5: maximum uncertainty; Pr(X) = 1: no uncertainty
34
Q

higher entropy

A
higher uncertainty
35
Q

lower entropy

A
lower uncertainty
36
Q

info(D)

A
just the average amount of information needed to identify the class label of a tuple in D
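
The formula behind this card is Info(D) = −Σᵢ pᵢ log₂(pᵢ). A minimal sketch (not from the cards), showing that a pure partition scores 0 and a 50/50 partition scores 1:

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the classes in D, written
    here as p_i * log2(1/p_i) to keep the sign positive."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

print(info(["yes"] * 6))               # 0.0 -> pure partition, no uncertainty
print(info(["yes"] * 3 + ["no"] * 3))  # 1.0 -> 50/50 split, maximum uncertainty
```
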
37
Q

high information gain

A
minimizes the information needed to classify the tuples in the resulting partitions, and reflects the least randomness or impurity in those partitions
38
Q

information gained

A
defined as the difference between the original information requirement (based on just the proportion of classes) and the new requirement (obtained after partitioning on A)
39
Q

Gain(A)

A
tells us how much would be gained by branching on A: the expected reduction in the information requirement caused by knowing the value of A. The attribute A with the highest information gain is chosen as the splitting attribute at node N
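
In symbols, Gain(A) = Info(D) − Info_A(D), where Info_A(D) is the size-weighted information of the partitions induced by A. A sketch building on the info() helper above (the toy data is hypothetical):

```python
def info_A(D, A):
    """Info_A(D) = sum(|Dv|/|D| * Info(Dv)) over the values v of A:
    the information still needed after partitioning D on A."""
    n = len(D)
    total = 0.0
    for v in {t[A] for t, _ in D}:
        Dv = [c for t, c in D if t[A] == v]
        total += (len(Dv) / n) * info(Dv)
    return total

def gain(D, A):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info([c for _, c in D]) - info_A(D, A)

# D is a list of (attribute-dict, class-label) pairs; toy example.
D = [({"student": "yes"}, "buys"), ({"student": "yes"}, "buys"),
     ({"student": "no"}, "buys"), ({"student": "no"}, "no")]
print(gain(D, "student"))  # ~0.311 bits gained by branching on "student"
```
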
40
Q

gain ratio for attribute selection (C4.5)

A
the information gain measure is biased towards attributes with a large number of values; C4.5 uses gain ratio to overcome the problem (a normalization of information gain)
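
The normalization in question is GainRatio(A) = Gain(A) / SplitInfo_A(D); sketched here on top of the helpers above:

```python
def split_info(D, A):
    """SplitInfo_A(D) = -sum(|Dv|/|D| * log2(|Dv|/|D|)): the potential
    information generated by the split itself; it grows with the number
    of values of A, which is what penalizes many-valued attributes."""
    n = len(D)
    sizes = Counter(t[A] for t, _ in D).values()
    return sum((s / n) * log2(n / s) for s in sizes)

def gain_ratio(D, A):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    return gain(D, A) / split_info(D, A)
```
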
41
Q

maximum gain ratio

A
the attribute with the maximum gain ratio is selected as the splitting attribute
42
Q

gini index (measures the impurity of D)

A
if a data set D contains examples from n classes, the gini index gini(D) is defined as 1 − Σᵢ pᵢ², where pᵢ is the relative frequency of class i in D; the gini index considers a binary split for each attribute
43
Q

smallest gini(D)

A
the split with the smallest gini index gives the largest reduction in impurity and is chosen to split the node
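
As a sketch of the last two cards: gini(D) = 1 − Σᵢ pᵢ², and a binary split is scored by the size-weighted gini of its two partitions (reusing Counter from the earlier sketch):

```python
def gini(labels):
    """gini(D) = 1 - sum(p_i^2): impurity of partition D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(D1, D2):
    """gini_A(D) for a binary split of D into partitions D1 and D2."""
    n = len(D1) + len(D2)
    return (len(D1) / n) * gini(D1) + (len(D2) / n) * gini(D2)

# The candidate split with the smallest gini_split, i.e. the largest
# reduction in impurity gini(D1 + D2) - gini_split(D1, D2), is chosen.
```
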
44
Q

information gain

A
biased towards multivalued attributes (attributes with a large number of values)
45
Q

gain ratio

A
tends to prefer unbalanced splits in which one partition is much smaller than the others
46
Q

gini index

A
biased towards multivalued attributes; has difficulty when the number of classes is large; tends to favor tests that result in equal-sized partitions with purity in both
47
Q

evaluation metrics

A
how can we measure accuracy? what other metrics should we consider?
48
Q

positive tuples (samples)

A
tuples of the main class of interest
49
Q

negative tuples

A
all other tuples
50
Q

true positives

A
the positive tuples that were correctly labeled by the classifier
51
Q

true negatives

A
the negative tuples that were correctly labeled by the classifier
52
Q

false positives

A
the negative tuples that were incorrectly labeled as positive
53
Q

false negatives

A
the positive tuples that were mislabeled as negative
54
Q

resubstitution error

A
the error rate on the training set instead of a test set
55
Q

classifier accuracy or recognition rate

A
percentage of test set tuples that are correctly classified: (TP + TN) / All
56
Q

error rate

A
misclassification rate: 1 − accuracy = (FP + FN) / All
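
Both rates fall out directly from the four confusion-matrix counts; a quick sketch with hypothetical numbers for a rare positive class:

```python
# Hypothetical confusion-matrix counts (positive class is rare).
TP, TN, FP, FN = 90, 9560, 140, 210
ALL = TP + TN + FP + FN

accuracy = (TP + TN) / ALL    # recognition rate
error_rate = (FP + FN) / ALL  # misclassification rate = 1 - accuracy
print(accuracy, error_rate)   # 0.965 0.035
```
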
57
Q

class imbalance problem

A
one class may be rare, e.g. fraud or cancer: a significant majority of negative-class tuples and a minority of positive-class tuples
58
Q

sensitivity

A
true positive recognition rate: TP/P
59
Q

specificity

A
true negative recognition rate: TN/N
60
Q

precision

A
exactness: what % of the tuples that the classifier labeled as positive are actually positive
61
Q

recall

A
completeness: what % of the positive tuples the classifier labeled as positive; the perfect score is 1.0
62
Q

perfect precision score

A
1.0 for a class C means that every tuple the classifier labels as belonging to class C does indeed belong to class C; it tells us nothing about the number of class C tuples that the classifier mislabeled
63
Q

perfect recall score

A
1.0 for C means that every item from class C was labeled as such, but it does not tell us how many other tuples were incorrectly labeled as belonging to class C
64
Q

F-measure

A
combines precision and recall in one formula: the harmonic mean of precision and recall, giving equal weight to each
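
The metrics from the last few cards, computed on the same hypothetical counts as above; note how a 96.5% accuracy hides a sensitivity of only 30% on the rare positive class:

```python
TP, TN, FP, FN = 90, 9560, 140, 210
P, N = TP + FN, TN + FP    # actual positives, actual negatives

sensitivity = TP / P              # true positive recognition rate: 0.30
specificity = TN / N              # true negative recognition rate: ~0.986
precision = TP / (TP + FP)        # exactness: ~0.391
recall = TP / (TP + FN)           # completeness (= sensitivity): 0.30
f_measure = 2 * precision * recall / (precision + recall)  # ~0.34
```
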
65
Q

accuracy

A
classifier accuracy: how well the model predicts the class label
66
Q

speed

A
time to construct the model (training time); time to use the model (classification/prediction time)
67
Q

robustness

A
handling noise and missing values
68
Q

scalability

A
efficiency for disk-resident databases
69
Q

interpretability

A
understanding and insight provided by the model