Chapter 9 Quiz Flashcards

1
Q

splitting node name

A

decision node

2
Q

end node name

A

terminal node

3
Q

drop a new observation down the tree until it reaches a terminal node, then assign its class by taking a vote/average of all the training data that belonged to that terminal node when the tree was grown

A

decision tree

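The scoring procedure above can be sketched in plain Python. The nested-dict tree structure (keys `var`, `threshold`, `left`, `right`, `votes`) is an illustrative assumption, not a standard API:

```python
# Sketch: scoring a new observation with an already-grown classification tree.

def predict(tree, x):
    """Drop observation x down the tree until a terminal node is reached,
    then classify by majority vote of the training records in that node."""
    node = tree
    while "votes" not in node:                # not yet at a terminal node
        if x[node["var"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    # majority vote of the training data that landed in this terminal node
    return max(node["votes"], key=node["votes"].get)

# Toy tree: one split on x0 at 0.5; terminal nodes hold class counts (votes)
tree = {
    "var": 0, "threshold": 0.5,
    "left":  {"votes": {"owner": 8, "non-owner": 1}},
    "right": {"votes": {"owner": 2, "non-owner": 7}},
}
print(predict(tree, [0.3]))   # lands in the left node -> "owner"
```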
4
Q

dividing up the p-dimensional space of the X variables into non-overlapping multidimensional rectangles

A

recursive partitioning

5
Q

what should each rectangle in recursive partitioning be?

A

as homogeneous/pure as possible

6
Q

impurity reduction

A

impurity of the rectangle before the split minus the (record-weighted) sum of impurities of the resulting rectangles

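A minimal sketch of that calculation, using the Gini measure and made-up class counts; weighting each child rectangle by its share of the records is the usual convention:

```python
# Impurity reduction: impurity of the parent rectangle minus the
# record-weighted sum of impurities of the rectangles produced by the split.

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(parent, children):
    """parent: class counts before the split; children: list of class-count
    lists for each resulting rectangle."""
    n = sum(parent)
    after = sum(sum(ch) / n * gini(ch) for ch in children)
    return gini(parent) - after

# A 50/50 parent split into two pure rectangles: reduction = 0.5 - 0 = 0.5
print(impurity_reduction([10, 10], [[10, 0], [0, 10]]))
```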
7
Q

stops tree growth before it starts overfitting, assessing whether splitting a node improves purity by a statistically significant amount

A

CHAID

8
Q

uses validation data to prune tree created by training data

A

CART

9
Q

tree that minimizes the misclassification error rate of the validation set

A

minimum error tree

10
Q

smallest tree in pruning sequence with error within one standard error of the minimum error tree

A

best-pruned tree

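The minimum-error / best-pruned distinction can be sketched as a selection rule over a pruning sequence. The `(n_nodes, error, std_err)` tuples below are made-up numbers for illustration:

```python
# Best-pruned tree: the smallest tree in the pruning sequence whose
# validation error is within one standard error of the minimum-error tree.

def best_pruned(sequence):
    """sequence: list of (n_nodes, validation_error, std_error) tuples."""
    min_err_tree = min(sequence, key=lambda t: t[1])
    threshold = min_err_tree[1] + min_err_tree[2]    # min error + 1 SE
    within = [t for t in sequence if t[1] <= threshold]
    return min(within, key=lambda t: t[0])           # smallest such tree

seq = [(15, 0.20, 0.02), (9, 0.18, 0.02), (5, 0.19, 0.02), (3, 0.25, 0.03)]
# Minimum-error tree is the 9-node tree (error 0.18); the 5-node tree is
# within one SE of it, so it is chosen as the best-pruned tree.
print(best_pruned(seq))
```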
11
Q

how are classification rules set up

A

IF…AND…THEN

12
Q

random forest

A

fit trees to random samples of the data, then combine the individual predictions by taking a vote/average

13
Q

boosted trees

A

each new tree concentrates on the misclassified records from the previous tree

14
Q

what are the two measures of impurity?

A

gini measure
entropy measure

15
Q

gini measure

A

1 minus the sum, over classes k, of the squared proportion of observations in rectangle A that belong to class k

16
Q

if owner is 50% and non-owner is 50%, what is the gini measure

A

1 - (0.5^2 + 0.5^2) = 0.50

17
Q

if owner is 0% and non-owner is 100%, what is the gini measure?

A

1 - (0^2 + 1^2) = 0
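A quick check of the two Gini examples above (owner 50%/50%, and owner 0% / non-owner 100%), with `gini` taking class proportions:

```python
# Gini measure: 1 minus the sum of squared class proportions in a rectangle.

def gini(proportions):
    return 1 - sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))   # 1 - (0.25 + 0.25) = 0.5, maximally impure
print(gini([0.0, 1.0]))   # 1 - (0 + 1) = 0, a pure rectangle
```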

18
Q

entropy measure

A

minus the sum, over classes k, of p_k times log2(p_k)
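The entropy measure is conventionally defined as -Σ p_k log2(p_k), with a class at proportion zero contributing nothing; a sketch:

```python
import math

# Entropy measure of impurity over class proportions p_k.
def entropy(proportions):
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: maximally impure for two classes
print(entropy([0.0, 1.0]))   # 0.0: a pure rectangle
```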

19
Q

can handle missing values, does not require standardizing the variables

A

benefits of decision trees

20
Q

structure is unstable (depends too much on the training data), can overfit, does not account for correlations among variables, needs a large dataset to construct a good classifier

A

negatives of decision trees

21
Q

what kind of model is decision tree?

A

clear box, nonlinear, nonparametric

22
Q

bootstrap aggregating: drawing random samples with replacement (subsets of rows and columns)

A

bagging

23
Q

each tree is made independent of the one before it

A

bagging

24
Q

what types of techniques are bagging and boosting?

A

perturb (make different models) and combine (create a prediction)
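The perturb-and-combine idea can be sketched end to end. The `classify` function below is a deliberately trivial stand-in for a fitted tree (it votes with the nearest training row's label); the data and helper names are illustrative assumptions:

```python
import random
from collections import Counter

def bootstrap(rows):
    """Perturb: draw a random sample of rows with replacement."""
    return [random.choice(rows) for _ in rows]

def classify(sample, x):
    """Stand-in model: label of the training row nearest to x."""
    return min(sample, key=lambda r: abs(r[0] - x))[1]

def bagged_predict(rows, x, n_models=25):
    """Combine: majority vote over models fit to perturbed samples."""
    votes = Counter(classify(bootstrap(rows), x) for _ in range(n_models))
    return votes.most_common(1)[0][0]

random.seed(0)
rows = [(0.1, "owner"), (0.2, "owner"), (0.8, "non-owner"), (0.9, "non-owner")]
print(bagged_predict(rows, 0.15))   # the combined vote goes to "owner"
```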