Chapter 9 Quiz Flashcards

1
Q

splitting node name

A

decision node

2
Q

end node name

A

terminal node

3
Q

drop a new observation down the tree until it reaches a terminal node, then assign its class by taking a vote/average of all the training data that belonged to that terminal node when the tree was grown

A

decision tree

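The scoring procedure above can be sketched in plain Python. The nested-dict tree structure (keys `var`, `threshold`, `left`, `right`, `votes`) is an illustrative assumption, not a standard API:

```python
# Sketch: scoring a new observation with an already-grown classification tree.

def predict(tree, x):
    """Drop observation x down the tree until a terminal node is reached,
    then classify by majority vote of the training records in that node."""
    node = tree
    while "votes" not in node:                # not yet at a terminal node
        if x[node["var"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    # majority vote of the training data that landed in this terminal node
    return max(node["votes"], key=node["votes"].get)

# Toy tree: one split on x0 at 0.5; terminal nodes hold class counts (votes)
tree = {
    "var": 0, "threshold": 0.5,
    "left":  {"votes": {"owner": 8, "non-owner": 1}},
    "right": {"votes": {"owner": 2, "non-owner": 7}},
}
print(predict(tree, [0.3]))   # lands in the left node -> "owner"
```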
4
Q

dividing up the p-dimensional space of the X variables into non-overlapping multidimensional rectangles

A

recursive partitioning

5
Q

what should each rectangle in recursive partitioning be?

A

as homogeneous/pure as possible

6
Q

impurity reduction

A

impurity of the rectangle before the split minus the (record-weighted) sum of impurities of the resulting rectangles

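A minimal sketch of that calculation, using the Gini measure and made-up class counts; weighting each child rectangle by its share of the records is the usual convention:

```python
# Impurity reduction: impurity of the parent rectangle minus the
# record-weighted sum of impurities of the rectangles produced by the split.

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(parent, children):
    """parent: class counts before the split; children: list of class-count
    lists for each resulting rectangle."""
    n = sum(parent)
    after = sum(sum(ch) / n * gini(ch) for ch in children)
    return gini(parent) - after

# A 50/50 parent split into two pure rectangles: reduction = 0.5 - 0 = 0.5
print(impurity_reduction([10, 10], [[10, 0], [0, 10]]))
```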
7
Q

stops tree growth before it starts overfitting, assessing whether splitting a node improves purity by a statistically significant amount

A

CHAID

8
Q

uses validation data to prune tree created by training data

A

CART

9
Q

tree that minimizes the misclassification error rate of the validation set

A

minimum error tree

10
Q

smallest tree in pruning sequence with error within one standard error of the minimum error tree

A

best-pruned tree

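The minimum-error / best-pruned distinction can be sketched as a selection rule over a pruning sequence. The `(n_nodes, error, std_err)` tuples below are made-up numbers for illustration:

```python
# Best-pruned tree: the smallest tree in the pruning sequence whose
# validation error is within one standard error of the minimum-error tree.

def best_pruned(sequence):
    """sequence: list of (n_nodes, validation_error, std_error) tuples."""
    min_err_tree = min(sequence, key=lambda t: t[1])
    threshold = min_err_tree[1] + min_err_tree[2]    # min error + 1 SE
    within = [t for t in sequence if t[1] <= threshold]
    return min(within, key=lambda t: t[0])           # smallest such tree

seq = [(15, 0.20, 0.02), (9, 0.18, 0.02), (5, 0.19, 0.02), (3, 0.25, 0.03)]
# Minimum-error tree is the 9-node tree (error 0.18); the 5-node tree is
# within one SE of it, so it is chosen as the best-pruned tree.
print(best_pruned(seq))
```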
11
Q

how are classification rules set up

A

IF…AND…THEN

12
Q

random forest

A

fit trees to random samples of the data, then combine the individual predictions by taking a vote/average

13
Q

boosted trees

A

each new tree concentrates on the misclassified records from the previous tree

14
Q

what are the two measures of impurity?

A

gini measure
entropy measure

15
Q

gini measure

A

1 minus the sum, over classes k, of the squared proportion of observations in rectangle A that belong to class k

16
Q

if owner is 50% and non-owner is 50%, what is the gini measure

A

1 - (0.5^2 + 0.5^2) = 0.50

17
Q

if owner is 0% and non-owner is 100%, what is the gini measure?

A

1 - (0^2 + 1^2) = 0
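A quick check of the two Gini examples above (owner 50%/50%, and owner 0% / non-owner 100%), with `gini` taking class proportions:

```python
# Gini measure: 1 minus the sum of squared class proportions in a rectangle.

def gini(proportions):
    return 1 - sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))   # 1 - (0.25 + 0.25) = 0.5, maximally impure
print(gini([0.0, 1.0]))   # 1 - (0 + 1) = 0, a pure rectangle
```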

18
Q

entropy measure

A

minus the sum, over classes k, of p_k times log2(p_k)
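The entropy measure is conventionally defined as -Σ p_k log2(p_k), with a class at proportion zero contributing nothing; a sketch:

```python
import math

# Entropy measure of impurity over class proportions p_k.
def entropy(proportions):
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: maximally impure for two classes
print(entropy([0.0, 1.0]))   # 0.0: a pure rectangle
```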

19
Q

can handle missing values, does not require standardizing the variables

A

benefits of decision trees

20
Q

structure is unstable (depends too much on the training data), can overfit, does not account for correlations among variables, needs a large dataset to construct a good classifier

A

negatives of decision trees

21
Q

what kind of model is decision tree?

A

clear box, nonlinear, nonparametric

22
Q

bootstrap aggregating: drawing random samples with replacement (subsets of rows and columns)

A

bagging

23
Q

each tree is made independent of the one before it

A

bagging

24
Q

what types of techniques are bagging and boosting?

A

perturb (make different models) and combine (create a prediction)
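The perturb-and-combine idea can be sketched end to end. The `classify` function below is a deliberately trivial stand-in for a fitted tree (it votes with the nearest training row's label); the data and helper names are illustrative assumptions:

```python
import random
from collections import Counter

def bootstrap(rows):
    """Perturb: draw a random sample of rows with replacement."""
    return [random.choice(rows) for _ in rows]

def classify(sample, x):
    """Stand-in model: label of the training row nearest to x."""
    return min(sample, key=lambda r: abs(r[0] - x))[1]

def bagged_predict(rows, x, n_models=25):
    """Combine: majority vote over models fit to perturbed samples."""
    votes = Counter(classify(bootstrap(rows), x) for _ in range(n_models))
    return votes.most_common(1)[0][0]

random.seed(0)
rows = [(0.1, "owner"), (0.2, "owner"), (0.8, "non-owner"), (0.9, "non-owner")]
print(bagged_predict(rows, 0.15))   # the combined vote goes to "owner"
```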