Chapter 5 Decision Tree Flashcards

1
Q

Greedy Strategy

A

Split the records based on an attribute test that optimizes a certain criterion

2
Q

Binary Split

A

Divides values into two subsets

3
Q

Multi-way split

A

Use as many partitions as there are distinct values

4
Q

How to specify split positions for continuous attributes

A

Sort the attribute values
Create candidate split positions at the halfway points between adjacent values
Compute the Gini index for each candidate and choose the best split

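A minimal sketch of this procedure in Python (an illustration, not from the chapter): sort by the attribute, take midpoints between adjacent distinct values as candidate split positions, and keep the candidate with the lowest weighted Gini index.

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    # Sort records by attribute value, then try the midpoint between
    # each pair of adjacent distinct values as a split position.
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))  # (threshold, weighted Gini)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:  # lower Gini is better (see the next cards)
            best = (threshold, weighted)
    return best

# Invented toy data: attribute values with Y/N class labels.
print(best_split([60, 70, 75, 85, 90, 95], ["N", "N", "N", "Y", "Y", "N"]))
```
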
5
Q

The lower the gini index

A

the better it is

6
Q

Gini Index

A

A measure of node impurity: GINI(t) = 1 - sum over classes i of p(i|t)^2, where p(i|t) is the fraction of class-i records at node t; it is 0 for a pure node and largest when the classes are evenly mixed

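A quick illustration of the definition above (a sketch, not from the chapter): Gini impurity is 0 for a pure node and peaks at 0.5 for a two-class node with an even split.

```python
from collections import Counter

def gini_index(labels):
    # GINI(t) = 1 - sum_i p(i|t)^2 over the classes present at node t.
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_index(["Y", "Y", "Y", "Y"]))  # 0.0   -> pure node
print(gini_index(["Y", "Y", "N", "N"]))  # 0.5   -> maximally mixed (2 classes)
print(gini_index(["Y", "Y", "Y", "N"]))  # 0.375
```
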
7
Q

Entropy

A

Amount of uncertainty involved in a distribution

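For a class distribution p_1, ..., p_k, entropy is H = -sum_i p_i * log2(p_i). A minimal sketch:

```python
import math

def entropy(labels):
    # H(t) = -sum_i p(i|t) * log2 p(i|t): 0 bits for a pure node,
    # 1 bit for an even two-class split (maximum uncertainty).
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

print(entropy(["Y", "Y", "Y", "Y"]))  # 0.0
print(entropy(["Y", "Y", "N", "N"]))  # 1.0
```
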
8
Q

In decision tree algorithms, entropy measures

A

Impurity (the lack of purity): entropy is 0 for a pure node and maximal when the classes are evenly mixed

9
Q

Purity

A

The fraction of observations belonging to a particular class in a node

10
Q

Pure node

A

A node in which all observations belong to the same class

11
Q

Information

A

Amount of certainty involved in a distribution (the counterpart of entropy: gaining information reduces uncertainty)

12
Q

We need to choose a split in a decision tree that maximizes

A

Information Gain

13
Q

Information Gain formula

A

Entropy before the split minus the weighted average entropy after the split: Gain = H(parent) - sum_j (n_j / n) * H(child_j)

14
Q

Steps to determine split?

A

Calculate the entropy at the root node
Calculate the information gain for each candidate attribute split
Pick the attribute split with the highest information gain

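Putting the last two cards together, a sketch of choosing a split (the attribute names and data are invented for illustration): the "entropy after split" is the weighted average of the child nodes' entropies, and we pick the attribute with the largest gain.

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent_labels, children):
    # Gain = H(parent) - sum_j (n_j / n) * H(child_j)
    n = len(parent_labels)
    after = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent_labels) - after

# Hypothetical example: split 10 records on two candidate attributes.
parent = ["Y"] * 5 + ["N"] * 5
split_a = [["Y", "Y", "Y", "Y", "N"], ["Y", "N", "N", "N", "N"]]  # attribute A
split_b = [["Y", "Y", "N", "N", "N"], ["Y", "Y", "Y", "N", "N"]]  # attribute B
gains = {"A": information_gain(parent, split_a),
         "B": information_gain(parent, split_b)}
print(max(gains, key=gains.get), gains)  # pick the highest-gain attribute
```
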
15
Q

Decision trees are non/parametric

A

Non-parametric: the model assumes no fixed functional form and has no pre-specified set of parameters

16
Q

Time complexity of building a decision tree

A

Roughly O(m * n log n) for n records and m attributes when built greedily; classifying a single record walks one root-to-leaf path, which is only O(tree depth), about O(log n) for a balanced tree

17
Q

Does multicollinearity affect decision tree accuracy?

A

No, it just adds extra height (redundant splits)

18
Q

Finding optimal decision tree is

A

Computationally expensive: the problem is NP-complete, which is why greedy heuristics are used in practice

19
Q

Decision tree can create

A

Rectilinear (axis-parallel) decision boundaries

20
Q

Decision trees cannot create

A

Oblique (diagonal or curved) decision boundaries: each split tests a single attribute, so boundary segments can only be horizontal or vertical

21
Q

Model =

A

Algorithm + hypothesis

22
Q

4 steps of selecting a model

A

Prepare the training data
Choose the hypothesis set and algorithm
Tune the algorithm
Train the model, fit it to out-of-sample data (the test set), and evaluate the results

23
Q

Goal of model selection

A

Select the best model from the training phase

24
Q

Two ways to evaluate models

A

Model Checking
Performance Estimation

25
Q

The best model is the one that gives you _ and _ well on the testing set

A

Smallest prediction error
Generalizes

26
Q

Model Checking

A

Given a dataset, divide it into training and testing subsets

27
Q

What do you use if randomly selected test points are not representative of the population in general?

A

Cross Validation

28
Q

Cross Validation approaches 3

A

Holdout
K-fold cross validation
Leave one out cross validation

29
Q

Cross validation

A

An approach to systematically create and evaluate multiple models on multiple subsets of the dataset

30
Q

Holdout method

A

Randomly split the data into 80% training and 20% testing

31
Q

k-fold cross validation

A

Split the data into k chunks; train on k-1 chunks and test on the remaining chunk. Repeat k times, with each chunk used once as the test set, and calculate the average error

32
Q

Leave one out cross-validation

A

An extreme version of k-fold where k = n: the data is split into n chunks of one observation each
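
A bare-bones sketch of k-fold cross-validation (the model and scoring function are stand-ins; setting k = len(data) gives leave-one-out):

```python
import random

def k_fold_cv(data, k, train_and_score):
    # Shuffle, split into k chunks, train on k-1 chunks and test on the
    # held-out chunk, repeat k times, and average the test error.
    data = data[:]
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors.append(train_and_score(train, test))
    return sum(errors) / k

# Usage with a placeholder train-then-report-test-error function.
dummy = lambda train, test: 0.1
print(k_fold_cv(list(range(100)), k=5, train_and_score=dummy))
```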

33
Q

Use of k-fold or LOOCV resampling methods is more robust if

A

the data is split into training, validation, and testing sets

34
Q

Typical application of holdout methods is to

A

Determine a stopping point with respect to error: stop when the test-set error starts increasing (past that point the model is overfitting)

35
Q

Why split data into three parts?

A

If your model has hyperparameters, you can tune them on the validation dataset while keeping the test set untouched for the final evaluation
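
A sketch of the three-way split (the 60/20/20 proportions are an illustrative choice, not from the chapter):

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2):
    # Train on the first chunk, tune hyperparameters on the validation
    # chunk, and touch the test chunk only for the final evaluation.
    data = data[:]
    random.shuffle(data)
    n = len(data)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return data[n_test + n_val:], data[n_test:n_test + n_val], data[:n_test]

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```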

36
Q

Two ways to do performance evaluation of a model

A

Confusion Matrix
Receiver Operating Characteristics (ROC) curve

37
Q

Confusion Matrix

A

A table of predicted vs. actual classes (TP, FP, FN, TN) from which numerous metrics are computed
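
A sketch of a few common metrics computed from a 2x2 confusion matrix (the counts below are invented):

```python
def confusion_metrics(tp, fp, fn, tn):
    # Each metric is a different ratio over the four matrix cells.
    return {
        "accuracy":  (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall":    tp / (tp + fn),  # true positive rate
        "fpr":       fp / (fp + tn),  # false alarm rate
    }

print(confusion_metrics(tp=40, fp=10, fn=5, tn=45))
```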

38
Q

ROC curve

A

Characterizes the trade-off between positive hits (true positives) and false alarms (false positives)

39
Q

Can you minimize both FP and FN?

A

No: reducing false negatives (e.g., by lowering the classification threshold) increases false positives, and vice versa

40
Q

What does the ROC plot?

A

The true positive rate against the false positive rate

41
Q

What does the middle (diagonal) line of the ROC plot mean?

A

A random guess (true positive rate equals false positive rate)

42
Q

AUC standards

A

.5-.6 fail
.6-.7 worthless
.7-.8 poor
.8-.9 good
>.9 excellent
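
A sketch of building the ROC curve from classifier scores and computing AUC with the trapezoid rule (the labels and scores are invented):

```python
def roc_auc(labels, scores):
    # Sweep the threshold from high score to low; at each step record the
    # (false positive rate, true positive rate) point, then integrate.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # Trapezoidal area under the (FPR, TPR) curve.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(roc_auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]))  # ~0.89
```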

43
Q

Overfitting

A

The model picks up nuances (noise) in the training data and matches the training data too closely, so it generalizes poorly

44
Q

Underfitting

A

The model is so simple that it cannot capture the underlying patterns

45
Q

Prepruning

A

Halt the growth of the tree early based on some constraint
Good for producing shorter trees
Con: you don't know when to stop

46
Q

Post pruning

A

Grow tree to maximum size, then trim
Gives better results
Wastes computer cycles

47
Q

How to prune

A

Focus on complexity parameter (cp), keep splitting until cp reaches a certain value

48
Q

How to find cp

A

Use the cross-validation error: it decreases and then starts to rise again; choose the cp value at that turning point
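
The cp here is R's rpart complexity parameter; as a rough analogue (an assumption about tooling, not from the chapter), scikit-learn exposes cost-complexity pruning via ccp_alpha, which can likewise be chosen by cross-validation:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate complexity values along the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha whose pruned tree scores best under 5-fold cross-validation.
best = max(
    path.ccp_alphas,
    key=lambda a: cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5
    ).mean(),
)
print("chosen ccp_alpha:", best)
```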

49
Q

What does pruning do

A

Gives a tree that generalizes better

50
Q

Most datasets have what type of class distribution?

A

imbalanced

51
Q

Cost sensitive learning

A

Penalizes the model when it commits a false negative error
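
In scikit-learn (an assumption about tooling, not part of the deck), this can be approximated with class weights that make rare-class mistakes cost more:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical weights: an error on class 1 (the rare class) costs 10x an
# error on class 0, steering the tree away from false negatives.
clf = DecisionTreeClassifier(class_weight={0: 1, 1: 10})
clf.fit([[0, 0], [1, 1], [1, 0], [0, 1]], [0, 0, 0, 1])  # toy data
print(clf.predict([[0, 1]]))
```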

52
Q

Imbalance mitigation techniques 3

A

Cost sensitive learning
Sampling techniques
Synthetic data

53
Q

Synthetic data

A

May be generated, if possible, to ensure that the classes are equally represented

54
Q

Sampling techniques

A

modify the class distribution such that the rare class is well represented in the training set

55
Q

Undersampling + con

A

Gathers fewer of the majority class observations for training
Con: useful observations may not be part of the sample

56
Q

Oversampling + con

A

Gathers more of the minority class observations for training
Con: if the training data is noisy, oversampling may amplify the noise
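
A sketch of both sampling techniques using simple random sampling (pure Python; real pipelines often use a dedicated library such as imbalanced-learn):

```python
import random

def undersample(majority, minority):
    # Keep only as many majority observations as there are minority ones;
    # the discarded majority rows may include useful observations.
    return random.sample(majority, len(minority)) + minority

def oversample(majority, minority):
    # Resample the minority class with replacement up to the majority size;
    # if the data is noisy, the duplicated noise gets amplified.
    extra = random.choices(minority, k=len(majority))
    return majority + extra

majority = [("A", i) for i in range(90)]
minority = [("B", i) for i in range(10)]
print(len(undersample(majority, minority)))  # 20  (10 + 10)
print(len(oversample(majority, minority)))   # 180 (90 + 90)
```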