Tree Analysis Flashcards
A benefit of a TREE analysis is its visual outcome.
TRUE. This is one of the main reasons TREES are so widely used.
The benefit of TREES is much clearer when there is not a single causation model for the whole population of interest
TRUE. A TREE can handle complex causation in a flexible way.
OVERFITTING is about the risk of poor “generalization” of our results to new samples
TRUE. This is a way of saying that, if we let the algorithm “overfit”, we take the risk of getting good results in the training sample but not-so-good results in new samples.
If you increase the minimum number of cases required in a parent node before it can be divided, or in a child node, you take a higher risk of overfitting
FALSE. By increasing the minimum number of cases needed, we avoid splitting small nodes into even smaller ones, so we limit the flexibility of the tree in search of a better generalization of results.
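These minimum-size controls exist in most tree implementations; as a minimal sketch, scikit-learn’s DecisionTreeClassifier exposes them as min_samples_split (parent node) and min_samples_leaf (child node). The parameter values below are purely illustrative:

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)

    # Larger minimums stop small nodes from being split further: a shallower,
    # less flexible tree that should generalize better.
    conservative = DecisionTreeClassifier(min_samples_split=100, min_samples_leaf=50).fit(X, y)
    # Tiny minimums let the tree keep splitting: more flexibility, higher overfitting risk.
    risky = DecisionTreeClassifier(min_samples_split=2, min_samples_leaf=1).fit(X, y)
    print(conservative.get_depth(), "vs", risky.get_depth())   # the risky tree grows much deeper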
The GAIN of a node in a tree measures the % of “hits” in the node compared to the % of “hits” in the whole sample
TRUE. This is the exact definition
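A worked example with hypothetical numbers, computing the gain index of a node as the node hit rate over the overall hit rate:

    # Hypothetical numbers: 500 "hits" among 5,000 cases overall,
    # and a node containing 200 cases of which 60 are hits.
    overall_rate = 500 / 5000      # 10% hits in the whole sample
    node_rate = 60 / 200           # 30% hits in the node
    gain_index = node_rate / overall_rate * 100
    print(f"node: {node_rate:.0%}, overall: {overall_rate:.0%}, index: {gain_index:.0f}%")
    # index 300%: the node concentrates three times the base rate of hits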
CHAID, CRT, TWO-STEP and PCA are all different types of TREES algorithms
FALSE. TWO-STEP is a clustering algorithm and PCA is a factor-analysis (dimension-reduction) technique.
In a CREDIT DEFAULT RISK exercise, a false positive is much more expensive than a false negative
FALSE. A false positive means rejecting credit for a safe customer, but a false negative means giving credit to a risky customer, which is usually far more expensive.
If we predict a YES/NO target using a TREE, the main output we will get in terms of prediction is directly a YES/NO
FALSE. When we use a TREE to predict a YES (vs NO) target, we get a “propensity” to “YES”, a kind of numerical score that we later transform into a YES/NO according to a given threshold. Modeler can generate this YES/NO automatically (alongside the score), but as analysts we have to control the cutoff value.
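A minimal sketch of this score-then-cut workflow (scikit-learn here rather than Modeler; the 0.5 cutoff is just an illustrative default the analyst should question):

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4).fit(X, y)

    propensity = tree.predict_proba(X)[:, 1]   # numerical score: propensity to "YES"
    cutoff = 0.5                               # analyst-controlled threshold
    predicted_yes = propensity >= cutoff       # only now do we get a YES/NO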
A Chi-Squared test is used by the CHAID tree algorithm to select the best predictors
TRUE. The name itself stands for Chi-Squared Automatic Interaction Detection. In effect, it uses the Chi-Square TEST to select the best predictor at each split.
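The core idea can be sketched with a chi-square test on a predictor-by-target crosstab (toy data; real CHAID also merges categories and applies multiplicity adjustments):

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Toy data: is "region" associated with "churn"?
    df = pd.DataFrame({"region": ["N", "N", "S", "S", "E", "E", "N", "S"],
                       "churn":  ["yes", "no", "yes", "yes", "no", "no", "no", "yes"]})
    crosstab = pd.crosstab(df["region"], df["churn"])
    chi2, p_value, dof, expected = chi2_contingency(crosstab)
    print(f"chi2={chi2:.2f}, p={p_value:.3f}")
    # CHAID-style selection: the candidate predictor with the lowest p-value splits the node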
TREES can be used to predict a CATEGORICAL variable with more than two categories
TRUE, there is no restriction on the number of categories.
We normally have to hold out a part of our dataset / sample to validate our TREES
TRUE. We try to prevent the algorithm from “memorizing” the analysis sample (OVERFITTING), so we test the accuracy of our TREE on a holdout sample (one not used to train the model).
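A minimal sketch of the holdout idea with scikit-learn (an illustrative 70/30 split):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    # 30% of the sample is held out and never shown to the algorithm during training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    tree = DecisionTreeClassifier().fit(X_train, y_train)
    print("holdout accuracy:", tree.score(X_test, y_test))   # honest estimate of generalization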
We should put a lot of work into pre-selecting the main predictors (as candidate explanatory variables) before launching a TREE analysis.
FALSE. A TREE algorithm can select the best predictors from a long list of candidates. This is, in fact, one of the advantages of this type of algorithm.
TREES belong to the family of algorithms called “RULE INDUCTION models”
TRUE. That name comes from the idea that these models derive a set of rules that describe distinct segments within the data in relation to the target. The model’s output shows the reasoning for each rule and can therefore be used to understand the decision-making process that drives a particular outcome.
TREES are a kind of “classification” analysis
TRUE. It is used to predict CATEGORICAL targets.
CRT is a bit different from other TREES algorithms because it can be used to predict SCALE targets.
TRUE. In fact, the name CRT comes from Classification & Regression TREE. The word “Regression” (vs Classification) means that it can be used to predict scale variables.
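A sketch of the “Regression” side with scikit-learn’s DecisionTreeRegressor, which grows CART-style trees for scale targets (synthetic data, illustrative depth):

    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
    # Same recursive splitting, but each leaf predicts a scale value
    # (the mean of the training cases falling into that node)
    reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
    print(reg.predict(X[:3]))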
TREES analysis can also be understood as a type of CLUSTER algorithm because, at the end, it could be used to find similar groups according to the values of a set of variables.
FALSE. A TREE does not find similar groups according to a set of variables. The segments identified by a TREE (NODES) consist of groups of customers with a similar propensity in relation to A TARGET VARIABLE. In this sense, the groups are CONDITIONED on the target variable; we use this target variable as the SUPERVISOR of the result.
TREES can be called SUPERVISED technique BUT ONLY if we build the TREE in an interactive way, “supervising” the outcome.
FALSE. The “supervised” label comes from the fact that a variable is used as a SUPERVISOR, as a target, for the whole result.
Imagine that we run a FACTOR analysis using a group of satisfaction variables for our customer dataset. Could we use the factor score(s) as input variables (perhaps among others) in a TREES analysis to predict “churn”?
TRUE. Why not? A factor score is a metric variable and WE CAN use metric variables as inputs in a TREE. If we have several different satisfaction indicators, a FACTOR would be a good way of introducing our own “satisfaction measure” into our TREE.
In a SPAM EMAIL FILTER exercise, a false positive means to receive a junk mail in your inbox
FALSE. A false positive means predicting “SPAM” for an actual “HAM” message, so we move safe email to our junk folder.
Normally, it is easy to increase TRUE POSITIVES if you are also willing to accept FALSE POSITIVES
TRUE. If you tend to predict POSITIVE, you will capture TRUE POSITIVES but also FALSE POSITIVES.
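A toy sketch of the trade-off using hypothetical propensity scores: lowering the cutoff predicts POSITIVE more often, so both counts rise together:

    # Hypothetical propensity scores with their actual labels (1 = positive)
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    actual = [1,   1,   0,   1,   0,   1,   0,   0]

    for cutoff in (0.75, 0.5, 0.25):
        predicted = [s >= cutoff for s in scores]
        tp = sum(p and a == 1 for p, a in zip(predicted, actual))
        fp = sum(p and a == 0 for p, a in zip(predicted, actual))
        print(f"cutoff={cutoff}: TP={tp}, FP={fp}")   # both grow as the cutoff falls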
We call CLASSIFICATION techniques those used to predict or explain SCALE variables
FALSE. Classification is for CATEGORICAL (or ordinal) targets; techniques that predict SCALE variables are usually called REGRESSION.
A TREE is somewhere in the middle between pure predictive and pure explanatory techniques
TRUE. It can be used to predict but also to give some information about the determinants of our variable of interest.
TREES are flexible classification algorithms in the sense that they can capture complex relationships in the presence of lots of explanatory variables
TRUE. The benefit of TREES is much clearer when there is not a single causation model for the whole population of interest. At the same time, the algorithms are able to discriminate between good and bad predictors.
CHAID, CRT, C5 and QUEST are different types of TREES algorithms
TRUE. These are very common TREES algorithms
An F-test is used by the CHAID tree algorithm to select the best predictors
FALSE. It uses the Chi-Square TEST.
In a CREDIT DEFAULT RISK exercise, a false positive means to give credit to a risky customer
FALSE. If it is about DEFAULT, a false positive means predicting RISK for a safe customer (and thus denying him credit).
In a SPAM EMAIL FILTER exercise, a false negative means to receive a junk mail in your inbox
TRUE. We predict “HAM” instead of actual SPAM, and we let the junk email enter our inbox.
In a CREDIT DEFAULT RISK exercise, a false negative is much more expensive than a false positive
TRUE. Because a false negative means giving credit to a risky customer (a potential default), whereas a false positive only means turning away a safe one.
In a CHAID exercise, lower p-values from chi-squared tests are used to identify and select the best predictors
TRUE. A low p-value for a crosstab chi-square test means evidence of association between predictor and target
Scale variables can also be used as predictors in CHAID analysis
TRUE. Scale variables are automatically transformed into ORDINAL ones by the CHAID algorithm.
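That automatic discretization can be sketched with pandas (an illustrative 5-bin equal-frequency cut; the real number of intervals is an algorithm setting):

    import pandas as pd

    age = pd.Series([18, 22, 25, 31, 38, 44, 52, 60, 67, 75])
    # What CHAID does internally: cut the scale predictor into ordered intervals,
    # then treat those intervals as an ORDINAL variable when testing splits
    age_ordinal = pd.qcut(age, q=5)
    print(age_ordinal.cat.categories)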
An interactive session permits the user to grow a tree applying their own criteria in the selection of predictors
TRUE. Working interactively, the analyst may influence the TREE result for the sake of a better model in terms of the business goal.
The more a tree grows, the better the result we get in terms of VALIDATION
FALSE. Excessive growth increases the risk of overfitting (WORSE RESULTS IN TERMS OF VALIDATION).
We normally have to control the tree growth in order to avoid OVERFITTING
TRUE. We need to balance accuracy in the TRAIN and TEST samples.
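A sketch of that balance: growing deeper keeps improving TRAIN accuracy while TEST accuracy stalls or drops (synthetic data, illustrative depths):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_informative=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in (2, 5, 10, None):          # None = unlimited growth
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
    # Typical pattern: train accuracy climbs toward 1.0 while test accuracy
    # flattens or falls back: the signature of overfitting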
A variable may appear as predictor in a TREE more than one time, in different tree levels
TRUE. Yes, it is possible: AGE may appear as the main predictor and then appear again within a subset of the sample.
A TREE algorithm can select the best predictors from a long list of candidates
TRUE. This is, in fact, one of the advantages of this type of algorithm
For ordinal predictors only adjacent categories are compared and possibly merged in a CHAID analysis
TRUE. This is because, normally, it makes no sense to merge categories that are not adjacent (people below 18 and over 65, for instance).
The GAIN of a node in a tree measures the % of “hits” in the node
FALSE. It measures that % of hits compared to the overall % of hits (in the whole sample).
For categorical targets YES/NO, a classification table will always be a 2x2 table
TRUE. YES/NO predicted VS YES/NO observed
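A sketch of such a table with scikit-learn’s confusion_matrix (toy labels):

    from sklearn.metrics import confusion_matrix

    observed  = ["yes", "no", "yes", "no", "yes", "no"]
    predicted = ["yes", "no", "no",  "no", "yes", "yes"]
    # Rows = observed, columns = predicted: for a YES/NO target this is always 2x2
    print(confusion_matrix(observed, predicted, labels=["yes", "no"]))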