Foundation Flashcards
How does Predictive Modelling work?
Predicts OUTCOME (target), based on set of INPUTS
What are the 2 types of Prediction?
Classification, Estimation
What criterion must be met to use “Classification”?
Target must be CATEGORICAL
Hint: “Class” in “Classification” => Categorical
What criterion must be met to use “Estimation”?
Target must be CONTINUOUS (numerical)
Hint: “Estimate” => Numbers, hence continuous
In what proportions should we split the data?
70% TRAINING, 30% TESTING
Using what node in SPSS Modeler can we split the data?
Partition node
Why is it necessary to SPLIT data?
Aim of predictive model: It should be accurate on UNSEEN data, not only on the data it was trained on
What is UNSEEN data? It is the TESTING data, which is held back while the model is trained!
Why do we use a “Seed” in IBM SPSS?
It sets the starting point for RANDOMLY selecting records (data rows) to be either training/testing data.
Why remember the exact seed number?
To ensure that the random selection can be reproduced instead of changing on every run
With the same seed, the same records are chosen as training and testing data respectively, so the model’s results stay consistent.
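The effect of a fixed seed can be sketched in plain Python. This is only an illustration, not how the SPSS Modeler Partition node works internally; the function name `partition`, the 70% fraction, and the seed value 42 are assumptions made for the example.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Randomly assign each record to training or testing, reproducibly."""
    rng = random.Random(seed)  # fixed seed -> same random sequence every run
    train, test = [], []
    for record in records:
        # Each record lands in training with probability train_fraction
        if rng.random() < train_fraction:
            train.append(record)
        else:
            test.append(record)
    return train, test

records = list(range(100))  # hypothetical record IDs
train_a, test_a = partition(records, seed=42)
train_b, test_b = partition(records, seed=42)

# Same seed, same split: the comparison below is True
print(train_a == train_b and test_a == test_b)
# The split is roughly 70/30, not exactly, because each record is drawn independently
print(len(train_a), len(test_a))
```

Changing the seed would typically produce a different split, which is why the exact seed number is worth recording.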
What does CART stand for?
Classification And Regression Tree
When to use a Regression Tree?
When the prediction type is ESTIMATION (the target is numerical)
What are the 2 impurity measures for Classification?
Gini Index, Entropy
What is the impurity measure for Regression?
Sum of Squared Error (SSE)
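As a rough sketch of SSE as a node impurity: it sums the squared distances between each target value in a node and the node's mean. The helper name `sse` and the sample values are made up for illustration.

```python
def sse(values):
    """Sum of squared errors of the values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

tight = [10.0, 11.0, 9.0, 10.0]   # targets close together (mean 10)
spread = [2.0, 18.0, 5.0, 15.0]   # targets far apart (mean 10)

print(sse(tight))   # 2.0
print(sse(spread))  # 178.0
```

The node with the lower SSE holds more similar target values, so its mean is a better estimate for the records in it.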
Is a lower or higher GINI better?
Lower! Because Gini measures impurity. Gini = 1 - (sum of squared class proportions), so a perfectly pure node has Gini = 0.
Why is lower GINI more desirable?
The nodes are more homogeneous = better prediction = better model
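A minimal sketch of the two classification impurity measures, using their standard definitions (Gini = 1 minus the sum of squared class proportions; entropy = sum of p * log2(1/p)). The function names and the yes/no labels are assumptions for the example.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: sum of p * log2(1/p) over the classes present."""
    n = len(labels)
    return sum((count / n) * log2(n / count) for count in Counter(labels).values())

pure = ["yes", "yes", "yes", "yes"]   # one class only: homogeneous node
mixed = ["yes", "yes", "no", "no"]    # 50/50 mix: most impure for 2 classes

print(gini(pure), entropy(pure))    # 0.0 0.0
print(gini(mixed), entropy(mixed))  # 0.5 1.0
```

Both measures are 0 for a pure node and reach their maximum for an even class mix, which is why lower values indicate a better split.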