Explain vs Predict, Data Preprocessing and Predictive Modeling Flashcards
What is the goal of:
Explanatory modeling?
Explanatory modeling is a theory based approach to test causal hypotheses.
What is the goal of:
Predictive modeling?
Predictive modeling is using data science methods to make predictions.
How is explanatory modeling evaluated?
Explanatory modeling is evaluated by the strength of relationship in statistical model.
How is explanatory modeling evaluated?
Explanatory modeling is evaluated by the strength of relationship in statistical model
How is predictive power evaluated?
Predictive power is evaluated by ability of the model to accurately predict new observations.
How does data preparation for ‘explanatory modeling’ and ‘predictive modeling’ differ concerning ‘missing values’?
Explanatory modeling: throw away
Predictive modeling: throwing away is not an option if we need to make a prediction for these, since it can even be predictive information.
How does data preparation for ‘explanatory modeling’ and ‘predictive modeling’ differ concerning ‘data partitioning’?
Predictive: test set (we’ll come back to this) crucial: How well can we predict on new, unseen data instances?
Explanatory: much less common
How does data preparation for ‘explanatory modeling’ and ‘predictive modeling’ differ concerning ‘the choice of variables’?
Explanatory: operationalization of constructs
Predictive: more broad, but it must be available at time of prediction
What is the difference in methods for Explanatory vs Predictive Modeling?
Explanatory: interpretable, statistical methods
Predictive: accurate data mining methods (neural networks, random forests, etc. But also logistic regression)
What is the difference in validation of the results of Explanatory vs Predictive Modeling?
Explanatory modeling: model fit, R-squared
Predictive modeling: generalization, (test)-accuracy, etc.
What is wrong with this statement:
“We checked for multicollinearity of the input variables before their use for prediction.”
For statistical exercices the coefficient estimates of the multiple regression may change erratically.
Multicollinearity will not affect the ability of the model to predict.
What is wrong with this statement in data mining:
“Variables income and car_brand were very explanatory for the model.”
Variables are not explanatory but predictive
What is:
Sampling?
Why would you do it?
Sampling is the act, process or technique of selecting a suitable sample or a representative part of the population for the purpose of determining parameters or characteristics of the whole population.
There are various reasons to do so: Economic advantage: less costs Time factor: less time, quickly Large Populations and partly accessible populations Computation Power Required
What is:
Descretization?
What is:
Normalization?