Explain vs Predict, Data Preprocessing and Predictive Modeling Flashcards

1
Q

What is the goal of:

Explanatory modeling?

A

Explanatory modeling is a theory-based approach to testing causal hypotheses.

2
Q

What is the goal of:

Predictive modeling?

A

Predictive modeling uses data science methods to make predictions about new observations.

3
Q

How is explanatory modeling evaluated?

A

Explanatory modeling is evaluated by the strength of the relationship in the statistical model.

4
Q

How is predictive power evaluated?

A

Predictive power is evaluated by the ability of the model to accurately predict new observations.

5
Q

How does data preparation for ‘explanatory modeling’ and ‘predictive modeling’ differ concerning ‘missing values’?

A

Explanatory modeling: records with missing values are typically thrown away.

Predictive modeling: throwing records away is not an option if we need to make a prediction for them. The fact that a value is missing can even be predictive information in itself.
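
One common way to keep such records is to fill the gap and add a flag that remembers the value was missing. The sketch below assumes mean imputation and a hypothetical helper named impute_with_indicator. Neither comes from the cards themselves.

```python
from statistics import mean

def impute_with_indicator(values):
    """Fill None with the mean of the observed values and flag which entries were missing."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing

# Made-up incomes; None marks a missing value.
incomes = [3000, None, 4000, None, 5000]
imputed, was_missing = impute_with_indicator(incomes)
print(imputed)      # [3000, 4000, 4000, 4000, 5000]
print(was_missing)  # [0, 1, 0, 1, 0]
```

The was_missing column can be given to the model as an extra input, so the missingness itself stays available as a signal.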

6
Q

How does data preparation for ‘explanatory modeling’ and ‘predictive modeling’ differ concerning ‘data partitioning’?

A

Predictive: setting aside a test set is crucial. It answers the question: how well can we predict on new, unseen data instances?

Explanatory: partitioning is much less common.
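
A minimal sketch of such a partition, assuming a simple random split. The function name train_test_split, the 30% test fraction and the seed are illustrative choices, not prescribed by the cards.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # stand-in for ten data instances
train, test = train_test_split(rows)
print(len(train), len(test))  # 7 3
```

The model is fitted on train only. The test rows play the role of the new, unseen instances.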

7
Q

How does data preparation for ‘explanatory modeling’ and ‘predictive modeling’ differ concerning ‘the choice of variables’?

A

Explanatory: variables are operationalizations of theoretical constructs.

Predictive: the choice is broader, but every variable must be available at the time of prediction.

8
Q

What is the difference in methods for Explanatory vs Predictive Modeling?

A

Explanatory: interpretable, statistical methods

Predictive: accurate data mining methods (neural networks, random forests, etc., but also logistic regression)

9
Q

What is the difference in validation of the results of Explanatory vs Predictive Modeling?

A

Explanatory modeling: model fit, e.g. R-squared

Predictive modeling: generalization, e.g. accuracy on the test set
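
The two measures can be sketched as follows. The helper names r_squared and accuracy and the toy values are assumptions for illustration. In practice R-squared is usually reported on the data the model was fitted on, while accuracy is computed on the held-out test set.

```python
def r_squared(y, y_hat):
    """Share of the variance in y that the fitted values y_hat account for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0, a perfect fit
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0, no better than predicting the mean
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # 0.75
```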

10
Q

What is wrong with this statement:

“We checked for multicollinearity of the input variables before their use for prediction.”

A

Multicollinearity matters for statistical (explanatory) exercises, where the coefficient estimates of the multiple regression may change erratically.
It does not affect the ability of the model to predict, so checking for it is not needed when the goal is prediction.
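
A small sketch of why this holds, using the extreme case of perfect collinearity. The data, the coefficient pairs and the name predict are made up for illustration. Real multicollinearity is usually only approximate, but the effect is similar: very different coefficients can produce the same predictions.

```python
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2 * v for v in x1]  # perfectly collinear with x1

def predict(b1, b2):
    """Linear model without intercept: y_hat = b1 * x1 + b2 * x2."""
    return [b1 * a + b2 * b for a, b in zip(x1, x2)]

# Three very different coefficient pairs that all satisfy b1 + 2 * b2 = 3.
candidates = [(3.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
predictions = [predict(b1, b2) for b1, b2 in candidates]
print(predictions[0])  # [3.0, 6.0, 9.0, 12.0]
print(all(p == predictions[0] for p in predictions))  # True
```

The individual coefficients cannot be interpreted here, which is a problem for explanation, yet every candidate predicts identically.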

11
Q

What is wrong with this statement in data mining:

“Variables income and car_brand were very explanatory for the model.”

A

In data mining, variables are not explanatory but predictive.

12
Q

What is:
Sampling?

Why would you do it?

A

Sampling is the act, process or technique of selecting a suitable sample or a representative part of the population for the purpose of determining parameters or characteristics of the whole population.

There are various reasons to do so:
Economic advantage: lower costs
Time factor: results are available more quickly
Large populations and only partly accessible populations
Computation power required: less data to process
13
Q

What is:

Discretization?

A
14
Q

What is:

Normalization?

A
15
Q

What are:

Missing Values?

A
16
Q

What are:

Outliers?

A
17
Q

What is:

Encoding?

A
18
Q

What is:

Stratified sampling?

A

Stratified sampling means drawing the sample so that it has the same distribution (of one or more variables) as the population.
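
A minimal sketch of the idea, assuming the same sampling fraction is drawn from every stratum. The function name stratified_sample, the 20/80 population and the seed are illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        n = round(len(group) * fraction)
        sample.extend(rng.sample(group, n))
    return sample

# Made-up population: 20% "yes" and 80% "no".
population = [("yes", i) for i in range(20)] + [("no", i) for i in range(80)]
sample = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
n_yes = sum(1 for label, _ in sample if label == "yes")
n_no = sum(1 for label, _ in sample if label == "no")
print(n_yes, n_no)  # 2 8
```

The sample keeps the 20/80 split of the population, which a plain random sample of ten rows would not guarantee.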

19
Q

What is:

kNN?

A

kNN, short for k-Nearest Neighbors, is a technique that classifies an instance according to the k most similar observations, measured for example by Euclidean distance. For k = 1, the predicted class is the same as that of the most similar observation; for larger k, it is the majority class among the k neighbors.
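
A minimal sketch of a kNN classifier with Euclidean distance and majority voting. The function name knn_predict and the toy data are assumptions for illustration. It needs Python 3.8 or later for math.dist.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label of the k nearest."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((8, 8), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2), k=3))  # A
print(knn_predict(train, (8, 9), k=1))  # B
```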

20
Q

What is a problem of kNN?

A

If a nominal variable is coded as numbers, the model assumes an order that is not there, so Euclidean distances are not meaningful for the kNN technique.

Solution for nominal variable: dummy encoding
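
A small sketch of dummy encoding, here with one 0/1 column per category. The helper name dummy_encode and the car brands are made up for illustration.

```python
import math

def dummy_encode(values):
    """Return the sorted categories and one 0/1 column per category for each value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

brands = ["audi", "volvo", "bmw", "audi"]
categories, encoded = dummy_encode(brands)
print(categories)  # ['audi', 'bmw', 'volvo']
print(encoded)     # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]

# Every pair of different brands is now equally far apart, so no order is implied.
d_audi_volvo = math.dist(encoded[0], encoded[1])
d_audi_bmw = math.dist(encoded[0], encoded[2])
print(d_audi_volvo == d_audi_bmw)  # True
```

Coding the brands as 1, 2, 3 instead would put audi closer to bmw than to volvo, which is exactly the spurious order described above.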

21
Q

What is another problem of kNN?

A

Variables can be measured on very different scales (income vs. number of cars in a household). The variables with the larger values then get a higher weight in the distance, without being more important.

Solution: Thermometer encoding or standardization
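
A minimal sketch of standardization (z-scores), assuming the population standard deviation is used. The helper name standardize and the values are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

incomes = [2000, 3000, 4000]  # large scale
cars = [0, 1, 2]              # small scale
z_incomes = standardize(incomes)
z_cars = standardize(cars)
print([round(z, 3) for z in z_incomes])  # [-1.225, 0.0, 1.225]
print([round(z, 3) for z in z_cars])     # [-1.225, 0.0, 1.225]
```

After standardization both variables span the same range, so neither dominates the Euclidean distance merely because of its unit of measurement.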