Attewell Chapter 2 Flashcards
Is the statement, “… [the] conventional statistical approach focuses on the individual coefficients
for the predictors, and doesn’t care as much about predictive power” true or false?
True
Does the likelihood of finding a statistical significance increase as more predictors are entered into a model?
Yes! One in twenty predictors will be significant at p ≤ 0.05 through chance alone.
How could a researcher solve the problem of multiplicity?
By using a Bonferroni correction
How does the Bonferroni correction for multiple comparisons work?
By simply dividing the conventional value of
0.05 by the number of predictors (If there are 5 predictors, then the new significance threshold would be 0.01)
In what field is the problem of multiplicity especially prominent?
The problem of multiplicity has grown evermore acute in the medical research and analyses of gene sequences, because it is increasingly common for thousands of significance tests to be tried, before reporting the significant ones.
How does the Data Mining approach avoid the multiplicity problem entirely?
By using a form of replication known as Cross-Validation
When are “residuals” said to be “homoscedastic”?
When they are normally distributed, with a constant variance and a mean of zero, and are independent of one another
What does the Greek word “homoscedastic” mean?
Having equal variances
What does the Greek word “heteroscedastic” mean?
Having unequal variances
Does Data Mining provide ways for circumventing the problem of heteroscedasticity?
Yes
Are many Data Mining methods nonparametric?
Yes, because, “…they do not require the kinds of statistical assumptions about the distribution of error terms that underlie many conventional modeling methods”.
Is bootstrapping a type of nonparametric technique?
Yes
Does bootstrapping make assumptions about the shape of the sampling distribution?
No
Does significance testing play a crucial role in the conventional statistical approach?
Yes
When do the most serious problems with significance testing occur?
When modelers add many predictors to models, and especially when they search through hundreds of predictors before deciding which to include in a model