Attewell Chapter 3 Flashcards
“…searching through data until one finds statistically significant relations” is called ______.
Data dredging
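A minimal sketch of why data dredging misleads, assuming pure-noise data: every predictor below is random and unrelated to the outcome, yet some still pass a conventional significance cutoff. The function names, the sample sizes, and the approximate cutoff of |r| > 0.197 (roughly p < .05, two-tailed, for 100 observations) are illustrative assumptions, not taken from the chapter.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_spurious_hits(n_predictors=200, n_obs=100, seed=0):
    """Count pure-noise predictors that look 'significant' by chance."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    # Assumed cutoff: |r| > 0.197 is roughly p < .05 (two-tailed) when n_obs = 100.
    threshold = 0.197
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(predictor, outcome)) > threshold:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 unrelated predictors passed the cutoff")
```

With a 5% cutoff, about 5% of unrelated predictors are expected to pass by chance alone, so a long enough search will tend to turn up "findings" even when there is nothing to find.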
“Researchers should generate their hypothesis before beginning their statistical analyses.” True or False?
True
What is the random subset of observations used to build a model known as?
the training sample
“Cross-validation can be thought of as a type of quality control for DM models”. True or False?
True
Does cross-validation require a very large dataset?
No, it does not strictly require one.
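One way to see why a very large dataset is not strictly required: in k-fold cross-validation, every observation is used for testing exactly once and for training in the other folds. The sketch below only builds the index splits; the function name `k_fold_indices` and the choice of 5 folds are illustrative assumptions.

```python
import random


def k_fold_indices(n_obs, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each observation is held out exactly once."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5))
for train, test in splits:
    print(len(train), "training cases,", len(test), "held-out cases")
```

A model would be fit on each `train` list and scored on the matching `test` list, with the k scores then averaged.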
A plot of predicted values against observed values should be a _______ line, if the model is calibrated.
straight
An uncalibrated model resembles a _______ line.
curved
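A rough numeric version of the calibration plot, as a sketch only: sort cases by predicted value, split them into bins, and compare the mean predicted value with the mean observed value in each bin. The function name `calibration_table` and the toy data are made up for illustration.

```python
def calibration_table(predicted, observed, n_bins=4):
    """Return (mean predicted, mean observed) for each bin of sorted predictions."""
    pairs = sorted(zip(predicted, observed))
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * len(pairs) // n_bins:(b + 1) * len(pairs) // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            mean_obs = sum(o for _, o in chunk) / len(chunk)
            rows.append((mean_pred, mean_obs))
    return rows


# Made-up data: observed values grow with the square of the prediction,
# so the points bend away from a straight line (an uncalibrated model).
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
observed = [p ** 2 for p in predicted]
rows = calibration_table(predicted, observed)
for mean_pred, mean_obs in rows:
    print(mean_pred, mean_obs)
```

Here the gap between neighboring observed means keeps widening (10, then 18, then 26) while the predicted means rise in equal steps, which is the curved pattern the card describes.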
“A researcher tries to identify variables that produce the curved pattern, adding those to the regression model in order to correct the curvature” True or False?
True
____ refers to the accuracy of a predictive model.
fit
An ideal ROC curve “closely follows the Y-axis on the left and then sharply turns parallel to the X axis”. True or False?
True
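To make the shape concrete, here is a small sketch that traces ROC points by lowering the classification cutoff one case at a time. The helper names `roc_points` and `auc` and the four-case dataset are hypothetical, and the code assumes there are no tied scores.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, cutoff swept high to low.

    Assumes labels are 0/1 and that no two cases share the same score.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores that rank every positive case above every negative case.
points = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(points)
print(auc(points))
```

For this perfectly ranked toy data the curve climbs the Y-axis to (0, 1) before moving right along the top, and the area under it is 1.0.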
Ensemble learning refers to the act of combining several predictive models to provide the best possible prediction. True or False?
True
Binning treats a dataset as if it were a population, rather than a sample. True or False?
False (it's Bagging… not Binning!)
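A minimal sketch of bagging, assuming a simple least-squares line as the base model (the technique is more often used with trees): the sample is treated as the population, resampled with replacement many times, and the resulting models' predictions are averaged. The names `fit_line` and `bagged_predict` and the toy data are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept; falls back to a flat line if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of models fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Treat the sample as the population: draw n cases from it with replacement.
        draw = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in draw], [ys[i] for i in draw])
        preds.append(slope * x_new + intercept)
    return sum(preds) / n_models


# Made-up, exactly linear data: y = 2x + 1.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
prediction = bagged_predict(xs, ys, x_new=12)
print(prediction)
```

Because the toy data are exactly linear, each resampled model recovers nearly the same line; with noisy data the resampled models would differ, and averaging them is what steadies the prediction.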
Averaging the predictions of many generated tree models to obtain the best prediction describes random forests. True or False?
True
Are large datasets enough to allow for a comprehensive/exhaustive search for structure?
No… “no big data is big enough”