Yarkoni & Westfall (2017): Choosing Prediction over Explanation in Psychology Flashcards
The goal of scientific psychology is to understand human behaviour. Historically this has meant being able both to explain behaviour, that is, to accurately describe its causal underpinnings, and to predict behaviour. How are these two goals usually treated in practice?
These two goals are rarely distinguished. The understanding seems to be that the two are so deeply intertwined that there would be little point in distinguishing them, except perhaps as a philosophical exercise. According to this understanding, explanation necessarily facilitates prediction.
Is it the case that the model that best approximates the mental processes producing an observed behaviour will also be the best at predicting it?
Unfortunately, although explanation and prediction may be philosophically compatible, there are good reasons to think that they are often in statistical and pragmatic tension with one another.
From a statistical standpoint, why might explanatory models not have the best predictive accuracy?
From a statistical standpoint, it is simply not true that the model that most closely approximates the data-generating process will in general be the most successful at predicting real-world outcomes. Overfitting can often lead a biased, psychologically implausible model to outperform a mechanistically more accurate but also more complex model, because the complex model's extra parameters end up fitting noise in the sample.
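To make this concrete, here is a minimal simulation sketch in Python (the data-generating process, sample sizes, and use of numpy/scikit-learn are illustrative assumptions, not taken from the paper): a deliberately simplified model that ignores most of the true causes out-predicts the "correct", more complex model when the training sample is small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# "True" process: the outcome depends weakly on 20 predictors plus noise.
n_train, n_test, p = 30, 10_000, 20
true_betas = rng.normal(0, 0.15, size=p)   # many small real effects

def simulate(n):
    X = rng.normal(size=(n, p))
    y = X @ true_betas + rng.normal(0, 1.0, size=n)
    return X, y

X_tr, y_tr = simulate(n_train)
X_te, y_te = simulate(n_test)

# Complex model: mechanistically "correct" form, estimates all 20 effects.
complex_fit = LinearRegression().fit(X_tr, y_tr)

# Simple (biased) model: ignores 19 of the 20 real causes.
simple_fit = LinearRegression().fit(X_tr[:, :1], y_tr)

print("complex test MSE:", mean_squared_error(y_te, complex_fit.predict(X_te)))
print("simple  test MSE:", mean_squared_error(y_te, simple_fit.predict(X_te[:, :1])))
# With a training sample this small, the simple model typically predicts
# better despite being "wrong": the complex model's estimates are too noisy.
```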
What may scientists in many areas of psychology have to choose between?
(a) developing complex models that can accurately predict outcomes of interest but fail to respect known psychological or neurobiological constraints and
(b) building simple models that appear theoretically elegant but have very limited capacity to predict actual human behavior.
What does this decision mean practically speaking?
A researcher cannot know in advance whether there is a relatively simple explanatory model waiting to be found, so they must decide on a case-by-case basis what to prioritise: identifying abstract, generalisable principles, or maximising predictive accuracy regardless of how that goal is achieved.
Why is it posited that explanation has been favoured in the past? Why might this change?
The methods of successful predictive science were poorly understood and rarely deployed in most fields of social and biomedical science. This may change thanks to recent advances in machine learning, where prediction of unobserved data is the gold standard and explanation is typically of little interest, together with the growing availability of large datasets of human behaviour.
What are the two separate senses in which psychologists have been deficient when it comes to predicting behaviour?
First, research papers in psychology rarely take steps to verify that the models they propose are capable of predicting the behavioral outcomes they are purportedly modelling.
Second, there is mounting evidence from the ongoing replication crisis that the published results of many papers in psychology do not, in fact, hold up when the same experiments and analyses are independently conducted at a later date.
Instead of testing predictions, what are psychological models typically evaluated on?
Instead, research is typically evaluated based either on “goodness of fit” between the statistical model and the sample data or on whether the sizes and directions of certain regression coefficients match what is implied by different theoretical perspectives.
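For contrast, a sketch of that conventional evaluation style (using statsmodels; the data and variables are placeholders, not from the paper): the researcher inspects in-sample fit statistics and coefficient signs, none of which involves predicting observations the model has not seen.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
y = 0.4 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(size=100)

# Conventional evaluation: fit the model to the whole sample and inspect it.
fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.rsquared)             # "goodness of fit" to this particular sample
print(fit.params, fit.pvalues)  # do signs and sizes match the theory?
# Nothing here asks how well the fitted equation predicts new data.
```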
What implications does the lack of replicability have for prediction?
Models that are held up as good explanations of behavior in an initial sample routinely fail to accurately predict the same behaviors in future samples—even when the experimental procedures are closely matched.
What is likely the reason for this replication failure?
P-hacking: flexible, data-contingent analysis decisions made after seeing the data, which capitalise on chance patterns in the sample.
A large number of psychology articles prominently feature the word "prediction" in their titles. What is the problem with this?
Such assertions reflect the intuitive idea that a vast range of statistical models (e.g., regression) are, in a sense, predictive models. When a researcher obtains a coefficient of determination (R^2) of, say, 0.50, and thus reports that she is able to "predict" 50% of the variance in an outcome using her set of predictors, she is implicitly claiming that she would be able to make reasonably accurate predictions about that outcome for a random person drawn from the same underlying population. The problem lies in the inference that the parameter estimates obtained in the sample at hand will perform comparably well when applied to other samples drawn from the same population.

It helps to distinguish two things: equation 1, the general model form (e.g., y = b0 + b1x1 + b2x2 + e), and equation 2, that same equation with the specific coefficient values estimated in the present sample plugged in. The R^2 statistic answers the question: in repeated random samples similar to this one, if one fits a model of the form of equation 1 in each new sample, each time estimating new values of b0, b1, and b2, what will be the average proportional reduction in the sum of squared errors? In other words, R^2 does not estimate the performance of the specific equation 2 but rather of the more general equation 1. It turns out that the performance of equation 1 is virtually always an overly optimistic estimate of the performance of equation 2.
Why is the performance of equation 1 virtually always an overly optimistic estimate of the performance of equation 2?
The values of b0, b1, and b2 estimated in any given sample are specifically selected so as to minimise the sum of squared errors in that particular sample; they therefore capitalise on chance patterns in that sample that will not recur in new samples.
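A small simulation sketch of this optimism (the sample size, number of predictors, and effect sizes are arbitrary illustrative choices): the R^2 obtained in the sample used for fitting routinely exceeds the R^2 the same fitted coefficients achieve in a fresh sample from the same population.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n, p = 50, 10
betas = rng.normal(0, 0.3, size=p)   # modest true effects

def sample(n):
    X = rng.normal(size=(n, p))
    y = X @ betas + rng.normal(size=n)
    return X, y

in_sample_r2, out_sample_r2 = [], []
for _ in range(500):
    X_a, y_a = sample(n)   # sample the coefficients are estimated in
    X_b, y_b = sample(n)   # a new sample from the same population
    fit = LinearRegression().fit(X_a, y_a)
    in_sample_r2.append(fit.score(X_a, y_a))               # what gets reported (equation 1)
    out_sample_r2.append(r2_score(y_b, fit.predict(X_b)))  # the specific fit (equation 2) in new data

print("mean in-sample R^2:    ", np.mean(in_sample_r2))
print("mean out-of-sample R^2:", np.mean(out_sample_r2))
# The coefficients were chosen to minimise error in sample A, so their
# apparent fit there overstates how well they transfer to sample B.
```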
How does machine learning avoid the same problems with overfitting?
Models are evaluated against a separate test dataset that was not used for fitting, and researchers explicitly distinguish training error from test error (often via cross-validation).
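A sketch of that workflow with scikit-learn (the dataset and model are placeholders): the model is fitted on a training set and judged on a held-out test set, and k-fold cross-validation repeats the split so that every observation is scored by a model that never saw it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 15))
y = X[:, 0] * 0.5 + rng.normal(size=200)   # only one predictor truly matters

# Training error vs test error: fit on one subset, evaluate on another.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("training R^2:", model.score(X_tr, y_tr))   # flattered by overfitting
print("test R^2:    ", model.score(X_te, y_te))   # honest estimate of generalisation

# K-fold cross-validation: every case is predicted by a model that never saw it.
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("5-fold CV R^2:", cv_r2.mean())
```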
When are the problems of overfitting negligible, and when are they most pronounced?
When predictors have strong effects and researchers fit relatively compact models in large samples, overfitting is negligible. As the number of predictors increases and/or sample size and effect size drop, overfitting increases, sometimes dramatically.
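A small sketch of that pattern (sample sizes, predictor counts, and the single real effect are arbitrary assumptions): the gap between in-sample and out-of-sample R^2, i.e. the optimism, grows as predictors are added and the sample shrinks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)

def optimism(n, p, reps=300):
    """Average gap between in-sample and out-of-sample R^2."""
    gaps = []
    for _ in range(reps):
        beta = np.zeros(p)
        beta[0] = 0.5   # one real effect, the rest are pure noise predictors
        X_a = rng.normal(size=(n, p)); y_a = X_a @ beta + rng.normal(size=n)
        X_b = rng.normal(size=(n, p)); y_b = X_b @ beta + rng.normal(size=n)
        fit = LinearRegression().fit(X_a, y_a)
        gaps.append(fit.score(X_a, y_a) - r2_score(y_b, fit.predict(X_b)))
    return np.mean(gaps)

for n in (400, 50):
    for p in (2, 20):
        print(f"n={n:4d}, p={p:2d}: optimism = {optimism(n, p):.2f}")
# Optimism is tiny for n=400, p=2 and large for n=50, p=20.
```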
What do Yarkoni and Westfall refer to as procedural overfitting?
P-hacking can be usefully conceptualised as a special case of overfitting. Specifically, it can be thought of as a form of procedural overfitting that takes place prior to (or in parallel with) model estimation.
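A toy simulation sketch of procedural overfitting (the data are purely null, and the number of candidate analyses is an arbitrary assumption): running many data-contingent analysis variants and keeping the best-looking one inflates the false-positive rate far beyond the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, n_variants, reps = 40, 20, 2000   # 20 "flexible" analysis choices per dataset

false_positives = 0
for _ in range(reps):
    y = rng.normal(size=(n, n_variants))   # 20 null outcomes / analysis variants
    x = rng.normal(size=n)                 # predictor with no true effect
    # "p-hacking": run every variant, keep the smallest p-value.
    pvals = [stats.pearsonr(x, y[:, j])[1] for j in range(n_variants)]
    false_positives += min(pvals) < 0.05

print("nominal alpha: 0.05")
print("actual false-positive rate:", false_positives / reps)
# Selecting the best of many analyses fits the procedure to noise in the
# sample, just as selecting coefficients fits the model to noise.
```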