Confirmatory Data Analysis Flashcards
Confirmatory data analysis (CDA)
Using research methods & quantitative methods to verify/probe existing theories
Specify all research questions & hypotheses a priori
In hardline CDA you would never use post hoc tests, even with corrections, as every test is planned a priori
Tukey, CDA & Juries
Can think of research & data analysis like the justice system
EDA (exploratory data analysis) is generating indictments of suspects
CDA is putting them on trial to see whether the evidence justifies a conviction
Hypothesis testing & falsification
Popper defined science as generating & testing falsifiable ideas
Generate specific hypotheses & plan your analyses in full prior to any data collection or analysis
False positives & replication crisis
Shady practices (e.g. p-hacking, HARKing) violate the assumptions of CDA & the way that stats & p values were designed to be used
Result: an overwhelming number of significant results that don't replicate
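A quick illustration of why running many uncorrected tests inflates false positives, and what a correction such as Bonferroni does. This is a minimal sketch assuming k independent tests where every null hypothesis is true; the function names are hypothetical, and real studies rarely have fully independent tests.

```python
# Familywise error rate (FWER): probability of at least one false positive
# across k independent tests when every null hypothesis is true.
# Assumption: tests are independent (hypothetical helper, not a library API).
def familywise_error_rate(alpha: float, k: int) -> float:
    return 1 - (1 - alpha) ** k


# Bonferroni correction: test each of the k hypotheses at alpha / k.
def bonferroni_alpha(alpha: float, k: int) -> float:
    return alpha / k


for k in (1, 5, 20):
    uncorrected = familywise_error_rate(0.05, k)
    corrected = familywise_error_rate(bonferroni_alpha(0.05, k), k)
    print(f"{k:>2} tests: uncorrected FWER = {uncorrected:.3f}, "
          f"Bonferroni FWER = {corrected:.3f}")
```

With 20 uncorrected tests at α = .05, the chance of at least one false positive is about 64%; testing each at .05/20 keeps it just under 5%.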
Solution: pre-registration
Register methods, hypotheses & data analysis plan before starting data collection
Publish results regardless of findings
Vital to maintain clear distinctions between EDA & CDA or run the risk of assuming all science is quantitative CDA
Philosophies of science
Single studies by themselves are not science
Kuhn argues that science isn’t necessarily a real thing in the universe with an immutable definition but rather an agreed framework of approaches & theorising
CDA is not more scientific than EDA & vice versa
Making models
Can make models through pure theorising, inductive use of existing data or EDA
EDA & CDA can be complementary & used together
Testing models
P values & simple yes/no statistics don't really work for testing the validity of more complex models
So instead have to use model fitting
Useful where an entire theory can't be framed as a simple hypothesis based on a single result/finding
Lets us draw out an entire theory as a model & then test how well it fits our data compared to other theories/models
This is how we can run CDA for more complex formulations of theories
Model fitting & structural equation modelling
Structural equation modelling (SEM) is an umbrella term for a range of different statistical analyses including general linear models, path analysis, confirmatory factor analysis & latent factor modelling
Observed variables
SEM refers to both observed & unobserved variables
Observed (manifest) variables are directly measured
Represented by rectangles in diagrams
Unobserved variables
Unobserved (latent) variables are theoretical/statistical constructs assumed by the researcher
Conceptually similar to how factor analysis can generate explanatory factors that don't appear as variables in the dataset
Represented by circles/ellipses in diagrams
Observed vs unobserved variables
Latent variables are often the things we care most about, as in psychology we can almost never measure constructs directly
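A small simulation can make the idea of a latent variable concrete: one unmeasured cause drives two observed indicators, which correlate only because they share it. This is a sketch under assumed values; the construct, item names and loadings are hypothetical, and it assumes a one-factor model with standardised variables, where the implied correlation between two items is the product of their loadings.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

# Hypothetical latent variable (e.g. "anxiety"): simulated here, but never measured in practice
latent = rng.standard_normal(n)

# Hypothetical standardised loadings for two observed indicators (e.g. questionnaire items)
loadings = {"item_a": 0.8, "item_b": 0.6}

observed = {}
for name, lam in loadings.items():
    # Indicator = loading * latent + unique error, with error scaled so total variance is 1
    error = rng.standard_normal(n) * np.sqrt(1 - lam ** 2)
    observed[name] = lam * latent + error

# The items correlate only through the shared latent cause;
# the one-factor model implies r = 0.8 * 0.6 = 0.48
r = np.corrcoef(observed["item_a"], observed["item_b"])[0, 1]
print(f"observed correlation: {r:.3f} (model-implied: 0.480)")
```

Confirmatory factor analysis and SEM work in the other direction: starting from the observed correlations, they estimate loadings like these and test whether the assumed latent structure reproduces the data.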
When to use SEM/model fitting
When simple linear models aren’t sufficient to describe your data
Or when you have a specific model that you think explains a finding & want to test whether data fit it
Or when you have multiple models you want to test against each other
How SEM differs from other stats (e.g. those typically run in SPSS)
SEM uses slightly different terms for stats that we might be more familiar with
"Significant" may still be used for specific parameters, but for the model overall we talk about its (relative) fit
Still occasionally use p values but more common to use confidence intervals
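As a sketch of reporting an interval rather than only a p value, the snippet below computes a 95% confidence interval for a mean. It assumes a normal approximation (reasonable for large samples; small samples would use a t distribution instead), and the difference scores are simulated, hypothetical data.

```python
import random
import statistics
from statistics import NormalDist


def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for a mean (assumes a reasonably large sample)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return m - z * se, m + z * se


random.seed(42)
# Hypothetical difference scores, simulated with true mean 0.3 and sd 1
diffs = [random.gauss(0.3, 1.0) for _ in range(200)]
low, high = mean_ci(diffs)
print(f"mean = {statistics.fmean(diffs):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

If the interval excludes 0, a two-sided z-test at .05 would also be significant, but the interval additionally shows the plausible size of the effect.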
Model testing
Testing models against each other to see what fits data best
Unlike significance testing, this looks at relative fit
Balances how well a model describes the data against how complex the model is
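The trade-off between fit and complexity is often scored with an information criterion such as AIC (Akaike information criterion), where lower is better: extra parameters always improve raw fit, but each one costs a penalty of 2. This sketch compares ordinary linear regressions on simulated, hypothetical data rather than a full SEM; SEM software typically reports AIC and BIC alongside other fit indices, but exact outputs depend on the package.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC for an ordinary least-squares model with Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximised log-likelihood, using the ML error variance estimate rss / n
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    n_params = k + 1  # regression coefficients + error variance
    return 2 * n_params - 2 * log_lik


rng = np.random.default_rng(seed=0)
n = 200
x = rng.normal(size=n)
# Hypothetical data generated by y = 1 + 0.5 * x + noise
y = 1 + 0.5 * x + rng.normal(size=n)

# Three competing models of the same data, from simplest to most complex
models = {
    "intercept only": np.column_stack([np.ones(n)]),
    "linear": np.column_stack([np.ones(n), x]),
    "cubic": np.column_stack([np.ones(n), x, x ** 2, x ** 3]),
}
aics = {name: gaussian_aic(y, X) for name, X in models.items()}
for name, aic in sorted(aics.items(), key=lambda item: item[1]):
    print(f"{name:15s} AIC = {aic:.1f}")
```

Because the models are nested, the cubic model can never fit worse in raw terms; AIC only prefers it if that improvement outweighs the penalty for its two extra parameters.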