Secondary Data Analysis Flashcards
What is secondary data analysis?
Secondary data analysis; retrospective analysis; using existing data sets
Conducting a research project without collecting ‘fresh’ data - i.e. taking data that has already been collected and using it in a new context.
When would secondary data analysis occur?
- your supervisor / a colleague might have data collected from a previous study
- you could use a publicly available research dataset where lots of things are measured (these are often big, longitudinal, population representative data)
- you could use historical or archival data
What are some data repositories?
- APA
- ICPSR (database for versions of datasets): Inter-university Consortium for Political and Social Research
What are the benefits of secondary data analysis?
- easier (can be faster, doesn’t involve you collecting data)
- it’s richer (access to populations and resources that you wouldn’t otherwise have - time, money, vulnerable populations) and involves types of research studies you couldn’t do yourself (longitudinal data, birth cohorts)
- ethical use of data (advancing the benefits of the data, no additional distress on participants)
- can increase research transparency
What are the disadvantages of using secondary data?
- you don’t have control over how the data were collected
- often a mismatch between the study design, the measures used and your RQ/hypotheses (study was designed for a different RQ to yours, often non-experimental, maybe not ideal measures used)
- sometimes it costs to access the data set
- working with other people
- easy to engage in questionable research practices
What are the traditional research steps?
observation
question
hypothesis
prediction
experiment
results
theory
What are the steps involved for secondary data analysis?
- Determine RQ (possibly tentative hypotheses)
- Find appropriate dataset for your RQ
- Identify the study’s design, variables
- Refine hypotheses in light of actual variables
- Analyse data
- Make conclusions and write up results
The first step of secondary data analysis
Determine RQ
- what is the area?
- literature review (what’s known, what is the gap?)
- define research question
- hypotheses - what do you want to investigate, what are your predictions?
Step 2
Find appropriate dataset for your RQ
- fits your area / phenomena
- may need to apply to access and use the data for research purposes
Sometimes the first 2 steps are reversed because data presents itself!
Step 3
Identify the study’s design, variables, etc
- what was the original RQ
- what was the study design
- who was the population
- what constructs were targeted, how were they operationalized
Step 4
Refine hypotheses in light of actual variables
- planned RQ might not be testable on this data
- revise and refine hypotheses in light of which data is actually there
Step 5
Analyse data
- descriptive statistics to summarise
- formal analyses to address hypotheses
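A minimal sketch of Step 5, using only the Python standard library. The variable names (hours_sleep, wellbeing) and all values are made up for illustration and do not come from any real dataset; a Pearson correlation stands in for whatever formal analysis your hypotheses actually call for.

```python
import math
import statistics

# Hypothetical, made-up values standing in for two variables
# taken from an existing dataset.
hours_sleep = [6.0, 7.5, 8.0, 5.5, 7.0, 6.5, 8.5, 7.0]
wellbeing = [52, 61, 66, 48, 60, 55, 70, 58]


def describe(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Descriptive statistics to summarise each variable.
sleep_summary = describe(hours_sleep)
wellbeing_summary = describe(wellbeing)

# A simple formal analysis addressing a hypothesised association.
r = pearson_r(hours_sleep, wellbeing)

print(sleep_summary)
print(wellbeing_summary)
print(f"r = {r:.2f}")
```

With a real secondary dataset, the analysis would also need to respect the original sampling design (e.g. survey weights), which this sketch ignores.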
Step 6
Make conclusions and write up results
Important considerations for secondary data analysis
- may still need ethical approval at the start as the data will be used for purposes other than the original intention
- at what point does too much compromise undermine your conclusions?
- analytical methods need to match the sampling design
- don’t data snoop
Important considerations for BIG datasets
- longitudinal projects often have a lot of missing data -> is this differential attrition? how will you deal with this?
- need to consider and make decisions about significance levels and effect sizes (for a given effect size, the p value shrinks as sample size grows, so even trivial effects can reach significance in very large samples)
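A rough illustration of the sample-size point, assuming a one-sample z-test with a fixed, very small standardised effect (the value 0.05 and the sample sizes are hypothetical). It shows the same effect moving from non-significant to highly significant purely because n grows, which is why effect size needs to be reported alongside the p value.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_for_fixed_effect(d, n):
    """p-value of a one-sample z-test when the observed
    standardised mean difference is d and the sample size is n."""
    z = d * math.sqrt(n)
    return two_sided_p_from_z(z)


effect = 0.05  # hypothetical, very small standardised effect
sample_sizes = [100, 1_000, 10_000, 100_000]
p_values = [p_for_fixed_effect(effect, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:>7}: p = {p:.3g}")
```

The effect size is identical in every row; only the sample size changes.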