Secondary data analysis Flashcards
Secondary data
Any data that was collected by others (i.e., not by yourself) at an earlier point in time
Primary data
Data that we collect ourselves; original information collected by the researcher themselves
Expert-coded datasets
Experts provide estimates and assessments of various measures; usually opinion based
V-Dem: Varieties of Democracy; experts code country democracy scores based on certain criteria
CHES: Chapel Hill Expert Survey; experts estimate party ideology and party positions
Researcher coded datasets
Coded by teams of researchers based on publicly available information (e.g., news sources, academic articles)
COW (Correlates of War), UCDP (Uppsala Conflict Data Program): teams of researchers code data on conflicts and casualties
Types of secondary data
- Existing surveys collected by others (a survey you run yourself is primary data)
- Official statistics
- Quantified texts, party manifestos
- Existing qualitative sources
Categorical variable
Binary variable: There are only two categories (e.g., dead or alive).
Nominal variable: There are more than two categories (e.g., whether someone is an omnivore, vegetarian, vegan, or fruitarian).
Ordinal variable
The same as a nominal variable, but the categories have a logical order (e.g., whether a student got a fail, pass, merit, or distinction in an exam).
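The nominal versus ordinal distinction can be sketched in a few lines of Python. This is only an illustration: the `Grade` class, the `diets` list, and the `highest_grade` helper are hypothetical names made up for this card, and the integer codes 0 to 3 are an assumption used purely to encode rank.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Ordinal variable: the integer codes encode rank order only.

    The distances between codes carry no meaning, so arithmetic such as
    MERIT - PASS should not be interpreted, even though IntEnum allows it.
    """
    FAIL = 0
    PASS = 1
    MERIT = 2
    DISTINCTION = 3


# Nominal variable: labels with no inherent order, so only equality
# checks and counts are meaningful.
diets = ["omnivore", "vegetarian", "vegan", "fruitarian"]


def highest_grade(grades):
    """Return the top-ranked grade; valid because ordinal data can be ordered."""
    return max(grades)


results = [Grade.PASS, Grade.DISTINCTION, Grade.FAIL]
best = highest_grade(results)
ranked = sorted(results)  # ordering is meaningful for ordinal data
```

Sorting `diets` would only give alphabetical order, which says nothing about the diets themselves; sorting `results` reflects a real ranking.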
Continuous variable
Entities get a distinct score
Interval variable: Equal intervals on the variable represent equal differences in the property being measured (e.g., the difference between 6 and 8 is equivalent to the difference between 13 and 15).
Ratio variable: The same as an interval variable, but the ratios of scores on the scale must also make sense (e.g., a score of 16 on an anxiety scale means that the person is, in reality, twice as anxious as someone scoring 8).
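A short sketch of why ratios need a true zero, using temperature as the illustration (temperature is not from the cards above; it is an assumed example). Celsius is an interval scale, Kelvin is a ratio scale, and `celsius_to_kelvin` is a hypothetical helper name.

```python
def celsius_to_kelvin(celsius):
    """Shift Celsius onto the Kelvin scale, which has a true zero."""
    return celsius + 273.15


c_low, c_high = 10.0, 20.0

# Differences are meaningful on an interval scale and survive the shift.
diff_celsius = c_high - c_low
diff_kelvin = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# Ratios are not: 20 C is numerically "twice" 10 C, but that depends
# entirely on where the arbitrary zero point was placed.
naive_ratio = c_high / c_low

# On the ratio scale the same two temperatures differ by only a few percent.
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)
```

The difference is the same on both scales, while the ratio changes once the zero point moves, which is why "twice as much" is only a valid claim for ratio variables.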
Benefits of secondary data - Quality
Secondary data are often of higher quality because they are usually produced by large-scale, collective projects
Benefits of secondary data - Time scale
Because these data collection efforts are part of larger projects, the data are often collected over long periods of time, as opposed to single surveys that capture one moment in time
Disadvantages of secondary data
Unable to make causal inferences: because secondary data sources are often not experimental (i.e., the researcher is not manipulating a treatment), it is difficult to make causal inferences
Inaccuracies