Lecture 5 Flashcards
In data sets, rows capture
Observations (e.g., on consumers or firms)
Columns display
Variables. A variable can take on different values for different subjects
Dummy variables
Variables that only take on the values 0 or 1
Codebook
A list of all the codes used in a data set and what each code means
The variables in your data set need to match the unit of analysis in a study. Specifically:
The dependent variable is measured at the level of the unit of analysis. So are mediator variables.
Independent and moderator variables are measured at the level of the unit of analysis or at a more aggregate level.
Population
Entire group of people, firms, events, or things of interest about which you would like to make inferences
Sample
A subset of the population of interest
Why use samples in the first place?
Studying the entire population is usually impossible
The sampling process consists of the following steps:
1) Define the population you are interested in
2) Determine the sampling frame. The sampling frame is the physical representation of the population through which one can reach out to that population
3) Decide on the sampling design
How to define the target population and choose the sampling frame
1) Define the target population (e.g., students at TiSEM, employees at Philips)
2) Determine the sampling frame
- Physical representation of the target population (e.g., students at TiSEM –> TiSEM student database)
3) Determine the sampling design
Coverage error
Sampling frame ≠ population
Undercoverage
True population members are excluded
Miscoverage
Non-population members are included
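To make the two kinds of coverage error concrete, the short Python sketch below treats the population and the sampling frame as sets of IDs. Undercoverage is then the set difference population minus frame, and miscoverage is frame minus population. All IDs are hypothetical.

```python
# Hypothetical IDs: who truly belongs to the target population vs. who is on the frame list
population = {"s01", "s02", "s03", "s04", "s05"}
sampling_frame = {"s02", "s03", "s04", "s05", "x99"}

def coverage_errors(population, frame):
    """Return (undercoverage, miscoverage) as sets."""
    undercoverage = population - frame  # true members missing from the frame
    miscoverage = frame - population    # non-members listed on the frame
    return undercoverage, miscoverage

under, mis = coverage_errors(population, sampling_frame)
print(sorted(under))  # ['s01']
print(sorted(mis))    # ['x99']
```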
Solutions to coverage error
If small, recognize but ignore
If large, redefine the population in terms of the sampling frame
Probability sampling
Each element of the population has a known chance of being selected as a subject
Results generalizable to the population
More time and resource intensive
Nonprobability sampling
The elements of the population do not have a known chance of being selected as a subject
Less time and resource intensive
Results not generalizable to the population
Probability sampling: simple random sampling
Each population element has an equal chance of being chosen
Highest generalizability, but costly
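A minimal Python sketch of simple random sampling, assuming the population is available as a list (here, hypothetical IDs 1 to 100). `random.Random.sample` draws without replacement, so every element has the same chance n/N of being selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; each element has the same chance n/N."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population of 100 IDs
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```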
Systematic sampling
Select a random starting point, then pick every kth element (e.g., every third person, starting from person 5)
Simplicity (adds a degree of system or process)
Low generalizability if the list has a periodic pattern, i.e., every kth observation differs systematically from the others
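A sketch of systematic sampling in Python under one common convention (an assumption here): the random start is drawn from the first k positions of the list, and every kth element after it is taken. The frame of 30 people is hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k positions, then every kth element after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting index in 0..k-1
    return frame[start::k]    # slice with step k = every kth element

frame = [f"person{i}" for i in range(1, 31)]  # hypothetical list of 30 people
sample = systematic_sample(frame, k=3, seed=1)
print(sample)
```

If the list happened to alternate in a pattern of length 3 (say, every third person is a team leader), this slice would hit the same position in every cycle, which is exactly the generalizability risk noted above.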
Stratified sampling (probability sampling)
Divide the population into meaningful (homogeneous) groups, or strata, then apply simple random sampling (SRS) within each group
All groups are adequately sampled, allowing for group comparisons
More time-consuming, and requires homogeneous subgroups
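A sketch of stratified sampling in Python: units are grouped by a stratum label, then a simple random sample is drawn inside each stratum. Equal allocation (the same n per stratum) is an assumption for illustration; proportional allocation is another option. The student names and programmes are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # SRS inside every stratum, so no group can be left out of the sample
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}

# Hypothetical students tagged with their programme (the stratum)
students = [("ann", "economics"), ("bob", "economics"), ("cem", "economics"),
            ("dana", "marketing"), ("eli", "marketing"),
            ("fay", "finance"), ("gus", "finance"), ("hal", "finance")]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=2, seed=7)
print(sample)
```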
Cluster sampling
Divide the population into heterogeneous groups (clusters), randomly select a number of groups, and select every member within these groups
Cluster the population –> sample clusters
E.g., geographic clusters
Drawback: naturally occurring clusters (subsets of the population) are typically more homogeneous than heterogeneous, whereas the design assumes heterogeneous clusters
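A sketch of one-stage cluster sampling in Python: whole clusters are drawn at random, and every member of a chosen cluster enters the sample. The neighborhood clusters and resident IDs are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then include every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted keys keep the draw reproducible
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical geographic clusters (neighborhoods) and their residents
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
sample = cluster_sample(clusters, n_clusters=2, seed=3)
print(sample)
```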
Classification of sampling designs
Sampling designs fall into two classes:
1) Probability:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
2) Nonprobability:
Convenience sampling
Quota sampling
Judgement sampling
Snowball sampling
Convenience sampling (nonprobability sampling)
Select subjects who are conveniently available
Convenient (inexpensive and fast)
Lower generalizability
Nonprobability sampling (quota sampling)
Fix a quota for each subgroup
E.g., "Do you think dog owners should pay taxes for their pets?"
Households with a dog (mainly no)
Households without a dog (mainly yes)
Good when the participation of minority groups is critical
Lower generalizability
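Quota sampling involves no random draw: whoever turns up fills the quota of their subgroup until it is full. The Python sketch below mimics that with a hypothetical stream of passers-by for the dog-tax question and stops once both quotas are met.

```python
def quota_sample(arrivals, group_of, quotas):
    """Take respondents in order of arrival until each subgroup's quota is full (no random selection)."""
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota is full
    return filled

# Hypothetical passers-by: (name, owns_dog)
arrivals = [("a", True), ("b", True), ("c", True), ("d", False),
            ("e", True), ("f", False), ("g", False)]
sample = quota_sample(arrivals, group_of=lambda p: "dog" if p[1] else "no dog",
                      quotas={"dog": 2, "no dog": 2})
print(sample)
```

Respondents "c" and "e" are turned away only because the dog-owner quota was already full, not by chance, which is why the result cannot be generalized to the population.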
Nonprobability sampling: judgement sampling
Select subjects based on their knowledge/professional judgement
Convenient (inexpensive and fast) when a limited number of people have the information you need
Lower generalizability
Nonprobability sampling (snowball sampling)
Ask participants: "Do you know people who…?"
Good for rare characteristics (experts)
First participants strongly influence the sample
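Snowball sampling can be pictured as a breadth-first walk over a referral network. In this Python sketch the network and the expert names are hypothetical; starting from two different first participants shows how strongly they shape who ends up in the sample.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from initial participants and add the people they refer, wave by wave."""
    sample, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):  # answers to "Do you know people who...?"
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: who names whom
referrals = {"expert_a": ["expert_b", "expert_c"],
             "expert_b": ["expert_d"],
             "expert_c": ["expert_b", "expert_e"]}
print(snowball_sample(["expert_a"], referrals, max_size=4))
# ['expert_a', 'expert_b', 'expert_c', 'expert_d']
print(snowball_sample(["expert_c"], referrals, max_size=10))
# ['expert_c', 'expert_b', 'expert_e', 'expert_d']  (expert_a is never reached)
```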
Measurement or operationalization means
Turning abstract conceptual variables into measurable observations
Nominal scales
A scale that allows you to classify your data into categories
E.g., states in the United States that are either Democrat or Republican
You assign 1 to Democrat and 2 to Republican; the numbers are only labels
Ordinal scale
Ranked or ordered
Rank-orders the categories in a meaningful way
More information than a nominal scale: here 3 is more than 2
E.g. Best to worst, first to last etc.
Interval scale
Allows you to compare differences between values
Meaningful differences between values, but no natural zero point
E.g. IQ
Compared to an ordinal ranking: when ranking chili peppers, the step from 1 –> 2 is not necessarily the same as from 2 –> 3. IQ is standardized, so differences are comparable
Ratio scales
Meaningful differences and ratios between values due to a natural zero point
Ratios are meaningful for this scale
E.g. Distance
A true zero point exists (e.g., a distance of 0)
Measures of central tendency
Mean (average), median (middle value of the ordered data), or mode (most common value)
Measures of dispersion
Range, standard deviation, variance or interquartile range
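The measures above can be computed with Python's standard `statistics` module, as sketched below on hypothetical scores. Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas, and `statistics.quantiles` uses its default "exclusive" method, so the IQR may differ slightly from other software.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical scores

central = {
    "mean": statistics.mean(data),      # 6
    "median": statistics.median(data),  # 5
    "mode": statistics.mode(data),      # 4
}
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
dispersion = {
    "range": max(data) - min(data),         # 9
    "variance": statistics.variance(data),  # sample variance, 10
    "std_dev": statistics.stdev(data),      # sample standard deviation, about 3.162
    "iqr": q3 - q1,                         # 9.0 - 4.0 = 5.0
}
print(central, dispersion)
```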
Inferential statistics
Methods to draw conclusions (or to make inferences)
E.g., mean difference tests
Choosing between descriptive statistics
Nominal scale
Measure of central tendency: mode
Measure of dispersion: none
Ordinal scale
Measure of central tendency: median
Measure of dispersion: interquartile range
Interval scale
Measure of central tendency: mean
Measure of dispersion: standard deviation, variance
Ratio scale
Measure of central tendency: mean
Measure of dispersion: standard deviation, variance
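The scale-to-statistic mapping can be written as a small lookup, sketched in Python below. The function name `describe` and the scale labels are hypothetical; the point is that nominal codes such as 1 = Democrat and 2 = Republican only support the mode.

```python
import statistics

def describe(data, scale):
    """Report only the central tendency and dispersion that are meaningful for the scale."""
    if scale == "nominal":
        return {"central": statistics.mode(data), "dispersion": None}
    if scale == "ordinal":
        q1, _, q3 = statistics.quantiles(data, n=4)
        return {"central": statistics.median(data), "dispersion": q3 - q1}  # IQR
    if scale in ("interval", "ratio"):
        return {"central": statistics.mean(data), "dispersion": statistics.stdev(data)}
    raise ValueError(f"unknown scale: {scale!r}")

print(describe([1, 2, 2, 1, 2], "nominal"))           # {'central': 2, 'dispersion': None}
print(describe([1, 2, 3, 4, 5, 6, 7], "ordinal"))     # {'central': 4, 'dispersion': 4.0}
```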
Choosing between inferential statistics:
Check slides
When there are multiple IVs (independent variables) in a study, measured on different scales:
The highest scale determines the statistical technique
Choosing inferential statistics: T-test or ANOVA
T-test: compares two means (two levels of an IV)
ANOVA: can compare more than two levels
The choice therefore depends on:
The number of IVs
The number of levels (conditions or groups) of the IV
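To show what the two tests compare, this stdlib-only Python sketch computes the pooled (equal-variance) Student t statistic and the one-way ANOVA F statistic. It stops at the test statistic; p-values would need the t and F distributions, for example via `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`. The group data are hypothetical.

```python
import statistics

def pooled_t(a, b):
    """Student's t statistic for two independent groups (assumes equal variances)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def one_way_anova_f(*groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

low, high, mid = [1, 2, 3], [5, 6, 7], [3, 4, 5]  # hypothetical scores for three conditions
t = pooled_t(high, low)                 # two levels: t-test
f2 = one_way_anova_f(high, low)         # two levels: ANOVA gives F = t**2
f3 = one_way_anova_f(high, low, mid)    # three levels: only ANOVA applies
print(round(t ** 2, 6), round(f2, 6), round(f3, 6))  # 24.0 24.0 12.0
```

With exactly two groups, F equals t squared, which is why ANOVA can be seen as the extension of the t-test to more than two levels of an IV.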
Choosing inferential statistics: rating scales (Likert scale)
Strongly disagree, disagree, undecided, agree, strongly agree
Choosing inferential statistics: rating scales (semantic differential)
Organized _ _ _ _ _ _ _ Unorganized
Cold _ _ _ _ _ _ _ Warm
Modern _ _ _ _ _ _ _ Old-fashioned
Rating scales like these are treated as interval scales
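Treating a Likert item as an interval scale means, in practice, coding the labels as numbers and then using means and standard deviations. The 1 to 5 coding below is an assumption (a common convention), and the responses are hypothetical.

```python
import statistics

# Assumed coding of the five Likert labels as 1..5
LIKERT = {"strongly disagree": 1, "disagree": 2, "undecided": 3,
          "agree": 4, "strongly agree": 5}

responses = ["agree", "strongly agree", "undecided", "agree", "disagree"]
scores = [LIKERT[r] for r in responses]
print(scores)                   # [4, 5, 3, 4, 2]
print(statistics.mean(scores))  # 3.6
```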
From a statistical point of view, a moderator is
Also considered an IV.
To test the moderating effect of M on the relationship between X and Y you have to include three IVs in your regression model:
The main effect of X
The interaction effect between X and M (=X*M) to capture the moderating effect of M on the relationship between X and Y
The main effect of M (to statistically control for the impact of M on Y; if you did not include M, the effect of X*M would not be correctly estimated)
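A stdlib-only Python sketch of the moderation model Y = b0 + b1*X + b2*M + b3*(X*M), fitted by ordinary least squares through the normal equations. The data are hypothetical and noise-free, generated with b3 = 1.5 and a dummy moderator M. The second fit drops the main effect of M and shows the interaction estimate drifting away from 1.5. In practice a statistics package (e.g., statsmodels) would replace this hand-rolled solver.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def ols(design_rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(design_rows[0])
    xtx = [[sum(row[i] * row[j] for row in design_rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design_rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free data: Y = 1 + 2*X + 0.5*M + 1.5*X*M, with a dummy moderator M
X = [0, 1, 2, 3, 0, 1, 2, 3]
M = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [1 + 2 * x + 0.5 * m + 1.5 * x * m for x, m in zip(X, M)]

# Full moderation model: intercept, main effect of X, main effect of M, interaction X*M
full = ols([[1, x, m, x * m] for x, m in zip(X, M)], Y)
print([round(b, 3) for b in full])  # [1.0, 2.0, 0.5, 1.5]

# Same model but omitting the main effect of M: the X*M coefficient is biased
no_m = ols([[1, x, x * m] for x, m in zip(X, M)], Y)
print(round(no_m[2], 3))  # 1.714, not the true 1.5
```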