M5 - Research Design Flashcards
Building a sample - selection
Random - non random
Unrestricted random - restricted random
Restricted random: stratified, cluster, multi-stage sampling
Non random: conscious - arbitrary
Selection of extreme / typical cases
Concentration / snowball / quota method
Random selection - unrestricted random
Each element has the same probability to be part of the sample
+ easy to implement
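A minimal sketch of unrestricted random selection, assuming a hypothetical sampling frame of 1,000 numbered units (the frame, the sample size and the function name are illustrative and not from the cards):

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; each unit has the same inclusion probability n / len(population)."""
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: 1,000 numbered survey units.
frame = list(range(1000))
sample = simple_random_sample(frame, 50, seed=1)
print(len(sample), "units drawn, inclusion probability", 50 / len(frame))
```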
Random selection - stratified sampling
Classification of the population into disjoint groups “strata”
then unrestricted random within the groups
+ greater precision with the same effort
E.g.: you have 3 locations and want 1/3 of the sample from each location
- complex extrapolation
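The three-location example can be sketched as follows, assuming a hypothetical population of 900 employees and 20 draws per location. The weights show why extrapolation gets more complex: with strata of unequal size, each sampled unit stands for a different number of population units.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into disjoint strata, then draw an unrestricted random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}
    # Extrapolation weight: number of population units one sampled unit stands for.
    weights = {name: len(members) / n_per_stratum for name, members in strata.items()}
    return sample, weights


# Hypothetical population: 900 employees at 3 locations of unequal size.
locations = ["A"] * 500 + ["B"] * 300 + ["C"] * 100
employees = list(enumerate(locations))  # (employee id, location)
sample, weights = stratified_sample(employees, lambda unit: unit[1], 20, seed=1)
print({name: len(drawn) for name, drawn in sample.items()}, weights)
```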
Random selection - cluster sampling
Random selection of clusters within the population
+ cheaper than pure random
- large errors
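A minimal sketch of cluster sampling, assuming hypothetical clusters (10 schools with 30 pupils each) and assuming that every unit inside a selected cluster is surveyed:

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit inside the selected ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 10 schools (clusters) with 30 pupils each.
schools = {f"school_{s}": [f"pupil_{s}_{p}" for p in range(30)] for s in range(10)}
sample = cluster_sample(schools, 3, seed=1)
print(sorted(sample), sum(len(pupils) for pupils in sample.values()))
```

Only 3 schools have to be visited, which is what makes the design cheap; because pupils within a school tend to resemble each other, the estimate is less precise than a pure random sample of the same size.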
Random selection - multi-stage sampling
Sequence of random sampling
E.g.: random selection of electoral districts, then random selection of voters, then random selection of …
- large errors
- complex extrapolation
+ cheaper
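The electoral-district example, reduced to two stages, might look like this. The district and voter names and all sizes are hypothetical:

```python
import random


def two_stage_sample(districts, n_districts, n_voters, seed=None):
    """Stage 1: random selection of districts. Stage 2: random selection of voters within them."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(districts), n_districts)
    return {name: rng.sample(districts[name], n_voters) for name in stage_one}


# Hypothetical population: 20 electoral districts with 200 voters each.
districts = {f"district_{d}": [f"voter_{d}_{v}" for v in range(200)] for d in range(20)}
sample = two_stage_sample(districts, 4, 25, seed=1)
print({name: len(voters) for name, voters in sample.items()})
```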
Non random selection - arbitrary
Select cases of the population that are easily accessible
Non random - conscious selection:
Dependent on survey object
- -> snowball
- -> quota
- -> concentration method
Non random - conscious - snowball method
Selection of members of rare & unknown populations
Non random - conscious - concentration method
Selection of cases in which a certain feature is so pronounced that they alone are thought to determine the distribution of that feature in the pop.
Non random - conscious - quota method
The sample fulfills certain quotas that are known from the pop.
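A minimal sketch of the quota method, assuming hypothetical quotas by age group and respondents taken in order of arrival. Nothing here is random: cases are accepted as they come until each quota is full.

```python
def quota_sample(stream, group_of, quotas):
    """Accept cases in order of arrival until every quota is filled (no random draw)."""
    remaining = dict(quotas)
    sample = []
    for case in stream:
        group = group_of(case)
        if remaining.get(group, 0) > 0:
            sample.append(case)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample


# Hypothetical quotas: the population is assumed to be 60% under 40 and 40% aged 40 or older.
quotas = {"under_40": 6, "40_plus": 4}
# Hypothetical stream of respondents: (id, age group).
respondents = [(i, "under_40" if i % 3 else "40_plus") for i in range(100)]
sample = quota_sample(respondents, lambda case: case[1], quotas)
print(len(sample), "cases selected")
```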
Which is representative?
Random or non-random?
Representativeness is only ensured by random selection!
Central Limit Theorem
The distribution of the means of samples of size N drawn from the population converges to a normal distribution as N increases.
Implication: for samples of size >= 30, probabilities for estimates of the mean can be quantified
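A small simulation can illustrate the theorem. As an assumption for this sketch, samples of size 30 are drawn from a clearly non-normal population (exponential with mean 1 and standard deviation 1), and the share of sample means within one and two standard errors of the population mean is counted:

```python
import random
import statistics


def sample_means(n, repetitions, seed=0):
    """Means of repeated samples of size n from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]


def share_within(means, center, half_width):
    """Share of the means that lie in center +- half_width."""
    return sum(abs(m - center) <= half_width for m in means) / len(means)


means = sample_means(n=30, repetitions=5000)
standard_error = 1.0 / 30 ** 0.5  # population sd divided by sqrt(n)
within_one = share_within(means, 1.0, standard_error)
within_two = share_within(means, 1.0, 2 * standard_error)
print(round(within_one, 3), round(within_two, 3))
```

Under the normal approximation the two shares should come out close to 68% and 95.5%, even though the population itself is skewed.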
Central Limit Theorem implication
σ rule
2σ rule
For samples of size above 30, probabilities for estimates of the mean can be quantified
σ-rule: with a probability of 68% the mean of a random variable is in the range of
y ± σ
2σ-rule: with a probability of 95.5% the mean of a random variable is in the range of
y ± 2σ
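The two ranges can be computed from a single sample as sketched below. This assumes that y is the sample mean and that σ is the standard error of the mean, estimated as the sample standard deviation divided by the square root of the sample size; the card does not define either symbol, and the measurements are made up.

```python
import math
import statistics


def sigma_intervals(sample):
    """Return the (mean +- 1 SE) and (mean +- 2 SE) ranges for a sample."""
    mean = statistics.fmean(sample)
    # Assumption: sigma is the standard error of the mean, s / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    one_sigma = (mean - standard_error, mean + standard_error)
    two_sigma = (mean - 2 * standard_error, mean + 2 * standard_error)
    return one_sigma, two_sigma


# Hypothetical sample of 36 measurements.
measurements = [10 + (i * 7) % 13 for i in range(36)]
one_sigma, two_sigma = sigma_intervals(measurements)
print("about 68%:", one_sigma)
print("about 95.5%:", two_sigma)
```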
How does missing data occur?
- unrecorded items/data
- item non-response : some variables for a survey unit are not indicated
- unit non-response: survey unit not included
Why are random missing data points no problem?
Only systematically missing values are a problem, because characteristics of the object itself cause the non-response –> biased result
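The difference can be illustrated with a simulation. The income figures and response probabilities are hypothetical: in the first case every unit responds with probability 0.5, in the second case units with high values respond less often.

```python
import random
import statistics


def observed_mean(values, response_prob, seed=0):
    """Mean of the values that remain after non-response; response_prob maps a value to P(response)."""
    rng = random.Random(seed)
    observed = [v for v in values if rng.random() < response_prob(v)]
    return statistics.fmean(observed)


rng = random.Random(42)
incomes = [rng.gauss(3000, 800) for _ in range(20000)]  # hypothetical incomes
true_mean = statistics.fmean(incomes)

# Random missings: non-response does not depend on the value itself.
random_missing = observed_mean(incomes, lambda v: 0.5)
# Systematic missings: high incomes respond less often.
systematic_missing = observed_mean(incomes, lambda v: 0.8 if v < 3000 else 0.2)

print(round(true_mean), round(random_missing), round(systematic_missing))
```

With random missings the observed mean stays close to the true mean and only the sample shrinks; with systematic missings the observed mean is pulled downwards.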
How to handle missing data?
Econometrically model the missing values through correction terms
Invest in high response rate
Cross sectional analysis
Characteristics
- adv
- disadv
- 1 point in time
- several units of obs
- 1/more variables
Adv:
Cheap, quick, simple extrapolation
Disadv:
no testing of causal rel-ships
Statements restricted to a date –> limits external validity
How to choose the type of data set?
Question
Criteria
At what time(s) and for how many units do I collect data?
Criteria:
Time horizon
Funding, resources
Type of research question (static/ snapshot)
Type of hypotheses (difference / correlation / change)
Internal validity
Is achieved when the treatment is actually responsible for the variation of the dependent variable
External validity
Possibility of generalization of the experimental results to other studies/ situations/ people.
Longitudinal analysis
- what
- adv
- disadv
- several points in time
- one unit
- 1/more variables
Used in financial markets, macroeconomics
Adv:
High internal validity
Often easily performed by use of historical data
Disadv:
Data collection takes time
Low external validity
Heterogeneity
Non-uniformity of the elements with respect to one or more characteristics
–> difference between properties in the dataset
Endogeneity
Occurs when the explanatory variable is correlated with the error term
Can be caused by omitted variables, measurement errors and autoregression
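A minimal simulation of the omitted-variable case, with hypothetical coefficients: y depends on x (true effect 1.0) and on z, and z is left out of the regression, so it ends up in the error term. When x is correlated with z, the simple regression slope is biased; when x is independent of z, it is not.

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x: cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable, part of the error term

# Endogenous regressor: x is correlated with the omitted z.
x_endog = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y_endog = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x_endog, z)]

# Exogenous regressor: x is independent of z.
x_exog = [rng.gauss(0, 1) for _ in range(n)]
y_exog = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x_exog, z)]

slope_endog = ols_slope(x_endog, y_endog)
slope_exog = ols_slope(x_exog, y_exog)
print(round(slope_endog, 2), round(slope_exog, 2))
```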
Panel design
What
Adv
Disadv
- several points in time
- one/more variables
- several units
Used in marketing, labor economics
Adv:
Good internal and external validity
Good control over latent heterogeneity within the units
Disadv:
Cost
Data collection takes time
Panel mortality - unit drops out
Unit non-response in certain time periods
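One way to see the control over latent heterogeneity is the within transformation (subtracting each unit's own mean), sketched here on simulated data. The choice of this estimator, the coefficients and the panel size are assumptions for illustration; the cards do not name a specific method.

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x: cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(values, unit_ids):
    """Subtract each unit's own mean, which removes everything constant over time within a unit."""
    totals, counts = {}, {}
    for value, unit in zip(values, unit_ids):
        totals[unit] = totals.get(unit, 0.0) + value
        counts[unit] = counts.get(unit, 0) + 1
    return [value - totals[unit] / counts[unit] for value, unit in zip(values, unit_ids)]


# Hypothetical panel: 500 units observed at 5 points in time, true effect of x is 0.5.
rng = random.Random(0)
unit_ids, x, y = [], [], []
for unit in range(500):
    trait = rng.gauss(0, 1)  # latent, time-constant trait of the unit
    for _ in range(5):
        x_it = trait + rng.gauss(0, 1)
        y_it = 0.5 * x_it + 2.0 * trait + rng.gauss(0, 1)
        unit_ids.append(unit)
        x.append(x_it)
        y.append(y_it)

pooled = ols_slope(x, y)  # ignores the panel structure
within = ols_slope(demean_by_unit(x, unit_ids), demean_by_unit(y, unit_ids))
print(round(pooled, 2), round(within, 2))
```

The pooled slope mixes the effect of x with the unobserved trait, while the within slope should land near the true value of 0.5.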
Experimental studies
What
Adv
Disadv
Random assignment of subjects to ‘treatments’ and repetition are possible
–> psychological research
Adv:
High internal validity
Used for testing causal hypotheses
Disadv:
Low external validity (not a real-life setting)
High cost
Laboratory artifacts possible
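Random assignment can be sketched as follows. The subjects, their baseline outcomes and the treatment effect of 2.0 are simulated assumptions; because assignment is random, the two groups differ only by chance and by the treatment, so the difference in group means estimates the treatment effect.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical treatment effect used to simulate outcomes


def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment and a control group of equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(7)
# Hypothetical baseline outcome for 4,000 subjects.
baseline = {subject: rng.gauss(10, 3) for subject in range(4000)}
treated, control = randomize(list(baseline), seed=7)

outcome_treated = [baseline[s] + TRUE_EFFECT for s in treated]
outcome_control = [baseline[s] for s in control]
estimate = statistics.fmean(outcome_treated) - statistics.fmean(outcome_control)
print(round(estimate, 2))
```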
Natural experiments
What
Adv
Disadv
Exogenous effects on objects of study in real life
Adv:
Low cost
No endogenous interferences
Use of exogenously produced variation
Suitable for testing causal hypotheses
Disadv:
Rare
Homogeneity
Similarity between properties in datasets