Quiz 3 Flashcards
Experiment
A method in which researchers randomly assign individuals to experimental conditions
Benefits of randomness
- Gold standard for causal inference
- We can make sure that the treatment is the only systematic difference between the two groups under study
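A minimal simulation sketch of why random assignment helps. Everything here is hypothetical and chosen for illustration (the function name, the `age` trait, the effect size of 2.0): because assignment is a coin flip, the background trait ends up balanced across groups, and a simple difference in means lands close to the true effect.

```python
import random
import statistics


def simulate_experiment(n=10_000, true_effect=2.0, seed=0):
    """Randomly assign n hypothetical units; return (effect estimate, age gap)."""
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    treated_ages, control_ages = [], []
    for _ in range(n):
        age = rng.gauss(40, 10)            # background trait that also affects the outcome
        in_treatment = rng.random() < 0.5  # coin-flip assignment, independent of age
        outcome = 0.1 * age + rng.gauss(0, 1)
        if in_treatment:
            outcome += true_effect
            treated_outcomes.append(outcome)
            treated_ages.append(age)
        else:
            control_outcomes.append(outcome)
            control_ages.append(age)
    # Difference in mean outcomes estimates the treatment effect
    estimate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
    # Difference in mean age shows how well the groups are balanced
    age_gap = statistics.mean(treated_ages) - statistics.mean(control_ages)
    return estimate, age_gap


if __name__ == "__main__":
    estimate, age_gap = simulate_experiment()
    print(f"estimated effect: {estimate:.2f}, age gap between groups: {age_gap:.2f}")
```

If assignment depended on age instead of a coin flip, the age gap would no longer be near zero and the difference in means would mix the treatment effect with the effect of age.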
Fundamental problem of causal inference
We can’t observe the same unit in both its treated and control (untreated) states
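One common way to write this down is potential-outcomes notation. The symbols are assumptions for illustration and do not come from the cards: Y_i(1) and Y_i(0) are unit i's outcomes with and without treatment, D_i is the treatment indicator, and \tau_i is the unit-level effect.

```latex
\tau_i = Y_i(1) - Y_i(0), \qquad
Y_i^{\text{obs}} = D_i\,Y_i(1) + (1 - D_i)\,Y_i(0), \quad D_i \in \{0, 1\}
```

Only one of the two potential outcomes is ever observed for a given unit, so \tau_i cannot be computed directly. Randomization lets us estimate the average effect across units instead.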
Characteristics of lab experiments
- Researcher controls the environment
- Ideal for when experimental manipulations don’t exist in the real world
- Make sure there is full randomization
- Everyone in the treatment group receives the same treatment
Disadvantages of lab experiments
- Artificial environment
- Hard to measure long-term behavior
- Non-representative
Field Experiments
Experiments which apply the logic of randomization and variable manipulation to naturally occurring situations
Ex: the ability of sanctioning messages to reduce hate speech on Twitter
Advantages and Disadvantages of field experiments
A: more true to life than lab experiments
D:
- Greater chance of failing to fully randomize
- Participants might not fully comply
- Costly
Survey Experiments
Experimental manipulation takes the form of a survey, with the same issues that lab experiments have
Natural ‘experiments’
Observational studies that carry some characteristics of experiments, but researchers can’t randomize
Criteria for evaluating experiment quality
Internal and external validity
(Lab experiments are stronger in the former, while field experiments are stronger in the latter)
Internal Validity
The degree to which the research procedure demonstrates a true causal relationship
Reasons why causality might be compromised
-Failure to fully randomize (selection bias)
-Non-compliance with treatment
-Maturation
-Contamination
External Validity
The extent to which the results of a study can be generalized across populations, times, and settings, i.e., to the entire population of interest and to all time periods
Causal inference from strongest to weakest
-Experiments
-Natural experiments
-Observational studies
Observational Study
Designs in which the researcher doesn’t interact with or intervene in the data generation process, but instead merely observes causal sequences and covariations
Cross-sectional design
Measurements of the independent and dependent variables are taken at the same time
Longitudinal/time series designs
Measurements of the independent and dependent variables are taken at different points in time
3 Vs of Big Data
Volume, Variety and Velocity (lots of data, in a variety of formats, being created constantly)
Characteristics of big data
-big (lots of info; useful for studying rare events and heterogeneity)
-Constantly changing (good for the study of unexpected events)
-non-reactive (subjects are less likely to change their behavior because they usually don’t know they’re being observed)
-incomplete (missing data to operationalize concepts, demographic information, and behavior on other platforms)
-inaccessible
-nonrepresentative
-drifting (the subjects, and how they use the system, change over time)
-influenced by algorithms
-dirty
-sensitive
Farber’s findings
Taxi drivers work more on days when wages are higher
Nowcasting
Attempts to use ideas from forecasting to measure the current state of the world
What went wrong with Ginsberg’s nowcasting
- outperformed by simpler modeling
-drift and algorithmic confounding
Ginsberg flu data study
-Combined Google search data (Google Flu Trends) with CDC flu data and used it to nowcast flu prevalence
Siegel Findings (Trump Article)
No empirical support for the hypothesis that Trump’s divisive campaign increased hate speech on Twitter
Visconti Findings (Chile and natural disasters)
Material damage caused by disaster increased the probability of voters selecting left wing and independent candidates
Population
Any well-defined set of units of analysis
Sample
Any subset of units collected in some manner from a population
Coverage bias
Incomplete or inappropriate sampling frame
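A small sketch of how an incomplete frame can bias an estimate. The population, the `age` variable, and the rule that older units are more likely to be listed are all made up for illustration: even a perfectly random draw from the frame misses the population mean, because the frame itself leaves units out unevenly.

```python
import random
import statistics


def coverage_bias_demo(n=20_000, sample_size=1_000, seed=1):
    """Return (true population mean age, mean age of a random sample from the frame)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        age = rng.uniform(18, 90)
        # Hypothetical coverage rule: probability of being listed rises with age
        in_frame = rng.random() < (age - 18) / 72
        population.append((age, in_frame))
    # The frame only contains the units that happen to be listed
    frame = [age for age, in_frame in population if in_frame]
    true_mean = statistics.mean(age for age, _ in population)
    # Random sampling from the frame cannot fix who is missing from it
    frame_sample = rng.sample(frame, sample_size)
    return true_mean, statistics.mean(frame_sample)


if __name__ == "__main__":
    true_mean, sample_mean = coverage_bias_demo()
    print(f"population mean age: {true_mean:.1f}, frame-based estimate: {sample_mean:.1f}")
```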
Two types of samples
Random (probability) sample
Non-probability sample
Simple Random Sample
Each individual or group has an equal chance of selection. Assumed in most statistics formulas
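As a rough sketch, a simple random sample can be drawn with Python's standard library. The helper name `simple_random_sample` and the numbered population are assumptions for illustration; `random.sample` draws without replacement, so every unit has the same chance of ending up in the sample.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct units from population, each with equal selection probability."""
    rng = random.Random(seed)
    # random.sample needs a sequence, so convert any iterable to a list first
    return rng.sample(list(population), k)


if __name__ == "__main__":
    # Hypothetical population of 100 units, identified by number
    units = range(1, 101)
    print(simple_random_sample(units, 10, seed=42))
```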