Evidence Base + DiD & RD Flashcards by Andres Nunez

What are the three components in the classification of data science tasks?

Description: Using data to provide a quantitative summary of certain features of the world

Prediction: using data to map some features of the world to other features of the world

Counterfactual prediction: using data to predict features of the world as if the world had been different

What is often required for description, prediction, and counterfactual prediction?

Statistical inference

What is the regression equation?

Y = alpha + beta(x) + epsilon
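A minimal sketch of fitting this equation by ordinary least squares, assuming a single x and made-up data. The function name `fit_simple_ols` and the numbers are illustrative only; the residuals play the role of epsilon.

```python
def fit_simple_ols(x, y):
    """Return (alpha, beta) that minimize squared error for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # beta = covariance(x, y) / variance(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up data: y is roughly 1 + 2x plus a small error (epsilon)
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

alpha, beta = fit_simple_ols(x, y)
# Residuals are the estimated epsilon for each observation
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
print(alpha, beta)
```

With an intercept in the model, the residuals sum to zero by construction, which is a quick sanity check on the fit.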

What does epsilon represent in the regression equation?

Random error term (captures the “other stuff” that affects Y but is not in the model)

What does beta represent in the regression?

The change in Y associated with a one-unit change in x (the slope)

What are common regression models?

linear regressions (super common)
logistic regression (binary outcomes, nonlinear relationships)
two-part model

Does a regression prove causation?

No…duh

What are some of the shortcomings of regressions?

can be subject to reverse causation, simultaneity, or third-factor causation
an estimator is inconsistent if its estimates do not converge to the true value as the sample grows
bias and inconsistency lead to false positives and/or false negatives

What is a CDAG (Causal Directed Acyclic Graph)?

Way to guide regression analysis, as it shows the direction of hypothesized causal effects

What can DAGs include?

Exposure (E): independent variable
Outcomes (O): dependent variable
Mediator (M): mechanism that transmits effect from E to O
Confounder (C): causer of both E and O
Collider (S): something caused by both E and O

EOMCS - everyone owes me cash sir
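A rough simulation of why the confounder matters, assuming a made-up DAG where C causes both E and O and the true effect of E on O is 1. All names and coefficients here are hypothetical. Adjustment is done by residualizing on C (regress E and O on C, then regress the leftovers on each other), which gives the same slope as putting C in the regression.

```python
import random


def slope_and_intercept(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    beta = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - beta * mean_x, beta


def residuals(x, y):
    """What is left of y after removing the part explained by x."""
    alpha, beta = slope_and_intercept(x, y)
    return [b - (alpha + beta * a) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
C = [rng.gauss(0, 1) for _ in range(n)]                    # confounder
E = [c + rng.gauss(0, 1) for c in C]                       # exposure, caused by C
O = [2 * c + 1.0 * e + rng.gauss(0, 1) for c, e in zip(C, E)]  # true effect of E is 1

# Naive regression of O on E mixes in the C -> O path
_, naive = slope_and_intercept(E, O)

# Adjusting for C: residualize both E and O on C, then regress
_, adjusted = slope_and_intercept(residuals(C, E), residuals(C, O))

print(naive, adjusted)
```

Under these made-up coefficients the naive slope should land near 2 and the adjusted slope near the true value of 1, up to sampling noise.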

What is done with mediators and colliders?

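Mediators - usually not controlled for when estimating the total effect (controlling for them blocks part of the effect); otherwise analyzed w/ careful mediation analysis
Colliders - should not be controlled for (doing so creates a spurious association); often used to help think of limitations to the study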

What are ways to deal with data missing at random?

complete case analysis
last observation carried forward
mean value imputation
random imputation
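The four approaches above can be sketched on a toy list, assuming missing values are stored as `None`. The function names and the readings are made up for illustration, and "random imputation" is taken here to mean drawing a replacement from the observed values.

```python
import random


def complete_case(values):
    """Drop every observation that is missing."""
    return [v for v in values if v is not None]


def last_observation_carried_forward(values):
    """Fill each gap with the most recent observed value.
    A gap at the very start stays None (nothing to carry forward)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_impute(values):
    """Fill each gap with the mean of the observed values."""
    observed = complete_case(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def random_impute(values, rng):
    """Fill each gap with a randomly drawn observed value."""
    observed = complete_case(values)
    return [rng.choice(observed) if v is None else v for v in values]


# Made-up blood pressure readings with gaps
readings = [120, None, 130, None, None, 110]

print(complete_case(readings))
print(last_observation_carried_forward(readings))
print(mean_impute(readings))
print(random_impute(readings, random.Random(0)))
```

Mean imputation keeps the average unchanged but shrinks the spread of the data, which is one reason random imputation is sometimes preferred.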

What is internal validity?

Stat concept that requires any inferences about causal effects to be valid for the population being studied

English: Degree to which findings can be attributed to an independent variable

What is external validity?

Findings can be generalized from the study population and setting to other populations and settings

How randomized is health care policy?

Like never lol (20% of studies are)

Do health policies usually make an evaluation plan before rolling out?

Nope

What are two key changes that need to be made to improve data quality in healthcare?

reducing measurement burden by developing systems that autocollect data
favor electronic health records

What do randomized controlled trials get rid of that makes them so great?

selection bias

What was the RAND Health Insurance Experiment?

An experiment from 1974-1982 in which individuals were randomized to various insurance plans w/ diff levels of cost sharing

Free care lowered blood pressure, improved hypertension control, and reduced serious symptoms among poor patients

What was the Oregon Medicaid Lottery?

In 2008, Oregon used a lottery to grant adults the opportunity to apply for Medicaid. Finkelstein & Baicker found that health care utilization and self-reported health increased on average. However, ED visits also increased

Why aren’t RCTs more common in health studies?

They can be infeasible, invalid, or not generalizable

What kind of experiments are great when randomization isn’t possible or ethical?

Natural or Quasi Experiments!

What are the 3 Quasi-Experimental Approaches?

Difference in Differences (DiD)
Regression Discontinuity (RD)
Instrumental variables (uses an exogenous element, like a lottery, that nudges some subjects toward receiving a certain treatment)

What are some examples of things that were breakthroughs that didn’t come out of RCTs?

hand washing reduces infant mortality
smoking increases the risk of lung cancer
repeated head injury may cause CTE

RCTs are the best approach in terms of ___ validity, but ___ validity is another matter

Internal, external

Describe the “gold standard” RCT

trial: experimenter controls elements of study design
control: one group receives the treatment and the other doesn't
randomized: no selection bias, so observed difference in outcomes = treatment effect (without randomization it would be treatment effect + selection bias)

Describe a quasi-experimental design (non-randomized control studies)

trial turns into study: experimenter no longer has direct control of the setting
control: still a treatment and control
non-randomized: selection bias is a potential issue

What settings are common for the use of Difference in Differences?

Estimating the impact of a policy change on some outcome of interest, where the policy change occurs at some specific point in time

Why can’t you just compare health status before and after a health policy?

Other things could have happened that affected health besides just the policy

What is the actual difference in differences?

The change in the treatment group minus the change in the control group, where the control group stands in for the alternate reality in which the policy didn't happen

What kind of data is required for DiD?

Panel data (units over time)

What is the key idea that allows DiD to find the effect of policies on healthcare?

You can isolate the treatment effect by taking the difference in the change in the outcome between your treatment and control groups, which removes the effect of other stuff because it affects both groups' outcomes
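A minimal sketch of the arithmetic, assuming a simple two-group, two-period setup with made-up outcome values. The function name `did_estimate` is illustrative, and the result is only a treatment effect if the "other stuff" really moved both groups the same way.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """(change in treatment group) minus (change in control group)."""
    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(control_post) - mean(control_pre)
    return change_treated - change_control


# Made-up health scores before and after a policy change
treat_pre = [10, 12, 11]      # mean 11
treat_post = [15, 17, 16]     # mean 16, so the treated group changed by 5
control_pre = [9, 10, 11]     # mean 10
control_post = [11, 12, 13]   # mean 12, so the control group changed by 2

did = did_estimate(treat_pre, treat_post, control_pre, control_post)
print(did)  # 5 - 2 = 3.0
```

A plain before-and-after comparison in the treated group would report 5 here; subtracting the control group's change of 2 removes the part attributed to everything else that happened over the same period.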

What is the assumption of DiD?

The “other stuff” is assumed to be identical between treatment and control groups (called the parallel trends assumption)

When is RD applied?

Estimating the impact of some treatment on some outcome of interest when treatment is assigned based on whether a unit falls above or below a threshold

What do the treatment and control groups look like for an RD experiment?

Treatment: units that receive the treatment because they are just barely above the threshold
Control: units that do not receive the treatment because they are just barely below the threshold

What are the assumptions of RD?

Assumption 1: the difference is entirely due to the treatment of interest
Assumption 2: the difference in the “other stuff” is likely to be minimal
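One common way to put this into practice is to fit a separate line on each side of the threshold, using only units within some bandwidth, and read off the jump at the cutoff. The sketch below assumes that approach with made-up, noise-free data; `rd_estimate`, the cutoff of 65, and the bandwidth of 5 are all hypothetical choices.

```python
def fit_line(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, using units within the bandwidth.
    The running variable is centered so each intercept is the fitted
    value exactly at the cutoff."""
    below = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff - bandwidth <= r < cutoff]   # control: just below
    above = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]  # treatment: just above
    below_at_cutoff, _ = fit_line([r for r, _ in below], [y for _, y in below])
    above_at_cutoff, _ = fit_line([r for r, _ in above], [y for _, y in above])
    return above_at_cutoff - below_at_cutoff


# Made-up data: outcome rises smoothly with the score, plus a jump of 3
# for units at or above the threshold of 65
scores = list(range(55, 76))
outcomes = [2 + 0.5 * (s - 65) + (3 if s >= 65 else 0) for s in scores]

jump = rd_estimate(scores, outcomes, cutoff=65, bandwidth=5)
print(jump)
```

A narrower bandwidth makes Assumption 2 more believable, since the units compared are more alike, but leaves fewer observations to estimate from.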