Midterm Flashcards
What is the Sample Average Treatment Effect? (SATE)
How do you find it?
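SATE = the average, across all units in the sample, of each unit's individual treatment effect
Formula:
SATE = 1/n * sum (Yi(1) - Yi(0))
In a randomized experiment, estimate it with the difference in means: mean outcome of the treatment group - mean outcome of the control group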
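A minimal sketch of the SATE formula, assuming made-up potential outcomes for four units. In real data only one of Yi(1) or Yi(0) is ever observed per unit, so the second half shows the difference-in-means estimate for one hypothetical random assignment. All names and numbers are illustrative.

```python
from statistics import mean

# Hypothetical potential outcomes (never both observable in practice).
y1 = [7, 5, 9, 6]  # Yi(1): outcome if unit i is treated
y0 = [4, 5, 6, 3]  # Yi(0): outcome if unit i is not treated

# SATE = 1/n * sum(Yi(1) - Yi(0))
n = len(y1)
sate = sum(t - c for t, c in zip(y1, y0)) / n

# Assumed assignment: units 0 and 3 treated, units 1 and 2 control.
treated_observed = [y1[0], y1[3]]
control_observed = [y0[1], y0[2]]
estimate = mean(treated_observed) - mean(control_observed)

print(sate)      # true SATE for this made-up sample
print(estimate)  # difference-in-means estimate from one assignment
```

The estimate will not match the SATE exactly in a single small sample; randomization makes it correct on average across possible assignments.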
How do you find Mean?
What is a detriment to using mean?
add together all of the numbers and divide the sum by the total amount of numbers
Detriment: can be influenced by outliers which pull the average too high or too low.
How do you find the median?
What is the benefit of using median?
If you have an odd amount of numbers locate the exact middle number.
If you have an even amount of numbers locate the two middle numbers, add them together, and divide the sum by 2.
benefit: more robust against the impact of outliers
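A quick illustration of why the median is more robust than the mean, assuming a small made-up list with one extreme value added. It uses Python's standard `statistics` module.

```python
from statistics import mean, median

values = [2, 3, 4, 5, 6]
with_outlier = values + [100]  # add one extreme value

mean_before = mean(values)
median_before = median(values)

mean_after = mean(with_outlier)      # pulled far upward by 100
median_after = median(with_outlier)  # average of the two middle numbers, 4 and 5

print(mean_before, median_before)
print(mean_after, median_after)
```

The single outlier moves the mean a lot while the median barely changes.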
How do you find Range?
subtract the minimum number from the maximum number
How do you find the interquartile range?
Subtract Q1 from Q3
How do you determine if a number is an outlier?
You must find the lowest and highest limits of the dataset for non-outlier numbers. To find the lowest acceptable number take Q1 - 1.5 * IQR. To find the highest acceptable number take Q3 + 1.5 * IQR. If the number in question is below the lower limit or above the upper limit, it is an outlier.
How do you find the three Quartiles?
Sort the list, then find the median of the entire list. The median is Q2, and it separates the list into two halves. The median of the first (lower) half is Q1. The median of the second (upper) half is Q3.
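The quartile, IQR, and outlier cards can be sketched together as below. This is one way to do it, assuming the median-of-halves method from the card; when the list has an odd length, this sketch leaves the middle number out of both halves, which is a convention the cards do not specify. The function name and data are made up.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Assumption: for odd n, skip the middle value when forming the upper half.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 50]
q1, q2, q3 = quartiles(data)

iqr = q3 - q1                  # interquartile range
low_limit = q1 - 1.5 * iqr     # lowest acceptable number
high_limit = q3 + 1.5 * iqr    # highest acceptable number
outliers = [v for v in data if v < low_limit or v > high_limit]

print(q1, q2, q3, iqr)
print(low_limit, high_limit, outliers)
```

Statistical software often uses slightly different quartile conventions, so results from a package may not match this hand method exactly.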
How do you find Standard Deviation? What is the formula?
Formula:
SD = sqrt( 1/(n-1) * sum( (xi - mean of X)^2 ) )
Steps:
1.) find the mean of X
2.) Subtract the mean from each x variable
3.) Square each result from step 2
4.) Add together all the squares
5.) Divide the sum of the squares by the total number of observations minus 1
6.) Square root the result of step 5
The result of step 6 is the standard deviation
How do you find variance?
Variance = SD^2
Square the standard deviation
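The six standard deviation steps, plus the variance, can be followed in code. This is a small sketch on made-up numbers; the last line compares the result with `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]

mean_x = sum(x) / len(x)                 # step 1: mean of X
deviations = [xi - mean_x for xi in x]   # step 2: subtract the mean
squares = [d ** 2 for d in deviations]   # step 3: square each result
sum_of_squares = sum(squares)            # step 4: add the squares
variance = sum_of_squares / (len(x) - 1) # step 5: divide by n - 1
sd = math.sqrt(variance)                 # step 6: square root

print(sd, variance)
print(statistics.stdev(x))  # should agree with sd
```

Note that the result of step 5 is already the variance, so squaring the SD just undoes step 6.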
What is the formula for the Correlation Coefficient (r)
<you will not compute this by hand!>
What is r telling you?
How is it written?
How is it described?
r = 1/(n-1) * sum of ( ((Xi - mean of X) / SD of X) * ((Yi - mean of Y) / SD of Y) )
r tells you:
- The strength and direction of the linear relationship between two variables.
- How similar the measurements of the two variables are across a dataset.
- How closely the variables move together.
Written as a number between -1 and 1.
Described as high or low, positive or negative, or no correlation (0).
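You will not compute r by hand, but a short sketch can show what the formula is doing: standardize each variable, multiply the pairs, and average with n - 1. The function names and data below are made up for illustration.

```python
import math

def sample_sd(values):
    """Standard deviation with n - 1 in the denominator."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """r = 1/(n-1) * sum of (z-score of Xi * z-score of Yi)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x, sd_y = sample_sd(x), sample_sd(y)
    products = [((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_perfect = correlation(x, [2 * xi for xi in x])  # exact straight line

print(round(r, 4))          # positive, fairly high
print(round(r_perfect, 4))  # perfectly correlated
```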
How do you find mean prediction error?
1.) for each data point subtract the predicted value from the actual value of the point.
2.) Add all of those values together.
3.) divide the sum by the total number of values.
How do you find the root mean square error (RMSE)?
RMSE = sqrt (RSS/n)
1.) find the value of RSS (subtract predicted y from real y, square the results, add the squares)
2.) divide RSS by the total number of values
3.) square root the result
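Mean prediction error and RMSE use the same errors (actual - predicted) but summarize them differently. A small sketch on made-up values:

```python
import math

actual = [3, 5, 7, 9]
predicted = [2, 6, 7, 11]

# Prediction error for each point: actual - predicted
errors = [a - p for a, p in zip(actual, predicted)]

# Mean prediction error: positive and negative errors can cancel out
mean_error = sum(errors) / len(errors)

# RMSE: square the errors first so they cannot cancel
rss = sum(e ** 2 for e in errors)
rmse = math.sqrt(rss / len(errors))

print(errors)
print(mean_error, rmse)
```

A mean prediction error near 0 only says the predictions are not systematically too high or too low; RMSE says how large the typical miss is.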
What is the equation for a linear regression model?
Y= α +βX + ε
What do the variables in the linear regression model mean?
Y= α +βX + ε
Y: dependent variable, what you are trying to predict
α: alpha, the y-intercept. The value of Y when X = 0
β: Beta, slope, the increase in Y when X has a one-unit increase
X: independent variable, the predictor
ε: error term, the part of Y that the line does not explain. It is not observed directly; the residuals estimate it.
How do you find residuals (the estimated error term)?
Actual y - predicted y
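A minimal sketch of residuals for a line Y = α + βX, assuming the intercept and slope are already known. The values of `alpha`, `beta`, and the data are made up; this does not show how the line is fitted.

```python
# Assumed (hypothetical) intercept and slope
alpha = 1
beta = 2

x = [0, 1, 2, 3]
y = [1.5, 2.5, 5.5, 6.5]  # actual values

# Predicted y for each x: alpha + beta * x
predicted = [alpha + beta * xi for xi in x]

# Residual = actual y - predicted y
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(predicted)
print(residuals)
```

A positive residual means the point sits above the line; a negative residual means it sits below.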
How do you find the residual sum of squares (RSS)?
(What is the formula)
RSS = sum of (Yi-Ŷi)^2
1.) subtract the predicted value of y from the actual value of y for each data point.
2.) square each result
3.) add together all of the squares
The result of step 3 is the RSS
How do you find the total sum of squares (TSS)?
(What is the formula)
TSS = sum of (Yi-Ȳ)^2
1.) subtract the mean of y from each y value in the data set.
2.) square the results of each subtraction in step 1
3.) add together all of the squares
The result of step 3 is the TSS
What is R^2 and how do you find it?
What does it tell you?
What is the formula?
R^2 is the proportion of variation in Y explained by the model.
Tells you how well a model fits the data
R^2 = 1 - (RSS/TSS)
1.) find RSS
2.) find TSS
3.) Divide RSS by TSS
4.) Subtract the result of step 3 from 1
Result of step 4 is R^2
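The RSS, TSS, and R^2 steps chain together, so it can help to see them in one place. A sketch on made-up actual and predicted values (`y_hat` is assumed to come from some already-fitted model):

```python
y = [1, 2, 3, 4, 5]                # actual values
y_hat = [1.2, 1.9, 3.2, 3.8, 4.9]  # hypothetical predicted values

mean_y = sum(y) / len(y)

# RSS: squared distance of each actual value from its prediction
rss = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

# TSS: squared distance of each actual value from the mean of y
tss = sum((yi - mean_y) ** 2 for yi in y)

# R^2 = 1 - (RSS / TSS)
r_squared = 1 - rss / tss

print(round(rss, 2), tss, round(r_squared, 3))
```

Here the predictions are close to the actual values, so RSS is small relative to TSS and R^2 is close to 1.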
What is the counterfactual?
What is the factual?
Counterfactual = what would have been observed under the condition that did not actually occur (for a treated unit, what would have happened absent the treatment)
Factual = What was actually observed
What is the fundamental problem of causal inference?
The counterfactual can never be observed
- you must infer the counterfactual outcomes as accurately as possible, but will never actually know what would have happened.
What is the rule of causality?
(e.g., ice cream sales and suicide)
association does not equal causation!
How can you figure out counterfactuals?
What is the problem with this tactic?
Matching- find a similar unit that matches as close as possible
Problem: you cannot match on every characteristic, so the remaining differences between the units can act as confounders
What are confounders?
How do you minimize confounders?
Variables associated with both the treatment and the outcome. They impact the results and make it difficult to attribute changes to the treatment.
Can be observed or unobserved.
Minimize by using randomized controlled trials
What are Randomized Controlled Trials and how do they work to minimize confounders?
An RCT is when scientists randomly assign the treatment, which makes the treatment and control groups identical on average.
The groups are similar in terms of all characteristics, observed and unobserved. This allows scientists to attribute any differences in outcome to the treatment and rule out confounders.
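A small simulation can make "identical on average" concrete. This is only a sketch: it assumes 10,000 made-up units with a single pre-treatment characteristic (`age`), and a fixed seed so the run is repeatable.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical pre-treatment characteristic for 10,000 units
ages = [rng.randint(18, 80) for _ in range(10_000)]

# Randomize: shuffle the unit indices, first half treated, second half control
indices = list(range(len(ages)))
rng.shuffle(indices)
treated = indices[:5_000]
control = indices[5_000:]

mean_age_treated = mean(ages[i] for i in treated)
mean_age_control = mean(ages[i] for i in control)
difference = mean_age_treated - mean_age_control

print(mean_age_treated, mean_age_control, difference)
```

The two group means should come out very close. The same logic applies to characteristics nobody measured, which is why randomization also handles unobserved confounders.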
What are double-blind experiments?
An experiment where neither the scientists nor the study participants know who is receiving the treatment and who is part of the control. Often used to prevent bias in the experiment.
What is the placebo effect?
When a “fake” treatment produces a result that cannot be attributed to the placebo itself and is therefore caused by the patient’s belief in the “treatment”
- people believe they received the treatment, and that belief affects the result
(ex. the subject says the pills cured their illness even though they received only a sugar pill that did nothing.)
What is the Hawthorne Effect?
the phenomenon where study subjects behave differently because they know they are being observed by researchers.
What are observational studies?
Studies where the treatment is naturally assigned. Scientists don’t DO anything, they just observe what is happening in nature.
Why are some treatments studied observationally instead of being randomized?
Ethical and logistical reasons
Ex. Ethical: smoking and lung cancer, it would be unethical to force a group of humans to smoke just to observe if they got lung cancer
Ex. Logistical: wars occur naturally, you cannot feasibly make countries go to war just to see what happens in the UN assembly. (This is also an ethical problem.)
Compared to controlled experiments, do observational studies have weak or strong external validity? Why is this good?
They often have stronger external validity than RCTs, meaning the findings generalize better beyond the study. This is because the events occur naturally and are not confined to the specific conditions of an experiment.
Strong external validity is good because it means the findings can be applied very broadly.
Compared to controlled experiments do observational studies have weak or strong internal validity?
Why?
They have weaker internal validity.
Because:
* pre-treatment variables may differ between treatment and control groups
* confounding bias may exist due to these differences
* selection bias from self-selection into treatment may occur
* statistical control is needed (subclassification, variables)
* unobserved confounding poses a threat
What is external validity?
The extent to which the conclusions of a study can be generalized beyond the particular study
What is internal validity?
the extent to which causal assumptions are satisfied in the study
The extent to which the effect of the treatment in a study can be attributed solely to the treatment itself and not other confounders.
This is the main advantage of randomized controlled trials
What are the three strategies of observational studies?
1.) Cross-section comparison
2.) Within-unit effects (AKA Before and After comparison)
3.) Differences-in-differences
No strategy is best!