Chapter 1-8 Flashcards
Describe the four hurdles to establishing causal relationships and their implications.
- Credible causal mechanism: is there a credible mechanism connecting X to Y?
  If it fails: throw out and revise the theory.
- Reverse causality: can we rule out the possibility that Y causes X?
  If it fails: proceed with caution; reverse causality remains possible.
- Covariation: is a change in X associated with a change (+/-) in Y?
  If it fails: a confounding variable might be masking the relationship.
- Confounding variables: could the apparent effect of X on Y be caused by another variable, Z?
  If it fails: not controlling for the effects of Z on X and Y produces a misleading picture of the relationship between X and Y.
If we find that X and Y covary, but once we control for Z the relationship between X and Y disappears, then...
the relationship between X and Y is spurious: the two variables move together due to coincidence or the presence of some third factor.
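A minimal simulation sketch of a spurious relationship, assuming a made-up setup in which Z drives both X and Y and X has no effect on Y at all. The helper names (`pearson_r`, `residuals`) and the noise settings are illustrative assumptions. "Controlling for Z" is approximated here by correlating what is left of X and Y after a simple linear fit on Z.

```python
import random

random.seed(42)

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(y, z):
    """Part of y left over after a simple linear fit on z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    slope = (sum((q - mean_z) * (p - mean_y) for p, q in zip(y, z))
             / sum((q - mean_z) ** 2 for q in z))
    return [p - mean_y - slope * (q - mean_z) for p, q in zip(y, z)]

# Hypothetical data: Z causes both X and Y; X does not cause Y.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

raw_r = pearson_r(x, y)                                     # X and Y covary
controlled_r = pearson_r(residuals(x, z), residuals(y, z))  # after controlling for Z

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after controlling for Z: {controlled_r:.2f}")
```

The raw correlation is clearly positive, while the correlation after controlling for Z sits near zero, which is the pattern the card calls spurious.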
Describe the difference between theory and hypothesis
Theory: broad conjecture about the causes of some phenomenon “I think…”
Hypothesis: narrow, concrete, and operational conjecture about what we would observe if the theory is correct.
If a theory is true, there is strong reason to believe the hypotheses under its umbrella are true as well.
Identify the two types of research design
Observational studies: observing real-world data to draw causal connections
Randomized, controlled experiment (RCE): randomly assigning values of the independent variable (I.V.), creating treatment and control groups.
Random Assignment = Random Sampling?
No. Randomly assigning subjects to treatment and control groups is not the same as randomly sampling subjects for participation. (Sampling selects who is included from the broader population; assignment controls for potential confounding variables.)
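The distinction can be sketched in a few lines. This is an illustration only: the population of 1,000 numbered people and the group sizes are made-up assumptions.

```python
import random

random.seed(1)

# Hypothetical population of 1,000 people, identified by number.
population = list(range(1000))

# Random sampling: decides WHO takes part in the study.
participants = random.sample(population, 100)

# Random assignment: decides WHICH GROUP each participant lands in.
shuffled = participants[:]
random.shuffle(shuffled)
treatment = shuffled[:50]
control = shuffled[50:]

print(len(participants), len(treatment), len(control))
```

Sampling bears on external validity (who the results generalize to); assignment bears on internal validity (whether the groups differ only in the treatment).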
Explain validity, its two forms, and their threats
Validity: accurate representation of the concept we’re measuring.
Internal validity: degree to which a study is able to establish credible causal mechanisms
Threats: failure to randomly assign treatment and control groups; attrition: participants drop out in a non-random way during the study; history: events outside the study that occur between the before and after measurements; contamination: compromising information communicated between treatment and control groups.
External validity: degree to which results of a study can be generalized to other contexts (applicability)
Threats: Population: the sample may not be representative of the broader population of interest (e.g. 2/3 of study participants were men). Environment: the setting of the study may not be representative of other settings of interest (e.g. cultural differences).
Do observational studies have higher internal or external validity? (answer with complex comparison)
Observational studies are feasible, cost-efficient, and have higher external validity than experiments, but they are open to confounding variables and thus have lower internal validity.
Define content validity
Does a measure contain all of a concept’s essential elements? Forces the researcher to come up with all the elements that define the concept we wish to measure. (Ex: measuring democracy includes accounting for checks and balances, openness, competitiveness, etc.)
Do experimental studies have higher internal or external validity? (answer with complex comparison)
While experimental studies have high internal validity because random assignment controls for confounding variables, they often have lower external validity because we can be limited in generalizing our results to the broader population (ex: if 2/3 of our subjects were men).
Describe the types of observational studies
Cross-sectional: examine different subjects at a single point in time
Time series: examine a single subject over multiple points in time
Panel data: combine cross-sectional and time-series data, examining multiple subjects over multiple points in time
Define feasibility
Asks how easy it is to test something with a randomized, controlled experiment.
Identify the equation for sample mean
Mean= average value
(sum all values, divide by sample size)
Identify Median
centermost value when ranked small to large
Identify mode
most commonly occurring value
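A short sketch of the three measures of central tendency, using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # illustrative data, not from the course

mean = sum(values) / len(values)     # sum all values, divide by sample size
median = statistics.median(values)   # centermost value once ranked small to large
mode = statistics.mode(values)       # most commonly occurring value

print(mean, median, mode)
```

With an even number of values, `statistics.median` averages the two middle values.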
Identify and Describe Variance
sum of all squared differences between each value and the mean, divided by (n-1).
large variance: values are more spread out around the mean
0: values are all the same.
Identify and Describe Standard deviation
square root of the variance
- roughly the typical distance between the values and the mean, in the same units as the data.
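The variance and standard deviation definitions can be sketched step by step. The data are illustrative, and the cross-check uses the standard library's `statistics.variance`, which also divides by n-1.

```python
import math
import statistics

values = [2, 3, 3, 5, 7, 10]  # illustrative data
n = len(values)
mean = sum(values) / n

# Squared difference between each value and the mean
squared_diffs = [(v - mean) ** 2 for v in values]

variance = sum(squared_diffs) / (n - 1)   # sample variance
std_dev = math.sqrt(variance)             # square root of the variance

# Cross-check against the standard library
assert math.isclose(variance, statistics.variance(values))

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```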
Identify skewness
measure of the symmetry of the distribution around the mean.
symmetrical when skew = 0, so mean = median = mode
(-): left tail longer, data concentrated on the right, typically mean < median < mode
(+): right tail longer, data concentrated on the left, typically mean > median > mode
Identify kurtosis
Kurtosis: measure of the peakedness/flatness of a distribution.
(+): peaked, values highly concentrated near mean
(-): flat, values not highly concentrated around mean
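A rough sketch of how skewness and kurtosis can be computed from central moments. This assumes one common convention (population moments, with 3 subtracted so a normal distribution scores 0 on kurtosis); textbooks and software packages differ in the exact formula, and the function names and data here are illustrative.

```python
def central_moment(values, k):
    """Average of (value - mean) raised to the k-th power."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]            # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]      # mirror image: long left tail

for name, data in [("symmetric", symmetric),
                   ("right-tailed", right_tailed),
                   ("left-tailed", left_tailed)]:
    print(f"{name}: skew = {skewness(data):.2f}, "
          f"excess kurtosis = {excess_kurtosis(data):.2f}")
```

The sign of the skew follows the longer tail, and the evenly spread symmetric data come out with negative (flat) kurtosis.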
Random sample vs non-random sample
Random sample: members of population have equal likelihood of being chosen for the sample
Nonrandom: some members have higher likelihoods of being chosen than others, which leads to less external validity.
Describe statistical inference and an example
SI: using what we know about a sample to infer what’s true about a population
Suppose there are 1000 marbles in a bag, 550 blue and 450 red. If you have your friend choose 100, that’s a sample. If she pulls out 54 blue and 46 red and uses those percentages to estimate the makeup of the whole bag, that’s a statistical inference.
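The marble example can be simulated directly. This is a sketch: the seed is arbitrary, so the exact count drawn will differ from the 54/46 split in the card.

```python
import random

random.seed(7)

bag = ["blue"] * 550 + ["red"] * 450   # the population: 1000 marbles
sample = random.sample(bag, 100)       # the friend's 100 marbles (drawn without replacement)

# The sample statistic is used to infer the share of blue marbles in the whole bag.
blue_share = sample.count("blue") / len(sample)
print(f"estimated share of blue marbles in the bag: {blue_share:.0%}")
```

The estimate will usually land near the true 55% but rarely on it exactly, which is why inference comes with a margin of error.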
Describe degree of freedom
We must perform Bessel’s correction: divide by a slightly smaller number (n-1 instead of n) to yield a larger (unbiased) sample variance, since the sample values are naturally closer to the sample mean than to the population mean.
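A simulation sketch of why the correction matters, assuming a made-up population (normal, mean 0, standard deviation 2, so the true variance is 4) and small samples of 5. Averaged over many samples, dividing by n underestimates the true variance, while dividing by n-1 comes out close to it.

```python
import random

random.seed(0)

TRUE_VARIANCE = 4.0   # assumed population: normal with mean 0 and SD 2
n = 5                 # small samples make the bias easy to see
trials = 20000

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    mean = sum(sample) / n
    sum_sq = sum((v - mean) ** 2 for v in sample)
    sum_div_n += sum_sq / n                # no correction
    sum_div_n_minus_1 += sum_sq / (n - 1)  # Bessel's correction

avg_div_n = sum_div_n / trials
avg_div_n_minus_1 = sum_div_n_minus_1 / trials

print(f"true variance:          {TRUE_VARIANCE}")
print(f"average dividing by n:   {avg_div_n:.2f}")
print(f"average dividing by n-1: {avg_div_n_minus_1:.2f}")
```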
Explain central limit theorem
if we collected an infinite number of random samples, those sample means would be distributed normally around the true population mean.
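A simulation sketch of the theorem, using 5,000 samples as a finite stand-in for "infinite". The population is assumed to be exponential with mean 1 (deliberately skewed, not normal); the sample size and seed are arbitrary choices.

```python
import random
import statistics

random.seed(3)

POPULATION_MEAN = 1.0   # assumed population: exponential with rate 1
SAMPLE_SIZE = 50
NUM_SAMPLES = 5000

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(sum(sample) / SAMPLE_SIZE)

center = statistics.mean(sample_means)    # should sit near the population mean
spread = statistics.stdev(sample_means)   # should sit near 1 / sqrt(50)

# Share of sample means within two standard deviations of the center
within_two = sum(abs(m - center) <= 2 * spread for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f}")
print(f"share within 2 SD:    {within_two:.1%}")
```

Even though individual values are heavily skewed, the sample means cluster symmetrically around the population mean, with roughly 95% of them within two standard deviations.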
Describe normal distribution
symmetric, bell-shaped probability distribution in which the mean, median, and mode are all the same. Follows the 68/95/99.7 rule: about 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three. Applied to the sampling distribution, the sample mean plus or minus two standard errors gives a range that captures the true population mean with about 95% confidence.
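The 68/95/99.7 figures can be checked with `statistics.NormalDist` from the Python standard library (available since Python 3.8). The helper name `share_within` is just for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value falls within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within = {k: share_within(k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

The exact 95% cutoff is about 1.96 standard deviations; "two" is the usual rounding.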
Define and identify the standard error equation
the standard deviation of the sampling distribution of the mean, capturing how representative a sample is of the broader population. It is the sample standard deviation divided by the square root of the sample size.
The larger the sample size, the ... the confidence interval
narrower, i.e. the more precise the inferences we can make about the population
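A small sketch of the standard error formula and how sample size affects the confidence interval. It assumes a normal approximation with a multiplier of 1.96 for 95% confidence; the mean of 50 and standard deviation of 10 are made-up numbers.

```python
import math

def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)

def confidence_interval(mean, sd, n, z=1.96):
    """Approximate 95% interval: mean plus or minus z standard errors."""
    margin = z * standard_error(sd, n)
    return mean - margin, mean + margin

# Same mean and standard deviation, different sample sizes
small = confidence_interval(mean=50, sd=10, n=25)
large = confidence_interval(mean=50, sd=10, n=100)

print(f"n = 25:  {small[0]:.2f} to {small[1]:.2f}")
print(f"n = 100: {large[0]:.2f} to {large[1]:.2f}")
```

Quadrupling the sample size halves the standard error, so the interval is half as wide.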
Explain bivariate hypothesis testing
making statistical inferences about the relationship between two variables: how confident are we that the relationship seen in the sample also exists in the population? Typically uses the p-value, the probability that we would see a relationship at least this strong due to random chance alone if there were no relationship in the population. A small p-value, typically below the 5% threshold, means the relationship is statistically significant.
2 forms: difference of means and correlation coefficient
Identify and Describe difference of means test and the equations involved.
Used when Y is continuous and X is categorical.
Use a t-test: the difference in means divided by the standard error of the difference of means.
Degrees of freedom: we estimate two means, so we use (n1 + n2) - 2.
A larger difference in means, larger sample sizes, and smaller standard deviations (less variance in the samples) all produce a larger t-statistic.
If the t-statistic exceeds the critical value of t, then...
we can conclude the relationship is statistically significant.
Step one of every difference in means test is…
code responses: approval = 1 , disapproval = 0
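A sketch of the difference of means test from start to finish. The cards do not give the exact standard error formula, so this assumes the pooled-variance version, which matches the (n1 + n2) - 2 degrees of freedom above. The data and the function name are made up; the critical value 2.447 is the standard two-tailed 5% t-table entry for 6 degrees of freedom.

```python
import math

def difference_of_means_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) using a pooled variance."""
    n1, n2 = len(group_a), len(group_b)
    mean1, mean2 = sum(group_a) / n1, sum(group_b) / n2
    var1 = sum((v - mean1) ** 2 for v in group_a) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in group_b) / (n2 - 1)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean1 - mean2) / se_diff, df

# Step one: code the categorical X as 1 or 0. Y is continuous.
records = [(1, 7.0), (1, 8.0), (1, 9.0), (1, 8.0),
           (0, 4.0), (0, 5.0), (0, 6.0), (0, 5.0)]
group_one = [y for x, y in records if x == 1]
group_zero = [y for x, y in records if x == 0]

t_stat, df = difference_of_means_t(group_one, group_zero)

CRITICAL_T = 2.447  # two-tailed, 5% level, df = 6 (from a t-table)
significant = abs(t_stat) > CRITICAL_T

print(f"t = {t_stat:.3f}, df = {df}, significant: {significant}")
```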
Explain the three-step process for the correlation coefficient
- Covariance: summarizes the general pattern of association between two continuous variables (equation given)
- Correlation coefficient: measures the direction and strength of association between X and Y (equation given, but just the covariance divided by the square root of the variance of X times the variance of Y)
- t-statistic: test the significance of the correlation with n - 2 degrees of freedom
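The three steps can be sketched as follows. The data are made up, and since the cards only say "equation given", the t formula used here, t = r * sqrt(n - 2) / sqrt(1 - r^2), is the standard one for testing a correlation and is an assumption about what the course uses.

```python
import math

x = [1, 2, 3, 4, 5]   # illustrative continuous X
y = [2, 4, 5, 4, 5]   # illustrative continuous Y
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Step 1: covariance
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Step 2: correlation coefficient = cov / sqrt(var(X) * var(Y))
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
r = cov_xy / math.sqrt(var_x * var_y)

# Step 3: t-statistic with n - 2 degrees of freedom
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(f"covariance = {cov_xy:.2f}, r = {r:.3f}, t = {t_stat:.3f}, df = {df}")
```

The t-statistic is then compared against the critical value of t for n - 2 degrees of freedom, just as in the difference of means test.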