Final Exam Part 2 Flashcards
Two-variable (bivariate) hypothesis test examples
- tabular analysis
- difference of means
- correlation coefficient and regression
Tabular Analysis
- categorical IV and DV
- goal: is the difference between groups statistically significant?
- how: calculate the chi-squared statistic and p-value
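The chi-squared calculation above can be sketched by hand in Python. The 2x2 table below is invented purely for illustration (hypothetical counts, not from the course):

```python
# Chi-squared statistic for a 2x2 table, computed step by step.
# Hypothetical data: rows = two groups of the IV, columns = DV categories.
observed = [[30, 10],   # group A: (category 1, category 2)
            [20, 40]]   # group B

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under the null: (row total * column total) / n
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom for a crosstab: (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)  # chi2 is then compared to the critical value at df
```

The resulting chi2 is compared against the critical value for the chosen alpha (or converted to a p-value) to decide whether to reject the null.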
Degrees of Freedom
the maximum number of independent values that are free to vary in a data sample
- formula (single sample): degrees of freedom = sample size - 1
- for a crosstab: (rows - 1) x (columns - 1)
critical value
the point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis
level of significance
- used in the p-value approach
- level of significance = alpha
- alpha = 1 - confidence level: the probability of incorrectly rejecting the null hypothesis (a Type I error)
Difference of Means
- continuous DV, categorical IV
- use sample means and SDs to make inferences about the unobserved population
- are the means different across values of the independent variable?
- how: calculate the t-statistic and p-value
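The t-statistic above can be computed by hand. A minimal sketch with invented data for two groups (the unequal-variance form of the two-sample t; all values hypothetical):

```python
import math

# Two-sample t-statistic: difference of means divided by its standard error.
# Hypothetical DV values for two groups of a categorical IV.
group_a = [4.0, 5.0, 6.0, 5.0]
group_b = [2.0, 3.0, 2.0, 1.0]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    # sample variance uses n - 1 in the denominator
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

t = (mean(group_a) - mean(group_b)) / math.sqrt(
    sample_var(group_a) / len(group_a) + sample_var(group_b) / len(group_b)
)
print(round(t, 2))
```

A larger |t| (relative to the critical value at the relevant degrees of freedom) means a smaller p-value and stronger evidence against the null of equal means.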
correlation coefficient
- are the two variables correlated/associated, and how do we know?
- a measure of the strength of the relationship
- more of X is associated with more/less of Y
- can assess with a scatter plot or Pearson's r
perfect negative correlation
-1
perfect positive correlation
1
no linear relationship between the two variables
0
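Pearson's r can be computed directly from its definition (covariance divided by the product of the standard deviations). A small sketch with invented x and y values:

```python
import math

# Pearson's r by hand. Values near +1 / -1 indicate strong positive /
# negative linear association; values near 0 indicate no linear relationship.
x = [1, 2, 3, 4, 5]   # hypothetical IV values
y = [2, 4, 5, 4, 5]   # hypothetical DV values

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
r = cov / (sx * sy)
print(round(r, 3))
```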
Pearson's r
- no unambiguous classification rule for the magnitude of a linear relationship between two variables
- tells us how confident we can be that the relationship is different from what we would see if the IV and DV were unrelated in the population
examples of non-linear relationships
- quadratic/curvilinear
- cubic/polynomial
common mistakes with Pearson's r
- it is not applicable to non-linear relationships
- it does not equal the slope
R-square
- always between 0 and 1
- the proportion of variation in the DV (Y) that is explained by the IV (X)
coefficient in the regression
represents the mean change in Y for every additional one-unit increase in X
Simple regression model
no controls (only X and Y)
multiple regression model
controls for confounding variables (Z)
- more applicable in the real world, because the real world is multivariate
most causal theories are…
bivariate: does X cause Y?
two-variable regression
fitting the best line through a scatter plot of data
statistical model
slope and y-intercept
systematic components of the population regression model
- alpha = a fixed value; the value of Y depends on alpha, so there is nothing random in the systematic component
- when X changes by 1 unit, the value of Y changes by beta units
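The two bullets above can be shown numerically. The alpha and beta values here are invented for illustration:

```python
# Systematic part of the population model: Y = alpha + beta * X
# Hypothetical parameter values: alpha = 2.0 (intercept), beta = 0.5 (slope).
alpha, beta = 2.0, 0.5

def systematic(x):
    # deterministic: the same x always gives the same predicted Y
    return alpha + beta * x

# Increasing X by one unit changes the predicted Y by exactly beta units.
print(systematic(4) - systematic(3))  # -> 0.5, i.e. beta
```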
random components of population regression model
- cannot predict the values of this part (no systematic patterns)
- e.g., unmeasured influences such as a subject's condition or mood
names for the estimated stochastic component
- residual: the leftover part of Yi after the line is drawn
- the residual is the sample estimate of the population error term
how to find the line with the smallest sum of squared residuals (ordinary least squares)
- for each candidate line, add up the squared residuals
- choose the line with the smallest total --> draw the line that minimizes the sum of the squared residuals
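Rather than trying lines one by one, OLS has a closed-form answer that minimizes the sum of squared residuals. A sketch with invented data:

```python
# OLS closed-form estimates:
# beta_hat = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
# alpha_hat = ybar - beta_hat * xbar
x = [1.0, 2.0, 3.0, 4.0]   # hypothetical IV values
y = [2.0, 3.0, 5.0, 6.0]   # hypothetical DV values

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
alpha_hat = ybar - beta_hat * xbar

# The residuals are what is left over after the line is drawn.
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)  # no other line gives a smaller total
print(beta_hat, alpha_hat, round(ssr, 4))
```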
goodness-of-fit measures
- measures of the overall fit between a regression model and the dependent variable
- root mean squared error (RMSE): quantifies how well the OLS regression we have obtained fits the data
- the R-squared statistic
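Both fit measures can be computed from the residuals. A sketch using an invented fitted line (y_hat = 0.5 + 1.4x, hypothetical); note that some texts divide RMSE by n - 2 rather than n:

```python
import math

# Goodness-of-fit for a hypothetical fitted line y_hat = 0.5 + 1.4 * x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]
y_hat = [0.5 + 1.4 * xi for xi in x]

n = len(y)
ybar = sum(y) / n
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
tss = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares

rmse = math.sqrt(ssr / n)     # typical size of a residual, in units of Y
r_squared = 1 - ssr / tss     # proportion of variation in Y explained
print(round(rmse, 4), round(r_squared, 4))
```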
p value and estimated beta
how likely it is that we would observe this sample slope (estimated beta) in real-world data if the true (but unobserved) population slope beta were equal to 0
two-tailed (non-directional) hypothesis tests
most common hypothesis tests about parameters from the OLS regression model
in any observational study, how do we control for the effects of other variables?
multiple regression is the most common method in social science
omitted variables bias
bias from the failure to include a variable that belongs in regression model
(result of omitting a variable, Z, that should have been in the model)
small bias
if either or both components of the bias term are close to zero (the overlapping red area is small)
large bias
if both components are large (the overlapping red area is quite large)
positive bias
if both components of the bias term are positive: the omitted variable's effect on Y (beta2) and its correlation with X
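A deterministic sketch of omitted-variable bias, with an invented true model in which Z affects Y and is positively correlated with X. Omitting Z inflates the bivariate slope by beta2 times the slope of Z on X:

```python
# Hypothetical true model: Y = 1*X + 1*Z, with Z = 2*X (Z correlated with X).
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0 * xi for xi in x]
y = [1.0 * xi + 1.0 * zi for xi, zi in zip(x, z)]

def ols_slope(xs, ys):
    # bivariate OLS slope estimate
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sum(
        (a - xbar) ** 2 for a in xs
    )

# Regressing Y on X alone absorbs Z's effect:
# biased slope = beta1 + beta2 * (slope of Z on X) = 1 + 1*2 = 3, not the true 1.
biased = ols_slope(x, y)
print(biased)
```

Here both components (beta2 = 1 and a positive Z-on-X slope) are positive, so the bias is positive, matching the card above.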
standardized coefficient
remove the metric of each variable to make the coefficients comparable to one another
- coefficients on a standardized metric
unstandardized coefficients
the coefficients in the table each exist in the native metric of the variable; they are normally not comparable across variables
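Standardization can be sketched by z-scoring both variables before fitting. The data below (income in thousands of dollars, turnout in percent) are invented for illustration:

```python
import math

# Standardized vs. unstandardized coefficients in a bivariate regression.
income = [20.0, 40.0, 60.0, 80.0]    # hypothetical IV, native metric: $1000s
turnout = [50.0, 55.0, 65.0, 70.0]   # hypothetical DV, native metric: percent

def standardize(xs):
    # z-score: subtract the mean, divide by the sample SD (n - 1)
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd for x in xs]

def ols_slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sum(
        (a - xbar) ** 2 for a in xs
    )

raw = ols_slope(income, turnout)                            # in native units
std = ols_slope(standardize(income), standardize(turnout))  # metric-free
print(round(raw, 3), round(std, 3))
# In the bivariate case the standardized slope equals Pearson's r.
```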