Final Exam Part 2 Flashcards
Two-variable (bivariate) hypothesis test examples
- tabular analysis
- difference of means
- correlation coefficient and regression
Tabular Analysis
- Categorical IV and DV
- goal: is the difference between groups statistically significant?
- How: calculate the chi-squared (χ²) statistic and p-value
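A minimal sketch of a chi-squared test on a hypothetical 2x2 crosstab using scipy (the counts below are made up for illustration):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 crosstab: rows = IV categories, columns = DV categories.
table = [[30, 20],   # group A: counts of DV = yes / no
         [15, 35]]   # group B: counts of DV = yes / no

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, df = {dof}")
# If p < alpha (e.g., 0.05), reject the null of no relationship.
```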
Degrees of Freedom
the maximum number of independent values that are free to vary in a data sample
- Formula: degrees of freedom = size of the data sample - 1 (e.g., a sample of 10 observations has df = 9)
critical value
point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis
level of significance
- in a p-value approach
- level of significance = alpha
- alpha = 1 - confidence level; the probability of incorrectly rejecting a true null hypothesis (Type I error), e.g., a 95% confidence level gives alpha = 0.05
Difference of Means
- continuous DV, categorical IV
- use sample means and standard deviations to make inferences about the unobserved population
- are the means different across values of the independent variable?
- How: calculate the t-statistic and p-value
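A sketch of a difference-of-means test with made-up samples, using scipy's independent-samples t-test:

```python
from scipy.stats import ttest_ind

# Hypothetical DV values for two groups of the categorical IV.
group_a = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9]
group_b = [3.2, 4.0, 3.8, 4.5, 3.6, 4.1]

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value means the observed difference in means is unlikely under the null.
```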
correlation coefficient
- are they correlated/associated?
- how do we know they are correlated?
- measure of the strength of the relationship
- more of X associated with more/less of Y
- can use a scatter plot, Pearson's r, or regression
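A sketch computing Pearson's r and its p-value for two made-up continuous variables with scipy:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# r near +1 or -1 -> strong linear association; r near 0 -> weak or none.
```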
perfect negative correlation
-1
perfect positive correlation
1
no linear relationship bw two variables
0
Pearson's r
- there is no unambiguous classification rule for what counts as a strong or weak linear relationship between two variables
- tells us how confident we can be that the relationship is different from what we would see if the IV and DV were unrelated in the population
examples of non-linear relationships
- quadratic/curvilinear
- cubic/polynomial
common mistakes with Pearson's r
- it is not applicable to non-linear relationships
- it does not equal the slope
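To see the first mistake concretely, here is a sketch where Y depends perfectly on X but the relationship is quadratic, so Pearson's r comes out near zero (data are made up):

```python
import numpy as np
from scipy.stats import pearsonr

x = np.linspace(-5, 5, 101)
y = x ** 2              # perfect, but non-linear, dependence on x

r, _ = pearsonr(x, y)
print(f"r = {r:.3f}")   # approximately 0 despite the deterministic relationship
```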
R-squared
- always between 0 and 1
- proportion of variation in the DV (Y) that is explained by the IV (X)
coefficient in the regression
represents the mean change in Y for every additional one-unit increase in X
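A sketch tying the last few cards together with scipy's linregress, which returns the slope (the coefficient), the intercept, and r in one call (data are made up):

```python
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 3.1, 3.9, 5.2, 5.8, 7.1, 7.9, 9.0]

result = linregress(x, y)
print(f"slope = {result.slope:.3f}")        # mean change in Y per 1-unit change in X
print(f"intercept = {result.intercept:.3f}")
print(f"r = {result.rvalue:.3f}, R-squared = {result.rvalue ** 2:.3f}")
# Note: the slope and r are different quantities, even when both are positive.
```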
Simple regression model
no controls (only X and Y)
multiple regression model
controls for confounding variables (Z)
- more applicable in the real world, because the real world is multivariate
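A sketch of simple vs. multiple regression using statsmodels, with a made-up confounder Z added as a control (all data and parameter values are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
z = rng.normal(size=200)                  # hypothetical confounder
x = 0.8 * z + rng.normal(size=200)        # X is partly driven by Z
y = 2.0 * x + 1.5 * z + rng.normal(size=200)

# Simple regression: Y on X only (no controls).
simple = sm.OLS(y, sm.add_constant(x)).fit()

# Multiple regression: Y on X, controlling for Z.
multiple = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print(simple.params)    # coefficient on X is biased upward by the omitted Z
print(multiple.params)  # coefficient on X is close to the true value of 2.0
```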
most causal theories are…
bivariate: does X cause Y?
two-variable regression
fitting the best line through a scatter plot of data
statistical model
slope and y-intercept
systematic components of population regression model
- alpha = a fixed value; the value of Y depends on alpha, so there is no random element in the systematic component
- when X changes by 1 unit, the value of Y changes by beta units
random components of population regression model
- cannot predict the values of this part (no systematic patterns)
- examples: unmeasured factors such as condition or mood
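Putting the two components together, the population regression model can be written Yi = alpha + beta * Xi + ui. A small simulation sketch with made-up parameter values, showing the predictable and unpredictable parts separately:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 1.0, 2.0                 # hypothetical fixed population parameters

x = np.linspace(0, 10, 50)
systematic = alpha + beta * x          # systematic part: changes only with X
u = rng.normal(scale=2.0, size=x.size) # stochastic part: no systematic pattern
y = systematic + u                     # observed values of the DV
```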
names for the estimated stochastic component
- residual: the leftover part of Yi after the line has been drawn
- sample error term (the sample counterpart of the population error term)
how to calculate the smallest sum of residual values (ordinary least squares)
- add up the squared value of each residual for each candidate line
- choose the line with the smallest total -> draw the line that minimizes the sum of the squared residuals
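A sketch of the OLS idea with made-up data: the closed-form slope and intercept below are the values that minimize the sum of squared residuals.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.2, 2.9, 4.1, 4.9, 6.2, 6.8])

# Closed-form OLS estimates: beta = cov(x, y) / var(x), alpha = ybar - beta * xbar.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

residuals = y - (alpha_hat + beta_hat * x)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.3f}")
# Any other line drawn through these points gives a larger sum of squared residuals.
```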