SOCI 3040 - Finals Review Flashcards
Controlling for Variables
Multiple regression accounts for the relationships between the independent variables and allows us to sort out how much of the variation in the dependent variable is attributable to each independent variable separately.
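A minimal sketch in Python using statsmodels (the variable names wages, age, and education are illustrative, not from the course data):

```python
# Multiple OLS regression: each slope estimates the effect of one IV
# while holding the other IVs constant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"age": rng.uniform(20, 60, n),
                   "education": rng.uniform(8, 20, n)})
df["wages"] = 200 + 10 * df["age"] + 30 * df["education"] + rng.normal(0, 50, n)

model = smf.ols("wages ~ age + education", data=df).fit()
print(model.params)  # separate contribution of each IV
```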
Interpreting Multiple Regression
1) analyzing the correlation and directionality of the data
2) estimating the model, i.e., fitting the line, and
3) evaluating the validity and usefulness of the model.
Standardized Coefficients (BETA)
a way of converting slope coefficients into standardized coefficients measured in standard-deviation units, so that the effects of IVs measured on different scales can be compared.
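A minimal sketch of one common way to obtain beta coefficients: z-score every variable before fitting (variable names are illustrative):

```python
# After z-scoring, each slope is in standard-deviation units.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(0, 5, 100), "x2": rng.normal(0, 2, 100)})
df["y"] = 3 * df["x1"] + 4 * df["x2"] + rng.normal(0, 1, 100)

z = (df - df.mean()) / df.std()          # z-score every column
betas = smf.ols("y ~ x1 + x2", data=z).fit().params
print(betas)  # comparable across x1 and x2 despite different raw scales
```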
Dummy Variables
A variable coded 0/1 that acts as a switch: 1 when a case belongs to the category, 0 otherwise, which allows categorical variables to be entered into a regression (see the sketch after the next card).
Reference Category/Group
The omitted category against which the coefficients of the other dummy variables are compared. A category that is theoretically important in the researcher's view can be chosen as the reference group.
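A minimal sketch covering both dummy coding and the reference group (the category labels are hypothetical):

```python
# Dummy-coding a categorical variable with pandas; the dropped
# category becomes the reference group the other dummies are
# compared against.
import pandas as pd

df = pd.DataFrame({"region": ["east", "west", "north", "east", "west"]})

# drop_first=True drops one category ("east", alphabetically first),
# making it the reference group.
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies)
```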
Nested Regression
One way to build a regression model of the given data is to estimate a series of regressions in which the dependent variable Y remains the same and more independent variables are added at each step to improve the model, keeping all the previous IVs in place.
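A minimal sketch of a two-step nested series (variable names are illustrative):

```python
# Y stays the same; Model 2 keeps Model 1's IV and adds another.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=150), "x2": rng.normal(size=150)})
df["y"] = 1 + 2 * df["x1"] + 3 * df["x2"] + rng.normal(size=150)

m1 = smf.ols("y ~ x1", data=df).fit()        # Model 1
m2 = smf.ols("y ~ x1 + x2", data=df).fit()   # Model 2
print(m1.rsquared, m2.rsquared)              # did the added IV improve fit?
```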
Parsimonious Factors
The simplest and most efficient plausible explanations.
Regression Assumptions
(1) A regression model strives to achieve a high degree of explained variance, captured by R2.
(2) A good regression model should avoid collinearity (correlation) between the independent variables.
(3) The residuals should be normally distributed.
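A minimal sketch of checking the normality assumption on the residuals (one possible check among several; the data are simulated):

```python
# Shapiro-Wilk test on the residuals of a fitted OLS model.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(x)).fit()
stat, p = stats.shapiro(model.resid)  # null hypothesis: residuals are normal
print(p)  # a large p-value is consistent with normality
```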
Centring Independent Variables
Subtracting the mean from an independent variable so that it is centred on its mean instead of 0: a value of 0 then represents the average case rather than a raw score of 0, which makes the intercept easier to interpret.
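A minimal sketch (values are made up):

```python
# Centring an IV on its mean so the intercept refers to an average case.
import numpy as np

age = np.array([25, 30, 45, 52, 38], dtype=float)
age_centred = age - age.mean()  # 0 now means "average age", not "age 0"
print(age_centred)
```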
Hypothesis Testing in Regression
used to test whether the beta coefficients in a linear regression model are statistically significant, i.e., whether each slope differs from zero in the population.
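A minimal sketch: the t-statistic for each coefficient is the coefficient divided by its standard error, and statsmodels reports the matching p-values (data simulated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=80)
y = 0.5 * x + rng.normal(size=80)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.params / fit.bse)  # t-statistics (matches fit.tvalues)
print(fit.pvalues)           # significance of each coefficient
```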
R-Squared/Coefficient of Determination
a statistic based on variation that summarizes how well the regression matches the data: the proportion of variation in the dependent variable explained by the model. It shows how well the relationship between two variables fits the straight line.
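A minimal sketch computing R-squared by hand (the fitted values are made up for illustration):

```python
# R-squared = 1 - (unexplained variation / total variation).
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0, 12.0])
y_hat = np.array([3.2, 4.8, 7.1, 9.3, 11.6])  # hypothetical fitted values

ss_res = np.sum((y - y_hat) ** 2)      # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation
print(1 - ss_res / ss_tot)             # R-squared
```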
Influential Cases
Cases that have a large effect on the model and introduce bias (they may have a high residual and be outliers).
- Have high leverage.
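A minimal sketch of flagging influential cases with Cook's distance (one common diagnostic; the outlier here is planted deliberately):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = x + rng.normal(size=50)
x[0], y[0] = 8.0, -8.0  # plant one high-leverage outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print(np.argmax(cooks_d))  # index of the most influential case (here: 0)
```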
Zero-order/Partial Correlation
The correlation within the sample overall is called the zero-order correlation; the correlation for each subgroup in the sample (i.e., after controlling for a third variable) is called a partial correlation.
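A minimal sketch with pandas (the column names x, y, z and the group labels are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
df = pd.DataFrame({"x": rng.normal(size=200),
                   "z": rng.choice(["group A", "group B"], size=200)})
df["y"] = df["x"] + rng.normal(size=200)

print(df["x"].corr(df["y"]))           # zero-order correlation (whole sample)
for level, g in df.groupby("z"):
    print(level, g["x"].corr(g["y"]))  # correlation within each subgroup
```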
Spearman’s Rank-order Correlation
A nonparametric alternative computed on ranks. Pearson's correlation is popular, but requires several assumptions to be valid:
(1) Both variables must be continuous or at least count variables w/ a wide range of values
(2) They must have a linear relationship
(3) They must be normally distributed
Spearman's correlation relaxes these assumptions: it requires only a monotonic relationship, so it suits ordinal data or non-linear but monotonic associations.
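A minimal sketch contrasting the two on a monotonic but non-linear relationship (data are made up):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = x ** 3  # monotonic but not linear

rho, _ = stats.spearmanr(x, y)
print(rho)  # 1.0: a perfect monotonic association
r, _ = stats.pearsonr(x, y)
print(r)    # < 1: Pearson's r is attenuated by the non-linearity
```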
Reading Correlation Matrices
The diagonal cells always show a correlation of 1 (each variable correlated with itself). The variables appear in the same order along the rows and the columns, so the matrix is symmetric about the diagonal.
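A minimal sketch (simulated data, hypothetical column names):

```python
# A correlation matrix with pandas: note the 1s on the diagonal
# and the symmetry above/below it.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
print(df.corr())
```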
Adjusted R-Squared
a version of R-squared that accounts for the negative impact of adding variables to our model: it penalizes R-squared for each additional independent variable.
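A minimal sketch of the standard adjustment formula (n = sample size, k = number of IVs; the numbers are made up):

```python
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.40, n=100, k=3))   # slightly below the raw 0.40
print(adjusted_r2(r2=0.40, n=100, k=20))  # a larger penalty for more IVs
```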
Collinearity/Multicollinearity
Occurs when the linear relationship between one independent variable (e.g., age) and the dependent variable Y (e.g., weekly wages) is very similar to the relationship between another independent variable (e.g., year of birth) and Y, because the two IVs are strongly correlated with each other.
Variance Inflation Factor (VIF)
gives an estimate of how much the variance of a slope coefficient is likely to be inflated because that independent variable is correlated with the other IVs in the regression.
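A minimal sketch using statsmodels, reusing the age/birth-year example (a common rule of thumb, not from the course, treats VIF above roughly 5-10 as a collinearity warning):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(8)
age = rng.uniform(20, 60, 100)
df = pd.DataFrame({"age": age,
                   "birth_year": 2024 - age + rng.normal(0, 0.5, 100)})

X = sm.add_constant(df)
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))  # huge for both IVs
```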
Multiple Regression Model
an equation that describes the relationship between a dependent variable and more than one independent variable.
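In general form (standard notation, not specific to the course):

Y = a + b1X1 + b2X2 + … + bkXk + e

where a is the intercept, each bi is the slope coefficient for independent variable Xi, and e is the error term.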
Monotonic Relationships
a consistent increase in one variable is associated with a consistent increase (or decrease) in the other variable; e.g., linear relationships are monotonic.
Limitations of Chi-square
It is sensitive to small sample sizes and is likely to give false positives.
Chi-square hypothesis testing
compares the observed frequencies in a table to the frequencies we would expect if the two variables were independent in the population.
Elaboration Model
Looks at how the relationship between variables X and Y changes after controlling for a third variable Z (replication, specification, distortion, etc.).
Chi-Square Test of Independence
a statistical test used when both variables are categorical. It tells us whether the distribution of participants across categories differs from what we would expect if there were no difference between the groups, i.e., if the variables were independent.
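A minimal sketch with scipy (the observed counts are made up for illustration):

```python
# Chi-square test of independence on a 2x2 table of observed counts.
from scipy.stats import chi2_contingency

observed = [[30, 10],   # e.g., group A: yes / no
            [20, 40]]   # e.g., group B: yes / no

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p)    # a small p suggests the variables are not independent
print(expected)   # frequencies expected under independence
```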