SOCI 3040 - Finals Review Flashcards
Controlling for Variables
Multiple regression accounts for the relationships among the independent variables and lets us sort out how much variation in the dependent variable is attributable to each independent variable separately.
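A minimal sketch of sorting out separate effects, using numpy's least squares on made-up data in which y depends exactly on two hypothetical IVs:

```python
# Fit y on two IVs at once; the data and variable names are invented.
import numpy as np

x1 = np.array([1., 2., 3., 4., 5., 6.])
x2 = np.array([2., 1., 4., 3., 6., 5.])
y = 1.0 + 2.0 * x1 + 3.0 * x2                    # exact linear relationship

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept + two IVs
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                                      # ≈ [1., 2., 3.]
```

Each slope estimates the effect of one IV on y while holding the other IV constant.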
Interpreting Multiple Regression
1) analyzing the correlation and directionality of the data
2) estimating the model i.e., fitting the line, and
3) evaluating the validity and usefulness of the model.
Standardized Coefficients (BETA)
A way of converting a slope coefficient into standard-deviation units (BETA = b × sₓ/sᵧ), so that the effects of IVs measured on different scales can be compared.
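A quick sketch of the conversion on made-up numbers: multiply the unstandardized slope b by the ratio of the standard deviations of X and Y.

```python
# Standardize a slope coefficient; data and slope are hypothetical.
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # y = 2x exactly
b = 2.0                     # unstandardized slope
beta = b * statistics.stdev(x) / statistics.stdev(y)
print(beta)                 # 1.0: a perfect linear relationship standardizes to 1
```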
Dummy Variables
A variable coded 0/1 that acts as a switch, turning a category's effect on or off in the regression equation.
Reference Category/Group
The omitted category that the dummy-coded categories are compared against. A category that is theoretically important in the researcher's view can be chosen as the reference group.
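A minimal sketch of dummy coding with a hypothetical employment-status variable, using "full-time" as the reference group (it gets no column of its own):

```python
# Dummy-code a categorical variable; categories are made up.
statuses = ["full-time", "part-time", "unemployed", "part-time"]
categories = ["part-time", "unemployed"]          # reference = "full-time"

dummies = [[1 if s == c else 0 for c in categories] for s in statuses]
print(dummies)   # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

A case in the reference group is all zeros; each dummy's coefficient is then the difference from "full-time".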
Nested Regression
One way to build a regression model of the given data is to estimate a series of regressions in which the dependent variable Y remains the same and more independent variables are added at each step to improve the model, keeping all the previous IVs in place.
Parsimonious Factors
The simplest and the most efficient plausible explanations.
Regression Assumptions
(1) A regression model strives to achieve a high degree of explained variance, encompassed by R2.
(2) A good regression model should avoid the issue of collinearity (correlation) between the independent variables.
(3) The residuals should be normally distributed.
Centring Independent Variables
Rescaling an independent variable so it is centred around its mean instead of 0 (subtract the mean from every value); the intercept then describes a case at the mean of the IV.
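A one-line sketch of centring, with made-up values:

```python
# Subtract the mean from each value; the centred variable has mean 0.
x = [20, 30, 40, 50]
mean_x = sum(x) / len(x)                  # 35.0
x_centred = [v - mean_x for v in x]
print(x_centred)                          # [-15.0, -5.0, 5.0, 15.0]
```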
Hypothesis Testing in Regression
Used to test whether the beta coefficients in a linear regression model are significantly different from zero.
R-Squared/Coefficient of Determination
A statistic based on variation that summarizes how well the regression matches the data: the proportion of variation in the dependent variable explained by the model. Shows how well the relationship between the variables fits the straight line.
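A short sketch of R² as one minus the ratio of unexplained to total variation, with made-up observed and predicted values:

```python
# R^2 = 1 - SSE/SST; y and y_hat are hypothetical.
y     = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 6.5, 9.5]

mean_y = sum(y) / len(y)
sst = sum((v - mean_y) ** 2 for v in y)              # total variation
sse = sum((v - p) ** 2 for v, p in zip(y, y_hat))    # unexplained variation
r2 = 1 - sse / sst
print(round(r2, 3))                                  # 0.95
```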
Influential Cases
Cases that have a large effect on the model + introduce bias (may have a high residual + be an outlier).
- Have high leverage.
Zero-order/Partial Correlation
The correlation within the sample overall is called zero-order correlation, and the correlation for each subgroup in the sample is called partial correlation.
Spearman’s Rank-order Correlation
A non-parametric alternative used when Pearson's r is not valid. Pearson's r is popular, but requires several assumptions:
(1) Both variables must be continuous or at least count variables w/ a wide range of values
(2) The relationship must be linear
(3) The variables must be normally distributed
Spearman's correlation relaxes these by correlating the ranks of the values; it only requires a monotonic relationship.
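A sketch of Spearman's rho on made-up data, using the no-ties shortcut rho = 1 − 6·Σd²/(n(n² − 1)), where d is the difference between each pair of ranks:

```python
# Rank both variables, then apply the rank-difference formula (no ties assumed).
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

def ranks(vals):
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0] * len(vals)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

rx, ry = ranks(x), ranks(y)
n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(rho)   # 0.8
```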
Reading Correlation Matrices
The diagonal cells always show a correlation of 1. The variables appear in the same order along the rows and the columns, so the matrix is symmetric.
Adjusted R-Squared
Adjusts R² for the negative impact of adding variables to the model: an IV that contributes little explanatory power lowers the adjusted statistic.
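The standard adjustment penalizes R² using the sample size n and the number of IVs k; a small sketch with invented values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R^2 = 0.50 from 30 cases and 3 IVs.
print(round(adjusted_r2(0.50, 30, 3), 3))   # 0.442
```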
Collinearity/Multicollinearity
Occurs when independent variables are highly correlated with each other, e.g. when age and year of birth carry nearly the same information about the dependent variable Y (e.g. weekly wages), so their separate effects cannot be reliably distinguished.
Variance Inflation Factor (VIF)
Gives an estimate of how much the variance of a slope coefficient is inflated because its independent variable is correlated with the other IVs in the regression.
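The VIF for one IV comes from regressing that IV on all the other IVs and plugging the resulting R² into 1/(1 − R²); a tiny sketch with a hypothetical R²:

```python
def vif(r2_j):
    """VIF = 1 / (1 - R^2), where R^2 is from regressing IV j on the other IVs."""
    return 1 / (1 - r2_j)

print(round(vif(0.80), 2))   # 5.0
```

A common rule of thumb flags VIF values above roughly 5 or 10 as a collinearity concern.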
Multiple Regression Model
an equation that describes the relationship between a dependent variable and more than one independent variable.
Monotonic Relationships
A consistent increase in one variable is associated with a consistent increase (or decrease) in the other variable. E.g., linear relationships
Limitations of Chi-square
It is sensitive to small sample sizes (small expected cell frequencies), where it is likely to give false positives.
Chi-square hypothesis testing
Compares the observed frequencies in a table to the expected frequencies we would see if the two variables were independent in the population
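A sketch of the comparison for a 2×2 table of hypothetical counts: each expected count is (row total × column total) / grand total, and the statistic sums (O − E)²/E over the cells.

```python
# Chi-square statistic for a made-up 2x2 table of observed counts.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]          # [40, 60]
col_totals = [sum(col) for col in zip(*observed)]    # [50, 50]
n = sum(row_totals)                                  # 100

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n        # expected under independence
        chi2 += (o - e) ** 2 / e
print(round(chi2, 2))                                # 16.67
```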
Elaboration Model
Looks at how the relationship between variables X and Y changes after controlling for a third variable Z (replication, specification, distortion, etc).
CHI-Square Test of Independence
A statistical test in which both variables are categorical. This test generally tells us whether the distribution of participants across categories differs from what we would see if there were no association between the groups
Proportionate Reduction of Error Measure (PRE)
shows how much the error in predicting the attributes of the dependent variable can be reduced if the attributes of the independent variable are known.
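The general form is PRE = (E₁ − E₂)/E₁, where E₁ is the prediction error without the IV and E₂ the error with it; a sketch with invented error counts:

```python
def pre(e1, e2):
    """Proportionate reduction of error: (E1 - E2) / E1."""
    return (e1 - e2) / e1

# Hypothetical: 40 prediction errors without the IV, 10 with it.
print(pre(40, 10))   # 0.75 -- knowing the IV cuts prediction error by 75%
```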
Hypothesis Testing in ANOVA
Used to test whether the means of the groups differ.
Steps in Hypothesis Testing
(1) Set up the H₀ and Hₐ
(2) Choose the F-distribution and decide on the critical value. F𝒸ᵣ will depend on the α level of significance (usually 0.05) and the degrees of freedom.
(3) Calculate the F-statistic based on the variation between groups, the variation within groups, the total variation, and the degrees of freedom.
(4) Compare the p-value to α.
(5) Decide whether to reject the null hypothesis
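Step (3) can be sketched for three small made-up groups: F is the mean square between groups divided by the mean square within groups.

```python
# F-statistic from between- and within-group variation; data are hypothetical.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_vals = [v for g in groups for v in g]
grand_mean = sum(all_vals) / len(all_vals)
means = [sum(g) / len(g) for g in groups]

ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

df_between = len(groups) - 1                 # k - 1
df_within = len(all_vals) - len(groups)      # N - k
f_stat = (ssb / df_between) / (ssw / df_within)
print(round(f_stat, 2))   # 13.0 -- compare to F_cr at alpha = 0.05 with (2, 6) df
```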
F-Distribution in ANOVA
a theoretical distribution used to compare differences between multiple means. Used to find the likelihood of randomly selecting a sample with the observed ratio of between-group variation to within-group variation
Post-Hoc Test
- E.g., the LSD (Least Significant Difference) test.
- A means of comparing all possible pairs of groups to determine which ones differ significantly from each other
Between Group
the difference between the group means
Within Group
shows how the cases distribute around the mean in each group
Residuals
Residuals that are not normally distributed indicate that one or more important IVs may have been omitted from the regression.
Pearson’s Correlation Coefficient (R)
Shows the strength and direction of the linear relationship between two variables. A coefficient of -1/+1 shows a perfect negative/positive relationship between the variables, and a coefficient of 0 indicates no linear relationship.
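A sketch of the computation on made-up data: r is the sample covariance divided by the product of the two standard deviations.

```python
# Pearson's r = cov(x, y) / (s_x * s_y); x and y are hypothetical.
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
r = cov / (statistics.stdev(x) * statistics.stdev(y))
print(round(r, 3))   # 0.775: a fairly strong positive relationship
```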