midterm 2 Flashcards
Mediation
when one independent variable (x1) has an indirect effect on a dependent variable (y) through the main effect of a third variable (x2)
x1 → x2 → y
spuriousness
when a third variable causes both the independent and dependent variables, making their relationship illusory or non-causal.
multicollinearity
when two independent variables have basically the same effect, making it difficult to determine their individual effects on the dependent variable.
how to check for multicollinearity
VIF (Variance Inflation Factor)
VIF > 5: alarming
VIF > 10: highly alarming
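A minimal sketch of the VIF check in R, assuming the car package and a hypothetical data frame dat with predictors x1 and x2:
library(car)  # provides vif()
model <- lm(y ~ x1 + x2, data = dat)  # hypothetical model
vif(model)  # VIF > 5 is concerning; VIF > 10 is highly concerning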
to model an interaction, merely adding the second variable (e.g., gender) as another main effect won’t capture the interaction
therefore, you must add an interaction term:
y = B0 + B1X1 + B2X2 + B3X1X2
ex. coordination = B0 + B1 * drinks + B2 * female + B3 * drinks * female
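A minimal sketch of fitting that interaction model in R, assuming a hypothetical data frame bar with columns coordination, drinks, and female:
fit <- lm(coordination ~ drinks * female, data = bar)  # drinks * female expands to drinks + female + drinks:female
summary(fit)  # the drinks:female row is the interaction term (B3)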
conditions for regression (4)
- linearity
- nearly normal residuals
- constant variability
- independent observations
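A minimal sketch for checking these conditions visually in R, assuming fit is an lm() model like the one above:
par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a grid
plot(fit)  # residuals vs. fitted (linearity, constant variability) and Q-Q plot (nearly normal residuals)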
Quadratic Term
Add an x^2 term to the regression equation so the fitted line can curve, capturing a non-linear relationship
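A minimal sketch of adding a quadratic term in R, assuming a hypothetical data frame dat with columns y and x:
fit_quad <- lm(y ~ x + I(x^2), data = dat)  # I() protects x^2 so it is squared rather than read as a formula operator
summary(fit_quad)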
Logarithms
To what power must one number (the base) be raised to produce another number?
ex. log2(8) = 3
natural logarithm
logarithm with a base of e. It is written as ln(x) and represents the power to which e must be raised to obtain x. A log transformation can turn a non-linear relationship into a linear one.
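A minimal sketch of using a log transformation in a regression, assuming a hypothetical data frame dat where y is right-skewed:
fit_log <- lm(log(y) ~ x, data = dat)  # model the natural log of y instead of y itself
summary(fit_log)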
logistic regression has what kind of independent and dependent variable?
it has a numeric IV and a categorical (binary) DV
generalized linear model
a linear regression model that predicts a transformation of the dependent variable.
transforms y into f(y), for example f(y) = log(y)
f(y) is the “link function” which means “some function of y”
glm() in R studio
probability vs odds
probability is the # of successes divided by the total # of possible outcomes; odds are the # of successes divided by the # of failures
probability equals
odds / (1 + odds)
odds equals
probability / (1 - probability)
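A quick worked example of the two conversions in R:
p <- 0.75
odds <- p / (1 - p)   # 0.75 / 0.25 = 3 (i.e., 3 to 1)
odds / (1 + odds)     # 3 / 4 = 0.75, back to the probability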
Doing generalized linear model in R
glm(dv ~ iv, family = binomial(link = "logit"), data = dataset)
how to transform the output of glm to percentage of probability
example number is 2
logit(y) = 2
odds(y) = e^2
probability(y) = e^2 / (1 + e^2)
whatever the probability is, multiply by 100 to get a percentage, i.e. move the decimal two places (ex. 0.023 = 2.3%)
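The same conversion as a sketch in R, using the example logit of 2:
logit <- 2
odds <- exp(logit)   # e^2, about 7.39
odds / (1 + odds)    # about 0.88
plogis(logit)        # built-in shortcut for e^x / (1 + e^x), same result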
when you’re wanting to find interactions for the glm equation in R, you’re answering what question?
is the effect of one IV (ex. years of experience) on the DV (ex. callback) different across levels of another IV (ex. black = TRUE vs. FALSE)?
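A minimal sketch of such an interaction in R, assuming a hypothetical data frame resumes with columns callback (0/1), experience, and black (TRUE/FALSE):
fit <- glm(callback ~ experience * black, family = binomial(link = "logit"), data = resumes)
summary(fit)  # the experience:blackTRUE coefficient tests whether the effect of experience differs by race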
MANOVA
multivariate analysis of variance
investigates whether a combination of two or more numeric dependent variables differs across groups. looks at two + DVs simultaneously
What is the Null and Alternative Hypothesis for MANOVA
null: there is no difference between the groups on the combined dependent variables
alternate: at least one group is different on the combined dv.
Assumptions of MANOVA
- each observation is independent
- the dv are multivariate normal
- no multivariate outliers
- the groups have equal variances and covariances (equal covariance matrices)
for MANOVA, how do you test for equality of variance
Box’s M-test
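A minimal sketch of a MANOVA in R, assuming a hypothetical data frame dat with numeric DVs sadness and lethargy and a grouping variable depressed:
fit <- manova(cbind(sadness, lethargy) ~ depressed, data = dat)
summary(fit)  # reports Pillai's trace by default
# Box's M-test for equal covariance matrices, assuming the heplots package:
# heplots::boxM(cbind(sadness, lethargy) ~ depressed, data = dat)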
Discriminant Function Analysis
a statistical technique used to classify observations into predefined groups based on predictor variables. It finds the linear combination of predictors that best separates (maximally distinguishes) the groups.
What can MANOVA vs DFA tell you about (ex. sadness and lethargy’s impact on depression)
MANOVA: depressed & non-depressed people are different on some linear combination of sadness & lethargy
DFA: what weights on sadness & lethargy best distinguish depressed and non-depressed people?
ex. LDA = 0.428 * sadness + 0.208 * lethargy (it’s figuring out those numbers^)
What does DFA allow you to do?
lets you predict group membership (ex. if you’re in the depressed or non-depressed group)
Also! DFA with more than two groups produces multiple discriminant functions (up to the number of groups minus one), and the proportion of trace (in R’s lda() output) tells you how important each function is
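A minimal sketch of a DFA in R with MASS::lda(), assuming the same hypothetical data frame dat:
library(MASS)  # provides lda()
dfa <- lda(depressed ~ sadness + lethargy, data = dat)
dfa                        # prints the discriminant weights and, with 3+ groups, the proportion of trace
pred <- predict(dfa, dat)  # pred$class is the predicted group membership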
confusion matrix
a table used to evaluate the performance of a classification model by comparing predicted vs. actual values
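A minimal sketch of a confusion matrix in R, reusing pred from the lda() sketch above:
table(Predicted = pred$class, Actual = dat$depressed)  # rows = predicted groups, columns = actual groups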
logistic regression
statistical method for binary classification, meaning it predicts whether an observation belongs to one of two categories (e.g., Yes/No, 0/1, Pass/Fail). Predicted output is probability ranging from 0 to 1.
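A minimal sketch of getting those predicted probabilities in R, assuming fit is a fitted glm() model like the one earlier:
probs <- predict(fit, type = "response")  # predicted probabilities between 0 and 1
head(probs)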