Midterm two Flashcards
Mediation
When an independent variable (X1) has an indirect effect on a dependent variable (Y) through a third, mediating variable (X2)
X1 -> X2 -> Y
Spuriousness
When a third variable causes both the independent and dependent variables, making their relationship illusory or non-causal
Multicollinearity
when two independent variables are highly correlated and carry basically the same information, making it difficult to determine their individual effects on the dependent variable
how to check for multicollinearity
VIF (Variance Inflation Factor)
VIF > 5: concerning
VIF > 10: highly concerning
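The VIF for a predictor is computed as 1 / (1 − R²), where R² comes from regressing that predictor on all the other predictors. A minimal sketch of the thresholds above (the R² values are made up for illustration):

```python
def vif(r_squared):
    """Variance Inflation Factor from the R^2 of regressing
    one predictor on all of the other predictors."""
    return 1.0 / (1.0 - r_squared)

print(vif(0.80))  # ~5  -> at the "concerning" threshold
print(vif(0.90))  # ~10 -> highly concerning
```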
to model an interaction, merely adding the variable (ex. gender) won't capture the interaction
therefore, you must add an interaction term:
y = B0 + B1X1 + B2X2 + B3X1X2
ex. coordination = B0 + B1*drinks + B2*female + B3*drinks*female
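A quick numeric sketch of why the interaction term B3 lets the slope of drinks differ by group (the coefficient values here are made up for illustration):

```python
# Hypothetical coefficients for:
# coordination = b0 + b1*drinks + b2*female + b3*(drinks*female)
b0, b1, b2, b3 = 10.0, -1.5, 0.5, -0.8  # made-up values

def coordination(drinks, female):
    return b0 + b1 * drinks + b2 * female + b3 * drinks * female

# Effect of one extra drink in each group:
male_slope = coordination(1, 0) - coordination(0, 0)    # b1       = -1.5
female_slope = coordination(1, 1) - coordination(0, 1)  # b1 + b3  = -2.3
print(male_slope, female_slope)
```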
Conditions for regression (4)
- linearity
- nearly normal residuals
- constant variability
- independent observations
Quadratic Term
Add a squared term (x^2) to the regression equation so the fitted line can draw a curve (model a non-linear relationship)
Logarithms
How many times must one number (the base) be multiplied by itself to make another number?
ex. log2(8) = 3
2 * 2 * 2 = 8
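The card's example, checked with Python's math module:

```python
import math

print(math.log2(8))       # 3.0, because 2 * 2 * 2 = 8
print(math.log10(100))    # 2.0, because 10 * 10 = 100
print(math.log(math.e))   # ~1: math.log with one argument is ln (base e)
```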
natural logarithm
logarithm with a base of e. It is written as ln(x) and represents the power to which e must be raised to obtain x. Taking the log can turn a non-linear relationship linear
logistic regression has what kind of independent and dependent variable?
It has a categorical (binary) DV; the IVs can be numeric or categorical
generalized linear model
A linear regression model that predicts a transformation of the dependent variable
transforms y into f(y), ex. f(y) = log(y)
f(y) is the "link function," which means "some function of y"
glm() in RStudio
Probability vs. odds
probability is the # of successes divided by the total # of possible outcomes; odds are the # of successes divided by the # of failures
Probability equals
odds / (1 + odds)
Odds equal
probability / (1 - probability)
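The two conversion formulas above as helper functions; converting to odds and back recovers the original probability:

```python
def odds_from_prob(p):
    return p / (1 - p)       # odds = probability / (1 - probability)

def prob_from_odds(o):
    return o / (1 + o)       # probability = odds / (1 + odds)

p = 0.75
o = odds_from_prob(p)        # 3.0 -> 3 successes for every 1 failure
print(o, prob_from_odds(o))  # converting back recovers 0.75
```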
doing generalized linear model in R
glm(dv ~ iv, family = binomial(link = "logit"), data = dataset)
how to transform the output of glm to percentage of probability
example number is 2
logit(y) = 2
odds(y) = e^2
probability(y) = e^2 / (1 + e^2)
to express the result as a percentage, multiply by 100 (move the decimal two places right) and add % (ex. 0.023 = 2.3%)
When you’re wanting to find interactions for the glm equation in R, you’re answering what question?
Is the effect of one IV (ex. years of experience) on the DV (ex. callback) different at levels of a second IV (ex. black = TRUE)?
MANOVA
multivariate analysis of variance
investigates whether a combination of two or more numeric dependent variables differs across groups. Looks at two or more DVs simultaneously
Assumptions of MANOVA
- each observation is independent
- the DVs are multivariate normal
- no multivariate outliers
- the groups have equal variance and covariance
For MANOVA, how do you test for equality of variances and covariances?
Box’s M-test
Discriminant Function Analysis
A statistical technique used to classify observations into predefined groups based on predictor variables. It draws a line to maximize distinction between two groups
What can MANOVA vs. DFA tell you? (ex. sadness and lethargy's impact on depression)
MANOVA: depressed and non-depressed people are different on some linear combination of sadness and lethargy
DFA: what weights on sadness and lethargy best distinguish depressed and non-depressed people?
ex. LDA = 0.428*sadness + 0.208 * lethargy (it’s figuring out those numbers)
What does DFA allow you to do?
lets you predict group membership (ex. if you’re in the depressed or non-depressed group)
ALSO! DFA with more than two groups produces multiple discriminant functions (up to the number of groups minus one), and the proportion of trace tells you how important each function is
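Using the example weights from the card above (0.428 and 0.208), a discriminant score is just a weighted sum of the predictors; the input values here are made up for illustration:

```python
# LD1 = 0.428*sadness + 0.208*lethargy (weights from the card's example)
def discriminant_score(sadness, lethargy):
    return 0.428 * sadness + 0.208 * lethargy

# Classification compares this score to a cutoff between the group means.
score = discriminant_score(5, 3)  # made-up sadness/lethargy values
print(round(score, 3))            # 2.764
```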
confusion matrix
a table used to evaluate the performance of a classification model by comparing predicted vs. actual values
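A minimal pure-Python confusion matrix for a binary classifier (the labels are made up for illustration):

```python
actual    = [1, 0, 1, 1, 0, 0, 1]  # made-up true labels
predicted = [1, 0, 0, 1, 0, 1, 1]  # made-up model predictions

pairs = list(zip(actual, predicted))
tp = pairs.count((1, 1))  # true positives
tn = pairs.count((0, 0))  # true negatives
fp = pairs.count((0, 1))  # false positives
fn = pairs.count((1, 0))  # false negatives

print(tp, tn, fp, fn)            # 3 2 1 1
print((tp + tn) / len(actual))   # accuracy: 5/7
```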
logistic regression
statistical method for binary classification, meaning it predicts whether an observation belongs to one of two categories (ex. Yes/No, 0/1, Pass/Fail). The predicted output is a probability ranging from 0 to 1