Regression Flashcards
covariance
How two variables covary with each other; expressed in raw-score units, so it cannot convey the strength of the association. Symbol: s
correlation
Scale-free degree of linear relationship between two variables. Symbol: r
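A minimal pure-Python sketch of the two cards above, using made-up data; the formulas are the standard sample covariance and Pearson r:

```python
from math import sqrt

def covariance(x, y):
    # Sample covariance: average cross-product of deviations (n - 1 denominator).
    # In raw-score units, so its size depends on the scales of x and y.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    # r = cov(x, y) / (s_x * s_y): scale-free, always in [-1, 1].
    sx = sqrt(covariance(x, x))
    sy = sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
print(covariance(x, y))   # in raw units
print(pearson_r(x, y))    # unit-free strength and direction
```

Dividing the covariance by both standard deviations is what removes the raw-score units and makes strength comparable across variables.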
scatterplot
Shows the strength, direction, and shape of the association
pearson correlation
Magnitude and direction of the linear relationship
What can change the magnitude of r?
outliers, extreme scores inflating the mean, curvilinear relationships, error in x or y
correlation does not equal
causality
correlation coefficient
Strength of the linear relationship between two variables. Symbol: r
Y=mx + b in regression form
Y = a + bX, or Y = b0 + b1X
b0
regression constant, intercept
b1
regression coefficient, slope
When does the regression line cross the Y-axis?
when x=0
regression line always passes through
(x bar, y bar), the means of x and y
mean of predicted Y
= mean of observed Y
Error
actual Y minus predicted Y (the residual)
least squares criterion
The slope and intercept are chosen to minimize the distance between actual and predicted Y, i.e., to minimize the sum of squared residuals
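The least-squares fit can be sketched in pure Python: the closed-form slope and intercept below (b1 = S_xy / S_xx, b0 = y-bar minus b1 times x-bar) are exactly the values that minimize the sum of squared residuals. Data are hypothetical:

```python
def least_squares(x, y):
    # Least-squares slope and intercept for Y = b0 + b1*x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # deviation cross-products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # squared x deviations
    b1 = sxy / sxx
    b0 = my - b1 * mx    # forces the line through (x-bar, y-bar)
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(b0, b1)  # 2.2, 0.6
```

Note that b0 = y-bar - b1 * x-bar is also why the line always passes through (x-bar, y-bar), as the earlier card says.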
Improves the ability to predict Y by using predictors (X)
regression
R
Correlation between observed and predicted Y; equals the absolute value of r
R squared (R^2)
Proportion of variance in Y accounted for by its linear relationship with X. Biased: overestimates the population value
adjusted R squared
Unbiased estimate of the population R^2
standard error of estimate (SEE)
The error present in predicting Y from X; a smaller SEE means more accurate prediction
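R squared, adjusted R squared, and SEE can all be computed from observed and predicted Y; a sketch with hypothetical numbers, where p is the number of predictors:

```python
from math import sqrt

def fit_stats(y, y_hat, p):
    # R^2, adjusted R^2, and standard error of estimate (SEE).
    n = len(y)
    my = sum(y) / n
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot                        # biased upward in samples
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # shrunk toward the population value
    see = sqrt(ss_res / (n - p - 1))                # smaller SEE = more accurate prediction
    return r2, adj_r2, see

y     = [2, 4, 5, 4, 5]               # hypothetical observed scores
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]     # predicted from Y = 2.2 + 0.6x, x = 1..5
r2, adj_r2, see = fit_stats(y, y_hat, 1)
print(r2, adj_r2, see)
```

Adjusted R^2 is always at or below R^2, which is the "unbiased estimate" point on the card above.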
the "Constant" row under unstandardized B (SPSS output)
regression constant, b0, intercept
the row beneath the constant under unstandardized B
slope, regression coefficient
unstandardized coeff
for every one-unit increase in X, there is a [ ] increase in Y
R2 = .43
Approximately 43% of the variance in Y is accounted for by its linear relationship with X
For every one-unit increase in X, Y increases by [what factor]?
slope
multiple regression
Using multiple predictors of Y in one model
1st order model
Each X has a linear relationship with Y; predictors do not interact, but they may correlate
holding, controlling, partialling out
Studying the effect of x1 on Y when x2 can affect that relationship (x2 may correlate with x1, Y, or both), so we remove the effect of x2
effect size R^2
Proportion of the variance in Y accounted for by the model
The F test tells whether what is significantly different from zero?
R and R squared
The t test tells whether what is significantly different from zero?
The regression coefficient (slope), controlling for the effects of the other variables
Approximately [ ] percent of the variance in Y is accounted for by its linear relationship with X
adjusted r^2
A one-unit increase in x1 is associated with a [ ] in loans
decrease [by the slope]
unique contributors
Examining the variance explained by one predictor over and above another
Holding x2 constant, predicted Y increases by [ ] for every additional unit of x1
positive slope
Can pull the regression line one way or another
outliers
Cook's distance, leverage
Statistics that flag outliers; values > 2 suggest a case should be deleted
partial correlation
Removing the effects of one or more variables (x2) from both variables (x1 and Y)
How do you get the variance accounted for from a part or partial correlation?
Square it
part (semi-partial) correlation
Removing the effects of one variable (x2) from one variable (x1) but not the other (Y)
variance in Y accounted for by x1, over and above the variance accounted for by x2
part
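Both correlations can be computed directly from the three pairwise Pearson correlations among x1, Y, and the control variable x2; a sketch using the standard first-order partial and semi-partial formulas, with hypothetical r values:

```python
from math import sqrt

def partial_r(r_x1y, r_x1x2, r_yx2):
    # Partial correlation: x2 removed from BOTH x1 and Y.
    return (r_x1y - r_x1x2 * r_yx2) / sqrt((1 - r_x1x2 ** 2) * (1 - r_yx2 ** 2))

def part_r(r_x1y, r_x1x2, r_yx2):
    # Part (semi-partial) correlation: x2 removed from x1 only, not from Y.
    return (r_x1y - r_x1x2 * r_yx2) / sqrt(1 - r_x1x2 ** 2)

# Hypothetical pairwise correlations:
pr = partial_r(0.50, 0.30, 0.40)
sr = part_r(0.50, 0.30, 0.40)
print(pr, sr)     # partial is at least as large in absolute value
print(sr ** 2)    # squared part r = unique variance in Y explained by x1
```

Squaring either value (per the card above) converts it to a proportion of variance.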
variables need to be controlled by
theory
common cause hypothesis
If a and b correlate, they share a common cause; partialling out the effects of the common-cause variable will yield a correlation of zero
common cause example
Not wearing seatbelts in the '80s "caused" astronauts to die in space
mediation
If a and b correlate because a causes b through one or more mediator variables, then the correlation of a and b, partialling out the effects of the mediator, should equal zero
mediator's effect on significance
Can drastically decrease the correlation or render it nonsignificant
What does a mediator cause?
The outcome; it explains the underlying mechanism of the relationship
mediator example
Grades cause happiness: grades lead to greater self-esteem, which causes happiness
moderation
When the relationship between two variables depends on a third variable in order to have an effect
What does a moderator cause?
A significant increase or decrease in the effect of X on Y
moderator example
Psychotherapy decreases depression for men more than for women; gender is the moderator
suppressor variables
Uncorrelated with Y but highly correlated with X, causing an artificial improvement in the X-Y relationship even though x2 has no bearing on that relationship
A suppressor's regression coefficient is
nonzero
The standardized regression coefficient for x1 is ___ than its correlation with the criterion
greater
If x1 correlates with Y less than x2 does, which is the suppressor?
x1
In SPSS output with a suppressor, what will be greater than the zero-order correlation?
the part correlation
Correlation assumptions
X and Y have a linear relationship; data pairs are independent (pair 1 does not relate to pair 2); bivariate normality
correlation assumptions modifications
Check the scatterplot: points should look evenly dispersed and random, with no patterns
Regression assumptions
normality of residuals, homoskedasticity of residuals, model properly specified
sample size for regression
5 cases per predictor (5p) is the bare minimum; 20 per predictor (20p) is good
Regression modifications
Homoscedasticity: residual plots; normality: histogram of residuals
model properly specified
No unnecessary X's, all important X's included, and linearity between X and Y
multicollinearity
High correlation between two or more IVs (usually measures of similar constructs)
multicollinearity can cause
Unstable coefficients; large changes in slope, including sign changes; increased standard errors; a significant F test but nonsignificant t tests
Detecting multicollinearity
VIF and tolerance: VIF > 3 flags a problem; because tolerance is the reciprocal of VIF, low tolerance flags the same
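VIF and tolerance come from regressing each predictor on all the other predictors; given that R squared, the sketch below applies the VIF > 3 rule of thumb from the card (the R squared value is hypothetical):

```python
def vif_and_tolerance(r2_j):
    # r2_j: R^2 from regressing predictor j on all the other predictors.
    tol = 1 - r2_j        # low tolerance = predictor is nearly redundant
    vif = 1 / tol         # VIF is the reciprocal of tolerance
    return vif, tol

vif, tol = vif_and_tolerance(0.80)   # hypothetical R^2 of x1 on x2, x3
print(vif, tol)                      # flagged by the VIF > 3 rule of thumb
```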
fixing multicollinarity
Combine X's into one, drop one or more IVs, or collect more data
Categorical variables
need coding
Different types of coding
dummy, unweighted, weighted, contrast
What will change with coding
The regression coefficients change; the ANOVA output stays the same
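A sketch of dummy coding, the first scheme on the list above: a k-level categorical predictor becomes k - 1 indicator columns, with the omitted level serving as the reference group (the group labels here are hypothetical):

```python
def dummy_code(levels, reference):
    # One 0/1 indicator column per non-reference category (k - 1 columns total).
    categories = [c for c in sorted(set(levels)) if c != reference]
    return [[1 if lv == c else 0 for c in categories] for lv in levels]

groups = ["control", "drug_a", "drug_b", "control", "drug_a"]
print(dummy_code(groups, reference="control"))
# [[0, 0], [1, 0], [0, 1], [0, 0], [1, 0]]
```

Each dummy coefficient is then interpreted as that group's mean difference from the reference group.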
Curvilinear
1st order = linear, 2nd order = quadratic, 3rd order = cubic
Detecting a curvilinear relationship
a priori theory, the scatterplot, or patterns in the residual plot
centering
Creates deviation scores around the mean of X, making the intercept meaningful and more interpretable; decreases nonessential multicollinearity
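The nonessential-multicollinearity point can be demonstrated directly: centering X before squaring it removes the correlation between the linear and quadratic terms (data hypothetical):

```python
def pearson_r(x, y):
    # Standard Pearson correlation, pure Python.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [10, 11, 12, 13, 14]                  # hypothetical raw scores
xc = [xi - sum(x) / len(x) for xi in x]   # centered: deviations around the mean

print(pearson_r(x,  [v ** 2 for v in x]))    # near 1.0: x and x^2 are collinear
print(pearson_r(xc, [v ** 2 for v in xc]))   # 0.0: nonessential collinearity removed
```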
scaling
leads to nonessential multicollinearity
enhancing interactions
strengthens the relationship
buffering
weakens or cancels out the relationship
Buffering canceling out
Life satisfaction: increased job stress is offset by decreased marital problems
Logistic regression
Predicting dichotomous outcomes
probability
Uses the percentage chance of something happening
odds
Ratio of the event happening to not happening; odds of 1:2 mean it is twice as likely NOT to happen
logits
transformed probability; linear in the predictors
Prob < .5, odds < 1
negative logit
Prob > .5, odds >1
positive logit
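The probability / odds / logit cards can be summarized as three conversions (pure Python, standard formulas):

```python
from math import log, exp

def to_odds(p):
    # Odds of the event happening vs. not happening.
    return p / (1 - p)

def to_logit(p):
    # Logit = log-odds: the linear scale logistic regression works on.
    return log(to_odds(p))

def from_logit(z):
    # Inverse (logistic function): back from log-odds to a probability.
    return 1 / (1 + exp(-z))

print(to_odds(0.75))     # 3.0 -> odds of 3:1 in favor
print(to_logit(0.75))    # positive logit, because p > .5 and odds > 1
print(to_logit(0.25))    # negative logit, because p < .5 and odds < 1
print(from_logit(0.0))   # 0.5 -> even odds
```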
Simultaneous regression
IVs treated simultaneously, on equal footing; each variable is left with its squared partial correlation; restricts the amount of variance accounted for in Y
hierarchical model
IVs are entered cumulatively according to the research design; report R^2 and partial coefficients at each point of addition to the equation; variance is partitioned by order of entry; assigns causal priority as appropriate; the first variable entered gets more of the variance in Y than later ones
stepwise
Selects predictors based on their relative contribution to the model; uses an algorithm, giving the researcher little control; a posteriori: the computer selects the greatest part r^2 and the greatest contribution to R^2; backward: the smallest statistically nonsignificant predictors are dropped; does not rely on the research question
sequential
A multivariate technique for handling missing data using sequential regression models