Regression Flashcards
purpose of ordinary least squares regression
a technique for finding the best-fitting straight line for a set of data
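A minimal numpy sketch of the idea, using made-up toy data and the closed-form slope/intercept formulas for a single predictor:

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 with a little noise (made-up values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Closed-form OLS estimates for one predictor.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

predicted = intercept + slope * x
```

The chosen line is the one that minimizes the sum of squared differences between `y` and `predicted`.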
why use the sum of squared residuals rather than just summing residuals
some residuals are positive and some negative, so they would cancel out when summed; squaring them first prevents this
simplest model (null model)
uses the mean as the model
coefficient of determination (R^2)
amount of variance in the outcome explained by the regression line compared to that explained by the mean
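In code, with toy observed values and predictions from some already-fitted line (both made up):

```python
import numpy as np

y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])          # observed outcome
y_hat = np.array([1.06, 3.03, 5.00, 6.97, 8.94])  # model's predictions

ss_total = np.sum((y - y.mean()) ** 2)   # error of the null (mean) model
ss_residual = np.sum((y - y_hat) ** 2)   # error of the regression model
r_squared = 1 - ss_residual / ss_total   # proportion of variance explained
```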
MSm (mean squares for the model)
how much the model has improved the prediction
MSr (mean squares for the residuals)
the level of inaccuracy of the model
Spearman's correlation coefficient
a non-parametric statistic based on ranked data
can the coefficient of determination be used to determine causality?
no (correlation does not imply causation)
square of Pearson's r gives you what
the proportion of variance shared by the two variables
square of Spearman's rs gives you what
the proportion of variance in the ranks that the two variables share
can you square Kendall's tau
no
outcome variable
dependent variable
predictive variable
independent variable
simple regression
1 predictor
multiple regression
multiple predictors
residuals
the difference between what the model predicts and the observed data
how to assess error in a regression model
sum of squared residuals
F-tests are based on what
the ratio of the improvement due to the model to the inaccuracy that remains in the model (MSm / MSr)
degrees of freedom for the sum of squares of the model
number of variables in the model
degrees of freedom for the sum of squares of the residuals
number of observations minus number of parameters being estimated
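Putting the last few cards together, a toy F-ratio computation (the sums of squares are made-up numbers from some simple regression with n = 5 and one predictor):

```python
n, k = 5, 1                  # observations, predictors
ss_total = 38.9              # error of the mean (null) model
ss_residual = 0.091          # error left over after fitting the line
ss_model = ss_total - ss_residual   # improvement due to the model

ms_model = ss_model / k                  # SSm / df_model
ms_residual = ss_residual / (n - k - 1)  # SSr / (observations - parameters)
f_ratio = ms_model / ms_residual         # large F => model beats the mean
```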
standardized residuals
residuals converted to z-scores
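A quick sketch of the conversion, with made-up residuals:

```python
import numpy as np

residuals = np.array([0.04, -0.13, 0.20, -0.17, 0.06])  # toy residuals

# Divide by the residual standard deviation to get z-scores.
std_resid = residuals / residuals.std(ddof=1)
```

Rule of thumb: standardized residuals beyond about ±2 deserve a closer look, and beyond ±3 are likely outliers.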
studentized residual
the unstandardized residual divided by an estimate of its standard deviation that varies point by point
deleted residual
the adjusted predicted value minus the original observed value
Cook's distance
a measure of the overall influence of a single case on the model as a whole
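A minimal numpy sketch using the standard leverage-based formula for Cook's distance, on toy data (a simple regression with intercept and one predictor):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])
X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept + x

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]                              # parameters estimated
mse = resid @ resid / (len(y) - p)

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages (hat-matrix diagonal)
cooks = resid ** 2 / (p * mse) * h / (1 - h) ** 2
```

Values near or above 1 are the usual cause-for-concern threshold.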
Mahalanobis distance
measures the distance of cases from the mean(s) of the predictor variable(s)
what type of distribution does Mahalanobis distance have
chi-squared
how do you determine degrees of freedom for Mahalanobis distance
number of predictors
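A numpy sketch with toy data (two predictors, five cases, the last case deliberately far from the rest):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.2],
              [10.0, 3.0]])   # last case is an obvious outlier

diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
# Squared Mahalanobis distance of each case from the predictor means.
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
```

Compare `d2` against a chi-squared distribution with df = number of predictors (2 here) to flag multivariate outliers.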
DFBeta
the parameter estimated using all cases minus the estimate when one case is excluded
DFFit
the predicted value for a case when the model is calculated including that case minus the predicted value when it is excluded
if a case is not influential what would DFFit be
0
Covariance ratio
measures whether a case influences the variance of the regression parameters
assumptions of the linear model
additivity and linearity; independent errors; homoscedasticity; normally distributed errors; predictors uncorrelated with external variables; appropriate variable types
additivity
combined effect of predictors is best described by adding their effects together
Durbin-Watson test
tests for serial correlation between errors (the assumption of independent errors)
what range does the Durbin-Watson test statistic vary over
0 to 4
Durbin-Watson test statistic of 2
residuals are uncorrelated
Durbin-Watson test statistic > 2
negative correlation between adjacent residuals
Durbin-Watson test statistic < 2
positive correlation between adjacent residuals
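The statistic itself is easy to compute from residuals in observation order (toy values):

```python
import numpy as np

residuals = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3])  # in observation order

# Durbin-Watson: squared successive differences over total squared residuals.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
```

Here `dw` comes out above 2, consistent with the alternating signs (negative serial correlation).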
What happens to a linear model if the independent errors assumption is broken
CI and significance tests are invalid
estimates from the method of least squares are still valid
types of predictor variables allowed in a linear regression
quantitative or categorical (dichotomous)
types of outcome variables allowed in a linear regression
quantitative, continuous, and unbounded
unbounded variable
no constraints on the variability of the outcome
no perfect multicollinearity
no perfect linear relationship between 2+ predictors
function of cross-validation
to assess the accuracy of a model across different samples
adjusted R^2
tells us how much of the variance in Y would be accounted for if the model had been derived from the population from which the sample was taken
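The usual adjusted R^2 formula (Wherry's), sketched with made-up numbers:

```python
n, k = 30, 3          # sample size, number of predictors (made up)
r_squared = 0.60      # R^2 from the fitted model (made up)

# Adjusted R^2 shrinks R^2 toward what we'd expect in the population.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```

The penalty grows as predictors are added or the sample shrinks, so adjusted R^2 is always at or below R^2.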
Data splitting
split the data in half, run the regression equation on both halves, then compare the resulting models
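A sketch of data splitting on simulated data (true slope 2, intercept 1; everything here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + 1 + rng.normal(scale=0.5, size=200)

idx = rng.permutation(200)          # random split into two halves
slopes = []
for half in (idx[:100], idx[100:]):
    xs, ys = x[half], y[half]
    b = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    slopes.append(b)
# Similar slopes across the halves suggest the model generalizes.
```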
sample size needed for an expected large effect
77
sample size needed for an expected medium effect
160
b-value
tells us the gradient of the regression line and the strength of relationship between a predictor and the outcome
F
tells us how much variability the model can explain vs. what it doesn’t explain
hierarchical regression
predictors are selected based on past work, and the researcher decides the order in which to enter them into the model (known predictors are entered first)
forced entry
all predictors are forced into model simultaneously
stepwise regression
the order in which predictors are entered is decided purely mathematically
suppressor effects
a predictor has a significant effect, but only when another variable is held constant
forward method has a higher risk of what type of errors
type II
Akaike information criterion (AIC)
a measure of fit that penalizes the model for having more variables
perfect collinearity
at least one predictor is a perfect linear combination of the others (correlation coefficient of 1)
As collinearity increases what else increases
standard errors
test that looks at collinearity
variance inflation factor (VIF)
tolerance
1/VIF
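A sketch of computing VIF and tolerance by hand: each predictor is regressed on the others, and VIF = 1 / (1 - R^2) of that auxiliary regression (simulated data, with x2 deliberately correlated with x1):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=100)  # correlated with x1
x3 = rng.normal(size=100)                        # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF for predictor j: regress it on the remaining predictors."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
tolerances = [1 / v for v in vifs]
```

Here x1 and x2 should show inflated VIFs, while x3's stays close to 1.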
When should we be concerned about VIF
when the largest VIF is greater than 10, or the average VIF is substantially greater than 1
When should we be concerned about tolerance
below 0.2 is a potential problem
below 0.1 is a serious problem