Categorical Variables in Regression Flashcards
Degrees of freedom
For correlations & simple linear regression, df = n - 2 (e.g. n = 30 participants gives df = 28)
For multiple regression, report the df values given in the ANOVA table of the output
Predictor Variables
Typically regressions use interval-level data
Regressions are robust, so ordinal or categorical predictors can also be used
Categorical variables
Can include categorical predictors in regression analysis
Binary categorical (2 categories) is easier to handle than multicategorical (>2 non-ordinal categories)
Still cannot use categorical outcome variables in normal linear regression (that requires logistic regression)
Dummy variables
The way we include binary variables as predictors in a regression is to use dummy variables
Rather than using the nominal category labels in our binary categorical data, we code one category as 0 & the other as 1
Then we can look at the effect on our outcome variable of this predictor changing from 0 to 1 (sketched below)
This lets us estimate a beta (B) value for categorical data
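A minimal sketch of binary dummy coding, using pandas & statsmodels; the variable names (group, score) and the data are made up for illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: a binary category and a continuous outcome
df = pd.DataFrame({
    "group": ["control", "patient", "control", "patient", "patient", "control"],
    "score": [10.1, 13.2, 9.8, 14.0, 12.7, 10.5],
})

# Dummy-code the binary category: control = 0, patient = 1
df["group_dummy"] = (df["group"] == "patient").astype(int)

# The B value for group_dummy estimates the change in score
# as the predictor moves from 0 (control) to 1 (patient)
model = smf.ols("score ~ group_dummy", data=df).fit()
print(model.params)
```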
Why code dummy variables
Regressions can only handle numerical variables, so category labels must be coded as numbers
Dummy coding for multicategorical variables
Need to create a dummy variable for all but one of your category levels
This works by keeping one category as the reference category that all the other categories in your variable are compared to (usually neurotypicals)
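A minimal sketch of dummy coding a 3-level categorical predictor with pandas; 'diagnosis' and its levels are made-up names for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "diagnosis": ["neurotypical", "ASD", "ADHD",
                  "neurotypical", "ASD", "ADHD"],
})

# Put the reference category first so drop_first removes it:
# k categories -> k - 1 dummy variables
df["diagnosis"] = pd.Categorical(
    df["diagnosis"], categories=["neurotypical", "ASD", "ADHD"]
)
dummies = pd.get_dummies(df["diagnosis"], drop_first=True, dtype=int)

# Columns are ASD & ADHD; each B compares that group to neurotypicals
print(dummies)
```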
How logistic regression lets us predict a binary outcome
Logistic regression is only for binary outcomes
Trying to fit a straight line of best fit won't do much good for a binary outcome
Rather than looking at simple linear effects, can approximate the change with a sigmoid function (S-shape)
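A minimal sketch of that sigmoid function in Python (b0 & b1 stand in for a fitted intercept & slope, chosen here just for illustration):

```python
import numpy as np

def sigmoid(x, b0=0.0, b1=1.0):
    # Predicted probability that the binary outcome equals 1 at value x
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Probabilities rise in an S-shape rather than a straight line
print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # approx. [0.018, 0.5, 0.982]
```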
Means that logistic regression is based on different set of statistical assumptions
No more assumption of linearity
In logistic regression, aren’t looking at linear effects of X’s on Y
Means that our usual regression assumptions & parametric estimates break down
Changing our assumptions means that logistic regression can’t use the same parametric tests
Regression model
Effects we’re looking at are non-linear so we can’t use an ANOVA, but other maths can give values interpreted in a similar way
SPSS gives 2 estimates of what the equivalent R^2 would be in a linear regression (Cox & Snell R^2 and Nagelkerke R^2)
Can report & interpret these just like standard R^2
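A hedged sketch of computing those two pseudo-R^2 estimates from a fitted statsmodels Logit result; the name `result` is an assumption (something like `sm.Logit(y, X).fit()`), and SPSS reports the same quantities in its output:

```python
import numpy as np

def pseudo_r2(result):
    # result: a fitted statsmodels Logit result (assumed)
    n = result.nobs
    ll_model = result.llf    # log-likelihood of the fitted model
    ll_null = result.llnull  # log-likelihood of the intercept-only model
    # Cox & Snell R^2: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1 - np.exp((2 / n) * (ll_null - ll_model))
    # Nagelkerke R^2 rescales Cox & Snell so its maximum is 1
    nagelkerke = cox_snell / (1 - np.exp((2 / n) * ll_null))
    return cox_snell, nagelkerke
```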
Fit statistics
Can use fit statistics, such as the -2 log likelihood (-2LL)
Not a significant/non-significant outcome
The lower the -2LL score, the better the model fits the data
Always relative; what counts as a low or high score depends on the sample size
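A sketch of getting -2LL from a fitted model and comparing two nested models with a likelihood-ratio test; `result_small` & `result_full` are assumed fitted statsmodels Logit results:

```python
from scipy import stats

neg2ll_small = -2 * result_small.llf  # -2LL of the simpler model
neg2ll_full = -2 * result_full.llf    # -2LL of the model with extra predictors

# Lower -2LL = better fit; the drop in -2LL follows a chi-square
# distribution with df = number of extra predictors
lr_stat = neg2ll_small - neg2ll_full
df_diff = result_full.df_model - result_small.df_model
p_value = stats.chi2.sf(lr_stat, df_diff)
```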
Odds ratios
For each predictor in a logistic regression, report the B value (italicised)
Also report the odds ratio (OR): the change in the odds of the outcome for a one-point increase in the corresponding predictor
Odds are another way of expressing probability
If the OR for a predictor is <1, that predictor makes the outcome variable less likely (like a negative B value)
If the OR for a predictor is >1, that predictor makes the outcome more likely (like a positive B value)
Then we test whether the ORs differ significantly from 1 using 95% confidence intervals
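A sketch of pulling ORs & their 95% CIs out of a fitted statsmodels Logit result (`result` is assumed, as above):

```python
import numpy as np

odds_ratios = np.exp(result.params)  # OR = e^B for each predictor
or_ci = np.exp(result.conf_int())    # conf_int() defaults to 95% intervals

# An OR whose 95% CI excludes 1 differs significantly from 1:
# OR < 1 -> outcome less likely; OR > 1 -> outcome more likely
print(odds_ratios)
print(or_ci)
```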
Multicategorical outcome variables
Can group a multicategorical outcome into binary categories before running a logistic regression (sketched below)
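A minimal sketch of that grouping step in pandas ('diagnosis' and its levels are made-up names, as above):

```python
import pandas as pd

df = pd.DataFrame({
    "diagnosis": ["neurotypical", "ASD", "ADHD", "neurotypical"],
})

# Collapse >2 categories into a binary outcome:
# 0 = neurotypical, 1 = any diagnosis
df["any_diagnosis"] = (df["diagnosis"] != "neurotypical").astype(int)
print(df)
```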