6A Flashcards
What are the three main topics of this lecture?
- Regression with binary categorical independent variables. 2. Regression with non-binary categorical independent variables. 3. How to handle ordinal variables in regression.
What are the two key characteristics used to classify variables?
- Logical order – Do the values follow a natural ranking?
- Equal distances – Are the gaps between values consistent?
What are examples of variables with a logical order?
Monthly income, education level, left-right identification, immigration attitudes.
What are examples of variables without a logical order?
Vote choice (e.g., VVD, CDA, PVV), religion (e.g., Catholic, Protestant, Muslim).
What are examples of variables with equal distances?
Monthly income, left-right identification, immigration attitudes.
What are examples of variables without equal distances?
Educational level (e.g., primary, secondary, higher education).
What are the three types of variables?
- Categorical – No logical order (e.g., vote choice). 2. Ordinal – Logical order, but unequal distances (e.g., education level). 3. Continuous – Logical order with equal distances (e.g., income).
What are binary categorical variables?
Variables with only two categories, coded as 0 and 1 (e.g., voted = 1, did not vote = 0).
What are non-binary categorical variables?
Variables with more than two categories (e.g., religion: Catholic, Protestant, Muslim).
What are dummy variables (dichotomous variables)?
Binary categorical variables coded as 0 and 1 for use in regression.
Why should the dependent variable (Y) always be continuous in linear regression?
Linear regression assumes that Y follows a continuous distribution, allowing for meaningful interpretation of coefficients.
How can binary categorical variables be included in regression?
They are directly included as independent variables, coded as 0 and 1, and interpreted the same way as continuous variables.
What does the intercept represent in a regression with a binary independent variable?
The predicted value of Y when the binary variable equals 0 (i.e., the reference group).
What does the coefficient (b₁) of a binary independent variable represent?
The difference in the dependent variable (Y) between the 1 and 0 groups.
How is regression with a binary independent variable related to a t-test?
A simple regression with a binary independent variable produces the same results as an independent-samples t-test.
Why can non-binary categorical variables not be used directly in regression?
Regression requires numerical input, so categorical variables must be converted into multiple binary variables (dummy variables).
What is a reference category in dummy coding?
The category that is not assigned a dummy variable. All other categories are compared against it.
What happens if a respondent has 0s on all dummy variables?
They belong to the reference category.
How are dummy variables used in regression?
Each category (except the reference) gets a binary variable (1 = in category, 0 = not in category).
What is an example of a dummy variable transformation?
If religion has four categories (Catholic, Protestant, Muslim, Non-religious), three dummy variables are created: Catholic (1/0), Protestant (1/0), Muslim (1/0), with Non-religious as the reference category.
How is the regression coefficient of a dummy variable interpreted?
The coefficient represents the difference in the dependent variable (Y) between that category and the reference category.
What is the advantage of using SPSS’s automatic dummy variable coding?
- It is faster and easier than manual coding. 2. It provides a p-value for the entire categorical variable. 3. It provides means for each category, aiding interpretation.
What does the F-test in regression measure for categorical variables?
It tests whether the categorical variable as a whole (all dummy variables together) has a significant effect on Y.
What does R² measure in regression?
The proportion of variance in the dependent variable (Y) explained by the independent variables.
How can the effect size of a categorical variable be evaluated?
- Unstandardized effect size – Compare category means from SPSS output. 2. Standardized effect size – Calculate the Adjusted R² change when adding the categorical variable to the model.
How is Adjusted R² Change calculated for categorical variables?
Model with categorical variable – Model without categorical variable = Adjusted R² Change (indicates how much variance is explained by the categorical variable).
What are ordinal variables?
Variables with a logical order but unequal distances between values (e.g., education level: primary, secondary, higher education).
How can ordinal independent variables be modeled in regression?
They can be treated as continuous or categorical.
What is the advantage of modeling an ordinal variable as continuous?
It simplifies the analysis and avoids unnecessary complexity if the variable behaves approximately linearly.
How can we decide whether to model an ordinal variable as categorical or continuous?
Compare the Adjusted R² values for both specifications. If the categorical model does not improve Adjusted R² substantially, use the continuous specification.
What is an example of modeling an ordinal variable as categorical vs. continuous?
In a study on education and immigration attitudes, treating education as categorical (multiple dummy variables) only slightly improved Adjusted R² compared to a continuous model, so the simpler continuous model was preferable.