stats 10 Flashcards
Binary variables are usually coded as
0 or 1
dummy variable trap
perfect multicollinearity that results from the inclusion of dummy variables representing each possible value of a categorical variable
Perfect multicollinearity
when there is an exact linear relationship between any two or more of a regression model’s independent variables.
The coefficient for the binary X variable indicates the difference in the Y variable between the respective category and the
reference category (the one omitted in the dummy coding)
This coefficient provides insights into
how the response varies across different groups
Reference category
in a regression model, the value of a categorical independent for which we do not include a dummy variable
Categorical independent variables can be used in
interactions
Interactive models
multiple regression models that contain at least one independent variable that researchers create by multiplying together two or more independent variables
Use an interaction model in multiple regression if
you suspect that the effect of one independent variable on the dependent variable varies depending on the level of another independent variable
A significant interaction effect between income and voter status indicates that the increase in donations with income is greater for voters than for nonvoters, suggesting that voter status —- the relationship between income and donations
moderates
Moderation
the alteration of the relationship between two variables by a third variable, indicating that the effect of one variable on an outcome changes depending on the level or category of the modifying variable
Interaction effects can be modeled between — of two categorical variables, two numeric variables, or one of each.
any combination
Interaction Between a Categorical and a Numeric
Variable
The effect of a numeric variable on the dependent variable is modified by a categorical variable
Interaction Between Two Categorical Variables
In this case, the interaction term assesses how the effect of one categorical variable on the dependent variable changes based on the levels of another categorical variable
Interaction Between Two Numeric Variables
Here, the interaction term assesses how the relationship between one numeric variable and
the dependent variable changes at different levels of another numeric variable
When an exposure and an outcome independently
cause a third variable, that variable is termed a —
‘collider’.
Inappropriately controlling for a collider variable, by study design or statistical analysis
results in collider bias
Influential case
a case in a regression model
which has either a combination of large leverage and a large squared residual or a large DFBETA score
An influential case can be influential if it has large
leverage.
Leverage
in a regression model, the degree to which an individual case is unusual in terms of its value for a single independent variable, or its particular combination of values for two or more independent variables
A case can be influential if it has a large
squared residual value.
A large residual value indicates that the
observed data point deviates markedly from
the predicted outcome
A case can be influential if it has both
large leverage and a large squared residual value
DFBETA is a diagnostic measure used in regression analysis to
assess the influence of
individual data points on the estimated coefficients
of the model. It quantifies the change in each regression coefficient when a specific observation is removed from the dataset.
DFBETA score
A statistical measure for the
calculation of the influence of an individual case on the value of a single parameter estimate
How to deal with influential cases in regression
- Check for data collection or management problems.
- Don’t do anything.
- Delete the relevant observations.
- Dummy out the influential cases
Dummying out
adding a dummy (binary) variable to a regression model to measure and isolate the effect of an influential observation
High multicollinearity
in a multivariate regression model, when two or more of the independent variables in the model are extremely
highly correlated with one another, making it difficult to isolate the distinct effects of each variable
Signs of potential multicollinearity
- two or more of your independent variables are
theoretically associated, - two or more of your independent variables are
known to correlate, - the standard errors for your Beta coefficients are large, or
- the R2 is unexpectedly large
Micronumerosity
a situation in statistical
analysis where the number of observations or data
points is very small relative to the number of
variables being analyzed.
* This condition can lead to several issues,
including overfitting, unreliable estimates of
model parameters, and difficulty in generalizing
findings to a larger population.
* When a dataset is micronumerous, there may
not be enough data to adequately capture the
relationships between variables
If you detect multicollinearity and cannot get more data, then you need to calculate and report the
model variance inflation factor (VIF).
Variance inflation factor (VIF)
a statistical measure to detect the contribution of each independent variable in a multiple regression model to overall multicollinearity
To calculate VIF, estimate an
auxiliary regression
model
Auxiliary regression model:
a model in which
one of the independent variables, Xj, becomes the
dependent variable and all of the other independent variables remain independent variables