stats 10 Flashcards

1
Q

Binary variables are usually coded as

A

0 or 1

2
Q

dummy variable trap

A

perfect multicollinearity that results from the inclusion of dummy variables representing each possible value of a categorical variable
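
A minimal sketch of the trap, using a made-up three-level variable (the `colors` data and all names are illustrative assumptions). It only checks that the full set of dummies adds up to the intercept column, which is the exact linear relationship behind the problem.

```python
# Hypothetical categorical variable with three levels (illustrative data).
colors = ["red", "green", "blue", "green", "red", "blue"]
levels = sorted(set(colors))  # ['blue', 'green', 'red']

# One 0/1 dummy column per level.
dummies = {lvl: [1 if c == lvl else 0 for c in colors] for lvl in levels}
intercept = [1] * len(colors)


def row_sums(columns):
    """Add the given dummy columns together, row by row."""
    return [sum(vals) for vals in zip(*columns)]


# With every level included, the dummies sum to the intercept column:
# an exact linear relationship, i.e. perfect multicollinearity.
trapped = row_sums(dummies.values()) == intercept

# Dropping one level (the reference category) breaks that relationship.
reference = levels[0]
kept = [col for lvl, col in dummies.items() if lvl != reference]
escaped = row_sums(kept) != intercept
```

Which level is dropped is arbitrary here; the first level in sorted order is used only for convenience.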

3
Q

Perfect multicollinearity

A

when there is an exact linear relationship between any two or more of a regression model’s independent variables.

4
Q

The coefficient for a binary X variable indicates the difference in the expected value of Y, holding the other independent variables constant, between the respective category and the

A

reference category (the one omitted in the dummy coding)
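
A small sketch of this idea for the simplest case, a regression with one binary predictor and no other variables (the data are made up for illustration). Under that assumption the OLS slope equals the difference between the two group means, and the intercept equals the mean of the reference category.

```python
# Illustrative data: x = 0 is the reference category, x = 1 the other group.
x = [0, 0, 0, 1, 1, 1]
y = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Simple OLS slope and intercept from the usual closed-form formulas.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Group means computed directly, for comparison.
mean_ref = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean_other = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
difference = mean_other - mean_ref
```

With additional independent variables in the model, the coefficient is an adjusted difference rather than the raw difference in means.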

5
Q

The coefficient for a dummy variable provides insight into

A

how the response varies across different groups

6
Q

Reference category

A

in a regression model, the value of a categorical independent variable for which we do not include a dummy variable

7
Q

Categorical independent variables can be used in

A

interactions

8
Q

Interactive models

A

multiple regression models that contain at least one independent variable that researchers create by multiplying together two or more independent variables

9
Q

Use an interaction model in multiple regression if

A

you suspect that the effect of one independent variable on the dependent variable varies depending on the level of another independent variable

10
Q

A significant interaction effect between income and voter status indicates that the increase in donations with income is greater for voters than for nonvoters, suggesting that voter status ____ the relationship between income and donations

A

moderates
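
A hedged sketch of the income and voter example with invented, noise-free numbers (the slopes of 1 and 3 are assumptions chosen for illustration). In a model with an intercept, income, voter status, and their product, the interaction coefficient equals the difference between the two groups' income slopes, so the sketch fits each group separately and takes that difference.

```python
def simple_slope(x, y):
    """OLS slope of y on x for a one-predictor regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))


# Illustrative data: donations rise by 1 per unit of income for nonvoters
# and by 3 per unit of income for voters.
income = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
voter = [0, 0, 0, 0, 1, 1, 1, 1]
donation = [10 + 3 * inc if v else 5 + 1 * inc for inc, v in zip(income, voter)]

# The interaction term is the product of the two independent variables.
interaction = [inc * v for inc, v in zip(income, voter)]

slope_nonvoters = simple_slope(
    [inc for inc, v in zip(income, voter) if v == 0],
    [d for d, v in zip(donation, voter) if v == 0])
slope_voters = simple_slope(
    [inc for inc, v in zip(income, voter) if v == 1],
    [d for d, v in zip(donation, voter) if v == 1])

# Coefficient the interaction term would receive in the full model.
interaction_coefficient = slope_voters - slope_nonvoters
```

A nonzero value here is what it means for voter status to change the income slope.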

11
Q

Moderation

A

the alteration of the relationship between two variables by a third variable, indicating that the effect of one variable on an outcome changes depending on the level or category of the modifying variable

12
Q

Interaction effects can be modeled between ____ of two categorical variables, two numeric variables, or one of each.

A

any combination

13
Q

Interaction Between a Categorical and a Numeric Variable

A

The effect of a numeric variable on the dependent variable is modified by a categorical variable

14
Q

Interaction Between Two Categorical Variables

A

In this case, the interaction term assesses how the effect of one categorical variable on the dependent variable changes based on the levels of another categorical variable

15
Q

Interaction Between Two Numeric Variables

A

Here, the interaction term assesses how the relationship between one numeric variable and the dependent variable changes at different levels of another numeric variable

16
Q

When an exposure and an outcome independently cause a third variable, that variable is termed a ____

A

‘collider’.

17
Q

Inappropriately controlling for a collider variable, by study design or statistical analysis

A

results in collider bias
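
A simulation sketch of collider bias, with every number an assumption made for illustration (sample size, noise level, and the selection threshold of 1.0). The exposure and outcome are generated independently, both feed into the collider, and restricting the sample by the collider's value makes them appear negatively related.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
n = 20000

# Exposure and outcome are independent by construction.
exposure = [random.gauss(0, 1) for _ in range(n)]
outcome = [random.gauss(0, 1) for _ in range(n)]
# Both cause the collider (plus some unrelated noise).
collider = [e + o + random.gauss(0, 0.5) for e, o in zip(exposure, outcome)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Whole sample: correlation is close to zero.
corr_all = pearson(exposure, outcome)

# "Controlling" for the collider by selecting only cases with a high value.
selected = [(e, o) for e, o, c in zip(exposure, outcome, collider) if c > 1.0]
corr_selected = pearson([e for e, _ in selected], [o for _, o in selected])
```

Selecting on the collider stands in for conditioning by study design; adding the collider as a regression covariate would be the statistical-analysis counterpart.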

18
Q

Influential case

A

a case in a regression model that has either a combination of large leverage and a large squared residual, or a large DFBETA score

19
Q

A case can be influential if it has large

A

leverage

20
Q

Leverage

A

in a regression model, the degree to which an individual case is unusual in terms of its value for a single independent variable, or its particular combination of values for two or more independent variables
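
A short sketch of leverage for the one-predictor case, using invented x values. It assumes the standard simple-regression formula h_i = 1/n + (x_i - mean(x))^2 / Sxx, which is the diagonal of the hat matrix when the model has an intercept and a single independent variable.

```python
# Illustrative predictor values; the last case is far from the others.
x = [1.0, 2.0, 3.0, 4.0, 20.0]

n = len(x)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)

# Leverage of each case: grows with squared distance from the mean of x.
leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Index of the most unusual case in terms of its x value.
most_unusual = max(range(n), key=lambda i: leverage[i])

# Leverages sum to the number of estimated parameters (intercept + slope).
total_leverage = sum(leverage)
```

Leverage depends only on the independent variables, so a high-leverage case is not necessarily influential unless its residual is also large.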

21
Q

A case can be influential if it has a large

A

squared residual value.

22
Q

A large residual value indicates that the observed data point deviates markedly from

A

the predicted outcome

23
Q

A case can be influential if it has both

A

large leverage and a large squared residual value

24
Q

DFBETA is a diagnostic measure used in regression analysis to

A

assess the influence of individual data points on the estimated coefficients of the model. It quantifies the change in each regression coefficient when a specific observation is removed from the dataset.
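
A leave-one-out sketch of that idea for the slope of a one-predictor regression, with made-up data. It computes the raw change in the slope when each case is dropped; the scaled DFBETAS reported by statistical software additionally divides by a standard error, which this sketch leaves out.

```python
def slope(x, y):
    """OLS slope of y on x for a one-predictor regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))


# Illustrative data: y = 2x for the first five cases, the sixth is far off.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]

full_slope = slope(x, y)

# Raw DFBETA for the slope: full-sample estimate minus the estimate
# obtained after deleting case i.
dfbeta = [
    full_slope - slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    for i in range(len(x))
]

# The case whose removal moves the slope the most.
most_influential = max(range(len(x)), key=lambda i: abs(dfbeta[i]))
```

Dropping the sixth case returns the slope to 2, so its DFBETA is the largest in absolute value.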

25
Q

DFBETA score

A

a statistical measure of the influence of an individual case on the value of a single parameter estimate

26
Q

How to deal with influential cases in regression

A
  1. Check for data collection or management problems.
  2. Don’t do anything.
  3. Delete the relevant observations.
  4. Dummy out the influential cases.
27
Q

Dummying out

A

adding a dummy (binary) variable to a regression model to measure and isolate the effect of an influential observation
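
A sketch of dummying out one case, with invented data and a small hand-written solver (the `ols` helper is hypothetical, written here only to keep the example self-contained). It illustrates a known property: giving a single observation its own dummy yields the same intercept and slope as deleting that observation, while the dummy's coefficient measures how far the case sits from the line fitted to the rest.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Illustrative data: y = 2x except for the last, influential case.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]

# Model 1: intercept, x, and a dummy equal to 1 only for the last case.
X_dummy = [[1.0, xi, 1.0 if i == 5 else 0.0] for i, xi in enumerate(x)]
b_dummy = ols(X_dummy, y)  # [intercept, slope, dummy coefficient]

# Model 2: the same regression with the last case deleted.
X_drop = [[1.0, xi] for xi in x[:5]]
b_drop = ols(X_drop, y[:5])  # [intercept, slope]
```

Unlike deletion, the dummied-out model keeps the case in the dataset and reports its effect explicitly.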

28
Q

High multicollinearity

A

in a multiple regression model, when two or more of the independent variables in the model are extremely highly correlated with one another, making it difficult to isolate the distinct effects of each variable

29
Q

Signs of potential multicollinearity

A
  • two or more of your independent variables are theoretically associated,
  • two or more of your independent variables are known to correlate,
  • the standard errors for your beta coefficients are large, or
  • the R² is unexpectedly large
30
Q

Micronumerosity

A

a situation in statistical analysis where the number of observations or data points is very small relative to the number of variables being analyzed.
* This condition can lead to several issues, including overfitting, unreliable estimates of model parameters, and difficulty in generalizing findings to a larger population.
* When a dataset is micronumerous, there may not be enough data to adequately capture the relationships between variables

31
Q

If you detect multicollinearity and cannot get more data, then you need to calculate and report the

A

model variance inflation factor (VIF).

32
Q

Variance inflation factor (VIF)

A

a statistical measure to detect the contribution of each independent variable in a multiple regression model to overall multicollinearity

33
Q

To calculate VIF, estimate an

A

auxiliary regression model

34
Q

Auxiliary regression model:

A

a model in which one of the independent variables, Xj, becomes the dependent variable and all of the other independent variables remain independent variables
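
A sketch of how the auxiliary regression feeds into the VIF, assuming the standard formula VIF_j = 1 / (1 - R_j²), where R_j² is the R² of the auxiliary regression for Xj. The data are invented, and only two independent variables are used so that the auxiliary regression has a single predictor.

```python
# Illustrative independent variables from a hypothetical two-predictor model.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(x1)
mean_1 = sum(x1) / n
mean_2 = sum(x2) / n

# Auxiliary regression: x1 is the dependent variable, x2 the predictor.
aux_slope = (sum((b - mean_2) * (a - mean_1) for a, b in zip(x1, x2))
             / sum((b - mean_2) ** 2 for b in x2))
aux_intercept = mean_1 - aux_slope * mean_2
fitted = [aux_intercept + aux_slope * b for b in x2]

# R-squared of the auxiliary regression.
ss_residual = sum((a - f) ** 2 for a, f in zip(x1, fitted))
ss_total = sum((a - mean_1) ** 2 for a in x1)
r_squared = 1 - ss_residual / ss_total

# Variance inflation factor for x1.
vif_x1 = 1 / (1 - r_squared)
```

With more than two independent variables, the auxiliary regression would include all the remaining predictors, and the same formula would be applied to its R².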