Lecture 7: Binary Predictors Flashcards

1
Q

Binary variables

A

Can be nominal or ordinal

Nominal: biological sex (male/female), student ethnicity (Dutch/foreign student), has a tattoo, has a pet (Yes/no)

Ordinal: performance on exam question (fail/pass), or risk of disease (low/high)

2
Q

Regression with binary predictors

A
  • Assign the value 0 to one of the categories (e.g., women)
  • This is the “reference category”
  • Assign the value 1 to the other category (e.g., men)
  • You can enter your data this way when creating your dataset, or “recode” existing variables
  • Regression will estimate the mean of the reference category and test the difference between the two categories
3
Q

Dummy coding

A

Used to represent a binary variable numerically so that the regression results are meaningful. Dummy codes represent the binary variables: dummy coding assigns the value 0 to one category (the reference category) and the value 1 to the other category. When we include this dummy variable as the predictor in a bivariate linear regression analysis, it will estimate the mean value of the reference category and test the difference between the means of the two categories.
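A minimal sketch of this idea, using made-up shoe-size numbers (not from the lecture): with a 0/1 dummy predictor, the ordinary least-squares intercept equals the mean of the reference group, and the slope equals the difference between the two group means.

```python
# Hypothetical data: 0 = woman (reference category), 1 = man
sex = [0, 0, 0, 0, 1, 1, 1, 1]
shoe = [38, 39, 37, 40, 43, 44, 42, 45]

# Ordinary least-squares estimates for Y = a + b * X
n = len(sex)
mx = sum(sex) / n
my = sum(shoe) / n
b = sum((x - mx) * (y - my) for x, y in zip(sex, shoe)) \
    / sum((x - mx) ** 2 for x in sex)        # slope
a = my - b * mx                              # intercept

# Group means computed directly
mean_women = sum(y for x, y in zip(sex, shoe) if x == 0) / sex.count(0)
mean_men = sum(y for x, y in zip(sex, shoe) if x == 1) / sex.count(1)

print(a, b)                                  # intercept, slope
print(mean_women, mean_men - mean_women)     # the same two numbers
```

Running this shows that `a` reproduces the women's mean and `b` reproduces the men-minus-women difference, which is exactly what the card claims dummy coding buys you.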

4
Q

Regression formula with binary variables

A

Yi = a + b * Xi

With the example of shoe size, the formula would look like this:

Yi = individual predicted value of shoe size

Xi = sex (0 = woman, 1 = man)

a = intercept (the predicted shoe size for the reference category, women)

b = slope (the difference in predicted shoe size between men and women)

When we fill this formula in for women, we’ll see Yi = a + b * 0 = a

When we fill this formula in for men, we’ll see Yi = a + b * 1 = a + b

The predicted shoe size for men is the intercept (a) plus the difference between men and women (b)
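Filling in the formula with assumed example values (a = 38.5 and b = 5.0 are hypothetical, chosen only to make the arithmetic concrete):

```python
a = 38.5   # intercept: predicted shoe size for women (X = 0); hypothetical value
b = 5.0    # slope: difference men minus women; hypothetical value

def predict(x):
    """Y = a + b * X for a 0/1 dummy-coded sex variable."""
    return a + b * x

print(predict(0))   # women: a         -> 38.5
print(predict(1))   # men:   a + b     -> 43.5
```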

5
Q

Independent samples T-test

A

Used to compare the means of two independent groups. It is equivalent to the T-test of the slope in regression with a binary predictor
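The equivalence the card states can be checked numerically. A sketch with made-up data: the pooled-variance t statistic equals the t statistic for the slope in a regression on a 0/1 dummy.

```python
from math import sqrt

# Hypothetical data for two independent groups
women = [38, 39, 37, 40]   # X = 0
men   = [43, 44, 42, 45]   # X = 1

n0, n1 = len(women), len(men)
m0, m1 = sum(women) / n0, sum(men) / n1

# Pooled-variance independent samples t-test
ss0 = sum((y - m0) ** 2 for y in women)
ss1 = sum((y - m1) ** 2 for y in men)
sp2 = (ss0 + ss1) / (n0 + n1 - 2)              # pooled variance
t_ttest = (m1 - m0) / sqrt(sp2 * (1 / n0 + 1 / n1))

# Regression slope t statistic: b / SE(b), with X the 0/1 dummy
x = [0] * n0 + [1] * n1
y = women + men
mx, my = sum(x) / len(x), sum(y) / len(y)
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
resid_var = (ss0 + ss1) / (len(y) - 2)         # residuals = deviations from group means
t_slope = b / sqrt(resid_var / sxx)

print(t_ttest, t_slope)                        # identical values
```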

6
Q

Assumptions in the independent samples T-test

A

Has the same assumptions as bivariate linear regression, with slight nuances:

  1. Linearity of relationship between X and Y
    - The difference between two groups is linear by definition, so the linearity assumption is automatically satisfied
  2. Normality of residuals: the outcome is normally distributed in each group.
  3. Independence of observations
  4. Homoscedasticity: equality of variances in both groups. Can be tested by using Levene’s test.
7
Q

Levene’s test

A

We use this test to check the assumption of homoscedasticity, i.e., to check for equal variances. It is an F-test, in which 2 sources of variance are compared.

The null hypothesis of Levene’s test is that the population variances of the groups are equal.

The alternative hypothesis is usually that at least one of the groups has a different variance than the others.

Levene’s test then produces a p-value that helps you assess the likelihood of observing the calculated differences in variances if the null hypothesis were true. If the p-value is below a predetermined significance level, you may reject the null hypothesis. This suggests that there is evidence to conclude that at least one group has a different variance.
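A sketch of the mean-centred version of Levene's test for two groups, with hypothetical data: transform each value to its absolute deviation from its group mean, then run a one-way ANOVA (an F-test) on the transformed values.

```python
from math import fsum

# Hypothetical data: two groups with visibly different spread
g1 = [38, 39, 37, 40]
g2 = [43, 44, 42, 48]

def absdev(group):
    """Absolute deviations of each value from the group mean."""
    m = fsum(group) / len(group)
    return [abs(y - m) for y in group]

z1, z2 = absdev(g1), absdev(g2)
n1, n2 = len(z1), len(z2)
m1, m2 = fsum(z1) / n1, fsum(z2) / n2
grand = fsum(z1 + z2) / (n1 + n2)

# The two sources of variance compared in the F-test:
ssb = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2   # between groups
ssw = fsum((z - m1) ** 2 for z in z1) \
    + fsum((z - m2) ** 2 for z in z2)                   # within groups

# F = (SSB / df_between) / (SSW / df_within), here df = 1 and 6
F = (ssb / 1) / (ssw / (n1 + n2 - 2))
print(F)   # compare with the F(1, 6) distribution to get the p-value
```

In practice you would use a library routine such as `scipy.stats.levene` rather than hand-rolling this; the sketch only shows which variances the F-test compares.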

8
Q

Effect size measures

A

Help us understand how big or strong an effect or relationship is in our research. Effect size measures standardise the difference between the group means, making them interpretable on a meaningful scale (i.e., a scale expressed in standard deviations).

9
Q

Cohen’s D

A

Tells you how much the mean of 2 groups differs on average, considering the variability within the 2 groups.

Calculated as d = (X1 - X2) / Spooled (where X1 and X2 are the two group means and Spooled is the pooled standard deviation, reflecting the average spread of all data points around their group mean). Note that the whole mean difference is divided by Spooled.

Rule of thumb:
Small effect size: d = 0.2 (mean difference is around 1/5th of the standard deviation)
Medium effect size: d = 0.5 (mean difference is about 1/2 of the standard deviation)
Large effect size: d = 0.8 (mean difference is around 4/5th of the standard deviation)

The larger d is, the more the means of the 2 groups differ relative to the variability within them.
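The formula on this card can be sketched directly; the data below are made up for illustration.

```python
from math import sqrt

# Hypothetical data for two groups
g1 = [38, 39, 37, 40]
g2 = [43, 44, 42, 45]

n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2

# Sample variances of each group (denominator n - 1)
v1 = sum((y - m1) ** 2 for y in g1) / (n1 - 1)
v2 = sum((y - m2) ** 2 for y in g2) / (n2 - 1)

# Pooled standard deviation, then Cohen's d = (X1 - X2) / Spooled
s_pooled = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
d = (m2 - m1) / s_pooled

print(round(d, 2))   # well above 0.8, so a large effect by the rule of thumb
```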
