Week 4: Associations between categorical variables Flashcards

1
Q

What is a categorical variable?

A

Categorical variables represent types of data that can be divided into distinct groups rather than measured as quantities. Examples of categorical variables are race, sex, age group, and educational level.

2
Q

What is chi-square? (definition)

A

A non-parametric test used for inference on categorical data

3
Q

What is chi-square?

A
  • Measures the relationship between 2 or more nominal variables, asking whether the observations on one variable are contingent upon another categorical variable
  • Tests whether the observed frequency counts are what would be expected by chance, or whether there is a relationship between the categorical variables
4
Q

Examples of chi-square: testing whether 2 nominal variables are associated

A
  • Is gender associated with preferred subject?
  • Is ownership of a dog associated with residence?
  • Is smoking associated with drinking?
5
Q

Hypothesis testing

A
  • Null: there is no association between the two variables
  • Alternative: the two variables are associated
6
Q

Steps for constructing chi-squared by hand

A
  1. Note the observed frequencies in each cell in a contingency table.
  2. Calculate the expected frequency for each cell by multiplying the two relevant
    marginal totals (the row total and the column total) for that cell and dividing by the
    total number of participants in the sample.
  3. Calculate each cell's contribution to chi-square by:
    a. Calculating the difference between the observed and the expected frequency
    for that cell and squaring that number;
    b. Dividing the result by the expected frequency for that cell.
  4. Add up the results for all cells to obtain chi-square.
  5. Determine the degrees of freedom by multiplying the number of columns minus 1
    by the number of rows minus 1: df = (rows - 1) x (columns - 1).
  6. Look up the significance of chi-square in a table of critical values for those degrees of freedom.
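
The steps above can be sketched in plain Python. This is only an illustrative sketch: the function name `chi_square_by_hand` and the smoking/drinking counts are made up for the example and do not come from the course material.

```python
def chi_square_by_hand(observed):
    """Return (chi_square, df, expected) for a contingency table of raw counts."""
    # Marginal totals for each row and each column, plus the sample size N
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Step 2: expected frequency = row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Steps 3-4: sum (observed - expected)^2 / expected over all cells
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 5: df = (rows - 1) * (columns - 1)
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Hypothetical counts: rows = smoker / non-smoker, columns = drinker / non-drinker
observed = [[30, 10],
            [20, 40]]
chi_square, df, expected = chi_square_by_hand(observed)
print(round(chi_square, 2), df)  # 16.67 1
```

For step 6, the critical value for df = 1 at the .05 level is 3.84, so a chi-square of 16.67 in this made-up table would count as a significant association.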
7
Q

Assumptions of chi-squared

A

- Independence
> Observations cannot be related: each participant must fall into only one of the distinct nominal categories, never both (a between-subjects design)
- Raw frequencies
> Should be conducted on raw frequencies, not percentages
- Sample/cell size
> No expected cell frequency should be less than 1, and no more than 20% of cells should have an expected frequency less than 5
> If this is not met, categories can be collapsed
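
The sample/cell size rule can be checked before running the test. A minimal sketch, assuming the table is a list of rows of raw counts; the helper names and both example tables are hypothetical.

```python
def expected_counts(observed):
    """Expected frequency per cell: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_cell_size_rule(observed):
    """True if no expected count is below 1 and at most 20% are below 5."""
    cells = [e for row in expected_counts(observed) for e in row]
    none_below_1 = all(e >= 1 for e in cells)
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return none_below_1 and share_below_5 <= 0.20


# Hypothetical tables of raw counts
small_sample = [[2, 3], [4, 11]]    # expected counts: 1.5, 3.5, 4.5, 10.5
large_sample = [[30, 10], [20, 40]]  # expected counts: 20, 20, 30, 30

print(meets_cell_size_rule(small_sample))  # False
print(meets_cell_size_rule(large_sample))  # True
```

In the small table, three of the four expected counts fall below 5, so the rule fails and collapsing categories (or collecting more data) would be worth considering.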

8
Q

Other important things to note when reporting chi-squared

A
  • Percentages
    > Chi-square should be conducted on raw frequencies, not percentages, but percentages are useful to report in addition to the raw frequencies
  • Cramér’s V
    > Measure of effect size
  • Variance accounted for
    > We can square the effect size to see how much of the variance in one variable is accounted for by the other variable
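
As a rough illustration of the effect size, Cramér's V is usually computed as the square root of chi-square divided by N times (the smaller of the number of rows and columns, minus 1). The function name and the input values below are hypothetical, chosen only to show the arithmetic.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi_square / (N * (min(rows, columns) - 1)))."""
    return math.sqrt(chi_square / (n * (min(n_rows, n_cols) - 1)))


# Hypothetical result: chi-square of 16.667 from 100 participants in a 2 x 2 table
v = cramers_v(chi_square=16.667, n=100, n_rows=2, n_cols=2)
print(round(v, 2))       # 0.41
print(round(v ** 2, 2))  # 0.17, i.e. about 17% of variance accounted for
```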
9
Q

What are standardised residuals?

A
  • Help determine which cells are contributing to the ‘significant association’
  • They are z-scores indicating how many SDs above or below the expected count an observed count is
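
A minimal sketch of how standardised residuals can be computed, assuming the common definition (observed - expected) / sqrt(expected). The function name and the counts are hypothetical.

```python
import math


def standardised_residuals(observed):
    """Return (observed - expected) / sqrt(expected) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for row, r_total in zip(observed, row_totals):
        res_row = []
        for o, c_total in zip(row, col_totals):
            e = r_total * c_total / n  # expected count for this cell
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: rows = smoker / non-smoker, columns = drinker / non-drinker
observed = [[30, 10],
            [20, 40]]
residuals = standardised_residuals(observed)
for row in residuals:
    print([round(z, 2) for z in row])
# [2.24, -2.24]
# [-1.83, 1.83]
```

A common convention is to treat cells with a residual beyond roughly +/-1.96 as the main contributors; in this made-up table that would be the two smoker cells.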
10
Q

What are degrees of freedom?

A

- The number of independent pieces of information that went into calculating the estimate
e.g. if three numbers sum to 30 (a mean of 10) and two of them are 9 and 10, the other number must be 11
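
The idea can be shown with a small sketch: once the mean of three numbers is fixed at 10 (a sum of 30), only two of them are free to vary, and the last one is determined. The helper name `last_value` is hypothetical.

```python
def last_value(known_values, mean, n):
    """Given the mean of n numbers and n - 1 of the values, return the remaining one."""
    return mean * n - sum(known_values)


# Three numbers with a mean of 10 (sum of 30); two are known, so df = 3 - 1 = 2
print(last_value([9, 10], mean=10, n=3))  # 11
```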

11
Q

Issues with chi-square

A
  • Must be run on raw frequency counts, NOT percentages
  • Larger contingency tables can be difficult to interpret
  • So can use
    > Standardised residuals to determine the main contributors
    > Partitioning: carry out multiple 2 x 2 chi-squares
    > Combining categories
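
Combining categories amounts to adding the raw counts of the columns (or rows) being merged. A minimal sketch, with a hypothetical helper name and made-up counts:

```python
def combine_columns(observed, groups):
    """Merge columns of a table of raw counts.

    groups lists, for each new column, the original column indices to add together.
    """
    return [[sum(row[i] for i in group) for group in groups] for row in observed]


# Hypothetical 2 x 4 table in which the two middle categories are sparse
observed = [[12, 3, 2, 13],
            [8, 4, 1, 17]]

# Keep the first and last columns, merge the two middle ones
collapsed = combine_columns(observed, [[0], [1, 2], [3]])
print(collapsed)  # [[12, 5, 13], [8, 5, 17]]
```

Merging should only be done where the combined category still makes sense for the research question; the total count stays the same, so chi-square can then be recalculated on the smaller table.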