The analysis of categorical data Flashcards

1
Q

Categorical and ordinal data?

A

Data that are not continuous are categorical. If the categories are ordered, e.g. tumour stage I-IV, the data are ordinal; if not, e.g. ABO blood group, they are unordered. Unlike continuous data, the categories do not sit on a numerical scale: a patient at stage IV is not twice as bad as one at stage II.

2
Q

Parameters in binary data?

A

The only parameter is the proportion of the population that has the attribute (π). It can also be interpreted as the probability that a randomly chosen member of the population has the attribute. As before, π is unknown and must be estimated from a sample.

3
Q

Estimating π?

A

Very simple. Of n patients, r have the attribute and n-r do not. The estimator of π is simply r/n, or 100*r/n as a percentage. Although this is effectively the mean of a set of 0s and 1s, different methods are needed.

4
Q

Why do we need new methods for binary data?

A

It is to do with the SD. For continuous variables there are two parameters, μ (estimated by the sample mean m) and σ, and σ can take any value whatever μ is. For binary data there is only one parameter, π, which is clearly the analogue of μ, so the SD must be determined by the mean.

5
Q

SD/SE in binary?

A

Look at how well r/n estimates π: as this incorporates the sample size it is SE-like. For a continuous variable SE = σ/√n; for a binary variable SE = √[π(1-π)/n]. Both have n in the denominator, so the SE shrinks as the sample gets larger. The key difference is that once π has been estimated, no other parameter is needed to describe the spread. This is why the methods used for continuous variables must change.
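
A minimal sketch of the estimate and its SE. The function name and the counts are hypothetical, and the unknown π in the SE formula is replaced by its estimate r/n, which is an assumption of this illustration.

```python
import math

def proportion_se(r, n):
    """Return (p, se): p = r/n estimates pi, se = sqrt(p(1-p)/n)."""
    p = r / n
    # Plug-in SE: the unknown pi is replaced by its estimate p (assumption).
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical counts: 20 of 100 patients have the attribute.
p, se = proportion_se(20, 100)

# Same proportion with 100 times the sample size: the SE is 10 times smaller.
p_big, se_big = proportion_se(2000, 10000)

# Boundary case: if nobody has the attribute, the estimated SE is 0.
p_zero, se_zero = proportion_se(0, 10)
```

Nothing other than r and n is needed: once the proportion is estimated, the spread follows from it.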

6
Q

Example for why binary SE formula used?

A

If π=0, it is impossible for r (the number in the sample with the attribute) to be anything other than 0, so the SE must be 0. This is why π appears as a factor in the numerator. Similarly, if π=1 then r must equal n and there can be no error. This is why (1-π) is also a factor in the numerator: again it gives 0 and no error.

7
Q

What is χ2 test an analogue of?

A

The unpaired t test!

8
Q

Null hypothesis in χ2?

A

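That π1=π2, i.e. that the proportion with the attribute is the same in the two populations from which the samples were drawn. The test works out how many in each group would be expected to have the outcome (e.g. die) if the population proportions were equal, and compares these expected counts with the observed ones.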
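
The expected counts and the resulting statistic can be sketched as follows. The table is hypothetical, the function names are illustrative, no continuity correction is applied, and the P value shortcut via `math.erfc` holds only for 1 degree of freedom (a 2*2 table).

```python
import math

def expected_counts(table):
    """Expected counts if the row proportions were equal (the null)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical table: rows = two groups, columns = (died, survived).
observed = [[10, 40], [20, 30]]
expected = expected_counts(observed)
stat = chi_squared(observed)

# For 1 degree of freedom only: P(chi2 > stat) = erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

# A table that matches its own expected counts gives a statistic of 0.
stat_null = chi_squared([[15, 35], [15, 35]])
```

Every term in the sum is a square divided by a positive number, which is why the statistic cannot be negative.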

9
Q

χ2 size and sign?

A

It can never be negative, and is 0 only when the observed and expected tables are exactly identical.

10
Q

Things to remember about χ2?

A

The data must be counts, not percentages. This is because the test must account for sample size: 2/10 is not the same strength of evidence as 200/1000, and the latter is much more sensitive to departures from the expected values. The counts must also be independent, i.e. each unit is counted once only (remember the example of counting the same children again and again). Otherwise the P value becomes much more convincing purely because the counts are larger, although the difference is no more real. A useful check is that the grand total in the bottom right corner of the table margins equals the number of independent units.
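
A small sketch of why repeated counting is misleading. It uses the standard shortcut formula for a 2*2 table, with hypothetical counts; the second table has identical percentages but every unit counted ten times.

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared for the 2*2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table with each unit counted once.
once = chi_squared_2x2(10, 40, 20, 30)

# Same percentages, but each unit counted ten times over.
tenfold = chi_squared_2x2(100, 400, 200, 300)

# The statistic scales with the counts even though the proportions are unchanged.
ratio = tenfold / once
```

The percentages in the two tables are identical, yet the statistic is ten times larger in the second, so the P value would look far more convincing without any more real evidence.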

11
Q

When is χ2 not appropriate?

A

When any expected value is less than 5 in a 2*2 table. If the table is larger, when over 20% of cells have expected values below 5, or any cell has an expected value below 1. Use Fisher’s exact test instead!
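
The rule of thumb above could be checked along these lines. This is only a sketch: the function names and the example tables are hypothetical.

```python
def expected_counts(table):
    """Expected counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared_is_appropriate(table):
    """Apply the expected-count rule of thumb described on this card."""
    cells = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        # 2*2 table: every expected count must be at least 5.
        return all(e >= 5 for e in cells)
    # Larger table: at most 20% of cells below 5, and none below 1.
    n_small = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and n_small / len(cells) <= 0.20

small_2x2 = chi_squared_is_appropriate([[2, 8], [4, 6]])          # expected counts of 3
large_2x2 = chi_squared_is_appropriate([[10, 40], [20, 30]])      # expected counts 15 and 35
sparse_2x3 = chi_squared_is_appropriate([[1, 20, 30], [1, 25, 23]])  # one expected count below 1
```

Note that the check is on the expected counts, not the observed ones.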

12
Q

Why is Fisher’s different to others?

A

It calculates the P value directly from the data, rather than via a test statistic such as χ2 or t.

13
Q

Why is it called Fisher’s exact?

A

There is no need for the asymptotic approximation seen in χ2: the P value is worked out from the actual tables that could have arisen.
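
To make "using the actual tables" concrete, here is a minimal sketch for a 2*2 table. It enumerates every table with the same margins and sums the probabilities of those no more likely than the observed one, which is one common two-sided convention (an assumption here, as the card does not say which is meant). The counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lowest, highest = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small tolerance so that ties with the observed table are included.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )

# Hypothetical small table where chi-squared would not be appropriate.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

The number of possible tables grows quickly with the counts, which is where the computing cost comes from.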

14
Q

Why not use Fisher’s all the time?

A

Computing power. It also tends to give wide CIs.

15
Q

Finding probability from odds?

A

If odds = 2, then probability = 2/(1+2) = 2/3. In general, probability = odds/(1+odds).

16
Q

Finding odds from probability?

A

If probability = 2/3, then odds = (2/3)/(1-2/3) = 2, i.e. 2:1. In general, odds = probability/(1-probability).
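
The two conversions are inverses of each other, as this short sketch shows (the function names are illustrative):

```python
def odds_to_probability(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)

def probability_to_odds(p):
    """odds = p / (1 - p); undefined when p = 1."""
    return p / (1 - p)

prob = odds_to_probability(2)        # odds of 2:1
odds = probability_to_odds(2 / 3)    # back to the odds
```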

17
Q

Three ways for describing a difference between parameters π1 and π2?

A
  1. Absolute difference D = π1-π2
  2. Relative risk R = π1/π2
  3. Odds ratio OR = [π1/(1-π1)] / [π2/(1-π2)], i.e. the ratio of the two odds, each calculated from its π in the usual way
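
A quick sketch computing all three measures from the same pair of proportions. The values of π1 and π2 are hypothetical, chosen only to show that the three measures give different numbers.

```python
def compare_proportions(pi1, pi2):
    """Return (absolute difference, relative risk, odds ratio)."""
    difference = pi1 - pi2
    relative_risk = pi1 / pi2
    odds_ratio = (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))
    return difference, relative_risk, odds_ratio

# Hypothetical proportions: 20% in group 1, 10% in group 2.
d, rr, odds_ratio = compare_proportions(0.2, 0.1)

# Equal proportions give the null values of each measure.
d0, rr0, or0 = compare_proportions(0.3, 0.3)
```

With these inputs D is 0.1, R is 2 and the OR is 2.25; the OR is close to R only when both proportions are small.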
18
Q

Null hypothesis for absolute difference, relative risk and odds ratio?

A

D = 0, R = 1, OR = 1

19
Q

Significance of CIs for D?

A

If P<0.05 for the test of π1=π2, then the 95% CI for D will necessarily exclude the null value, 0.

20
Q

Standard error for OR?

A

First: SE(ln OR) = √(1/34 + 1/68 + etc.), i.e. the square root of the sum of the reciprocals of the four counts seen in the table.

21
Q

Confidence intervals for lnOR?

A

Calculate the OR, then ln OR, then SE(ln OR). The 95% CI for ln OR is ln OR ± 1.96*SE(ln OR); then take antilogs of the two limits to get the 95% CI for the OR.
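
The steps can be sketched as below. The 2*2 counts are hypothetical and the function name is illustrative; the SE is the square root of the sum of the reciprocals of the four cell counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and 95% CI for the table [[a, b], [c, d]] via the log scale."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # The interval is symmetric on the log scale, then back-transformed.
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical table: rows = two groups, columns = (event, no event).
odds_ratio, lower, upper = odds_ratio_ci(10, 40, 20, 30)

midpoint = (lower + upper) / 2             # arithmetic midpoint of the CI
geometric_mean = math.sqrt(lower * upper)  # equals the OR
```

Because the interval is built on the log scale, the OR is the geometric mean of the limits rather than their arithmetic midpoint.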

22
Q

Confidence interval significance for lnOR?

A

Again, if P<0.05, then the 95% CI for the OR will not include the null value, i.e. 1 (equivalently, the CI for ln OR will not include 0).

23
Q

Midpoint of CIs?

A

For most things, the midpoint of the 95% CI will be your point estimate. This is not the case for the OR, because its CI is symmetric on the log scale and not on the OR scale itself.

24
Q

What does interaction mean?

A

The effect of one variable depends on the level of another, i.e. the second variable is an effect modifier.

25
Q

Why does a difference in P values between subgroups not mean that there is an interaction?

A

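Because the P value is a composite of the standard error and the treatment effect. Two subgroups can give different P values purely because their SEs differ, even when the treatment effects are similar.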

26
Q

Calculating SE of difference in means?

A

As the means themselves have their own SEs (se1 and se2), SEdiff = √(se1² + se2²). This assumes the two means come from independent groups.
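
A sketch of how this SE can be used to test an interaction directly, instead of comparing subgroup P values. The subgroup estimates and SEs are hypothetical, and the P values use a normal approximation with independent subgroups, both assumptions of this illustration.

```python
import math

def two_sided_p(z):
    """Two-sided P value from a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical treatment effects (and SEs) in two independent subgroups.
est1, se1 = 9.0, 3.0
est2, se2 = 6.0, 4.0

p1 = two_sided_p(est1 / se1)   # subgroup 1 alone: significant
p2 = two_sided_p(est2 / se2)   # subgroup 2 alone: not significant

# Test of interaction: is the difference between the two effects non-zero?
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
z_interaction = (est1 - est2) / se_diff
p_interaction = two_sided_p(z_interaction)
```

One subgroup is significant and the other is not, yet the test of the difference between them is far from significant, so there is no evidence of interaction.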

27
Q

Selection of subgroups problems?

A

The subgroups to be analysed must be chosen before the start of the study. Otherwise, with ten variables there are 45 potential pairwise interactions, so there is a reasonable chance that at least one would be significant by chance alone.
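
The arithmetic behind the 45 can be checked as follows. The chance of at least one false positive is only a rough guide, since it assumes the 45 tests are independent and each is done at the 5% level, which is an assumption of this sketch.

```python
from math import comb

n_variables = 10

# Number of ways to pick two variables out of ten.
n_interactions = comb(n_variables, 2)

# Chance that at least one test is significant at 5% when no interaction
# is real, ASSUMING the tests are independent (a simplification).
p_at_least_one = 1 - 0.95 ** n_interactions
```

Under those assumptions the chance of at least one spurious "significant" interaction is about 90%.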