The analysis of categorical data Flashcards by Stu Gibson

Categorical and ordinal data?

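If the data are not continuous, they are categorical. If the categories are ordered, e.g. tumour stage I-IV, the data are ordinal; if not, e.g. ABO blood group, they are unordered. Unlike continuous data, the categories are not on a numerical scale: a patient who is stage IV is not twice as bad as one who is stage II.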

Parameters in binary data?

The only thing that can vary is the proportion of the population that has the attribute (π). This can also be interpreted as the probability that a randomly chosen member of the population has the attribute. As before, π is unknown.

Estimating π?

Very simple. If there are n patients, r will have the attribute and n-r will not. The estimator of π is simply r/n, or 100*r/n as a percentage. Although this is effectively the mean of the 0s and 1s, different methods are needed.

Why do we need new methods for binary data?

It is to do with the SD. For continuous variables there are two parameters: μ (estimated by the sample mean m) and σ, which is independent of it. For binary data there is only one parameter, π, which is clearly an analogue of μ, so the SD must be a function of the mean rather than a separate parameter.

SD/SE in binary?

Look at how well r/n estimates π: because this incorporates the sample size, it is SE-like. For a continuous variable SE = σ/√n; for a binary variable SE = √[π(1-π)/n]. Both have n in the denominator, so the SE shrinks as the sample gets larger. The key difference is that once π has been estimated, no other parameter is needed to describe the spread. This is why the methods for continuous variables must change.
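
As a rough illustration of the two formulas, the sketch below estimates π by r/n and plugs that estimate into √[π(1-π)/n]. The function name and the counts are made up for the example, and using the estimate in place of the unknown π is an assumption of the sketch.

```python
import math

def proportion_and_se(r, n):
    """Return the estimate r/n and its standard error sqrt(p(1-p)/n)."""
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical counts: same proportion, ten times the sample size.
p_small, se_small = proportion_and_se(20, 100)
p_large, se_large = proportion_and_se(200, 1000)
print(p_small, se_small)
print(p_large, se_large)  # same estimate, smaller SE
```

Nothing other than r and n is needed: the spread follows from the estimated proportion itself.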

Example for why binary SE formula used?

If π=0, it is impossible for r (the count in the sample) to be anything other than 0, so the SE must be 0. This is why π is a factor in the numerator. Similarly, if π=1 then r=n and there can be no error. This is why (1-π) is also a factor in the numerator: again it gives 0 and no error.

What is χ2 test an analogue of?

The unpaired t-test: it compares two independent groups, here on proportions rather than means.

Null hypothesis in χ2?

That π1=π2, i.e. the proportion with the attribute is the same in the two populations from which the samples are drawn. π is a proportion, so work out how many you would expect to (e.g.) die in each group if the population proportions were equal.

χ2 size and sign?

It can never be negative, and it is 0 only when the observed and expected tables are exactly identical.
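
A minimal sketch of how the statistic could be computed for a 2*2 table, assuming the plain Pearson form Σ(O-E)²/E with no continuity correction. The function name, table layout and counts are illustrative and are not taken from the cards.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table

               outcome   no outcome
    group 1       a          b
    group 2       c          d
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if pi1 == pi2.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

print(chi_squared_2x2(20, 80, 10, 90))  # hypothetical counts
print(chi_squared_2x2(10, 90, 10, 90))  # observed equals expected
```

Every term is a square divided by a positive expected count, which is why the total cannot be negative.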

Things to remember about χ2?

Must use counts, not percentages, because the test has to account for sample size: 2/10 is not the same strength of evidence as 200/1000, and the latter is much more sensitive to departures from the expected values. The counts must also be independent, i.e. each unit is counted once only (remember the example of counting the same children again and again). Otherwise the P value looks much more convincing purely because the counts are larger, while the difference is no more real. A useful check is that the bottom right number in the margins, i.e. the grand total, equals the number of independent units.
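
The effect of inflating the counts can be seen in a small sketch. It assumes the standard shortcut formula for a 2*2 table, n(ad-bc)²/[(a+b)(c+d)(a+c)(b+d)], without continuity correction; the function name and the counts are hypothetical.

```python
def chi_squared_shortcut(a, b, c, d):
    """Pearson chi-squared for a 2*2 table via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Same percentages (20% vs 30%), but every count multiplied by 10.
small = chi_squared_shortcut(20, 80, 30, 70)
large = chi_squared_shortcut(200, 800, 300, 700)
print(small, large, large / small)
```

The percentages are identical in the two tables, yet the statistic grows in proportion to the counts, which is why counting the same units repeatedly makes a result look more convincing than it is.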

When is χ2 not appropriate?

When any expected value is less than 5 in a 2*2 table. If the table is larger, when more than 20% of cells have expected values (E) below 5, or any cell has E below 1. Use Fisher’s exact test instead!
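
One way to sketch this check in code is shown below. It assumes the rule is read as "any expected value below 5" for a 2*2 table; the function name and the example tables are hypothetical.

```python
def chi_squared_appropriate(table):
    """Apply the expected-count rule to a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for every cell under the null hypothesis.
    expected = [r * c / n for r in row_totals for c in col_totals]
    if len(row_totals) == 2 and len(col_totals) == 2:
        return min(expected) >= 5
    n_small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and n_small / len(expected) <= 0.20

print(chi_squared_appropriate([[20, 80], [10, 90]]))  # all E are 15 or more
print(chi_squared_appropriate([[2, 8], [4, 6]]))      # smallest E is 3
```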

Why is Fisher’s different to others?

It calculates the P value directly from the data, rather than via a test statistic such as χ2 or t.

Why is it called Fisher’s exact?

There is no need for the asymptotic approximation used in χ2; the probabilities are calculated exactly from the possible tables.
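
A minimal sketch of the exact calculation for a 2*2 table, assuming fixed margins (so each possible table has a hypergeometric probability) and one common two-sided convention: sum the probabilities of all tables no more likely than the observed one. Other two-sided conventions exist; the function name and counts are illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of the table whose top-left cell is x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

print(fisher_exact_two_sided(3, 1, 1, 3))  # hypothetical small table
```

Every possible table is enumerated, which hints at why the method gets expensive for large counts.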

Why not use Fisher’s all the time?

Computing power. Also, get wide CIs.

Finding probability from odds?

If odds = 2, then probability = 2/(1+2) = 2/3. In general, probability = odds/(1+odds).

Finding odds from probability?

If probability = 2/3, then odds = (2/3)/(1-2/3) = 2, i.e. 2:1.
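
The two conversions can be sketched as a pair of small functions; the names are made up for illustration.

```python
def odds_to_probability(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)

def probability_to_odds(p):
    """odds = p / (1 - p)"""
    return p / (1 - p)

print(odds_to_probability(2))        # the 2/(1+2) example
print(probability_to_odds(2 / 3))    # back to odds of about 2
```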

Three ways for describing a difference between parameters π1 and π2?

Absolute difference D = π1-π2
Relative risk R = π1/π2
Odds ratio OR = [π1/(1-π1)] / [π2/(1-π2)], i.e. calculate the odds from each π in the usual way, then take their ratio
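
As a quick sketch, the three measures can be computed side by side. The function name and the proportions are hypothetical.

```python
def compare_proportions(p1, p2):
    """Return (absolute difference, relative risk, odds ratio)."""
    difference = p1 - p2
    relative_risk = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return difference, relative_risk, odds_ratio

print(compare_proportions(0.2, 0.1))  # hypothetical proportions
print(compare_proportions(0.3, 0.3))  # equal proportions: 0, 1 and 1
```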

Null hypothesis for absolute difference, relative risk and odds?

D = 0, R = 1, OR = 1

Significance of CIs for D?

If P<0.05 for the test of π1=π2, then the 95% CI for D will necessarily not include the null value, 0.

Standard error for OR?

First: SE(lnOR) = √(1/34 + 1/68 + etc.), i.e. the square root of the sum of the reciprocals of the four counts in the table.

Confidence intervals for lnOR?

Calculate OR, then lnOR, then SE(lnOR). The 95% CI for lnOR is lnOR ± 1.96*SE(lnOR); then take antilogs of the two limits to get the CI for OR.
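
The steps can be sketched as follows for a 2*2 table with cell counts a, b, c, d. The function name and the counts are hypothetical; 1.96 is the usual normal multiplier for a 95% interval.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower limit, upper limit) for the table [[a, b], [c, d]]."""
    or_hat = (a * d) / (b * c)
    log_or = math.log(or_hat)
    # Square root of the sum of the reciprocals of the four counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se_log_or)  # antilog of the lower limit
    upper = math.exp(log_or + z * se_log_or)  # antilog of the upper limit
    return or_hat, lower, upper

or_hat, lower, upper = odds_ratio_ci(20, 80, 10, 90)  # hypothetical counts
print(or_hat, lower, upper)
```

Because the interval is built on the log scale, the limits are symmetric about lnOR but not about OR itself.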

Confidence interval significance for lnOR?

Again, if P<0.05, then the 95% CI for OR will not include its null value, 1 (equivalently, the CI for lnOR will not include 0).

Midpoint of CIs?

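For most things, the midpoint of the 95% CI will be your point estimate. This is not the case for OR: its CI is symmetric about lnOR on the log scale, so after taking antilogs the limits are no longer symmetric about OR.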

What does interaction mean?

The effect of one variable depends on the level of another, i.e. the second variable is an effect modifier.

Why does difference in P values re interaction not mean that there is a difference?

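Because a P value is a composite of the treatment effect and its standard error. Two subgroups can have the same treatment effect but different P values simply because their standard errors (e.g. through sample size) differ.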

Calculating SE of difference in means?

As the means themselves have their own SEs (se1 and se2), SEdiff = √(se1² + se2²).
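
A short sketch of the calculation, assuming the two means come from independent groups. The function names and the numbers are hypothetical; dividing the difference by its SE to get a z-like ratio is a standard follow-on step, not something stated on the card.

```python
import math

def se_of_difference(se1, se2):
    """SE of the difference between two independent estimates."""
    return math.sqrt(se1 ** 2 + se2 ** 2)

def difference_ratio(mean1, se1, mean2, se2):
    """Difference in means divided by its standard error."""
    return (mean1 - mean2) / se_of_difference(se1, se2)

print(se_of_difference(3, 4))          # 5.0
print(difference_ratio(10, 3, 4, 4))   # difference of 6 over an SE of 5
```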

Selection of subgroups problems?

The subgroups to be analysed must be chosen before the start of the study. Otherwise, with ten variables there are 45 potential pairwise interactions (10*9/2), so there is a reasonable chance that at least one would be significant by chance alone.
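
The count of 45 and the size of the "reasonable chance" can be sketched numerically. The calculation assumes pairwise interactions, a 0.05 significance level and, unrealistically, that the 45 tests are independent; it is an illustration, not a formal correction.

```python
from math import comb

n_variables = 10
n_pairs = comb(n_variables, 2)   # number of pairwise interactions
alpha = 0.05

# Chance that at least one test is significant when no interaction is real,
# assuming the tests are independent.
p_any = 1 - (1 - alpha) ** n_pairs
print(n_pairs, p_any)
```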

(27 cards)