Descriptive Comparison Of Groups Flashcards

1
Q

What does the term ‘exposure variable’ mean?

A

Often in research one of the variables in the study is thought of as the outcome variable of interest and this is usually a measure or marker of disease. You may also want to find out whether this outcome differs between different exposure groups, eg between smokers and non-smokers. In other words, you want to know about the association between outcome (disease) and exposure by comparing the amount of disease between exposure groups. The variable representing these exposure groups (eg smoking status) is known as the exposure variable.

2
Q

What information can we gain from comparing the outcomes between exposure groups?

A

We can look at the association between outcome (eg disease) and exposure by comparing the amount of disease between exposure groups. The variable representing these exposure groups (eg smoking status) is known as the exposure variable. Looking at such associations helps us to identify risk factors for disease.

We can also look at associations between treatment and an outcome. For example, we may want to know whether platelet count differs between people on active treatment and people on placebo. In this case ‘treatment group’ is thought of as the exposure variable.

3
Q

In a study looking at how smoking status affects disease rates, what is the exposure variable and what is the outcome variable?

A

Smoking status is the exposure variable and disease is the outcome variable.

4
Q

In a study looking at whether platelet count differs between people on active treatment and people on placebo, what is considered to be the exposure variable and what is considered to be the outcome variable?

A

The outcome variable is platelet count and the exposure variable is treatment group.

5
Q

What is the exposure variable sometimes referred to as?

A

The explanatory variable.

6
Q

What is the outcome variable sometimes referred to as?

A

The response variable.

7
Q

Describe the steps you would take in the descriptive comparison of a continuous outcome variable - for example ‘amongst survivors of lung cancer, does lung function differ between men and women?’

A

1) Determine whether the outcome variable is normally distributed by drawing a histogram of either the whole sample or the two groups separately.
2) Compute summary measures for the two groups separately. If the outcome variable is normally distributed, compute the mean and standard deviation in each group. If the outcome is not normally distributed, compute the median and interquartile range in each group instead.
3) Compute a measure of effect. We now want a feel for how much higher the outcome is in one group than in the other (i.e. some measure of effect). For normally distributed data the measure of effect is the difference in means. This will normally be the mean in the exposed group minus the mean in the unexposed group.

We can calculate a 95% CI around the mean difference, which tells us where the ‘true mean difference’ is likely to lie.

For non-normal data we compute the difference in medians. The difference in medians is the median in the exposed group minus the median in the unexposed group. 95% confidence intervals around medians or differences in medians cannot be easily computed.

A 95% confidence interval for the mean difference which spans 0 corresponds to a result that is not statistically significant at the 5% level.
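The three steps above can be sketched in Python with only the standard library. This is an illustrative sketch, not a prescribed method: the lung-function values are made up, men are treated as the ‘exposed’ group purely for illustration, the helper names `summarise` and `mean_difference_ci` are hypothetical, and the 95% CI uses a large-sample normal approximation (1.96 × an unpooled standard error) rather than the t-based interval usually preferred for small samples.

```python
import math
import statistics


def summarise(values):
    """Return mean, SD, median and IQR (Q1, Q3) for one group."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


def mean_difference_ci(exposed, unexposed, z=1.96):
    """Difference in means (exposed - unexposed) with an approximate 95% CI.

    Assumption: large-sample normal approximation with an unpooled
    standard error; small samples would normally use a t-based interval.
    """
    diff = statistics.mean(exposed) - statistics.mean(unexposed)
    se = math.sqrt(statistics.variance(exposed) / len(exposed)
                   + statistics.variance(unexposed) / len(unexposed))
    return diff, (diff - z * se, diff + z * se)


# Hypothetical lung-function values (litres), made up for illustration
men = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0]
women = [2.6, 2.8, 2.5, 2.9, 2.7, 2.4]

print(summarise(men))
print(summarise(women))
diff, (lower, upper) = mean_difference_ci(men, women)
print(f"Difference in means: {diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Difference in medians: {statistics.median(men) - statistics.median(women):.2f}")
```

Because this interval lies entirely above 0, it would correspond to a statistically significant difference at the 5% level.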

8
Q

How would we go about descriptive comparison of a continuous outcome variable if we have more than 2 exposure groups?

A

1) First we must decide which group is going to be thought of as the unexposed group. This is often known as the reference or baseline group, and it is the group against which the other groups will be compared. If the exposure variable is ordered, we usually choose the lowest or highest group. It is also best to avoid choosing a group with only a small number of people in it.
2) The mean difference (and 95% CI) is then calculated for each group (other than the reference group) in the usual way.

With more than 2 groups, the 95% CIs cannot be used to assess whether there is likely to be a true overall association or not. An appropriate hypothesis test would need to be carried out.
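A minimal sketch of comparing each group with a chosen reference group, assuming made-up data for an ordered exposure with the lowest group as reference. The function name `mean_differences_vs_reference` is hypothetical, and the CI again uses a large-sample normal approximation. As noted above, these per-group intervals do not replace a formal test of overall association.

```python
import math
import statistics


def mean_differences_vs_reference(groups, reference):
    """Mean difference (group - reference) and approximate 95% CI per group.

    `groups` maps group label -> list of outcome values. The reference
    group gets no estimate, since it is the baseline for comparison.
    """
    ref = groups[reference]
    ref_mean = statistics.mean(ref)
    ref_var_n = statistics.variance(ref) / len(ref)
    results = {}
    for label, values in groups.items():
        if label == reference:
            continue  # baseline group: not compared with itself
        diff = statistics.mean(values) - ref_mean
        se = math.sqrt(statistics.variance(values) / len(values) + ref_var_n)
        results[label] = (diff, diff - 1.96 * se, diff + 1.96 * se)
    return results


# Hypothetical outcome values by an ordered exposure (not real data)
data = {
    "low": [5.0, 5.2, 4.8, 5.1],
    "medium": [5.6, 5.9, 5.4, 5.7],
    "high": [6.3, 6.1, 6.6, 6.4],
}
for label, (d, lo, hi) in mean_differences_vs_reference(data, "low").items():
    print(f"{label} vs low: {d:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```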

9
Q

If an outcome is normally distributed what summary measures would you use?

A

The mean and standard deviation.

10
Q

If an outcome is not normally distributed, what summary measures would you use?

A

The median and interquartile range.

11
Q

How would you calculate the difference in means of a continuous outcome variable between two exposure groups?

A

Difference in means = mean in the exposed group - mean in the unexposed group

12
Q

How could we calculate where the true mean difference is likely to lie?

A

Calculate a 95% confidence interval for the mean difference.

13
Q

What measure of effect between groups would we use for normally distributed data?

A

The difference in means.

14
Q

What measure of effect between groups would we use for non-normally distributed data?

A

The difference in medians.

15
Q

How would you go about descriptive comparison of a binary outcome variable? (E.g. does the proportion of people with disease differ between smokers and non-smokers)

A

1) Compute the descriptive statistics (proportions/percentages). For cross-sectional and cohort studies we are interested in whether the risk of disease differs between the exposure groups. The data can therefore be summarised by computing the percentage or proportion with disease in the exposed and unexposed groups separately. The proportion with disease in a cross-sectional study can be thought of as the risk of disease.

To calculate the percentage or proportion with disease in the exposed and unexposed groups we can create a cross-tabulation which shows how many diseased and disease-free individuals there are for smokers and non-smokers separately, as well as for the whole sample.

Risk in the whole study population = number of people with disease / total number of people in the study population

Risk in exposed group = diseased in exposed group / total number in exposed group
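The risk formulas above amount to simple divisions on the cross-tabulation counts. A minimal sketch, assuming a made-up 2×2 table of smokers and non-smokers (the counts and the helper name `risk` are hypothetical):

```python
def risk(diseased, total):
    """Risk (proportion with disease) = number with disease / total number."""
    return diseased / total


# Hypothetical cross-tabulation counts (not real data):
#                 smokers  non-smokers
# diseased            30           20
# disease-free        70          180
diseased_exposed, healthy_exposed = 30, 70
diseased_unexposed, healthy_unexposed = 20, 180

n_exposed = diseased_exposed + healthy_exposed          # 100 smokers
n_unexposed = diseased_unexposed + healthy_unexposed    # 200 non-smokers

risk_exposed = risk(diseased_exposed, n_exposed)
risk_unexposed = risk(diseased_unexposed, n_unexposed)
risk_overall = risk(diseased_exposed + diseased_unexposed,
                    n_exposed + n_unexposed)

print(f"Risk in smokers:      {risk_exposed:.1%}")
print(f"Risk in non-smokers:  {risk_unexposed:.1%}")
print(f"Risk in whole sample: {risk_overall:.1%}")
```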

This would not work for a case-control study, because in case-control studies the numbers of individuals with and without disease are fixed by the investigators rather than reflecting the prevalence or incidence of the disease. Instead, you should present the proportion of each exposure group within each outcome group (row percentages).

2) We then need to compute a measure of effect. There are three common measures of effect used for a binary outcome: the risk difference, the risk ratio and the odds ratio.

16
Q

For what kinds of studies is calculating percentages/risks in each exposure group applicable?

A

Calculating percentages/risks in each exposure group is applicable for cross-sectional and cohort studies. It is not applicable for case-control studies.

17
Q

What should you do instead of calculating percentages/risks in each exposure group in case-control studies?

A

Instead, you should present the proportion of individuals in each exposure group within each outcome group - show row percentages.

18
Q

In descriptive comparison of a binary outcome variable, for what kinds of studies would you use column percentages in the cross-tabulation table?

A

Cross-sectional and cohort studies.

Here column percentages tell you the prevalence of the outcome in each exposure group, e.g. the percentage of smokers who have the disease compared with the percentage of non-smokers who have the disease.

19
Q

In descriptive comparison of binary outcome variables for what kind of studies would we use row percentages in our cross-tabulation tables?

A

Case-control studies.

Row percentages can tell you the proportion of your cases (diseased individuals) that are in an exposure group (smoke) and the proportion of your controls (non-diseased) that are in an exposure group.

For example we may find that a higher proportion of our cases smoke than our controls.
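A minimal sketch of the row-percentage summary for a case-control study, assuming made-up counts for 100 cases and 200 controls (the counts and the helper name `percent_exposed` are hypothetical). Because the numbers of cases and controls are fixed by design, it summarises exposure within each outcome group rather than computing a risk of disease.

```python
# Hypothetical case-control counts (not real data). The numbers of cases
# and controls are fixed by design, so we summarise exposure within each
# outcome group instead of computing the risk of disease.
cases = {"smoker": 60, "non-smoker": 40}
controls = {"smoker": 45, "non-smoker": 155}


def percent_exposed(group, exposed_label="smoker"):
    """Percentage of a group (cases or controls) in the exposed category."""
    return 100 * group[exposed_label] / sum(group.values())


print(f"Cases who smoke:    {percent_exposed(cases):.1f}%")
print(f"Controls who smoke: {percent_exposed(controls):.1f}%")
```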

20
Q

Describe what you would use the risk difference for and how you would calculate it.

A

Risk difference is a measure of effect for descriptive comparison of a binary outcome variable for cross-sectional/cohort studies.

We can simply subtract one proportion from another to give the risk difference.

Risk difference = risk of disease in exposed - risk of disease in unexposed

= proportion (or %) with outcome in exposed - proportion (or %) with outcome in unexposed

This tells us on an absolute scale how much greater the risk of disease is in the exposed group compared to the unexposed, or in other words, the increase in disease prevalence we might expect by virtue of being a smoker in the example we used. We can then work out the 95% confidence interval around this difference.
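A minimal sketch of the risk difference and its 95% confidence interval, using the same made-up counts as a running example. The CI shown is the Wald (large-sample normal) interval, one common choice that is assumed here and not taken from the source. The function name `risk_difference` is hypothetical.

```python
import math


def risk_difference(d1, n1, d0, n0, z=1.96):
    """Risk difference (exposed - unexposed) with a Wald 95% CI.

    d1/n1: diseased/total in the exposed group; d0/n0: same for unexposed.
    Assumption: the Wald interval, a large-sample approximation.
    """
    p1, p0 = d1 / n1, d0 / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, (rd - z * se, rd + z * se)


# Hypothetical counts: 30/100 smokers and 20/200 non-smokers have the disease
rd, (lower, upper) = risk_difference(30, 100, 20, 200)
print(f"Risk difference: {rd:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Here a risk difference of 0.2 would mean 20 extra cases per 100 people associated with smoking, on the absolute scale.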

21
Q

What is the risk ratio and what would we use it for?

A

The risk ratio is a measure of effect that we can use in descriptive comparison of a binary outcome in cross-sectional/cohort studies.

Ratio measures determine the strength of association between a risk factor and disease.

Instead of subtracting one risk from another, we can divide one by another to give a risk ratio (or prevalence ratio):

  • Risk ratio = risk (proportion or %) of disease in exposed / risk (proportion or %) of disease in the unexposed
  • a risk ratio of 1 means no association with exposure, as the proportion with disease will be the same in both groups.
  • a risk ratio of less than one means that exposure is protective, e.g. a value of 0.5 means the exposed group are half as likely to have disease compared to the unexposed.
  • a risk ratio greater than one means that the exposure increases the risk of disease, e.g. a value of 2 means the exposed group are twice as likely to have disease compared to the unexposed.

We can then also consider the 95% confidence interval around our risk ratio. The formula for computing a confidence interval around a risk ratio by hand is quite complex and won’t be covered here.

This tells us how much disease there is in the exposed group relative to the unexposed group. In other words, how many times more likely the exposed group are to have the disease compared to the unexposed group.

Risk ratios will always be a positive number (greater than zero).
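Although the hand formula is not covered here, the risk ratio and an approximate 95% CI can be sketched in a few lines. This sketch assumes one common large-sample method (not taken from the source): the interval is built on the log scale with SE(log RR) = sqrt(1/d1 - 1/n1 + 1/d0 - 1/n0) and then exponentiated, which is why both limits are always positive. The counts and the function name `risk_ratio` are hypothetical.

```python
import math


def risk_ratio(d1, n1, d0, n0, z=1.96):
    """Risk ratio (exposed / unexposed) with an approximate 95% CI.

    Assumption: log-scale large-sample CI using
    SE(log RR) = sqrt(1/d1 - 1/n1 + 1/d0 - 1/n0), exponentiated back.
    """
    rr = (d1 / n1) / (d0 / n0)
    se_log = math.sqrt(1 / d1 - 1 / n1 + 1 / d0 - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, (lower, upper)


# Hypothetical counts: 30/100 smokers and 20/200 non-smokers have the disease
rr, (lower, upper) = risk_ratio(30, 100, 20, 200)
print(f"Risk ratio: {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A risk ratio of 3 here would be read as smokers being three times as likely to have the disease as non-smokers.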

22
Q

What is an odds ratio and what would we use it for?

A

An odds ratio is a measure of effect for the descriptive comparison of a binary outcome variable; it is used in particular for case-control studies.

An alternative ratio we can measure for binary outcomes is the odds ratio. This is based on the odds of disease rather than the risk (or %). The odds of disease is calculated by dividing the number of people with the disease by the number of people without the disease (rather than dividing by the total number as we do for risk).

Odds = Number of people with disease / Number of people without disease

We can then compute the odds separately for the exposed and unexposed groups using the above formula. The ratio of these is the odds ratio:

Odds ratio = Odds of disease in the exposed / Odds of disease in the unexposed

The meaning of the odds ratio is less intuitive than the risk ratio because we are dealing with odds rather than risks. However, if the disease or outcome is fairly rare (below about 10% prevalence), the odds ratio will be similar to the risk ratio.
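The odds and odds ratio formulas above can be sketched as follows. The 95% CI uses Woolf's log-scale method, SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d), which is a standard choice assumed here rather than one given in the source. The counts and function names are hypothetical.

```python
import math


def odds(diseased, healthy):
    """Odds = number with disease / number without disease."""
    return diseased / healthy


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) approximate 95% CI.

    a, b: diseased and disease-free in the exposed group
    c, d: diseased and disease-free in the unexposed group
    """
    or_ = odds(a, b) / odds(c, d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, (lower, upper)


# Hypothetical counts: exposed 30 diseased / 70 not; unexposed 20 / 180
or_, (lower, upper) = odds_ratio(30, 70, 20, 180)
print(f"Odds ratio: {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these counts (30% vs 10% risk, so the disease is not rare) the odds ratio is noticeably larger than the corresponding risk ratio of 3.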

23
Q

Describe how you would use odds ratio as a measure of effect when you have more than 2 exposure groups.

A

When we have more than 2 exposure groups we must choose one group as our reference or baseline group. We then compute our measure of effect (and 95% CI) for each of the other groups relative to this baseline group (i.e. we will compute an odds ratio for each group compared to the baseline group). We put 1 as the odds ratio in the baseline group to indicate that this is the comparison group, and no confidence interval is computed for this group.
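A minimal sketch of this layout, assuming made-up counts for three smoking categories with "never" as the baseline. The baseline is reported as 1 with no confidence interval, and every other group gets an odds ratio with a Woolf log-scale CI (an assumed method). The function name `odds_ratios_vs_baseline` is hypothetical.

```python
import math


def odds_ratios_vs_baseline(table, baseline, z=1.96):
    """Odds ratio (and Woolf 95% CI) for each group against a baseline group.

    `table` maps group label -> (number diseased, number disease-free).
    The baseline row is reported as OR = 1 with no confidence interval.
    """
    c, d = table[baseline]
    results = {}
    for label, (a, b) in table.items():
        if label == baseline:
            results[label] = (1.0, None)  # comparison group: OR fixed at 1
            continue
        or_ = (a / b) / (c / d)
        se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        ci = (math.exp(math.log(or_) - z * se_log),
              math.exp(math.log(or_) + z * se_log))
        results[label] = (or_, ci)
    return results


# Hypothetical counts by smoking category (not real data)
counts = {"never": (20, 180), "ex": (15, 85), "current": (30, 70)}
for label, (or_, ci) in odds_ratios_vs_baseline(counts, "never").items():
    if ci is None:
        print(f"{label}: 1 (baseline)")
    else:
        print(f"{label}: {or_:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```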

24
Q

How would you go about descriptive comparison of a categorical outcome variable?

A

Unlike a binary outcome variable, some outcome or disease variables have more than two categories. For example, disease severity may be measured as mild, moderate or severe.

For example ‘does disease severity differ between those on treatment and those on placebo?’ - the outcome here is disease severity (mild, moderate or severe) and the exposure is treatment (active or placebo).

To compare our outcome between two exposure groups, we just do a cross tabulation and then compute the percentage falling into each ‘outcome’ category for exposed and unexposed separately.

It is not possible to compute a measure of effect for categorical outcome variables with more than 2 categories. All you can do is recode the outcome to make it binary, e.g. mild versus moderate/severe, and then compute a risk difference, risk ratio or odds ratio in which the moderate/severe people are thought of as ‘diseased’.
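A minimal sketch of both steps, assuming made-up severity outcomes for 100 people on active treatment (treated here as the ‘exposed’ group) and 100 on placebo: a cross-tabulation of percentages in each severity category, then a binary recode (moderate/severe as ‘diseased’) and a risk ratio. The data and the helper name `risk_moderate_or_severe` are hypothetical.

```python
from collections import Counter

# Hypothetical disease-severity outcomes by treatment group (not real data)
severity = {
    "active": ["mild"] * 50 + ["moderate"] * 30 + ["severe"] * 20,
    "placebo": ["mild"] * 30 + ["moderate"] * 40 + ["severe"] * 30,
}
categories = ["mild", "moderate", "severe"]

# Cross-tabulation: percentage in each outcome category, per exposure group
for group, outcomes in severity.items():
    counts = Counter(outcomes)
    pcts = {cat: 100 * counts[cat] / len(outcomes) for cat in categories}
    print(group, pcts)


def risk_moderate_or_severe(outcomes):
    """Binary recode: proportion classed as moderate or severe ('diseased')."""
    return sum(o in ("moderate", "severe") for o in outcomes) / len(outcomes)


risk_active = risk_moderate_or_severe(severity["active"])
risk_placebo = risk_moderate_or_severe(severity["placebo"])
print(f"Risk ratio (active vs placebo): {risk_active / risk_placebo:.2f}")
```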

25
Q

When would we likely use ratio measures as a measure of effect for a binary outcome?

A

Ratio measures (risk or odds ratios) tend to be used in epidemiological studies because they are better indicators of the aetiological strength of an association than the risk difference, and so help us decide whether an observed association is likely to be causal or not. Also, if they are true, measured correctly and refer to biological phenomena, they are usually generalisable. Whilst ratio measures have come to dominate epidemiology, they do not tell the whole story.

26
Q

What does the term ‘outcome variable’ describe?

A

Often in research one of the variables in the study is thought of as the outcome variable of interest, and this is usually a measure or marker of disease.

27
Q

If we are looking at a rare disease in a cross-sectional study, how will the risk ratio and odds ratio compare?

A

For rare diseases the risk ratios and odds ratios will be similar.

28
Q

If the disease we are looking at is rare, is the odds ratio a good estimate of the risk ratio?

A

Yes

29
Q

For what types of study is it appropriate to use the risk ratio?

A

Cross-sectional studies and cohort studies

30
Q

If our outcome is common (>10% prevalence) what can we say about the odds ratio?

A

It is likely to overestimate the risk ratio, i.e. the odds ratio will be further from 1 than the risk ratio.
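A small numerical sketch of why outcome frequency matters. It assumes two hypothetical scenarios, each with groups of 1000 and an exposed risk exactly twice the unexposed risk, so the risk ratio is 2 in both. The function name `rr_and_or` is hypothetical.

```python
def rr_and_or(d1, n1, d0, n0):
    """Return (risk ratio, odds ratio) for exposed (d1/n1) vs unexposed (d0/n0)."""
    rr = (d1 / n1) / (d0 / n0)
    or_ = (d1 / (n1 - d1)) / (d0 / (n0 - d0))
    return rr, or_


# Hypothetical groups of 1000, each with the exposed risk twice the unexposed
rare = rr_and_or(20, 1000, 10, 1000)      # risks 2% vs 1%
common = rr_and_or(400, 1000, 200, 1000)  # risks 40% vs 20%

print("Rare outcome:   RR = %.2f, OR = %.2f" % rare)
print("Common outcome: RR = %.2f, OR = %.2f" % common)
```

For the rare outcome the odds ratio (about 2.02) is close to the risk ratio of 2. For the common outcome it rises to about 2.67, further from 1 than the risk ratio.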

31
Q

What does the risk difference between two exposure groups tell us?

A

It is an estimate of the absolute increase or decrease in the prevalence of the outcome associated with the exposure.

32
Q

What formal tests can be used to assess distributions of continuous variables?

A

The Kolmogorov-Smirnov (KS) test or the Shapiro-Wilk test.