Descriptive Comparison Of Groups Flashcards
What does the term ‘exposure variable’ mean?
Often in research one of the variables in the study is thought of as the outcome variable of interest and this is usually a measure or marker of disease. You may also want to find out whether this outcome differs between different exposure groups, eg between smokers and non-smokers. In other words, you want to know about the association between outcome (disease) and exposure by comparing the amount of disease between exposure groups. The variable representing these exposure groups (eg smoking status) is known as the exposure variable.
What information can we gain from comparing the outcomes between exposure groups?
We can look at the association between outcome (eg disease) and exposure by comparing the amount of disease between exposure groups. The variable representing these exposure groups (eg smoking status) is known as the exposure variable. Looking at such associations helps us to identify risk factors for disease.
We can also look at associations between treatment and disease. For example, we may want to know whether platelet count differs between people on active treatment and people on placebo. In this case ‘treatment group’ is thought of as the exposure variable.
In a study looking at how smoking status affects disease rates what is the exposure variable and what is the outcome variable?
Smoking status is the exposure variable and disease is the outcome variable.
In a study looking at whether platelet count differs between people on active treatment or placebo what is considered to be the exposure variable and what is considered to be the outcome variable?
The outcome variable is platelet count and the exposure variable is treatment group.
What is the exposure variable sometimes referred to as?
The explanatory variable.
What is the outcome variable sometimes referred to as?
The response variable.
Describe the steps you would take in the descriptive comparison of a continuous outcome variable - for example ‘amongst survivors of lung cancer, does lung function differ between men and women?’
1). Determine whether the outcome variable is normally distributed by drawing a histogram of either the whole sample or the two groups separately.
2). Compute summary measures for the two groups separately. If the outcome variable is normally distributed, compute the mean and standard deviation in each group. If the outcome is not normally distributed, present the median and interquartile range for each group instead.
3). Compute a measure of effect. We now want a sense of how much higher the outcome is in one group than in the other (i.e. some measure of effect). For normally distributed data the measure of effect is the difference in means. This will normally be the mean in the exposed group minus the mean in the unexposed group.
We can calculate a 95% CI around the mean difference which tells us where the ‘true mean difference’ is likely to lie.
For non-normal data we compute the difference in medians. The difference in medians is the median in the exposed group minus the median in the unexposed group. 95% confidence intervals around medians or a difference in medians cannot be easily computed.
If the 95% confidence interval for the mean difference includes 0, the difference is not statistically significant at the 5% level.
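The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the function name `mean_difference_ci` and the lung function values are made up, and the interval uses a normal quantile (about 1.96), which is an assumption that is only reasonable for larger samples. For small samples a t quantile would usually be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_ci(exposed, unexposed, level=0.95):
    """Return (difference, lower, upper) for mean(exposed) - mean(unexposed)."""
    diff = mean(exposed) - mean(unexposed)
    # Standard error of the difference, allowing the two groups to have
    # different variances.
    se = sqrt(stdev(exposed) ** 2 / len(exposed)
              + stdev(unexposed) ** 2 / len(unexposed))
    # Normal quantile for the requested level (an approximation; see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Made-up lung function values (litres), purely for illustration.
men = [3.9, 4.2, 4.5, 4.1, 3.8, 4.4]
women = [3.1, 3.4, 3.0, 3.6, 3.3, 3.2]

diff, lower, upper = mean_difference_ci(men, women)
print(f"difference in means = {diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Here men are treated as the "exposed" group only to fix the direction of the subtraction. Swapping the groups changes the sign of the difference but not its size.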
How would we go about descriptive comparison of a continuous outcome variable if we have more than 2 exposure groups?
1). First we must decide which group is going to be thought of as the unexposed group. This is often known as the reference or baseline group, and it is the group against which the other groups will be compared. If the exposure variable is ordered then we usually choose the lowest or highest group. It is also best to avoid choosing a group that contains only a small number of people.
2). The mean difference (and 95% CI) is then calculated for each group (other than the reference group) in the usual way.
With more than 2 groups, the 95% CIs cannot be used to assess whether there is likely to be a true overall association or not. An appropriate hypothesis test would need to be carried out.
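As a rough sketch of comparing several groups with a reference group, the snippet below computes the mean difference for each non-reference group. The helper name `differences_from_reference`, the smoking categories and the values are all hypothetical. Confidence intervals are left out to keep the example short.

```python
from statistics import mean


def differences_from_reference(groups, reference):
    """Mean of each group minus the mean of the reference group."""
    ref_mean = mean(groups[reference])
    return {
        name: mean(values) - ref_mean
        for name, values in groups.items()
        if name != reference  # the reference group is not compared with itself
    }


# Made-up lung function values (litres) by smoking status.
groups = {
    "never": [3.6, 3.8, 3.7, 3.9],
    "former": [3.4, 3.5, 3.3, 3.6],
    "current": [3.0, 3.1, 2.9, 3.2],
}

diffs = differences_from_reference(groups, reference="never")
for name, d in diffs.items():
    print(f"{name} vs never: {d:+.2f}")
```

Choosing "never" as the reference follows the advice to pick the lowest or highest category of an ordered exposure.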
If an outcome is normally distributed what summary measures would you use?
The mean and standard deviation.
If an outcome is not normally distributed what summary measures would you use?
The median and interquartile range.
How would you calculate the difference in means between two exposure groups for a continuous outcome variable?
Difference in means = mean in the exposed group - mean in unexposed
How could we calculate where a true mean difference is likely to lie?
Calculate a 95% confidence interval for the mean difference.
What measure of effect between groups would we use for normally distributed data?
The difference in means.
What measure of effect between groups would we use for non-normally distributed data?
The difference in medians.
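For skewed data, the median, interquartile range and difference in medians could be computed along these lines. The helper `median_iqr` and the values are illustrative assumptions. Note that statistical packages use slightly different conventions for quartiles, so the interquartile range may not match other software exactly.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return the median and the (lower quartile, upper quartile) pair."""
    # quantiles(..., n=4) returns the three cut points that split the data
    # into quarters; the first and last are the lower and upper quartiles.
    q1, _, q3 = quantiles(values, n=4)
    return median(values), (q1, q3)


# Made-up, right-skewed outcome values for two exposure groups.
exposed = [12, 15, 11, 40, 14, 13, 90]
unexposed = [9, 10, 8, 30, 11, 10, 12]

median_exposed, iqr_exposed = median_iqr(exposed)
median_unexposed, iqr_unexposed = median_iqr(unexposed)

# Measure of effect: median in exposed minus median in unexposed.
median_difference = median_exposed - median_unexposed
print(median_exposed, iqr_exposed)
print(median_unexposed, iqr_unexposed)
print("difference in medians =", median_difference)
```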
How would you go about descriptive comparison of a binary outcome variable? (E.g. does the proportion of people with disease differ between smokers and non-smokers)
1). Compute the descriptive statistics (proportions/percentages). For cross-sectional and cohort studies we are interested in whether there is a difference in risk of developing disease between the exposure groups. These data can therefore be summarised by computing the percentage or proportion with disease in the exposed and unexposed groups separately. The proportion with disease in a cross-sectional study can be thought of as the risk of disease.
To calculate the percentage or proportion with disease in the exposed and unexposed groups we can create a cross-tabulation which shows how many diseased and disease-free individuals there are for smokers and non-smokers separately, as well as for the whole sample.
Risk in whole sample population = number of people with disease / total number of people in study population
Risk in exposed group = diseased in exposed group / total number in exposed group
This would not work for a case-control study, because in case-control studies the number of individuals with and without disease is fixed by the investigators rather than reflecting the prevalence or incidence of the disease. Instead, you should present the proportion of each exposure group within each outcome group (row percentages).
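The risk calculations for a cross-sectional or cohort study can be sketched from a cross-tabulation as follows. The counts, the dictionary layout and the helper `risk` are invented for illustration only.

```python
# Hypothetical cross-tabulation of disease status by smoking status.
table = {
    "smokers": {"diseased": 30, "disease_free": 70},
    "non_smokers": {"diseased": 10, "disease_free": 90},
}


def risk(counts):
    """Proportion with disease = diseased / total in the group."""
    return counts["diseased"] / (counts["diseased"] + counts["disease_free"])


risk_exposed = risk(table["smokers"])
risk_unexposed = risk(table["non_smokers"])

# Risk in the whole sample: all diseased people over everyone in the study.
total_diseased = sum(group["diseased"] for group in table.values())
total_people = sum(sum(group.values()) for group in table.values())
risk_overall = total_diseased / total_people

print(f"risk in smokers     = {risk_exposed:.1%}")
print(f"risk in non-smokers = {risk_unexposed:.1%}")
print(f"risk in whole sample = {risk_overall:.1%}")
```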
2). We then need to compute a measure of effect. There are three common measures of effect used for a binary outcome: the risk difference, the risk ratio and the odds ratio.
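As a brief sketch, the three measures can be computed from the counts in a 2x2 table using their standard definitions: the risk difference is the risk in the exposed minus the risk in the unexposed, the risk ratio is the risk in the exposed divided by the risk in the unexposed, and the odds ratio is the odds of disease in the exposed divided by the odds in the unexposed. The counts and variable names below are hypothetical.

```python
# Hypothetical counts from a cross-tabulation.
exposed_diseased, exposed_disease_free = 30, 70
unexposed_diseased, unexposed_disease_free = 10, 90

# Risk = diseased / total in the group.
risk_exposed = exposed_diseased / (exposed_diseased + exposed_disease_free)
risk_unexposed = unexposed_diseased / (unexposed_diseased + unexposed_disease_free)

# Odds = diseased / disease-free in the group.
odds_exposed = exposed_diseased / exposed_disease_free
odds_unexposed = unexposed_diseased / unexposed_disease_free

risk_difference = risk_exposed - risk_unexposed
risk_ratio = risk_exposed / risk_unexposed
odds_ratio = odds_exposed / odds_unexposed

print(f"risk difference = {risk_difference:.2f}")
print(f"risk ratio      = {risk_ratio:.2f}")
print(f"odds ratio      = {odds_ratio:.2f}")
```

A risk difference of 0, or a risk ratio or odds ratio of 1, corresponds to no difference between the exposure groups.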