EAB - Study Design and Summarising Data Flashcards

1
Q

What is an RCT?

What is their benefit?

A

A randomized controlled trial (RCT) is an intervention study where subjects are randomly allocated to treatment options.
Randomized controlled trials (RCTs) are the accepted ‘gold standard’ of individual research studies.

They provide sound evidence about treatment efficacy which is only bettered when several RCTs are pooled in a meta-analysis.

2
Q

What is the benefit of randomizing?

A
  • Randomization ensures that subjects’ characteristics do not affect which treatment they receive, so the allocation to treatment is unbiased.
  • In this way, the treatment groups are balanced by subject characteristics in the long run, and differences in trial outcome between the groups can be attributed to the treatments alone.
  • This provides a fair test of treatment efficacy that is not confounded by patient characteristics.
  • Randomization makes blinding possible.
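
As a rough illustration of random allocation, the sketch below shuffles a hypothetical list of subject IDs and deals them out to two arms in equal numbers. The function name `allocate`, the arm labels and the fixed seed are assumptions made for the example, not something taken from the cards.

```python
import random

def allocate(subject_ids, arms=("treatment", "control"), seed=2024):
    """Randomly allocate subjects to arms in equal numbers (hypothetical helper)."""
    rng = random.Random(seed)      # fixed seed only so the sketch is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)          # the order no longer depends on subject characteristics
    # Deal the shuffled subjects out to the arms in turn.
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}

groups = allocate(range(1, 11))
print(len(groups["treatment"]), len(groups["control"]))  # 5 5
```

Because the allocation depends only on the shuffle, no subject characteristic can influence which arm a subject ends up in.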
3
Q

What is an observational study?

A

In observational studies the subjects receive no additional intervention beyond what would normally constitute usual care.

Subjects are therefore observed in their natural state.

4
Q

What is a case-control study?

A

This study investigates causes of disease, or factors associated with a condition.

It starts with the disease (or condition) of interest and selects patients with that disease for inclusion, the ‘cases’.

A comparison group without the disease is then selected, ‘controls’, and cases and controls are compared to identify possible causal factors.

Case-control studies are usually retrospective in that the data relating to risk factors are collected after the disease has been identified.

5
Q

What are some limitations of a case-control study?

A

The choice of control group affects the comparisons between cases and controls.

Data on exposure to risk factors are usually collected retrospectively and may be incomplete, inaccurate or biased.

6
Q

What is a cohort study?

A

A cohort study is an observational study that aims to investigate causes of disease or factors related to a condition but, unlike a case-control study, it is longitudinal and starts with an unselected group of individuals who are followed up for a set period of time.

Cohort studies are sometimes used to confirm the findings of case-control studies. For example, Doll and Hill observed a relationship between smoking and lung cancer in a case-control study and subsequently established the longitudinal study of doctors in the UK.

7
Q

What are some limitations with a cohort study?

A
  • A large number of subjects is needed to obtain enough individuals who get the disease or condition, particularly if it is uncommon.
  • The length of follow-up needed to get enough diseased individuals may be substantial, so the cohort study is not feasible for rare diseases.
  • There is difficulty in maintaining contact with subjects, particularly if the follow-up is lengthy.
  • The resources required may be very high.
8
Q

What is a cross-sectional study?

A

In a cross-sectional study a sample is chosen and data on each individual is collected at one point in time.

Note that this may not be exactly the same time point for each subject. For example, a survey of primary care consultations may be conducted over a week: each patient fills in the survey once, but different subjects complete it on different days depending on when they came to the surgery.

9
Q

When would you use a cross-sectional study?

A
  • Surveys of prevalence, such as a survey to ascertain the prevalence of asthma
  • Surveys of attitudes or views, such as studies of patient satisfaction, patient/professional knowledge; studies of behaviour such as alcohol use, sexual behaviour etc
  • When inter-relationships between variables are of interest, for example a study to determine the characteristics of heavy drinkers where a cross-sectional study allows comparisons by sex, age and so on
10
Q

Why would we summarise data?

A
  • Data quality monitoring
  • Data checking and data cleaning
  • Baseline data in a study
  • Before doing a complex analysis
11
Q

What is the definition of quantitative data, and what are the two types?

A

Quantitative data are data which can be measured numerically and may be continuous or discrete:

  • Continuous data lie on a continuum and so can take any value between 2 limits. The only limitation is that imposed by the accuracy of the method of measurement so that some continuous data may be recorded as integers although that is an approximation to the true value.
  • Discrete data do not lie on a continuum and can only take certain values, usually counts (integers).
12
Q

What is ordinal data?

A

Ordinal data are categorical data whose values can be arranged in a meaningful order from the smallest to the largest, although the gaps between values are not necessarily equal.

13
Q

What is categorical data?

A

Categorical data are data where individuals fall into a number of separate categories or classes.

14
Q

What is dichotomous data?

A

This is where there are only 2 classes and all individuals fall into one or other of the classes.

These data are also known as binary data.

15
Q

What is the problem with dichotomizing data?

A

Dichotomizing (re-categorizing data into two groups) is potentially very problematic because a great deal of information is discarded and statistical power is lost in the analysis.

In addition, the nature of any relationships may be masked.

For example, if the relationship is curved, it may appear weaker once the data are categorized, and if it is U-shaped, categorization may obscure it entirely.
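
The U-shaped case can be shown with a small sketch on made-up numbers (the exposure values and the cut-off at zero are assumptions chosen for illustration). The outcome depends strongly on the exposure, yet after splitting the exposure into two groups the group means are identical.

```python
from statistics import mean

# Hypothetical exposure values and a perfectly U-shaped outcome (y = x squared).
x = [-3, -2, -1, 1, 2, 3]
y = [v ** 2 for v in x]

# Dichotomize the exposure at zero into "low" and "high" groups.
low = [yi for xi, yi in zip(x, y) if xi < 0]
high = [yi for xi, yi in zip(x, y) if xi > 0]

low_mean, high_mean = mean(low), mean(high)
print(low_mean == high_mean)  # True: the two-group comparison shows no difference
```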

16
Q

List the common statistical measures of center of data.

A
  • mean
  • median

17
Q

List the common statistical measures of variability of data.

A
  • standard deviation (variance)
  • range (minimum, maximum)
  • interquartile range
18
Q

How would you calculate the mean?

A

This is the simple average of all the data: the sum of all values divided by the total number of values. This mean is known as the arithmetic mean.
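
A minimal sketch of that calculation, using a made-up list of values (the function name `arithmetic_mean` is illustrative):

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

example = [2, 4, 4, 5, 10]
print(arithmetic_mean(example))  # 5.0
```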

19
Q

How do you calculate the median?

A

This is the middle value when the data are arranged in ascending order of size.

If there is an odd number of values in the sample, the median is the value with the same number of values bigger than it as smaller than it. If there is an even number of values, there are two middle values and the median is the mean of the two.
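
The odd and even cases can be sketched as follows; the function name `median` and the sample values are assumptions made for the example.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```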

20
Q

How do you calculate the standard deviation, and what does it indicate?

A

It indicates how dispersed the data are and is a measure of the average difference between the mean and each data value.

It is calculated by taking the square root of the variance.

The variance is calculated by summing the squared differences between the overall mean and each value and then dividing by the number of values minus one.
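
As a worked sketch of those steps on made-up data (the function names are illustrative), the variance divides the summed squared differences by the number of values minus one, and the standard deviation is its square root:

```python
import math

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

example = [2, 4, 4, 5, 10]       # mean is 5; squared differences are 9, 1, 1, 0, 25
print(sample_variance(example))  # 9.0
print(sample_sd(example))        # 3.0
```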

21
Q

What is the advantage of standard deviation over the variance?

A

The advantage of the standard deviation over the variance is that it is in the same units as the original data and so is easier to interpret.

22
Q

How do you calculate the range?

A

This is the difference between the smallest and largest value and is usually expressed as the minimum and maximum.

23
Q

How do you calculate the interquartile range?

A

This is the range of values that includes the middle 50% of values and is bounded by the lower and upper quartile.

The lower quartile is found by ranking the data as for the median and then taking the value below which 25% of the data sit.

The upper quartile is the value above which the top 25% of data points sit.
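
One possible way to compute the quartiles is Python's `statistics.quantiles`. Note that several conventions exist for interpolating quartiles, so other software may give slightly different values. The choice of `method="inclusive"` and the data here are assumptions for illustration.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quarters and returns the three cut points.
lower_q, med, upper_q = quantiles(data, n=4, method="inclusive")
iqr = upper_q - lower_q

print(lower_q, upper_q, iqr)  # 3.0 7.0 4.0
```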

24
Q

Describe how different centers of distributions of quantitative data help in choosing summary measure.

A
  • if continuous data with symmetric distribution – use arithmetic mean
  • if continuous data with positively skewed distribution – consider geometric or harmonic mean but be aware that these do not allow zero values.
  • if continuous data with skewed distribution – consider median
  • if discrete data – present median unless the range of data is large enough to make the calculation of a mean sensible
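
To illustrate why the geometric mean cannot handle zero values, here is a minimal sketch that computes it through logarithms. The function name and the data are assumptions, and the check rejects any non-positive value because the logarithm is undefined there.

```python
import math

def geometric_mean(values):
    """Exponential of the mean of the logs; needs strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

skewed = [1, 10, 100]              # positively skewed: arithmetic mean is 37.0
gm = geometric_mean(skewed)        # close to 10, the same as the median here
print(round(gm, 6))
```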
25
Q

How does having the distribution of the continuous data skewed or not help choose the summary measure?

A
  • if the continuous data is not skewed - use standard deviation (also useful to do range in addition to SD)
  • if the continuous data is skewed - consider using interquartile range
26
Q

With summarizing categorical data, how would you summarize unordered categories (nominal data)?

A

These can be summarized using the frequencies in each category together with either the overall proportions or percentages.

The choice of whether to use proportions or percentages is a personal one although percentages are more commonly seen.

The complete set of frequencies is the frequency distribution.
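
A frequency distribution with percentages might be tabulated like this; the blood group data are invented for the example.

```python
from collections import Counter

# Hypothetical nominal data: one blood group per subject.
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

counts = Counter(blood_groups)               # frequency in each category
total = sum(counts.values())
percentages = {group: 100 * n / total for group, n in counts.items()}

print(counts["O"], percentages["O"])  # 4 50.0
```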

27
Q

With summarizing categorical data, how would you summarize ordered categories (ordinal data)?

A

These can also be summarized by frequencies and percentages as above but in addition we can calculate cumulative frequencies and percentages.

This can be useful to show the percentage below a certain cut-off.
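
Cumulative frequencies and percentages can be sketched as below. The ordered pain categories and their counts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical ordinal data: counts listed in category order.
categories = ["none", "mild", "moderate", "severe"]
frequencies = [10, 20, 15, 5]

cumulative = list(accumulate(frequencies))            # running totals
total = cumulative[-1]
cumulative_pct = [100 * c / total for c in cumulative]

print(cumulative)      # [10, 30, 45, 50]
print(cumulative_pct)  # [20.0, 60.0, 90.0, 100.0]
```

Here 60% of subjects report pain that is mild or less.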

28
Q

What is a histogram?

A

This is a diagram which shows the distribution of the data by plotting the data in rectangles known as ‘bins’ corresponding to categories along the horizontal (x) axis.

The rectangles have heights or areas which are proportional to the frequencies in these categories. The vertical (y) scale is the frequency per interval.
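
The counting step behind a histogram can be sketched with equal-width bins. The ages, the starting point and the bin width are assumptions for the example, and the helper name `bin_counts` is illustrative.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each equal-width bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:       # ignore values outside the binned range
            counts[index] += 1
    return counts

ages = [21, 25, 29, 33, 34, 38, 41, 47, 52, 58]
counts = bin_counts(ages, start=20, width=10, n_bins=4)
print(counts)  # [3, 3, 2, 2] for bins 20-29, 30-39, 40-49, 50-59
```

With equal-width bins, the height of each rectangle is simply proportional to these counts.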

29
Q

What is a box and whisker plot?

A
A box and whisker plot contains five pieces of summary information about the data: 
• median = horizontal line in box
• upper quartile = top edge of the box
• lower quartile = lower edge of box
• maximum = top of ‘whisker’
• minimum = bottom of ‘whisker’
30
Q

When are bar charts used?

A

Graphs can be used to provide visual summaries of categorical data. The two most commonly used are bar charts and pie charts. In a bar chart, each category is given its own bar along the horizontal (x) axis. The height of each bar is proportional to the frequency of observations.

31
Q

When are pie charts used?

A

Pie charts show the distribution of individuals in different categories of a variable where every individual belongs to one and only one category.

In a pie chart, each category is given an area (or slice) of the graph (the pie). The area of each slice is proportional to the frequency of observations within that category and is calculated by dividing the whole pie, 360 degrees, into slices.
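
The slice angles follow directly from the frequencies, as in this short sketch on invented counts:

```python
# Hypothetical category frequencies; every individual is in exactly one category.
frequencies = {"O": 4, "A": 2, "B": 1, "AB": 1}

total = sum(frequencies.values())
# Each slice takes a share of the 360 degrees proportional to its frequency.
angles = {category: 360 * n / total for category, n in frequencies.items()}

print(angles)  # {'O': 180.0, 'A': 90.0, 'B': 45.0, 'AB': 45.0}
```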

Pie charts enable comparison of proportions in different population groups. For