Critical Numbers Flashcards

1
Q

What are the target and sample populations?

A
  • Target population = larger population
  • Sample population = sub-set of target population as we can’t sample everyone
2
Q

What are the 4 main types of bias?

A
  • Sampling bias = individuals more/less likely to be included
  • Recall bias = cannot remember specifics
  • Social-desirability bias = study group tell us incorrect information due to societal pressure
  • Information bias = measurement error
3
Q

What are the 3 ways of classifying study designs?

A
  • experimental (researcher changed something) vs. observational (researcher has not intervened, just observed)
  • retrospective (look back into past, subject to bias) vs. prospective (collect information at start and follow up over time)
  • individual vs. population
4
Q

What is a case-control study? What are its strengths and weaknesses?

A
  • Find people with a disease, look back in time and see whether they were exposed to risk factor in question.
  • Retrospective

Positives

  • Works well for investigating rare outcomes
  • Relatively fast/cheap (no follow up)
  • Few ethical considerations

Negatives

  • Cannot prove causation/eliminate confounders
  • Can be difficult to establish order of events
  • Possibilities for bias (recall bias)
  • Can only investigate a single disease
5
Q

What is a cross-sectional study? What are its strengths and weaknesses?

A

  • Take a sample and see who has the disease right then and there

Positives

  • Relatively fast/cheap (no follow up)
  • Few ethical considerations
  • Generates hypotheses

Negatives

  • Cannot prove causation/eliminate confounders
  • Less suitable for rare disease
  • Difficult to get an understanding of order of events
  • Sampling bias (who is included depends on when you do the study)
6
Q

What is a cohort study? What are its strengths and weaknesses?

A
  • Collect information from a sample in which some have the exposure and some do not (none should have the outcome at the start).
  • Follow-up over time and see if there is a link between exposure and outcome
  • Prospective

Positives

  • Few ethical considerations
  • Clarity on event sequence

Negatives

  • Cannot prove causation/eliminate confounders
  • Not suitable for rare disease or when disease takes a long time to develop
  • Time consuming/expensive
  • Difficulty following up
  • Patients can change behaviours in the cohort
7
Q

What is a randomised controlled trial (RCT)? What are its strengths and weaknesses?

A
  • Multiple groups (referred to as arms), give each different exposures and compare outcomes.
  • We can balance arms by matching, randomising, cross-over, placebos, blinding

Positives

  • Considered gold standard, can prove causation by eliminating confounders…
  • Particularly with extensions (cross-over trial)
  • Random (less bias)

Negatives

  • Not suitable for rare outcome or when outcome takes a long time to develop
  • Time consuming/expensive
  • Often unethical
  • Issues with loss to follow-up, compliance etc.
8
Q

What is a crossover trial? What are its weaknesses?

A
  • Extension of RCT. Everyone has all arms of trial
  • Weaknesses = more technical analyses, not always suitable (even if standard RCT is)
9
Q

What is an ecological study? What are its strengths and weaknesses?

A
  • Uses previously collected data on a massive sample (if not the whole population) to look at prevalence, trends and correlations
  • Look at populations, not individuals

Positives

  • Fast/cheap
  • Very large sample (small standard error)
  • Easy to do
  • Good first step to generate hypothesis

Negatives

  • You do not know how data was collected (variation/bias)
  • Often absent/inconsistent/incorrect data – variation in diagnosing criteria
  • Cannot prove causation

Ecological fallacy – a correlation between predictor and outcome at population level does not necessarily hold for individuals, and it does not mean causation.

10
Q

Explain what the ecological fallacy is.

A

The ecological fallacy occurs when you make conclusions about individuals based only on analyses of group data. Just because two things are linked at group level, it doesn’t mean the link holds for individuals or that the relationship is causal

11
Q

What is a sample? What is the difference between the target population and the sample population?

A

•A sample is a group we are using to represent the population.

–Target population – the population the sample represents

–Sample population – the people from whom data is collected

The sample population should be representative of the target population so that findings generalise to it.

12
Q

What are the five main types of sampling?

A
  • Random sampling - random number generator, “draw a name out of a hat”. Usually preferred way of sampling
  • Systematic sampling - count down the list and take every “k”th element
  • Convenience sampling - The first people who approach you are used. Easiest technique but likely the worst
  • Cluster sampling:

●Divide the population into groups, usually geographically

●Each group is called a cluster, or block

●Clusters are randomly selected, each element in the selected cluster used

  • Stratified sampling:

●Divide the population into groups/strata, based not on geography, but some characteristic, e.g. males and females

●A sample is taken from each of these strata using either random, systematic or convenience sampling
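The sampling methods above can be sketched in a few lines of Python. This is only an illustration: the list of 30 person IDs, the three geographic clusters and the male/female strata are all made-up assumptions, and convenience sampling is left out because it has no defined procedure.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Random sampling: every individual has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Systematic sampling: take every k-th element of the list."""
    return population[start::k]

def cluster_sample(clusters, n_clusters, seed=0):
    """Cluster sampling: pick whole clusters at random, keep everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Stratified sampling: draw a random sample from every stratum."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical population of 30 person IDs (not from the flashcards).
people = list(range(1, 31))
clusters = {"north": people[:10], "central": people[10:20], "south": people[20:]}
strata = {"male": people[::2], "female": people[1::2]}

random_ids = simple_random_sample(people, 5)
systematic_ids = systematic_sample(people, k=3)
cluster_ids = cluster_sample(clusters, 1)
stratified_ids = stratified_sample(strata, 3)
```

Note the difference between the last two: cluster sampling keeps every member of a few randomly chosen groups, whereas stratified sampling takes some members from every group.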

13
Q

What are the 6 main types of bias?

A
  • Sampling bias – sample is not representative of the target population (encompasses other forms of bias)
  • Recall bias – people fail to remember specifics innocently
  • Social-desirability bias – incorrect information is given due to societal pressure
  • Information bias – where data is consistently measured wrong (may be referred to as observation bias)
  • Volunteer bias – volunteers for a study often aren’t representative
  • Procedure bias – subjects in different arms are treated differently
14
Q

What are 3 other forms of bias that are related to screening?

A

•Selection bias - people who sign up for screening programmes are not representative of the whole population (higher/lower risk)

–E.g. women in higher socioeconomic groups more likely to attend cervical cancer screening, who are at lower risk

•Lead time bias - screening improves survival length?

–Maybe no, just detected illness sooner

•Length-time bias - screening improves survival length?

–Maybe no, people who survive longer are more likely to be picked up by screening

15
Q

What is a confounding factor?

A
  • A confounding factor has to be related to the outcome, and the characteristic of interest (exposure)
  • Examples:
  • There is a high rate of lung cancer among those who use breath mints. Confounder is smoking.
  • There are high rates of cancer among people in care homes. Confounder is old age.
16
Q

Which type of study design below would be best to investigate the following;

“Identify patients who have had previous MIs and compare their diet, smoking habits and exercise activity with people that are similar but have not suffered previous MI?”

A. Ecological study

B. Cohort study

C. Cross sectional study

D. Longitudinal study

E. Case control study

A

Answer: E. Take the group of individuals who have had heart attacks and look back in time to investigate diet, smoking habits and exercise, then do the same for the group who have not suffered heart attacks

17
Q

A research group wants to estimate the UK national prevalence of coeliac disease. What study design would be most appropriate?

A. Cohort

B. Randomised controlled trial

C. Cross sectional

D. Longitudinal

E. Case control

A

Answer: C – Cross sectional, gives a snapshot of prevalence with no reference to time

18
Q

To calculate the average 100m sprint time, a research group advertises they are looking for participants to run around a track and select participants by a convenience sample.

Which type of bias is this sampling method subject to?

A. Volunteer bias

B. Recall bias

C. Lead time bias

D. Misclassification bias

A

Answer: A – people who enjoy exercise are more likely to volunteer for the study than those who do not, thus the sample is not entirely representative

19
Q

To draw a sample from primary school children, researchers line children up and count off the children 1, 2, 3, 1, 2, 3… placing them into 3 different groups

Which term best describes this form of sampling?

A. Random sampling

B. Systematic sampling

C. Convenience sampling

D. Cluster sampling

E. Stratified sampling

A

B. Count down the list and take every “k”th element; here k is 3

20
Q

A study is designed to examine the relationship between blood pressure (BP) and occupation group. If age is a confounder, then:

  • A. Age is linked to diet, and diet affects BP
  • B. Different occupations will have different ages, BP will change with age
  • C. Younger and older people have the same occupations
  • D. Different occupations have similar age groups
A

B

21
Q

Which of the following is true of observational studies?

  • A. They can be retrospective
  • B. They are always shorter than an experimental study
  • C. They are more powerful than experimental studies
  • D. Participants must be randomised into groups before analysing the data
A

A

22
Q

University students with insomnia were randomly assigned with a simple randomisation (flip of a coin) to receive either CBT or usual care. This randomisation method ensures:

  • A. Each student has an equal chance of being in any treatment group
  • B. The student is unaware of the treatment group to which they are assigned
  • C. The same number of students will be allocated to each group
  • D. Although individuals receive different treatments, each student will be allocated to the treatment most likely to benefit them
A

A

23
Q

Sampling bias occurs when:

  • A. Certain individuals are more likely to be included in the study than others
  • B. 40% of the original sample drop-out of the study at random
  • C. Observational studies are used instead of RCTs
  • D. The researcher is not blinded within the study
A

A

24
Q

In a cohort study:

  • A. It is possible to look at a range of outcomes
  • B. We use a snapshot of time
  • C. We can examine very rare diseases
  • D. We don’t worry about confounding
A

A

25
Q

A case-control study:

  • A. Can often suffer from a loss to follow-up
  • B. Is the type of study where individuals are initially selected on the basis of their exposure status, not their outcome
  • C. Has an advantage as it allows researchers to look at a range of outcomes
  • D. Can often suffer from recall bias
A

D

26
Q

Match the study design with the most appropriate description.
  • A. Cohort study
  • B. Case-control study
  • C. Ecological study
  • D. Cross-sectional study
    1. Carried out at a snapshot in time, without following subjects up or looking back in time
    2. Collect information now and follow subjects up over time to explore outcomes
    3. Collect information on an outcome now, and look back in time to see when exposures were experienced
    4. Information on groups of individuals (e.g. countries) rather than individual-level data
A

1 = D

2 = A

3 = B

4 = C

27
Q

What are the five steps of Evidence Based Medicine?

A
  • Asking focused questions (use PICO questions: population, intervention, comparator + outcome)
  • Finding the evidence
  • Critical appraisal (how valid + reliable? Don’t take everything at face value)
  • Making a decision
  • Evaluating performance

Before any of this, make sure the answer isn’t already out there.

28
Q

What is PICO?

A

A way of generating a research question.

  • P – patient or population
  • I – intervention or indicator (exposure, treatment or procedure)
  • C – comparison or control (a group compared against the intervention)
  • O – Outcome (end-point of interest)

An example of a research question formed with PICO:

“Is living alone (I) more likely to cause clinical depression (O) in adults aged 20-40 (P) compared to individuals living with 1 or more people (C)?”

29
Q

What is a variable? What are the two categories of variables?

A

Variable = a measured characteristic that varies between individuals. Categoric = individuals fall into one of several categories. Numeric = variable measured on a numerical scale

30
Q

What are the different types of categoric variables?

A
  • Binary = only 2 categories (yes/no)
  • Ordinal = >2 categories, ordering (low/medium/high)
  • Nominal = >2 categories, no ordering (hair colour)
31
Q

What are the different types of numeric variables?

A
  • Discrete = can only take distinct, separate values, e.g. age in whole years
  • Continuous = any value within a particular range, e.g. blood pressure
32
Q

How can we display categorical data?

A
  • Categorical data (nominal, ordinal & binary) is normally summarised in terms of frequency.
  • For this reason, bar charts and pie charts are commonly used.
  • Discrete numerical data can also be displayed via bar and pie charts if it is appropriate to do so.
33
Q

What types of variables are the following?

  1. Weight
  2. Eye colour
  3. Shoe size
  4. Social class
  5. Age
  6. Is Warfarin prescribed?
A
  1. Weight - Continuous
  2. Eye colour - Nominal
  3. Shoe size - Discrete
  4. Social class - Ordinal
  5. Age - Continuous
  6. Is Warfarin prescribed? - Binary
34
Q

What are descriptive statistics? Which measures are used for categorical variables?

A
  • Collection of statistical measures used to describe the data sample we have
  • Probability/risk = outcome number / total (0 to 1)
  • Percentage = 0 to 100
  • Rate = number of times something happens per a quantified population (e.g. x per 100 people) (0 to infinity)
  • Odds = probability of occurrence / probability of non-occurrence
35
Q

A study is done to compare the survival rates of various treatments for prostate cancer within a cohort of 695 patients. 348 patients were managed via Watchful Waiting (WW). Of these, 31 patients died. 347 patients were managed via Radical Prostatectomy (RP). Of these, 16 patients died. What are the odds of death from prostate cancer with WW and RP?

A

WW odds (using counts) = 31 / (348 - 31) = 0.0978

RP odds (using risks) = (16/347) / (1 - (16/347)) = 0.0483

We can then use these to find the odds ratio, which is calculated in the same way as a risk ratio:

The odds ratio of death from prostate cancer with WW compared with RP = 0.0978 / 0.0483 = 2.02

The odds of death from prostate cancer are 102% higher with WW compared to RP
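As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `odds` is just an illustrative choice; the counts are the ones given in the question.

```python
def odds(events, total):
    """Odds = number with the outcome / number without the outcome."""
    return events / (total - events)

ww_odds = odds(31, 348)   # Watchful Waiting: 31 deaths among 348 patients
rp_odds = odds(16, 347)   # Radical Prostatectomy: 16 deaths among 347 patients
odds_ratio = ww_odds / rp_odds

print(round(ww_odds, 4), round(rp_odds, 4), round(odds_ratio, 2))
```

Counting events against non-events gives the same odds as dividing the risk by (1 - risk), so the two routes used in the answer are interchangeable.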

36
Q

What is the absolute risk difference?

A

The difference between two risks

37
Q

A study is done to compare the survival rates of various treatments for prostate cancer within a cohort of 695 patients. 348 patients were managed via Watchful Waiting (WW). Of these, 31 patients died. 347 patients were managed via Radical Prostatectomy (RP). Of these, 16 patients died. What is the Absolute Risk difference of death from prostate cancer with WW compared to RP?

A
  • Risk of death in WW group is 31/348 = 0.089
  • Risk of death in RP group is 16/347 = 0.046
  • Therefore the absolute risk difference is 0.089 - 0.046 = 0.043 = 4.3%
  • In words: the risk of death from prostate cancer was 4.3 percentage points greater with WW than with RP
38
Q

A zoo has 4 tigers and 10 bears and tests them all for a particular disease due to a recent outbreak. It is discovered that 1/4 tigers and 1/10 bears are carrying the disease. Calculate the risk ratios and odds ratios for both carrying the disease and not carrying the disease.

A
  • The risk ratio of disease in tigers compared to bears is: 1/4 ÷ 1/10 = 2.5
  • The risk ratio of non-disease in tigers compared to bears is: 3/4 ÷ 9/10 = 0.833
  • The odds ratio of disease in tigers compared to bears is: 1/3 ÷ 1/9 = 3
  • The odds ratio of non-disease in tigers compared to bears is: 3/1 ÷ 9/1 = 0.3333 = ⅓
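The four ratios above can be reproduced exactly with Python's `fractions` module. This is a small sketch; the helper names `risk` and `odds` are illustrative, and the counts are those from the zoo example.

```python
from fractions import Fraction

def risk(events, total):
    """Risk = events / everyone in the group."""
    return Fraction(events, total)

def odds(events, total):
    """Odds = events / non-events."""
    return Fraction(events, total - events)

# Tigers: 1 of 4 diseased (3 healthy). Bears: 1 of 10 diseased (9 healthy).
rr_disease = risk(1, 4) / risk(1, 10)
rr_no_disease = risk(3, 4) / risk(9, 10)
or_disease = odds(1, 4) / odds(1, 10)
or_no_disease = odds(3, 4) / odds(9, 10)

print(rr_disease, rr_no_disease, or_disease, or_no_disease)
```

The two odds ratios are reciprocals of each other (3 and 1/3), whereas the two risk ratios are not (2.5 and 0.833), which is one reason the choice of outcome matters more for risk ratios.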
39
Q

What does 1/ARD give us?

A
  • NNT/H = 1/ARD
  • Number Needed to Treat (NNT) or Harm (NNH) is the number of patients who must (on average) be treated with a specific therapy for one of them to benefit or be detrimentally affected respectively over the other treatment.
  • In the previous example, the ARD was 0.043. The NNT/H is 1/0.043 = 23.26 (2 d.p.), which rounds up to 24 patients. THE NNT/H MUST ALWAYS BE ROUNDED UP!
  • 24 patients need to be treated with RP over WW to prevent 1 additional death from prostate cancer OR 24 patients need to be treated with WW over RP to cause 1 additional death from prostate cancer
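A minimal Python sketch of the ARD and NNT/H calculation, using the prostate cancer counts from the earlier card. The function names are illustrative assumptions; `math.ceil` implements the "always round up" rule.

```python
import math

def absolute_risk_difference(events_a, total_a, events_b, total_b):
    """ARD = risk in group A minus risk in group B."""
    return events_a / total_a - events_b / total_b

def number_needed(ard):
    """NNT/H = 1 / ARD, always rounded up to a whole patient."""
    return math.ceil(1 / abs(ard))

ard = absolute_risk_difference(31, 348, 16, 347)  # WW versus RP
nnt = number_needed(ard)

print(round(ard, 3), nnt)
```

The absolute value is taken so that the same function works whichever group is put first; the sign of the ARD then tells you whether to read the result as an NNT or an NNH.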
40
Q

Example. Calculate probability, percentage, rate and odds for both Drug A (31 of 341 patients had an MI) and Placebo (61 of 366 patients had an MI).

A

Drug A:

  • Probability = 31/341 had MI = 0.091
  • Percentage = 9.1%
  • Rate = 9.1 MI’s per 100 people / 91 MI’s per 1000
  • Odds = 31/310 = 0.1

Placebo:

  • Probability = 61/366 had MI = 0.167
  • Percentage = 16.7%
  • Rate = 16.7 MI’s per 100 people / 167 MI’s per 1000
  • Odds = 61/305 = 0.2
41
Q

It’s not sufficient to say that ‘one looks more effective’ when comparing statistics. What 3 methods do we use to compare statistics?

A
  • Risk difference, e.g. Placebo - Drug A = 0.167 - 0.091 = 0.076 or 7.6%, so risk with Placebo is 7.6 percentage points higher than with Drug A
  • Risk ratio = risk in Group A / risk in Group B, numerator = focus. 3 potential outcomes: >1, 1, <1. Always compared to 1, e.g. Placebo/Drug A = 0.167/0.091 = 1.835, so risk of MI with Placebo is increased by 0.835, or 83.5%, compared to Drug A. Can make Drug A the focus: Drug A/Placebo = 0.091/0.167 = 0.545, so risk of MI in the Drug A group is decreased by 0.455, or 45.5%, compared to Placebo
  • Odds ratio = odds in Group A / odds in Group B. 3 potential outcomes: >1, 1, <1. Always compared to 1, e.g. Placebo/Drug A = 0.2/0.1 = 2, so odds of MI with Placebo are increased by 1, or 100%, compared to Drug A. Making Drug A the focus: Drug A/Placebo = 0.1/0.2 = 0.5, so odds of MI with Drug A are decreased by 0.5, or 50%, compared to Placebo
  • RR<
  • Misleading: RR can be 2, but risk difference can be 0.00015, e.g. 30/100000 compared to 15/100000
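The three comparisons can be sketched in Python using the Drug A and Placebo counts from the previous card (31/341 and 61/366). The helper names are illustrative. Because the code works from unrounded risks, the risk ratio comes out as 1.833 rather than the 1.835 obtained from the rounded values 0.167 and 0.091.

```python
def risk(events, total):
    """Risk = events / everyone in the group."""
    return events / total

def odds(events, total):
    """Odds = events / non-events."""
    return events / (total - events)

placebo_risk = risk(61, 366)
drug_risk = risk(31, 341)

risk_difference = placebo_risk - drug_risk          # Placebo is the focus
risk_ratio = placebo_risk / drug_risk
odds_ratio = odds(61, 366) / odds(31, 341)

# The "misleading" case: a ratio of 2 with a tiny absolute difference.
rare_ratio = risk(30, 100_000) / risk(15, 100_000)
rare_difference = risk(30, 100_000) - risk(15, 100_000)

print(round(risk_difference, 3), round(risk_ratio, 3), round(odds_ratio, 3))
print(round(rare_ratio, 3), round(rare_difference, 5))
```

Reporting the risk difference alongside a ratio guards against the misleading case: a doubling of risk sounds dramatic, but here it amounts to 15 extra events per 100,000 people.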
42
Q

How can we display numerical data?

A

Numerical data (mainly continuous) must be displayed using alternative graphs to account for the variable nature of the data.

The main types of graph which are used are:

  • Histograms - histograms are essentially continuous bar charts where a bar covers a range as opposed to 1 singular value
  • Box and Whisker plots - great for comparing a continuous variable between multiple different groups. Also great for summarising continuous data when it is not normally distributed on a histogram
  • Scatter plots - usually used when displaying 2 continuous variables against each other. Frequently used when assessing correlation and regression
43
Q

Why is the median sometimes better than the mean? What is skewness?

A

If we have outliers, the median gives a better representation because it is less affected by extreme values. Right skew = long tail (outliers) lies to the right of the curve, left skew = long tail (outliers) lies to the left of the curve

44
Q

When is the inter-quartile range especially useful?

A

Especially useful when the data is not normally distributed, i.e. skewed

45
Q

What are the three main measures of spread?

A
  • Range = largest - smallest
  • Inter-Quartile Range = 75th centile - 25th centile, associated with median, most representative, middle 50%
  • Standard Deviation = measure of how spread out the values are (roughly the average distance from the mean). Affected by extreme values, but also more powerful as it uses all of the values. Not appropriate when we have a skewed distribution.
  • Symmetric distribution = report mean + standard deviation
  • Non-symmetric distribution (skewed) = report median + IQR
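The measures of spread listed above can be computed with Python's standard `statistics` module. This is a sketch on a small illustrative data set of five values; note that `statistics.quantiles` uses one particular convention for quartiles (its default "exclusive" method), and other software may give slightly different IQR values.

```python
import statistics

values = [159, 165, 170, 175, 176]  # small illustrative data set

data_range = max(values) - min(values)

# Quartiles: q1 = 25th centile, q2 = median, q3 = 75th centile.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

print(data_range, iqr, mean, round(sd, 2))
```

For roughly symmetric data you would report `mean` and `sd`; for skewed data, `q2` (the median) and `iqr`.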
46
Q

What is the standard error? How is it calculated?

A
  • The standard error is the standard deviation of all the sample means
  • The standard error (se) is an estimate of the precision of the population parameter estimate that doesn’t require lots of repeated samples. It provides a measure of how far from the true value the sample estimate (usually the mean) is likely to be.
  • The standard error assumes that the data is normally distributed and there is a sufficient sample size
  • Standard error = s / square root of n, where s is the sample standard deviation and n is the sample size
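The two descriptions of the standard error above (the standard deviation of the sample means, and s divided by the square root of n) can be compared with a small simulation. Everything here is an assumption made for illustration: a simulated population with mean 100 and SD 15, samples of 25 people, and 2,000 repeated samples.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20,000 values, roughly normal, mean 100, SD 15.
population = [rng.gauss(100, 15) for _ in range(20_000)]
n = 25

# Definition 1: take many samples and look at the spread of their means.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]
sd_of_sample_means = statistics.stdev(sample_means)

# Definition 2: estimate the same quantity from ONE sample using s / sqrt(n).
one_sample = rng.sample(population, n)
se_from_formula = statistics.stdev(one_sample) / math.sqrt(n)

print(round(sd_of_sample_means, 2), round(se_from_formula, 2))
```

Both numbers should land near 15 / sqrt(25) = 3. The formula is useful precisely because it gets close to that value from a single sample, without the repeated sampling.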
47
Q

What is the normal distribution?

A
  • Certain numeric variables, when plotted, follow a normal distribution (symmetric). Most people have values around the mean, with a few extreme values, roughly the same number on either side. 1 SD either side of mean = 68% of sample, 1.96 SD either side of mean = 95% of sample
  • Explained by two parameters: mean + standard deviation
48
Q

How can we work out where the bottom 2.5% lies? How about the top 2.5%?

A

If mean = £24,991 and SD = £1,574:

Bottom = mean - 1.96xSD = 24,991 - (1.96 x 1,574)

= 24,991 - (3,085) = £21,906

Top = mean + 1.96xSD = 24,991 + (1.96 x 1,574)

= 24,991 + (3,085) = £28,076

So between £21,906 and £28,076 is roughly 95% of observed values
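The same calculation as a short Python sketch. The function name `reference_range` is an illustrative choice; it assumes the variable is approximately normally distributed, as the card does.

```python
def reference_range(mean, sd, z=1.96):
    """Range expected to contain the middle 95% of values (for z = 1.96)."""
    return mean - z * sd, mean + z * sd

bottom, top = reference_range(24_991, 1_574)
print(round(bottom), round(top))
```

Note that this range uses the standard deviation and describes individual values; it is not a confidence interval for the mean, which would use the standard error instead.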

49
Q

We’ve looked at comparing two categoric variables. How about comparing one numeric and one categoric variable or two numeric variables?

A
  • Comparing one numeric and one categoric variable = differences in means/medians (mean if all groups are normally distributed, median if any of the groups is not normally distributed, e.g. one normal distribution and one right skew)
  • Comparing two numeric variables = Pearson’s correlation coefficient (r between -1 and +1). +1 = perfect positive linear association, -1 = perfect negative linear association + 0 = no linear relation at all
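To make the three reference values of r concrete, here is a minimal hand-written implementation of Pearson's correlation coefficient in Python. The function name and the three tiny data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises in step with x
perfect_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls in step with x
no_linear = pearson_r(xs, [4, 1, 0, 1, 4])           # U-shape: y = (x - 3)^2

print(perfect_positive, perfect_negative, no_linear)
```

The U-shaped example shows why r = 0 only means "no linear relation": y is completely determined by x there, yet the correlation is zero.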
50
Q

The mean of a large sample size:

  • A. Is the same as the median if distributed symmetrically
  • B. Is greater than the standard deviation
  • C. Is calculated by multiplying all values together
  • D. Is always a reasonable measure of the centre
A

A

51
Q

The inter-quartile range of a set of data represents:

  • A. The range within which the middle 95% of values lie
  • B. The range within which the middle 25% of values lie
  • C. The range within which the middle 50% of values lie
  • D. The spread around the mean in a skewed variable
A

C

52
Q

The following is a measure of the spread of a distribution:

  • A. Inter-quartile range
  • B. Median
  • C. Mode
  • D. Mean
A

A

53
Q

The mean and median values of the data (159, 165, 170, 175, 176) are:

  • A. 176, 169
  • B. 176, 170
  • C. 169, 170
  • D. 176, 176
A

C

54
Q

As the standard deviation of measurements increases:

  • A. The mean gets larger
  • B. We should use the median
  • C. The values become more spread out
  • D. The mean becomes more informative
A

C

55
Q

The 2 values that are 1.96 standard deviations either side of the mean are:

  • A. 99% of the observed sample values
  • B. Where all the observed values are for that variable
  • C. The range of all potential values for that variable
  • D. 95% of the observed sample values
A

D

56
Q

The Pearson correlation coefficient:

  • A. Takes values between -1 and +1
  • B. Is always positive
  • C. Is -1 if there is no linear association
  • D. Can only be used for binary variables
A

A

57
Q

When we do a study, is the sample estimate exactly the same as the value in the overall population?

A

No, not exactly. Sample mean is best guess of population mean. If we were to repeatedly resample, all those sample means would be normally distributed (symmetrically) around the true population mean. We can use this idea to infer how precise our sample mean is. This estimate of precision = ‘standard error’

58
Q

What does a smaller standard error mean?

A

Smaller SE = estimate more precise

59
Q

What is the difference between standard deviation and standard error?

A
  • SD relates to how spread out values are in sample we collect data on (descriptive)
  • SE relates to how precise our mean estimate is (inferential)
60
Q

What is a confidence interval?

A
  • Our estimate is unlikely to be exactly the ‘true’ value, and the SE indicates how precise the estimate is. We can use these together to get an idea of what the true value is, e.g. mean weight for boys was 25kg (95% CI: 20kg to 30kg). This means we are 95% confident that the true mean weight for the boys is between 20kg and 30kg; 5% of the time we will miss the true value. All estimates of population values should be presented with a confidence interval.
  • The 95% confidence interval is a range of values that you can be 95% certain contains the true mean of the population
61
Q

What are the 3 ways of interpreting a confidence interval?

A
  • If I were to take repeated random samples, 95% of these confidence intervals would contain the true population value
  • My sample estimate is my best guess, and I am 95% confident that the true population value is between these two limits
  • I am 95% confident that any of these values (between these limits) is the true population value
62
Q

How do we calculate confidence interval?

A

We take the mean plus and minus 1.96 x SE. Example: mean = 24,991, n = 124, SD = 1,574, so SE = 1,574/square root (124) = 141.35. Mean-(1.96xSE) = 24,991 - 277 = £24,714. Mean+(1.96xSE) = 24,991 + 277 = £25,268. Therefore, we’re 95% confident the true mean income for High SES is between £24,714 and £25,268

63
Q

Work out the confidence interval for a sample with mean = 18,477, n = 12 and SD = 3,732

A

SE = 3,732/square root (12) = 1,077.3. Mean - (1.96xSE) = 18,477 - 2,112 = £16,365. Mean + (1.96xSE) = 18,477 + 2,112 = £20,589. Therefore, we are 95% confident the true mean income for Low SES is between £16,365 and £20,589

64
Q

How would our calculation change if we want to change the level of confidence?

A
  • 1.96 (95%) goes to 2.58 if we want 99%
  • For Low SES, mean - (2.58xSE) = 18,477 - 2,779 = £15,698. Mean + (2.58xSE) = £21,256. 95% was £16,365 to £20,589, so 99% confidence interval is wider as we are more confident we have captured the true value
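The confidence interval calculations for the High SES and Low SES examples can be sketched as one Python function. The function name and the use of `statistics.NormalDist` to look up the multiplier are illustrative choices. The lookup returns unrounded multipliers (about 1.95996 and 2.57583), so the 99% limits differ by a few pounds from the ones above, which use the rounded value 2.58.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-based CI for a mean: mean +/- z * SE, with SE = sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

high_ses_95 = confidence_interval(24_991, 1_574, 124)
low_ses_95 = confidence_interval(18_477, 3_732, 12)
low_ses_99 = confidence_interval(18_477, 3_732, 12, confidence=0.99)

for label, (lower, upper) in [("High SES 95%", high_ses_95),
                              ("Low SES 95%", low_ses_95),
                              ("Low SES 99%", low_ses_99)]:
    print(label, round(lower), round(upper))
```

Raising the confidence level increases the multiplier, so the 99% interval always contains the 95% interval calculated from the same data.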
65
Q

Let’s say we perform a sample of 81 people with a mean of 5 and a standard deviation of 4. Please calculate the 95% confidence intervals

A
  • Upper confidence limit: 5 + (1.96 x (4/9)) = 5.87
  • Lower confidence limit: 5 - (1.96 x (4/9)) = 4.13
  • 95% confidence interval = 4.13 to 5.87
66
Q

Why can’t standard errors always be used?

A

Because standard errors rely on assumptions. If data is not normally distributed, we can’t use mean/SD, which means we can’t use SE, which means we can only get confidence intervals through other calculations

67
Q

What are the standard error assumptions?

A
  • Data approximately normally distributed
  • Sufficient sample size (>20 individuals)
  • Worth noting that confidence intervals are used for relative risk, odds ratio etc.
68
Q

What is a hypothesis test?

A
  • Main reason we do hypothesis tests is to try to say whether a difference is significant. We have an idea of what the true population value actually is; how comparable is our sample estimate? Main question: is the difference big enough to be important?
  • Example: random sample of how many calories students ate on bonfire night. On average, it was 3000, from previous research, it was 2250 on any given day. Have students consumed more calories (750 more) on bonfire night than on a standard day? Is this big enough to be significant?
69
Q

In every hypothesis test there is a null hypothesis. What is the null hypothesis?

A
  • In majority of cases, it relates to no difference
  • Example: there is no difference between the amount of calories consumed on bonfire night and any other day. Another: there is no difference between the IQ scores of students at UoS and SHU
  • We are going to ‘believe’ this and see if we can change our minds
  • Now we want to say, if our null hypothesis is true, what is the chance we could have seen this much of a difference (750 calories). So if there is a difference of 1000 calories, we will be less inclined to think that our null hypothesis is true
70
Q

What is a p-value?

A
  • A p-value is the probability of obtaining your results or results more extreme, if the null hypothesis is true. The significance level is usually set at 0.05. Thus if the p-value is less than this value we reject the null hypothesis
  • If I observe a value close to my null hypothesis, p-value will be large (closer to 1). On graph, value closer to null will give a bigger area after it = bigger p-value. Therefore, it is likely that any difference observed is due to chance (no significant difference between observed + null value). We might conclude that our new drug has no difference in pain compared to old drug. We do not reject the null hypothesis (this is not proof that it is true)
71
Q

What happens if our observed value is far away from the null hypothesis? What is the relationship between the p-value and significance?

A
  • P-value will be small (closer to 0). It is likely that the difference observed is beyond chance alone (significant difference between observed + null value). We have evidence to reject the null hypothesis (can never say it’s false unless p = 0)
  • Lower p-value = more significant
  • A good way to remember is low p-value = likely to have evidence to reject null hypothesis
72
Q

If there were 2 studies that had the same mean, same standard deviation, same null hypothesis but one had a larger sample size, would the p-values be different?

A

Study with larger sample size has a smaller SE + a smaller p-value. You have more people but see the same difference, so the differences are less likely to be due to chance
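
A minimal sketch of why this happens, assuming a normal approximation (z-test). The mean and null value reuse the bonfire example; the SD of 1,500 and the sample sizes of 25 and 100 are illustrative assumptions, not figures from the cards.

```python
import math

def two_sided_p(observed, null_value, sd, n):
    """Return (standard error, two-sided p-value) from a z-test."""
    se = sd / math.sqrt(n)                # standard error shrinks as n grows
    z = (observed - null_value) / se      # test statistic
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return se, p

# Same mean, same SD, same null value; only the sample size differs.
se_small, p_small = two_sided_p(3000, 2250, sd=1500, n=25)
se_large, p_large = two_sided_p(3000, 2250, sd=1500, n=100)
print(f"n=25:  SE={se_small:.0f}, p={p_small:.4f}")
print(f"n=100: SE={se_large:.0f}, p={p_large:.7f}")
```

Quadrupling the sample size halves the standard error, which doubles the test statistic and shrinks the p-value.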

73
Q

For the bonfire example, what would a p-value of 0.0001 mean?

A
  • 95% confidence interval = (2,800, 3,200)
  • Null = 2,250
  • p-value = 0.0001. We would have seen this much of a difference, or more extreme, 1 in 10,000 if the null hypothesis was true (no difference). The value of 3,000 calories is highly significant. The students ate significantly more calories on bonfire night than on a standard day
  • The null value is NOT in the confidence interval, so the p-value is significant. If the null value was 3,050, the p-value would have to be greater than 0.05 because it lies within the 95% confidence interval
74
Q

What is the test statistic? How can we calculate the p-value?

A
  • A test statistic is calculated using your data (it reduces your data down to a single value). The general formula for a test statistic is:

Test statistic = (Observed value - hypothesized value) / standard error of the hypothesized value

  • Compare this test statistic to a hypothesized critical value (using a distribution we expect if the null hypothesis is true (e.g. Normal distribution)) to obtain a p-value.
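
The two steps above can be sketched as follows, assuming the Normal distribution is the reference distribution. All numbers (observed 163, hypothesised 160, standard error 1.0) are illustrative assumptions, and `z_test` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_test(observed, hypothesised, standard_error):
    """Return the test statistic and two-sided p-value (normal approximation)."""
    z = (observed - hypothesised) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
    return z, p

# Illustrative values only.
z, p = z_test(observed=163, hypothesised=160, standard_error=1.0)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With small samples a t-distribution would normally replace the Normal distribution, which makes the p-value slightly larger.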
75
Q

What is the relationship between confidence intervals and p-values?

A
  • If null not in 95% CI, p-value less than 0.05
  • If null in 95% CI, p-value greater than 0.05
  • This follows for other confidence intervals, e.g. 99%, null not in 99% CI = p-value less than 0.01
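
A short sketch of this relationship, assuming a normal approximation. The estimate of 12, standard error of 1 and null value of 10 are illustrative assumptions; `ci_and_p` is a hypothetical helper.

```python
from statistics import NormalDist

def ci_and_p(estimate, standard_error, null_value, level=0.95):
    """Confidence interval and two-sided p-value under a normal approximation."""
    nd = NormalDist()
    multiplier = nd.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%, 2.58 for 99%
    lower = estimate - multiplier * standard_error
    upper = estimate + multiplier * standard_error
    z = (estimate - null_value) / standard_error
    p = 2 * (1 - nd.cdf(abs(z)))
    return (lower, upper), p

for level in (0.95, 0.99):
    (lower, upper), p = ci_and_p(12, 1, null_value=10, level=level)
    null_inside = lower <= 10 <= upper
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f}), null inside: {null_inside}, "
          f"p < {1 - level:.2f}: {p < 1 - level}")
```

In this example the null value falls outside the 95% CI but inside the 99% CI, matching a p-value that is below 0.05 but above 0.01.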
76
Q

What is a one-sample t-test?

A

One continuous variable (mean calories consumed) tested against a pre-specified value. There are multiple tests to choose from depending on the variables, e.g. Student’s t-test and Chi-square

77
Q

What is the difference between statistical and clinical significance?

A

Just because a finding is statistically significant doesn’t necessarily mean it is clinically significant, e.g. would you really introduce a new drug for such a small difference?

78
Q

Mean=163, Median=155, SD=6, SE=1.0.

An approximate 95% confidence interval is:

A. (153, 157)

B. (143, 167)

C. (161, 165)

D. (151, 175)

A

C

79
Q

The width of a confidence interval:

A. Depends on the inter-quartile range

B. Does not matter to clinical judgement

C. Depends on the standard error

D. Does not depend on the sample size

A

C

80
Q

As sample size increases:

A. The median should be used over the mean

B. The standard error is unaffected

C. The standard error decreases

D. The standard error increases

A

C

81
Q

A 99% confidence interval (compared to a 95% confidence interval):

A. Will be wider than a 95% CI

B. Will be narrower than a 95% CI

C. Is less precise than a 95% CI

D. Is more precise than a 95% CI

A

A

82
Q

A p-value of 0.001 means:

A. The null hypothesis is true

B. The study has been successful

C. We would have seen this much of a difference 1 time in 1,000 if the null hypothesis was true

D. The null hypothesis is false

A

C

83
Q

The null hypothesis when comparing two groups:

A. Assumes a clinically important difference between the groups

B. Assumes no difference between the groups

C. Has a null value of 100

D. Is not important when performing a significance test

A

B

84
Q

A p-value of 0.027:

A. Is not far enough below 0.05 to be of interest

B. Shows clinical significance

C. Is non-significant

D. Is more significant than p=0.03

A

D

85
Q

Statistical significance:

A. Is quantified via a p-value

B. Does not relate to confidence intervals

C. Implies any difference is causal

D. Implies clinical significance

A

A

86
Q

A p-value:

A. Adds more context than a confidence interval

B. Lies between -1 and +1

C. Is the probability of seeing the difference you have if the null was true

D. Is the probability the null hypothesis is true

A

C

87
Q

If a p-value is 0.03 (tested against a hypothesised value of 10), then the 95% confidence interval could be:

A. (9, 11)

B. (5, 15)

C. (11, 15)

D. (10, 11)

A

C

88
Q

What r value would you give for the second graph?

A

0.9 - correlation tells us the strength of a linear relationship, but not which line, e.g. two lines can both have r = 0.9 but different gradients. REGRESSION takes correlation a step further

89
Q

What equation do we fit regression into?

A
  • y = a + bx
  • y and x = variables in dataset
  • y variable is the outcome or the dependent variable (what we can ‘change’), e.g. BP reading
  • x variable is the predictor (variable we are using to estimate) or the independent variable (can’t change), e.g. family history
  • a and b are numbers that explain the relationship between these variables. a = intercept (value of y when x = 0) b = co-efficient (change in y when we increase x by 1 unit)
  • In most scenarios, b is the main interest
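
A minimal sketch of how a and b can be estimated from data. It assumes ordinary least squares, which the cards do not name explicitly; the data are made up and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (coefficient) in y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # change in y per 1-unit increase in x
    a = mean_y - b * mean_x  # value of y when x = 0
    return a, b

# Made-up data lying exactly on y = 2 + 3x, so the fit should recover a = 2 and b = 3.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
a, b = fit_line(xs, ys)
print(f"a = {a}, b = {b}")
```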
90
Q

If we increase x by 1 (x + 1), what would the new equations and hence difference be?

A
  • y = a + b (x + 1)
  • y = a + bx + b
  • Therefore, difference between y = a + bx and y = a + bx + b is b
91
Q

What can we infer from the b-value for this graph?

A

If a baby’s temperature increases by 1 degree, their predicted uninterrupted sleep decreases by about 6 hours (b = -5.97). y = 224.82 - 5.97x

92
Q

Simple linear regression using y = mx + c:

Birth weight (g) = 7.43*Mother’s weight (kg) + 2370

If a mother weighs 70kg, how heavy do we expect her baby to be?

If a baby weighed 3.1kg, what would be the expected weight of the mother?

A
  • 2890.1g
  • 98.25kg
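
The two answers can be checked with a few lines of code. The coefficients come from the card; the function names are hypothetical, and the second function simply rearranges the same equation as the card does.

```python
SLOPE = 7.43      # grams of birth weight per kg of mother's weight
INTERCEPT = 2370  # grams

def predict_birth_weight(mother_kg):
    """y = mx + c"""
    return SLOPE * mother_kg + INTERCEPT

def mother_weight_for(birth_weight_g):
    """Rearranging y = mx + c gives x = (y - c) / m."""
    return (birth_weight_g - INTERCEPT) / SLOPE

print(round(predict_birth_weight(70), 1))  # birth weight in g for a 70 kg mother
print(round(mother_weight_for(3100), 2))   # mother's weight in kg for a 3.1 kg (3100 g) baby
```

Note the unit conversion in the second part: the baby's weight must be entered in grams.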
93
Q

Do we still get 95% confidence intervals and p-values for regression?

A

Yes. 95% CI (-8.67, -3.27), so 95% confident that if we increase x by 1, y will decrease by between these values. Null value would have to be 0. P-value<0.001, so significant relationship between temperature + sleep
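
As a rough consistency check, the p-value can be approximated from the confidence interval. This sketch assumes the interval was built as estimate ± 1.96 × SE and uses a normal approximation (software would normally use a t-distribution); the estimate of -5.97 is the temperature coefficient from the sleep example.

```python
from statistics import NormalDist

estimate = -5.97
ci_lower, ci_upper = -8.67, -3.27  # 95% CI from the card

# Assumption: the interval is estimate +/- 1.96 * SE.
se = (ci_upper - ci_lower) / (2 * 1.96)
z = (estimate - 0) / se  # the null value for a regression coefficient is 0
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"SE ~ {se:.2f}, z ~ {z:.2f}, p < 0.001: {p < 0.001}")
print("0 inside the 95% CI:", ci_lower <= 0 <= ci_upper)
```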

94
Q

What type of regression is this?

A

This has a single continuous outcome variable (sleep time), so it is linear regression

95
Q

What is multiple regression?

A
  • Incorporating additional values, e.g. other factors that affect sleep time such as birthweight, gender, geographical location.
  • We can incorporate these different factors into a regression model as additional predictors:

y = a + b1*x1 + b2*x2 + b3*x3… y and a don’t change, but we have multiple b and x’s

  • For example, including birthweight in a multiple regression model irons out any differences in birthweight so we can see the relationship between sleep + temperature clearer
  • If something could bias the results, this is where it is included in the analysis
  • Interpretation remains the same, but add on, ‘… after adjusting for other variables in the model’
96
Q

How is multiple regression presented?

A

In a table. In this example:

  • there is no significant relationship between birthweight and sleep after adjusting for temperature values
  • there is a significant relationship between temperature and sleep after adjusting for birthweight
  • accounting for differences in birthweight has not changed our interpretation (although our estimate has changed slightly, -5.97 to -5.46)
97
Q

Regression isn’t restricted to continuous variables. What happens when we have categorical variables?

A
  • We always have a ‘reference’ category. The coefficient estimate is in comparison to that reference category. For example: north, midlands and south, which would likely be linked to temperature. Pick north as the reference category. Note that South has a p-value of 0.051, so is borderline. We can’t just say that it’s not statistically significant. There is borderline evidence of a difference between babies based in the South compared to the North (Southern babies sleep longer)
  • There is no significant difference between babies born in the Midlands and the North (after accounting for other factors in the model)
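
One common way to put a categorical variable into a regression model is dummy (indicator) coding. This is a sketch under that assumption; the cards do not say how the software coded region, and `dummy_code` is a hypothetical helper.

```python
REGIONS = ["North", "Midlands", "South"]

def dummy_code(region, reference="North"):
    """One 0/1 indicator per non-reference category; the reference is all zeros."""
    return {r: int(region == r) for r in REGIONS if r != reference}

for region in REGIONS:
    print(region, dummy_code(region))
```

Each indicator gets its own coefficient, which is why every estimate reads as a comparison with the reference category.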
98
Q

The temperature estimate has changed. Is it still significant?

A

Yes. We are including more factors that explain the relationship with sleep, so the strength of the temperature effect is likely to diminish. However, it is still significantly associated

99
Q

What are some questions we can ask about the study design and descriptive statistics?

A
  • Study design: Who is in their study?
    Who could have been missed?
    Was any group over sampled?
    Inclusion/exclusion criteria?
    Is their research question clear?
  • Descriptive statistics: Summarised their data appropriately?
    Normally distributed data?
    Mean with SD?
100
Q

What are some questions we could ask about inferential statistics and graphs?

A
  • Inferential statistics: Have they presented a p-value?
    Or a confidence interval?
    Or both?
    Did they mention examining for normality and choose an appropriate test?
  • Graphs: Suitable graph selection?
    Improved their message?

101
Q

What is AXIS?

A

A critical appraisal tool to assess different key aspects of an article. Other critical appraisal tools generate a score; AXIS has 20 questions (no scoring system)

102
Q

If a paper is reporting data collection, what should be shown?

A
  • A flow diagram. Presents in a digestible format the number of people approached, the number who withdrew (and why) + the final number that were analysed
103
Q

What are non-parametric tests?

A

Alternative significance tests which do not require the distributional assumptions of parametric tests to be met in order to be performed (e.g. about sample size or distribution). Parametric tests require their assumptions to be met, otherwise they aren’t accurate. Non-parametric tests can be used anytime but are not as powerful

104
Q

In a linear regression with 1 predictor variable, a coefficient of 1.5 means:

A. Increasing that predictor by 1 unit increases the outcome by 1.5 units

B. The outcome is 1.5 times larger than the predictor

C. Increasing that predictor by 1 unit decreases the outcome by 1.5 units

D. The outcome is increased by 1.5% for an increase in the predictor

A

A

105
Q

A regression model with more than one predictor is called:

A. A univariable model

B. A simple model

C. A multivariable model

D. Logistic regression

A

C

106
Q

The difference between coefficients in a single and multiple regression model is:

A. Coefficients in a single model are more informative

B. Coefficients in a single model always show smaller effects

C. Coefficients in a multiple model have taken account of background factors

D. Coefficients in multiple models do not require p-values

A

C

107
Q

Linear regression requires:

A. A binary outcome variable

B. More than one predictor

C. At least 100 individuals

D. A numeric outcome variable

A

D

108
Q

Regression models:

A. Remove the effects of confounding

B. Explore how a particular drug influences outcome

C. Can be used to create predictive models

D. All of the answers are valid

A

D

109
Q

Multivariable regression models can:

A. Make the interpretation easier

B. Account for a small sample

C. Remove sampling bias

D. Adjust for confounding variables

A

D

110
Q

For a sample to give an unbiased estimate it must be:

A. Collected through a trial

B. Normally distributed

C. Large

D. Random

A

D

111
Q

As the standard deviation of measurements increases:

A. The mean increases

B. Larger samples are needed to make the same inferences

C. The values become less spread out

D. The variance decreases

A

B

112
Q

A p-value from a 2-sample t-test (comparing two means) is 0.03, the 95% Confidence Interval could NOT be:

A. (-8, -2)

B. (28, 54)

C. (-1, 2)

D. (0.001, 0.002)

A

C

113
Q

Investigating a rare disease would be best done using what type of study design:

A. Cross-sectional

B. Cohort

C. Case-control

D. Ecological

A

C

114
Q

A potential confounder in the relationship between hypertension and osteoarthritis is:

A. Age

B. Alcohol consumption

C. Height

D. Smoking

A

A

115
Q

Observational studies:

A. Cannot be randomised

B. Are always larger

C. Can never be useful

D. Must be blinded

A

A

116
Q

The width of a confidence interval:

A. Does not depend on the sample size

B. Is decreased the more confident we want to be

C. Can sometimes be too wide to be of any clinical use

D. Depends on the median

A

C

117
Q

A study finds a RR of 0.5 for incidence of depression for those taking vitamin D (1%) vs. not (2%):

A. A 50% increase in the risk of depression for vitamin D takers

B. The difference is big enough that confounding must not be present

C. Results are not statistically significant

D. Absolute risk is 1%

A

D
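
The arithmetic behind this card can be sketched as follows. Reading option D as the absolute difference in risk between the two groups is an assumption about what the card intends.

```python
risk_vitamin_d = 0.01     # 1% incidence of depression among vitamin D takers
risk_no_vitamin_d = 0.02  # 2% incidence among those not taking vitamin D

relative_risk = risk_vitamin_d / risk_no_vitamin_d
absolute_difference = risk_no_vitamin_d - risk_vitamin_d

print(f"RR = {relative_risk:.1f}")
print(f"Absolute risk difference = {absolute_difference:.0%}")
```

An RR of 0.5 is a 50% relative reduction, yet in absolute terms only 1 person in 100 is affected differently.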

118
Q

A group of 100 children undergo IQ testing. Their mean IQ is 95, SD = 5. Measurements are approximately normally distributed:

A. 95% of these children will have IQ values between 85 and 105

B. Approximately 5 of the children will have IQ scores below 85 or above 105

C. All answers are correct

D. Median IQ will be roughly 95

A

C

119
Q

Which of the following is true for an Odds Ratio:

A. Evidence of a causal relationship if 95% CI does not include 1

B. Is the same as Relative Risk (RR)

C. Compared against 0 (as 0 indicates no difference)

D. Derived from observed number in a 2x2 table

A

D
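
A minimal sketch of deriving an odds ratio from a 2x2 table, with the relative risk alongside to show the two are not the same quantity. The counts are hypothetical, chosen only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

                 disease   no disease
    exposed         a          b
    unexposed       c          d
    """
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts.
a, b, c, d = 20, 80, 10, 90
print(f"OR = {odds_ratio(a, b, c, d):.2f}, RR = {relative_risk(a, b, c, d):.2f}")
```

Both measures are compared against 1 (no difference), not 0.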

120
Q

Outline 2 ways in which a RCT is advantageous compared to a cross-sectional study.

A

One way in which a RCT is advantageous compared to a cross-sectional study is that you can establish whether a cause-effect relationship exists between an intervention and an outcome. RCTs are also advantageous because you can control confounding factors by balancing the groups.

121
Q

State two reasons why a RCT can’t always be used.

A

Ethical reasons, e.g. can’t force someone to smoke. Expensive.

122
Q

The overall systolic BP for everyone in the sample had a mean of 86.5, and a median of 92. The authors predominantly reported the mean, and presented their findings with standard deviations to accompany this.

Discuss their decision and what impact this might have on your conclusions of their findings.

A

The median and mean aren’t similar, so the data are skewed and unlikely to be normally distributed. Instead of using the mean and standard deviation, the median and inter-quartile range should be used

123
Q

What is the main benefit of using a multivariable regression model?

A

Takes confounding factors into account, e.g. age and gender. Gives you a better picture of the independent variable that you are interested in

124
Q

State a reason why the p-value may change in the multivariable model compared to the univariable model.

A

Including other variables can reduce the significance of the association between a single exposure and the outcome, because some of that association is now explained by the other variables.

125
Q

Explain why statistical significance is not the same as clinical significance.

A

Statistically significant doesn’t mean we can apply the results to our patients. Statistical significance gives a number, but this may only be a small improvement on what is already known, so it may not be economically viable to change the way something is done for a small difference.

126
Q

List two advantages and two disadvantages of a case-control study design.

A

Advantages: can be used to investigate rare diseases with long latency period between exposure and disease manifestation, cheap + quick

Disadvantages: subject to recall bias, difficult to establish order of events

127
Q

What is the difference between the dependent and independent variable?

A
  • An independent variable is a variable that can be altered in a study.
  • A dependent variable is a variable that is dependent on the independent variables or one that cannot be altered.
128
Q

What is sensitivity?

A
  • The sensitivity of a test is the proportion of people who test positive among all those who actually have the disease
129
Q

What is specificity?

A

The specificity of a test is the proportion of people who test negative among all those who actually do not have that disease

130
Q

What is positive predictive value (PPV)?

A

The positive predictive value is the probability that following a positive test result, that individual will truly have that specific disease

131
Q

What is the negative predictive value (NPV)?

A

The negative predictive value is the probability that following a negative test result, that individual will truly not have that specific disease

132
Q

What is accuracy? How do we work it out?

A
  • Test accuracy reveals what proportion of true results were revealed by the test
  • Accuracy = (TP + TN) / (TP + TN + FP + FN)
133
Q

What is prevalence?

A
  • The prevalence assesses the proportion of people within the community with the disease
  • Prevalence = (TP + FN) / (TP + TN + FP + FN)
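
The diagnostic test measures from the last few cards can all be computed from the four cells of a 2x2 table. This is a sketch with hypothetical counts; `diagnostic_summary` is an illustrative helper name.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Diagnostic test measures from true/false positives and negatives."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # test positive among those with the disease
        "specificity": tn / (tn + fp),  # test negative among those without the disease
        "ppv": tp / (tp + fp),          # truly diseased among positive tests
        "npv": tn / (tn + fn),          # truly disease-free among negative tests
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
    }

# Hypothetical counts for 1,000 people.
summary = diagnostic_summary(tp=90, fp=50, fn=10, tn=850)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the test is 90% sensitive, but the PPV is much lower because the disease is uncommon (10% prevalence).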
134
Q

What is logistic regression?

A
  • Logistic regression = used when your outcome variable (y) is a binary variable:

logit(p) = a + bx

a shows where probability increases

b shows how fast it increases
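
A minimal sketch of how logit(p) = a + bx converts back to a probability. The values a = -4.0 and b = 0.1 are hypothetical, chosen so that the probability is 0.5 at x = 40.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))

def predicted_probability(a, b, x):
    """Invert logit(p) = a + bx to recover p."""
    return 1 / (1 + math.exp(-(a + b * x)))

a, b = -4.0, 0.1  # hypothetical coefficients
for x in (20, 40, 60):
    p = predicted_probability(a, b, x)
    print(f"x = {x}: p = {p:.3f}")
```

Changing a shifts the curve left or right along x, while a larger b makes the probability climb more steeply.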

135
Q
A
136
Q
A
137
Q
A
138
Q
A
139
Q
A
140
Q
A