Applied Statistics Flashcards

1
Q

What are the characteristics of the question used to test the hypothesis

A

PICOT

Patients or Population
Intervention(s) or Exposure(s)
Comparator
Outcome
Time
2
Q

What are the 5 fundamental types of clinical questions

A
  1. Therapy
  2. Harm
  3. Differential diagnosis
  4. Diagnosis
  5. Prognosis
3
Q

Summarise and classify the different types of study designs

A

EXPERIMENTAL

  1. RCT
  2. Pseudo - / Quasi - RCT
  3. Non-RCT

OBSERVATIONAL

  • Descriptive
  • Analytical
  • –> Cohort
  • –> Cross-sectional
  • –> Case-control
4
Q

Which type of study provides the lowest level of evidence, and what are these studies used for? What are their advantages and disadvantages?

A

Animal studies

  • lowest level of evidence
  • Used as hypothesis generating studies

ADV

  • Cheaper
  • Adequate physiological/metabolic surrogate
  • Limits human suffering due to experimentation

DISADV

  • Metabolic pathways / pharmacokinetics differ
  • Young/no comorbidities
  • Defects of methodology (less rigorous / slowly manifesting effects)
5
Q

Distinguish

  1. Ecological study
  2. Case report/series
  3. Cross sectional surveys
  4. Case-controlled studies
  5. Cohort studies
  6. Randomised Controlled Trials
  7. Systematic review
  8. Meta-analysis
A
  1. Ecological
    - Observational
    - Retrospective
    - Looks at occurrence and associations in groups
  2. Case report/series
    - Observational
    - Descriptive
    - No control group
  3. Cross sectional surveys (snap-shot)
    - Observational
    - Descriptive / Analytical / Diagnostic
    - Large series of case reports
    - No control group
  4. Case-controlled studies
    - Observational
    - Retrospective
    - Historical controls used
    - Essentially: choose a group with a shared feature and compare it to another group without that feature.
    - Uses the odds ratio to quantify risk
  5. Cohort Studies
    - Observational
    - Longitudinal: Retrospective, Concurrent, Prospective
    - Observes exposure, then observes development of disease
    - Observes identical control group without exposure
    - Uses relative risk to quantify risk
  6. Randomised Controlled Trial
    - Experimental
    - Randomised
    - Prospective
    - Interventional
    - Analytical
  7. Systematic review
    - Answers a defined research question by collecting and summarising all empirical evidence that fits pre-specified eligibility criteria
  8. Meta-analysis
    - Use of statistical methods to summarise the results of these studies
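
The odds ratio (case-control studies) and relative risk (cohort studies) named above can both be computed from a 2x2 table. A minimal Python sketch follows; the counts and the function names are illustrative assumptions, not taken from the cards.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, for illustration only
a, b, c, d = 20, 80, 10, 90
or_value = odds_ratio(a, b, c, d)     # (20 * 90) / (80 * 10)
rr_value = relative_risk(a, b, c, d)  # (20 / 100) / (10 / 100)
```

With these made-up counts the odds ratio is 2.25 and the relative risk is 2.0; the two measures diverge more as the outcome becomes more common.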
6
Q

What are cross-over trials?

A

Each patient acts as their own control
Patients ‘cross-over’ from one treatment to the next following a ‘washout period’ between treatments
There is usually randomization

7
Q

What are self controlled studies

A

Each patient is their own control

Post treatment measurements in each patient are compared to pre-treatment measurements

8
Q

With regard to data collection (sampling), what two principles are paramount?

A
  1. Internal validity
    - Sampling should be free from selection bias
  2. External validity
    - Sample should represent broader real-world population
9
Q

Describe and define 4 sampling strategies

A
  1. Simple random - Everyone has equal chance of being picked
  2. Stratified random - Divide into subgroups 1st then random selection
  3. Clustered random - Treat people as groups (School vs. ICU)
  4. Convenience sample - non-random selection: just as they come
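
The four strategies can be sketched in a few lines of Python. The population, the ward labels and the sample sizes below are hypothetical and chosen only to make the differences visible.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people spread evenly over 4 wards
population = [{"id": i, "ward": f"ward_{i % 4}"} for i in range(100)]

# 1. Simple random: every person has the same chance of selection
simple = random.sample(population, 20)

# 2. Stratified random: split into subgroups first, then sample within each
strata = {}
for person in population:
    strata.setdefault(person["ward"], []).append(person)
stratified = [p for group in strata.values() for p in random.sample(group, 5)]

# 3. Cluster random: randomly pick whole groups and take everyone in them
chosen_wards = random.sample(sorted(strata), 2)
cluster = [p for ward in chosen_wards for p in strata[ward]]

# 4. Convenience: non-random, simply the first people to arrive
convenience = population[:20]
```

Note that the stratified sample is guaranteed to contain 5 people from every ward, whereas the simple random sample may over- or under-represent a ward by chance.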
10
Q

What does randomisation mean

A

A representative sample can be chosen by RANDOM sampling, whereby each person is equally likely to be selected.

It means that no systematic bias is introduced and the samples selected should be representative of the populations of interest

11
Q

What is the CONSORT or STROBE diagram

A

STROBE - STrengthening the Reporting of OBservational studies in Epidemiology

CONSORT - CONsolidated Standards of Reporting Trials

Figure 1 in any published study –> total number of patients eligible vs total number of patients included. If the number included is low relative to the number eligible, this strongly suggests that the subset is biased, either through who was enrolled in the study or who declined to participate

12
Q

What is sampling error

A

If a study is repeated, a different sample with slightly different characteristics will be chosen, and as such the result will differ slightly.

Sampling error becomes smaller as the sample size increases

13
Q

What is the difference between a parameter and a statistic

A

A parameter refers to a property of a population

A statistic refers to a property of a sample

14
Q

What are the conventional symbols for the mean and standard deviation of a population vs a sample

A

Population

  • Mean: mu
  • SD : sigma

Sample

  • Mean: X-bar (x with a bar on top)
  • SD: S
15
Q

What is a histogram? List and describe the 3 main shapes of this entity

A

This is a graph that gives an indication of the distribution of data.

  1. Normally distributed = Gaussian = Bell shaped
  2. Left skew = long left tail = negative skew
  3. Right skew = long right tail = positive skew
16
Q

On what types of data can parametric tests be used

A

Normally distributed data

(This includes log transformed right skewed data –> Gaussian)

Unfortunately, Left skewed data cannot be transformed easily.

17
Q

What is the purpose of a histogram

A

To show the frequency and shape of continuous data.

Determining whether the data is normally distributed (or can be transformed to normality) allows for the use of parametric tests in data analysis.

Shows:

  1. Gaps
  2. Outliers
  3. Skewed data
18
Q

What is the kurtosis of the data

A

This refers to the flat or pointed nature of the distribution of the data

19
Q

How do you calculate a 95% confidence interval and why is this necessary

A

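95% CI of the mean = Mean ± 1.96 SE (where SE = SD/√n)

Mean ± 2SD is the range expected to contain about 95% of individual observations in normally distributed data

Used to determine if the data presented is plausible (e.g. a Mean - 2SD below zero for a value that cannot be negative suggests skewed data)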

20
Q

What is the indication for a Box and Whisker Plot

A

To graphically represent the median and interquartile range in non-normally distributed data.

21
Q

Describe the data organisation of a box and whisker plot

A

Median - thick horizontal line within the box
Length of box represents the interquartile range (25% –> 75%)
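Whiskers represent the range of values lying within the fences
Outliers shown when they are more than 1.5 box lengths (1.5 x IQR) from the upper or lower end of the box; values more than 3 box lengths away are extreme outliers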

22
Q

What is the most common transformation of non-normally distributed data?

A

Log transformation of positively skewed (right skewed) data. This creates an approximately normal distribution, on which parametric tests can then be used during data analysis.
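
A small Python sketch of the effect of a log transformation on right-skewed data. The data values are made up, and the moment-based skewness function is one common definition, used here as an assumption.

```python
import math


def skewness(values):
    """Moment-based skewness: positive = long right tail, zero = symmetric."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 4, 8, 16, 32, 64]       # hypothetical right-skewed data
logged = [math.log(v) for v in raw]  # natural log of each value

raw_skew = skewness(raw)     # clearly positive
log_skew = skewness(logged)  # essentially zero: the logged values are evenly spaced
```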

23
Q

What is the purpose of scatterplots

A

To provide a visual representation of the relationship between two variables

  1. Strength of the relationship
  2. Degree of linearity
  3. Association positive or negative
  4. Presence of outliers
24
Q

What is the indication to use a scatterplot

A

To understand the nature of the relationship between two continuous variables

25
Q

What is the correlation coefficient

A

Numerical value depicting the correlation between two continuous variables:

Expresses both the magnitude (0 to 1) and the direction of the correlation (positive or negative), so the coefficient itself ranges from -1 to +1

26
Q

What is the coefficient of determination and how is it calculated. Give an example

A

Coefficient of determination is the square of the correlation coefficient. If r = 0.7 then coefficient of determination = 0.49.

0.49 means that 49% of the variation in one variable can be explained by its linear relationship with the other, and 51% is due to other factors.
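
As a worked sketch, Pearson's r and the coefficient of determination can be computed by hand in Python. The paired values are hypothetical and the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical paired measurements
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)  # about 0.77
r_squared = r ** 2     # 0.6, i.e. 60% of the variation is explained
```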

27
Q

What are the criteria required for causation

A
  1. Causative occurrence must precede the effect
  2. If cause occurs then effect should occur
  3. If cause does not occur then effect should not occur

Correlation does not imply causality

28
Q

What are the limitations of correlation and scatterplots

A
  1. Correlation does not imply causation
  2. A lack of linear correlation does not exclude a non-linear relationship between the variables

29
Q

What is the difference between ordinal and continuous data

A

Ordinal data is categorical data with a set order. The interval between categories is not known

Continuous data is not categorical and exists on an increasing or decreasing scale with known intervals

30
Q

What are the different types of correlation coefficients

A

Pearson’s (r) correlation coefficient
- For two continuous (normally distributed) variables

Spearman’s (rho) correlation coefficient
- For two ordinal variables OR 1 continuous and 1 ranked variable

Kendall’s (tau) correlation coefficient
- For two ordinal (ranked) variables, particularly with small samples or many tied ranks

31
Q

With regard to scatterplots what is more important the p-value or the r

A

The r (the correlation coefficient)

32
Q

Which is on a linear scale r or r^2. What is r^2

A

r^2

This is the coefficient of determination.

A coefficient of determination of 0.49 means that 49% of the variation can be explained by the relationship between the variables and therefore 51% explained by other factors

33
Q

True or false: A significant p-value and a high r value on a scatterplot imply causation

A

False

34
Q

What are Altman-Bland Plots

A

These plots quantify the agreement between 2 readings.

35
Q

Which methods cannot be used to quantify the agreement between two readings

A
  1. comparing the means (and finding no significant difference)
  2. The correlation coefficient (it is a measure of association, not agreement)

Altman-Bland Plots can be used to measure agreement between two readings

36
Q

When are you most likely to use the Altman-Bland plot

A

Comparing a measurement by a new device/monitor against the gold standard

37
Q

Define the axes for the altman-bland plot. How is this plot interpreted

A

Y-axis –> Difference between methods (A - B)

X-axis –> Average of methods [(A+B)/2]

Interpretation:

  1. The mean of the differences (A - B) is the relative bias.
  2. The SD of the differences is the estimate of the error; the limits of agreement are the mean difference ± 1.96 SD
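
A minimal Python sketch of the numbers behind such a plot: the bias and the 95% limits of agreement (mean difference ± 1.96 SD of the differences). The paired readings are hypothetical.

```python
import statistics

# Hypothetical paired readings: new monitor (A) vs gold standard (B)
method_a = [100, 102, 98, 105, 110]
method_b = [98, 103, 97, 101, 108]

differences = [a - b for a, b in zip(method_a, method_b)]     # y-axis values
averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # x-axis values

bias = statistics.mean(differences)      # mean of A - B
sd_diff = statistics.stdev(differences)  # sample SD of the differences

# 95% limits of agreement
lower_limit = bias - 1.96 * sd_diff
upper_limit = bias + 1.96 * sd_diff
```

Here the bias is 1.6 units, meaning method A reads on average 1.6 higher than method B.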
38
Q

Classify variables

A

CATEGORICAL

  • Ordinal (ordered)
  • Nominal (non-ordered)

CONTINUOUS (constant and computable differences between values)

  • Interval scale
  • ratio scale
39
Q

What is frequency in statistics

A

The number of times (N) or proportion (%) of times a variable (data item) has been observed to occur.

40
Q

Distinguish the measures of central tendency

A

Mean: the average
Median: The middle value
Mode: The most common value

41
Q

What is the dispersion

A

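Dispersion is the spread of the data around its central value, described by the range, interquartile range, variance and standard deviation.

Tests of skewness and kurtosis assess the shape of the distribution, i.e. whether the data are normally distributed.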

42
Q

How should asymmetrical data be represented?

A

By Box and Whisker plots:

  1. Range: minimum to maximum value within the fence
  2. Interquartile range: 25th to 75th percentile
  3. Quartiles: Four equal groups of 25 %

Fences are calculated as follows

Lower fence is Q1 - 1.5 x IQR

Upper fence is Q3 + 1.5 x IQR
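
The fence calculation can be sketched in Python as below. The data are hypothetical, and the quartile method ("inclusive") is an assumption: different software uses slightly different quartile conventions, so the fences may differ a little between packages.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values

# Quartiles: the three cut points dividing the data into four equal groups
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are plotted individually as outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

With these values Q1 = 3.25, Q3 = 7.75, the fences are -3.5 and 14.5, and 30 is the only outlier.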

43
Q

How is a normal or Gaussian distribution described

A

Mean (mu population and x-bar sample)

Standard deviation

44
Q

Demonstrate how the standard deviation is calculated

A

X - mean
S - Standard Deviation

  1. Calculate the mean (X)
  2. Subtract the mean from each data point (x - X)
  3. Square the result to make all differences positive
    (x - X)^2
  4. Sum all the squared differences SUM [(x-X)^2]
  5. Divide the result by n - 1 –> gives you the variance
  6. Take the square root of the variance and you get the standard deviation
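
The steps above can be followed one at a time in Python, dividing by n - 1 for the sample variance. The observations are hypothetical; the last line cross-checks the result against the standard library.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)                 # step 1: the mean
deviations = [x - mean for x in data]        # step 2: x minus the mean
squared = [d ** 2 for d in deviations]       # step 3: square each difference
sum_of_squares = sum(squared)                # step 4: sum the squared differences
variance = sum_of_squares / (len(data) - 1)  # step 5: divide by n - 1
sd = math.sqrt(variance)                     # step 6: square root of the variance

# Cross-check against the standard library's sample SD
matches_library = math.isclose(sd, statistics.stdev(data))
```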
45
Q

What is the point of the standard deviation and what does it mean

A

Allows you to determine the distribution in relationship to the mean.

1SD - 68% of people fall within 1 SD
1.96SD - 95% of people fall within 1.96 SD
2SD - 95.4% of people fall within 2 SD
3SD - 99.7% of people fall within 3SD
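
These percentages come from the standard normal distribution and can be checked with Python's standard library. A short sketch (the function name is illustrative):

```python
from statistics import NormalDist


def coverage(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


within = {k: coverage(k) for k in (1, 1.96, 2, 3)}
# roughly 0.683, 0.950, 0.954 and 0.997 respectively
```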

46
Q

What is the standard error of the mean and why is it used. How is it calculated

A

If the study was repeated, you would get different patients and hence different results.

You can estimate the error in your sample by calculating the standard error of the mean.

SE = SD/ √n

47
Q

What is the 95% confidence interval. How is the 95% confidence interval calculated

A

This is the range of values within which the true population mean is likely to lie

95% CI = X + 1.96 (SE) to X - 1.96 (SE)

Where SE = SD/ √n

Thus the SE becomes smaller with increasing sample size. As the SE gets smaller, the 95% CI gets narrower, indicating greater precision of the result.

i.e. the larger the sample size –> the smaller the SE –> the more precise the result –> the narrower the confidence interval.
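
A short Python sketch of the calculation, showing how the interval narrows as n increases. The mean, SD and sample sizes are hypothetical.

```python
import math


def ci95(mean, sd, n):
    """Return (SE, lower limit, upper limit) of the 95% CI for a sample mean."""
    se = sd / math.sqrt(n)
    return se, mean - 1.96 * se, mean + 1.96 * se


# Same mean and SD, two different sample sizes
se_25, low_25, high_25 = ci95(mean=50, sd=10, n=25)     # SE = 2
se_100, low_100, high_100 = ci95(mean=50, sd=10, n=100)  # SE = 1
```

Quadrupling the sample size from 25 to 100 halves the SE, so the interval shrinks from about 46.1 to 53.9 down to about 48.0 to 52.0.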

48
Q

How are the SD and the SE interpreted differently

A

The standard deviation (and reference range) describes the amount of variability between individuals within a single sample

The standard error of the mean (and confidence interval) measure the precision with which a population value (e.g. mean (mu)) is estimated by a single sample.

49
Q

What is a Z score. How is it calculated?

A

It is the number of standard deviations that a value (x) is above or below the mean

Z = (observed value - population mean)/(population SD)

Z = (x - mu) / (sigma)
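
A worked example in Python, assuming a hypothetical population with mu = 100 and sigma = 15:

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of population SDs that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z = z_score(130, mu=100, sigma=15)  # 2.0, i.e. two SDs above the mean

# Proportion of a normal population expected to fall below this value
proportion_below = NormalDist().cdf(z)  # about 0.977
```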

50
Q

Define the Null hypothesis (H0)

A

There is no difference between the groups.

H0: X = mu

We assume that the groups being compared are drawn from the same population, and hence the population parameters mu and sigma are known

51
Q

Define the alternative hypothesis (H1)

A

There is a difference between the groups

H1: X does not equal mu

An alternative hypothesis states that there is a difference between the groups.

52
Q

What is the P value and how is it interpreted

A

The p-value is the probability of the observed result (or one more extreme) arising by chance if H0 is actually true.

The smaller the p-value, the stronger the evidence against H0.
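
As an illustration, a two-sided p-value can be obtained from a z statistic using the standard normal distribution. This sketch assumes a z-test, which is only one of many tests; the function name is illustrative.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Probability of a z statistic at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_at_196 = two_sided_p(1.96)  # about 0.05, the conventional threshold
p_at_3 = two_sided_p(3.0)     # about 0.003, stronger evidence against H0
```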

53
Q

What does a p value greater than 0.05 mean

A

This means that there is a greater than 1 in 20 chance that a result at least this extreme would occur by chance if the null hypothesis were true.

If this is the case the null hypothesis is not rejected (the study has not demonstrated a difference).

A p value of less than 0.05 (5%) is statistically significant, meaning that there is less than a 5% chance that the study result occurred by chance if the null hypothesis is actually true.

54
Q

What is a type 1 error and distinguish the alpha value from the p - value

A
  • A false positive
  • The null hypothesis is incorrectly rejected (there really is no treatment effect, but the study finds one)
  • the alpha-value determines the risk of this happening.

The alpha-value is the significance threshold chosen before the study (normally 0.05), so there is a 5% chance of making a type 1 error. The p-value calculated from the data is compared against it.

p-value is the probability of the observed result arising by chance alone (if the H0 is actually true)

alpha-value is the chance that the null hypothesis is incorrectly rejected (a false positive / a type 1 error)

55
Q

What is a type 2 error

A
  • This is a false negative
  • The null hypothesis is incorrectly accepted (there is a treatment effect, but the study finds none)
  • Beta determines the risk of this happening
  • At a beta of 0.2 (power of 0.8), there is a 20% chance of making a type 2 error
56
Q

What is the power of a statistical test? What is the point of calculating the power of a statistical test and how is it calculated?

A

The power of a statistical test is the probability of CORRECTLY REJECTING the null hypothesis

It is the chance of the study demonstrating a true result.

You can use ‘the power’ to calculate a sufficient sample size, and not run the risk of performing a pointless negative study.

Power = 1 - false negative rate

Power = 1 - Beta error

Normally power is 80% (i.e. a 20% chance of false negative result)

57
Q

Which factors are required for the calculation of an adequate sample size?

A
  1. Alpha value: level of significance (normally 0.05)
  2. Beta value: the type 2 error rate (normally 0.2, giving a power of 0.8)
  3. The statistical test you plan to use
  4. The variance of the population (the greater the variance, the larger the sample size)
  5. The effect size (the smaller the effect size, the larger the sample required)
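
How these factors combine can be sketched with one common normal-approximation formula for comparing two means: n per group = 2 x [(z for alpha + z for power) x SD / effect size]^2. The choice of this formula and the numbers used are assumptions for illustration; other tests need other formulas.

```python
import math
from statistics import NormalDist


def n_per_group(sd, effect, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


n_base = n_per_group(sd=10, effect=5)            # 63 per group
n_small_effect = n_per_group(sd=10, effect=2.5)  # smaller effect: larger sample
n_high_variance = n_per_group(sd=20, effect=5)   # greater variance: larger sample
```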
58
Q

Differentiate statistical significance from clinical significance

A

STATISTICAL SIGNIFICANCE

  • the likelihood that the results obtained were not due to chance
  • data which do not reach statistical significance are too weak to reach any conclusion

CLINICAL SIGNIFICANCE

  • the practical importance of a treatment effect
  • clinical significance implies that the difference in effectiveness between treatments is clinically important, and it is possible that clinical practice will change if such a difference is seen.

Statistical significance is used to inform clinical significance

59
Q

Define the primary outcome

A

The primary outcome is the main outcome that the study is designed and powered to assess.

Only the primary outcome can change practice, if the study findings are found to be both statistically and clinically significant

60
Q

Define secondary outcomes

A

Secondary outcomes are only hypothesis generating. They need further investigation to ensure that this was not just a chance finding

61
Q

Compare the standard deviation formula to the Chi squared test formula

A

S = √ [ Σ(x - X)^2 / (n-1) ]

x - observation
X - mean
n = total number

Chi squared (X^2) = Σ [ (Oi - Ei)^2 / Ei ]

Oi - Observed value
Ei - Expected value

62
Q

What is the Chi squared test used for, what does it calculate and how is it interpreted?

A

The Chi squared test can be used to test the ‘goodness of fit’ between observed and expected data.

It is used for categorical (count) data in a similar way to the tests used for quantitative data: the calculated statistic is converted to a p-value.

Interpretation:

Calculated Chi squared > Chi squared critical value (p = 0.05) –> reject your null hypothesis.

Calculated Chi squared < Chi squared critical value (p = 0.05) –> do not reject your null hypothesis
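
A goodness-of-fit sketch in Python. The observed counts are hypothetical; 7.815 is the tabulated critical value for 3 degrees of freedom at p = 0.05.

```python
# Hypothetical counts in four categories, expected to be equally common
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

# Chi squared = sum of (O - E)^2 / E over all categories
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1  # 3
critical_value = 7.815                  # tabulated value for df = 3, p = 0.05

reject_null = chi_squared > critical_value
```

Here Chi squared is 2.0, well below the critical value, so the null hypothesis is not rejected.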

63
Q

What is the Fragility index

A

Measure of robustness (or fragility) of the results of a clinical trial

The fragility index is the number of patients whose outcome would need to change (from a non-event to an event) to convert a trial from being statistically significant to not significant (p>0.05)

The larger the fragility index the better
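
A rough Python sketch of the idea, assuming a two-arm trial with a binary outcome analysed by a two-sided Fisher's exact test (a common choice for this calculation, but an assumption here). The trial counts and function names are hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Count non-event to event flips, in the arm with the lower event rate,
    needed before the result stops being significant (0 if never significant)."""
    e1, e2 = events_1, events_2
    flip_first_arm = events_1 / n_1 <= events_2 / n_2
    flips = 0
    while fisher_exact_p(e1, n_1 - e1, e2, n_2 - e2) < alpha:
        if flip_first_arm:
            e1 += 1
        else:
            e2 += 1
        flips += 1
    return flips


# Hypothetical trial: 10/100 events on treatment vs 25/100 on control
p_before = fisher_exact_p(10, 90, 25, 75)
fi = fragility_index(10, 100, 25, 100)
```

A small result for `fi` relative to the trial size (or to the number lost to follow-up) suggests that the significant finding hinges on very few patients.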

64
Q

How can a study of an intervention be biased?

A
  1. Intervention and control groups may be different at the start
  2. Intervention and control groups may become different as the study progresses
  3. Intervention and control groups differ, independent of treatment at the end of the study
65
Q

Give an example of bias in the case that the intervention and control groups differ from the start of the study. Suggest how this bias can be reduced with regards to therapy and harm

A

Treatment and control patients differ in prognosis

Therapy

  • Randomisation
  • Randomisation with stratification

Harm

  • Statistical adjustment of prognostic factors
  • Matching
66
Q

Give three examples of bias in the case that the intervention and control groups become different as the study progresses. Suggest how this bias can be reduced with regards to therapy and harm

A

Placebo

  • Therapy: Blinding of patients
  • Harm: Objective outcomes (mortality)

Co-intervention

  • therapy: Blinding of caregivers
  • Harm: Document treatment differences and statistically adjust

Bias in assessment

  • Therapy: Blinding of assessors of outcomes
  • Harm: Document treatment and statistically adjust
67
Q

Give 3 examples of bias in the case that the intervention and control groups differ, independent of treatment at the end of the study

A

Loss to follow up

  • therapy: ensure complete follow up
  • harm: Ensure complete follow up

Stop study early because of large effect
- therapy: complete study as initially planned

Omitting patients who did not receive assigned treatments
- therapy: include all patients in the arm to which they were randomized

68
Q

Describe the levels of evidence

A

RCTs - HIGH QUALITY

1a - Systematic review (with homogeneity) of RCTs
1b - Individual RCT (w narrow CI)
1c - All or none (all patients died before the Rx became available but now some survive on it, or some patients died before the Rx became available but now none die on it)

LOW QUALITY RCTs and COHORT STUDIES
2a - Systematic review (with homogeneity) of cohort studies
2b - Individual Cohort studies (including low quality RCT < 80% follow up)
2c - ‘Outcomes’ Research or ecological studies

3a - Systematic review (with homogeneity) of Case control studies
3b - Individual Case control studies

4 - Case series (poor quality cohort and case control)
5 - Expert opinion, or evidence based on physiology, bench research or first principles

69
Q

What is the indication for a Forest plot

A

to present the summary data of a meta-analysis

70
Q

Discuss the interpretation of a Forest plot

A

X-axis: Odds ratio
Y-axis: List of studies
Vertical line: Line of no effect - Odds Ratio of 1.0
Horizontal lines: confidence interval of individual study
Square position: A point estimate of odds ratio
Square size: Weight of study according to the weighting rules of the meta-analysis (representing sample size and statistical power)
Diamond: Combined result of the meta-analysis

Results can be considered statistically significant if the CIs of the combined result do not cross the line of no effect

71
Q

Differentiate between parametric and non-parametric tests

A

Parametric tests

  • Require normally distributed data
  • Are more powerful when their assumptions are met
  • Work best with larger sample sizes

Non-parametric tests

  • Make no assumptions about the distribution of data
  • Better with smaller sample sizes (n < 30)
  • Have less power than parametric tests
72
Q

What is a Receiver Operating Characteristic (ROC) curve

A

A curve used to determine the cut-off point in continuously distributed data that predicts the presence of an outcome.

1) Screening cut point
2) Diagnostic cut point
3) Optimal cut point

73
Q

How are ROC curves interpreted

A

Data point at the extreme top left of the plot = perfect test i.e. 100% sensitivity and 100% specificity

X - Axis is 100 - specificity
Y - Axis is Sensitivity

Further left on the plot the more specific (Minimum false pos)
Further up on the plot the more sensitive (Minimum false neg)
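
A small Python sketch of how the points on a ROC curve arise: sensitivity and 1 - specificity are calculated at each candidate cut point. The scores are hypothetical, and picking the "optimal" cut by maximising sensitivity + specificity - 1 (Youden's index) is one common rule, used here as an assumption.

```python
def sens_spec(diseased, healthy, cut):
    """Treat a score >= cut as a positive test."""
    sensitivity = sum(s >= cut for s in diseased) / len(diseased)
    specificity = sum(s < cut for s in healthy) / len(healthy)
    return sensitivity, specificity


# Hypothetical test scores
diseased = [5, 6, 7, 8]
healthy = [1, 2, 3, 6]

roc_points = []  # (cut point, x = 1 - specificity, y = sensitivity)
for cut in sorted(set(diseased + healthy)):
    sens, spec = sens_spec(diseased, healthy, cut)
    roc_points.append((cut, 1 - spec, sens))

# Cut point with the largest sensitivity + specificity - 1
best_cut = max(roc_points, key=lambda p: p[2] - p[1])[0]
```

With these scores a cut point of 5 gives 100% sensitivity and 75% specificity, the point closest to the top left of the plot.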

74
Q

When are you most likely to see a ROC curve

A

To determine an appropriate cut point for a test e.g. at what STOP-BANG score should you consider postoperative apnoea a clinical problem.

75
Q

What is a Survival plot (Kaplan-Meier curve)

A

To present the time to an outcome in two different groups.

Used to report the time of specific outcomes in two patient cohorts.

The utility of a survival plot is that it can indicate the time period at which the patient is most likely to be at risk of the outcome (the steepest part of the curve)
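
For one group, the curve can be sketched with the Kaplan-Meier product-limit estimate: at each event time, survival is multiplied by (1 - events / number still at risk). The follow-up times below are hypothetical; a real plot would repeat this for each group.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurs.

    events[i] is 1 if the outcome occurred at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if occurred:
            survival *= 1 - occurred / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up (months) for five patients; 0 = censored
times = [2, 3, 4, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)  # steps down at months 2, 3 and 5
```

Survival falls to 0.8, 0.6 and 0.3 at months 2, 3 and 5; censored patients leave the at-risk group without causing a step.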