Statistics Flashcards

1
Q

Definition of funnel plot

A

a scatterplot of treatment effect (e.g. OR on x axis) against a measure of study precision (e.g. SEM on y axis)

Commonly used in meta-analyses to assess publication bias

2
Q

Definition of forest plot

A

graphical display of estimated results from a number of scientific studies addressing the same question

gives visual suggestion of the amount of heterogeneity

and can show the estimated common effect

3
Q

Definition of standard deviation

A

measure of amount of variation or dispersion of a set of values

4
Q

Definition of per-protocol analysis

A

only subjects who completed the entire protocol are included in the analysis of a randomised clinical trial

5
Q

Definition of intention-to-treat analysis

A

all subjects randomised are included in the analysis regardless of whether they completed the study

6
Q

Definition of clinical equipoise

A

state of genuine uncertainty on relative value of two interventions being compared in a trial - requirement for RCTs to be ethical

7
Q

Definition of MHRA Yellow Card scheme

A

provides early warning that safety of a medicine or medical device may require further investigation

8
Q

Definition of standard error

A

SD / square root of (N)

looks at how accurate the mean of the study population is compared to the true population
(whereas standard deviation compares how participants in the study population compare to each other)
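
As a rough illustration of SD / square root of (N), the sketch below computes both the standard deviation and the standard error for a small sample. The readings are made-up values, and `standard_error` is a helper name chosen here for illustration only.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    sd = statistics.stdev(values)  # sample SD (N - 1 denominator)
    return sd / math.sqrt(len(values))

# Hypothetical systolic blood pressure readings (mmHg), for illustration only
readings = [118, 122, 125, 130, 135]
sd = statistics.stdev(readings)
sem = standard_error(readings)
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")
```

The SD describes the spread of the participants, while the standard error shrinks as N grows because it describes the uncertainty of the mean.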

9
Q

Definition of null hypothesis

A

hypothesis that there is no difference (or no association) between specified populations

10
Q

Definition of type I error

A

falsely rejecting a null hypothesis that is true in the population (false positive)

11
Q

Definition of type II error

A

failing to reject null hypothesis that is false in the population (false negative)

12
Q

Definition of power

A

probability of picking up a significant difference, if there is one (probability of not making type 2 error (false negative))

1 – probability of Type II error

13
Q

Definition of p-value

A

probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true

it is not the probability that the null hypothesis is true; the type 1 error rate is set by alpha, the threshold the p-value is compared against

14
Q

Definition of confidence interval

A

range of values, calculated from the sample, that would contain the true population value in 95% of repeated samples (for a 95% confidence interval)
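
A minimal sketch of a 95% confidence interval for a mean, assuming a normal approximation (mean plus or minus about 1.96 standard errors). The data and the function name `mean_confidence_interval` are illustrative assumptions; for small samples a t-distribution would normally be used instead.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical readings, for illustration only
low, high = mean_confidence_interval([118, 122, 125, 130, 135])
print(f"95% CI: {low:.1f} to {high:.1f}")
```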

15
Q

Definition of a priori

A

pre-specifying end-points & outcomes of a study to reduce reporting bias

16
Q

Definition of surrogate endpoint

A

variable relatively easily measured that predicts a distant outcome of the intervention being tested

17
Q

Definition of composite outcome

A

combination of two or more outcomes into single endpoint

18
Q

Definition of Cohen’s kappa coefficient

A

statistic used to measure inter-rater reliability (degree of agreement between raters/observers) for qualitative variables
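
For illustration, kappa can be computed as (observed agreement - chance agreement) / (1 - chance agreement). The sketch below assumes two raters labelling the same items; the ratings and the function name `cohens_kappa` are made up for this example.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    # Undefined (division by zero) if chance agreement is exactly 1
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings from two observers
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items, but kappa is lower than 0.8 because some of that agreement would be expected by chance.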

19
Q

Definition of absolute risk

A

probability that an event will occur (incidence)

number of events/total number of people

20
Q

Definition of absolute risk reduction

A

difference in rate of events between 2 groups

ARR = AR (C) – AR (T)
Incidence in control - incidence in intervention

21
Q

Definition of relative risk

A

risk ratio = relative likelihood of an event occurring in the treatment vs control group throughout study period

RR = AR (T) / AR (C)

cumulative risk

22
Q

Definition of relative risk reduction

A

reduction in rate of outcome in treatment group vs. control group

RRR = ARR / AR (C)

23
Q

Definition of number needed to treat

A

number of pts needed to treat to prevent 1 additional bad outcome, e.g. death, stroke

NNT = 1 / ARR
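
A worked sketch tying together AR, ARR, RR, RRR and NNT from the cards above. The trial counts are hypothetical and `risk_measures` is a helper name invented for this example.

```python
def risk_measures(events_control, n_control, events_treated, n_treated):
    """Risk measures for a two-arm trial, using the formulas on the cards."""
    ar_c = events_control / n_control   # AR (C): incidence in control
    ar_t = events_treated / n_treated   # AR (T): incidence in intervention
    arr = ar_c - ar_t                   # ARR = AR (C) - AR (T)
    return {
        "AR_C": ar_c,
        "AR_T": ar_t,
        "ARR": arr,
        "RR": ar_t / ar_c,              # RR = AR (T) / AR (C)
        "RRR": arr / ar_c,              # RRR = ARR / AR (C)
        "NNT": 1 / arr,                 # NNT = 1 / ARR
    }

# Hypothetical trial: 20/100 events in control, 15/100 in treatment
result = risk_measures(20, 100, 15, 100)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In practice the NNT is usually rounded up to the next whole patient, and it is only meaningful when the ARR is greater than zero.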

24
Q

Definition of hazard rate

A

probability of the event occurring in the next time interval divided by the length of that time interval

time-sensitive = instantaneous risk

25
Q

Definition of hazard ratio

A

relative likelihood of an event occurring in the treatment vs control group at any given point

26
Q

Definition of logistic regression

A

statistical analysis method to predict a binary outcome, such as yes or no, based on existing independent variables

27
Q

Definition of linear regression

A

regression model that estimates relationship between one independent variable and one dependent variable using a straight line

28
Q

Definition of chi-squared test

A

hypothesis test to determine whether observed frequencies are significantly different from the frequencies expected if the null hypothesis were true

categorical variables

29
Q

Definition of t-test

A

hypothesis test to determine whether means of two groups are significantly different from each other

continuous variables

30
Q

Definition of ANOVA

A

hypothesis test to determine whether means of three or more groups are significantly different from each other

continuous variables

31
Q

Definition of log-rank test

A

hypothesis test to compare the survival distributions of two samples

32
Q

Definition of Kaplan Meier curves

A

curves showing the estimated probability of survival over time, usually plotted separately for each group of a categorical variable

33
Q

Definition of Cox proportional hazards regression analysis

A

survival analysis for both quantitative & categorical variables, which can simultaneously assess the effect of several risk factors on survival time

34
Q

Definition of correlation coefficient

A

how closely 2 continuous variables move with each other

35
Q

Types of correlation coefficient

A

parametric: Pearson’s R
non-parametric: Spearman’s rank correlation Rho

36
Q

Definition of receiver-operating characteristic (ROC) curve

A

a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters: true positive rate against false positive rate

37
Q

ROC curve axes

A

X axis: 1-specificity (false +ves)
Y axis: sensitivity (true +ves)

38
Q

Funnel plot axes

A

X axis: study outcome, e.g. OR
Y axis: study precision, e.g. SEM

39
Q

Measure of forest plot heterogeneity

A

I squared

40
Q

How do you calculate relative risk?

A

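RR = [A/(A+B)] / [C/(C+D)]

(proportion of the exposed who got the disease vs proportion of the not exposed who got the disease; A = exposed with disease, B = exposed without disease, C = not exposed with disease, D = not exposed without disease)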

41
Q

What types of studies can RR be used in?

A

Prospective studies

42
Q

Odds ratio

A

Ratio of the odds of an outcome in those with a particular exposure to the odds of the outcome in those without that exposure

43
Q

Which studies is odds ratio used in?

A

Case-control studies

44
Q

How do you calculate odds ratio?

A

OR = (A/C) / (B/D) = AD / BC

Odds of exposure in the cases, vs odds of exposure in the controls

(equivalently, the odds of getting the disease when exposed vs the odds of getting the disease when not exposed)

45
Q

If a disease is really rare, the odds ratio and relative risk actually end up being quite similar. True or false?

A

True, when the disease is rare the OR approximates the RR. However, they are not the same thing: as the outcome becomes more common, the OR moves further from 1 than the RR
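
To illustrate the rare-disease point, the sketch below computes RR and OR from a 2x2 table, assuming A = exposed with disease, B = exposed without disease, C = unexposed with disease, D = unexposed without disease. All counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)]"""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (A/C) / (B/D), which equals AD / BC"""
    return (a / c) / (b / d)

# Rare disease: 2 per 1000 in the exposed vs 1 per 1000 in the unexposed
rare = (2, 998, 1, 999)
# Common disease: 60% in the exposed vs 30% in the unexposed
common = (60, 40, 30, 70)

rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)
rr_common, or_common = relative_risk(*common), odds_ratio(*common)
print(f"rare:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
print(f"common: RR = {rr_common:.3f}, OR = {or_common:.3f}")
```

With the rare disease the two measures are almost identical; with the common disease the RR is still 2 but the OR is noticeably larger.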

46
Q

When would hazard ratio be used?

A

Useful when the risk is not constant with respect to time - it uses data from different time points where the risk might be changing over a period of time

47
Q

Relative risk 1.45 in plain language

A

45% more likely to have outcome X

48
Q

Case control, looking at exposure to risk factors in patients that had oral cancer. Looking at risk factor chewing tobacco, OR is 1.6. Explain this in plain language?

A

In those who had oral cancer, the odds of chewing tobacco were 1.6 times higher than those who did not have oral cancer.

49
Q

Odds ratio 1.6 in plain language (compared to RR 1.6)

A

Odds ratio of 1.6 means the odds of disease is 60% higher in exposed people

Whereas risk ratio of 1.6 means exposed people are 60% more likely to be diseased

50
Q

Hazard ratio 0.79 in plain language

A

At any particular point, group A is 21% less likely to have outcome X

51
Q

Incidence

A

Number of new cases of a disease within a specific period of time

52
Q

Prevalence

A

Number of cases of disease at a given time

53
Q

Number needed to treat (NNT)

A

1/ARR -> tells you how many people need to be treated with that intervention in order to prevent one outcome occurring

54
Q

What is relative risk reduction (RRR)?

How does it compare with ARR?

A

ARR / incidence [control group] as %

RR of 0.8 = RRR of 20%

Relative risk reduction (RRR) refers to the percentage decrease in risk achieved by the group receiving the intervention vs. the group that did not receive the intervention (the control group). Absolute risk reduction (ARR) refers to the actual difference in risk between the treated and the control group.

55
Q

What are the causes of type 1 error?

A

bias
confounding
data dredging

56
Q

What are causes of type 2 error?

A

Sample size too small
Measurement variance being too large

57
Q

Beta

A

Probability of making a type II error (conventionally set at 0.2 or lower, which corresponds to a power of at least 0.8)

(alpha is the probability of making a type I error)

58
Q

How can we increase power?

A

Increase sample size
Increase effect size
Increase measurement precision

59
Q

Advantages of per protocol analysis

A

Accurate representation of the effect of the intervention because you have only included the people who have properly done the intervention.

60
Q

Disadvantages of per protocol analysis

A

Susceptible to attrition bias and exclusion bias

61
Q

Advantages of ITT analysis

A

More accurately reflects results in clinical practice, because in practice patients do not always follow instructions/protocols
More generalisable

62
Q

Disadvantages of ITT analysis

A

Not getting a true, accurate estimate of how well the drug actually does in optimal conditions
Imputed values may be inaccurate

63
Q

Null hypothesis

A

The assumption that any difference between experimental groups is due to chance

64
Q

Methods for dealing with missing data (in ITT analysis)

A

Worst-case scenario
Hot deck imputation: fill in missing values from similar subjects with complete records
Last observation carried forward
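
As a small illustration of last observation carried forward, the sketch below fills each missing visit with the most recent recorded value. Missing values are represented as `None`, and the function name `locf` and the data are assumptions made for this example.

```python
def locf(series):
    """Last observation carried forward: fill gaps with the previous value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value  # remember the most recent observed value
        filled.append(last)  # stays None until a first value has been seen
    return filled

# Hypothetical pain scores over five visits; the participant missed three
scores = [5, None, None, 7, None]
print(locf(scores))
```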

65
Q

Interpretation of standard deviation

A

The smaller the standard deviation, the smaller the sample size needed to estimate the mean with a given precision

66
Q

Parametric, paired, 2 groups

A

Paired t-test

67
Q

Parametric, paired, > 2 groups

A

Repeated measures ANOVA

68
Q

Parametric, unpaired, 2 groups

A

Independent t-test

69
Q

Parametric, unpaired, > 2 groups

A

One way ANOVA

70
Q

Non-parametric, paired, 2 groups

A

Wilcoxon signed rank

71
Q

Non-parametric, paired, > 2 groups

A

Friedman test

72
Q

Non-parametric, unpaired, 2 groups

A

Mann-Whitney U test

73
Q

Non-parametric, unpaired, > 2 groups

A

Kruskal Wallis test

74
Q

Parametric data is

A

data that assumes a normal distribution. When data sets are large enough, parametric statistical tests can be employed regardless of normality. Parametric tests are generally considered to have greater statistical power.

75
Q

Non-parametric data is

A

data that does not assume a normal distribution. The data is ordinal, ranked, or has outliers that cannot be removed.

76
Q

Time to event analysis: based on Kaplan-Meier curve. Can use:

A

Cox proportional hazards, log-rank or Wilcoxon two-sample test. Cox model is the most used.

77
Q

Retrospective subgroup analysis

A

Data dredging means that some associations will crop up due to chance. Dredging: “cherry-picking of promising findings leading to a spurious excess of statistically significant results in published or unpublished literature”.

78
Q

How to test whether two Kaplan-Meier curves are different?

A

Log-rank test

79
Q

How to do power calculations

A

Power is the ability to discern a certain difference if that difference exists. You usually pick a clinically meaningful difference. You need a population mean and standard deviation AND:
▪ The standard deviation of the test group
▪ The clinically meaningful difference of the test group
▪ Then you can calculate the size of the sample you need for certain power
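
The steps above can be sketched for the simplest case, comparing the means of two equal-sized groups with a two-sided test. This uses a common normal-approximation formula, n per group = 2 × ((z for alpha + z for power) × SD / difference)²; the function name and the example numbers are assumptions, and dedicated software would normally be used for a real trial.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group to detect `difference` between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)  # round up to whole participants

# Hypothetical: SD of 10 units, clinically meaningful difference of 5 units
n = sample_size_per_group(sd=10, difference=5)
print(n)  # 63 per group
```

A larger standard deviation or a smaller meaningful difference both increase the required sample size.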

80
Q

Effect size

A

effect size is the magnitude of the difference between groups.
The absolute effect size is the difference between the average, or mean, outcomes in two different intervention groups.

81
Q

Nominal data

A

a type of qualitative data which groups variables into categories

e.g. hair colour

82
Q

Ordinal data

A

a kind of qualitative data that groups variables into ordered categories.

e.g. range of income, or level of education

83
Q

Interval data

A

a data type which is measured along a scale, in which each point is placed at equal distance from one another

e.g. temperature in degrees Celsius, time of day (there is no true zero)

84
Q

Ratio data

A

a form of quantitative (numeric) data measured on a scale with equal intervals and a true zero

e.g. height, weight

85
Q

Paired vs unpaired samples

A

Paired means that both samples consist of the same test subjects

Unpaired means that both samples consist of distinct test subjects

86
Q

Alpha level

A

also known as the significance level

is the probability of rejecting the null hypothesis when it is true

type 1 error - false positive

87
Q

Inferential testing

A

Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population

88
Q

Correlation

A

Correlation is a statistical measure that expresses the extent to which two variables change together at a constant rate.

89
Q

Regression

A

a statistical technique that relates a dependent variable to one or more independent (explanatory) variables

90
Q

Correlation vs regression

A

Correlation and regression are techniques used to analyze the relationship between two quantitative variables.

While correlation measures the strength of a linear relationship between two variables, regression measures how those variables affect each other using an equation.

91
Q

Degrees of freedom

A

degrees of freedom in a statistical calculation represent how many values involved in a calculation have the freedom to vary

calculated to help ensure the statistical validity of chi-squared tests or t-tests etc

92
Q

Validity

A

Statistical validity can be defined as the extent to which drawn conclusions of a research study can be considered accurate and reliable from a statistical test

93
Q

Accuracy

A

Accuracy is how close a given set of measurements (observations or readings) are to their true value.

94
Q

Precision

A

the agreement among repeated measurements of the same variable.

95
Q

Variance

A

a statistical measurement of the spread between numbers in a data set

the average of the squared distances of each number in the set from the mean (the standard deviation is its square root)

96
Q

Placebo

A

a substance that has no therapeutic effect, used as a control in testing new drugs.

97
Q

What is within participant comparison

A

Participants are assessed before and after an intervention
Analysis is of the same participant

98
Q

What is an N-of-1 trial

A

A single subject trial where an individual is the sole observation
Provides optimal intervention for an individual (e.g. optimal dose)

99
Q

What is a factorial design

A

Study that investigates multiple independent variables on an outcome measure (both separately and combined)

100
Q

Number needed to treat

A

Number of participants required to take a medication/have an intervention (compared with the control) to see one positive event
Is 1/ARR

101
Q

Sensitivity

A

How well the test is able to detect those with the disease

True Positive (correctly detected with disease) / (True Positive + False Negative) (total with disease)

102
Q

Specificity

A

How well the test is able to rule out those without the disease

True Negative (correctly detected without disease) / (True Negative + False Positive) (total without disease)

103
Q

Positive predictive value

A

The percentage of people that test positive, that truly have the disease
True Positive (correctly detected with disease) / (True Positive + False Positive) (total that tested positive)

104
Q

Negative predictive value

A

The percentage of people that test negative, that truly do NOT have the disease
True Negative (correctly detected without disease) / (True Negative + False Negative) (total that tested negative)
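
The four diagnostic test measures above can be sketched from a single 2x2 table of test results. The counts below are hypothetical and `diagnostic_measures` is a name chosen for this example.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 test counts."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among all with disease
        "specificity": tn / (tn + fp),  # ruled out among all without disease
        "ppv": tp / (tp + fp),          # true disease among positive tests
        "npv": tn / (tn + fn),          # truly disease-free among negative tests
    }

# Hypothetical screening test in 1000 people, 100 of whom have the disease
measures = diagnostic_measures(tp=90, fp=30, fn=10, tn=870)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are properties of the test, whereas PPV and NPV also depend on how common the disease is in the tested population.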

105
Q

Number needed to harm (NNH)

A

derived statistic that tells us how many patients must receive a particular treatment for 1 additional patient to experience a particular adverse outcome.

Lower NNT and higher NNH values are associated with a more favorable treatment profile

106
Q

Outlier

A

An outlier is an observation that lies an abnormal distance from other values in a random sample from a population

Extreme values that stand out greatly from the overall pattern of values in a dataset