Statistics Flashcards

1
Q

Definition of funnel plot

A

a scatterplot of treatment effect (e.g. OR on x axis) against a measure of study precision (e.g. SEM on y axis)

Commonly used in meta-analyses to assess publication bias

2
Q

Definition of forest plot

A

graphical display of estimated results from a number of scientific studies addressing the same question

gives visual suggestion of the amount of heterogeneity

and can show the estimated common effect

3
Q

Definition of standard deviation

A

measure of amount of variation or dispersion of a set of values

4
Q

Definition of per-protocol analysis

A

only subjects who completed the entire protocol are included in the analysis of a randomised clinical trial

5
Q

Definition of intention-to-treat analysis

A

all subjects randomised are included in the analysis regardless of whether they completed the study

6
Q

Definition of clinical equipoise

A

state of genuine uncertainty on relative value of two interventions being compared in a trial - requirement for RCTs to be ethical

7
Q

Definition of MHRA Yellow Card scheme

A

provides early warning that safety of a medicine or medical device may require further investigation

8
Q

Definition of standard error

A

SD / square root of (N)

looks at how accurate the mean of the study population is compared to the true population
(whereas standard deviation compares how participants in the study population compare to each other)
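
As a minimal sketch of the SD / square root of (N) formula above, using only the Python standard library. The `readings` list is made-up illustrative data and `standard_error` is a hypothetical helper name, not part of the original card.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return sd / math.sqrt(len(values))


# Hypothetical measurements, for illustration only
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sd = statistics.stdev(readings)   # spread of participants around the mean
sem = standard_error(readings)    # uncertainty of the mean itself
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

Because the SD is divided by the square root of N, the SEM shrinks as the sample grows even when the SD stays the same.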

9
Q

Definition of null hypothesis

A

hypothesis that there is no difference (or no effect) between specified populations

10
Q

Definition of type I error

A

falsely rejecting a null hypothesis that is true in the population (false positive)

11
Q

Definition of type II error

A

failing to reject null hypothesis that is false in the population (false negative)

12
Q

Definition of power

A

probability of picking up a significant difference, if there is one (probability of not making type 2 error (false negative))

1 – probability of Type II error

13
Q

Definition of p-value

A

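probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true

it is NOT the probability that the null hypothesis is true; it is compared against the pre-set significance level (alpha), which is the accepted probability of a type I error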

14
Q

Definition of confidence interval

A

range of values, calculated from the sample, that is expected to contain the true population value in 95% of repeated samples (for a 95% confidence interval)
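
A rough sketch of how a 95% confidence interval for a mean could be computed, assuming a normal approximation (mean ± z × standard error). For small samples a t-distribution would normally be used instead; the data and the helper name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * sem, mean + z * sem


# Hypothetical measurements, for illustration only
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
low, high = mean_confidence_interval(readings)
print(f"95% CI: {low:.2f} to {high:.2f}")
```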

15
Q

Definition of a priori

A

pre-specifying end-points & outcomes of a study to reduce reporting bias

16
Q

Definition of surrogate endpoint

A

variable relatively easily measured that predicts a distant outcome of the intervention being tested

17
Q

Definition of composite outcome

A

combination of two or more outcomes into single endpoint

18
Q

Definition of Cohen’s kappa coefficient

A

statistic used to measure inter-rater reliability (degree of agreement between raters/observers) for qualitative (categorical) variables, corrected for agreement expected by chance
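
One way to sketch the calculation, assuming the usual formula kappa = (observed agreement - chance agreement) / (1 - chance agreement). The ratings below are invented and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item list")
    n = len(rater_a)
    # Proportion of items on which the raters actually agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    # Note: undefined (division by zero) if expected agreement is exactly 1
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 items by two observers
a = ["y", "y", "y", "y", "y", "y", "n", "n", "n", "n"]
b = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "y"]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on 8 of 10 items (80%), but kappa is lower than 0.8 because some of that agreement would be expected by chance alone.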

19
Q

Definition of absolute risk

A

probability that an event will occur (incidence)

number of events/total number of people

20
Q

Definition of absolute risk reduction

A

difference in rate of events between 2 groups

ARR = AR (C) – AR (T)
Incidence in control - incidence in intervention

21
Q

Definition of relative risk

A

risk ratio = relative likelihood of an event occurring in the treatment vs control group throughout study period

RR = AR (T) / AR (C)

cumulative risk

22
Q

Definition of relative risk reduction

A

reduction in rate of outcome in treatment group vs. control group

RRR = ARR / AR (C)

23
Q

Definition of number needed to treat

A

number of pts needed to treat to prevent 1 additional bad outcome, e.g. death, stroke

NNT = 1 / ARR
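
The risk formulas on the preceding cards (AR, ARR, RR, RRR, NNT) can be tied together in one short worked sketch. The trial counts are invented and `risk_measures` is a hypothetical helper name.

```python
def risk_measures(events_control, n_control, events_treated, n_treated):
    """Absolute and relative risk measures for a two-arm trial."""
    ar_c = events_control / n_control   # AR (C): incidence in control
    ar_t = events_treated / n_treated   # AR (T): incidence in treatment
    arr = ar_c - ar_t                   # ARR = AR (C) - AR (T)
    return {
        "AR_control": ar_c,
        "AR_treated": ar_t,
        "ARR": arr,
        "RR": ar_t / ar_c,              # RR = AR (T) / AR (C)
        "RRR": arr / ar_c,              # RRR = ARR / AR (C)
        "NNT": 1 / arr,                 # undefined if ARR is zero
    }


# Hypothetical trial: 20/100 events in control, 16/100 in treatment
result = risk_measures(20, 100, 16, 100)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these numbers ARR is 0.04, RR is 0.8, RRR is 20% and NNT is 25. In practice the NNT is usually rounded up to the next whole patient.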

24
Q

Definition of hazard rate

A

probability of the event occurring in the next time interval divided by the length of that time interval

time-sensitive = instantaneous risk

25
Definition of hazard ratio
relative likelihood of an event occurring in the treatment vs control group at any given point
26
Definition of logistic regression
statistical analysis method to predict a binary outcome, such as yes or no, based on existing independent variables
27
Definition of linear regression
regression model that estimates relationship between one independent variable and one dependent variable using a straight line
28
Definition of chi-squared test
hypothesis test to determine whether observed frequencies are significantly different from the frequencies expected if the null hypothesis were true (categorical variables)
29
Definition of t-test
hypothesis test to determine whether means of two groups are significantly different from each other (continuous variables)
30
Definition of ANOVA
hypothesis test to determine whether means of three or more groups are significantly different from each other (continuous variables)
31
Definition of log-rank test
hypothesis test to compare the survival distributions of two samples
32
Definition of Kaplan Meier curves
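curves showing the estimated probability of survival over time, usually plotted separately for each level of a categorical variable (e.g. treatment group)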
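
A minimal sketch of the Kaplan-Meier (product-limit) estimate for a single group, assuming the standard rule that survival is multiplied by (1 - events / number at risk) at each event time. The follow-up data are invented and `kaplan_meier` is a hypothetical helper name.

```python
def kaplan_meier(times, events):
    """Return (time, survival probability) pairs at each event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events + censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored subjects leave the risk set without an event
    return curve


# Hypothetical follow-up: events at times 1, 3, 4; censored at times 2 and 5
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects do not drop the curve themselves, but they shrink the number at risk, so later events produce larger steps.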
33
Definition of Cox proportional hazards regression analysis
survival analysis for both quantitative & categorical variables, which can simultaneously assess the effect of several risk factors on survival time
34
Definition of correlation coefficient
how closely 2 continuous variables move with each other
35
Types of correlation coefficient
parametric: Pearson’s r; non-parametric: Spearman’s rank correlation (rho)
36
Definition of receiver-operating characteristic (ROC) curve
a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters: true positive rate against false positive rate
37
ROC curve axes
X axis: 1 - specificity (false positive rate); Y axis: sensitivity (true positive rate)
38
Funnel plot axes
X axis: study outcome, e.g. OR; Y axis: study precision, e.g. SEM
39
Measure of forest plot heterogeneity
I² (I-squared)
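
The card above only names the statistic. As a hedged sketch, I² is commonly computed from Cochran's Q as max(0, (Q - df) / Q) × 100, where df is the number of studies minus one. The code below assumes inverse-variance (fixed-effect) weights; the study effects and variances are invented and `i_squared` is a hypothetical helper name.

```python
def i_squared(effects, variances):
    """I-squared (%) from study effect estimates and their variances."""
    weights = [1 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q == 0:
        return 0.0  # identical study effects: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100


# Hypothetical log odds ratios and variances from three studies
i2 = i_squared([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(f"I-squared = {i2:.0f}%")
```

I² estimates the percentage of the variation across studies that is due to heterogeneity rather than chance.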
40
How do you calculate relative risk?
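RR = [A/(A+B)] / [C/(C+D)], where A = exposed with disease, B = exposed without disease, C = not exposed with disease, D = not exposed without disease (risk of disease among all exposed vs risk of disease among all not exposed)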
41
What types of studies can RR be used in?
Prospective studies
42
Odds ratio
Ratio of the odds of something happening with a particular exposure to the odds of it happening without that exposure
43
Which studies is odds ratio used in?
Case-control studies
44
How do you calculate odds ratio?
OR = (A/C) / (B/D) = AD/BC. Odds of exposure in the cases vs odds of exposure in the controls (equivalently, the odds of getting the disease when exposed, A/B, vs the odds of getting the disease when not exposed, C/D)
45
If a disease is really rare, the odds ratio and relative risk actually end up being quite similar. True or false?
True. However, they are not the same thing, and when the outcome is common they can end up being very different
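
The two formulas above, and the rare-disease point, can be checked with a small sketch. It assumes the 2x2 layout implied by those formulas (A = exposed with disease, B = exposed without, C = not exposed with disease, D = not exposed without). The counts are invented.

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)]."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (A/C) / (B/D) = AD/BC."""
    return (a * d) / (b * c)


# Hypothetical rare outcome: 1% of exposed, 0.5% of unexposed
rare = (10, 990, 5, 995)
# Hypothetical common outcome: 60% of exposed, 30% of unexposed
common = (60, 40, 30, 70)

print(relative_risk(*rare), odds_ratio(*rare))      # RR 2.0, OR about 2.01
print(relative_risk(*common), odds_ratio(*common))  # RR 2.0, OR 3.5
```

Both tables have the same relative risk of 2, but the odds ratio only stays close to it when the outcome is rare.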
46
When would hazard ratio be used?
Useful when the risk is not constant with respect to time - it uses data from different time points where the risk might be changing over a period of time
47
Relative risk 1.45 in plain language
45% more likely to have outcome X
48
Case control, looking at exposure to risk factors in patients that had oral cancer. Looking at risk factor chewing tobacco, OR is 1.6. Explain this in plain language?
In those who had oral cancer, the odds of chewing tobacco were 1.6 times higher than those who did not have oral cancer.
49
Odds ratio 1.6 in plain language (compared to RR 1.6)
Odds ratio of 1.6 means the odds of disease are 60% higher in exposed people, whereas risk ratio of 1.6 means exposed people are 60% more likely to be diseased
50
Hazard ratio 0.79 in plain language
At any particular point, group A is 21% less likely to have outcome X
51
Incidence
Number of new cases of a disease within a specific period of time
52
Prevalence
Number of cases of disease at a given time
53
Number needed to treat (NNT)
1/ARR -> tells you how many people need to be treated with that intervention in order to prevent one outcome occurring
54
What is relative risk reduction (RRR)? How does it compare with ARR?
RRR = ARR / incidence [control group], expressed as a %, e.g. RR of 0.8 = RRR of 20%
Relative risk reduction (RRR) refers to the percentage decrease in risk achieved by the group receiving the intervention vs. the group that did not receive the intervention (the control group).
Absolute risk reduction (ARR) refers to the actual difference in risk between the treated and the control group.
55
What are the causes of type 1 error?
Bias; confounding; data dredging
56
What are causes of type 2 error?
Sample size too small; measurement variance too large
57
Beta
Probability of making a type II error; conventionally set at 0.2 or lower, i.e. power (1 - beta) of at least 0.8 (alpha is the probability of making a type I error)
58
How can we increase power?
Increase sample size; increase effect size; increase measurement precision
59
Advantages of per protocol analysis
Accurate representation of the effect of the intervention because you have only included the people who have properly done the intervention.
60
Disadvantages of per protocol analysis
Susceptible to attrition bias and exclusion bias
61
Advantages of ITT analysis
Better reflects results in clinical practice, because in practice patients do not always follow instructions/protocols; more generalisable
62
Disadvantages of ITT analysis
Does not give a true, accurate estimate of how well the drug actually does in optimal conditions; imputed values may be inaccurate
63
Null hypothesis
The assumption that any difference between experimental groups is due to chance
64
Methods for dealing with missing data (in ITT analysis)
Worst-case scenario; hot deck imputation (fill in missing values from similar subjects with complete records); last observation carried forward
65
Standard deviation of data interpretation
The narrower the standard deviation, the less important it is to have a large sample size
66
Parametric, paired, 2 groups
Paired t-test
67
Parametric, paired, > 2 groups
Repeated measures ANOVA
68
Parametric, unpaired, 2 groups
Independent t-test
69
Parametric, unpaired, > 2 groups
One way ANOVA
70
Non-parametric, paired, 2 groups
Wilcoxon signed rank
71
Non-parametric, paired, > 2 groups
Friedman test
72
Non-parametric, unpaired, 2 groups
Mann-Whitney U test
73
Non-parametric, unpaired, > 2 groups
Kruskal Wallis test
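
The eight test-selection cards above amount to a lookup on three yes/no questions, which could be sketched as a small table. `choose_test` is a hypothetical helper name. The parametric, paired, more-than-two-groups case is entered here as repeated measures ANOVA, the usual choice when the same subjects are measured under every condition.

```python
# Keys: (parametric, paired, more_than_two_groups)
TESTS = {
    (True, True, False): "Paired t-test",
    (True, True, True): "Repeated measures ANOVA",
    (True, False, False): "Independent t-test",
    (True, False, True): "One way ANOVA",
    (False, True, False): "Wilcoxon signed rank",
    (False, True, True): "Friedman test",
    (False, False, False): "Mann-Whitney U test",
    (False, False, True): "Kruskal Wallis test",
}


def choose_test(parametric, paired, more_than_two_groups):
    """Look up the conventional test for comparing group averages."""
    return TESTS[(parametric, paired, more_than_two_groups)]


print(choose_test(parametric=False, paired=False, more_than_two_groups=False))
# prints "Mann-Whitney U test"
```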
74
Parametric data is
data that assumes a normal distribution. When data sets are large enough, parametric statistical tests can be employed regardless of normality. Parametric tests are generally considered to have greater statistical power.
75
Non-parametric data is
data that does not assume a normal distribution. The data is ordinal, ranked, or has outliers that cannot be removed.
76
Time to event analysis: based on Kaplan-Meier curve. Can use:
Cox proportional hazards, log-rank or Wilcoxon two-sample test. Cox model is the most used.
77
Retrospective subgroup analysis
Data dredging means that some associations will crop up due to chance. Dredging: “cherry-picking of promising findings leading to a spurious excess of statistically significant results in published or unpublished literature”.
78
How to test whether two Kaplan-Meier curves are different?
Log-rank test
79
How to do power calculations
Power is the ability to discern a certain difference if that difference exists. You usually pick a clinically meaningful difference. You need a population mean and standard deviation AND:
▪ The standard deviation of the test group
▪ The clinically meaningful difference of the test group
▪ Then you can calculate the size of the sample you need for a certain power
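
As a rough sketch of the last step, one common normal-approximation formula for comparing two means is n per group = 2 × ((z for alpha + z for power) × SD / difference)². This assumes equal group sizes, a common SD, and a two-sided test. The inputs below are invented and `sample_size_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group to detect `difference` between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)  # round up to whole participants


# Hypothetical inputs: SD of 10 units, clinically meaningful difference of 5
n_80 = sample_size_per_group(sd=10, difference=5, power=0.80)
n_90 = sample_size_per_group(sd=10, difference=5, power=0.90)
print(n_80, n_90)
```

Asking for more power, a smaller detectable difference, or working with a larger SD all push the required sample size up.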
80
Effect size
effect size is the magnitude of the difference between groups. The absolute effect size is the difference between the average, or mean, outcomes in two different intervention groups.
81
Nominal data
a type of qualitative data which groups variables into unordered categories, e.g. hair colour
82
Ordinal data
a kind of qualitative data that groups variables into ordered categories, e.g. range of income, or level of education
83
Interval data
a data type measured along a scale in which each point is an equal distance from the next, but with no true zero, e.g. temperature in degrees Celsius, time of day
84
Ratio data
a form of quantitative (numeric) data measured on a scale with equal intervals and a true zero, e.g. height, weight
85
Paired vs unpaired samples
Paired means that both samples consist of the same test subjects; unpaired means that both samples consist of distinct test subjects
86
Alpha level
also known as the significance level; the probability of rejecting the null hypothesis when it is true (type I error, false positive)
87
Inferential testing
Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population
88
Correlation
Correlation is a statistical measure that expresses the extent to which two variables change together at a constant rate.
89
Regression
a statistical technique that relates a dependent variable to one or more independent (explanatory) variables
90
Correlation vs regression
Correlation and regression are techniques used to analyze the relationship between two quantitative variables. While correlation measures the strength of a linear relationship between two variables, regression measures how those variables affect each other using an equation.
91
Degrees of freedom
degrees of freedom in a statistical calculation represent how many values involved in the calculation have the freedom to vary; calculated to help ensure the statistical validity of chi-squared tests, t-tests etc.
92
Validity
Statistical validity can be defined as the extent to which drawn conclusions of a research study can be considered accurate and reliable from a statistical test
93
Accuracy
Accuracy is how close a given set of measurements (observations or readings) are to their true value.
94
Precision
the agreement among repeated measurements of the same variable.
95
Variance
a statistical measure of the spread of numbers in a data set: the average squared distance of each number from the mean (the square of the standard deviation)
96
Placebo
a substance that has no therapeutic effect, used as a control in testing new drugs.
97
What is within participant comparison
Participants are assessed before and after an intervention; analysis compares each participant with themselves
98
What is an N-of-1 trial
A single-subject trial where an individual is the sole unit of observation; identifies the optimal intervention for that individual (e.g. optimal dose)
99
What is a factorial design
Study that investigates multiple independent variables on an outcome measure (both separately and combined)
100
Number needed to treat
Number of participants who need to take a medication/have an intervention (compared with the control) for one additional participant to benefit; NNT = 1/ARR
101
Sensitivity
How well the test is able to detect those with the disease
Sensitivity = True Positive (correctly detected with disease) / (True Positive + False Negative) (total with disease)
102
Specificity
How well the test is able to rule out those without the disease
Specificity = True Negative (correctly detected without disease) / (True Negative + False Positive) (total without disease)
103
Positive predictive value
The percentage of people that test positive that truly have the disease
PPV = True Positive (correctly detected with disease) / (True Positive + False Positive) (total that tested positive)
104
Negative predictive value
The percentage of people that test negative that truly do NOT have the disease
NPV = True Negative (correctly detected without disease) / (True Negative + False Negative) (total that tested negative)
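
The four diagnostic-test formulas above can be sketched together from one 2x2 table of counts. The counts are invented and `diagnostic_metrics` is a hypothetical helper name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 test counts."""
    return {
        "sensitivity": tp / (tp + fn),  # of those with disease, share detected
        "specificity": tn / (tn + fp),  # of those without disease, share ruled out
        "ppv": tp / (tp + fp),          # of positive tests, share truly diseased
        "npv": tn / (tn + fn),          # of negative tests, share truly disease-free
    }


# Hypothetical counts: 100 people with the disease, 100 without
metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are properties of the test itself, whereas PPV and NPV also depend on how common the disease is in the group tested.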
105
Number needed to harm (NNH)
derived statistic that tells us how many patients must receive a particular treatment for 1 additional patient to experience a particular adverse outcome. Lower NNT and higher NNH values are associated with a more favorable treatment profile
106
Outlier
An outlier is an observation that lies an abnormal distance from other values in a random sample from a population, i.e. an extreme value that stands out greatly from the overall pattern of values in a dataset