Statistics Flashcards
Sensitivity
the proportion of people who test positive among all those who actually have the disease
Specificity
the proportion of people who test negative among all those who actually do not have that disease
PPV
the probability that, following a positive test result, that individual will truly have that specific disease
NPV
the probability that, following a negative test result, that individual will not have that specific disease
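The four definitions above all come from the same 2x2 table of test result against true disease status. A minimal sketch, assuming hypothetical counts (the function name and the numbers are illustrative, not from the cards):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts.

    tp: diseased, test positive      fn: diseased, test negative
    fp: not diseased, test positive  tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with disease
        "specificity": tn / (tn + fp),  # negatives among those without disease
        "ppv": tp / (tp + fp),          # truly diseased among test positives
        "npv": tn / (tn + fn),          # truly disease-free among test negatives
    }


# Hypothetical counts for 1000 people, 100 of whom have the disease.
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity divide by the disease columns, while PPV and NPV divide by the test-result rows, which is why only the latter two depend on prevalence.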
As prevalence decreases, what happens to a) PPV and b) NPV
a) PPV decreases - there will be more false positives for every true positive “needle in a haystack”
b) NPV increases - more true negatives for every false negative
- because a false negative would mean the person actually has the disease, which is unlikely when the disease is rare (low prevalence)
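The prevalence effect above can be checked numerically by holding the test fixed and changing only how common the disease is. A small sketch, assuming an illustrative test with 90% sensitivity and 90% specificity (these figures and the function name are assumptions, not from the cards):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    # Expected fractions of the whole population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same test, progressively rarer disease.
for prevalence in (0.20, 0.05, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

As prevalence falls, the false positives from the large healthy group swamp the few true positives, so PPV drops while NPV creeps towards 1.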
Correlation
statistical measure which determines co-relationship or association of two quantitative variables
used to represent a linear relationship between two variables
Regression
describes how an independent variable is numerically related to the dependent variable. used to estimate a line of best fit and estimate one variable on the basis of another
can make predictions about what we expect among individuals who have not had the dependent variable measured
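To see how correlation and regression differ in practice, here is a rough sketch using made-up paired measurements: Pearson's r summarises the strength of the linear association, while the least-squares line gives a slope and intercept that can be used for prediction. The data and function names are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares line of best fit; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)

# Predict y for an individual with x = 6 whose y was never measured.
predicted = intercept + slope * 6
print(f"r={r:.3f}  slope={slope:.2f}  intercept={intercept:.2f}  predicted={predicted:.2f}")
```

Swapping x and y leaves r unchanged but gives a different fitted line, which is one way to remember that regression treats the two variables asymmetrically.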
Statistical inference
Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution
Adjusted R sq
estimates the total proportion of overall variation of ___ explained by the fitted regression model, adjusted for the number of predictors in the model
R sq
___ % of the variability in variable Y is explained by variation in X
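A short sketch of how R sq and adjusted R sq relate, assuming the usual formulas R sq = 1 - SS_res / SS_tot and adjusted R sq = 1 - (1 - R sq)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The observed and fitted values below are hypothetical and are not the output of an actual model fit.

```python
def r_squared(y, y_hat):
    """Proportion of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # unexplained
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalise R squared for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed values and fitted values from a one-predictor model.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
print(f"R sq = {r2:.3f}, adjusted R sq = {adj_r2:.3f}")
```

Adjusted R sq is never larger than R sq, and the gap widens as predictors are added without a matching gain in fit.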
Coefficient
For every one unit change in X, there is a coefficient (beta) unit change in Y
cox proportional hazard model
allows for survival times to be modelled in terms of continuous and categorical explanatory variables
assumes any factor alters the hazard by the same multiplicative amount at all time points (proportional hazards)
Hazard ratio
the hazard (instantaneous risk of the event, e.g. dying, at time (t) among those still at risk) in one group divided by the background hazard rate in people without intervention
Bland Altman Plot
graphical method to compare two measurement techniques
plots the difference between measurements for each data point against average of observations
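The quantities behind a Bland Altman plot can be sketched without any plotting library. This example assumes hypothetical paired readings from two methods and also computes the conventional 95% limits of agreement (mean difference ± 1.96 SD of the differences), which the card does not mention but which are usually drawn on the plot.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return plot coordinates plus bias and 95% limits of agreement."""
    differences = [a - b for a, b in zip(method_a, method_b)]   # y-axis
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # x-axis
    bias = mean(differences)   # mean difference between the two methods
    sd = stdev(differences)    # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return averages, differences, bias, limits


# Hypothetical paired measurements of the same four subjects.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 13.0, 13.0, 15.0]

averages, differences, bias, limits = bland_altman(method_a, method_b)
print("points to plot (average, difference):", list(zip(averages, differences)))
print(f"bias={bias:.2f}  limits of agreement=({limits[0]:.2f}, {limits[1]:.2f})")
```

If the differences scatter evenly around the bias line with no trend across the averages, the two methods agree equally well across the measurement range.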
Number needed to treat NNT
the number of patients you would need to treat with new treatment to have ONE MORE SUCCESS than with the old treatment
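NNT is the reciprocal of the absolute difference in success rates between the two treatments. A minimal sketch with hypothetical success rates; rounding up to a whole patient is the usual convention and is treated as an assumption here.

```python
from math import ceil


def number_needed_to_treat(success_rate_new, success_rate_old):
    """NNT = 1 / absolute difference in success rates (new minus old)."""
    absolute_difference = success_rate_new - success_rate_old
    if absolute_difference <= 0:
        raise ValueError("new treatment must have a higher success rate")
    # Round up: you cannot treat a fraction of a patient.
    return ceil(1 / absolute_difference)


# Hypothetical: 50% succeed on the new treatment vs 25% on the old one.
nnt = number_needed_to_treat(0.50, 0.25)
print(f"NNT = {nnt}")
```

A smaller NNT means a more effective new treatment relative to the old one.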
Parametric
assumes variables have been sampled from a normal distribution (as defined by mean and SD)
H0 of Mann-Whitney and equivalent
H0: both groups' observations are sampled from the same underlying distribution
2 sample t
H0 of Wilcoxon signed ranks and equivalent
H0: population MEDIAN is of a particular value
one sample t
H0 of Wilcoxon matched pairs test and equivalent
H0: median difference in population is 0 - values of 1st measurement are approximately equal to those of the 2nd measurement
paired t test
Kruskal-Wallis test
ANOVA
H0: all groups come from populations with the same distribution
Assumption of MW CI
that the two groups have the same shape and spread (differ only in medians not variability)
I squared
measures the percentage of variability in treatment effect estimates that is due to between study heterogeneity rather than chance
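I squared is usually derived from Cochran's Q statistic as I sq = max(0, (Q - df) / Q) x 100, with df = number of studies - 1. The card does not give this formula, so the sketch below is an assumption about how the percentage is obtained, and the Q values are hypothetical.

```python
def i_squared(q, num_studies):
    """Percentage of variability due to heterogeneity, from Cochran's Q."""
    df = num_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to 0: no heterogeneity beyond chance.
    return max(0.0, (q - df) / q) * 100


# Hypothetical meta-analyses of 11 studies each.
print(i_squared(20.0, 11))  # Q well above df: substantial heterogeneity
print(i_squared(5.0, 11))   # Q below df: variation consistent with chance
```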
P value
the probability of obtaining the observed data, or data more extreme, assuming the null hypothesis is true, e.g. p = 0.05 means a 5% chance of a result at least this extreme if the null is true
i.e. arising from random sampling error alone
censored observation
information about an individual's survival time is incomplete - shows time spent in study before an individual left due to unrelated illness/condition, drop out etc
Log rank test
tests for differences in survival times. goodness of fit
NPM
ROC curve
plots sensitivity vs 1 - specificity for different cut-off points
helps define cut off point for a test with continuous measurements
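The points on an ROC curve can be generated by sweeping the cut-off over the observed test values. This sketch uses hypothetical scores and assumes a result is "positive" when the score is at or above the cut-off. Picking the cut-off that maximises sensitivity + specificity - 1 (Youden's index) is one common rule, shown here as an assumption rather than the only way to choose.

```python
def roc_points(scores, has_disease):
    """Return (cutoff, sensitivity, 1 - specificity) for each candidate cutoff."""
    positives = sum(has_disease)
    negatives = len(has_disease) - positives
    points = []
    for cutoff in sorted(set(scores)):
        # Count test-positive results (score >= cutoff) in each disease group.
        tp = sum(1 for s, d in zip(scores, has_disease) if d and s >= cutoff)
        fp = sum(1 for s, d in zip(scores, has_disease) if not d and s >= cutoff)
        points.append((cutoff, tp / positives, fp / negatives))
    return points


# Hypothetical continuous test results and true disease status.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
has_disease = [False, False, False, True, True, False, True, True]

points = roc_points(scores, has_disease)
for cutoff, sens, one_minus_spec in points:
    print(f"cutoff={cutoff}  sensitivity={sens:.2f}  1-specificity={one_minus_spec:.2f}")

# Youden's index: sensitivity - (1 - specificity), maximised over cutoffs.
best = max(points, key=lambda p: p[1] - p[2])
print("cut-off maximising Youden's index:", best[0])
```

In practice the choice of cut-off also depends on the relative cost of false positives and false negatives, which a single index does not capture.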