stats Flashcards
median
50th percentile
quantifies average
50% of data above median, 50% below
when is data symmetrical with respect to median
when median is equidistant from upper and lower quartile boundaries
when is negative skew seen with respect to median
when median is closer to upper quartile
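A minimal sketch of the two quartile checks above, assuming Python's standard `statistics` module and made-up sample values. The function name `quartile_skew` is hypothetical, and the "positive skew" branch is the mirror case, added for completeness.

```python
import statistics

def quartile_skew(values):
    """Compare the median's distance to each quartile boundary."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    lower_gap = median - q1   # lower quartile boundary to median
    upper_gap = q3 - median   # median to upper quartile boundary
    if upper_gap < lower_gap:
        return "negative skew"   # median sits closer to the upper quartile
    if upper_gap > lower_gap:
        return "positive skew"   # mirror case, not on the cards above
    return "symmetric"

# Made-up samples for illustration only.
symmetric_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
left_tailed_sample = [1, 5, 8, 9, 10, 10, 11, 11, 12]

print(quartile_skew(symmetric_sample))
print(quartile_skew(left_tailed_sample))
```

Real data will rarely give exactly equal gaps, so in practice this is judged by eye from a plot rather than by strict equality.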
how do you check symmetry of variables
box and whisker plot
histogram
difference of 99% CI compared to 95% CI
99% CI would be a wider range than 95% CI and extend it at both extremes
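A rough illustration of why the 99% CI is wider, assuming a normal approximation and a hypothetical estimate (mean 50, standard error 2). The helper `normal_ci` is an illustrative name, not something from the cards.

```python
from statistics import NormalDist

def normal_ci(mean, standard_error, level):
    """Normal-approximation confidence interval for a mean."""
    # Two-sided critical value: about 1.96 for 95%, about 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical estimate: mean 50 with standard error 2.
ci95 = normal_ci(50, 2, 0.95)
ci99 = normal_ci(50, 2, 0.99)

print(ci95)
print(ci99)
```

The larger critical value pushes both limits outwards, so the 99% interval contains the 95% interval.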
if p>0.05
no evidence
there may truly be no difference in the mean of the variables
the sample may be too small to detect a difference
smaller standard error means
the estimate of the mean is more precise
2-tailed test
difference in sample means in either direction provides evidence against null hypothesis
when is Mann-Whitney test used
to compare two independent groups
if the variable is ordinal or discrete
if data do not meet the assumptions of a parametric test (e.g. not normally distributed)
a parametric test makes strong assumptions about...
distribution of data
what does Wilcoxon signed-rank test compare
distribution between first and second measurement
assesses whether population mean ranks differ
when is Wilcoxon signed-rank test used
matched/paired data
when assumptions of paired t-test do not fit
what does standard error indicate
indicates how far the study estimate is likely to be from the true value in the population, i.e. how much the estimate would vary if you repeated the study multiple times with different samples
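For the mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. A small sketch with made-up values, where `standard_error` is a hypothetical helper name:

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4, 6, 8, 10, 12]  # made-up values
se = standard_error(sample)

print(round(se, 3))
```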
p-value if 95% CI excludes the null hypothesis value
p<0.05
there is some evidence
define odds
how common a binary characteristic is within a single group
no. with the characteristic / no. without the characteristic
odds ratio
measure of association between exposure and outcome
odds of one group compared to another
reference category
odds ratio of ref category = 1
used to compare odds of other categories against
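A short sketch of odds and odds ratio from counts, using hypothetical numbers (20 of 100 exposed and 10 of 100 unexposed have the outcome). The function names are illustrative.

```python
def odds(events, non_events):
    """Odds = number with the characteristic / number without it."""
    return events / non_events

def odds_ratio(group_events, group_non_events, ref_events, ref_non_events):
    """Odds in one group divided by odds in the reference group."""
    return odds(group_events, group_non_events) / odds(ref_events, ref_non_events)

# Hypothetical counts: exposed 20 with / 80 without, unexposed 10 with / 90 without.
exposed_odds = odds(20, 80)
reference_odds = odds(10, 90)
or_exposed = odds_ratio(20, 80, 10, 90)

# Comparing the reference group with itself gives 1 by construction.
or_reference = odds_ratio(10, 90, 10, 90)

print(exposed_odds, round(reference_odds, 3), round(or_exposed, 2), or_reference)
```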
Pearson's correlation coefficient
r
quantifies the strength of linear association between two variables
assumptions for Pearson's correlation
linear relationship between variables
what does r squared (Pearson's) refer to
the proportion of variation in one variable explained by the other variable
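A minimal sketch of r and r squared computed from the standard Pearson formula, on made-up paired values. `pearson_r` is a hypothetical helper written out by hand so each term is visible.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation of x and y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # proportion of variation in one variable explained by the other

print(round(r, 3), round(r_squared, 2))
```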
what does linear regression describe
the relationship between two quantitative variables
one variable is independent and affects the other dependent variable
equation for linear regression
outcome = a + b(predictor)
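One common way to estimate a and b is ordinary least squares. That method is an assumption here, since the card only gives the form of the equation. A sketch on made-up data, with `fit_line` as a hypothetical helper:

```python
def fit_line(predictor, outcome):
    """Least-squares estimates of a (intercept) and b (slope)."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    b = sxy / sxx              # change in outcome per unit change in predictor
    a = mean_y - b * mean_x    # outcome when predictor = 0
    return a, b

# Made-up data.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

a, b = fit_line(predictor, outcome)
predicted = a + b * 6  # outcome = a + b(predictor), evaluated at predictor = 6

print(round(a, 2), round(b, 2), round(predicted, 2))
```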
how do you calculate diagnostic accuracy
PPV
NPV
how do you calculate sensitivity
no. who correctly tested +ve for the disease / total no. who have the disease
how do you calculate specificity
no. of people who correctly test -ve / total no. of people who do not have the disease
how do you calculate PPV
no. of people who correctly test +ve / total no. of people who test +ve
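The three formulas above, plus NPV by the same pattern (true negatives / all who test -ve), can be sketched from a 2x2 table. The counts below are hypothetical, and `diagnostic_measures` is an illustrative name.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Diagnostic measures from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # correct +ve / all with the disease
        "specificity": tn / (tn + fp),  # correct -ve / all without the disease
        "ppv": tp / (tp + fp),          # correct +ve / all who test +ve
        "npv": tn / (tn + fn),          # correct -ve / all who test -ve
    }

# Hypothetical study: 100 with the disease, 200 without.
measures = diagnostic_measures(tp=90, fp=30, fn=10, tn=170)

for name, value in measures.items():
    print(name, round(value, 3))
```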
use of normal distribution
determines choice of statistical methods
mean and sd define
normal distribution
define population
full set of units (people) to which the study results will be generalised
usually infinite in size
why might there be uncertainty in the answer provided by the sample data
variability between people
sample is only a subset of the population - not fully representative
what are statistics for
summarising sample data
quantifying uncertainty in results
2 types of statistics
inferential
descriptive
descriptive statistics
describe basic features/characteristics in the sample
inferential statistics
make inferences about relationships in the population using the sample
however can never be 100% certain
e.g. standard error, CI, p-values
sampling distribution
all the different estimates from different samples and their frequencies
effect of sample size on CI
the larger the sample size the narrower the CI
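A quick sketch of the sample size effect, assuming a normal-approximation CI for a mean and a hypothetical standard deviation of 10. `ci_width` is an illustrative helper.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Full width of a normal-approximation CI for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sd / math.sqrt(n)

# Same hypothetical SD, increasing sample sizes.
widths = {n: ci_width(sd=10, n=n) for n in (25, 100, 400)}

for n, width in widths.items():
    print(n, round(width, 2))
```

Because the width depends on the square root of n, the sample size has to be quadrupled to halve the CI width.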