stats Flashcards

1
Q

median

A

50th percentile
quantifies average
50% of data above the median, 50% below

2
Q

when is data symmetrical with respect to the median

A

when median is equidistant from upper and lower quartile boundaries

3
Q

when is negative skew seen with respect to the median

A

when median is closer to upper quartile
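The quartile comparison in cards 2 and 3 can be sketched in Python. This is a rough illustration rather than a formal test: the function name `quartile_skew` and the sample values are made up for the example, and the "positive skew" branch is simply the mirror case, which the cards do not state.

```python
import statistics

def quartile_skew(data):
    """Classify skew by comparing the median's distance to each quartile."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    lower_gap = median - q1  # distance from lower quartile to median
    upper_gap = q3 - median  # distance from median to upper quartile
    if lower_gap == upper_gap:
        return "symmetric"
    # Median closer to the upper quartile suggests negative skew
    return "negative skew" if upper_gap < lower_gap else "positive skew"

symmetric_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # made-up values
left_skewed_sample = [1, 2, 4, 7, 9, 10, 10, 11, 11]     # made-up values
print(quartile_skew(symmetric_sample))
print(quartile_skew(left_skewed_sample))
```

With real data the two gaps are rarely exactly equal, so a box and whisker plot or histogram is still the more practical check.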

4
Q

how do you check symmetry of variables

A

box and whisker plot

histogram

5
Q

difference of 99% CI compared to 95% CI

A

a 99% CI is wider than a 95% CI, extending further at both ends
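To see why the 99% interval is wider, here is a minimal sketch that computes both intervals for the same sample. It assumes a normal approximation (mean plus or minus z times the standard error); for a small sample like this one a t-based interval would normally be used instead. The function name `mean_ci` and the data are invented for illustration.

```python
import math
import statistics

def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
    # Two-sided critical value, roughly 1.96 for 95% and 2.58 for 99%
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

sample = [118, 122, 125, 130, 131, 135, 138, 140, 142, 149]  # made-up values
low95, high95 = mean_ci(sample, 0.95)
low99, high99 = mean_ci(sample, 0.99)
print(f"95% CI: {low95:.1f} to {high95:.1f}")
print(f"99% CI: {low99:.1f} to {high99:.1f}")
```

The only thing that changes between the two calls is the critical value, so the 99% interval contains the 95% interval.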

6
Q

if p>0.05

A

no evidence against the null hypothesis

there may truly be no difference in the mean of the variables
the sample may be too small to detect a difference

7
Q

smaller standard error means

A

the estimate of the mean is more precise
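A small sketch of how the standard error of a mean shrinks as the sample grows. It assumes the usual formula SE = SD / sqrt(n), which the cards do not state; the function name and numbers are illustrative only.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean, assuming SE = SD / sqrt(n)."""
    return sd / math.sqrt(n)

# Same spread (SD = 10), increasing sample size
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Quadrupling the sample size halves the standard error, which is also why larger samples give narrower confidence intervals.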

8
Q

2-tailed test

A

difference in sample means in either direction provides evidence against null hypothesis
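As a sketch of what "either direction" means numerically, the two-tailed p-value below counts both tails of a standard normal distribution. It assumes the test statistic is a z-score; the function name `two_tailed_p` is made up for this example.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value for a z statistic under the standard normal."""
    # abs() makes a difference in either direction count equally
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_tailed_p(1.96), 3))
print(round(two_tailed_p(-1.96), 3))
```

A positive and a negative z of the same size give the same p-value, which is the defining property of a 2-tailed test.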

9
Q

when is mann whitney test used

A

to compare two independent groups when the variable is discrete or ordinal

when the data do not meet the assumptions of a parametric test (e.g. skewed data)

10
Q

a parametric test makes strong assumptions about...

A

distribution of data

11
Q

what does wilcoxon signed-rank test compare

A

distribution between first and second measurement

assesses whether population mean ranks differ

12
Q

when is wilcoxon signed-rank test used

A

matched/paired data

when assumptions of paired t-test do not fit

13
Q

what does standard error indicate

A

indicates how far the study estimate would typically be from the true population value if the study were repeated many times with different samples

14
Q

p-value if the 95% CI excludes the null hypothesis value

A

p<0.05

there is some evidence against the null hypothesis

15
Q

define odds

A

how likely a binary characteristic is to occur in a single group: the number with the characteristic divided by the number without it

16
Q

odds ratio

A

measure of association between exposure and outcome

odds of one group compared to another

17
Q

reference category

A

odds ratio of the reference category = 1

the group against which the odds of the other categories are compared
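Cards 15 to 17 can be tied together with a short sketch. It assumes odds are calculated as the number with the characteristic divided by the number without it; the function names and counts are invented for illustration.

```python
def odds(events, total):
    """Odds within one group: number with the characteristic / number without."""
    return events / (total - events)

def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds in group A relative to group B (B acts as the reference category)."""
    return odds(events_a, total_a) / odds(events_b, total_b)

# Made-up counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome
print(odds(20, 100))                 # 20 / 80
print(odds(10, 100))                 # 10 / 90
print(odds_ratio(20, 100, 10, 100))  # exposed vs unexposed (reference)
print(odds_ratio(10, 100, 10, 100))  # reference compared with itself
```

Comparing the reference group with itself always gives an odds ratio of 1, which is why the reference category sits at 1 in a results table.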

18
Q

Pearson's correlation coefficient

A

r

quantifies the strength of linear association between two variables

19
Q

assumptions for Pearson's correlation

A

linear relationship between variables

20
Q

what does r squared (Pearson's) refer to

A

the proportion of variation in one variable explained by the other variable
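A minimal sketch of r and r squared, computed from the standard sums-of-squares formula for Pearson's coefficient. The function name `pearson_r` and the height and weight values are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [150, 160, 165, 170, 180]  # made-up values
weights = [52, 58, 63, 70, 77]       # made-up values
r = pearson_r(heights, weights)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```

Squaring r turns the strength of the linear association into the proportion of variation in one variable explained by the other.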

21
Q

what does linear regression describe

A

the relationship between two quantitative variables

one variable is independent and affects the other, dependent variable

22
Q

equation for linear regression

A

outcome = a + b(predictor)
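One way to obtain a (the intercept) and b (the slope) in this equation is a least squares fit. The sketch below uses the standard closed-form formulas for simple linear regression; the function name `fit_line` and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates of a and b in: outcome = a + b * predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))  # slope
    a = mean_y - b * mean_x                     # intercept
    return a, b

predictor = [1, 2, 3, 4, 5]
outcome = [3, 5, 7, 9, 11]  # made-up values lying exactly on outcome = 1 + 2 * predictor
a, b = fit_line(predictor, outcome)
print(f"outcome = {a} + {b} * predictor")
```

Here b is the change in the outcome for a one-unit increase in the predictor, and a is the predicted outcome when the predictor is 0.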

23
Q

how do you calculate diagnostic accuracy

A

PPV

NPV

24
Q

how do you calculate sensitivity

A

no. who correctly tested +ve for the disease / total no. who have the disease

25
how do you calculate specificity
no. of people who correctly test -ve / total no. of healthy people
26
how do you calculate PPV
no. of people who correctly test +ve / total no. of people who test +ve
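The formulas in cards 24 to 26 can be sketched from the four cells of a 2x2 table of test result against true disease status. The NPV formula is the standard mirror of PPV and is not spelled out on the cards; the function name and counts are invented for the example.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # correctly +ve / all with the disease
        "specificity": tn / (tn + fp),  # correctly -ve / all healthy
        "ppv": tp / (tp + fp),          # correctly +ve / all who test +ve
        "npv": tn / (tn + fn),          # correctly -ve / all who test -ve
    }

# Made-up counts: 100 people with the disease, 900 without
result = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=870)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are calculated down the columns of true disease status, while PPV and NPV are calculated across the rows of test results.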
27
use of normal distribution
determines choice of statistical methods
28
mean and sd define
normal distribution
29
define population
full set of units (people) to which the study results will be generalised | usually infinite in size
30
why might there be uncertainty in the answer provided by the sample data
variability between people | sample is only a subset of the population - not fully representative
31
what are statistics for
summarising sample data | quantifying uncertainty in results
32
2 types of statistics
inferential | descriptive
33
descriptive statistics
describe basic features/characteristics in the sample
34
inferential statistics
make inferences about relationships in the population using the sample | can never be 100% certain | e.g. standard error, CI, p-values
35
sampling distribution
all the different estimates from different samples and their frequencies
36
effect of sample size on CI
the larger the sample size the narrower the CI
37
effect of CI on certainty/uncertainty
the wider the CI, the greater the uncertainty
38
what do p-values quantify
the extent to which the sample estimate contradicts the null hypothesis
39
what does PICO stand for
population/patient | intervention | comparison | outcome
40
what does the t in PICO(T) stand for
type of study design that would work best
41
why is PICO used
to frame or answer a health related question
42
when is data paired
if data are matched on criteria (e.g. age/gender) before being compared across trial arms | if measurements are taken before and after an intervention
43
what does paired data analyse
within-pair differences
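A minimal sketch of a within-pair analysis for before and after measurements. It assumes the usual paired t statistic (mean difference divided by its standard error), which the cards do not spell out; the function name and the blood pressure values are made up.

```python
import math
import statistics

def paired_summary(before, after):
    """Within-pair differences, their mean, and the paired t statistic."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # SE of the mean difference
    return diffs, mean_diff, mean_diff / se

before = [140, 150, 138, 160, 155]  # made-up values
after = [135, 142, 136, 150, 149]   # made-up values
diffs, mean_diff, t_stat = paired_summary(before, after)
print(diffs)
print(f"mean difference = {mean_diff:.1f}, t = {t_stat:.2f}")
```

Only the differences enter the calculation, so variation between people drops out and each person acts as their own control.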
44
parametric methods
e.g. t-test, analysis of variance (ANOVA) | make distributional assumptions, e.g. Normal | summarise data using means and SDs
45
parametric method for 3 or more independent groups
ANOVA
46
what does ANOVA stand for
analysis of variance
47
parametric methods for 3 or more dependent groups
paired test | repeated measures ANOVA
48
when do you use a non-parametric test
if variables are skewed | if the sample size is small | if the SD is different across groups | if the variables are more ordinal than quantitative
49
when using non-parametric tests you should...
analyse the rank ordering in the data (not actual scores) | only provide p-values (not CIs) | compare entire distributions rather than just means
50
how do you summarise non-parametric data
IQR | median
51
non-parametric test for 2 independent groups
Mann Whitney
52
non-parametric test for 2 paired groups
Wilcoxon signed-rank
53
non-parametric for 3 or more independent groups
Kruskal Wallis
54
non-parametric for 3 or more paired groups
Friedman
55
advantages of non-parametric tests
they are always valid for quantitative data | parametric only valid if assumptions are satisfied
56
disadvantages of non-parametric tests
no CIs | based only on analysis of ranks | no direct inferences about a parameter
57
what defines a large sample size
sample greater than 50
58
how do you calculate variance
SD squared
59
how do you calculate whether the variances are 'equal'
variance in one group should be no more than 4x the variance of the other group
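Cards 58 and 59 combine into a quick check. The sketch below squares each SD to get the variance and applies the 4x rule of thumb from the card; the function name and the `max_ratio` parameter are made up for illustration.

```python
def variances_roughly_equal(sd_a, sd_b, max_ratio=4):
    """Rule-of-thumb check: larger variance is at most max_ratio times the smaller."""
    var_a = sd_a ** 2  # variance = SD squared
    var_b = sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b) <= max_ratio

print(variances_roughly_equal(2, 3))  # variances 4 and 9, ratio 2.25
print(variances_roughly_equal(2, 5))  # variances 4 and 25, ratio 6.25
```

Because variance is the SD squared, a 4x limit on variances corresponds to one SD being no more than twice the other.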
60
how can you compare CIs between groups
calculate a single CI for the difference between groups
61
effect of proportion on odds
the higher the proportion the higher the odds
62
how do you calculate proportion
no. of participants in category of interest / total no. of participants
63
relationship between exposure variable and outcome variable
the exposure variable is the potential cause of the outcome variable
64
tests for binary hypothesis testing
chi-squared (large samples) | Fisher's exact (small samples)
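As a rough sketch of what the chi-squared test in card 64 calculates, the function below computes the Pearson chi-squared statistic for a 2x2 table by comparing observed counts with the counts expected if the two groups did not differ. It applies no continuity correction and does not compute a p-value; the function name and counts are invented.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if group and outcome were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Made-up counts: rows are groups, columns are outcome yes / no
stat = chi_squared_2x2(20, 80, 10, 90)
print(f"chi-squared = {stat:.2f}")
```

For a 2x2 table the statistic has 1 degree of freedom, so values above about 3.84 correspond to p < 0.05.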
65
risk difference of 0
no risk difference | groups equally likely to have the disease
66
how do you calculate risk difference
proportion in group A - proportion in group B
67
how do you calculate risk ratio
proportion in group A / proportion in group B
68
what do risk ratio and odds ratio quantify
the strength of association between the intervention and binary variable
69
risk ratio = 1
no difference in risk between two groups
70
NNT stands for
number needed to treat
71
how do you calculate NNT
1 / risk difference
72
what is NNT
the number of people that need to receive intervention before 1 person benefits from it
73
what is NNT better for
quantifying the impact of an intervention in a given population
74
what does NNT do
measures the effectiveness of an intervention | based on risk difference
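Cards 65 to 74 can be sketched together from the counts in two groups. The use of the absolute value of the risk difference for NNT is an assumption added here so that the result is positive whichever group is listed first; the function name and counts are made up.

```python
def risk_measures(events_a, total_a, events_b, total_b):
    """Risk difference, risk ratio and NNT for group A versus group B."""
    risk_a = events_a / total_a  # proportion with the outcome in group A
    risk_b = events_b / total_b  # proportion with the outcome in group B
    risk_difference = risk_a - risk_b
    risk_ratio = risk_a / risk_b
    # NNT = 1 / risk difference; undefined (infinite) when there is no difference
    nnt = 1 / abs(risk_difference) if risk_difference != 0 else float("inf")
    return risk_difference, risk_ratio, nnt

# Made-up trial: 30 of 100 controls and 20 of 100 treated have the outcome
rd, rr, nnt = risk_measures(30, 100, 20, 100)
print(f"risk difference = {rd:.2f}, risk ratio = {rr:.2f}, NNT = {nnt:.1f}")
```

In practice NNT is usually rounded up to a whole number of people.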
75
what is correlation
the association between two variables
76
graphical description of correlation
scatter plot | outcome = y-axis | predictor = x-axis
77
numerical description of correlation
correlation coefficient | Pearson's = linear | Spearman's = monotonic (may be non-linear)
78
assumptions for Spearman's correlation coefficient
relationship may be non-linear, e.g. a curved line | must be 'monotonic': the slope is either never -ve or never +ve, e.g. the graph cannot be U-shaped
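The monotonic requirement can be illustrated with a sketch that computes Spearman's coefficient as Pearson's coefficient applied to the ranks of the data, which is the standard definition. All function names and data here are made up for the example.

```python
import math

def ranks(values):
    """Ranks from 1 to n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's r calculated on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
curved = [x ** 3 for x in xs]          # curved but always increasing
u_shaped = [(x - 3) ** 2 for x in xs]  # falls then rises, not monotonic
print(pearson_r(xs, curved), spearman_rho(xs, curved))
print(spearman_rho(xs, u_shaped))
```

The curved but monotonic data give a Spearman coefficient of 1 even though Pearson's r is below 1, while the U-shaped data give a coefficient of about 0 despite an obvious relationship.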
79
if r squared = 1
then all the variation in one variable is explained by the other variable
80
what is the predictor
the independent variable | the explanatory variable - potential cause of the outcome variable
81
what is the least squares regression line
line that makes the sum of the squared vertical distances from the data points to the regression line as small as possible
82
what is a residual (e)
the vertical distance between the observed data point and the regression line (predicted value)
83
equation for calculating errors in prediction
outcome = a + b(predictor) + e
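A short sketch of the residual e in this equation: the observed outcome minus the value predicted by the line. The values of a and b below are hypothetical and chosen for easy arithmetic, not fitted to the data; the function name and data are also made up.

```python
def residuals(xs, ys, a, b):
    """Residual e for each point: observed outcome minus predicted outcome."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.5, 8.5]  # made-up observed outcomes
a, b = 1.0, 2.0            # hypothetical intercept and slope
es = residuals(xs, ys, a, b)
print(es)

# Adding each residual back to the prediction recovers the observed outcome
print([a + b * x + e for x, e in zip(xs, es)])
```

A positive residual means the point lies above the line and a negative residual means it lies below.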
84
are most biological variables continuous?
yes | e.g. blood pressure
85
why is it impossible to choose a cut-off line to correctly classify all subjects to a disease status
most distributions of diagnostic test scores will overlap
86
what are the probability-based estimates of accuracy
specificity sensitivity PPV NPV
87
what factors affect sensitivity of a test
the severity of the disease
88
assumption for sensitivity test
populations have similar disease severity
89
what factors affect the specificity of a test
if symptoms also appear in patients without the disease, specificity is reduced
90
what does PPV quantify
the likelihood that somebody who tests positive actually has the disease
91
how does the prevalence of a disease affect the PPV
if a disease has a greater prevalence (is more common) then the PPV will increase
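The effect of prevalence on PPV can be sketched by holding sensitivity and specificity fixed and varying prevalence. The formula below follows from Bayes' theorem and is not given on the cards; the function name and the test characteristics (90% sensitivity, 95% specificity) are invented for the example.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value for a given disease prevalence."""
    true_positives = sensitivity * prevalence               # diseased and test +ve
    false_positives = (1 - specificity) * (1 - prevalence)  # healthy but test +ve
    return true_positives / (true_positives + false_positives)

for prevalence in (0.01, 0.10, 0.30):
    print(f"prevalence {prevalence:.2f}: PPV = {ppv(0.90, 0.95, prevalence):.3f}")
```

With a rare disease most positive results come from the large healthy group, so PPV is low even for a test with good sensitivity and specificity.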