stats Flashcards

1
Q

median

A

50th percentile
a measure of the average (central tendency)
50% of data above median, 50% below

2
Q

when is data symmetrical with respect to the median

A

when median is equidistant from upper and lower quartile boundaries

3
Q

when is negative skew seen with respect to the median

A

when median is closer to upper quartile
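
As a rough illustration of the symmetry and skew cards above, the two quartile distances can be compared directly. This is only a sketch: the function name `quartile_skew` and the sample values are made up, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which is just one of several quartile conventions.

```python
from statistics import quantiles


def quartile_skew(data):
    """Classify skew by comparing the median's distance to each quartile."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    lower, upper = med - q1, q3 - med
    if upper < lower:
        return "negative skew"  # median sits closer to the upper quartile
    if upper > lower:
        return "positive skew"  # median sits closer to the lower quartile
    return "symmetric"          # median is equidistant from both quartiles


print(quartile_skew([1, 5, 7, 8, 9, 9, 10]))  # negative skew
```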

4
Q

how do you check symmetry of variables

A

box and whisker plot

histogram

5
Q

difference of 99% CI compared to 95% CI

A

99% CI would be a wider range than 95% CI and extend it at both extremes
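
To see the widening in numbers, here is a minimal sketch that builds both intervals for the same sample. It assumes a normal approximation (a t-distribution would usually be preferred for a sample this small), and the helper `mean_ci` and the measurements are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for the mean of `data`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    se = stdev(data) / sqrt(len(data))         # standard error of the mean
    centre = mean(data)
    return centre - z * se, centre + z * se


sample = [4.1, 5.0, 5.3, 4.8, 5.6, 4.9, 5.2, 5.1]  # made-up measurements
ci95 = mean_ci(sample, 0.95)
ci99 = mean_ci(sample, 0.99)
print(ci95, ci99)  # the 99% interval extends beyond the 95% one at both ends
```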

6
Q

if p>0.05

A

no evidence against the null hypothesis (at the 5% level)

there may truly be no difference in the mean of the variables
the sample may be too small to detect a difference

7
Q

smaller standard error means

A

the estimate of the mean is more precise

8
Q

2-tailed test

A

difference in sample means in either direction provides evidence against null hypothesis

9
Q

when is the Mann Whitney test used

A

if the outcome is ordinal, or quantitative but skewed (not normally distributed)

to compare 2 independent groups when the assumptions of a parametric test do not hold

10
Q

a parametric test makes strong assumptions on..

A

distribution of data

11
Q

what does the Wilcoxon signed-rank test compare

A

distribution between first and second measurement

assesses whether population mean ranks differ

12
Q

when is the Wilcoxon signed-rank test used

A

matched/paired data

when assumptions of paired t-test do not fit

13
Q

what does standard error indicate

A

indicates how far the study estimate would be from the true value in the population if you were to repeat the study multiple times with different samples

14
Q

p-value if the 95% CI excludes the null hypothesis value

A

p<0.05

there is some evidence against the null hypothesis

15
Q

define odds

A

how likely a binary characteristic is to occur in a single group
no. with the characteristic / no. without it = p / (1 - p)

16
Q

odds ratio

A

measure of association between exposure and outcome

odds of one group compared to another
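
A small worked example may help here. The sketch below computes the odds in each group from counts and then the odds ratio; the function names and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def odds(with_outcome, without_outcome):
    """Odds of the outcome within one group: number with / number without."""
    return with_outcome / without_outcome


def odds_ratio(exp_with, exp_without, unexp_with, unexp_without):
    """Odds in the exposed group divided by odds in the unexposed group."""
    return odds(exp_with, exp_without) / odds(unexp_with, unexp_without)


# Made-up counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
print(odds(20, 80))                # 0.25
print(odds_ratio(20, 80, 10, 90))  # about 2.25: higher odds in the exposed group
```

An odds ratio of 1 would mean the odds are the same in both groups.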

17
Q

reference category

A

odds ratio for the ref category = 1

used as the baseline against which the odds of the other categories are compared

18
Q

Pearson’s correlation coefficient

A

r

quantifies the strength of linear association between two variables

19
Q

assumptions for Pearson’s correlation

A

linear relationship between variables

20
Q

what does r squared (Pearson’s) refer to

A

the proportion of variation in one variable explained by the other variable
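
For reference, r and r squared can be computed by hand from the usual sums of squares. This is a sketch with made-up data; `pearson_r` is an illustrative name, not taken from the cards.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # y is exactly 2 * x, so the relationship is perfectly linear
r = pearson_r(x, y)
print(r, r ** 2)       # 1.0 1.0
```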

21
Q

what does linear regression describe

A

the relationship between two quantitative variables

one variable is independent and affects the other, dependent variable

22
Q

equation for linear regression

A

outcome = a + b(predictor)

23
Q

how do you calculate diagnostic accuracy

A

PPV

NPV

24
Q

how do you calculate sensitivity

A

no. who correctly tested +ve for the disease / total no. who have the disease

25
Q

how do you calculate specificity

A

no. of people who correctly test -ve / total no. of people without the disease

26
Q

how do you calculate PPV

A

no. of people who correctly test +ve / total no. of people who test +ve
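
The sensitivity, specificity and PPV formulas above (plus NPV, which follows the same pattern for negative results) can be gathered into one sketch working from a 2x2 table. The function name and the counts are made up for illustration.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # correct +ve / all who have the disease
        "specificity": tn / (tn + fp),  # correct -ve / all without the disease
        "ppv": tp / (tp + fp),          # correct +ve / all who test +ve
        "npv": tn / (tn + fn),          # correct -ve / all who test -ve
    }


# Hypothetical study: 100 people with the disease, 100 without.
result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80)
print(result)
```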

27
Q

use of normal distribution

A

determines choice of statistical methods

28
Q

mean and sd define

A

normal distribution

29
Q

define population

A

full set of units (people) to which the study results will be generalised
usually infinite in size

30
Q

why might there be uncertainty in the answer provided by the sample data

A

variability between people

sample is only a subset of the population - not fully representative

31
Q

what are statistics for

A

summarising sample data

quantifying uncertainty in results

32
Q

2 types of statistics

A

inferential

descriptive

33
Q

descriptive statistics

A

describe basic features/characteristics in the sample

34
Q

inferential statistics

A

make inferences about relationships in the population using the sample
however can never be 100% certain

e.g. standard error, CI, p-values

35
Q

sampling distribution

A

all the different estimates from different samples and their frequencies

36
Q

effect of sample size on CI

A

the larger the sample size the narrower the CI
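
The reason is that the standard error shrinks as the sample grows. The cards do not state the formula, so as an assumption this sketch uses the usual one for a sample mean, SE = SD / sqrt(n), with an approximate 95% CI half-width of 1.96 x SE; the function names are illustrative.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a sample mean, assuming SE = SD / sqrt(n)."""
    return sd / sqrt(n)


def ci_half_width(sd, n, z=1.96):
    """Approximate half-width of a 95% CI for the mean (normal approximation)."""
    return z * standard_error(sd, n)


# Same SD, four times the sample size: the interval is half as wide.
print(ci_half_width(10, 100))  # 1.96
print(ci_half_width(10, 400))  # 0.98
```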

37
Q

effect of CI on certainty/uncertainty

A

the wider the CI, the greater the uncertainty

38
Q

what do p-values quantify

A

the extent to which the sample estimate contradicts the null hypothesis

39
Q

what does PICO stand for

A

population/patient
intervention
comparison
outcome

40
Q

what does the t in PICO(T) stand for

A

type of study design that would work best

41
Q

why is PICO used

A

to frame or answer a health related question

42
Q

when is data paired

A

if data are matched on criteria e.g. age/gender before comparing on either trial arm
if measurements are taken before and after an intervention

43
Q

what does paired data analyse

A

within-pair differences

44
Q

parametric methods

A

e.g. t-test, analysis of variance (ANOVA)

make distributional assumptions e.g. Normal
summarise data using means and sd

45
Q

parametric method for 3 or more independent groups

A

ANOVA

46
Q

what does ANOVA stand for

A

analysis of variance

47
Q

parametric methods for 3 or more dependent groups

A

paired test

repeated measures ANOVA

48
Q

when do you use a non-parametric test

A

if variables are skewed
small sample size
if sd is different across groups
if the variables are more ordinal than quantitative

49
Q

when using non-parametric tests you should…

A

analyse the rank ordering in the data (not actual scores)
only provide p-values (not CIs)
compare entire distribution rather than just means

50
Q

how do you summarise non-parametric data

A

IQR

median

51
Q

non-parametric test for 2 independent groups

A

Mann Whitney

52
Q

non-parametric test for 2 paired groups

A

Wilcoxon signed-rank

53
Q

non-parametric for 3 or more independent groups

A

Kruskal Wallis

54
Q

non-parametric for 3 or more paired groups

A

Friedman

55
Q

advantages of non-parametric tests

A

they are always valid for quantitative data

parametric only valid if assumptions are satisfied

56
Q

disadvantages of non-parametric tests

A

no CIs
based only on analysis of ranks
no direct inferences about a parameter

57
Q

what defines a large sample size

A

sample greater than 50

58
Q

how do you calculate variance

A

SD squared

59
Q

how do you calculate whether the variances are ‘equal’

A

variance in one group should be no more than 4x the variance of the other group
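
The 4x rule of thumb on this card can be checked in a couple of lines. This is a sketch only: `variances_roughly_equal` is a made-up helper and the groups are invented data; the sample variance (SD squared) comes from Python's `statistics.variance`.

```python
from statistics import variance


def variances_roughly_equal(group_a, group_b, max_ratio=4.0):
    """True if the larger sample variance is at most `max_ratio` times the smaller."""
    var_a, var_b = variance(group_a), variance(group_b)
    return max(var_a, var_b) <= max_ratio * min(var_a, var_b)


group_a = [1, 2, 3, 4, 5]     # variance 2.5
group_b = [3, 6, 9, 12, 15]   # variance 22.5, nine times larger
print(variances_roughly_equal(group_a, group_b))  # False
```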

60
Q

how can you compare CIs between groups

A

calculate a single CI for the difference between groups

61
Q

effect of proportion on odds

A

the higher the proportion the higher the odds

62
Q

how do you calculate proportion

A

no. of participants in category of interest / total no. of participants
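
To connect the proportion to the odds, here is a minimal sketch. It assumes the standard conversion odds = p / (1 - p), which the cards do not spell out; the function names and numbers are illustrative.

```python
def proportion(in_category, total):
    """Proportion of participants in the category of interest."""
    return in_category / total


def odds_from_proportion(p):
    """Convert a proportion p (less than 1) into odds, p / (1 - p)."""
    return p / (1 - p)


p = proportion(25, 100)
print(p)                          # 0.25
print(odds_from_proportion(0.5))  # 1.0, i.e. "evens"
# A higher proportion always gives higher odds:
print(odds_from_proportion(0.8) > odds_from_proportion(0.5))  # True
```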

63
Q

relationship between exposure variable and outcome variable

A

the exposure variable is the potential cause of the outcome variable

64
Q

tests for binary hypothesis testing

A

chi-squared (large samples)

Fisher’s exact (small samples)

65
Q

risk difference of 0

A

no risk difference

groups equally likely to have the disease

66
Q

how do you calculate risk difference

A

proportion in group A - proportion in group B

67
Q

how do you calculate risk ratio

A

proportion in group A / proportion in group B

68
Q

what do risk ratio and odds ratio quantify

A

the strength of association between the intervention and binary variable

69
Q

risk ratio = 1

A

no difference in risk between two groups

70
Q

NNT stands for

A

number needed to treat

71
Q

how do you calculate NNT

A

1 / risk difference
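
The risk difference, risk ratio and NNT formulas can be worked through together. In this sketch the function name and the counts are hypothetical, and taking the absolute value of the risk difference before inverting is an assumption made so that the NNT is positive whichever group is labelled A.

```python
def risk_summary(events_a, n_a, events_b, n_b):
    """Return (risk difference, risk ratio, NNT) for groups A and B."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    risk_difference = risk_a - risk_b
    risk_ratio = risk_a / risk_b
    # NNT = 1 / risk difference; undefined (infinite) when the risks are equal.
    nnt = 1 / abs(risk_difference) if risk_difference != 0 else float("inf")
    return risk_difference, risk_ratio, nnt


# Made-up trial: 30 of 100 events in group A, 20 of 100 in group B.
rd, rr, nnt = risk_summary(30, 100, 20, 100)
print(round(rd, 3), round(rr, 3), round(nnt, 1))  # 0.1 1.5 10.0
```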

72
Q

what is NNT

A

the number of people that need to receive intervention before 1 person benefits from it

73
Q

what is NNT better for

A

quantifying the impact of an intervention in a given population

74
Q

what does NNT do

A

measures the effectiveness of an intervention

based on risk difference

75
Q

what is correlation

A

the association between two variables

76
Q

graphical description of correlation

A

scatter plot
outcome = y-axis
predictor = x-axis

77
Q

numerical description of correlation

A

correlation coefficient
Pearson’s = linear
Spearman’s = monotonic (can be non-linear)

78
Q

assumptions for Spearman’s correlation coefficient

A

relationship can be non-linear
e.g. curved line
must be ‘monotonic’ - the slope is either never -ve or never +ve
e.g. graph cannot be U-shaped

79
Q

if r squared = 1

A

then all the variation in one variable is explained by the other variable

80
Q

what is the predictor

A

the independent variable

the explanatory variable - potential cause of the outcome variable

81
Q

what is the least squares regression line

A

line that makes the vertical distance from the data points to the regression line as small as possible

82
Q

what is a residual (e)

A

the vertical distance between the observed data point and the regression line (predicted value)

83
Q

equation for calculating errors in prediction

A

outcome = a + b(predictor) + e
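
The least squares line and its residuals can be sketched as follows. The closed-form estimates used for a and b are the standard ones for a single predictor rather than anything quoted on the cards, and the function name and data are made up.

```python
def least_squares(predictor, outcome):
    """Fit outcome = a + b * predictor and return (a, b, residuals)."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    # Residual e = observed outcome minus the value predicted by the line.
    residuals = [y - (a + b * x) for x, y in zip(predictor, outcome)]
    return a, b, residuals


a, b, e = least_squares([1, 2, 3, 4], [2, 4, 5, 8])
print(round(a, 3), round(b, 3))   # 0.0 1.9
print([round(r, 3) for r in e])   # [0.1, 0.2, -0.7, 0.4]
```

The residuals of a least squares fit with an intercept sum to zero, which is a quick sanity check on the arithmetic.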

84
Q

are most biological variables continuous?

A

yes

e.g. blood pressure

85
Q

why is it impossible to choose a cut-off line to correctly classify all subjects to a disease status

A

most distributions of diagnostic test scores will overlap

86
Q

what are the probability-based estimates of accuracy

A

specificity
sensitivity
PPV
NPV

87
Q

what factors affect sensitivity of a test

A

the severity of the disease

88
Q

assumption for sensitivity of a test

A

populations have similar disease severity

89
Q

what factors affect specificity of tests

A

specificity is reduced if people without the disease show the same symptoms

90
Q

what does PPV quantify

A

the likelihood that somebody has the disease given a positive test result

91
Q

how does the prevalence of a disease affect the PPV

A

if a disease has a greater prevalence (is more common) then the PPV will increase
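
A short calculation makes the effect of prevalence visible. This sketch derives the PPV from sensitivity, specificity and prevalence using Bayes' theorem, which is an addition to the cards; the function name and the 90%/90% test are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same test (90% sensitive, 90% specific) in a rare and a common disease.
print(round(ppv(0.9, 0.9, 0.01), 3))  # 0.083
print(round(ppv(0.9, 0.9, 0.20), 3))  # 0.692
```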