8. Medical Statistics 2 Flashcards
what is BIVARIATE ANALYSIS
ANALYSIS of the RELATIONSHIP between 2 Variables:
RESPONSE variable and EXPLANATORY variable
statistical methods analyse how the outcome on the RESPONSE VARIABLE DEPENDS ON or is EXPLAINED BY the value of the EXPLANATORY VARIABLE
- needed to formally investigate whether observed differences are meaningful or merely the result of random chance
what is the RESPONSE VARIABLE
OUTCOME or DEPENDENT
the one on which COMPARISONS ARE MADE
what is the EXPLANATORY VARIABLE
INDEPENDENT or EXPOSURE
usually DEFINES the 2 GROUPS being COMPARED
STEPS in HYPOTHESIS TESTING
- STATE the NULL and ALTERNATIVE HYPOTHESES
- DECIDE what STATISTICAL TEST is appropriate
- Use the test to CALCULATE the P-VALUE
- WEIGH the EVIDENCE AGAINST the NULL
what type of TEST is used for 2 NUMERICAL VARIABLES
CORRELATION / REGRESSION
- Correlation (two-way association: neither variable is designated as the response)
- Simple Linear Regression (one-way association: y is predicted from x)
what type of TEST is used for 2 CATEGORICAL VARIABLES
CHI-SQUARED TESTS
- chi-squared test (unpaired)
- McNemar test (paired)
what type of TEST is used for 1 CATEGORICAL and 1 NUMERICAL VARIABLE
if 2 GROUPS: T-TEST (paired and unpaired)
if >2 GROUPS: ANOVA (unpaired) & ANOVA for repeated measures (paired)
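The three cards above amount to a lookup from variable types to a test. A minimal sketch of that lookup, assuming hypothetical labels "numerical" and "categorical" and a hypothetical function name `choose_test` (none of these names come from the cards):

```python
def choose_test(type_a, type_b, paired=False, n_groups=2):
    """Return the test named on the cards for a pair of variable types.

    type_a / type_b are "numerical" or "categorical" (hypothetical labels).
    n_groups only matters when one variable is categorical and one numerical.
    """
    types = sorted([type_a, type_b])
    if types == ["numerical", "numerical"]:
        return "correlation / simple linear regression"
    if types == ["categorical", "categorical"]:
        return "McNemar test" if paired else "chi-squared test"
    if types == ["categorical", "numerical"]:
        if n_groups == 2:
            return "paired t-test" if paired else "unpaired t-test"
        if n_groups > 2:
            return "repeated measures ANOVA" if paired else "ANOVA"
    raise ValueError("unsupported combination of variable types")


# Smoking status (2 groups of different people) vs blood pressure
example = choose_test("categorical", "numerical", paired=False, n_groups=2)
```

This only encodes the rule of thumb on the cards; it does not check test assumptions such as normality.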
what are DEPENDENT SAMPLES
when the research hypothesis involves comparing the SAME PEOPLE, who were measured two or more times
(PAIRED DATA)
eg. a diet study in which subjects’ weights are measured before and after the diet.
the observations in the first (before diet) and second (after diet) samples are related because they refer to the same person
do DEPENDENT SAMPLES require the PAIRED or UNPAIRED version of the statistical tests
PAIRED VERSION
what are INDEPENDENT SAMPLES, and is the PAIRED or UNPAIRED version of the respective statistical test used
involve comparison of 2 or more groups who are INDEPENDENT from each other
- DIFFERENT INDIVIDUALS
eg. randomised trial that randomly allocates subjects to 2 treatments
eg. observational study that separates subjects into groups according to their value for an explanatory variable (ie smoking status)
the UNPAIRED VERSION of the statistical test is used
CORRELATION / REGRESSION
in SCATTERPLOT what is on the HORIZONTAL and VERTICAL AXIS
Horizontal, x : PREDICTOR VARIABLE
Vertical, y : RESPONSE VARIABLE
CORRELATION / REGRESSION
what to look for in SCATTERPLOTS?
DIRECTION of the relationship
- NEGATIVE: as one goes up other goes down
- POSITIVE: as one goes up other also goes up
FORM of the relationship
- LINEAR? or not
STRENGTH of the relationship
- points appear tightly clustered in a single stream or form a vague cloud?
OUTLIERS
CORRELATION / REGRESSION
what measures the STRENGTH of the LINEAR ASSOCIATION between 2 NUMERICAL VARIABLES
CORRELATION COEFFICIENT (r)
r is always between -1 and +1
r>0 POSITIVE correlation
r<0 NEGATIVE correlation
r=0 NO LINEAR correlation (a non-linear relationship may still exist)
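A minimal sketch of how r can be computed by hand, assuming the usual Pearson formula (the cards do not state it) and made-up data. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # points on a rising line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # points on a falling line

# y = x^2 on a symmetric range: y clearly depends on x, but not linearly
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

Here r_pos is +1, r_neg is -1, and r_curve is 0 even though y is fully determined by x, which is why r only describes LINEAR association.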
CORRELATION / REGRESSION
do OUTLIERS AFFECT CORRELATION
Correlation is VERY SENSITIVE to OUTLIERS
an extreme outlier can cause a dramatic change in r
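A small illustration of that sensitivity, using made-up data and a hypothetical helper `pearson_r` (standard Pearson formula, not stated on the cards): one extra point is enough to pull r from 1 down to roughly 0.14.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)

# Same data plus a single extreme point at (6, 0)
r_outlier = pearson_r(x + [6], y + [0])
```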
CORRELATION / REGRESSION
what is COEFFICIENT of DETERMINATION (r^2)
expresses the PROPORTION of the VARIANCE in one variable that is ACCOUNTED FOR or ‘EXPLAINED’ by the variance in the other variable
- square of r
eg. a study finds an r=0.40 between salt intake and blood pressure.
It can be concluded that 0.40^2 = 0.16
or 16% of the variance in blood pressure in this study is accounted for by salt intake
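A sketch of why r^2 can be read as "variance explained", assuming a least-squares line and made-up data (all names are hypothetical). It checks that r^2 equals 1 minus the ratio of leftover (residual) variation to total variation in y.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)   # total variation in y

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Least-squares line y = b0 + b1*x and the variation it leaves unexplained
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
ss_residual = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
explained_fraction = 1 - ss_residual / syy
```

For this data r is 0.8, so about 64% of the variation in y is accounted for by x, and the two calculations agree.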
CORRELATION / REGRESSION
what is SIMPLE LINEAR REGRESSION
we model the relationship between 2 QUANTITATIVE variables in such a way that we can PREDICT ONE VARIABLE FROM ANOTHER
it is an APPROXIMATION for the TRUE RELATIONSHIP
the 1ST STEP is to IDENTIFY the RESPONSE and PREDICTOR VARIABLE
RESPONSE VARIABLE (outcome or dependent)
- y variable, on vertical axis of scatterplot
PREDICTOR VARIABLE (explanatory or independent or exposure)
- x variable, on horizontal axis of scatterplot
CORRELATION / REGRESSION
how do REGRESSION and CORRELATION DIFFER
- in REGRESSION, we MUST IDENTIFY RESPONSE and EXPLANATORY VARIABLES
- CORRELATION does NOT REQUIRE one variable to be DESIGNATED as RESPONSE and the other as PREDICTOR
CORRELATION / REGRESSION
LINE OF BEST FIT equation with REGRESSION COEFFICIENTS B0 and B1
y = b0 + b1x
(y=mx+c)
b0 = INTERCEPT (where line cuts the y axis. value of y when x=0)
b1 = SLOPE (gradient, the change in y for every 1 unit increase in x)
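A minimal sketch of how b0 and b1 might be estimated, assuming the ordinary least-squares formulas (the cards do not say which fitting method is used) and made-up data. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                # slope: change in y per 1 unit of x
    b0 = mean_y - b1 * mean_x     # intercept: value of y when x = 0
    return b0, b1


# Points that lie exactly on y = 1 + 2x
b0, b1 = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```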
CORRELATION / REGRESSION
what does the SLOPE (how much y changes when x increases by 1 unit) DEPEND ON
the UNITS used to measure the variables
- we can make the slope as large/small as we want by changing the units
CORRELATION / REGRESSION
what does the SLOPE tell us about ASSOCIATION STRENGTH
it DOES NOT TELL US whether the association is strong or weak
CORRELATION / REGRESSION
does CORRELATION depend on UNITS
NO
the correlation is a standardized version of the slope
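A sketch of the unit effect, with made-up heights and weights (names and values are hypothetical): switching height from centimetres to metres multiplies the slope by 100 but leaves r unchanged.

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r) for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 58, 63, 72, 80]
height_m = [h / 100 for h in height_cm]   # same heights, different unit

slope_cm, r_cm = slope_and_r(height_cm, weight_kg)   # kg per cm
slope_m, r_m = slope_and_r(height_m, weight_kg)      # kg per m
```

The slope goes from about 0.74 kg per cm to about 74 kg per m, so its size alone says nothing about how strong the association is.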
CORRELATION / REGRESSION
when using a REGRESSION model for PREDICTION where can you predict
ONLY WITHIN relevant range of data
- do not try to extrapolate beyond the range of observed X’s
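One way to enforce this in code is to refuse predictions outside the observed x range. A minimal sketch, where the function name and the coefficient values are hypothetical:

```python
def predict_within_range(x_new, observed_xs, b0, b1):
    """Predict y = b0 + b1*x_new, but only inside the observed x range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x={x_new} is outside the observed range [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new


observed_xs = [0, 1, 2, 3, 4]
inside = predict_within_range(2.5, observed_xs, b0=1.0, b1=2.0)

try:
    predict_within_range(10, observed_xs, b0=1.0, b1=2.0)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```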
CORRELATION / REGRESSION
what does a STRONG CORRELATION between x and y mean
that there is a STRONG LINEAR ASSOCIATION between the 2 variables
- do not say that increasing x by one unit ‘causes’ or ‘results in’ a corresponding change in y
what does CHI-SQUARED TEST (X^2) ASSESS
WHETHER 2 CATEGORICAL VARIABLES are ASSOCIATED
- indicates HOW CERTAIN we can be that the VARIABLES are ASSOCIATED (NOT how strong the association is)
it compares FREQUENCIES - OBSERVED vs EXPECTED under the null hypothesis that the variables are independent
measures HOW FAR the OBSERVED cell counts in a contingency table FALL FROM the EXPECTED cell counts (under the null hypothesis)
(assumes expected counts more than or equal to 5 in all cells)
CHI SQUARED TEST (X^2) EQUATION
$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$
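A sketch of the calculation for a made-up 2x2 table. It assumes the usual way of getting expected counts under independence (row total x column total / grand total), which the cards do not spell out. The p-value helper is valid only for 1 degree of freedom (a 2x2 table). All names are hypothetical.

```python
import math


def chi_squared(table):
    """Return (X^2 statistic, smallest expected count) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            min_expected = min(min_expected, expected)
            statistic += (observed - expected) ** 2 / expected
    return statistic, min_expected


def p_value_df1(statistic):
    """Upper-tail p-value of X^2 with 1 degree of freedom (2x2 tables only)."""
    return math.erfc(math.sqrt(statistic / 2))


# Rows: exposed / unexposed; columns: outcome yes / no (made-up counts)
table = [[30, 10], [20, 40]]
statistic, min_expected = chi_squared(table)
p = p_value_df1(statistic)
```

Here every expected count is at least 5, so the rule of thumb above is met, and the statistic is about 16.7.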
what is the MCNEMAR CHI-SQUARED TEST
the equivalent of chi-squared test when we want to COMPARE BEFORE and AFTER findings
- PAIRED DATA
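A sketch of the McNemar statistic, assuming its standard form without continuity correction (the cards do not give the formula). Only the discordant pairs, the people whose result changed between before and after, enter the calculation. Names and counts are hypothetical.

```python
def mcnemar_statistic(changed_yes_to_no, changed_no_to_yes):
    """McNemar chi-squared statistic (1 degree of freedom, no correction)."""
    b, c = changed_yes_to_no, changed_no_to_yes
    return (b - c) ** 2 / (b + c)


# 15 people changed from yes to no, 5 changed from no to yes (made-up counts)
statistic = mcnemar_statistic(15, 5)
```

People whose answer did not change carry no information about a before/after difference, which is why they are left out.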
what is a T-TEST (UNPAIRED or INDEPENDENT samples)
ASSESSES WHETHER a NUMERICAL RESPONSE VARIABLE DIFFERS between 2 GROUPS (of different individuals)
COMPARES the MEANS of the 2 groups, using their STANDARD DEVIATIONS and sample sizes to judge whether the difference is larger than expected by chance
- LIMITED to comparing ONLY 2 GROUPS
T-Test is a PARAMETRIC test and ASSUMES the NUMERICAL VARIABLE is NORMALLY DISTRIBUTED and has EQUAL VARIANCE in both groups
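A minimal sketch of the unpaired t statistic, assuming the pooled-variance (equal variance) form that matches the assumption above. Data and names are made up. Turning t into a p-value needs the t distribution (tables or a statistics package), which is not done here.

```python
import math
import statistics


def unpaired_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) using a pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2


treatment = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]
t, df = unpaired_t(treatment, control)
```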
when is a PAIRED T-TEST used
where the SAME GROUP contributes to REPEATED OBSERVATIONS
(paired data)
- SAME INDIVIDUALS
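A sketch of the paired version, assuming the standard approach of analysing each person's difference. The before/after weights are made up and the function name is hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    se_diff = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Weight in kg for the same 5 people before and after a diet
before = [80, 85, 90, 78, 92]
after = [78, 82, 88, 77, 88]
t, df = paired_t(before, after)
```

Because each person acts as their own control, differences between people drop out of the calculation.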
what is an ANOVA (ANALYSIS OF VARIANCE) TEST
when testing for SIGNIFICANT DIFFERENCE of a CONTINUOUS VARIABLE between 3 OR MORE GROUPS
(t-test limited to only 2 groups)
- only informs us whether there is an OVERALL DIFFERENCE between the MEANS
- Does NOT give us specific information about which groups are different
- PARAMETRIC TEST ASSUMING NORMAL DISTRIBUTION and EQUAL VARIANCES
- REPEATED MEASURES ANOVA test is performed when the SAME GROUP has contributed to REPEATED OBSERVATIONS (PAIRED DATA)