Week 4 - Measures of association Flashcards
What is in between exposure and outcome?
association
Measures of association
Measures of association with numeric outcome
What is mean difference?
- Mean difference refers to the comparison between two means (averages)
- Mean difference is a measure of association which assesses the presence of an association between a categorical exposure and a numeric outcome
In what situations is mean difference applicable?
- Comparing means between categories (groups) of a categorical variable (i.e. between independent groups of individuals): BETWEEN-SUBJECTS designs
- Comparing means in a single group of individuals in two or more different time points (e.g. before and after an intervention) or under different conditions: WITHIN-SUBJECTS or REPEATED MEASURES designs
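Mean difference formula?
Mean difference = mean of group 1 - mean of group 2 (e.g. mean outcome in the exposed - mean outcome in the unexposed)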
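A minimal sketch of a between-subjects mean difference, taken here as the mean of one group minus the mean of the other. The function name and the blood pressure values are hypothetical, chosen only for illustration.

```python
from statistics import mean

def mean_difference(group_1, group_2):
    """Between-subjects mean difference: mean of group 1 minus mean of group 2."""
    return mean(group_1) - mean(group_2)

# Hypothetical systolic blood pressure (mmHg) in exposed vs unexposed participants
exposed = [142, 138, 150, 145, 135]
unexposed = [128, 132, 125, 130, 135]

md = mean_difference(exposed, unexposed)
print(md)  # 12: exposed participants average 12 mmHg higher
```

The sign depends on which group is subtracted from which, so it is worth stating the order explicitly when reporting a mean difference.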
What statistical tests provide a mean difference for between-subjects designs?
(i) Independent samples t-test in case our exposure variable (IV) has 2 categories and
(ii) one-way Analysis of Variance (ANOVA) in case our exposure variable (IV) has >2 categories
ASSUME NORMALLY-DISTRIBUTED VARIABLES
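A sketch of the test statistics behind these two tests, written with the standard library only: Student's pooled-variance t (equal variances assumed) and the one-way ANOVA F ratio. Function names and data are hypothetical. The p-value would come from the t or F distribution, which a statistics package provides (e.g. `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` in SciPy); that step is not reproduced here.

```python
from statistics import mean, variance

def pooled_t(group_1, group_2):
    """Independent-samples (Student's) t statistic, assuming equal variances."""
    n1, n2 = len(group_1), len(group_2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    standard_error = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(group_1) - mean(group_2)) / standard_error

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical outcome values in three exposure groups
group_a = [142, 138, 150, 145, 135]
group_b = [128, 132, 125, 130, 135]
group_c = [120, 118, 125, 122, 115]

t = pooled_t(group_a, group_b)                  # exposure with 2 categories
f = one_way_anova_f(group_a, group_b, group_c)  # exposure with >2 categories
```

With only two groups the two tests agree: the ANOVA F equals the square of the pooled t.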
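Within-subjects mean difference formula?
Mean difference = mean of the differences calculated within each participant (e.g. value after - value before), which equals the mean after - the mean before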
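A minimal sketch of a within-subjects mean difference: compute each participant's own difference, then average those differences. The "after - before" order, the function name and the pain scores are assumptions made for illustration.

```python
from statistics import mean

def within_subjects_mean_difference(before, after):
    """Mean of the per-participant differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("each participant needs a value at both time points")
    differences = [a - b for b, a in zip(before, after)]
    return mean(differences)

# Hypothetical pain scores (0-10) for the same 5 participants before and after an intervention
before = [7, 6, 8, 5, 9]
after = [4, 5, 6, 4, 6]

md = within_subjects_mean_difference(before, after)
print(md)  # -2: pain fell by 2 points on average
```

Pairing matters for the test that follows, not for this number: the mean of the differences equals the difference of the means, but only the paired differences show how consistently each participant changed.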
What statistical tests are used for within-subjects mean difference?
(i) paired samples t-test in case our exposure variable (IV) has 2 categories and
(ii) repeated measures Analysis of Variance (repeated measures ANOVA) in case our exposure variable (IV) has >2 categories
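A sketch of the paired samples t statistic, assuming the usual definition: the mean of the within-participant differences divided by its standard error. Names and data are hypothetical, and repeated measures ANOVA (for more than 2 time points) is not shown.

```python
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t statistic: mean of the differences divided by its standard error."""
    differences = [a - b for b, a in zip(before, after)]
    standard_error = stdev(differences) / len(differences) ** 0.5
    return mean(differences) / standard_error

# Hypothetical pain scores for the same 5 participants at two time points
before = [7, 6, 8, 5, 9]
after = [4, 5, 6, 4, 6]

t = paired_t(before, after)
```

Because each participant acts as their own control, the standard error uses the spread of the differences rather than the spread of the raw scores.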
What is important regarding within-subjects mean difference?
Within-subjects/repeated measures designs compare measurements in the same participants/same sources of variability (hence the term repeated measures)
* Repeated measurements can be collected at different time points, where change over time is assessed: this has been the emphasis so far.
* However, other within-subjects studies may compare the same participants under two or more different conditions
* For instance: comparing pain intensity for the same participants receiving different (over time) pharmacotherapies for pain relief (e.g. cross-over trials, see later in the course)
* Thus, if the same participants are compared ➔ use a paired t-test or repeated measures ANOVA, as per previous slides
What statistical test is used when there are both independent groups (between-subjects IV) and repeated measures (within-subjects IV)?
mixed-design Analysis of Variance (mixed-design ANOVA)
What is the within-subjects factor?
Time-point
What is the between-subjects factor?
Group (two or more independent groups of participants)
How do we assess the association between two numeric variables?
WE DON’T USE A MEAN DIFFERENCE
* Instead, we use a mathematical model (an equation) to predict a change in the outcome (Y) for a standard change in the exposure (X) (regression coefficient)
* We also quantify the strength of the association (as captured by this model) (correlation coefficient)
What are the 3 steps that are followed to fully investigate the association between 2 numeric variables?
- Derive a scatter plot
- Perform a correlation analysis
- Perform a linear regression analysis
What is a scatterplot?
- The relationship between any two numeric variables can be portrayed graphically
- Each individual (i) has a value for the exposure/independent variable (X) and a value for the outcome/dependent variable (Y)
- Thus, for each participant, we have (Xi, Yi)
- When the entire sample of participants is plotted in a 2-dimensional plot, the result is called scatter plot
What can scatterplots show us?
- Scatterplots can provide an overall (graphical) impression of the association between the 2 numeric variables of interest
- Scatterplots can reveal a trend for a direct (positive) association or an inverse (negative) association
What is direct and inverse association?
– direct (positive) association: as the exposure (X) increases, the outcome (Y) also increases
– inverse (negative) association: as the exposure (X) increases, the outcome (Y) decreases
What is correlation?
Correlation is a term usually used interchangeably with ‘association’; however, ‘correlation’ more accurately refers to the association between numeric variables
What is the difference between positive and negative correlation?
- in situations where an increase in the exposure (X) is associated with an increase in the outcome (Y), we have a positive correlation
- in situations where an increase in the exposure (X) is associated with a decrease in the outcome (Y), we have a negative correlation
What is the correlation coefficient?
A measure of association that describes the strength of the correlation between 2 numeric variables is called the correlation coefficient (r)
* The correlation coefficient ranges between -1 and +1 and CANNOT take any value outside of this range
* The sign ( + or - ) indicates the direction of the association (i.e. positive or negative)
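A sketch of Pearson's r computed from its usual definition (sum of cross-products of deviations divided by the square root of the product of the sums of squares), using only the standard library. The data are hypothetical and chosen so the results are easy to check by hand.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient computed from the original values."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # 1.0: perfect positive correlation
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # -1.0: perfect negative correlation
r_zero = pearson_r(x, [4, 1, 0, 1, 4])   # 0.0: no linear correlation
```

The last case is worth noticing: Y clearly depends on X (a U shape), yet r = 0, because Pearson's r only captures linear association. This is one reason to look at the scatterplot first.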
What do the three key values of the correlation coefficient mean?
correlation coefficient (r) = 1
=> perfect positive correlation
correlation coefficient (r) = -1
=> perfect negative correlation
correlation coefficient (r) = 0
=> no correlation
Strength of correlation?
[Figure: example scatterplots for r = 0, r = 1 and r = -1, 0 < r < 1, and -1 < r < 0]
What are example cut-offs for the strength of a correlation?
|r| ≥ 0.7 => strong correlation
|r| = 0.5 to 0.7 => moderate correlation
|r| = 0.3 to 0.5 => weak correlation
|r| < 0.3 => very weak or no correlation
* Note: there is no hard rule about these cut-offs and the exact numbers to indicate the strength of the correlation can vary depending on the specific variables analyzed
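A sketch that applies the rule-of-thumb cut-offs above to the absolute value of r. Which label a value sitting exactly on a boundary (e.g. 0.5) receives is an assumption of this sketch, and, as the note says, other sources use different cut-offs.

```python
def correlation_strength(r):
    """Label the strength of r using the rule-of-thumb cut-offs (boundaries assumed)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign only gives the direction
    if size >= 0.7:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "weak"
    return "very weak or no"

print(correlation_strength(-0.82))  # strong (negative) correlation
```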
What are the 2 main types of correlation?
- Pearson’s correlation is the most commonly used of the two and it denotes the correlation between 2 variables using the original values of these variables
- Spearman’s correlation (also called Spearman’s Rank correlation), denotes the correlation between 2 variables by first ranking the values (i.e. from lower to higher) and then assessing the correlation between the ranks
- Use Pearson’s correlation when (1) the data are continuous, (2) the variables are normally distributed and (3) the relationship is linear
- Use Spearman’s when the above do not apply, e.g. when the data are ordinal, not normally distributed, or the relationship is monotonic but not linear
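A sketch of Spearman's correlation as described above: rank each variable (tied values share their average rank, a common convention assumed here), then compute Pearson's r on the ranks. The helper names and the data are hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank values from lowest (1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Monotonic but curved relationship: Y = X squared
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson_r(x, y)    # below 1, because the relationship is not a straight line
r_spearman = spearman_rho(x, y)  # 1.0, because the ranks agree perfectly
```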
What is linear regression?
- Linear regression is another statistical technique for assessing the association between 2 numeric variables
- Linear regression assesses the extent to which a change (increase or decrease) in one variable is associated with a change in another variable
- Linear regression goes ‘hand-in-hand’ with correlation, and in fact the 2 techniques complement each other in giving a complete picture of the association between 2 numeric variables
- Linear regression operates by fitting a line-of-best-fit in a scatterplot using the least-squares method
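A sketch of the least-squares line using the standard closed-form estimates: slope b = sum of cross-products of deviations / sum of squared deviations of X, and intercept a = mean of Y - b × mean of X. The function name and the exercise and heart rate data are hypothetical.

```python
from statistics import mean

def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line Y' = a + bX."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope = regression coefficient
    a = my - b * mx  # the fitted line passes through (mean of X, mean of Y)
    return a, b

# Hypothetical data: hours of exercise per week (X) vs resting heart rate in bpm (Y)
x = [0, 2, 4, 6, 8]
y = [80, 76, 74, 70, 65]
a, b = least_squares_line(x, y)  # a is about 80.2, b is about -1.8
```

Here b of about -1.8 would be read as: each additional weekly hour of exercise is associated with a resting heart rate about 1.8 bpm lower.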
What is the regression coefficient?
* The regression coefficient represents the estimated change (increase or decrease) in the Y variable for each 1 unit increase in the X variable
What is the formula for line of best fit?
Y' = a + bX
* Y' is the predicted value of Y (predicted by X)
* a is the intercept (the value of Y' when X = 0)
* The slope (beta or b) of the line is the regression coefficient
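A sketch of using the line of best fit for prediction. The intercept and slope below are hypothetical values, not estimates from any real study.

```python
def predict(a, b, x):
    """Predicted value Y' = a + bX from a fitted line."""
    return a + b * x

# Hypothetical fitted line: intercept a = 80.2, slope b = -1.8
a, b = 80.2, -1.8
y_at_5 = predict(a, b, 5)  # about 71.2
# A 1-unit increase in X changes the prediction by exactly b
step = predict(a, b, 6) - predict(a, b, 5)  # about -1.8
```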
What does the sign of the regression coefficient mean?
A positive regression coefficient means Y is predicted to increase as X increases; a negative one means Y is predicted to decrease as X increases.
What does the correlation coefficient tell us?
How strong the association is
What does the regression coefficient tell us?
How much a change in X predicts a change in Y
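A sketch contrasting the two coefficients on the same hypothetical data (weekly exercise vs resting heart rate, invented for illustration). Re-expressing X in minutes instead of hours leaves r unchanged but divides b by 60, because b is tied to the unit of X. The two are linked by b = r × (SD of Y / SD of X). The helper names are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Strength (and direction) of the linear association."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def slope(x, y):
    """Least-squares regression coefficient b: change in Y' per 1-unit change in X."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

x_hours = [0, 2, 4, 6, 8]
x_minutes = [60 * h for h in x_hours]
y = [80, 76, 74, 70, 65]

r_hours, r_minutes = pearson_r(x_hours, y), pearson_r(x_minutes, y)  # identical
b_hours, b_minutes = slope(x_hours, y), slope(x_minutes, y)          # -1.8 vs -0.03
b_from_r = r_hours * stdev(y) / stdev(x_hours)                       # equals b_hours
```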