Statistics Flashcards
Association between two continuous variables
- Pearson’s correlation (parametric)
- Spearman’s correlation (non-parametric)
What are correlation values?
Values between -1 and 1
1: all points lie on an uphill straight line
-1: all points lie on a downhill straight line
0: no linear association; either the variables are independent or the relationship is curved
Correlation is only valid within the range of the sampled data
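The three landmark values above can be checked on toy data. This is a minimal sketch: the helper `pearson_r` and the five-point datasets are made up for illustration and are not part of the original notes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
uphill = [2 * v + 1 for v in x]      # all points on an uphill line
downhill = [-3 * v + 4 for v in x]   # all points on a downhill line
curved = [v ** 2 for v in x]         # symmetric U-shaped curve

r_uphill = pearson_r(x, uphill)
r_downhill = pearson_r(x, downhill)
r_curved = pearson_r(x, curved)
print(r_uphill, r_downhill, r_curved)
```

The curved dataset is perfectly determined by x, yet its coefficient is 0, which is why a value of 0 cannot be read as "no relationship".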
What is Pearson’s correlation?
Also known as the product-moment correlation coefficient
Measures the degree of linear association between the values of the two variables
(DOESN’T DEAL WITH CURVED RELATIONSHIPS)
Positive correlation, r > 0: both variables increase together
Negative correlation, r < 0: one variable increases as the other decreases
r = ±1 means a graph of the two variables is a perfect straight line
r = 0 means there is no linear association
ASSUMPTIONS:
- at least one variable must have a normal distribution for the p-value to be valid
- both variables must have a normal distribution for the confidence interval to be valid
It is preferable for both variables to have a normal distribution
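One common way to get a confidence interval for r is Fisher's z-transformation, which relies on the normality assumption noted above. The notes do not say which method they have in mind, so treat this as an illustrative sketch; the function name `pearson_ci` and the example values (r = 0.6, n = 20 or 200) are assumptions.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson's r via Fisher's z-transform."""
    z = math.atanh(r)                  # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)          # standard error on the transformed scale
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Transform the interval limits back to the correlation scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Same observed r, two hypothetical sample sizes
low_small, high_small = pearson_ci(r=0.6, n=20)
low_large, high_large = pearson_ci(r=0.6, n=200)
print((low_small, high_small), (low_large, high_large))
```

The interval for n = 200 is much narrower than for n = 20, even though the observed r is the same.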
What is Spearman’s correlation?
It measures general (monotonic) association rather than only linear association
Non-parametric
Can be used on ordinal data
Outliers have a smaller effect
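Spearman's coefficient works on ranks rather than raw values, which is why it handles curved but monotonic relationships and is less affected by outliers. The sketch below uses the textbook shortcut formula, which is only valid when there are no tied values; the helper names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5, 6]
y_curved = [v ** 3 for v in x]        # curved but always increasing
y_outlier = [2, 4, 6, 8, 10, 1000]    # one extreme value, order unchanged
y_reversed = y_curved[::-1]           # always decreasing

rho_curved = spearman_rho(x, y_curved)
rho_outlier = spearman_rho(x, y_outlier)
rho_reversed = spearman_rho(x, y_reversed)
print(rho_curved, rho_outlier, rho_reversed)
```

Because only the ordering matters, the cubic curve and the dataset with the extreme value both give the same coefficient as a perfect straight line would.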
What are the cons of correlation?
Correlation does not imply causation
Correlation does not quantify how closely two measures agree
Multiple testing: calculating the correlation for all pairs of 10 variables gives 45 correlation coefficients, so some may appear significant by chance alone.
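The count of 45 is the number of ways to choose 2 variables from 10. The sketch below also estimates the chance of at least one false positive at a 5% significance level, under the simplifying assumption that the 45 tests are independent (they are not strictly independent, since pairs share variables).

```python
import math

n_variables = 10
n_pairs = math.comb(n_variables, 2)   # number of distinct pairs of variables

alpha = 0.05                          # assumed significance level per test
# Probability that at least one of the tests is significant by chance alone,
# assuming (as a simplification) that the tests are independent
p_any_false_positive = 1 - (1 - alpha) ** n_pairs

print(n_pairs, round(p_any_false_positive, 2))
```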
Sample size calculations
- understanding hypothesis testing
- understanding errors related to hypothesis testing
- power of a test
- sample size
Why is sample size important?
- data are observed from a single trial or experiment: how good an estimate is it?
- could the observed difference be due to chance alone, or is there really a true difference between groups?
- larger studies have greater power to detect differences and estimate population parameters with greater precision
- many clinical trials (and other studies) are far too small
Problems if the sample is too small or too large
- unethical
- time
- manpower
- risk
- cost
The aim is to have a sample size large enough to give a high probability (power) of detecting a clinically worthwhile effect if it exists
Factors to be considered for sample size
Sample size depends on the outcome type, the number of groups being compared, the sampling variability and the effect size.
It can be affected by the way in which the outcome is expressed:
- response rate
- PFS/TTP rate at 6 months
- OS rate after 5 years
- median PFS
- median OS
What we need:
Effect size
Variability
Chosen level of significance
Chosen power
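The four ingredients above combine in a standard normal-approximation formula for comparing the means of two equal-sized groups: n per group = 2 × ((z for significance + z for power) × SD / effect size)². This is one common formula, not necessarily the one the notes intend, and the function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-group comparison of means.

    Uses the normal approximation; effect_size and sd are in the same units.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # chosen level of significance
    z_beta = NormalDist().inv_cdf(power)            # chosen power
    n = 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2
    return math.ceil(n)                             # round up to whole participants

# Hypothetical example: detect a difference of 5 units when the SD is 10
n_power_80 = sample_size_per_group(effect_size=5, sd=10, power=0.80)
n_power_90 = sample_size_per_group(effect_size=5, sd=10, power=0.90)
n_small_effect = sample_size_per_group(effect_size=2.5, sd=10, power=0.80)
print(n_power_80, n_power_90, n_small_effect)
```

Asking for higher power, or trying to detect a smaller effect, both increase the required sample size.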
Why is the power of a test needed?
Power can be used to compare two tests with the same significance level (α) to see which is more powerful (better)
It can also be used to decide how large a sample size is needed
Generally, tests with larger sample sizes are more powerful
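The link between sample size and power can be seen by fixing the effect size, variability and significance level and varying only the number per group. This sketch uses a normal approximation for a two-sided, two-group comparison of means and ignores the negligible chance of rejecting in the wrong direction; all names and numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def power_two_groups(n_per_group, effect_size, sd, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference in means
    return NormalDist().cdf(effect_size / se - z_alpha)

# Hypothetical example: true difference of 5 units, SD of 10
sizes = [10, 20, 40, 63, 100]
powers = [power_two_groups(n, effect_size=5, sd=10) for n in sizes]
for n, p in zip(sizes, powers):
    print(n, round(p, 2))
```

Power rises steadily as the number per group increases, reaching roughly 80% at 63 per group in this example.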