Interpreting Data Flashcards
How do you calculate standard deviation?
Take the mean - Distance from the mean, squared
When would using the median result be more appropriate? [1]
Better to use median when have a skewed distribution, can avoid the influence of outliers
When would you use IQR vs STD? [1]
When wide outliers / skewed distribution: Better to use IQR in this case, to avoid the influence of outliers
What value is changing between the different colours in this Gaussain Distribution?
mean value
What value is changing between the different colours in this Gaussain Distribution?
Standard deviation
What is the standard deviation measuring? [1]
It is essentially calculating the average distance from the mean, and therefore the measure of spread of the results you have obtained. This is shown visually on the image here.
What are the references ranges are for 95%, 99% and 90% range for STD?
learn
What are the references ranges are for 95%, 99% and 90% range for STD?
learn
99% range (0.5th to 99.5th centile) = mean ± 2.58 SDs
95% range (2.5th to 97.5th centile) = mean ± 1.96 SDs
90% range (5th to 95th centile) = mean ± 1.64 SDs
Repeated sampling from a population
If the sample size isn’t too small then the distribution of the sample mean will be []
If the sample size isn’t too small then the distribution of the sample mean will be Gaussian
What is the STD of Gaussian distribution called? [1]
standard error
How do you calculate standard error of the mean? [1]
What would the standard error of the following be?
n=163
mean=22
standard deviation=4
How do you calculate the 95% confidence interval (CI) of a sample mean?
95% CI = sample mean ± 1.96 × standard error
key !!
What do the results from condfidence interval for the mean mean?
E.g. if 21.4-22.6
If results is 21.4 0 226:
We would expect 95% of samples of the same size to have a mean BMI between 21.4 and 22.6
95% confidence interval
Which is the correct definition?
In the population we are 95% sure that the mean weight could be as low as 75kg or as high as 81kg
In this study 95% of men weighed between 75kg and 81kg
Which is the correct definition?
In the population we are 95% sure that the mean weight could be as low as 75kg or as high as 81kg
In this study 95% of men weighed between 75kg and 81kg
What is the difference between reference range and CI? [1]
The reference range is defined as the interval between which 95% of values of a reference population fall into, in such a way that 2.5% will be below and 2.5% will be above.
Confidence interval is more concerned with the idea that the mean could be as low as/high as the two values given.
What is difference in values used to calculate STD and SE? [2]
Use standard deviation for ranges (for individual values)
Use standard error for confidence intervals (for means)
What is correlation (pearson) score for the following?
Perfect positive correlation: r=1
Perfect negative correlation r=-1
No correlation = r=0
If the data do not follow a Gaussian distribution or there is a non-linear monotonic relationship [] correlation can be used
If the data do not follow a Gaussian distribution or there is a non-linear monotonic relationship Spearman’s rank correlation can be used
x
Pearson’s correlation and linear regression describe which type of associations? [1]
Pearson’s correlation and linear regression only describe linear associations
Different regression curves can be fitted where the data pattern is
What type of regression modal would you use for the following?
Polynomial – quadratic (y=b0+b1x+b2x2)
Different regression curves can be fitted where the data pattern is
What type of regression modal would you use for the following?
Polynomial – quadratic (y=b0+b1x+b2x2), cubic (y=b0+b1x+b2x2+b3x3) etc
Different regression curves can be fitted where the data pattern is
What type of regression modal would you use for the following?
Exponential growth (y=bx;b>1)
Different regression curves can be fitted where the data pattern is
What type of regression modal would you use for the following?
exponential decay (y=bx;b<1)
Different regression curves can be fitted where the data pattern is
What type of regression modal would you use for the following?
Sigmoid
Multivariate linear regression
State if is dependent or independent on variables
What is meant by degrees of freedom? [1]
How do you calculate degress of freedom? [1]
e.g In the example we have 150 girls and 156 boys, what is the degrees of freedom? [1]
These are the number of values that are free to vary
In the example we have 150 girls and 156 boys, so there are 149+155=304 degrees of freedom.
If the 95% CI for a difference excludes 0 then p []?
If the 95% CI for a difference includes 0 then p []?
If the 95% CI for a difference excludes 0 then p < 0.05
If the 95% CI for a difference includes 0 then p ≥ 0.05
In a study, a group of patients took statins and another group placebo. The mean difference in LDL cholesterol was 1 mmol/L:
The 95% CI was 0.2 to 1.8
The 99% CI was -0.1 to 2.1
Which is correct?
P-value is less than 0.01
P-value is less than 0.05 but greater than 0.01
P-value is greater than 0.05
In a study, a group of patients took statins and another group placebo. The mean difference in LDL cholesterol was 1 mmol/L:
The 95% CI was 0.2 to 1.8
The 99% CI was -0.1 to 2.1
Which is correct?
P-value is less than 0.01
P-value is less than 0.05 but greater than 0.01
P-value is greater than 0.05
What analysis could you use where two measurements are made on the same group of people? [1]
Paired t-test:
Calculate individual differences – is the mean of those different to 0?
T-tests assume data follows a Gaussian distribution.
If the data doesn’t follow Gaussain distribution what tests would you perform if:
Instead of 2-sample t-test? [1]
Instead of paired t-test? [1]
Instead of 2-sample t-test do Mann Whitney U-test (Wilcoxon rank sum test)
Instead of paired t-test do Wilcoxon signed rank test
For example, in the free thyroxine samples in pregnant women shown above, the mean is 14 and the standard deviation is 1.7. Calculate the 95% reference range [1]
The 95% reference range is 14 +/- 1.96x1.7 = 10.7 to 17.3.
The observed middle 95% of women in this sample is 10.9-17.5. This therefore means 2.5% of women in the sample had a level of thyroxine less than 10.9 and 2.5% had a level of thyroxine greater than 17.5.
What is linear regression used for? [1]
usedto predict the value of a variable based on the value of another variable.
How do you calculate linear regression?
Y = a + bx.
Y = outcome (dependent variable)
Measured/affected during experiment
X = predictor (independent variable).
Factor changed during experiment.
B = slope (Y2-Y1/X2-X1).
A = intercept.
NOTE: this only works for linear associations.
How would you interpret this linear regression? [1]
UPDRS decreases by 0.7 for every unit increase in kinesia score
How would you interpret this linear regression? [1]
UPDRS decreases by 0.7 for every unit increase in kinesia score
How can toy adapt linear regression to for multiple factors? [1]
Independent variables can be continuous, categorical or a mix
Example: kinesia score according to age and sex
B
What is being predicted should always be on the vertical axis
What do you use a T-test and chi-squared test for? [2]
T-test cans be done for comparing continuous variables
Chi-squared test for comparing categorical variables
What is the link between CIs and p values?
It is important to note the link between CIs and p-values. In general, we say that CIs and p-values are consistent: id the 95% CI for a difference excludes zero then the p value will be less than 0.05 (statistically significant)< if the 95% CI contains zero then the p-value will be greater than 0.05 (not statistically significant).
When do you use Pearson vs Spearman correlation? [2]
Pearson correlation evaluates the linear relationship between two continuous variables.
Spearman correlation: Spearman correlation evaluates the monotonic relationship.