Statistics Flashcards
What does an ANOVA assume
- variable is Normally distributed in each group in the population (or sample size is large and variable not too skewed)
- standard deviation is similar across groups
- participants (observations) are independent across groups – i.e., NOT paired/matched
If proportion = 1. Odds = ?
Infinity
What is meant by ‘interquartile range’
The interquartile range spans the values between the lower quartile (25th percentile) and the upper quartile (75th percentile); that is, it covers the middle 50% of observations.
The interquartile range is used to quantify variation (dispersion) or the amount of spread of the scores.
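A minimal sketch of the calculation, assuming Python's standard `statistics` module and made-up scores. The function name `interquartile_range` is hypothetical, and the "inclusive" quartile convention is only one of several in use, so other software may give slightly different quartiles.

```python
import statistics

def interquartile_range(values):
    """Return (lower quartile, upper quartile, IQR) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    # method="inclusive" is an assumption; other conventions differ slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data for illustration
lower, upper, iqr = interquartile_range(scores)
print(lower, upper, iqr)  # the middle 50% of scores lie between lower and upper
```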
Describe a Mann-Whitney Test
A non-parametric test for comparing a quantitative variable between two independent groups. Provides an IQR for each group, and p-value.
What is a ‘p-value’?
The p-value quantifies the extent to which the sample estimate contradicts the null hypothesis: the smaller the p-value, the stronger the evidence against it. It takes values between 0 and 1.
What is a Paired T-Test
Confidence interval & hypothesis test for mean difference between two paired groups.
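A rough sketch of the arithmetic behind the test, using made-up before/after scores. The function name `paired_t_statistic` is hypothetical. It stops at the t statistic: the p-value and confidence interval also need the t distribution with n - 1 degrees of freedom, which is left out here.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (mean difference, standard error, t statistic) for paired data."""
    # The test works on the within-pair differences, not the raw scores.
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample SD of differences / sqrt(n).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, se, mean_diff / se

before = [10, 12, 9, 11]   # made-up scores before an intervention
after = [12, 15, 10, 13]   # the same participants afterwards
print(paired_t_statistic(before, after))
```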
What is meant by ‘median’
The median (also referred to as the 50th percentile) is the value below which 50% of the observations lie (and above which 50% of the observations lie). It quantifies average (or centrality) in the data.
If R² = 1 then…
all the variation is explained
In a linear regression, what is ‘b’
the slope
How would you calculate the relative risk
the risk in one group divided by the risk in the other
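As a small illustration, the division above can be sketched like this. The function names and the counts are made up for the example.

```python
def risk(events, total):
    """Risk in one group: number with the outcome / number in the group."""
    return events / total

def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    return risk(events_a, total_a) / risk(events_b, total_b)

# Made-up example: 10 of 100 treated and 20 of 100 controls have the outcome.
print(relative_risk(10, 100, 20, 100))  # below 1 means lower risk in group A
```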
What does a paired t-test assume
It assumes the within-pair differences on the variable are Normally distributed (or the sample size is large and the within-pair differences are not too skewed).
If proportion = 0.5. Odds = ?
1
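Both odds cards follow from odds = proportion / (1 - proportion). A minimal sketch, with a hypothetical function name; treating a proportion of exactly 1 as infinite odds is a choice made here to match the card, since the division itself is undefined at that point.

```python
import math

def odds(proportion):
    """Convert a proportion (0 to 1) into odds."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    if proportion == 1:
        # 1 / 0 is undefined; by convention the odds are treated as infinite.
        return math.inf
    return proportion / (1 - proportion)

print(odds(0.5))  # even odds
print(odds(1))    # infinite odds
```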
Give an example of when participants might be matched/paired
- participants paired on some criteria (e.g., gender, age) before randomly allocating one member of each pair to each of two trial arms under comparison
- measurements taken before and after an intervention is administered on all study participants; compare before (control) and after (intervention) conditions
How would you calculate the lower bound of the 95% range?
mean – 1.96 × standard deviation
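A short sketch of that calculation on made-up data. The function name `reference_range` is hypothetical, and the upper bound (mean plus 1.96 standard deviations) is added here by symmetry rather than taken from the card; both bounds assume the variable is roughly Normally distributed.

```python
import statistics

def reference_range(values, z=1.96):
    """Return (lower, upper) bounds: mean -/+ z * sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return mean - z * sd, mean + z * sd

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
lower, upper = reference_range(scores)
print(lower, upper)
```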
What are non-parametric methods
They analyse the rank ordering in the data rather than the actual scores themselves.
They do not compare the mean between groups, rather they compare the entire distribution, and only provide p-values, not confidence intervals.
When would Fisher's exact test be used instead of the Chi-squared test
- fewer than 20 participants or
- between 20 and 39 participants and the expected value in at least one cell is less than 5
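The rule above can be sketched as a small decision function for a 2×2 table of counts. The function names are hypothetical, and the expected counts are computed in the usual way (row total × column total / overall total), which the card does not spell out.

```python
def expected_counts(table):
    """Expected cell counts if the two variables were not associated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def choose_test(table):
    """Apply the flashcard rule to pick between the two tests."""
    n = sum(sum(row) for row in table)
    smallest_expected = min(min(row) for row in expected_counts(table))
    if n < 20 or (20 <= n <= 39 and smallest_expected < 5):
        return "Fisher's exact test"
    return "Chi-squared test"

print(choose_test([[3, 2], [4, 6]]))      # made-up table, 15 participants
print(choose_test([[30, 20], [25, 25]]))  # made-up table, 100 participants
```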
Give two parametric methods for comparing groups
ANOVA and the t-test
Give two ways correlation can be summarised i.e. graphically and numerically
- graphically: using scatterplots
- numerically: using correlation coefficients
True or False: if r > 0, as one variable increases the other tends to increase
True
Define ‘true negative’
do not have the disease and correctly test negative
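Describe the Chi-squared test
A test of association between two categorical variables in a contingency table. It makes distributional assumptions, relying on a large-sample approximation that needs the expected cell counts not to be too small.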
How would you calculate NPV
TN/(FN+TN)
How would you calculate the risk
It is calculated by dividing the number of people who have the disease by the total number of people.
How would you calculate PPV
TP/(TP + FP)
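The PPV formula above and the NPV formula, TN/(FN+TN), can be sketched together. The function names and the counts are made up for illustration.

```python
def ppv(tp, fp):
    """Positive predictive value: of those who test positive, the share with the disease."""
    return tp / (tp + fp)

def npv(tn, fn):
    """Negative predictive value: of those who test negative, the share without the disease."""
    return tn / (fn + tn)

# Made-up screening results: 80 TP, 20 FP, 10 FN, 90 TN.
print(ppv(tp=80, fp=20))
print(npv(tn=90, fn=10))
```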
What does a Box and Whiskers plot show?
Graph indicates:
- median
- lower quartile
- upper quartile
- range that contains most values
- outliers: extreme observations with very low or very high values
What does a Two-Sample Unpaired T-Test assume
- Variable is normally distributed in each group in the population (or the sample size is large and the variable is not too skewed)
- Standard deviation is similar in the two groups
- Participants (observations) are independent between groups – i.e., NOT paired
In a scatterplot, which axis is the outcome variable plotted on?
y axis/vertical
Describe least squares estimation
minimises the sum of the (vertical) squared distances between actual outcome scores and the line – line of best fit
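A minimal sketch of least squares for one predictor, using the standard closed-form solution and made-up data. The function names are hypothetical; `b` is the slope from the regression card, `a` (the intercept) is a name chosen here, and R² is included to show the "all the variation is explained" case.

```python
def least_squares_line(x, y):
    """Return (intercept a, slope b) of the line minimising squared vertical distances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

def r_squared(x, y, a, b):
    """Proportion of variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4]  # made-up predictor scores
y = [3, 5, 7, 9]  # made-up outcome scores lying exactly on a line
a, b = least_squares_line(x, y)
print(a, b, r_squared(x, y, a, b))
```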
What is meant by standard error?
The precision with which the true population parameter (the mean) is estimated.
The smaller the standard error the more precise the sample estimate is of the true mean.
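For a mean, the standard error is usually computed as the sample standard deviation divided by the square root of the sample size. The card does not give this formula, so treat the sketch below (hypothetical function name, made-up data) as an illustration of that standard result.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
print(standard_error_of_mean(scores))
# Larger samples with the same spread give a smaller standard error,
# i.e. a more precise estimate of the true mean.
```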
Describe a Wilcoxon Signed Ranks Test
A non-parametric test for comparing a quantitative variable between two paired groups. Provides an IQR for each group, and p-value.
Define ‘null hypothesis’
The most boring truth imaginable (typically that there is no difference or no association), not necessarily what you think the truth is.
Define ‘correlation coefficient’
Quantifies the strength of the association between two variables.
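The card does not say which coefficient is meant. As one common choice, here is a sketch of Pearson's r on made-up data; the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]                 # made-up scores
print(pearson_r(x, [2, 4, 6, 8]))  # r > 0: y rises as x rises
print(pearson_r(x, [8, 6, 4, 2]))  # r < 0: y falls as x rises
```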
How is proportion calculated
number of participants in a category/total number of participants
Describe a bar chart
graph where the heights of rectangular bars are used to indicate the number (or proportion or percentage) of participants that are in each category
Describe a histogram
Graph where the heights of rectangular bars (or bins) are used to indicate the (relative) frequency with which values in specific ranges occur.
Unlike bar charts (which are used for categorical data) they have no gaps between the bins.
What is a repeated measures ANOVA
Hypothesis test for comparing the mean across three or more paired (matched) groups. It provides a global p-value comparing the mean across all groups
True or False: In linear regression the predictor is often assumed to be a potential cause of the outcome
True
Define ‘correlation’
the association between two variables
How is number needed to treat calculated
1/risk difference
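A short sketch with made-up risks. The function name is hypothetical; taking the absolute value of the risk difference is a choice made here so that the answer is positive whichever group has the higher risk.

```python
def number_needed_to_treat(risk_control, risk_treated):
    """1 / risk difference, using the absolute difference between the two risks."""
    risk_difference = abs(risk_control - risk_treated)
    if risk_difference == 0:
        raise ValueError("no risk difference, so NNT is undefined")
    return 1 / risk_difference

# Made-up example: risk of 0.20 without treatment and 0.15 with it.
print(number_needed_to_treat(0.20, 0.15))  # roughly 20 people treated per event avoided
```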
Describe a scatterplot
graph used to summarise the relationship between two quantitative variables on two axes. Each participant is represented on the scatterplot using a symbol such as a dot (●) or cross (×).
The position of the dot on the vertical axis (y axis) indicates the score on one variable and the position on the horizontal axis (x axis) indicates the score on the other variable.