Choosing statistics Flashcards
E-modules 2018/19
What is needed to test the hypothesis?
Choice of statistical test
Patient population/study sample selected allows for comparison (i.e. inclusion/exclusion criteria)
Patient outcome measures (i.e. variables)
When the hypothesis proposes a correlation, what are the possible stats tests based on the variables?
Discrete
- Chi-Square
Continuous
- Pearson (normally distributed)
- Spearman rank (not normally distributed)
When the hypothesis proposes a comparison between groups, what stats test do you use for discrete data?
Chi-Square
When the hypothesis proposes a comparison between groups, what stats test do you use for continuous, normally distributed data based on number of groups?
> 2 groups
- ANOVA (one variable)
2 groups
- paired t-test
- independent t-test
When the hypothesis proposes a comparison between groups, what stats test do you use for continuous, NOT normally distributed data based on number of groups?
> 2 groups
- Kruskal Wallis
2 groups
- Wilcoxon (paired)
- Mann Whitney (independent)
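The choices on the cards above can be sketched as a small decision function. This is only an illustration of the flashcard logic; the function name `choose_test` and its parameters are assumptions made for this example, not part of the original module.

```python
def choose_test(hypothesis, data_type, normal=True, n_groups=2, paired=False):
    """Return the test suggested by the flashcards (hypothetical helper).

    hypothesis: "correlation" or "comparison"
    data_type:  "discrete" or "continuous"
    normal:     True if continuous data are normally distributed
    n_groups:   number of groups being compared
    paired:     True if both data-sets come from the same individuals
    """
    if hypothesis not in ("correlation", "comparison"):
        raise ValueError("hypothesis must be 'correlation' or 'comparison'")
    if data_type not in ("discrete", "continuous"):
        raise ValueError("data_type must be 'discrete' or 'continuous'")

    # Discrete data: Chi-square for both correlation and comparison.
    if data_type == "discrete":
        return "Chi-square"

    # Continuous data, correlation hypothesis.
    if hypothesis == "correlation":
        return "Pearson" if normal else "Spearman rank"

    # Continuous data, comparison between more than two groups.
    if n_groups > 2:
        return "ANOVA" if normal else "Kruskal-Wallis"

    # Continuous data, comparison between two groups.
    if normal:
        return "paired t-test" if paired else "independent t-test"
    return "Wilcoxon" if paired else "Mann-Whitney"


print(choose_test("comparison", "continuous", normal=False, n_groups=2, paired=True))
```

The example call describes two paired groups of continuous, not normally distributed data, so it follows the same route as the last card above.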
Which statistical analysis tests for differences?
Chi-square
ANOVA
T-tests
Kruskal-Wallis
Wilcoxon
Mann-Whitney U-Test
*hypothesis proposes a comparison between groups
Which statistical analysis tests for similarities?
Chi-Square
Pearson
Spearman rank
*hypothesis proposes a correlation
What is quantitative data?
Numerical information about quantities
- MEASURED: information can be measured and have continuous dimensions (height, temperature, BP)
- COUNTED: information can be counted but not continuous (no. of children in family, no. of patients in clinic)
What is qualitative data?
Information about qualities; it can’t actually be measured
Deals with descriptive information such as free-text comments to open-ended question/response to interview
What is categorical data?
In-between quantitative and qualitative
- ORDINAL aspects can be easily converted into numerical data (i.e. scale on happiness can be given in numbers instead of words)
- NOMINAL aspect consists of individual terms rather than sentences like in qualitative data
Broadly compare quantitative, qualitative, and categorical data
Quantitative = when you measure something and give it a number value
Categorical = when you classify something
Qualitative = when you judge something
Compare discrete and continuous data
Discrete data; counted
- cannot be made more precise
- e.g. number of children
Continuous data; measured
- can be divided and reduced to finer and finer levels
- e.g. height of a person
Compare nominal and ordinal data
Nominal = items that are assigned individual named categories that do not have an implicit or natural value or rank
e.g. gender, fracture incidence
Ordinal = items which are assigned to categories that do have some kind of implicit or natural order
e.g. patient characteristics such as stage of hypertension, pain level, and satisfaction
Broadly describe the mean and standard deviation
Mean is an average of the data
Standard deviation describes the width (spread) of the data around the mean
What is normality?
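It describes whether the data follow a normal (bell-shaped) distribution that is symmetrical about the mean; it is judged from the central tendency and dispersion of the data, and is used to decide how to describe the properties of large data-sets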
Describe the relative mean, median, and mode for the following skews:
a) positive skew
b) symmetrical distribution
c) negative skew
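a) mode < median < mean
b) mean = median = mode
c) mode > median > mean
- positive: the long tail is on the right and pulls the mean above the median
- negative: the long tail is on the left and pulls the mean below the median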
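As a quick check of the ordering for a positive skew, here is a small sketch using Python's standard `statistics` module. The sample values are invented for illustration: most values are small and a few large ones form a long right tail.

```python
from statistics import mean, median, mode

# Made-up, positively skewed sample: a few large values form the right tail.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

sample_mean = mean(sample)      # sum 45 over 10 values
sample_median = median(sample)  # average of the two middle values (3 and 3)
sample_mode = mode(sample)      # most frequent value

print(sample_mode, sample_median, sample_mean)

# For a positive skew the tail drags the mean furthest to the right.
print(sample_mode < sample_median < sample_mean)
```

Mirroring the sample (for example, negating every value) gives a negative skew and reverses the ordering.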
What is kurtosis?
Describes data that are heavy-tailed or light-tailed relative to a normal distribution
Compare high and low kurtosis
High kurtosis
- tends to have heavy tails, or outliers that create a very wide distribution
Low kurtosis
- tends to have light tails, or a lack of outliers that creates a very narrow distribution
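To make the comparison concrete, the sketch below computes excess kurtosis (kurtosis minus 3, so that a normal distribution scores 0) from the second and fourth central moments. The helper name `excess_kurtosis` and both samples are assumptions made for this example; it uses the simple population formula with no small-sample correction.

```python
from statistics import fmean


def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (hypothetical helper)."""
    centre = fmean(data)
    m2 = fmean([(x - centre) ** 2 for x in data])  # second central moment
    m4 = fmean([(x - centre) ** 4 for x in data])  # fourth central moment
    return m4 / m2 ** 2 - 3


# Made-up samples: one with two extreme outliers, one evenly spread.
heavy_tailed = [5, 5, 5, 5, 5, 5, 5, 5, -20, 30]
light_tailed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(excess_kurtosis(heavy_tailed))  # positive: heavier tails than normal
print(excess_kurtosis(light_tailed))  # negative: lighter tails than normal
```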
What statistical tests are used to test for normality?
Shapiro-Wilk test
- small sample size (n<50)
Kolmogorov-Smirnov
- large sample size (n>50)
What are descriptive statistics?
Mean, mode or median
- used to summarise large data-sets in a tangible format
Compare range, variance and standard deviation
Range
- difference between the largest and smallest values
Variance
- average of the squared deviations from the mean, i.e. how far a set of numbers is spread out from its average value
Standard deviation
- square root of the variance; measures the spread of a set of data in the same units as the data
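The three measures can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up numbers; it uses the population formulas (`pvariance`, `pstdev`), whereas a sample drawn from a larger population would normally use `variance` and `stdev`, which divide by n - 1.

```python
from statistics import pstdev, pvariance

# Made-up data-set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)   # largest minus smallest value
data_variance = pvariance(data)      # mean squared deviation from the mean
data_sd = pstdev(data)               # square root of the variance

print(data_range, data_variance, data_sd)
```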
Compare IQR, standard error of mean, and confidence intervals
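Interquartile range
- UQ - LQ (upper quartile minus lower quartile, i.e. the spread of the middle 50% of the data)
Standard error of mean
- measures how well the sample mean approximates to the population mean
Confidence intervals
- range of values in which the true (population) mean is likely to be found at a stated confidence level (e.g. 95%)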
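A rough sketch of all three, again with the standard `statistics` module and made-up data. Two assumptions are worth flagging: quartile values depend on the interpolation method (this uses the default of `statistics.quantiles`), and the 95% confidence interval uses the normal approximation (about 1.96 standard errors), which is only reasonable for larger samples; small samples would normally use a t-distribution critical value instead.

```python
import math
from statistics import NormalDist, mean, quantiles, stdev

# Made-up sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Interquartile range: upper quartile minus lower quartile.
lower_q, middle_q, upper_q = quantiles(data, n=4)
iqr = upper_q - lower_q

# Standard error of the mean: sample SD divided by sqrt(n).
sem = stdev(data) / math.sqrt(len(data))

# Approximate 95% confidence interval for the mean (normal approximation).
z = NormalDist().inv_cdf(0.975)  # roughly 1.96
ci_low = mean(data) - z * sem
ci_high = mean(data) + z * sem

print(iqr, sem, (ci_low, ci_high))
```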
How do you determine whether groups are paired or independent?
Look at whether each group is composed of the same subjects of interest or if they are different
Paired = two data-sets come from the same individual
- measure same variable in same subject at different time points (longitudinal study)
Independent = two data-sets from different individuals
- comparing two groups with no common factors (cross-sectional study)
Compare when to use parametric and non-parametric statistics
Parametric = normally distributed
Non-parametric = not normally distributed
Name parametric tests
Paired/independent t-tests
ANOVA
Name non-parametric tests
Wilcoxon Signed Rank
Mann-Whitney U
Friedman (non-parametric equivalent of repeated measures one-way ANOVA)
Kruskal-Wallis
When would you use the different t-tests?
Paired: the same variable is measured twice in the same sample (e.g. before and after treatment)
Independent: the same variable is compared between two different samples
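To show why the choice matters, the sketch below computes both t statistics by hand on the same made-up numbers. The function names are assumptions for this example, the independent version is the unequal-variance (Welch) form, and only the t statistic is calculated; turning it into a p-value needs the t-distribution, which is left to a statistics package.

```python
import math
from statistics import mean, stdev, variance


def paired_t(first, second):
    """t statistic for paired data: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def independent_t(group_a, group_b):
    """Welch t statistic for two independent groups (unequal variances)."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Made-up systolic BP readings for five patients before and after treatment.
before = [140, 150, 145, 160, 155]
after = [135, 145, 140, 150, 150]

t_paired = paired_t(before, after)            # treats rows as the same patients
t_independent = independent_t(before, after)  # treats columns as separate groups

print(t_paired, t_independent)
```

Because each patient acts as their own control, the paired statistic is much larger here than the independent one computed from the same values.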
What does a one-way ANOVA tell you?
Used to compare the means from more than two samples with a normal distribution and will only tell you if a difference exists between your samples
Further stat tests (i.e. post hoc test) are needed to calculate exactly where the difference is
What can the Pearson correlation coefficient tell you about correlation?
How strong the relationship is
Varies between -1 and +1 (from perfect negative to perfect positive correlation)
Approximately what are the r-values for the following correlations:
a) very low,
b) low,
c) reasonable,
d) high,
e) very high?
a) 0.0-0.2
b) 0.2-0.4
c) 0.4-0.6
d) 0.6-0.8
e) 0.8-1.0
*can be +/-
What is the r^2-value from a Pearson correlation?
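Represents how closely your data are fitted to the correlation line, i.e. the proportion of the variation in one variable that is explained by the other
The higher the value (the closer to 1), the more reliable your conclusion can be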
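As an illustration, r and r^2 can be computed directly from the sums of squares. The helper name `pearson_r` and the data are made up for this sketch; recent Python versions also provide `statistics.correlation` for the same calculation.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squares and products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2

print(r, r_squared)
```

On the scale from the previous card this r falls in the "high" band (0.6-0.8), and r^2 says that roughly 60% of the variation in y is accounted for by x.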
Compare correlation and regression
Correlation = indicates the strength of the relationship between two variables
Regression = quantifies the association between the two variables, i.e. tells us the impact that changing one variable will have on the other variable
How is regression defined?
gcse
y = a + bx
a = the y-axis intercept value
b = the gradient of the line, i.e. the regression coefficient
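A minimal least-squares sketch for estimating a and b in y = a + bx. The helper name `fit_line` and the data are assumptions for this example; the gradient is the sum of products of deviations divided by the sum of squared x deviations, and the intercept follows from the fact that the fitted line passes through the point of means.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx    # gradient (regression coefficient)
    a = my - b * mx  # y-axis intercept
    return a, b


# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(a, b)

# Predicted y for a new x value, using y = a + b*x.
print(a + b * 6)
```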
What does the chi square test measure?
It is a measure of the differences between observed and expected frequencies
Represented as χ² (chi-squared, written X^2 here)
What does it mean when X^2 = 0?
The observed and expected frequencies are the same
What does a higher X^2 value mean?
The bigger the difference between the observed and expected frequencies
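The statistic itself is the sum of (observed - expected)^2 / expected over all categories. The sketch below uses invented frequencies and a hypothetical helper name, `chi_square`; it computes only the statistic, not the p-value.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E across categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts: 60 patients expected to split evenly across three clinics.
expected = [20, 20, 20]

identical = chi_square([20, 20, 20], expected)   # no difference at all
small_gap = chi_square([18, 22, 20], expected)   # slight difference
large_gap = chi_square([5, 35, 20], expected)    # large difference

print(identical, small_gap, large_gap)
```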
How can the size of a study affect the p-value?
Very small studies with few samples might not return a reliable p-value
Very large studies with many samples might be overpowered and find a statistically significant difference that is too small to be meaningful
What is a type I error?
Incorrectly rejecting the null hypothesis when it is true (its probability is the significance level, alpha)
False positive
What is a type II error?
Incorrectly fail to reject the null hypothesis when it is false
False negative
*the greater the power of the test, the lower type II error rate (power = 1-beta; the closer the power is to 1, the better the test is at detecting a false null hypothesis)
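The link between alpha, beta, power and study size can be sketched for the simplest case: a one-sided z-test on a mean with known standard deviation. This setting, the function name `z_test_power` and the effect size of 0.5 standard deviations are all assumptions chosen for illustration, not something specified in the cards.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test.

    effect_size: true difference in means, in standard deviation units
    n:           number of samples
    alpha:       significance level (type I error rate)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)                 # rejection threshold
    beta = std_normal.cdf(z_crit - effect_size * math.sqrt(n))  # type II error rate
    return 1 - beta


# Power grows with sample size for the same true effect.
for n in (5, 25, 100):
    print(n, round(z_test_power(0.5, n), 3))

# With no true effect, the rejection rate is just alpha (type I error rate).
print(round(z_test_power(0.0, 25), 3))
```

Under these assumptions a small study has low power (a high type II error rate), while a very large one has power close to 1 even when the effect is modest.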