E-module 2 - Choosing statistics Flashcards
Which statistical tests are used when the hypothesis proposes a correlation between continuous variables:
- with a normal distribution?
- without a normal distribution?
Hypothesis proposes correlation between continuous variables:
- normal distribution: Pearson
- not normal: Spearman's rank correlation
Test for comparison between 2 groups with continuous variables with a normal distribution with:
- paired data?
- unpaired data?
Paired: paired t-test
Unpaired: independent t-test
Which statistical test is used when the study uses discrete (count) variables?
Chi-squared test
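As a rough illustration of what the chi-squared test computes, here is a minimal pure-Python sketch; the function name `chi_squared` and the counts are hypothetical. Each cell's expected count is row total x column total / grand total, and the statistic sums (observed - expected)^2 / expected over all cells. Converting the result to a p-value needs the chi-squared distribution (a printed table or a statistics library), which this sketch leaves out.

```python
def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: hypertension (yes/no) in two groups
stat, dof = chi_squared([[10, 20], [30, 40]])
print(round(stat, 3), dof)  # 0.794 1
```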
Test for comparison between more than 2 groups with continuous variables with a normal distribution with:
- one variable?
- multivariate?
- One variable: ANOVA
- Multivariate: Consult book
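A minimal sketch of the one-way ANOVA F ratio, assuming a single grouping variable and a few hypothetical scores; `one_way_anova_f` is an illustrative name, not a library function. F compares the variance between group means with the variance within groups, so a large F suggests the group means differ. The p-value would come from the F distribution with the returned degrees of freedom, and the multivariate case is not covered.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across 2+ groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of each value around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```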
Test for comparison between more than 2 groups with continuous variables without a normal distribution?
Kruskal-Wallis test
Test for comparison between 2 groups with continuous variables without a normal distribution with:
- paired data?
- independent data?
Paired: Wilcoxon signed-rank test
Independent: Mann-Whitney U test
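The two rank-based tests can be sketched in pure Python as below, assuming small hypothetical samples; the helper names are illustrative. Tied values share the average of their ranks, Mann-Whitney U is derived from the rank sum of one group, and the Wilcoxon signed-rank test ranks the absolute paired differences (zero differences are dropped). Only the statistics are computed: p-values need tables or a library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.wilcoxon`), and no tie correction is applied here.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # first position of this value in sorted order
        last[v] = pos             # last position of this value in sorted order
    return [(first[v] + last[v]) / 2 for v in values]

def mann_whitney_u(a, b):
    """Unpaired data: return (U_a, U_b); the reported U is usually the smaller one."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    return u_a, len(a) * len(b) - u_a

def wilcoxon_signed_rank(before, after):
    """Paired data: return (W+, W-), the rank sums of positive and negative differences."""
    diffs = [y - x for x, y in zip(before, after) if y != x]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

print(mann_whitney_u([3, 5, 7], [1, 4, 6]))                         # (6.0, 3.0)
print(wilcoxon_signed_rank([10, 12, 14, 16], [11, 15, 13, 20]))     # (8.5, 1.5)
```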
What are the 2 types of analysis that can be used to test a hypothesis?
- Correlations i.e. hypothesis tests to evaluate relationships between variables
- Comparisons i.e. hypothesis tests to evaluate differences between groups or populations
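The test-choice rules on these cards can be collected into one lookup that starts from the choice between correlation and comparison. This is a hypothetical helper that encodes only what the cards state: it does not check normality itself and ignores the multivariate case, which the cards defer to the book.

```python
def choose_test(analysis, data_type, normal=True, groups=2, paired=False):
    """Return the test these flashcards recommend (hypothetical helper).

    analysis:  "correlation" or "comparison"
    data_type: "continuous" or "discrete"
    """
    if analysis not in ("correlation", "comparison"):
        raise ValueError("analysis must be 'correlation' or 'comparison'")
    if data_type not in ("continuous", "discrete"):
        raise ValueError("data_type must be 'continuous' or 'discrete'")
    if data_type == "discrete":
        return "chi-squared test"
    if analysis == "correlation":
        return "Pearson" if normal else "Spearman rank"
    if groups > 2:
        return "ANOVA" if normal else "Kruskal-Wallis"
    if normal:
        return "paired t-test" if paired else "independent t-test"
    return "Wilcoxon signed-rank" if paired else "Mann-Whitney U"

print(choose_test("comparison", "continuous", normal=False, paired=True))  # Wilcoxon signed-rank
```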
What are the different types of qualitative data and give examples of each?
- nominal (unordered) e.g. gender, life status (alive/dead)
- ordinal (ordered) e.g. fitness, stages of hypertension
Both are analysed with non-parametric tests.
What are the different types of quantitative data and give examples of each?
- continuous (parametric) e.g. heart rate, age
- discrete (non-parametric) e.g. no. of males/females in a group, no. of people with hypertension
Define discrete data
Discrete data are counts that cannot be made more precise, e.g. a family cannot have 2.4 children.
Define continuous data
Continuous data can take any value within a range, so it can be divided into finer and finer levels, e.g. height can be measured on progressively more precise scales: metres, centimetres, millimetres, etc.
Give an example of a variable that could be measured quantitatively or qualitatively
Eye colour can be measured quantitatively on the RGB scale or qualitatively by categorising it as blue, brown, green, etc.
Give an example of a variable that could be interpreted as discrete or continuous
Age is a discrete variable when counted in whole years and continuous when measured exactly in months, days, hours, minutes or seconds.
Define nominal data
Items assigned to named categories that have no implicit or natural value or rank, e.g. gender (male or female) or fracture incidence (yes or no).
Define ordinal data
Items which are assigned to categories that have some implicit or natural order, such as ‘small, medium, or large’.
Ordinal variables are often used to describe a patient’s characteristics e.g. stage of hypertension, pain level, and satisfaction.
Define a normal distribution
A symmetrical, bell-shaped distribution in which the mean, median and mode are equal. Whether data are normally distributed determines how their central tendency and dispersion should be described, i.e. which descriptive statistics are presented instead of the raw data.
How can you determine whether distribution is normal?
By graphing data in a histogram (frequency distribution plot of the data points from a group or population) or a frequency bar chart
Describe a normal curve
Symmetrical distribution with well-behaved tails, i.e. many data points in the central region of the range, tapering off symmetrically on either side.
Also called ‘Gaussian’ or a ‘bell curve’
Define skewed data
- left and right?
Asymmetric with many data points in the high or low end of the range and an uneven tail (long on one side and short on the other).
- left-skewed distribution = negatively skewed -> long left tail and mean to left of peak
- right-skewed distribution = positively skewed -> long right tail and mean to right of peak
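One common way to quantify skew is the Fisher-Pearson coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third moments about the mean. The sketch below uses hypothetical data and applies no small-sample bias correction, so values may differ slightly from statistics software; it shows that positive (right) skew goes with the mean lying above the median.

```python
from statistics import mean, median

def skewness(data):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second moment about the mean (variance)
    m3 = sum((x - m) ** 3 for x in data) / n  # third moment: sign follows the longer tail
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 15]  # hypothetical data with a long right tail
print(skewness(right_skewed) > 0, mean(right_skewed) > median(right_skewed))  # True True
```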
What is kurtosis?
Kurtosis describes whether data are heavy-tailed or light-tailed relative to a normal distribution.
- high kurtosis = heavy tails, i.e. more outliers (extreme values) than a normal distribution
- low kurtosis = light tails, i.e. fewer outliers than a normal distribution
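Kurtosis can be estimated from the fourth moment about the mean. This sketch computes excess kurtosis, m4 / m2^2 - 3, which is about 0 for a normal distribution, positive for heavy tails and negative for light tails; the data are hypothetical and no bias correction is applied. Some software reports kurtosis without subtracting 3, in which case a normal distribution scores 3 instead of 0.

```python
def excess_kurtosis(data):
    """Excess kurtosis g2 = m4 / m2**2 - 3 using population moments."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second moment about the mean
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth moment: dominated by extreme values
    return m4 / m2 ** 2 - 3

heavy_tailed = [0, 10] + [5] * 8   # hypothetical: tight centre plus two outliers
light_tailed = list(range(1, 11))  # hypothetical: evenly spread, no outliers
print(excess_kurtosis(heavy_tailed))      # 2.0
print(excess_kurtosis(light_tailed) < 0)  # True
```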
Why is the distribution of data important for statistical analysis?
The mathematics underpinning most (parametric) statistical tests relies on the data having a normal distribution, i.e. roughly two-thirds (about 68%) of the data lie within one standard deviation of the mean, and on that distribution being symmetrical, i.e. 50% of the data lie above the mean and 50% below.
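The two-thirds figure can be checked with Python's standard library: `NormalDist` gives the exact proportion of a normal distribution within one SD of the mean, and a simulated sample (hypothetical values, seeded so the run is reproducible) gives an empirical proportion close to it, with about half the values above the mean.

```python
import random
from statistics import NormalDist, fmean, pstdev

# Exact share of a standard normal distribution within 1 SD of the mean
theory = NormalDist().cdf(1) - NormalDist().cdf(-1)  # about 0.683

random.seed(1)  # fixed seed so the simulation is reproducible
sample = [random.gauss(100, 15) for _ in range(20_000)]  # hypothetical simulated scores
m, sd = fmean(sample), pstdev(sample)

within_1sd = sum(m - sd <= x <= m + sd for x in sample) / len(sample)
above_mean = sum(x > m for x in sample) / len(sample)
print(f"theory {theory:.3f}, sample {within_1sd:.3f}, above mean {above_mean:.3f}")
```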
What are the statistical tests of normality?
- Shapiro-Wilk test: used to test for normality with small sample sizes (n < 50)
- Kolmogorov-Smirnov test: used to test for normality with large sample sizes (n > 50)
n = no. of observations in the data set
p-value < 0.05 means the data differ significantly from a normal distribution, i.e. the data are not normally distributed
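To show what the Kolmogorov-Smirnov test measures, this sketch computes its statistic D: the largest gap between the empirical cumulative distribution of the data and a normal distribution fitted with the sample mean and SD. The function name and data are hypothetical. Because the normal parameters are estimated from the same data (strictly the Lilliefors variant), standard KS tables do not apply directly; in practice the p-value would come from a library routine such as `scipy.stats.shapiro` or `scipy.stats.kstest`, and p < 0.05 is read as non-normal.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_vs_normal(data):
    """Largest gap D between the data's empirical CDF and a normal CDF fitted to the data."""
    fitted = NormalDist(mean(data), stdev(data))
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x, so check both sides of the step
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

n = 200
normal_like = [NormalDist(50, 10).inv_cdf((i + 0.5) / n) for i in range(n)]  # idealised normal data
skewed = [((i + 0.5) / n) ** 4 for i in range(n)]                           # strongly right-skewed data
print(ks_statistic_vs_normal(normal_like) < ks_statistic_vs_normal(skewed))  # True
```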
What are descriptive statistics?
Raw data are usually presented as descriptive statistics, which summarise the data and condense large data-sets into a manageable format:
- measures of central tendency: mean, median or mode
- measures of dispersion: variance, standard deviation (SD) or standard error (also known as the standard error of the mean, SEM)
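Python's built-in `statistics` module covers these descriptive statistics; the heart-rate values below are hypothetical. `variance` and `stdev` use the sample (n - 1) formulas, and the SEM is the SD divided by the square root of n.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

# Hypothetical resting heart rates (bpm) for nine participants
heart_rates = [62, 65, 70, 70, 72, 75, 78, 80, 85]

summary = {
    # Measures of central tendency
    "mean": mean(heart_rates),
    "median": median(heart_rates),
    "mode": mode(heart_rates),
    # Measures of dispersion (sample formulas, dividing by n - 1)
    "variance": variance(heart_rates),
    "SD": stdev(heart_rates),
    "SEM": stdev(heart_rates) / sqrt(len(heart_rates)),  # standard error of the mean = SD / sqrt(n)
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```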
Explain the difference between paired and unpaired data
Paired data are dependent: each group is composed of the same subjects of interest. Typically, paired observations arise from measuring the same variable in the same subject at different time-points, e.g. in a longitudinal study.
Unpaired data are independent: each group is composed of different subjects of interest. These observations arise when comparing two groups with no common factors, e.g. in a cross-sectional study.
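In practice, paired data are analysed through each subject's differences, while unpaired data compare two separate group means. This is a minimal sketch of the two t statistics with hypothetical data; the function names are illustrative, the independent version uses a pooled variance (which assumes both groups have similar variances), and p-values from the t distribution are left out.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t: tests whether the mean within-subject difference is zero."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1  # (t, degrees of freedom)

def independent_t(group_a, group_b):
    """Student's t for unpaired data with a pooled variance (assumes similar variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2  # (t, degrees of freedom)

# Hypothetical measurements on the same four subjects at two time-points
before = [10, 12, 14, 16]
after = [11, 15, 13, 20]
print(paired_t(before, after))
print(independent_t(after, before))
```

On these numbers the paired t is about 1.58 while the unpaired t is about 0.75: treating paired data as unpaired ignores that each subject acts as their own control.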
What is a correlation coefficient?
A correlation coefficient tells you how strong a relationship is and its direction. It ranges from 1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation).
e.g. Pearson’s r, Spearman’s rho (ρ)
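Both coefficients can be written in a few lines of pure Python, assuming paired lists with no missing values; the function names are illustrative. Spearman's rho is Pearson's r computed on the ranks (ties get average ranks), so it equals 1 for any perfectly increasing relationship, even a curved one, whereas Pearson's r reaches 1 only for a straight line. Python 3.10+ also offers `statistics.correlation` for Pearson's r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by both spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]         # hypothetical: always increasing, but curved
print(round(pearson_r(x, y), 3))  # 0.943
print(spearman_rho(x, y))         # 1.0
```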
When are parametric and non-parametric tests used?
Parametric statistics (e.g. t-test, ANOVA) are used when the data is well described by the mean and standard deviation i.e. quantitative data which is normally distributed.
Non-parametric tests (e.g. Mann-Whitney, Wilcoxon signed-rank test) are adopted when the population is not well described by the mean and standard deviation, i.e. when quantitative data are not normally distributed or when data are qualitative.
Why are parametric statistics better than non-parametric?
Parametric tests are easier to understand and, when their assumptions are met, more powerful, i.e. less likely to reach the wrong conclusion about a hypothesis (in particular, less likely to miss a real effect).
What is the most appropriate measure of central tendency for normally distributed data: mean, median or mode?
Mean
Median and mode should only be used for data that are not normally distributed
Mode is very rarely used