Statistics Flashcards
What is quantitative data?
Numerical data
- Discrete (whole numbers), e.g. number of children
- Continuous (usually a measurement)
Give an example of nominal data.
Blood group, gender
Categories have no logical order
Type of categorical data
Name the types of qualitative data.
Categorical data
- Nominal: categories have no logical order
- Ordinal: categories have a natural order
If data is negatively skewed (left skewed), what is the order of the mean, median and mode (from low to high, L -> R)?
Mean, Median, Mode
Peak of the graph lies further to the right, with the tail to the left
If data is positively skewed (right skewed), what is the order of the mean, median and mode (from low to high, L -> R)?
Mode, median, mean
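The two skew orderings can be checked on a small data set. This is a minimal sketch using made-up numbers (the list `right_skewed` is an assumption, not data from these cards). The ordering is a rule of thumb that usually holds for single-peaked data, not a guarantee.

```python
import statistics

# Hypothetical positively skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror the values to get a negatively skewed version of the same shape.
left_skewed = [16 - x for x in right_skewed]

def centre(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )

r_mode, r_median, r_mean = centre(right_skewed)
l_mode, l_median, l_mean = centre(left_skewed)

# Positive skew: mode < median < mean (the long right tail pulls the mean up).
print("right skewed:", r_mode, r_median, r_mean)
# Negative skew: mean < median < mode (the long left tail pulls the mean down).
print("left skewed: ", l_mode, l_median, l_mean)
```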
What is the range?
Maximum - minimum.
Poor measure of spread as it is affected by outliers and depends on sample size
What is the inter-quartile range?
Upper quartile - lower quartile
Better than the range as it is not influenced by outliers
The quartiles are 3 measures: lower quartile, median and upper quartile
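A minimal sketch comparing the range and the interquartile range when one outlier is added. The data values are made up for illustration. Quartile conventions differ between textbooks and software, so the exact quartiles from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation.

```python
import statistics

def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range = upper quartile - lower quartile."""
    lower_q, _median, upper_q = statistics.quantiles(values, n=4)
    return upper_q - lower_q

# Hypothetical data, and the same data with the largest value replaced by an outlier.
data = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 60]

print("range:", data_range(data), "->", data_range(with_outlier))
print("IQR:  ", iqr(data), "->", iqr(with_outlier))
```

In this example the range changes a lot when the outlier is added, while the interquartile range stays the same.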
What is variance?
Calculate the deviations = difference between each observation and the mean of the data
Square these deviations so negatives become positive
Average the squared deviations by dividing by n-1 (one degree of freedom is lost because the mean was estimated from the same data)
Square root of the variance = standard deviation
Influenced by outliers
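The steps above can be sketched as follows, using a small made-up data set (`data` is an assumption for illustration). The result is compared with `statistics.variance` and `statistics.stdev`, which also divide by n-1.

```python
import math
import statistics

# Hypothetical observations.
data = [4, 8, 6, 5, 7]

n = len(data)
mean = sum(data) / n

# Step 1: deviation of each observation from the mean.
deviations = [x - mean for x in data]

# Step 2: square the deviations so negatives become positive.
squared = [d ** 2 for d in deviations]

# Step 3: average the squared deviations, dividing by n - 1.
variance = sum(squared) / (n - 1)

# Step 4: standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print("variance:", variance)
print("library variance:", statistics.variance(data))
print("SD:", sd)
```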
How do you calculate standard deviation from variance?
Square root of variance
What would you use to summarise symmetrical data?
Mean
Standard deviation
What would you use to summarise skewed data?
Median
Interquartile range
How would you summarise categorical data?
Use number (%)
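A minimal sketch of a number (%) summary, assuming a made-up sample of blood groups:

```python
from collections import Counter

# Hypothetical sample of blood groups (nominal categorical data).
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

counts = Counter(blood_groups)
total = len(blood_groups)

# Map each category to (number, percentage of the sample).
summary = {group: (n, 100 * n / total) for group, n in counts.items()}

for group, (n, pct) in sorted(summary.items()):
    print(f"{group}: {n} ({pct:.1f}%)")
```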
What information does a box and whisker plot give you?
Median
Interquartile range
Range
How can you summarise categorical data in a chart?
Pie chart
Bar chart
What are the mean and SD of the standard normal distribution?
Mean = 0, SD = 1
(A normal distribution in general can have any mean and any positive SD)
What is the reference range, when can it be used and what does it measure?
Used when data are NORMALLY DISTRIBUTED
Mean +/- 1.96 SD (often rounded to +/- 2 SD) contains 95% of the data
Measure of spread of the data
In normal distribution data how much of data is included in mean +/- 1SD, +/- 2SD and +/- 3SD?
Mean +/- 1 SD = 68% of data included
Mean +/- 2 SD = 95% of data included = reference range
Mean +/- 3 SD = 99.7% of data included
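These proportions can be checked with the standard library's `statistics.NormalDist`. This is a minimal sketch; the helper name `coverage` is made up for illustration. The results come out at roughly 68%, 95% and 99.7%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
z = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD: {coverage(k):.4f}")
```

Because any normal distribution can be rescaled to the standard normal, the same proportions apply whatever the mean and SD are.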
What is the difference between the 95% reference range and 95% confidence interval?
95% reference range (or normal range)
- Mean +/- 2SD
- Measures SPREAD of data
95% confidence interval
- Mean +/- 2 standard errors
- Measures the ACCURACY of a sample estimate (if the study were repeated many times, about 95% of such intervals would contain the true population value)
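A minimal sketch contrasting the two intervals on one made-up sample (`values` is an assumption for illustration). It takes the standard error of the mean as SD / sqrt(n) and uses the multiplier 1.96 for both intervals. For a sample this small a t-distribution multiplier would normally replace 1.96 in the confidence interval.

```python
import math
import statistics

# Hypothetical measurements from one sample.
values = [4.1, 4.6, 5.0, 5.2, 4.8, 5.5, 4.9, 5.1, 4.7, 5.3, 4.4, 5.0]

n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)      # spread of individual observations
se = sd / math.sqrt(n)             # standard error of the mean

# 95% reference range: where about 95% of individual values are expected to lie.
reference_range = (mean - 1.96 * sd, mean + 1.96 * sd)

# 95% confidence interval: plausible values for the population mean.
confidence_interval = (mean - 1.96 * se, mean + 1.96 * se)

print("reference range:    ", reference_range)
print("confidence interval:", confidence_interval)
```

The confidence interval is narrower by a factor of sqrt(n), and it shrinks as the sample grows, whereas the reference range does not.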
How can you make positively skewed data more symmetric?
Calculate
- Log (x)
- 1/x
- square root x
More difficult with negatively skewed data
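A minimal sketch of the effect of the log and square root transformations, using made-up positively skewed data. The helper `skewness` is an assumption for illustration: it computes the moment coefficient of skewness, which is 0 for perfectly symmetric data and positive for right-skewed data. The reciprocal is left out here for brevity.

```python
import math

def skewness(values):
    """Moment coefficient of skewness (0 = symmetric, > 0 = right skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed data (all values must be > 0 to take logs).
raw = [1, 2, 2, 3, 4, 5, 8, 13, 30, 100]

logged = [math.log(x) for x in raw]
rooted = [math.sqrt(x) for x in raw]

print("raw skewness: ", round(skewness(raw), 2))
print("sqrt skewness:", round(skewness(rooted), 2))
print("log skewness: ", round(skewness(logged), 2))
```

For this data the log transformation reduces the skewness more than the square root does; which transformation works best depends on the data.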
How can you check if a data set is normally distributed?
- By eye: draw a histogram
- Test for normality, e.g. Kolmogorov-Smirnov test or Shapiro-Wilk test
If p < 0.05, conclude the data are not normal
If p > 0.05, there is no evidence against normality
However, small samples have insufficient power to detect deviations from normality, and for large samples normality is usually less important
What is bias and how can you avoid it?
Bias: when the sample is selected in such a way that even with a very large sample you will not get the true answer
Avoid with a random sample
What is precision?
A sample estimate is precise if different samples of the same size, selected in the same way would give answers which are close together
What is a distribution defined by?
- centre (mean)
- Spread (SD)
- Shape (e.g. normally distributed)
When will sample means be normally distributed?
- the underlying data is normally distributed
- the samples are large (in which case does not matter if the data are normal or not - Central limit theorem)
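A minimal simulation sketch of the central limit theorem. The choice of an exponential distribution (strongly right skewed, mean 1, SD 1), the sample sizes and the helper names are all assumptions for illustration. Means of larger samples should be much less skewed than means of very small samples.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Moment coefficient of skewness (0 = symmetric, > 0 = right skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def sample_mean(sample_size):
    """Mean of one sample drawn from an exponential distribution with mean 1."""
    sample = [random.expovariate(1.0) for _ in range(sample_size)]
    return statistics.mean(sample)

repeats = 2000
means_small = [sample_mean(2) for _ in range(repeats)]    # tiny samples
means_large = [sample_mean(50) for _ in range(repeats)]   # larger samples

print("skewness of means, n=2: ", round(skewness(means_small), 2))
print("skewness of means, n=50:", round(skewness(means_large), 2))
print("SD of means, n=50:      ", round(statistics.stdev(means_large), 3))
```

The SD of the sample means for n = 50 should also be close to 1 / sqrt(50), roughly 0.14, which is the standard error of the mean for this distribution.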