2 (1) Statistics Flashcards
What do we use statistics for?
Describing data, applying normative data to clinical practice, looking for associations, seeing whether variables are similar or different and if this is down to chance.
Pros and cons of using statistics
Concise - can filter down info into numbers
Generalisation of findings to wider population
Numbers remove context and meaning
Still need qualitative data, as numbers can’t convey subtle differences
Types of variables
Two broad types: categorical and scale.
Within scale: ordinal versus interval/ratio.
- Discrete, nominal or categorical variables
- Ordinal variables
- Continuous or scale variables
Discrete/nominal/categorical
Classify data into categories (e.g. gender)
Example: Y or N.
Ordinal variables
Order matters but not the actual differences between the numbers
Example: self-rating scales
Continuous/scale variables
Values lie along a scale. Order matters, and so do the differences in magnitude.
Example: age, income, grades
How would you describe scale variables?
Data distribution: Normal and skewed distribution.
When can we plot the distribution of the data on a histogram?
If the data is continuous and we have enough data points.
What is normal distribution?
After plotting frequencies on a histogram, we get a symmetrical, bell-shaped curve. This is known as a normal distribution. The largest portion of scores clusters in the middle.
The relevant values are the mean and standard deviation.
What’s skewed distribution?
Distribution (on histogram) is not symmetrical.
The relevant values are the median and range.
As a rule of thumb, the skewness value is above +1 or below -1.
Positive versus negative skew.
Positive skew: most people score in the lower range. Mean>median.
Negative skew: most people score in the high range.
Median>mean.
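A minimal Python sketch of the two skew directions, using made-up scores (the numbers are illustrative assumptions, not real data). It only checks which of the mean and median is larger in each case.

```python
from statistics import mean, median

# Hypothetical scores: most people score low, a few score very high.
positive_skew = [1, 2, 2, 3, 3, 3, 4, 10, 15]
# Hypothetical scores: most people score high, a few score very low.
negative_skew = [1, 6, 12, 13, 13, 13, 14, 14, 15]

pos_mean, pos_median = mean(positive_skew), median(positive_skew)
neg_mean, neg_median = mean(negative_skew), median(negative_skew)

# The few extreme scores pull the mean towards the long tail.
print("positive skew: mean > median?", pos_mean > pos_median)
print("negative skew: median > mean?", neg_median > neg_mean)
```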
Bimodal distribution
Two central clusters (peaks). With more than two, the distribution is multimodal.
What are the measures of central tendency?
- mean
- median
- mode
Mean
Average score, obtained by adding all the scores and dividing by the number of cases
Median
Results are put in numerical order and the middle value is found. It is less affected by extreme scores.
Mode
Most frequently occurring value
How can mean median and mode be affected?
The more normal the distribution, the closer the three measures of central tendency are.
Mean is sensitive to the range of data and outliers.
In normal distribution: mean is good descriptor of data.
Skewed distribution: median better descriptor
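A small Python sketch of how one outlier affects the three measures. The scores are hypothetical, chosen only so the effect is easy to see.

```python
from statistics import mean, median, mode

scores = [4, 5, 5, 6, 7]       # hypothetical, roughly symmetrical scores
with_outlier = scores + [40]   # the same scores plus one extreme value

central = (mean(scores), median(scores), mode(scores))
central_outlier = (mean(with_outlier), median(with_outlier), mode(with_outlier))

# The mean moves a lot; the median and mode barely change.
print("without outlier (mean, median, mode):", central)
print("with outlier    (mean, median, mode):", central_outlier)
```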
Range and outlier
Range: the difference between the highest and lowest scores; often reported as the minimum and maximum
Outlier: value in data that is markedly different to the others
Aside from using mean, median, mode, what’s another way of looking at the spread of data?
Calculate the difference between each score and mean.
Variance: the average of the squared differences from the mean
Standard deviation: measure of how spread out the data is = the square root of the variance.
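The steps above can be sketched in Python with a made-up set of scores. This version divides by the number of cases, matching the "average of the squared differences" wording. Many packages divide by n - 1 instead when working from a sample, so treat the choice of divisor as an assumption.

```python
from math import sqrt
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

m = mean(scores)                            # mean of the scores
differences = [x - m for x in scores]       # each score minus the mean
variance = sum(d ** 2 for d in differences) / len(scores)
sd = sqrt(variance)                         # SD is the square root of the variance

print("mean:", m, "variance:", variance, "SD:", sd)
# Cross-check against the standard library's population SD.
print("pstdev:", pstdev(scores))
```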
What’s standard deviation and how is it reported?
It’s a measure of how spread out the data is.
Always reported with the mean, as mean (SD).
When the mean +/- SD ranges of two groups overlap, we are less confident that the groups really differ.
SDs and percentages
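In a normal distribution:
Within 1 SD of the mean ➡️ about 68% (34% above and 34% below the mean)
Between 1 and 2 SD ➡️ about 14% on each side (about 95% within 2 SD)
Between 2 and 3 SD ➡️ about 2% on each side (about 99.7% within 3 SD)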
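These percentages can be checked against the standard normal curve. A minimal sketch using Python's `statistics.NormalDist`; the variable names are just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)        # both sides of the mean
between_1_and_2 = z.cdf(2) - z.cdf(1)    # one side only
between_2_and_3 = z.cdf(3) - z.cdf(2)    # one side only

print(f"within 1 SD:       {within_1sd:.1%}")
print(f"1 to 2 SD (1 side): {between_1_and_2:.1%}")
print(f"2 to 3 SD (1 side): {between_2_and_3:.1%}")
```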
What’s interquartile range?
It covers only the middle 50% of the data, so it is less affected by extreme scores.
The data is split into 4 quartiles. The IQR runs from the first quartile (Q1) to the third (Q3), and the median lies inside it.
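A short Python sketch of the IQR on hypothetical scores. Note that software packages use slightly different conventions for where the quartile cut points fall; this uses the default method of `statistics.quantiles`, so other tools may give slightly different values.

```python
from statistics import median, quantiles

scores = [1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 30]   # hypothetical; 30 is an outlier

q1, q2, q3 = quantiles(scores, n=4)   # the three cut points between the 4 quartiles
iqr = q3 - q1                         # spread of the middle 50% of the data

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
print("full range:", max(scores) - min(scores))  # inflated by the outlier
```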
What’s standard error?
It’s a measure of how accurate an estimate of the population mean our sample mean is.
- 95% confidence interval ➡️ sample mean +/- 1.96 SE; we can be 95% confident that the population mean falls in this range
The smaller the SE, the better the sample mean estimates the population mean.
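A minimal sketch of the standard error and an approximate 95% confidence interval, assuming the usual formula SE = SD / sqrt(n) with the sample SD. The data is made up, and the 1.96 multiplier assumes a normal approximation; with a sample this small a t-value would normally be used instead.

```python
from math import sqrt
from statistics import mean, stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

n = len(sample)
sample_mean = mean(sample)
se = stdev(sample) / sqrt(n)         # sample SD divided by the square root of n

# Approximate 95% confidence interval for the population mean.
ci_low = sample_mean - 1.96 * se
ci_high = sample_mean + 1.96 * se

print(f"mean = {sample_mean}, SE = {se:.3f}")
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```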
How would we describe/report categorical data?
Reporting the frequency of cases of each category; reporting the percentage frequency of cases in each category.
Plot discrete data on pie or bar charts.
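A small Python sketch of reporting categorical data as frequencies and percentages. The Y/N responses are hypothetical.

```python
from collections import Counter

responses = ["Y", "N", "Y", "Y", "N", "Y", "Y", "N", "Y", "Y"]  # hypothetical answers

counts = Counter(responses)          # frequency of cases in each category
total = sum(counts.values())
percentages = {category: 100 * count / total for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category}: {count} cases ({percentages[category]:.0f}%)")
```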