descriptive statistics Flashcards
categoric variable
each individual falls into one of several categories
binary: 2 categories e.g. yes/no
ordinal: >2 categories with a natural ordering e.g. low/medium/high
nominal: >2 categories with no natural ordering e.g. hair colour
numerical variables
variables measured on a numeric scale
discrete: takes only distinct, countable values e.g. age in whole years
continuous: any value within a particular range e.g. blood pressure
descriptive statistics (categorical)
- Probability/proportion = number with the outcome / total number (scale 0 to 1)
- Percentage = proportion * 100 (scale 0 to 100)
- Rate = number of times something happens per chosen unit, e.g. x per 100 people (scale 0 to infinity)
- Odds = number with the outcome / number without the outcome (scale 0 to infinity)
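A minimal sketch of the four measures above, using made-up counts (20 people with the outcome out of 100). The function name and the `per` argument are illustrative assumptions, not part of the cards.

```python
def describe_binary_outcome(with_outcome, total, per=1000):
    """Summarise a yes/no outcome from simple counts (hypothetical helper)."""
    without_outcome = total - with_outcome
    proportion = with_outcome / total          # scale 0 to 1
    percentage = proportion * 100              # scale 0 to 100
    rate = with_outcome * per / total          # events per `per` people
    odds = with_outcome / without_outcome      # scale 0 to infinity
    return {
        "proportion": proportion,
        "percentage": percentage,
        "rate": rate,
        "odds": odds,
    }


# Made-up example: 20 of 100 people have the outcome.
summary = describe_binary_outcome(with_outcome=20, total=100, per=1000)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that the proportion (20/100) and the odds (20/80) differ even though they come from the same counts, because the odds divide by the number without the outcome.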
quantifying differences
- It is not sufficient to say ‘one looks more effective’; we want to quantify the difference
risk ratio (RR)
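- Divide the probability (or percentage) in one group by that in the other
- Whichever group goes on top (the numerator) is the focus of the comparison
- Dividing one number by another gives 3 possible outcomes: >1 (higher in the focus group), =1 (no difference), <1 (lower in the focus group)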
odds ratio (OR)
- Divide the odds in one group by the odds in the other
- Same rules apply as for RR
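A short sketch of both ratios, assuming made-up counts for a treatment group (the focus, on top) and a control group. All function names here are hypothetical.

```python
def risk(events, total):
    """Probability of the outcome in one group."""
    return events / total


def odds(events, total):
    """Odds of the outcome: number with / number without."""
    return events / (total - events)


def interpret(ratio):
    """Read a ratio using the >1, =1, <1 rule."""
    if ratio > 1:
        return "higher in the focus group"
    if ratio < 1:
        return "lower in the focus group"
    return "no difference between groups"


# Made-up counts: 10/100 events on treatment, 20/100 on control.
rr = risk(10, 100) / risk(20, 100)   # treatment on top, so it is the focus
or_ = odds(10, 100) / odds(20, 100)

print(f"RR = {rr:.2f}: {interpret(rr)}")
print(f"OR = {or_:.2f}: {interpret(or_)}")
```

With these counts both ratios fall below 1, but they are not equal: the OR sits further from 1 than the RR.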
odds
Odds = probability / (1 - probability)
standard deviation (SD)
- Calculate the difference between each value and the mean
- Square those differences to make them all positive
- Add all the squared differences together
- Divide by the number of values (or by the number of values minus 1 when estimating from a sample)
- Then take the square root of the result
- Uses all values, so more powerful, but is affected by extreme values, so less suitable if the data are skewed
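The steps above can be sketched as follows, with made-up values. The `sample` flag is an assumption added here to show the n versus n - 1 divisor; the result is checked against Python's built-in `statistics` module.

```python
import math
import statistics


def standard_deviation(values, sample=False):
    """Follow the card's steps; `sample=True` divides by n - 1 instead of n."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]  # difference, then square
    total = sum(squared_diffs)                         # add them together
    divisor = n - 1 if sample else n                   # divide
    return math.sqrt(total / divisor)                  # square root


# Made-up data with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]
population_sd = standard_deviation(values)
sample_sd = standard_deviation(values, sample=True)

print(population_sd, statistics.pstdev(values))
print(sample_sd, statistics.stdev(values))
```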
measures of spread
- Range
- Inter-quartile range (IQR): covers the middle 50% of the data, from the 25th centile to the 75th centile; more representative than the range and reported with the median
- Standard deviation: roughly the average distance from the mean for an individual picked at random; measures how spread out the values are and is used for comparison
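A small sketch comparing the range and the IQR on made-up data with one extreme value. It uses `statistics.quantiles` with its default method; other software may use a slightly different centile rule, so exact quartile values can differ.

```python
import statistics

# Made-up data: nine ordinary values and one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

value_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range = {value_range}")
print(f"median = {median}, IQR = {iqr} (Q1 = {q1}, Q3 = {q3})")
```

The single extreme value stretches the range, while the IQR only reflects the middle 50% of the data.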
symmetric distribution
mean and SD
non-symmetric distribution
median and IQR
normal distribution
- Most people have values around the middle (the mean), with some at the extremes, roughly equally on each side
- When we know the mean and SD, we can work out the range within which a given % of the sample lies
- Can use properties of the normal distribution to create ‘normal ranges’, where 95% of the data lie
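A sketch of a 95% normal range, assuming the usual mean ± 1.96 × SD rule for normally distributed data. The mean and SD below are made-up numbers, and `normal_range` is a hypothetical helper.

```python
from statistics import NormalDist


def normal_range(mean, sd, z=1.96):
    """Return (lower, upper) limits; z = 1.96 covers about 95% of a normal distribution."""
    return mean - z * sd, mean + z * sd


# Made-up summary values, e.g. systolic blood pressure in mmHg.
mean_bp, sd_bp = 120.0, 10.0
lower, upper = normal_range(mean_bp, sd_bp)

# Check the share of a normal distribution that falls inside the limits.
dist = NormalDist(mean_bp, sd_bp)
coverage = dist.cdf(upper) - dist.cdf(lower)

print(f"normal range: {lower:.1f} to {upper:.1f}")
print(f"proportion inside: {coverage:.3f}")
```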
quantifying differences
- One numeric and one categoric variable
- Two numeric variables
- Use the difference in means where possible
- If any of the groups is not normally distributed, use the difference in medians
- Then conclude whether the difference is big enough to be important
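A minimal sketch of the means-or-medians choice for one numeric and one categoric variable. The data are made up, and the `normally_distributed` flag is an assumption: deciding whether the groups look normal is left to the analyst, not done by the code.

```python
import statistics


def difference_between_groups(group_a, group_b, normally_distributed=True):
    """Difference in means if both groups look normal, otherwise difference in medians."""
    if normally_distributed:
        return statistics.mean(group_a) - statistics.mean(group_b)
    return statistics.median(group_a) - statistics.median(group_b)


# Made-up measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.0, 5.4]
group_b = [4.2, 4.8, 4.5, 4.4, 4.6]

diff_means = difference_between_groups(group_a, group_b)
diff_medians = difference_between_groups(group_a, group_b, normally_distributed=False)

print(f"difference in means:   {diff_means:.2f}")
print(f"difference in medians: {diff_medians:.2f}")
```

Whether a difference of this size matters is a judgement about the context, not something the calculation answers.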
comparing 2 numeric variables
- Pearson’s correlation coefficient (denoted r)
- r must be between -1 and +1
- +1 = perfect positive linear association
- -1 = perfect negative linear association
- 0 = no linear relationship at all
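A sketch of Pearson's r computed from its standard formula (sum of the products of deviations, divided by the square root of the product of the summed squared deviations). The three made-up datasets are chosen to hit the +1, -1 and 0 cases; `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [3, 5, 7, 9, 11])   # y rises in a straight line with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in a straight line with x
r_none = pearson_r(x, [2, 1, 0, 1, 2])        # V-shape: related, but not linearly

print(r_positive, r_negative, r_none)
```

The V-shaped example shows why 0 means no linear relationship rather than no relationship: y clearly depends on x there, yet r is 0.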