Describing data Flashcards
Name 2 types of qualitative (Qual) data
Nominal and Ordered categorical (ordinal)
What is Ordered categorical (ordinal)
Qual type of data
can be put into more than 2 categories
mutually exclusive and ordered
eg social class or grade of BC
what is Nominal
Qual type of data
displayed usually in a pie or bar chart
mutually exclusive and unordered
eg blood group or gender
name 2 types of quantitative (Quan) data
Measured (Continuous) and Discrete
what is Discrete data type
Quan data type
can only take certain whole number values
eg number of children in a family
What is Measured (Continuous) data
Quan data type
can take any value within a given range
limited only by accuracy of instrument
eg weight in kg, age, height
what does Binary data fall under
Nominal data
what can numerical data be plotted as
dot plot, stem and leaf diagram, histogram, Box and whisker plot
What is mean and how do you calculate it
the average of the sample
To get the mean you add all the sample values then divide by the size of the sample
what is the median and how do you calculate it
Middle value of the ordered sample
Order the sample values from smallest to largest and take the middle one; with an even number of values, take the mean of the two middle values
What is the mode and how do you calculate it
A third measure of location is the mode which is simply the most common value observed
eg in a sample of 10 values where 2 appears 6 times, the mode is 2
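A quick sketch of the three measures of location, using Python's standard `statistics` module. The sample is made up for illustration (10 values, with 2 appearing 6 times); it is not from any real data set.

```python
import statistics

# Hypothetical sample of 10 values in which 2 appears 6 times.
sample = [1, 2, 2, 2, 2, 2, 2, 3, 5, 9]

# Mean: add all the sample values, then divide by the sample size.
mean = sum(sample) / len(sample)

# Median: middle value of the ordered sample (the mean of the two
# middle values when the sample size is even, as it is here).
median = statistics.median(sample)

# Mode: the most common value observed.
mode = statistics.mode(sample)

print(mean, median, mode)
```

Here the mean (3.0) sits above the median and mode (both 2) because the few large values pull it upwards.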
what are the pros and cons of mean/median/mode
Median robust to outliers.
Median/mode reflects what ‘most’ people experience.
Mean uses all the data (more ‘efficient’).
Mean is ‘expected’ value.
Mean more common with statistical tests.
Mode useful for grouped or categorical data
Is the Median robust to outliers.
Yes
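A small illustration of why the median is called robust, assuming a made-up sample in which one value is replaced by an extreme one:

```python
import statistics

# Hypothetical sample, then the same sample with one extreme value.
values = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 70]

mean_before = statistics.mean(values)
median_before = statistics.median(values)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is dragged towards the outlier; the median does not move.
print(mean_before, mean_after)
print(median_before, median_after)
```

The mean jumps from 5 to 17.6 while the median stays at 5.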
which measure is most commonly used with statistical tests
Mean
which measure uses all the data
mean
if the data is skewed what measure do you use
median
if the data is symmetrical what measure do you use
Mean
what are the 3 approaches to quantifying the variability
- Range
- Inter quartile Range
- Standard deviation
what is the range
Simplest way to describe the spread of a data set is to quote the minimum (lowest) and maximum (highest) value
Difference between the smallest value and the largest
Affected by extreme values at each end of the data
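As a rough sketch, the range can be computed from the minimum and maximum, and a single extreme value is enough to change it a lot. The numbers below are invented for illustration.

```python
# Hypothetical sample.
sample = [4, 8, 15, 16, 23, 42]

lowest = min(sample)
highest = max(sample)
data_range = highest - lowest  # difference between largest and smallest

# Adding one extreme value changes the range dramatically.
with_extreme = sample + [400]
extreme_range = max(with_extreme) - min(with_extreme)

print(lowest, highest, data_range)
print(extreme_range)
```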
what is the Inter quartile range
Split the data set into four equal parts (quartiles) using three cut-points:
- Lower quartile (25th centile)
- Median (50th centile)
- Upper quartile (75th centile)
Inter quartile range (IQR) tells you where the middle 50% of your data lies
IQR = upper quartile - lower quartile
Graphical way of summarising data using percentiles is the box & whisker plot.
Basically tells you where the median is; useful for looking at deprivation, eg comparing the upper and lower quartiles
How do you calculate the IQR
Take the lower quartile away from the upper quartile
When a quartile lies between 2 observations the easiest option is to take the mean of those 2 observations
eg median 6 with quartiles 2 either side of it (4 and 8): IQR = 8 - 4 = 4
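One way to sketch the quartile calculation is the "median of each half" approach, taking the mean when a cut-point falls between 2 observations. The helper name `quartiles` and the sample are hypothetical, and other quartile conventions exist (for example, `statistics.quantiles` interpolates and can give slightly different answers).

```python
import statistics

def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Uses the 'median of each half' convention: the ordered sample is
    split in two (dropping the middle value when the size is odd) and
    the median of each half is taken as the quartile.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )

# Hypothetical sample with median 6 and quartiles 4 and 8.
sample = [2, 4, 4, 6, 6, 8, 8, 10]
q1, med, q3 = quartiles(sample)
iqr = q3 - q1  # IQR = upper quartile - lower quartile

print(q1, med, q3, iqr)
```

The middle 50% of this sample lies between 4 and 8, so the IQR is 4.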
what is Variance
Based on the idea of averaging the distance each value is away from the mean
Basically, you work out each value's difference from the mean, square the differences, add them up, then divide that total by the sample size minus 1 (n - 1)
The variance is not a suitable measure for describing variability because it is not in the same units as the raw data (it is in squared units)
Is the Variance suitable measure for describing variability
No. The variance is not a suitable measure for describing variability because it is not in the same units as the raw data (it is in squared units)
what is a suitable measure for describing variability
Standard deviation
How do you calculate SD
square root the variance
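The variance and SD steps can be sketched by hand and checked against Python's `statistics` module, which also divides by the sample size minus 1. The sample is made up for illustration.

```python
import math
import statistics

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

mean = sum(sample) / n

# Each value's difference from the mean, squared.
squared_diffs = [(x - mean) ** 2 for x in sample]

# Variance: sum of squared differences divided by (n - 1).
variance = sum(squared_diffs) / (n - 1)

# SD: square root of the variance, back in the units of the raw data.
sd = math.sqrt(variance)

print(variance, sd)
print(statistics.variance(sample), statistics.stdev(sample))
```

Both print lines should agree, since `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.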
SD vs IQR
SD: vulnerable to ‘outliers’
SD: not useful for skewed data
IQR: robust
IQR: does not use all the data
why use the mean and SD
For many variables in health sciences (those that are roughly normally distributed) the mean ± 1 SD covers about 68% of the distribution.
The mean ± 2 SDs covers about 95% of the distribution.
The mean ± 2 SDs is called the ‘normal reference range’.
What range covers about 95% of the distribution
the mean ± 2 SD
(about 68% is covered by the mean ± 1 SD)
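The 68% and 95% figures can be checked for an exactly normal distribution with `statistics.NormalDist`. This is only a sketch of the rule of thumb; real variables are rarely exactly normal.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
z = NormalDist()

# Proportion of the distribution within 1 and 2 SDs of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

# Exactly 95% corresponds to about 1.96 SDs; 2 is the usual rounding.
within_196_sd = z.cdf(1.96) - z.cdf(-1.96)

print(round(within_1_sd, 4), round(within_2_sd, 4), round(within_196_sd, 4))
```

So ± 2 SDs actually covers slightly more than 95% (about 95.4%), which is why the figures are quoted as approximate.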
in a normal distribution do the mean and median coincide
Yes
what summary measure is appropriate for symmetrical data
If symmetrical use the mean and standard deviation
Remember this data is bell shaped
what summary measure is appropriate for Skewed data
If skewed the median and inter quartile range is more appropriate
Remember skewed data has a long tail on one side