Lecture 3-Summarising data Flashcards
what is a variable?
a characteristic of a subject that can take any one of a set of values
what is a qualitative variable?
(categorical)
falls into a specific category
e.g. Sex, hair colour, ethnic group
what is a quantitative variable?
Continuous or discrete
- Continuous: variables that can take any value within a range –> height / weight
- Discrete: variables that can only take integer (whole-number) values –> number of children / number of pre-existing diseases
what are the three scales of measurement?
- interval scale –> height / BMI / BP
- nominal categorical –> sex / hair colour
- ordinal categorical –> QOL (quality of life) / hospital performance ranking
what does the frequency distribution show?
shows how often the different values of a variable occur in the dataset; usually presented as a table or graph
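A minimal sketch of a frequency table for a nominal variable, assuming hypothetical hair-colour data for 10 subjects (the data are invented for illustration). collections.Counter tallies how often each value occurs.

```python
from collections import Counter

# Hypothetical example data: hair colour recorded for 10 subjects
hair = ["brown", "black", "brown", "blonde", "brown",
        "black", "red", "brown", "blonde", "black"]

# Counter tallies how often each value occurs: a frequency distribution
freq = Counter(hair)

# Print the frequency table, most common value first
for value, count in freq.most_common():
    print(f"{value:<7}{count}")
```

Here brown occurs 4 times, black 3, blonde 2 and red once, and the frequencies add up to the number of subjects.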
what is the frequency distribution graph for nominal / ordinal data?
Bar chart
- frequency on the y-axis and the categories on the x-axis
how can frequency distribution be shown on interval scale variables?
Histogram: the class intervals are marked on the horizontal axis and a rectangle is drawn over each interval, with its area (or its height, if all intervals are the same width) proportional to the frequency in that interval
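A minimal sketch of how values are grouped into class intervals before drawing a histogram, assuming hypothetical heights in cm and equal-width intervals of 10 cm starting at 150 cm. The helper name class_frequencies is made up for this example. Because the widths are equal, the bar heights (rows of #) can be taken as proportional to frequency.

```python
# Hypothetical heights in cm
heights = [152, 158, 161, 163, 165, 167, 168, 170, 172, 175, 178, 181, 186]

def class_frequencies(values, start, width, n_classes):
    """Hypothetical helper: count values in each class interval [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)  # index of the interval v falls into
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

counts = class_frequencies(heights, start=150, width=10, n_classes=4)

# Text histogram: one row per class interval, bar length = frequency
for i, c in enumerate(counts):
    lower = 150 + i * 10
    print(f"{lower}-<{lower + 10}: {'#' * c}")
```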
what is relative frequency?
frequency expressed as a percentage (or proportion) of the total frequency
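A short sketch of relative frequency, reusing the same hypothetical hair-colour data: each count is divided by the total and multiplied by 100.

```python
from collections import Counter

# Hypothetical example data: hair colour recorded for 10 subjects
hair = ["brown", "black", "brown", "blonde", "brown",
        "black", "red", "brown", "blonde", "black"]

freq = Counter(hair)
total = sum(freq.values())

# Relative frequency: each frequency as a percentage of the total
rel = {value: 100 * count / total for value, count in freq.items()}

for value, pct in rel.items():
    print(f"{value:<7}{pct:.1f}%")
```

The relative frequencies always add up to 100% (or to 1 if expressed as proportions).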
What is the mean?
a measure of the centre of the sample: the arithmetic average
- sum of all the values divided by the number of values
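A worked example of the mean, assuming five hypothetical values; the hand calculation is checked against Python's statistics.mean.

```python
import statistics

# Hypothetical sample of five values
values = [4, 8, 6, 5, 7]

# Mean = sum of the values / number of values = 30 / 5
mean = sum(values) / len(values)

print(mean)                      # 6.0
print(statistics.mean(values))   # same result from the standard library
```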
what is the median?
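Middle value when the data are arranged in order (the average of the two middle values if there is an even number of values)
- resistant measure of the data’s centre: it is little affected by outliers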
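A minimal sketch of the median, written out by hand (the function name median is just for this example) and compared with the mean on hypothetical data, where one value is replaced by an outlier. The median stays the same while the mean jumps, which is what "resistant" means here.

```python
import statistics

def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

# Hypothetical data, then the same data with the largest value replaced by an outlier
data = [3, 5, 6, 7, 9]
with_outlier = [3, 5, 6, 7, 90]

print(median(data), median(with_outlier))                    # 6 6
print(statistics.mean(data), statistics.mean(with_outlier))  # mean moves from 6 to 22.2
```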
what is a measure of dispersion?
a measure of how spread out the data are, e.g. how far the values lie from the mean (range, IQR, variance, standard deviation)
what is variance?
The variance is the mean of the squared deviations of the values from their mean; for a sample, the sum of the squared deviations is usually divided by n - 1 rather than n
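A minimal sketch of the sample variance on hypothetical data, assuming the usual n - 1 divisor (which is also what Python's statistics.variance uses; statistics.pvariance divides by n instead).

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    return sum(squared_devs) / (n - 1)

# Hypothetical sample: mean 6, deviations -2, 2, 0, -1, 1, squared deviations sum to 10
values = [4, 8, 6, 5, 7]

print(sample_variance(values))       # 10 / 4 = 2.5
print(statistics.variance(values))   # library version, also divides by n - 1
print(statistics.pvariance(values))  # divides by n: 10 / 5 = 2
```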
what is the standard deviation?
square root of the variance (so it is in the same units as the data)
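Continuing the same hypothetical sample, the standard deviation is the square root of the sample variance; statistics.stdev gives the same value directly.

```python
import math
import statistics

# Hypothetical sample (sample variance = 2.5)
values = [4, 8, 6, 5, 7]

variance = statistics.variance(values)  # sample variance, n - 1 divisor
sd = math.sqrt(variance)                # standard deviation = sqrt(variance)

print(round(sd, 4))                        # about 1.5811
print(round(statistics.stdev(values), 4))  # same value from the library
```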
what is the range?
Highest value - lowest value
strongly affected by outliers
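A quick illustration, on hypothetical data, of why the range is sensitive to outliers: adding a single extreme value changes it completely. The helper name data_range is made up for this example.

```python
def data_range(values):
    """Hypothetical helper: highest value minus lowest value."""
    return max(values) - min(values)

# Hypothetical data, then the same data with one outlier added
values = [12, 15, 14, 13, 16]
with_outlier = values + [60]

print(data_range(values))        # 16 - 12 = 4
print(data_range(with_outlier))  # 60 - 12 = 48
```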
what is the IQR?
interquartile range: upper quartile (Q3) - lower quartile (Q1)
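A minimal sketch of the IQR using statistics.quantiles on hypothetical data (11 values). Note that there are several conventions for calculating quartiles, so textbooks and other software may give slightly different Q1 and Q3; this uses the function's default ("exclusive") method.

```python
import statistics

# Hypothetical sample of 11 values
data = [2, 3, 5, 6, 7, 8, 9, 11, 12, 14, 20]

# n=4 splits the data into quarters and returns the three cut points Q1, median, Q3
q1, med, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 5.0 12.0 7.0
```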
what is a positive skew?
Long tail to the right
but the bulk of the data (the peak of the graph) is on the left
what is a negative skew?
Long tail to the left
the bulk of the data (the peak of the graph) is on the right
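A small illustration of positive skew, assuming hypothetical lengths of hospital stay (days): most stays are short and a few are long, giving a long right tail. In positively skewed data the few large values typically pull the mean above the median (a rule of thumb, not a guarantee).

```python
import statistics

# Hypothetical lengths of stay in days: most short, a few long (long right tail)
stay = [1, 1, 2, 2, 2, 3, 3, 4, 6, 14]

mean = statistics.mean(stay)      # 38 / 10 = 3.8
median = statistics.median(stay)  # average of 2 and 3 = 2.5

print(mean, median)
print("mean > median, consistent with positive skew" if mean > median else "no positive skew signal")
```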
When would you see a uniform symmetrical histogram?
rolling a fair die many times (each face is equally likely)
What is included in the five-number summary (shown in a boxplot)?
- Minimum
- Q1 (lower quartile)
- Median
- Q3 (upper quartile)
- Maximum
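A minimal sketch of the five-number summary on hypothetical data. The helper name five_number_summary is made up for this example, and the quartiles come from statistics.quantiles with its default method (other quartile conventions can give slightly different values).

```python
import statistics

def five_number_summary(values):
    """Hypothetical helper: (min, Q1, median, Q3, max)."""
    q1, med, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return min(values), q1, med, q3, max(values)

# Hypothetical sample of 11 values
data = [1, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

print(five_number_summary(data))  # (1, 4.0, 7.0, 10.0, 15)
```

These five values are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median.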
How to deal with outliers?
Check for obvious mistakes. Check when and by whom the data were recorded. How are the results affected? Can the results be analysed a different way? Do not simply discard outliers!
What do outliers do to the data?
they distort the mean and standard deviation
do 5 calculations with the outliers present
what summary measures are preferred when data are heavily skewed?
median and IQR (they are resistant to skew and outliers)
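A minimal comparison on hypothetical data of how one outlier affects the mean and SD but not the median and IQR. The helper name summaries is made up for this example; quartiles use statistics.quantiles with its default method.

```python
import statistics

def summaries(values):
    """Hypothetical helper: mean, SD, median and IQR of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Hypothetical data, then the same data with the largest value replaced by an outlier
clean = [10, 11, 12, 12, 13, 14, 15]
with_outlier = [10, 11, 12, 12, 13, 14, 60]

s1, s2 = summaries(clean), summaries(with_outlier)
for key in s1:
    print(f"{key:<7}{s1[key]:8.2f}{s2[key]:8.2f}")
```

The mean and SD change sharply, while the median (12) and IQR (3) are unchanged, which is why the median and IQR are preferred for heavily skewed data.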