Types of variables and presentation of data Flashcards
What are some routinely collected sources of data?
- mortality and census data
- hospital activity data
- primary care data
- infectious disease notifications
- regular national surveys (e.g. Health Survey for England)
What is a strength and a weakness of research study data?
+ Better quality
- More expensive and time consuming
What are the 3 types of categorical variables?
Ordinal (ordered categorical)
Nominal (unordered categorical)
Binary / Dichotomous
What is categorical data?
Data that fall into categories rather than numerical values, e.g. hair colour
What is ordinal data?
Has an underlying order
Categories can be ranked
e.g. highest level of education: GCSE, A level, Degree
What is nominal data?
No underlying order; categories cannot be ranked, e.g. blood group
What is binary (dichotomous) data?
Has two categories
e.g. Male / Female
Presence of disease - Yes / No
1 / 0
What are the 2 types of numerical variables?
Continuous
Discrete / count variable
What is continuous data?
Can be any number
e.g. height
e.g. 5.5
What is discrete (count) data?
Can only be whole numbers (integers)
Can categorical variables be created from numerical variables?
Yes - categorical variables can be created from numerical variables
Can numerical variables be created from categorical variables?
NO - numerical variables CANNOT be created from categorical variables
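As a minimal sketch of the first direction, the Python snippet below groups a numerical variable (age in years) into ordered categories. The cut-points and the names `age_group` and `ages` are illustrative assumptions, not taken from the cards. The original ages cannot be recovered from the group labels alone, which is why the conversion does not work in reverse.

```python
def age_group(age):
    """Map a numerical age (years) to an ordinal category.

    The cut-points are arbitrary and chosen only for illustration.
    """
    if age < 18:
        return "child"
    elif age < 65:
        return "adult"
    return "older adult"


ages = [4, 17, 18, 42, 64, 65, 90]       # numerical (discrete) variable
groups = [age_group(a) for a in ages]    # categorical (ordinal) variable
print(groups)
```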
Why is the type of variable important?
Variable type determines appropriate way to:
- display the data
- summarise the data (central tendency / variation)
- analyse the data using statistical testing
How should single variable data with one categorical variable be presented?
= Bar chart, Pie Chart or Frequency table
How should single variable data with one continuous variable be presented?
= Histogram (or a bar chart, once the values have been grouped into categories)
How should a pair of variables with categorical outcome and categorical exposure be presented?
= Contingency table
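A contingency table is simply a count of how many observations fall into each exposure/outcome combination. The sketch below builds one in Python from made-up records; the data, the category labels, and the variable names are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (exposure, outcome) for each person.
records = [
    ("smoker", "disease"),
    ("smoker", "disease"),
    ("smoker", "no disease"),
    ("non-smoker", "disease"),
    ("non-smoker", "no disease"),
    ("non-smoker", "no disease"),
    ("non-smoker", "no disease"),
]

counts = Counter(records)  # frequency of each (exposure, outcome) pair

exposures = ["smoker", "non-smoker"]
outcomes = ["disease", "no disease"]

# One row per exposure category, one column per outcome category.
print(f"{'':<12}" + "".join(f"{o:>12}" for o in outcomes))
for e in exposures:
    print(f"{e:<12}" + "".join(f"{counts[(e, o)]:>12}" for o in outcomes))
```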
How should a pair of variables with numerical outcome and categorical exposure be presented?
= Box and whisker plot
How should a pair of variables with numerical outcome and numerical exposure be presented?
= Scatter plot
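The three paired-variable cards above amount to a lookup on the types of the outcome and exposure. A small Python sketch of that lookup is shown below; the dictionary and the function name `suggest_presentation` are illustrative, and combinations the cards do not cover return `None` rather than a guess.

```python
# Lookup keyed by (outcome type, exposure type), following the cards above.
PRESENTATION = {
    ("categorical", "categorical"): "contingency table",
    ("numerical", "categorical"): "box and whisker plot",
    ("numerical", "numerical"): "scatter plot",
}


def suggest_presentation(outcome_type, exposure_type):
    """Return the presentation listed on the cards, or None if not covered."""
    return PRESENTATION.get((outcome_type, exposure_type))


print(suggest_presentation("numerical", "categorical"))
```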
With exposure and outcome which is the X and which is the Y variable?
X variable = Exposure
Y variable = Outcome
What factors relate to exposures?
- Explanatory variable
- Independent variable
- Risk factor
- Treatment group
- X variable
What factors relate to outcomes?
- Response variable
- Dependent variable
- Case / control group
- Disease group
- Y variable
What are the 3 main features of a bar chart?
- Heights of the bars are proportional to the frequencies
- Useful for comparing the frequencies of categories relative to one another
- Variables MUST be categorical
What are the 2 main features of pie charts?
- Areas of the sectors are proportional to the frequencies
- Useful for comparing the frequencies in each category with the whole group
What are the 2 main features of a histogram?
- Variable must be CONTINUOUS
- Relative frequencies are represented by the areas of the bars
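Because area, not height, carries the relative frequency, a bar's height has to be scaled by its width when the bins are unequal. The Python sketch below works this through for made-up grouped data; the bins and counts are hypothetical.

```python
# Hypothetical ages grouped into bins of unequal width: (lower, upper, count).
bins = [(0, 10, 20), (10, 20, 30), (20, 40, 40), (40, 80, 10)]

total = sum(count for _, _, count in bins)

heights = []
for lower, upper, count in bins:
    width = upper - lower
    relative_frequency = count / total
    # Height is chosen so that area = height * width = relative frequency.
    heights.append(relative_frequency / width)

# Recover each bar's area; together the areas sum to 1.
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, bins)]

print(heights)
print(sum(areas))
```

In this example the 20-40 bin holds twice as many observations as the 0-10 bin but is also twice as wide, so the two bars have the same height.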
How does a box and whisker plot work?
Minimum / maximum indicated by whiskers
Middle 50% contained within box
Median indicated by horizontal line inside box
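The values a box and whisker plot draws can be computed directly. The sketch below uses Python's standard `statistics` module on made-up weights; the data are hypothetical, and quartile conventions differ slightly between software packages, so other tools may give marginally different box edges. Some plotting software also draws the whiskers differently and marks extreme points separately, rather than extending to the minimum and maximum as described on the card.

```python
import statistics

# Hypothetical weights in kg (illustrative values only).
weights = [52, 55, 58, 60, 61, 63, 66, 70, 74, 79, 95]

# quantiles(n=4) returns the three cut-points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(weights, n=4)

summary = {
    "minimum": min(weights),   # end of the lower whisker
    "Q1": q1,                  # bottom edge of the box
    "median": median,          # line inside the box
    "Q3": q3,                  # top edge of the box
    "maximum": max(weights),   # end of the upper whisker
}
print(summary)
```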
What are the three types of distribution?
- Normal distribution
- Positively skewed (long tail to right)
- Negatively skewed (long tail to left)
What is the definition of the mean (average)?
= Sum of all values divided by number of observations
What is the definition of standard deviation?
= Measure of the spread of observations around the mean
SD = √[ (sum of squared deviations) / (number of observations - 1) ]
[Variance = SD^2]
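The calculation can be followed step by step in a short Python sketch. The observations are made up for illustration, and the result is checked against `statistics.stdev`, which also divides by the number of observations minus 1.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

n = len(values)
mean = sum(values) / n               # sum of values / number of observations

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in values]

variance = sum(squared_deviations) / (n - 1)
sd = math.sqrt(variance)             # SD is the square root of the variance

print(mean, variance, sd)
```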
What is the formula for squared deviation?
= (Original value - Mean value)^2
What is the definition of the median?
the middle value when values are arranged in order
What is the definition of the interquartile range?
The range from the first (25%) to the third (75%) quartiles of a distribution
What is the definition of the mode?
= the most frequently occurring value
- should not exist if the data are truly continuous
What is the definition of the range?
The difference between largest and smallest values in a distribution
- depends upon the extreme values, which may give an unrepresentative view of the whole set of values
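A quick way to see this sensitivity is to add a single extreme value and recompute. The Python sketch below uses made-up lengths of hospital stay and shows the median alongside the range for contrast; the data and the helper name `value_range` are hypothetical.

```python
import statistics

# Hypothetical lengths of hospital stay in days.
stays = [2, 3, 3, 4, 4, 5, 5, 6, 7]
stays_with_outlier = stays + [60]   # one extreme value added


def value_range(data):
    """Difference between the largest and smallest values."""
    return max(data) - min(data)


print(value_range(stays), statistics.median(stays))
print(value_range(stays_with_outlier), statistics.median(stays_with_outlier))
```

With the one extra value the range grows from 5 to 58 days, while the median only moves from 4 to 4.5.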
What does a 95% reference range indicate?
= mean ± 1.96 × SD
-> can be interpreted as the range of likely values for an individual in the population
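A minimal Python sketch of the calculation is given below. The measurements are made up, and the interpretation relies on the usual assumption that the values are approximately normally distributed.

```python
import statistics

# Hypothetical haemoglobin measurements (g/dL); illustrative values only.
values = [13.1, 13.8, 14.2, 14.5, 14.9, 15.0, 15.3, 15.7, 16.1, 16.4]

mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample SD (divides by n - 1)

# 95% reference range: mean minus and plus 1.96 standard deviations.
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd
print(f"95% reference range: {lower:.2f} to {upper:.2f}")
```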
What is variance?
= (Standard Deviation)^2