Lecture Notes Flashcards
what are the two types of categorical data?
nominal and ordinal
what is nominal data? give examples
categorical data with no natural order
e.g. blood group, sex
what is ordinal data? give examples
ordered categorical data.
e.g. pain severity, social class, grade of breast cancer
what are the two types of numerical data?
discrete and continuous
what is binary data?
a form of nominal categorical data, where there are only two categories
how might you display categorical data?
bar chart, pie chart
how might you display numerical data?
dot plot, stem and leaf, histogram, box and whisker
how do you calculate the mean of a data set?
total all the values, and divide by the number of values
how do you calculate the median of a data set?
order the values, median is the middle value.
if there is an even number of values, take the mean of the middle two values.
how do you calculate the mode of a data set?
the most common value observed
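a quick sketch of the three averages above, using Python's built-in statistics module. the values are made up purely for illustration.

```python
import statistics

# made-up data set (illustration only)
values = [2, 3, 3, 5, 7, 10]

# mean: total all the values (30) and divide by the number of values (6)
mean = statistics.mean(values)

# median: even number of values, so take the mean of the middle two (3 and 5)
median = statistics.median(values)

# mode: the most common value observed
mode = statistics.mode(values)

print(mean, median, mode)
```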
what’s the main advantage of using a median of a data set?
robust to outliers.
what’s the main advantage of using the mean of a data set?
uses all the data.
when would you use median vs mean?
symmetrical data = mean
skewed data = median
list the three main approaches to quantifying variability
range
interquartile range
standard deviation
what is the interquartile range of a data set, and how could you display it graphically?
the middle 50% of your data.
upper quartile - lower quartile.
box and whisker plot.
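a minimal sketch of the interquartile range calculation, on made-up values. note that textbooks and software use slightly different rules for locating quartiles; this assumes the default ("exclusive") method of Python's statistics.quantiles, so hand calculations may differ a little.

```python
import statistics

# made-up data set (illustration only)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(n=4) returns three cut points: lower quartile, median, upper quartile
lower_quartile, median, upper_quartile = statistics.quantiles(values, n=4)

# interquartile range = upper quartile - lower quartile
iqr = upper_quartile - lower_quartile

print(lower_quartile, upper_quartile, iqr)
```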
how do you calculate variance?
- draw a table
- calculate difference between observed value and mean for each value
- square each of these values
- calculate the total of the squared differences from mean
- divide this by n-1
(n= number of values)
how do you calculate standard deviation (SD)?
square root of variance
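the variance and SD steps above can be sketched as follows. the data are made up, and the last two lines just cross-check the hand calculation against Python's statistics module (which also divides by n-1).

```python
import math
import statistics

# made-up data set (illustration only)
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n

# difference between each observed value and the mean
differences = [x - mean for x in values]

# square each difference, then total them
squared_differences = [d ** 2 for d in differences]
total_squared = sum(squared_differences)

# divide by n-1 to get the variance, then square root for the SD
variance = total_squared / (n - 1)
sd = math.sqrt(variance)

# cross-check against the library versions
library_variance = statistics.variance(values)
library_sd = statistics.stdev(values)

print(round(variance, 3), round(sd, 3))
```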
how many decimal places should you use when calculating SD?
usually 2 or 3 more decimal places than the original data
what is the relationship between mean and SD in Normally distributed data?
mean ± 1 SD covers about 68% of data
mean ± 2 SD covers about 95% of data
how do you calculate the ‘normal reference range’ of an investigation?
mean ± 2 SDs
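a small sketch checking the 68% / 95% figures against a standard Normal distribution, then applying mean ± 2 SDs to get a reference range. the mean and SD passed to reference_range are hypothetical numbers, not real lab values, and reference_range is just an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# proportion of a Normal distribution within 1 and 2 SDs of the mean
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)


def reference_range(mean, sd):
    """Return (lower, upper) as mean - 2 SDs and mean + 2 SDs."""
    return (mean - 2 * sd, mean + 2 * sd)


# hypothetical investigation: mean 140, SD 3
low, high = reference_range(mean=140, sd=3)

print(round(within_1_sd, 3), round(within_2_sd, 3), low, high)
```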
what is the relationship between the mean and the median in Normally distributed data?
will be the same!
formula for risk
risk = no. events observed / number in the group
formula for risk difference
RD = risk (exposed group) - risk (unexposed group)
what’s the difference between risk difference and ABSOLUTE risk difference?
in ARD you ignore the sign - so it’s always expressed as a positive number, but it might represent an increase or decrease in risk
your 95% CI for a relative risk includes 1.00 - what does this mean?
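the data are compatible with NO difference between groups (RR = 1), so the result is not statistically significant at the 5% level. it doesn't prove the groups are the same.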
formula for number needed to treat (or harm!)
1/ARD
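a worked sketch of risk, risk difference, ARD and NNT. the counts are invented for illustration (10 events in 100 exposed people, 20 events in 100 unexposed people).

```python
def risk(events, group_size):
    """Risk = number of events observed / number in the group."""
    return events / group_size


# invented counts (illustration only)
risk_exposed = risk(events=10, group_size=100)
risk_unexposed = risk(events=20, group_size=100)

# risk difference keeps its sign: negative here means lower risk in the exposed group
risk_difference = risk_exposed - risk_unexposed

# absolute risk difference ignores the sign
absolute_risk_difference = abs(risk_difference)

# number needed to treat (or harm) = 1 / ARD
number_needed_to_treat = 1 / absolute_risk_difference

print(risk_difference, absolute_risk_difference, number_needed_to_treat)
```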
formula for odds
no. people with disease / no. people without
formula for odds ratio
odds (exposed) / odds (unexposed)
if an event is rare, what is the relationship between odds and risk ratios?
they’ll basically be the same - but for a common event, they can be really different
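a sketch comparing the odds ratio with the risk ratio for a rare and a common event. all counts are invented, and the function names are just illustrative helpers.

```python
def risk(events, group_size):
    """Number with the event / number in the group."""
    return events / group_size


def odds(events, group_size):
    """Number with the event / number without the event."""
    return events / (group_size - events)


def risk_ratio(exposed_events, exposed_n, unexposed_events, unexposed_n):
    return risk(exposed_events, exposed_n) / risk(unexposed_events, unexposed_n)


def odds_ratio(exposed_events, exposed_n, unexposed_events, unexposed_n):
    return odds(exposed_events, exposed_n) / odds(unexposed_events, unexposed_n)


# rare event: 2 in 1000 exposed vs 1 in 1000 unexposed (invented counts)
rare_rr = risk_ratio(2, 1000, 1, 1000)
rare_or = odds_ratio(2, 1000, 1, 1000)

# common event: 60 in 100 exposed vs 30 in 100 unexposed (invented counts)
common_rr = risk_ratio(60, 100, 30, 100)
common_or = odds_ratio(60, 100, 30, 100)

print(round(rare_rr, 3), round(rare_or, 3))      # nearly identical
print(round(common_rr, 3), round(common_or, 3))  # clearly different
```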
____ is a useful measure of spread when data is distributed symmetrically
standard deviation
if data is symmetrically distributed, what percentage of data lies within 2 SD of the mean?
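about 95% (strictly, this figure applies to Normally distributed data, not to every symmetrical distribution)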
what would you see on a histogram of positively skewed data?
peak of data is at the left, tail extends to the right.
the mean will be greater than the median.
how can you tell which direction data is skewed in from the mean and median?
mean = median : symmetrical
mean > median : positive skew
mean < median : negative skew
what would you see on a histogram of negatively skewed data?
peak of data is at the right, tail extends to the left.
mean will be less than the median.
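the mean vs median rule of thumb for skew can be sketched like this. skew_direction is a hypothetical helper and the data sets are made up; in real data the mean and median are rarely exactly equal, so treat "symmetrical" here as an idealised case.

```python
import statistics


def skew_direction(values):
    """Compare mean and median to guess the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive skew"
    if mean < median:
        return "negative skew"
    return "symmetrical"


# made-up data sets (illustration only)
right_tailed = [1, 2, 2, 3, 3, 4, 20]      # mean 5, median 3
left_tailed = [1, 17, 18, 18, 19, 19, 20]  # mean 16, median 18
balanced = [1, 2, 3, 4, 5]                 # mean 3, median 3

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
print(skew_direction(balanced))
```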
what summary measures would you use for positive/negative skewed data?
median and interquartile range
formula for the addition rule of probability
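P(A or B) = P(A) + P(B)
(only when A and B are mutually exclusive, i.e. they can't both happen. otherwise P(A or B) = P(A) + P(B) - P(A and B))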
formula for the multiplication rule of probability
P(A and B) = P(A) x P(B)
(only when A and B are independent)
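a small sketch of both rules using fair dice, assuming mutually exclusive events for the addition rule and independent events for the multiplication rule. the multiplication result is cross-checked by listing every outcome of two dice.

```python
from fractions import Fraction
from itertools import product

# addition rule: rolling a 1 OR a 2 on one die (mutually exclusive events)
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two

# multiplication rule: a 6 on the first die AND a 6 on the second (independent events)
p_six = Fraction(1, 6)
p_double_six = p_six * p_six

# cross-check by enumerating all 36 equally likely outcomes of two dice
outcomes = list(product(range(1, 7), repeat=2))
double_six_count = sum(1 for first, second in outcomes if first == 6 and second == 6)
p_double_six_enumerated = Fraction(double_six_count, len(outcomes))

print(p_one_or_two, p_double_six, p_double_six_enumerated)
```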
if an event has a probability of 0, what does this mean? what about 1?
0 = it can never happen
1 = it definitely happens
define standard error
a measure of the precision of a sample estimate, i.e. how far from the true population value the sample estimate is likely to be.
what type of distribution will a set of sample means take, given a large enough sample size?
Normal
formula for standard error of a mean
SD / sq root of n
formula for standard error of a proportion
square root of:
p(1-p) / n
(p = sample proportion)
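a minimal sketch of the two standard error formulas above. the SD, proportion and sample sizes are hypothetical numbers chosen to keep the arithmetic easy.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean: SD / square root of n."""
    return sd / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a proportion: square root of p(1-p)/n."""
    return math.sqrt(p * (1 - p) / n)


# hypothetical sample: SD 20, n = 64
se_of_mean = se_mean(sd=20, n=64)

# same SD with four times the sample size: the SE halves
se_of_mean_bigger_sample = se_mean(sd=20, n=256)

# hypothetical sample proportion 0.2 from n = 100
se_of_proportion = se_proportion(p=0.2, n=100)

print(se_of_mean, se_of_mean_bigger_sample, round(se_of_proportion, 4))
```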
how do you calculate the standard error of the difference between two sample means?
(SD of first sample)² / n for that sample
+
(SD of second sample)² / n for that sample
square root the answer
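a sketch of the standard error of a difference between two means, i.e. the square root of (SD1² / n1 + SD2² / n2). note that it is the squared SDs (the variances) that are divided by n. the sample values are hypothetical.

```python
import math


def se_difference_of_means(sd1, n1, sd2, n2):
    """Square root of (SD1 squared / n1 + SD2 squared / n2)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# hypothetical samples: SD 12 with n = 36, and SD 16 with n = 64
# 144/36 = 4 and 256/64 = 4, so the answer is the square root of 8
se_diff = se_difference_of_means(sd1=12, n1=36, sd2=16, n2=64)

print(round(se_diff, 3))
```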
what does a large standard error mean?
that your estimate of a population mean is imprecise
what does a small standard error mean?
that your estimate of a population mean is precise
if sample size increases, does standard error go up or down?
down - we get a more precise estimate
what is the general formula for calculating a 95% CI?
estimate ± (1.96 x SE)
e.g. for a mean: mean ± (1.96 x SE of the mean)
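a quick sketch of the 95% CI calculation for a mean. the sample mean, SD and n are hypothetical, and confidence_interval_95 is just an illustrative helper name. the 1.96 multiplier assumes a reasonably large sample.

```python
import math


def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) as estimate minus / plus 1.96 x SE."""
    margin = 1.96 * standard_error
    return (estimate - margin, estimate + margin)


# hypothetical sample: mean 125, SD 20, n = 64
sample_mean = 125
sample_sd = 20
n = 64

standard_error = sample_sd / math.sqrt(n)  # 20 / 8 = 2.5
lower, upper = confidence_interval_95(sample_mean, standard_error)

print(round(lower, 1), round(upper, 1))
```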
what is the technical definition for a 95% confidence interval?
if the study were to be repeated 100 times, of the 100 resulting 95% CIs, we would expect 95 of these to include the population mean
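the "repeat the study many times" idea can be illustrated with a small simulation. this is only a sketch: the population mean, SD, sample size and number of repeats are all assumed values, and the observed coverage will wobble around 95% from run to run rather than hit it exactly.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# assumed population and study design (illustration only)
POPULATION_MEAN = 125
POPULATION_SD = 20
SAMPLE_SIZE = 50
REPEATS = 1000

covered = 0
for _ in range(REPEATS):
    # "repeat the study": draw a fresh sample from the same population
    sample = [random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5

    # 95% CI for this repeat
    lower = mean - 1.96 * se
    upper = mean + 1.96 * se

    # does this interval include the true population mean?
    if lower <= POPULATION_MEAN <= upper:
        covered += 1

coverage = covered / REPEATS
print(coverage)  # expected to be close to 0.95
```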
what is the correct way to interpret a 95% CI of 120-130 mmHg, mean 125 mmHg?
we are 95% confident that the true population mean systolic BP lies between 120 and 130 mmHg, but the best estimate we have is 125 mmHg.