4. Data, Distribution, Populations, and Samples Flashcards
What is a categorical variable?
Individuals are classified into one of several categories
What are the different types of categorical variables?
Binary variables: Only two categories, e.g. female/not female, yes/no
Ordinal variables: >2 categories and they are ordered, e.g. pain (none/mild/moderate/severe)
Nominal variables: Neither binary nor ordinal
e.g. ethnicity (Caucasian, Asian, Black)
What is a binary variable?
Only two categories, e.g. female/not female, yes/no
What is an ordinal variable?
>2 categories and they are ordered
e.g. pain (none/mild/moderate/severe)
What is a nominal variable?
Neither binary nor ordinal
e.g. ethnicity (Caucasian, Asian, Black)
What is a numerical variable?
Measured on a number scale
What are the different types of numerical variables?
Discrete variables: Can take only distinct, separate values (usually whole numbers)
e.g. age in years, number of transfusions, parity
Continuous variables: Any value within a certain range
e.g. blood pressure, head circumference
What is a discrete variable?
Can take only distinct, separate values (usually whole numbers)
e.g. age in years, number of transfusions, parity
What is a continuous variable?
Any value within a certain range
e.g. blood pressure, head circumference
What is important about collecting data?
Collect each variable in its most informative form (e.g. exact age rather than an age band), because you usually cannot go back to the point of collection. A detailed variable can be converted into a simpler type later, but a simpler one cannot be converted back.
What are different ways of summarising categorical data?
Proportion = number in the category / total number
Percentage = proportion × 100
Rate = number with the event / number of people at risk (or total person-time at risk)
Odds = number in the category / number not in that category
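A minimal Python sketch of these summaries. The function names and the counts are hypothetical, chosen only for illustration, and the rate uses person-time as its denominator, which is one common convention.

```python
def summarise_category(n_in_category: int, total: int) -> dict:
    """Proportion, percentage and odds for one category of a categorical variable."""
    proportion = n_in_category / total
    percentage = proportion * 100
    # Odds compare those in the category with those NOT in it
    odds = n_in_category / (total - n_in_category)
    return {"proportion": proportion, "percentage": percentage, "odds": odds}


def rate(n_events: int, person_time: float) -> float:
    """Events per unit of person-time (e.g. per person-year of follow-up)."""
    return n_events / person_time


# Hypothetical example: 30 of 120 patients report severe pain
print(summarise_category(30, 120))
# Hypothetical example: 12 events over 400 person-years of follow-up
print(rate(12, 400))
```

With 30 of 120 patients in the category, the proportion is 0.25 (25%) while the odds are 30/90 ≈ 0.33, which shows that a proportion and odds are different quantities.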
What is the Mean?
Most widely known measure of centre
= sum of all measurements/total number of measurements
What is the median?
The value that falls halfway along the ordered data, i.e. the 50th centile of the frequency distribution
How do you find the median for an odd or even number of values?
Odd: the value exactly in the middle
Even: the average of the two middle values
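The odd/even rule can be sketched directly in Python. The helper names are illustrative, and the result is cross-checked against the standard library's `statistics.median`.

```python
import statistics


def mean(values):
    """Sum of all measurements divided by the number of measurements."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: exactly in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of the middle two


odd_data = [7, 3, 9, 5, 1]       # sorted: 1 3 5 7 9 -> median 5
even_data = [7, 3, 9, 5, 1, 11]  # sorted: 1 3 5 7 9 11 -> median (5 + 7) / 2
print(mean(odd_data), median(odd_data))    # 5.0 5
print(mean(even_data), median(even_data))  # 6.0 6.0

# Cross-check against the standard library
assert median(even_data) == statistics.median(even_data)
```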
When data is symmetric, how does this affect the mean and median?
They are equal for a perfectly symmetric distribution, and close to each other in a roughly symmetric sample.
What is the main weakness of the mean as a summary?
The mean is strongly influenced by extreme values: even a single outlier, or the long tail of a skewed distribution, pulls the mean away from the bulk of the data, so it no longer represents the data set accurately.
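A small illustration with made-up values: replacing one observation with an extreme value moves the mean a long way but leaves the median unchanged.

```python
import statistics

typical = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]  # the 7 replaced by one extreme value (illustrative)

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

The mean jumps from 5.4 to 18, while the median stays at 5.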
What is the aim when summarising numerical measurements?
Aim: to choose the summary that best represents the data
Median: representative of the centre of the data whether or not the distribution is symmetric
Whereas the mean is only representative of the centre if the distribution of the data is symmetric
Mean: every measurement is directly involved in its calculation, so it is very sensitive to changes in the data and heavily influenced by outlying measurements
Describe the range and its positives and negatives.
Range: the difference between the largest and the smallest values of the distribution
It ignores the bulk of the data
By definition, it depends on the two most extreme (and hence possibly ‘odd’) values
Describe the IQR and its + and -.
Inter-quartile range: the range within which the middle 50% of the sample values lie (from the 25th to the 75th centile)
More representative of the majority of the data
Does not depend on the most extreme values, unlike the range
More stable summary measure
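A sketch of both measures using Python's `statistics.quantiles` (Python 3.8+). The data are illustrative. Software packages use slightly different rules for calculating quartiles, so the IQR may differ a little between tools.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # illustrative values; 30 is an extreme value

data_range = max(data) - min(data)  # largest minus smallest
# Quartile cut points (Q1, median, Q3), using the default 'exclusive' method
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the values

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Here the single value 30 inflates the range to 28, while the IQR (8.5 − 4 = 4.5) reflects only the middle half of the data.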
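What is variance?
A measure of spread: roughly the average squared deviation of the observations from the mean.
How is variance (s²) calculated?
Sum of squared deviations from the mean / (n − 1), i.e. s² = Σ(x − x̄)² / (n − 1)
What is the standard deviation?
Roughly, the typical distance of the observations from the mean, expressed in the same units as the data.
How can the SD be calculated?
The square root of the variance.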
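A sketch of the sample variance and SD, checked against `statistics.variance` and `statistics.stdev`. The deviations are squared before they are summed because raw deviations from the mean always add up to zero. The data are illustrative.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation = square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5
print(sample_variance(data), sample_sd(data))

# The hand-written versions agree with the standard library
assert math.isclose(sample_variance(data), statistics.variance(data))
assert math.isclose(sample_sd(data), statistics.stdev(data))
```

For these values the squared deviations sum to 32, so s² = 32/7 ≈ 4.57 and the SD ≈ 2.14.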
Describe the SD and its + and -.
The standard deviation is more sensitive to changes in the data than the range and inter-quartile range
The standard deviation is a more powerful summary of the spread of the data, as it uses every value in the dataset
If the mean is NOT a meaningful summary of the centre of the data, then neither is the standard deviation, as it is based on distances from the mean
If data is symmetrically distributed, how should it be represented?
1. Mean
2. Variance or SD
How should skewed data be represented?
1. Median
2. IQR
What is the normal distribution?
A symmetrical, bell-shaped distribution that is completely defined by only two parameters: the mean and the standard deviation.
The mean is always at the centre of the distribution (at its peak)
The standard deviation describes how spread out the distribution is around the mean
In a normal distribution, within how many SDs of the mean do 95% of values lie?
95% of values lie within ±1.96 SD of the mean
In a normal distribution, within how many SDs of the mean do 68% of values lie?
68% of values lie within ±1 SD of the mean
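These percentages can be checked with the standard normal distribution (mean 0, SD 1) in Python's `statistics.NormalDist`, by taking the area between −k and +k SDs.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1_sd = z.cdf(1) - z.cdf(-1)
within_1_96_sd = z.cdf(1.96) - z.cdf(-1.96)
print(f"within ±1 SD: {within_1_sd:.3f}")        # 0.683
print(f"within ±1.96 SD: {within_1_96_sd:.3f}")  # 0.950

# Going the other way: the value that cuts off the top 2.5%
print(f"{z.inv_cdf(0.975):.2f}")  # 1.96
```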
How can one check whether data could be normally distributed?
Check whether the data could plausibly have come from a normal distribution by calculating the range that would contain 95% of values if the data were normal:
Mean − 1.96 SD
Mean + 1.96 SD
If the lower limit is infeasible (e.g. negative for a quantity that cannot be negative), the data are not normally distributed. A feasible range does not prove normality; it only fails to rule it out.
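A minimal sketch of this plausibility check. The function name, the `lower_bound` parameter (the smallest feasible value, assumed to be 0 here) and the length-of-stay data are all hypothetical. It is a rough screen, not a formal test of normality.

```python
import statistics


def normal_range_check(values, lower_bound=0.0):
    """Return (mean - 1.96 SD, mean + 1.96 SD, lower limit feasible?).

    `lower_bound` is the smallest possible value of the measurement
    (e.g. 0 for a length of stay); it is an assumption supplied by the user.
    """
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = m - 1.96 * sd, m + 1.96 * sd
    return lower, upper, lower >= lower_bound


# Hypothetical lengths of hospital stay in days (right-skewed)
stay = [1, 1, 2, 2, 2, 3, 3, 4, 5, 14]
print(normal_range_check(stay))
```

For these right-skewed stays the lower limit is about −3.8 days, which is impossible, so a normal distribution is not a sensible description of the data.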
What is the target population?
The population that is ideal for meeting the measurement objectives
What is the survey population?
The target population modified to take into account practical constraints
What is the standard error?
It measures how precisely the population mean is estimated by the sample mean.
How is the standard error calculated?
The standard error of the mean is calculated by dividing the standard deviation of the measurements by the square root of the sample size: SE = SD / √n.
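A short sketch of the calculation. The function names and the SD of 10 are hypothetical. Because n sits under a square root, quadrupling the sample size halves the standard error.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


def standard_error_from_summary(sd, n):
    """Same formula when only the SD and the sample size are reported."""
    return sd / math.sqrt(n)


for n in (25, 100):
    # SD of 10 is an assumed value; SE is 2.0 for n = 25 and 1.0 for n = 100
    print(n, standard_error_from_summary(10, n))

print(standard_error([2, 4, 4, 4, 5, 5, 7, 9]))  # illustrative raw data
```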
How do we know the SD of the population when we only have a sample?
We do not usually know the standard deviation of the population whose mean we are trying to estimate.
We use the sample standard deviation to approximate the population standard deviation.
This is OK if the sample is not small (> 20) and is approximately normally distributed
What is a confidence interval?
A range of values, calculated from the sample, defined so that there is a specified probability (e.g. 95%) that it contains the true value of a parameter.
The interval (sample mean ± 1.96 SE) will contain the population mean for 95% of random samples.
The ends of the interval are known as the confidence limits.
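The "95% of random samples" interpretation can be checked by simulation. This sketch assumes a hypothetical normal population (mean 120, SD 15), draws many samples of size 50, builds sample mean ± 1.96 SE for each, and counts how often the interval contains the true mean. The proportion should come out close to 0.95; it tends to sit slightly below it because the SD is estimated from each sample.

```python
import math
import random
import statistics


def ci95(values):
    """Approximate 95% CI for the mean: sample mean ± 1.96 × standard error."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - 1.96 * se, m + 1.96 * se


# Assumed (hypothetical) population and simulation settings
TRUE_MEAN, TRUE_SD, N, REPS = 120, 15, 50, 2000
rng = random.Random(1)  # fixed seed so the run is reproducible

hits = 0
for _ in range(REPS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    lower, upper = ci95(sample)
    if lower <= TRUE_MEAN <= upper:
        hits += 1

coverage = hits / REPS
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```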
What is the benefit of a CI?
They attach a level of precision to a sample estimate to help interpretation.