3. Medical Statistics intro Flashcards
what is a STATISTIC
a numerical summary of a SAMPLE
what is a PARAMETER
a numerical summary of the POPULATION
what is CATEGORICAL data
QUALITATIVE
what is NUMERICAL data
QUANTITATIVE
CATEGORICAL data can be split into:
NOMINAL and ORDINAL
what is NOMINAL vs ORDINAL data with examples
(categorical)
- Nominal: categories are mutually exclusive and UNORDERED
eg. sex, blood group, ethnicity, survival after 10 years
- Ordinal: categories are mutually exclusive and ORDERED
eg. disease stage, education level, heart murmur grade
NUMERICAL data can be split into…
DISCRETE and CONTINUOUS
what is DISCRETE vs CONTINUOUS data (numerical)
- Discrete: take only INTEGER VALUES (COUNT 0,1,2..)
eg. NUMBER OF pregnancies, number of asthma exacerbations
- Continuous: take ANY VALUE in a given interval
eg. weight, blood pressure, cholesterol levels, age
PROS and CONS of CONVERTING NUMERICAL to CATEGORICAL
(eg systolic bp (mmHg) -> hypertensive (>= 140), normotensive (< 140))
PROS:
- EASIER to DESCRIBE POPULATION by the % of people AFFECTED
- EASIER to make TREATMENT DECISIONS if population is GROUPED
CONS:
- LOSE INFORMATION
- how to DECIDE CUT OFF? what is abnormal?
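A minimal sketch of this conversion, using made-up systolic readings. The 140 mmHg cut-off and the choice to count a reading of exactly 140 as hypertensive are assumptions for illustration.

```python
# Hypothetical systolic blood pressure readings in mmHg (made-up values).
systolic_bp = [118, 152, 139, 140, 127, 165]

CUTOFF = 140  # assumed threshold; choosing it is itself a judgement call


def classify(bp, cutoff=CUTOFF):
    """Turn a numerical reading into one of two categories."""
    return "hypertensive" if bp >= cutoff else "normotensive"


categories = [classify(bp) for bp in systolic_bp]
percent_hypertensive = 100 * categories.count("hypertensive") / len(categories)

print(categories)
print(f"{percent_hypertensive:.0f}% hypertensive")
```

The percentage is easy to report, but 118 and 139 now carry the same label, which is the information loss listed under CONS.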
how to DESCRIBE the DISTRIBUTION of CATEGORICAL variables (what to look at)
- the category with the LARGEST FREQUENCY (MODAL CATEGORY)
- how FREQUENTLY each category was OBSERVED (%)
how to DESCRIBE the DISTRIBUTION of NUMERICAL variables (what to look at)
- SHAPE (do observations cluster in certain intervals?)
- CENTRE (where does a typical observation fall?)
- VARIABILITY (how tightly are the observations clustering around a centre)
DESCRIBING CATEGORICAL DATA:
PROPORTION vs PERCENTAGE (how to calculate)
PROPORTION : the NUMBER OF OBSERVATIONS in that category DIVIDED by the TOTAL NUMBER of OBSERVATIONS
PERCENTAGE = PROPORTION X 100
DESCRIBING CATEGORICAL DATA:
PROPORTIONS and PERCENTAGES are also called … and serve as a way to..
RELATIVE FREQUENCIES
serve as a way to SUMMARIZE the DISTRIBUTION of a CATEGORICAL variable NUMERICALLY
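As a rough illustration, the relative frequencies and the modal category of a categorical variable can be computed like this. The blood group values are invented for the example.

```python
from collections import Counter

# Hypothetical sample of blood groups (made-up values).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

counts = Counter(blood_groups)
total = len(blood_groups)

# Proportion = observations in the category / total observations.
proportions = {group: n / total for group, n in counts.items()}
# Percentage = proportion x 100 (computed from the counts to avoid rounding noise).
percentages = {group: 100 * n / total for group, n in counts.items()}

# The modal category is the one with the largest frequency.
modal_category = counts.most_common(1)[0][0]

print(percentages)
print("modal category:", modal_category)
```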
DESCRIBING CATEGORICAL DATA:
what is the ABSOLUTE CHANGE (and how to calculate)
describes the ACTUAL INCREASE or DECREASE from a REFERENCE VALUE to a NEW VALUE
ABSOLUTE CHANGE = NEW VALUE - REFERENCE VALUE
DESCRIBING CATEGORICAL DATA:
what is the RELATIVE CHANGE (and how to calculate)
describes the size of the ABSOLUTE CHANGE in COMPARISON to the REFERENCE VALUE
expressed as %
RELATIVE CHANGE = (NEW VALUE - REFERENCE VALUE) / REFERENCE VALUE X 100
DESCRIBING CATEGORICAL DATA:
Percentages are also commonly used to compare 2 numbers. there is REFERENCE VALUE and COMPARED VALUE (compared to reference)
how do you calculate ABSOLUTE DIFFERENCE
ABSOLUTE DIFFERENCE
= COMPARED VALUE - REFERENCE VALUE
DESCRIBING CATEGORICAL DATA:
Percentages are also commonly used to compare 2 numbers. there is REFERENCE VALUE and COMPARED VALUE (compared to reference)
how do you calculate RELATIVE DIFFERENCE (%)
RELATIVE DIFFERENCE
= (COMPARED VALUE - REFERENCE VALUE) / REFERENCE VALUE X 100
DESCRIBING CATEGORICAL DATA:
ABSOLUTE vs RELATIVE
ABSOLUTE = difference/change
RELATIVE = Percentage change
eg weight loss 200 kg -> 180 kg
absolute weight loss = 20 kg
relative weight loss = 10% (20/200 x 100)
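The weight example can be checked with a short sketch. The function names are illustrative only. A negative result means a decrease, so -20 kg and -10% correspond to a loss of 20 kg and 10%.

```python
def absolute_change(reference, new):
    """Actual increase or decrease: new value minus reference value."""
    return new - reference


def relative_change(reference, new):
    """Absolute change as a percentage of the reference value."""
    return 100 * (new - reference) / reference


print(absolute_change(200, 180))  # change in kg
print(relative_change(200, 180))  # change in %
```

The same two formulas give the absolute and relative difference when the new value is replaced by a compared value.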
DESCRIBING CATEGORICAL DATA:
if a value is 20% MORE than the reference value, it is ….% OF the reference value
120% OF the reference (100 + P)
DESCRIBING CATEGORICAL DATA:
if a value is 20% LESS than the reference value, it is …% OF the reference value
80% OF the reference (100 - P)
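A small sketch of the "P% more/less than" to "% of" conversion. The helper names and the reference value of 50 are assumptions for illustration. A negative P stands for "P% less".

```python
def percent_of_reference(p):
    """Convert 'P% more than' (or 'P% less than' for negative P) to '% of'."""
    return 100 + p


def value_from_percent(reference, p):
    """The actual value that is P% more (or less) than the reference."""
    return reference * (100 + p) / 100


print(percent_of_reference(20))    # 20% more
print(percent_of_reference(-20))   # 20% less
print(value_from_percent(50, 20))  # 20% more than a reference of 50
```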
DESCRIBING NUMERICAL DATA:
what type of graph visualises the DISTRIBUTION of a QUANTITATIVE variable
HISTOGRAM
DESCRIBING NUMERICAL DATA:
three questions to ask:
- does the distribution have a SINGLE MOUND / PEAK (MODE)
- what is the SHAPE of the distribution
- do the data CLUSTER together, or is there a GAP such that one or more observations noticeably differ from the rest
DESCRIBING NUMERICAL DATA:
what is UNIMODAL vs BIMODAL distribution
UNIMODAL: SINGLE MOUND/PEAK (mode)
BIMODAL: 2 DISTINCT MOUNDS (modes)
DESCRIBING NUMERICAL DATA:
SHAPE of the distribution can be:
SYMMETRIC: left half is mirror image of right half
SKEWED TO THE LEFT (NEGATIVELY SKEWED) : LONGER LEFT TAIL
SKEWED TO THE RIGHT (POSITIVELY SKEWED): LONGER RIGHT TAIL
DESCRIBING NUMERICAL DATA:
is LEFT SKEWED data positive or negative and give an example of a left skew
NEGATIVELY SKEWED
longer, skewed left tail
eg LIFE SPAN
relatively low deaths at young age, most deaths at older age
DESCRIBING NUMERICAL DATA:
is RIGHT SKEWED data positive or negative and give an example of a right skew
POSITIVELY SKEWED
longer, skewed right tail. starts high and slopes down
eg. INCOME
most observations at low income, relatively few are rich
DESCRIBING NUMERICAL DATA:
in a NORMAL DISTRIBUTION what is the 68-95-99.7 % RULE
- within 1 STANDARD DEVIATION of the MEAN (above/below): 68% of observations
- within 2 STANDARD DEVIATIONS of the MEAN: 95% of observations
- within 3 STANDARD DEVIATIONS of the MEAN: 99.7% of observations (ALL or NEARLY ALL)
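The rule can be checked numerically against a standard normal distribution. This sketch uses `NormalDist` from Python's standard library. The rounded figures in the rule come out at roughly 68.3%, 95.4% and 99.7%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```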
DESCRIBING NUMERICAL DATA:
NORMAL DISTRIBUTION
what % of observations are within 1 STANDARD DEVIATION
68%
DESCRIBING NUMERICAL DATA:
NORMAL DISTRIBUTION
what % of observations are within 2 STANDARD DEVIATIONS
95%
How to calculate MEAN
sum of all values / total number of values
MODE is most often used with which data type
CATEGORICAL DATA
the SHAPE of a distribution INFLUENCES whether the MEAN is LARGER or SMALLER than the MEDIAN
how is the MEAN in relation to the MEDIAN in a SYMMETRIC DISTRIBUTION
MEAN = MEDIAN
at the middle peak
the SHAPE of a distribution INFLUENCES whether the MEAN is LARGER or SMALLER than the MEDIAN
how is the MEAN in relation to the MEDIAN in a LEFT SKEWED DISTRIBUTION
MEAN is SMALLER than the MEDIAN (usually)
(median is closer to peak, mean is closer to long tail in unimodal)
the SHAPE of a distribution INFLUENCES whether the MEAN is LARGER or SMALLER than the MEDIAN
how is the MEAN in relation to the MEDIAN in a RIGHT SKEWED DISTRIBUTION
MEAN is LARGER than the MEDIAN (usually)
(median closer to peak, mean closer to long tail in unimodal)
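A quick check of the mean/median relationship on two small invented data sets, one with a long right tail and one with a long left tail.

```python
from statistics import mean, median

# Hypothetical annual incomes (thousands): most are low, one is very high.
right_skewed = [18, 20, 22, 25, 27, 30, 120]

# Hypothetical ages at death (years): most are high, one is very low.
left_skewed = [2, 70, 75, 78, 80, 82, 85]

for label, data in (("right skewed", right_skewed), ("left skewed", left_skewed)):
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data)}")
```

In both cases the mean is pulled towards the long tail while the median stays near the bulk of the data.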
for SKEWED DISTRIBUTIONS is mean or median PREFERRED
MEDIAN
because it better represents what is TYPICAL
is MEDIAN affected by OUTLIERS
RESISTANT to outliers
is MEAN affected by OUTLIERS
YES
NOT RESISTANT to outliers
is MODE affected by OUTLIERS
NO
outliers do NOT affect mode
what is affected severely by OUTLIERS
RANGE
so not very informative
MEAN and STANDARD DEVIATION are also sensitive to outliers
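To see the effect of a single outlier, this sketch compares the summaries of a small made-up data set before and after adding one extreme value.

```python
from statistics import mean, median, mode


def summarise(data):
    """Centre and spread summaries used on these cards."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "range": max(data) - min(data),
    }


without_outlier = [4, 5, 5, 6, 7]     # hypothetical observations
with_outlier = without_outlier + [40]  # same data plus one extreme value

before = summarise(without_outlier)
after = summarise(with_outlier)
print(before)
print(after)
```

The mode is unchanged and the median barely moves, while the mean roughly doubles and the range jumps from 3 to 36.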
STANDARD DEVIATION measures the..
SPREAD of data
STANDARD DEVIATION gives a measure of … by …
VARIATION
by summarising the SQUARED deviations of each observation from the mean, calculating an adjusted average of these (dividing by n - 1) and taking the SQUARE ROOT
s = SQUARE ROOT of [ SUM of (OBSERVATION - MEAN)^2 / (n - 1) ]
what is the VARIANCE of a set of values
SQUARE of STANDARD DEVIATION
variance = s ^2
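A step-by-step sketch of the calculation, assuming the usual sample formula in which the "adjusted average" divides by n - 1. The values are made up, and the result is compared with `statistics.stdev`, which uses the same convention.

```python
import math
from statistics import stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

n = len(values)
sample_mean = sum(values) / n

# Deviation of each observation from the mean, squared.
squared_deviations = [(x - sample_mean) ** 2 for x in values]

# "Adjusted average": divide by n - 1 rather than n (assumed sample formula).
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance.
s = math.sqrt(sample_variance)

print(f"variance = {sample_variance:.3f}, s = {s:.3f}")
```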
the LARGER the STANDARD DEVIATION the …
GREATER the VARIABILITY
when does S = 0
(standard deviation)
when all observations have the same value
otherwise s > 0
STANDARD DEVIATION and variance UNITS
standard deviation: same units as the original observations
variance: squared units
can OUTLIERS and SKEWS AFFECT STANDARD DEVIATION
NOT RESISTANT
strong skewness and outliers can greatly INCREASE S
the INTERQUARTILE RANGE IQR is ..
the DISTANCE between the THIRD QUARTILE and FIRST QUARTILE
IQR = Q3 - Q1
gives the spread of MIDDLE 50% of data
how do you calculate when an observation is a POTENTIAL OUTLIER
1.5 X IQR
potential outlier if MORE THAN 1.5 x IQR below Q1 or above Q3
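A minimal sketch of the 1.5 x IQR check on made-up data. Quartile conventions differ between textbooks and software. The "inclusive" method of `statistics.quantiles` is used here as an assumption, so Q1 and Q3 may differ slightly from a hand calculation.

```python
from statistics import quantiles

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]  # hypothetical observations

# Quartiles; the method choice is an assumption, other conventions exist.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Observations beyond these limits are flagged as potential outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

potential_outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print("potential outliers:", potential_outliers)
```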
PERCENTILES:
a pth percentile is a value such that..
p% of the observations fall at or below that value
eg. 90th percentile
90% of data falls below that percentile, 10% above
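The definition can be checked directly by counting. This sketch uses a hypothetical helper and the integers 1 to 100 as invented data.

```python
def percent_at_or_below(data, value):
    """Percentage of observations that are less than or equal to `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


scores = list(range(1, 101))  # hypothetical data: the integers 1 to 100

# 90 behaves as the 90th percentile of this data set.
print(percent_at_or_below(scores, 90))
```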
QUARTILES:
Q1, Q2, Q3 divide a set of data into … groups with …% of the values in each group
4 groups
25%
the 5 NUMBER SUMMARY is the basis of a BOX PLOT and consists of:
- MINIMUM VALUE
- Q1
- Q2 (MEDIAN)
- Q3
- MAXIMUM VALUE
potential outliers are marked separately, beyond the ends of the whiskers (the whiskers then stop at the most extreme values that are not potential outliers)
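A short sketch of the five-number summary for a made-up data set. The function name is illustrative, and the "inclusive" quartile method of `statistics.quantiles` is an assumed convention, so other software may report slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))


data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]  # hypothetical observations
summary = five_number_summary(data)
print(summary)
```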
what is a Z SCORE and how do you CALCULATE it
the NUMBER OF STANDARD DEVIATIONS that a given value is ABOVE/BELOW the MEAN
Z = (OBSERVATION - MEAN) / STANDARD DEVIATION
a POSITIVE and NEGATIVE Z SCORE indicates…
Positive: Observation is ABOVE the Mean
Negative: Observation is BELOW the mean
what does a Z SCORE of 2 say
that the data value is 2 STANDARD DEVIATIONS ABOVE the MEAN
(-2 means 2 s BELOW mean)
Z SCORES allows us to tell..
how UNUSUAL an observation is
LARGER Z SCORE in ABSOLUTE VALUE (positive or negative) = MORE UNUSUAL
(-1.3 is more unusual than 1.2)
an observation from a BELL-SHAPED distribution is a POTENTIAL OUTLIER if its Z SCORE is
BELOW -3 or ABOVE +3
(3 standard deviations out)
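A minimal sketch of the z score calculation and the +/-3 check. The mean of 100 and standard deviation of 15 are invented numbers, and the function names are illustrative.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd


def is_potential_outlier(z):
    """Bell-shaped data: flag z scores below -3 or above +3."""
    return abs(z) > 3


# Hypothetical distribution with mean 100 and standard deviation 15.
for observation in (130, 70, 150):
    z = z_score(observation, mean=100, sd=15)
    print(observation, round(z, 2), is_potential_outlier(z))
```

Comparing absolute values also shows why a z score of -1.3 counts as more unusual than 1.2.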