3. Medical Statistics intro Flashcards

1
Q

what is a STATISTIC

A

a numerical summary of a SAMPLE

2
Q

what is a PARAMETER

A

a numerical summary of the POPULATION

3
Q

what is CATEGORICAL data

A

QUALITATIVE: observations fall into categories (groups) rather than being numerical measurements

4
Q

what is NUMERICAL data

A

QUANTITATIVE: observations are numbers (counts or measurements)

5
Q

CATEGORICAL data can be split into:

A

NOMINAL and ORDINAL

6
Q

what is NOMINAL vs ORDINAL data with examples
(categorical)

A
  • Nominal: categories are mutually exclusive and UNORDERED

eg. sex, blood group, ethnicity, survival after 10 years

  • Ordinal: categories are mutually exclusive and ORDERED

eg. disease stage, education level, heart murmur grade

7
Q

NUMERICAL data can be split into…

A

DISCRETE and CONTINUOUS

8
Q

what is DISCRETE vs CONTINUOUS data (numerical)

A
  • Discrete: can take only INTEGER VALUES (COUNTS 0, 1, 2, ...)

eg. NUMBER OF pregnancies, number of asthma exacerbations

  • Continuous: take ANY VALUE in a given interval

eg. weight, blood pressure, cholesterol levels, age

9
Q

PROS and CONS of CONVERTING NUMERICAL to CATEGORICAL

(eg systolic BP (mmHg) -> hypertensive (≥140), normotensive (<140))

A

PROS:
- EASIER to DESCRIBE POPULATION by the % of people AFFECTED
- EASIER to make TREATMENT DECISIONS if population is GROUPED

CONS:
- LOSE INFORMATION (eg 141 and 200 mmHg both become "hypertensive")
- how to DECIDE the CUT-OFF? what counts as abnormal?
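A minimal Python sketch of the conversion in this card, assuming a cut-off of 140 mmHg with 140 itself counted as hypertensive. The readings and the helper name `classify_sbp` are hypothetical. It shows both sides of the card: the % affected is easy to report, but 141 and 200 mmHg land in the same group while 139 and 141 mmHg land in different ones.

```python
# Hypothetical systolic BP readings (mmHg), for illustration only
sbp = [118, 139, 141, 165, 200]

def classify_sbp(value, cutoff=140):
    """Return 'hypertensive' if value >= cutoff, else 'normotensive' (assumed cut-off)."""
    return "hypertensive" if value >= cutoff else "normotensive"

categories = [classify_sbp(v) for v in sbp]
print(categories)

# PRO: the % of people affected is easy to describe
pct_hypertensive = 100 * categories.count("hypertensive") / len(categories)
print(pct_hypertensive, "% hypertensive")

# CON: information is lost; 141 and 200 mmHg get the same label,
# while 139 and 141 mmHg (only 2 mmHg apart) get different labels
```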

10
Q

how to DESCRIBE the DISTRIBUTION of CATEGORICAL variables (what to look at)

A
  • the category with the LARGEST FREQUENCY (MODAL CATEGORY)
  • how FREQUENTLY each category was OBSERVED (%)
11
Q

how to DESCRIBE the DISTRIBUTION of NUMERICAL variables (what to look at)

A
  • SHAPE (do observations cluster in certain intervals?)
  • CENTRE (where does a typical observation fall?)
  • VARIABILITY (how tightly are the observations clustering around a centre)
12
Q

DESCRIBING CATEGORICAL DATA:

PROPORTION vs PERCENTAGE (how to calculate)

A

PROPORTION : the NUMBER OF OBSERVATIONS in that category DIVIDED by the TOTAL NUMBER of OBSERVATIONS

PERCENTAGE = PROPORTION X 100
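As an illustration, the Python sketch below builds a frequency table for a categorical variable with `collections.Counter` from the standard library. The blood-group data are invented for the example. Each category's proportion is its count divided by the total, the percentage is that proportion x 100, and `most_common(1)` picks out the modal category.

```python
from collections import Counter

# Hypothetical blood groups for 10 patients (illustrative data only)
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

counts = Counter(blood_groups)
n = len(blood_groups)

for group, count in counts.most_common():
    proportion = count / n          # observations in category / total observations
    percentage = proportion * 100   # percentage = proportion x 100
    print(f"{group}: n={count}, proportion={proportion:.2f}, {percentage:.0f}%")

# The category with the largest frequency
modal_category = counts.most_common(1)[0][0]
print("Modal category:", modal_category)
```

Here group O has 4 of the 10 observations (proportion 0.40, 40%) and is the modal category.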

13
Q

DESCRIBING CATEGORICAL DATA:

PROPORTIONS and PERCENTAGES are also called … and serve as a way to..

A

RELATIVE FREQUENCIES

serve as a way to SUMMARIZE the DISTRIBUTION of a CATEGORICAL variable NUMERICALLY

14
Q

DESCRIBING CATEGORICAL DATA:

what is the ABSOLUTE CHANGE (and how to calculate)

A

describes the ACTUAL INCREASE or DECREASE from a REFERENCE VALUE to a NEW VALUE

ABSOLUTE CHANGE = NEW VALUE - REFERENCE VALUE

15
Q

DESCRIBING CATEGORICAL DATA:

what is the RELATIVE CHANGE (and how to calculate)

A

describes the size of the ABSOLUTE CHANGE in COMPARISON to the REFERENCE VALUE
expressed as %

RELATIVE CHANGE = (NEW VALUE - REFERENCE VALUE) / REFERENCE VALUE X 100
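A minimal Python sketch of the absolute and relative change formulas, checked against the weight example used later in this deck (200 kg -> 180 kg). The function names are illustrative, not from any library. Multiplying by 100 before dividing keeps the arithmetic exact for these whole-number inputs.

```python
def absolute_change(reference, new):
    """ABSOLUTE CHANGE = NEW VALUE - REFERENCE VALUE."""
    return new - reference

def relative_change(reference, new):
    """RELATIVE CHANGE = (NEW VALUE - REFERENCE VALUE) / REFERENCE VALUE x 100, in %."""
    if reference == 0:
        raise ValueError("relative change is undefined for a reference value of 0")
    return (new - reference) * 100 / reference

# Weight example: 200 kg -> 180 kg
print(absolute_change(200, 180))  # -20   (a loss of 20 kg)
print(relative_change(200, 180))  # -10.0 (a 10% loss)
```

A negative result means a decrease. The same arithmetic applies to the absolute and relative DIFFERENCE cards, with the compared value in place of the new value.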

16
Q

DESCRIBING CATEGORICAL DATA:

Percentages are also commonly used to compare 2 numbers: a REFERENCE VALUE and a COMPARED VALUE (which is compared to the reference)

how do you calculate ABSOLUTE DIFFERENCE

A

ABSOLUTE DIFFERENCE
= COMPARED VALUE - REFERENCE VALUE

17
Q

DESCRIBING CATEGORICAL DATA:

Percentages are also commonly used to compare 2 numbers: a REFERENCE VALUE and a COMPARED VALUE (which is compared to the reference)

how do you calculate RELATIVE DIFFERENCE (%)

A

RELATIVE DIFFERENCE
= (COMPARED VALUE - REFERENCE VALUE) / REFERENCE VALUE X 100

18
Q

DESCRIBING CATEGORICAL DATA:

ABSOLUTE vs RELATIVE

A

ABSOLUTE = difference/change

RELATIVE = Percentage change

eg weight loss 200 kg —> 180 kg
absolute weight loss = 20 kg
relative weight loss = 10% (20/200 x 100)

19
Q

DESCRIBING CATEGORICAL DATA:

if a value is 20% MORE than the reference value, it is ….% OF the reference value

A

120% OF the reference (100 + P)

20
Q

DESCRIBING CATEGORICAL DATA:

if a value is 20% LESS than the reference value, it is …% OF the reference value

A

80% OF the reference (100 - P)
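These two cards follow one rule: a value that is P% more or less than a reference is (100 + P)% of it, with P negative for "less". A small Python sketch, with a hypothetical helper name and an arbitrary reference value of 50:

```python
def value_from_percent_difference(reference, p):
    """Value that is p% more (p > 0) or p% less (p < 0) than the reference.

    The result is (100 + p)% OF the reference value.
    """
    return reference * (100 + p) / 100

print(value_from_percent_difference(50, 20))   # 60.0 -> 120% of 50
print(value_from_percent_difference(50, -20))  # 40.0 -> 80% of 50
```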

21
Q

DESCRIBING NUMERICAL DATA:

what type of graph visualises the DISTRIBUTION of a QUANTITATIVE variable

A

HISTOGRAM

22
Q

DESCRIBING NUMERICAL DATA:

three questions to ask when describing the distribution:

A
  1. does the distribution have a SINGLE MOUND / PEAK (MODE)
  2. what is the SHAPE of the distribution
  3. do the data CLUSTER together, or is there a GAP such that one or more observations noticeably differ from the rest
23
Q

DESCRIBING NUMERICAL DATA:

what is UNIMODAL vs BIMODAL distribution

A

UNIMODAL: SINGLE MOUND/PEAK (mode)

BIMODAL: 2 DISTINCT MOUNDS (modes)

24
Q

DESCRIBING NUMERICAL DATA:

SHAPE of the distribution can be:

A

SYMMETRIC: left half is mirror image of right half

SKEWED TO THE LEFT (NEGATIVELY SKEWED) : LONGER LEFT TAIL

SKEWED TO THE RIGHT (POSITIVELY SKEWED): LONGER RIGHT TAIL

25
Q

DESCRIBING NUMERICAL DATA:

is LEFT SKEWED data positive or negative and give an example of a left skew

A

NEGATIVELY SKEWED

longer, skewed left tail

eg LIFE SPAN
relatively few deaths at a young age, most deaths at an older age

26
Q

DESCRIBING NUMERICAL DATA:

is RIGHT SKEWED data positive or negative and give an example of a right skew

A

POSITIVELY SKEWED

longer, skewed right tail: the peak is towards the left (low values) and the distribution slopes down to the right

eg. INCOME
most observations at low income, relatively few are rich

27
Q

DESCRIBING NUMERICAL DATA:

in a NORMAL DISTRIBUTION what is the 68-95-99.7 % RULE

A
  • within 1 STANDARD DEVIATION of the MEAN (above/below): about 68% of observations
  • within 2 STANDARD DEVIATIONS of the MEAN: about 95% of observations
  • within 3 STANDARD DEVIATIONS of the MEAN: about 99.7% (ALL or NEARLY ALL) of observations
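The percentages in the rule are rounded. A short Python check, using only the standard library, computes them from the normal distribution: for a standard normal variable Z, the probability of lying within k standard deviations of the mean is erf(k / √2). The function name is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean.

    For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.1f}%")
```

This prints about 68.3%, 95.4% and 99.7%, which the rule rounds to 68, 95 and 99.7.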
28
Q

DESCRIBING NUMERICAL DATA:

NORMAL DISTRIBUTION
what % of observations are within 1 STANDARD DEVIATION

A

68%

29
Q

DESCRIBING NUMERICAL DATA:

NORMAL DISTRIBUTION
what % of observations are within 2 STANDARD DEVIATIONS

A

95%

30
Q

How to calculate MEAN

A

sum of all values / total number of values

31
Q

MODE is most often used with which data type

A

CATEGORICAL DATA

32
Q

the SHAPE of a distribution INFLUENCES whether the MEAN is LARGER or SMALLER than the MEDIAN

how is the MEAN in relation to the MEDIAN in a SYMMETRIC DISTRIBUTION

A

MEAN = MEDIAN

at the middle peak

33
Q

the SHAPE of a distribution INFLUENCES whether the MEAN is LARGER or SMALLER than the MEDIAN

how is the MEAN in relation to the MEDIAN in a LEFT SKEWED DISTRIBUTION

A

MEAN is SMALLER than the MEDIAN (usually)

(in a unimodal distribution the median is closer to the peak, while the mean is pulled towards the long tail)

34
Q

the SHAPE of a distribution INFLUENCES whether the MEAN is LARGER or SMALLER than the MEDIAN

how is the MEAN in relation to the MEDIAN in a RIGHT SKEWED DISTRIBUTION

A

MEAN is LARGER than the MEDIAN (usually)

(in a unimodal distribution the median is closer to the peak, while the mean is pulled towards the long tail)

35
Q

for SKEWED DISTRIBUTIONS is mean or median PREFERRED

A

MEDIAN

because it better represents what is TYPICAL
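To see why the median is preferred, the Python sketch below uses a small, invented set of right-skewed incomes (in thousands) and the standard-library `statistics` module.

```python
import statistics

# Hypothetical right-skewed (positively skewed) incomes, in thousands; illustration only
incomes = [18, 20, 22, 23, 25, 26, 28, 30, 45, 120]

print("mean:  ", statistics.mean(incomes))
print("median:", statistics.median(incomes))
```

The mean (35.7) is pulled toward the long right tail and is larger than 8 of the 10 incomes, while the median (25.5) stays close to a typical value.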

36
Q

is MEDIAN affected by OUTLIERS

A

NO

RESISTANT to outliers

37
Q

is MEAN affected by OUTLIERS

A

YES

NOT RESISTANT to outliers

38
Q

is MODE affected by OUTLIERS

A

NO
outliers do NOT affect mode

39
Q

what is affected severely by OUTLIERS

A

RANGE
so the range alone is not very informative

MEAN and STANDARD DEVIATION are also sensitive to outliers
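A quick Python comparison, on invented data, of how a single outlier affects each summary from the last few cards. The helper `summarise` is hypothetical; it wraps functions from the standard-library `statistics` module, where `stdev` is the sample standard deviation.

```python
import statistics

def summarise(values):
    """Return mean, median, mode, range and sample SD for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
    }

# Hypothetical data; the second list replaces the last value with an outlier
clean = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(label, {k: round(v, 2) for k, v in summarise(data).items()})
```

Changing 7 to 70 leaves the median and mode at 5, but moves the mean from 5.4 to 18, the range from 3 to 66, and the SD from about 1.1 to about 29.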

39
Q

STANDARD DEVIATION measures the..

A

SPREAD of data

40
Q

STANDARD DEVIATION gives a measure of … by …

A

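VARIATION
by summarising the deviations of each observation from the mean: the deviations are squared, an adjusted average of the squared deviations is taken (their sum divided by n - 1), and then the square root is taken

s = √[ Σ(x - mean)² / (n - 1) ]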
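Assuming the "adjusted average" means dividing the sum of squared deviations by n - 1 (the usual sample standard deviation), the Python sketch below computes s step by step and compares it with `statistics.stdev`. The data and the function name `sample_sd` are hypothetical.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # deviation of each observation from the mean, squared
    squared_deviations = [(x - mean) ** 2 for x in values]
    # "adjusted" average: divide by n - 1 rather than n (this is the variance, s^2)
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(sample_sd(data), statistics.stdev(data))
```

For these values the mean is 5 and the squared deviations sum to 32, so s = √(32/7) ≈ 2.14. Passing a list of identical values returns 0, matching the card on when s = 0.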

41
Q

what is the VARIANCE of a set of values

A

SQUARE of STANDARD DEVIATION

variance = s²

42
Q

the LARGER the STANDARD DEVIATION the …

A

GREATER the VARIABILITY

43
Q

when does s = 0
(standard deviation)

A

when all observations have the same value

otherwise s > 0

44
Q

STANDARD DEVIATION and variance UNITS

A

STANDARD DEVIATION: same units as the original observations

VARIANCE: squared units (eg kg² for weight measured in kg)

45
Q

can OUTLIERS and SKEWS AFFECT STANDARD DEVIATION

A

NOT RESISTANT

strong skewness and outliers can greatly INCREASE S

46
Q

the INTERQUARTILE RANGE IQR is ..

A

the DISTANCE between the THIRD QUARTILE and FIRST QUARTILE

IQR = Q3 - Q1

gives the spread of the MIDDLE 50% of the data

47
Q

how do you calculate when an observation is a POTENTIAL OUTLIER

A

1.5 X IQR

potential outlier if it lies MORE THAN 1.5 x IQR below Q1 or above Q3
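A minimal Python sketch of the 1.5 x IQR rule. Quartiles are taken from `statistics.quantiles` (Python 3.8+), whose default "exclusive" method is only one of several quartile conventions, so other software may give slightly different Q1 and Q3. The data and the helper name are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Flag potential outliers with the 1.5 x IQR rule.

    Quartiles come from statistics.quantiles (default 'exclusive' method);
    other software may use a slightly different quartile definition.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lower or x > upper]

print(iqr_outliers([4, 5, 5, 6, 7, 8, 30]))
```

For these values Q1 = 5 and Q3 = 8, so IQR = 3 and the cut-offs are 0.5 and 12.5; only 30 is flagged.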

48
Q

PERCENTILES:
a pth percentile is a value such that..

A

p% of the observations fall AT or BELOW that value

eg. 90th percentile
90% of the data fall at or below it, 10% above

49
Q

QUARTILES:
Q1, Q2, Q3 divide a set of data into … groups with …% of the values in each group

A

4 groups
25%

50
Q

the 5 NUMBER SUMMARY is the basis of a BOX PLOT and consists of:

A
  • MINIMUM VALUE
  • Q1
  • Q2 (MEDIAN)
  • Q3
  • MAXIMUM VALUE

in a box plot, potential outliers are marked separately as individual points; the whiskers then extend only to the most extreme values that are not potential outliers
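The five-number summary can be sketched in Python with `statistics.quantiles` from the standard library. The quartile method (its default "exclusive" method) is an assumption, and other software may give slightly different quartiles. The helper name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the basis of a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

print(five_number_summary([4, 5, 5, 6, 7, 8, 30]))
```

This returns (4, 5.0, 6.0, 8.0, 30).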

51
Q

what is a Z SCORE and how do you CALCULATE it

A

the NUMBER OF STANDARD DEVIATIONS that a given value is ABOVE/BELOW the MEAN

Z = (OBSERVATION - MEAN) / STANDARD DEVIATION
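A minimal Python sketch of the z-score formula, plus the |z| > 3 check for bell-shaped data from the last card in this deck. The mean of 120 and SD of 10 are arbitrary illustrative values, and both function names are hypothetical.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations an observation lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / sd

def is_potential_outlier(z):
    """For a bell-shaped distribution: potential outlier if z < -3 or z > 3."""
    return abs(z) > 3

print(z_score(140, 120, 10))                         # 2.0  -> 2 SDs above the mean
print(z_score(95, 120, 10))                          # -2.5 -> 2.5 SDs below the mean
print(is_potential_outlier(z_score(155, 120, 10)))   # True (z = 3.5)
```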

52
Q

a POSITIVE and NEGATIVE Z SCORE indicates…

A

Positive: Observation is ABOVE the Mean

Negative: Observation is BELOW the mean

53
Q

what does a Z SCORE of 2 say

A

that the data value is 2 STANDARD DEVIATIONS ABOVE the MEAN

(-2 means 2 standard deviations BELOW the mean)

54
Q

Z SCORES allow us to tell..

A

how UNUSUAL an observation is

the LARGER the Z SCORE in absolute value (ignoring the sign), the MORE UNUSUAL the observation

(-1.3 is more unusual than 1.2)

55
Q

an observation from a BELL-SHAPED distribution is a POTENTIAL OUTLIER if its Z SCORE is

A

BELOW -3 or ABOVE 3

(more than 3 standard deviations from the mean)