2 - Summarising Data Flashcards

1
Q

What does each row & each column represent?

A

Each row = an OBSERVATION (or record) & represents 1 person

Each column = a VARIABLE (e.g race, gender, DOB)

2
Q

What are the 4 types of variables ?

A
  1. A nominal-scale variable
    - Values are categories w/out numerical ranking e.g country of residence
    - Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
    - A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE
  2. An ordinal-scale variable
    - Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer
  3. An interval-scale variable
    - Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB
  4. A ratio-scale variable
    - Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days
2
Q

What are the 4 types of variables ?

A
  1. A nominal-scale variable
  2. An ordinal-scale variable
  3. An interval-scale variable
  4. A ratio-scale variable
3
Q

A nominal-scale variable

A
  • Values are categories w/out numerical ranking e.g country of residence
  • Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
  • A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE
4
Q

An ordinal-scale variable

A
  • Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer
5
Q

An interval-scale variable

A
  • Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB
6
Q

A ratio-scale variable

A
  • Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days
7
Q

What kind of variables are nominal- & ordinal-scale variables ?

A

QUALITATIVE or CATEGORICAL

8
Q

What kind of variables are interval- & ratio-scale variables?

A

QUANTITATIVE or CONTINUOUS

9
Q

Frequency distributions are represented in a histogram, with 3 main features. What are they?

A
  1. Central location (peak of distribution)
  2. Spread (how widely dispersed it is on both sides of peak)
  3. Shape (whether it is approx symmetrical)
10
Q

What are the 3 measures of central location?

A
  1. Mean
  2. Median
  3. Mode
11
Q

What is spread & what are the 2 measures?

A

Aka variation or dispersion

  1. Range
  2. Standard deviation
12
Q

What are the 2 possible shapes of a frequency distribution?

A

skewed vs symmetrical

13
Q

What does skewness refer to? What does +vely or -vely skewed mean?

A

skewness refers to the TAIL, not the hump → so a distribution skewed to L has a long L tail

If skewed to R → +vely skewed
If skewed to L → -vely “

14
Q

What is the normal or Gaussian distribution?

A

Classic bell-shaped curve

15
Q

What is the median?

A

Middle value of a set of data that's been put into rank order, i.e. the value that divides the data into 2 halves

50th percentile (of the distribution)

16
Q

What is the mean?

A

Aka average

Best descriptive measure for data that are normally distributed

17
Q

What is used instead of MEAN for data values which are skewed or have outliers?

A

MEDIAN

18
Q

How does one select to use mean, median or mode?

A
  1. Characteristics of data – eg normally distributed or skewed & with/without outliers
  2. Reason for calculating the measure – eg descriptive or analytical purposes
Mean = measure of choice when data are normally distributed 
Median = measure for data not normally “
19
Q

When data is not normally distributed, the mean is not preferred. True or false?

A

True
Mean uses all the data & is sensitive to outliers
Mode & median → unaffected by outliers
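
The effect of an outlier on each measure can be checked with a small sketch. The values below are hypothetical (durations of illness in days, invented for illustration), and Python's standard `statistics` module is used only as a convenient calculator:

```python
import statistics

# Hypothetical durations of illness in days; the last value is an outlier.
durations = [2, 3, 3, 4, 5, 6, 40]

mean_days = statistics.mean(durations)      # uses every value, so the outlier pulls it up
median_days = statistics.median(durations)  # middle value once the data are in rank order
mode_days = statistics.mode(durations)      # most frequently occurring value

print(mean_days, median_days, mode_days)    # 9 4 3
```

Here the single value of 40 lifts the mean to 9 days, while the median (4) and mode (3) stay close to where most of the data sit.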

20
Q

What are the 3 measures of spread?

A
  1. Range
  2. IQR
  3. SD
21
Q

What are percentiles?

A

Divide the data in a distribution into 100 equal parts

Pth percentile (P goes from 0 to 100) = value that has P % of values falling at or below it → 90th percentile has 90% of values “ “

22
Q

What are quartiles?

A

= grouping data into 4 equal parts (quartiles)
Each quartile = 25% of the data
Cut-off for the 1st quartile is the 25th percentile
Cut-off “ “ 2nd “ = 50th “
(etc etc)

23
Q

What is the IQR (interquartile range)?

A

Measure of spread used most commonly w/ the median

Represents the central portion of the distribution, from the 25th percentile to 75th percentile
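
A minimal sketch of the quartile cut-offs and the IQR, assuming hypothetical data and Python's `statistics.quantiles`. Several conventions exist for interpolating percentiles; the `"inclusive"` method is chosen here only for illustration, and other methods can give slightly different cut-offs:

```python
import statistics

# Hypothetical data values, already in rank order.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the 3 cut-offs that split the data into 4 equal parts:
# the 25th, 50th (median) and 75th percentiles.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# IQR = distance from the 25th percentile to the 75th percentile.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 5.0 7.0 4.0
```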

24
Q

The IQR is generally used in conjunction with what?

A

median → together, useful to characterize central location & spread of any freq distributions → but esp skewed (asym) ones

25
Q

What is a box plot?

A

Graphical representation of the locality, spread & skewness of groups of numerical data thru their quartiles

26
Q

Uses of IQR

A

If distrib is non-symmetric – use range & IQR (so median goes together w/ range & IQR)

27
Q

What is standard deviation (SD)?

A

Variability in a set of data

Commonly used w/ mean

28
Q

When is SD used?

A

Only when data is normally distributed (i.e data falls into bell-shaped curve)

For normally distributed data:

  • Mean = recommended measure of central location
  • SD = “ “ of spread
29
Q

What is the standard error (se) of the mean?

A

Variability we may expect in means of repeated samples taken from the same population

Divide SD by square root of n

30
Q

How is se calculated?

A

Divide SD by square root of n
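
As a worked sketch of this calculation: the sample values are hypothetical, and the SD is taken as the sample SD (n - 1 denominator), which is an assumption since the card does not say which form is meant:

```python
import math
import statistics

# Hypothetical sample of 8 measurements.
sample = [4, 8, 6, 5, 3, 7, 9, 6]

n = len(sample)
sd = statistics.stdev(sample)  # sample SD (n - 1 denominator), assumed here
se = sd / math.sqrt(n)         # se = SD / square root of n

print(round(sd, 3), round(se, 3))  # 2.0 0.707
```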

31
Q

What is standard error/se of mean used for?

A

Calculation of confidence intervals (confidence limits) around the mean
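
One common way to build such an interval is mean ± 1.96 × se for an approximate 95% CI. The 1.96 multiplier comes from the normal approximation and is an assumption not stated in these cards (small samples would normally use a t-based multiplier instead). The function name `ci95_for_mean` and the sample values are hypothetical:

```python
import math
import statistics

def ci95_for_mean(sample):
    """Approximate 95% CI for the mean: mean +/- 1.96 * se (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # se = SD / square root of n
    margin = 1.96 * se  # assumed multiplier for an approximate 95% interval
    return mean - margin, mean + margin

sample = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical values
lower, upper = ci95_for_mean(sample)
print(round(lower, 2), round(upper, 2))  # 4.61 7.39
```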

32
Q

What is “inference”?

A

Using the results of a study (a sample) to make generalizations about the larger population, as epidemiologists do when conducting studies

33
Q

What does a narrow vs wide confidence interval (CI) mean?

A

Narrow CI → high precision
Wide CI → low precision

Narrower the interval, the more precise the estimate

Big studies → NARROW confidence intervals (more confident ab data obtained)
Small studies → WIDE “ “

34
Q

What are confidence intervals (CIs) used for?

A

calculated for means but ALSO for:
- proportions, rates, risk ratios, odds ratios (& other measures where purpose = draw inferences from a study to the population)