2 - Summarising Data Flashcards by Hebah Mirza

What does each row & each column represent?

Each row = an OBSERVATION (or record) & represents 1 person

Each column = a VARIABLE (e.g race, gender, DOB)

What are the 4 types of variables ?

A nominal-scale variable
- Values are categories w/out numerical ranking e.g country of residence
- Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
- A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE
An ordinal-scale variable
- Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer
An interval-scale variable
- Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB
A ratio-scale variable
- Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days

What are the 4 types of variables ?

A nominal-scale variable
An ordinal-scale variable
An interval-scale variable
A ratio-scale variable

A nominal-scale variable

Values are categories w/out numerical ranking e.g country of residence
Nominal variables w/ only 2 categories are v common: alive/dead, ill/well, vax/unvax, smoked/didn’t smoke
A nominal variable w/ 2 mutually exclusive categories = DICHOTOMOUS VARIABLE

An ordinal-scale variable

Has values that can be ranked but aren’t necessarily evenly spaced e.g stage of cancer

An interval-scale variable

Measured on a scale of equally spaced units, but w/out a true 0 point e.g DOB

A ratio-scale variable

Interval variable w/ a true 0 pt e.g height in cm, systolic bp in mmHg, duration of illness in days

What kind of variables are nominal- & ordinal-scale variables ?

QUALITATIVE or CATEGORICAL

What kind of variables are interval- & ratio-scale variables?

QUANTITATIVE or CONTINUOUS

Frequency distributions are represented in a histogram, with 3 main features. What are they?

Central location (peak of distribution)
Spread (how widely dispersed it is on both sides of peak)
Shape (whether it is approx symmetrical or skewed)

What are the 3 measures of central location?

Mean
Median
Mode
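
A minimal sketch of the three measures, using Python's standard `statistics` module. The `days` values are made-up illustration data (duration of illness in days), not taken from the deck.

```python
import statistics

# Hypothetical sample: duration of illness in days for 9 patients
days = [1, 3, 3, 4, 5, 6, 7, 9, 16]

mean_days = statistics.mean(days)      # sum of values divided by number of values
median_days = statistics.median(days)  # middle value once the data are ranked
mode_days = statistics.mode(days)      # most frequently occurring value

print(mean_days, median_days, mode_days)
```

Here the three measures disagree because one long illness (16 days) pulls the mean above the median.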

What is spread & what are the 2 measures?

Aka variation or dispersion

Range
Standard deviation
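
As a rough illustration, both measures can be computed with the standard library. The `heights_cm` values are hypothetical, and `statistics.stdev` is the sample SD (divides by n - 1), which is an assumption about which SD the card means.

```python
import statistics

# Hypothetical sample: heights in cm
heights_cm = [150, 160, 165, 170, 180]

# Range: difference between the largest and smallest value
data_range = max(heights_cm) - min(heights_cm)

# Sample standard deviation: typical distance of values from the mean
sd = statistics.stdev(heights_cm)

print(data_range, round(sd, 2))
```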

What are the 2 possible shapes of a frequency distribution?

skewed vs symmetrical

What does skewness refer to? What does +vely or -vely skewed mean?

skewness refers to the TAIL, not the hump → so a distribution skewed to L has a long L tail

If skewed to R → +vely skewed
If skewed to L → -vely skewed
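
One informal way to see the direction of skew is to compare mean and median: a long right tail tends to pull the mean above the median, a long left tail below it. The sketch below uses that rule of thumb only; it is not a formal skewness statistic, and the helper name `tail_direction` and the data are hypothetical.

```python
import statistics

def tail_direction(values):
    """Rule-of-thumb skew check: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"   # long tail to the right
    if mean < median:
        return "negative"   # long tail to the left
    return "roughly symmetrical"

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 12, 20]  # a few large values form a right tail
left_skewed = [-v for v in right_skewed]         # mirror image: tail on the left

print(tail_direction(right_skewed), tail_direction(left_skewed))
```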

What is the normal or Gaussian distribution?

Classic bell-shaped curve

What is the median?

Middle value of a set of data that's been put into rank order; the value that divides the data into 2 halves

50th percentile (of the distribution)

What is the mean?

Aka average

Best descriptive measure for data that are normally distributed

What is used instead of MEAN for data values which are skewed or have outliers?

MEDIAN

How does one select to use mean, median or mode?

Characteristics of data – eg normally distributed or skewed & with/without outliers
Reason for calculating the measure – eg descriptive or analytical purposes

Mean = measure of choice when data are normally distributed
Median = measure of choice when data are not normally distributed

When data is not normally distributed, mean is not preferred. True or false?

True
Mean uses all the data & is sensitive to outliers
Mode & median → unaffected by outliers
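
A small sketch of that sensitivity, with made-up values: replacing one observation with an extreme outlier moves the mean a lot but leaves the median where it was.

```python
import statistics

# Hypothetical data, with and without an extreme value
without_outlier = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]   # last value replaced by an outlier

mean_before = statistics.mean(without_outlier)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(without_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # the mean shifts toward the outlier
print(median_before, median_after)  # the median does not move
```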

What are the 3 measures of spread?

Range
IQR
SD

What are percentiles?

Divide the distribution into 100 equal parts

Pth percentile (P goes from 0 to 100) = value that has P % of values falling at or below it → e.g. the 90th percentile has 90% of values falling at or below it

What are quartiles?

= grouping data into 4 equal parts (quartiles)
Each quartile = 25% of the data
Cut-off for the 1st quartile is the 25th percentile
Cut-off for the 2nd quartile is the 50th percentile (the median)
Cut-off for the 3rd quartile is the 75th percentile

What is the IQR (interquartile range)?

Measure of spread used most commonly w/ the median

Represents the central portion of the distribution, from the 25th percentile to 75th percentile
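
A hedged sketch of the quartile cut-offs and the IQR using `statistics.quantiles` from the standard library. The data are hypothetical. Note that software packages use slightly different interpolation rules for percentiles, so cut-offs from other tools may differ a little on small samples.

```python
import statistics

# Hypothetical ranked data: the integers 1 to 11
values = list(range(1, 12))

# n=4 asks for the three cut points that split the data into quartiles
q1, q2, q3 = statistics.quantiles(values, n=4)

# IQR: distance from the 25th percentile to the 75th percentile
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The middle cut-off `q2` is the 50th percentile, i.e. the median, which is why the median and the IQR are usually reported together.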

The IQR is generally used in conjunction with what?

median → together, useful to characterize central location & spread of any freq distributions → but esp skewed (asym) ones

What is a box plot?

Graphical representation of the location, spread & skewness of groups of numerical data through their quartiles

Uses of IQR

If distrib is non-symmetric – use range & IQR (so median goes together w/ range & IQR)

What is standard deviation (SD)?

Measure of variability in a set of data
Commonly used w/ mean

When is SD used?

Only when data is normally distributed (i.e. data falls into a bell-shaped curve)
For normally distributed data:
- Mean = recommended measure of central location
- SD = recommended measure of spread

What is the standard error (se) of the mean?

Variability we may expect in means of repeated samples taken from the same population
Divide SD by square root of n

How is se calculated?

Divide SD by square root of n
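
The calculation can be sketched in a few lines. The helper name `standard_error` and the sample values are hypothetical, and the sketch assumes the sample SD (n - 1 denominator) is the one intended.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample: heights in cm
sample = [150, 160, 165, 170, 180]
se = standard_error(sample)

print(round(se, 2))
```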

What is standard error/se of mean used for?

Calculation of confidence intervals (confidence limits) around the mean

What is "inference"?

Making generalizations about the larger population from the results of a study → what epidemiologists do when they conduct studies on a sample

What does a narrow vs wide confidence interval (CI) mean?

Narrow CI → high precision
Wide CI → low precision
The narrower the interval, the more precise the estimate
Big studies → NARROW confidence intervals (more confident ab data obtained)
Small studies → WIDE confidence intervals
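
To see how study size drives CI width, the sketch below builds an approximate 95% CI around a mean as mean ± 1.96 × se. The 1.96 multiplier assumes a large-sample normal approximation (small samples would usually use a t-distribution instead), and the function name and summary figures are hypothetical.

```python
import math

def ci95_from_summary(mean, sd, n):
    """Approximate 95% CI for a mean, using the normal multiplier 1.96."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = 1.96 * se
    return mean - margin, mean + margin

# Same hypothetical mean and SD (systolic bp in mmHg), different study sizes
small_study = ci95_from_summary(120.0, 15.0, 25)
large_study = ci95_from_summary(120.0, 15.0, 2500)

small_width = small_study[1] - small_study[0]
large_width = large_study[1] - large_study[0]

print(small_study, large_study)
```

With the SD held fixed, a 100-fold increase in n shrinks the se, and so the interval width, by a factor of 10.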

What are confidence intervals (CIs) used for?

Calculated for means but ALSO for:
- proportions, rates, risk ratios, odds ratios (& other measures where the purpose = draw inferences from a study to the population)

2 - Summarising Data Flashcards

(35 cards)