Summarizing Data Flashcards by Denisse Tiangco

Any characteristic that differs from person to person, such as height, sex, smallpox vaccination status, or physical activity pattern.

The value of a variable is the number or descriptor that applies to a particular person

Variable

Epidemiologic database organized like a spreadsheet with rows and columns

Line listing

Each row representing one person or case of disease

Record or observation

Each column contains information about one characteristic of the individual, such as race or date of birth

Variable
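A line listing can be sketched in Python as a list of dictionaries. Each dictionary is one record (row), and each key is one variable (column). This is only a minimal illustration: the field names, the values, and the helper `column` are hypothetical.

```python
# Hypothetical line listing: each dict is one record (a person or case),
# and each key is one variable (a column of the spreadsheet).
line_listing = [
    {"id": 1, "sex": "F", "age": 34, "ill": True},
    {"id": 2, "sex": "M", "age": 51, "ill": False},
    {"id": 3, "sex": "F", "age": 8, "ill": True},
]

def column(records, variable):
    """Return the values of one variable (one column) across all records."""
    return [record[variable] for record in records]

print(len(line_listing))            # 3 records
print(column(line_listing, "age"))  # [34, 51, 8]
```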

Categorical variable

Qualitative

Nominal

Ordinal

Continuous

Quantitative

Interval

Ratio

Categories without any numerical ranking, such as county of residence

Alive or dead
Ill or well

Nominal scale

Nominal variable with two mutually exclusive categories

Ill or well

Dichotomous

Values that can be ranked but are not necessarily evenly spaced

Stage of cancer

Ordinal-scale variable

Measured on a scale of equally spaced units, but without a true zero point, such as date of birth

Interval-scale variable

Interval variable with a true zero point, such as height in centimeters or duration of illness

Ratio-scale variable

Where the distribution has its peak

Clustering at a particular value

Central location

Central tendency of a frequency distribution

How widely dispersed it is on both sides of the peak

Variation, dispersion

Distribution out from a central value

Independent of its central location

Spread

Bell shaped curve

Normal distribution

Three measures of central location

Mean
Median
Mode

Other measures:
Midrange
Geometric mean

Third property of a frequency distribution, which may be symmetrical or asymmetrical

Shape

Refers to the tails of the bell, not the hump

Skewness

Long tail to the left
Skewed to the left

Distribution that has a central location to the left and a tail off to the right is said to be

positively skewed

skewed to the right

Common in distributions that begin at zero

Examples: number of servings consumed, number of sexual partners

Skewed to the right
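The following minimal sketch shows a right-skewed count distribution that begins at zero. It uses hypothetical serving counts and the standard `statistics` module. In distributions like this, the long right tail typically pulls the mean above the median, and the median sits above the mode.

```python
import statistics

# Hypothetical counts of servings consumed: many zeros and a long tail to the right
servings = [0, 0, 0, 1, 1, 2, 3, 8]

mode = statistics.mode(servings)      # most frequent value
median = statistics.median(servings)  # average of the 4th and 5th ranked values
mean = statistics.mean(servings)      # pulled toward the tail by the 8

print(mode, median, mean)  # 0 1.0 1.875
```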

Classic or symmetrical bell-shaped curve

Defined by a mathematical equation

Mean, median, and mode coincide at the central peak; the area under the curve helps determine measures of spread such as the standard deviation and confidence interval

Normal distribution

Gaussian distribution

Types of variables that may be summarized as a ratio or proportion

Nominal
Ordinal
Interval
Ratio

Types of variables for which measures of central location may be employed

Interval

Ratio

Provides a single value that summarizes an entire distribution of data

Measure of central location

Example: average age of affected persons

Selecting the best measure to use for a given distribution depends largely on two factors:

Shape or skewness of the distribution | Intended use of the measure

Value that occurs most often in a set of data

Mode

Frequency distribution that has two modes

Bimodal
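Here is a minimal sketch of finding the mode that also reports ties, so that a bimodal distribution shows up as two values. The data are hypothetical and the function name `modes` is illustrative. The standard library's `statistics.multimode` (Python 3.8+) does something similar.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in sorted order.

    One value means the distribution is unimodal; two values mean it is bimodal.
    """
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 3, 3, 4, 4, 4, 5, 7]))  # [4]
print(modes([1, 2, 2, 3, 5, 5, 6]))     # [2, 5] (bimodal)
```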

In a histogram, the mode is the

Tallest column

Preferred measure of central location for addressing which value is the most popular or the most common. Used almost exclusively as a descriptive measure. Not typically affected by one or two extreme values (outliers).

Mode

Middle value of a set of data that has been put into rank order. Value that divides the data into two halves, with one half of the observations smaller than the median value and the other half larger. The 50th percentile of the distribution.

Median

Middle position =

(n + 1)/2. If n is odd, the middle position falls on a single observation, and the median is the value of that observation. If n is even, the middle position falls between two observations, and the median equals the average of those two values.
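The middle-position rule translates directly into code. This is a minimal sketch with hypothetical data; the function name `median` is illustrative, and `statistics.median` in the standard library gives the same results.

```python
def median(values):
    """Median via the 1-based middle position (n + 1) / 2 of the rank-ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2
    if n % 2 == 1:
        # Odd n: the position falls on a single observation
        return ordered[int(position) - 1]
    # Even n: the position falls between observations n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([4, 1, 7, 3, 9]))      # 4
print(median([4, 1, 7, 3, 9, 10]))  # 5.5
```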

Good descriptive measure for skewed data because it is the central point of the distribution. Not generally affected by extreme values (outliers).

Median

Value that is closest to all other values in a distribution. To calculate it, add all observed values in the distribution, then divide the sum by the number of observations.

Mean
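The sketch below computes the arithmetic mean and contrasts how the mean and the median respond to one extreme value. The illness durations are hypothetical, and `arithmetic_mean` is an illustrative name.

```python
import statistics

def arithmetic_mean(values):
    """Add all observed values, then divide the sum by the number of observations."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

# Hypothetical durations of illness in days, with and without one extreme value
durations = [3, 4, 4, 5, 6]
with_outlier = durations + [30]

print(arithmetic_mean(durations), statistics.median(durations))  # 4.4 4
# The mean rises to about 8.67, while the median moves only to 4.5
print(arithmetic_mean(with_outlier), statistics.median(with_outlier))
```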

When the mean is subtracted from each observation in the data set, the sum of these differences is zero. The mean is also called the center of gravity: the point at which the distribution would balance. Not a good measure for severely skewed data or data with extreme values in one direction, because the mean uses all of the observations in the distribution and is therefore affected by extreme values.

Centering property of the mean
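The centering property can be checked numerically. This minimal sketch uses exact fractions so that floating-point rounding does not hide the zero sum. The data and the function name are hypothetical.

```python
from fractions import Fraction

def deviations_from_mean(values):
    """Subtract the mean from each observation, using exact fractions to avoid rounding."""
    mean = Fraction(sum(values), len(values))
    return [Fraction(x) - mean for x in values]

data = [2, 3, 7, 8, 11]  # hypothetical observations; mean = 31/5
devs = deviations_from_mean(data)
print(sum(devs))         # 0
```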

Halfway point or midpoint of a set of observations. Calculated as an intermediate step in determining other measures. To calculate it, identify the smallest (minimum) and largest (maximum) observations, add the minimum and the maximum, then divide by two.

Midrange
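Here is a minimal midrange sketch on hypothetical durations. It also shows why the midrange is so sensitive to outliers: only the two extreme observations enter the calculation.

```python
def midrange(values):
    """Add the minimum and maximum observations, then divide by two."""
    if not values:
        raise ValueError("midrange of empty data")
    return (min(values) + max(values)) / 2

durations = [3, 4, 4, 5, 6]        # hypothetical durations in days
print(midrange(durations))         # 4.5
print(midrange(durations + [30]))  # 16.5: one extreme value moves it a long way
```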

Mean or average of a set of data measured on a logarithmic scale. Used when the logarithms of the observations, rather than the observations themselves, are distributed normally (symmetrically).

Geometric mean

Uses all data but not as sensitive to outliers as arithmetic mean

Geometric mean
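A minimal geometric-mean sketch, assuming strictly positive data: average the logarithms, then transform back. The titer values are hypothetical. The standard library's `statistics.geometric_mean` (Python 3.8+) is used only as a cross-check against the arithmetic mean.

```python
import math
import statistics

def geometric_mean(values):
    """Average the natural logarithms, then transform back with exp(); values must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs one or more positive values")
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean)

titers = [10, 100, 1000]        # hypothetical values spread over a logarithmic scale
print(geometric_mean(titers))   # approximately 100
print(statistics.mean(titers))  # 370, pulled up by the largest value
```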

Most sensitive to outliers

Midrange

Describe the dispersion (or variation) of values from the peak of the distribution

Measures of spread

Range
Interquartile range
Standard deviation

Difference between the largest (maximum) value and the smallest (minimum) value. Also expressed as from the minimum to the maximum.

Range
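This minimal sketch returns both forms of the range: the difference between maximum and minimum, and the "from minimum to maximum" pair. The data and the function name `value_range` are hypothetical.

```python
def value_range(values):
    """Return (range, minimum, maximum), where range = maximum - minimum."""
    if not values:
        raise ValueError("range of empty data")
    low, high = min(values), max(values)
    return high - low, low, high

width, low, high = value_range([3, 4, 4, 5, 6, 30])  # hypothetical durations in days
print(f"range {width} days, from {low} to {high}")  # range 27 days, from 3 to 30
```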

Values that divide the data in a distribution into 100 equal parts. The 90th percentile has 90% of the observations at or below it.

Percentile

Measure of spread most commonly used with the median. The central portion of the distribution, from the 25th to the 75th percentile.

Interquartile range
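Quartiles and the interquartile range can be sketched with the standard library's `statistics.quantiles` (Python 3.8+), shown here on hypothetical ages. Several conventions exist for locating percentiles. The default "exclusive" method places the p-th quantile at rank position p × (n + 1), which matches the (n + 1)/2 median rule. Other software may use a different convention and report slightly different values.

```python
import statistics

# Hypothetical ages in years, already in rank order (n = 11)
ages = [2, 5, 7, 8, 10, 12, 15, 20, 23, 30, 41]

# n=4 returns the three quartile cut points: 25th, 50th, and 75th percentiles.
# With 11 observations they fall exactly on ranks 3, 6, and 9.
q1, q2, q3 = statistics.quantiles(ages, n=4)
print(q1, q2, q3)       # 7.0 12.0 23.0
print("IQR:", q3 - q1)  # IQR: 16.0
```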

Measure of spread used most commonly with the arithmetic mean. Subtract the mean from each observation and square each difference to eliminate negative numbers. Average the squared differences (for a sample, divide their sum by n - 1), then take the square root to return to the original units. Describes the variability of the data.

Standard deviation
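The standard deviation steps map onto a short function. This minimal sketch assumes the data are a sample, so the sum of squares is divided by n - 1, as `statistics.stdev` does. The data and the function name are hypothetical.

```python
import math
import statistics

def sample_standard_deviation(values):
    """Square each deviation from the mean, sum, divide by n - 1, then take the square root."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    variance = sum_of_squares / (n - 1)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical; mean 5, sum of squares 32
print(sample_standard_deviation(data))  # sqrt(32 / 7), about 2.14
print(statistics.stdev(data))           # same value from the standard library
```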

Calculated when the data are more or less normally distributed, i.e., the data fall into a typical bell-shaped curve. Recommended measure of spread for such data.

Standard deviation

Variability we might expect in the arithmetic means of repeated samples taken from the same population. Assumes that the data you have are a sample from a larger population. Used to calculate confidence intervals around the arithmetic mean.

Standard error of mean
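The standard error of the mean is usually computed as the sample standard deviation divided by the square root of the sample size. This is a minimal sketch on hypothetical data; `standard_error_of_mean` is an illustrative name.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(values) / math.sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample
print(standard_error_of_mean(data))  # sqrt(4 / 7), about 0.76
```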

Indicates a measurement’s precision. Based on the mean itself and some multiple of the standard error (the variability of means that might be calculated from repeated samples from the same population).

Confidence interval
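A confidence interval built as "mean plus or minus a multiple of the standard error" can be sketched as follows. The sketch assumes the normal approximation and takes the multiplier z from `statistics.NormalDist`, about 1.96 for 95%. For a sample as small as this hypothetical one, a t-distribution multiplier (about 2.36 for 7 degrees of freedom) would usually be preferred and gives a wider interval.

```python
import math
import statistics

def confidence_interval(values, level=0.95):
    """Mean plus or minus z times the standard error (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

low, high = confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample
print(f"95% CI: {low:.2f} to {high:.2f}")                  # 95% CI: 3.52 to 6.48
```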

Regardless of how the data are distributed, means (particularly from large samples) tend to be normally distributed

Central Limit Theorem
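The Central Limit Theorem can be illustrated with a small simulation. This sketch draws repeated samples from a strongly right-skewed population (exponential, with mean 1 and standard deviation 1, an assumption chosen for illustration) and collects the sample means. The means should cluster near 1 with a spread near 1/sqrt(50), about 0.14, and should be roughly symmetric, so their mean and median nearly agree. The function name and seed are arbitrary.

```python
import random
import statistics

def sample_means(num_samples, sample_size, seed=1):
    """Draw repeated samples from an exponential (right-skewed) population; return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means = sample_means(num_samples=2000, sample_size=50)
print(round(statistics.mean(means), 3), round(statistics.median(means), 3))
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(50)
```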

Range of values consistent with the data from a study. A guide to the variability in the study.

Confidence intervals

Distribution where the mean, median and mode would have the same values

Bell shaped curve | Normal distribution

Normal type of distribution: measure of central location (MCL)? Measure of spread (MOS)?

Arithmetic mean | Standard deviation

Asymmetrical or skewed type of distribution MCL? MOS?

Median | Range or interquartile range

Exponential or logarithmic type of distribution MCL? MOS?

Geometric mean | Geometric standard deviation
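One common way to compute the geometric standard deviation is to take the exponential of the sample standard deviation of the natural logarithms. The result is a multiplicative factor: typical values lie roughly between the geometric mean divided by that factor and the geometric mean multiplied by it. This is a minimal sketch on hypothetical, strictly positive data.

```python
import math
import statistics

def geometric_sd(values):
    """exp() of the sample standard deviation of the natural logarithms (positive values only)."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two positive values")
    logs = [math.log(v) for v in values]
    return math.exp(statistics.stdev(logs))

titers = [10, 100, 1000]     # hypothetical values on a logarithmic scale
print(geometric_sd(titers))  # approximately 10
```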

(53 cards)