Summarizing Data Flashcards

1
Q

Any characteristic that differs from person to person, such as height, sex, smallpox vaccination status, or physical activity pattern.

The value of a variable is the number or descriptor that applies to a particular person

A

Variable

2
Q

Epidemiologic database organized like a spreadsheet with rows and columns

A

Line listing

3
Q

Each row representing one person or case of disease

A

Record or observation

4
Q

Column contains information about one characteristic of the individual such as race or date of birth

A

Variable
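
The row-and-column layout described in the cards above can be sketched as follows. This is only an illustration: the field names and values are hypothetical and are not taken from any real line listing.

```python
# Hypothetical line listing: each dict is one record (row),
# each key is one variable (column). All values are invented.
line_listing = [
    {"id": 1, "age": 34, "sex": "F", "ill": True},
    {"id": 2, "age": 51, "sex": "M", "ill": False},
    {"id": 3, "age": 27, "sex": "F", "ill": True},
]

# A record (observation) is one row: everything known about one person.
first_record = line_listing[0]

# A variable is one column: the same characteristic across all records.
ages = [record["age"] for record in line_listing]

print(first_record)
print(ages)
```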

5
Q

Categorical variable

Qualitative

A

Nominal

Ordinal

6
Q

Continuous

Quantitative

A

Interval

Ratio

7
Q

Categories without any numerical ranking such as county of residence

Alive or dead
Ill or well

A

Nominal scale

8
Q

Nominal variable with two mutually exclusive categories

Ill or well

A

Dichotomous

9
Q

Values that can be ranked but are not necessarily evenly spaced

Stage of cancer

A

Ordinal-scale variable

10
Q

Measured on a scale of equally spaced units, but without a true zero point such as date of birth

A

Interval-scale variable

11
Q

Interval variable with a true zero point, such as height in centimeters or duration of illness

A

Ratio-scale variable

12
Q

Where the distribution has its peak

Clustering at a particular value

A

Central location

Central tendency of a frequency distribution

13
Q

How widely dispersed it is on both sides of the peak

Variation, dispersion

Distribution out from a central value

Independent of its central location

A

Spread

14
Q

Bell shaped curve

A

Normal distribution

15
Q

Three common measures of central location, plus two others

A

Mean
Median
Mode

Midrange
Geometric mean

16
Q

Third property of a frequency distribution where it may be asymmetrical or symmetric

A

Shape

17
Q

Refers to the tail of the distribution, not the hump

A

Skewness

Long tail to the left
Skewed to the left

18
Q

Distribution that has a central location to the left and a tail off to the right is said to be

A

positively skewed

skewed to the right

19
Q

Common in distributions that begin with 0

e.g., number of servings consumed, number of sexual partners

A

Skewed to the right

20
Q

Classic or symmetrical bell-shaped curve

Defined by a mathematical equation

Mean, median, and mode coincide at the central peak, and the area under the curve helps determine measures of spread such as the standard deviation and confidence interval

A

Normal distribution

Gaussian distribution

21
Q

Types of variable that may be summarized in ratio or proportion

A

Nominal
Ordinal
Interval
Ratio

22
Q

Types of variable where measures of central location may be employed

A

Interval

Ratio

23
Q

Types of variable where measures of central location may be employed

A

Interval

Ratio

24
Q

Provides a single value that summarizes an entire distribution of data

A

Measure of central location

Example: average age of affected persons

25
Q

Selecting the best measure to use for a given distribution depends largely on two factors:

A

Shape or skewness of distribution

Intended use of measure

26
Q

Value that occurs most often in a set of data

A

Mode

27
Q

Term for a frequency distribution that has two modes

A

Bi-modal
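
As a small sketch of the mode cards above, Python's standard `statistics.multimode` returns every value tied for most frequent. The data below are hypothetical.

```python
from statistics import multimode

# Hypothetical counts, e.g. servings consumed by six people.
servings = [1, 2, 2, 3, 3, 4]

# multimode returns all values that share the highest frequency,
# in the order they first appear in the data.
modes = multimode(servings)

print(modes)
print("bimodal" if len(modes) == 2 else "not bimodal")
```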

28
Q

In a histogram, the mode is the

A

Tallest column

29
Q

Preferred measure of central location for addressing which value is the most popular or the most common

Used almost exclusively as descriptive measure

It is not typically affected by one or two extreme values (outliers)

A

Mode

30
Q

Middle value of a set of data that has been put into rank order

Value that divides the data into two halves with one half of the observations being smaller than the median value and the other half being larger

50th percentile of distribution

A

Median

31
Q

Middle position =

A

(n+1)/2

If odd, middle position falls on single observation, median is the value of that observation

If even, middle position falls between two observations, median equals the average of the two values
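
The middle-position rule above can be sketched in Python as follows. The function name `median_by_position` and the sample data are illustrative assumptions, not part of the original cards.

```python
def median_by_position(values):
    """Median found with the middle-position rule (n + 1) / 2."""
    ordered = sorted(values)              # put the data into rank order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    middle = (n + 1) / 2                  # 1-based middle position
    if n % 2 == 1:
        # Odd n: the position falls on a single observation.
        return ordered[int(middle) - 1]
    # Even n: the position falls between two observations; average them.
    lower = ordered[int(middle) - 1]
    upper = ordered[int(middle)]
    return (lower + upper) / 2

print(median_by_position([5, 1, 9]))      # 3 observations, middle position 2
print(median_by_position([5, 1, 9, 7]))   # 4 observations, middle position 2.5
```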

32
Q

Good descriptive measure for data that are skewed because it is the central point of distribution

Not generally affected by extremes (outliers)

A

Median

33
Q

Value that is closest to all other values in a distribution

Add all observed values in the distribution

Divide the sum by the number of observations

A

Mean

34
Q

When the mean is subtracted from each observation in the data set, the sum of these differences is zero

Also called center of gravity

Point at which the distribution would balance

Not a good measure for data that are severely skewed or have extreme values in one direction or another

Affected by extreme value because the mean uses all of the observations in the distribution

A

Centering property of the mean
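
A brief sketch of the centering property, and of how one extreme value moves the mean but not the median. The data are hypothetical and the helper name `deviations_from_mean` is an assumption made for this example.

```python
from statistics import mean, median

def deviations_from_mean(values):
    """Subtract the mean from each observation."""
    m = mean(values)
    return [x - m for x in values]

data = [2, 4, 6, 8, 10]
devs = deviations_from_mean(data)
print(devs, sum(devs))        # the deviations sum to zero

# Replace the largest value with an outlier: the mean shifts toward it,
# while the median stays at the middle observation.
with_outlier = [2, 4, 6, 8, 100]
print(mean(with_outlier), median(with_outlier))
```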

35
Q

Halfway point or the midpoint of a set of observations

Calculated as intermediate step in determining other measures

Identify the smallest (minimum) observation and the largest (maximum) observation

Add the minimum + maximum, then divide by two

A

Midrange
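
The two steps above amount to a one-line calculation. The sketch below uses hypothetical data and also shows why the midrange reacts strongly to a single extreme value.

```python
def midrange(values):
    """Halfway point between the minimum and maximum observations."""
    return (min(values) + max(values)) / 2

print(midrange([1, 2, 3, 4, 5]))    # minimum 1, maximum 5
print(midrange([1, 2, 3, 4, 50]))   # one extreme value shifts the result
```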

36
Q

Mean or average of a set of data measured on a logarithmic scale

Used when the logarithms of the observations are distributed normally (symmetrically) rather than the observations themselves

A

Geometric mean
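
One common way to compute a geometric mean, sketched here under the assumption that all values are positive, is to average the logarithms and then exponentiate. The titer values are hypothetical.

```python
import math

def geometric_mean(values):
    """Mean taken on a logarithmic scale, then converted back."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    logs = [math.log(v) for v in values]       # move to the log scale
    return math.exp(sum(logs) / len(logs))     # average, then convert back

titers = [10, 100, 1000]   # hypothetical values spaced evenly on a log scale
print(geometric_mean(titers))
print(sum(titers) / len(titers))   # arithmetic mean, for comparison
```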

37
Q

Uses all data but not as sensitive to outliers as arithmetic mean

A

Geometric mean

38
Q

Most sensitive to outliers

A

Midrange

39
Q

Describe the dispersion (or variation) of values from that peak in the distribution

A

Measures of spread

40
Q

Measures of spread

A

Range
Interquartile range
Standard deviation

41
Q

Difference between its largest (maximum) value and its smallest (minimum) value

From the minimum to the maximum

A

Range

42
Q

Divide the data in a distribution into 100 equal parts

90th percentile has 90% of the observations at or below it

A

Percentile

43
Q

Measure of spread most commonly used with the median

Central portion of distribution from 25th to 75th percentile

A

Interquartile range
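
A minimal sketch of the range and interquartile range using Python's standard library. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice below is an assumption, and hand calculations may give slightly different cut points. The data are hypothetical.

```python
from statistics import quantiles

def value_range(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)

def interquartile_range(values):
    """Distance from the 25th to the 75th percentile."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(value_range(data))
print(interquartile_range(data))
```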

44
Q

Measure of spread used most commonly with the arithmetic mean

Subtracting the mean from each observation
The difference between the mean and each observation is squared to eliminate negative numbers
The average of the squared differences is calculated, and the square root is taken to get back to the original units

Variability of data

A

Standard deviation
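
The steps above can be sketched as follows. The card says only that an average of the squared differences is taken; dividing by n - 1 for sample data is an assumption added here (it is the usual sample convention), and the `sample` flag is a hypothetical parameter for switching between the two divisors.

```python
import math

def standard_deviation(values, sample=True):
    """Standard deviation built from the step-by-step recipe."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(values) / n
    # Subtract the mean from each observation and square the difference.
    squared_diffs = [(x - m) ** 2 for x in values]
    # Average the squared differences (n - 1 for a sample, n for a population).
    divisor = n - 1 if sample else n
    variance = sum(squared_diffs) / divisor
    # Take the square root to return to the original units.
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(standard_deviation(data, sample=False))
print(standard_deviation(data))
```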

45
Q

Calculated when the data are more or less normally distributed, i.e., the data fall into a typical bell-shaped curve

Recommended measure of spread

A

Standard deviation

46
Q

Variability we might expect in the arithmetic means of repeated samples taken from the same population

Assumes that the data you have is actually a sample from a larger population

Used to calculate confidence intervals around the arithmetic mean

A

Standard error of mean

47
Q

Indicates a measurement’s precision

Based on the mean itself and some multiple of the standard error (variability of means that might be calculated from repeated samples from the same population)

A

Confidence interval
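
As a rough sketch of how the standard error and a confidence interval fit together: the standard error is the sample standard deviation divided by the square root of n, and the interval is the mean plus or minus a multiple of it. The multiplier 1.96 (the normal-based value for a 95% interval) is an assumption; small samples often call for a larger t-based multiplier. The data are hypothetical.

```python
import math
from statistics import mean, stdev

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))

def confidence_interval(values, multiplier=1.96):
    """Mean plus or minus a multiple of the standard error."""
    m = mean(values)
    margin = multiplier * standard_error(values)
    return m - margin, m + margin

data = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = confidence_interval(data)
print(standard_error(data))
print(low, high)
```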

48
Q

Regardless of how data are distributed, means (particularly from large samples) tend to be normally distributed

A

Central Limit Theorem
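
A small simulation can illustrate the idea. The sketch below assumes a right-skewed exponential population with mean 1, a sample size of 50, and 2,000 repeated samples; all of these are arbitrary choices for illustration. The sample means cluster around the population mean, with spread close to 1 / sqrt(50).

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_means(n_samples, sample_size):
    """Means of repeated samples from a skewed (exponential) population."""
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(mean(sample))
    return means

means = sample_means(2000, 50)
print(mean(means))    # close to the population mean of 1
print(stdev(means))   # close to 1 / sqrt(50), about 0.14
```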

49
Q

Range of values consistent with data from a study

A guide to the variability in the study

A

Confidence intervals

50
Q

Distribution where the mean, median and mode would have the same values

A

Bell shaped curve

Normal distribution

51
Q

Normal type of distribution

MCL (measure of central location)?
MOS (measure of spread)?

A

Arithmetic mean

Standard deviation

52
Q

Asymmetrical or skewed type of distribution

MCL?
MOS?

A

Median

Range or interquartile range

53
Q

Exponential or logarithmic type of distribution

MCL?
MOS?

A

Geometric mean

Geometric standard deviation