Chapter 3 Numerically Summarizing Data Flashcards

1
Q

Measures of Central Tendency

A

Describe where the center of a data set lies: mean, median, mode

2
Q

Arithmetic Mean

A

The sum of all the data values divided by the number of observations; commonly called the average

3
Q

Median

A

The value that lies in the middle of the data set when arranged in ascending order. With an even number of observations, it is the mean of the two middle values

4
Q

Mode

A

The most frequent observation of the variable that occurs in the data set

A data set can have no mode, one mode, or more than one mode
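
A minimal Python sketch of the three measures of central tendency, using the standard-library `statistics` module. The scores are made up for illustration.

```python
import statistics

# Hypothetical quiz scores, invented for this example.
scores = [70, 80, 80, 90, 100]

mean = statistics.mean(scores)        # sum of the values / number of values
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # list of every most-frequent value

print(mean, median, modes)  # 84 80 [80]
```

Note that `multimode` returns every value when nothing repeats, whereas the convention on this card is to say such a data set has no mode.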

5
Q

The symbol for the mean of a population

A

The Greek letter mu (μ)

6
Q

The symbol for the mean of a sample

A

x̄ (“x-bar”)

7
Q

The symbol for the median of a population

A

M

8
Q

The symbol for the median of a sample

A

m

9
Q

Relation between the mean, median, and a distribution shape that is skewed to the left

A

Mean is substantially smaller than the median

10
Q

Relation between the mean, median, and a distribution shape that is symmetric

A

Mean roughly equal to median

11
Q

Relation between the mean, median, and a distribution shape that is skewed to the right

A

Mean substantially larger than median

12
Q

What does it mean when it is said that a numerical summary of data is resistant?

A

Extreme values (very large or small) relative to the data do not affect its value substantially
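
A small Python sketch of resistance, assuming made-up income values: one extreme observation moves the mean a long way but barely moves the median.

```python
import statistics

# Hypothetical incomes in thousands of dollars (invented for illustration).
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [500]   # add one extreme observation

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # 35 112.5  -> the mean is not resistant
print(median_before, median_after)  # 35 36.5   -> the median is resistant
```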

13
Q

Outlier

A

A data point that differs significantly from the other observations. It can pull the mean toward itself and make the distribution appear skewed

14
Q

What is the better measure of central tendency when the distribution is skewed?

A

The median

15
Q

Measures of Dispersion

A

Show the degree to which the data in a population or sample is spread out: range, standard deviation, variance, interquartile range

16
Q

Range

A

The difference between the largest data value and the smallest data value in a data set. Denoted as R

Range is not resistant to outlier values

17
Q

Population Variance

A

The sum of the squared deviations from the population mean divided by the number of observations in the population, N. Denoted by the Greek letter sigma, squared (σ^2)

18
Q

Population Standard Deviation

A

The positive square root of the population variance. Denoted by the Greek letter sigma (σ).
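
A sketch of the population variance and standard deviation computed straight from the definitions. The data values are hypothetical and are treated as the entire population.

```python
import math

# Hypothetical data, assumed to be the whole population.
population = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(population)
mu = sum(population) / N                                # population mean
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # divide by N
sigma = math.sqrt(sigma_sq)                             # positive square root

print(mu, sigma_sq, sigma)  # 5.0 4.0 2.0
```

The standard library offers `statistics.pvariance` and `statistics.pstdev` for the same calculation.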

19
Q

The population variance and standard deviation are

A

Parameters

20
Q

Sample Variance

A

The sum of the squared deviations from the sample mean divided by the size of the sample MINUS 1 (n - 1). Denoted by s squared (s^2).

21
Q

Sample Standard Deviation

A

The positive square root of the sample variance. Denoted by s

22
Q

The sample variance and standard deviation are

A

Statistics

23
Q

When calculating the sample variance, the denominator is

A

n - 1
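
A sketch of the sample variance with the n − 1 denominator, checked against `statistics.variance`, which also divides by n − 1. The sample values are hypothetical.

```python
import statistics

# Hypothetical data, assumed to be a sample drawn from a larger population.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
x_bar = sum(sample) / n                                  # sample mean
s_sq = sum((x - x_bar) ** 2 for x in sample) / (n - 1)   # divide by n - 1
s = s_sq ** 0.5                                          # sample standard deviation

# The library function uses the same n - 1 denominator.
library_s_sq = statistics.variance(sample)
print(round(s_sq, 4), round(library_s_sq, 4), round(s, 4))
```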

24
Q

The Empirical Rule

A

States that, if the summary measures of mean (μ) and standard deviation (σ) are known, and if the distribution is approximately bell-shaped:
≈ 68% of the data will lie within ±1σ of the mean
≈ 95% of the data will lie within ±2σ of the mean
≈ 99.7% of the data will lie within ±3σ of the mean
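
The three intervals are simple arithmetic once μ and σ are known. A sketch, where the function name `empirical_intervals` and the values μ = 100, σ = 15 are assumptions chosen for illustration:

```python
def empirical_intervals(mu, sigma):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

# Hypothetical bell-shaped distribution with mu = 100 and sigma = 15.
intervals = empirical_intervals(100, 15)
for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = intervals[k]
    print(f"about {pct} of the data lie between {low} and {high}")
```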

25
Q

Outlier for a bell-shaped distribution

A

Any data point more than 3 standard deviations (3σ) below the mean or more than 3σ above the mean, that is, with a z-score less than −3 or greater than 3

26
Q

Z-Score

A

Represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation. Round z-scores to the nearest hundredth.

27
Q

Z-Score for a data value in a population

A

z = (x − μ)/σ, where μ is the population mean and σ is the population standard deviation

28
Q

Z-Score for a data value in a sample

A

z = (x − x̄)/s, where x̄ is the sample mean and s is the sample standard deviation
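
The same formula serves both cards; only the mean and standard deviation passed in differ. A sketch, where the helper name `z_score` and the example numbers are hypothetical:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean, to the nearest hundredth."""
    return round((x - mean) / sd, 2)

# Hypothetical values: mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # 2.0   (two standard deviations above the mean)
print(z_score(85, 100, 15))   # -1.0  (one standard deviation below the mean)
print(z_score(110, 100, 15))  # 0.67
```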

29
Q

kth percentile

A

A value such that at least k percent of the observations are less than or equal to this value, and at least (100-k) percent of the observations are greater than or equal to this value. Denoted Pk

30
Q

1st quartile

A

Denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile.

31
Q

2nd quartile

A

Denoted Q2, divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median.

32
Q

3rd quartile

A

Denoted Q3, divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.

33
Q

Method for determining quartiles using Excel

A

To find Q1: =QUARTILE.EXC (highlight the data, 1)
To find Q2 (The Median, M): =QUARTILE.EXC (highlight the data, 2)
To find Q3: =QUARTILE.EXC (highlight the data, 3)
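
For readers working outside Excel, a rough Python counterpart is `statistics.quantiles` with `method="exclusive"`. Treat the match with QUARTILE.EXC as an assumption to verify on your own data; both appear to interpolate at positions based on n + 1. The data are made up.

```python
import statistics

# Hypothetical data set, invented for illustration.
data = [1, 3, 5, 7, 9, 11, 13]

# n=4 splits the data into four parts, giving the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(q1, q2, q3)  # 3.0 7.0 11.0
```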

34
Q

Method for determining quartiles by Inspection

A

If the data set is relatively small, the direct “by inspection” method can be used:
Step 1: Arrange the data in ascending order.
Step 2: Determine the median, M, or second quartile, Q2.
Step 3: Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1, is the median of the bottom half, and the third quartile, Q3, is the median of the top half.
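
The three steps above can be sketched in Python. The function name is hypothetical, and the sketch assumes that when the number of observations is odd, the median itself belongs to neither half.

```python
import statistics

def quartiles_by_inspection(data):
    """Return (Q1, M, Q3) using the split-at-the-median method."""
    ordered = sorted(data)                 # Step 1: ascending order
    n = len(ordered)
    m = statistics.median(ordered)         # Step 2: the median, Q2
    half = n // 2
    lower = ordered[:half]                 # Step 3: observations below M
    # For odd n, skip the middle observation (assumption stated above).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), m, statistics.median(upper)

print(quartiles_by_inspection([1, 3, 5, 7, 9, 11, 13]))   # (3, 7, 11)
print(quartiles_by_inspection([8, 1, 6, 3, 5, 2, 7, 4]))  # (2.5, 4.5, 6.5)
```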

35
Q

Interquartile range

A

Defines the range of the middle 50% of the observations in a data set. Denoted as IQR = Q3 – Q1

36
Q

When is it best to use the median as the measure of central tendency and the interquartile range as the measure of dispersion and why?

A

When the distribution of data is highly skewed or contains extreme observations; because these measures are resistant.

37
Q

Method for checking a data set for outliers

A

Step 1: Determine the first and third quartiles of the data.
Step 2: Compute the interquartile range.
Step 3: Determine the fences. Fences serve as cutoff points for determining outliers.

         Lower fence = Q1 − 1.5 (IQR)
         Upper fence = Q3 + 1.5 (IQR)

Step 4: If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
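
A sketch of the fence method in Python. The function name `find_outliers` and the data are hypothetical, and the quartiles come from `statistics.quantiles`, whose interpolation rule may differ slightly from a textbook's or a calculator's.

```python
import statistics

def find_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # Step 1: Q1 and Q3
    iqr = q3 - q1                                 # Step 2: interquartile range
    lower_fence = q1 - 1.5 * iqr                  # Step 3: fences
    upper_fence = q3 + 1.5 * iqr
    # Step 4: anything outside the fences is flagged as an outlier.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data with one suspiciously large value.
print(find_outliers([1, 3, 5, 7, 9, 11, 40]))  # (-9.0, 23.0, [40])
```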

38
Q

Five-number summary

A

Consists of the smallest data value, Q1, the median, Q3, and the largest data value. Used to learn information about the extremes of the data set.
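
A sketch of the five-number summary in Python. The function name and data are made up, and the quartiles again rely on `statistics.quantiles`, so they may differ slightly from values found by inspection.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

# Hypothetical data set.
print(five_number_summary([1, 3, 5, 7, 9, 11, 40]))  # (1, 3.0, 7.0, 11.0, 40)
```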

39
Q

Method for constructing a box plot using the TI-84 calculator

A

Step 1: Type data into L1
Step 2: 2nd > STAT PLOT
Step 3: Select PLOT 1 and set it to ON
Step 4: Select the box plot with outliers graph (4th from left)
Step 5: Press GRAPH button
Step 6: If graph is not visible: ZOOM > 9: ZOOM STAT

40
Q

Resistant

A

A numerical summary of data is said to be resistant if extreme observations (very large or small) relative to the data do not affect its value substantially.

41
Q

Multimodal

A

Describes a data set that has three or more values that occur with the highest frequency

42
Q

Bias

A

Occurs whenever a statistic consistently underestimates or overestimates a parameter

43
Q

Degrees of freedom

A

For the sample standard deviation, we call n−1 the degrees of freedom because the first n−1 observations have freedom to be whatever value they wish, but the nth observation has no freedom. It must be whatever value forces the sum of the deviations about the mean to equal zero.
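
A short check of this idea, using a made-up sample: once the first n − 1 deviations are known, the last one is forced, because the deviations must sum to zero.

```python
# Hypothetical sample, chosen so the arithmetic is exact.
sample = [2, 4, 6, 8, 10]

x_bar = sum(sample) / len(sample)            # sample mean, 6.0
deviations = [x - x_bar for x in sample]     # [-4.0, -2.0, 0.0, 2.0, 4.0]

# The last deviation is whatever makes the total equal zero.
forced_last = -sum(deviations[:-1])

print(sum(deviations))               # 0.0
print(forced_last, deviations[-1])   # 4.0 4.0
```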

44
Q

Describe the Distribution

A

Means to describe its shape (skewed left, skewed right, or symmetric), its center (mean or median), and its spread (standard deviation or interquartile range).

45
Q

Boxplot

A

A graphical summary of quantitative data used to identify the shape of a distribution and outliers.

46
Q

Bimodal

A

Describes a data set that has two values that occur with the highest frequency

47
Q

Deviation about the mean

A

For the ith observation in a population: xi − μ.

For the ith observation in a sample: xi − x̄

48
Q

No mode

A

Occurs if no observation occurs more than once in a data set

49
Q

Quartiles

A

Divide data sets into fourths, or four equal parts.

50
Q

Measures of Position

A

z-scores, percentiles, quartiles, outliers

51
Q

Method for Checking for Outliers by Using Quartiles

A

Step 1: Determine the first and third quartiles of the data.
Step 2: Compute the interquartile range.
Step 3: Determine the fences: LF = Q1 − 1.5 (IQR); UF = Q3 + 1.5 (IQR)
Step 4: If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.