Chapter 2 - Descriptive Statistics Flashcards

1
Q

John Tukey

A
  • 1915 - 2000
  • exploratory data analysis (EDA) = boxplots, stem-and-leaf plots
  • coined terms such as bit and software
2
Q

Features of a good numeric or graphic form of data summarization

A
  • self-contained
  • understandable without reading the text
  • attributes clearly labeled with well-defined terms
  • indicate principal trends in data
3
Q

Measures of location

A
  • also known as measures of central tendency
  • data must be summarized before any inferences can be made
  • a measure of location summarizes the data by defining the center or middle of the sample
4
Q

Arithmetic mean limitation

A
  • oversensitive to extreme values
  • when extreme values are present, the mean may not be representative of the location of the majority of sample points
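A minimal sketch of this sensitivity, using a made-up sample (the numbers are illustrative, not from the source). Adding a single extreme value moves the mean a long way while the median barely changes:

```python
from statistics import mean, median

# Hypothetical sample: five typical values, then the same values plus one extreme point.
values = [2, 3, 3, 4, 5]
with_outlier = values + [60]

print("mean:", mean(values), "median:", median(values))
print("mean:", mean(with_outlier), "median:", median(with_outlier))
```

With the extreme value included, the mean lies above every one of the five typical points, so it no longer describes where most of the sample sits.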

5
Q

Symmetric distribution

A

arithmetic mean is approximately the same as the median

6
Q

Positively skewed distribution

A
  • tail end is on the right side
  • arithmetic mean tends to be larger than the median

7
Q

Negatively skewed distribution

A
  • tail end is on the left side
  • arithmetic mean tends to be smaller than the median
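The mean/median relationship for positively and negatively skewed data can be checked on a small example. This is a sketch with made-up numbers: `right_skewed` has a long right tail, and `left_skewed` is simply its mirror image.

```python
from statistics import mean, median

# Hypothetical data with a long right tail (positively skewed).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
# Mirror the values around 10 to get a long left tail (negatively skewed).
left_skewed = [20 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data))
```

In the right-skewed sample the mean exceeds the median; in the mirrored sample the order reverses.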

8
Q

Mode

A
  • the most frequently occurring value among all the observations in a sample
  • data distributions may have one or more modes (unimodal, bimodal, trimodal, etc.)
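As a quick illustration, Python's standard library can list every mode of a sample. The two samples below are made up for the example:

```python
from statistics import multimode

# Hypothetical samples: one with a single mode, one with two.
unimodal = [1, 2, 2, 3]
bimodal = [1, 1, 2, 3, 3]

print(multimode(unimodal))  # one most-frequent value
print(multimode(bimodal))   # two values tie for most frequent
```

`multimode` returns all values that share the highest frequency, so the length of the result tells you whether the sample is unimodal, bimodal, and so on.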
9
Q

Range

A
  • the difference between the largest and smallest observations in a sample
  • range is very sensitive to extreme observations or outliers
  • the larger the sample size n, the larger the range tends to be, which makes it difficult to compare ranges from data sets of different sizes
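A small sketch of why the range depends on sample size, using made-up numbers and a hypothetical helper `sample_range`. Adding observations to a sample can only keep the range the same or widen it, never shrink it:

```python
def sample_range(xs):
    """Difference between the largest and smallest observations."""
    return max(xs) - min(xs)

# Hypothetical samples: the second contains the first plus four more observations.
small = [4, 5, 6, 7]
larger = small + [1, 9, 5, 6]

print(sample_range(small))
print(sample_range(larger))
```

Because the larger sample includes every point of the smaller one, its range is at least as large, which is why ranges from samples of different sizes are hard to compare directly.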
10
Q

Quantiles or percentiles

A
  • percentiles (quantiles) are a better approach than the range for quantifying the spread in a data set
  • percentiles are less sensitive to outliers and are not greatly affected by the sample size
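The sketch below contrasts the range with quartiles on made-up data. It assumes the default (exclusive) interpolation rule of Python's `statistics.quantiles`, which may differ slightly from the percentile definition used in a given textbook:

```python
from statistics import quantiles

# Hypothetical sample, and the same sample with its largest value made extreme.
data = [10, 12, 13, 15, 16, 18, 19, 21, 22]
with_outlier = data[:-1] + [220]

for sample in (data, with_outlier):
    q1, q2, q3 = quantiles(sample, n=4)  # 25th, 50th, 75th percentiles
    print("range:", max(sample) - min(sample), "quartiles:", q1, q2, q3)
```

The range jumps when the largest value is replaced by an extreme one, while the three quartiles stay exactly where they were.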
11
Q

Standard deviation

A

standard deviation is a reasonable measure of spread if the distribution is bell-shaped
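For reference, a short sketch of how the sample standard deviation is computed, assuming the usual n - 1 denominator. The data are made up, and `manual_sd` is a name introduced only for this example:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)
# Square root of the sum of squared deviations divided by (n - 1).
manual_sd = sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

print(manual_sd, stdev(data))
```

The hand computation and `statistics.stdev` agree, since both use the n - 1 form.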

12
Q

Grouped data

A
  • when the sample size is too large to display all the raw data, data are frequently collected in grouped form
  • the simplest way to display the data is to generate a frequency distribution using a statistical package
13
Q

Frequency distribution

A
  • frequency distribution = ordered display of each value in a data set together with its frequency
  • if the number of unique sample values is large, then a frequency distribution may still be too detailed
  • if there are too many unique values, the data are categorized into broader groups
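A minimal sketch of both ideas, using made-up ages and an assumed grouping into 10-year intervals (the interval width is an illustrative choice, not from the source):

```python
from collections import Counter

# Hypothetical sample of ages.
ages = [23, 25, 25, 31, 34, 34, 34, 42, 47, 58]

# Frequency distribution: each distinct value with its count, in order.
freq = sorted(Counter(ages).items())
print(freq)

# Grouped version: label each age by the lower bound of its 10-year interval.
grouped = sorted(Counter((a // 10) * 10 for a in ages).items())
print(grouped)
```

The grouped table has four rows instead of seven, at the cost of no longer showing the individual values within each interval.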
14
Q

Types of graphic displays for grouped data

A
  • bar graphs
  • stem and leaf plots
  • box and whisker plot
  • scatter plot
  • histogram
15
Q

Bar graphs

A
  • identity of the sample points within the respective groups is lost
16
Q

Stem and leaf plots

A
  • easy to compute the median and other quantiles
  • each data point is converted into stem and leaf
  • the collection of leaves indicates the shape of the data distribution
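One way to build such a plot, sketched under the assumption that the data are non-negative integers, the stem is the tens part, and the leaf is the units digit. `stem_and_leaf` and the scores are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens part) to the sorted list of its leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical test scores.
scores = [12, 15, 21, 23, 23, 28, 34, 37, 41]
plot = stem_and_leaf(scores)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one stem, and the row lengths trace out the shape of the distribution while every original value stays recoverable.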
17
Q

Box and whisker plot

A
  • uses the relationships among the median, upper quartile, and lower quartile to describe the skewness or symmetry of a distribution
  • a vertical bar connects the upper quartile to the largest non-outlying value in the sample
  • a vertical bar connects the lower quartile to the smallest non-outlying value in the sample
18
Q

Box and whisker plot (symmetric)

A
  • upper and lower quartiles should be approximately equally spaced from the median
19
Q

Box and whisker plot (positively skewed)

A
  • upper quartile is farther from the median than the lower quartile
20
Q

Box and whisker plot (negatively skewed)

A
  • lower quartile is farther from the median than the upper quartile
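The three box-plot shapes above can be told apart by comparing the two quartile-to-median distances. The sketch below is a simplification: `quartile_skew` is a hypothetical helper, it treats only (near-)exactly equal spacings as symmetric, and it uses the default quartile rule of `statistics.quantiles`, which may differ from a textbook's definition.

```python
from math import isclose
from statistics import quantiles

def quartile_skew(values):
    """Classify shape from the spacing of the quartiles around the median."""
    q1, med, q3 = quantiles(values, n=4)
    upper, lower = q3 - med, med - q1
    if isclose(upper, lower):
        return "symmetric"
    return "positive" if upper > lower else "negative"

# Hypothetical samples.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tail = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
left_tail = [20 - x for x in right_tail]

for sample in (symmetric, right_tail, left_tail):
    print(quartile_skew(sample))
```

In practice "approximately equally spaced" is a judgment call, so a real check would allow some tolerance rather than requiring equality.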
21
Q

Box and whisker plot (outlying value)

A
  • x > upper quartile + 1.5 × IQR, or
  • x < lower quartile - 1.5 × IQR
  • IQR (interquartile range) = upper quartile - lower quartile

22
Q

Box and whisker plot (extreme outlying value)

A
  • x > upper quartile + 3.0 × IQR, or
  • x < lower quartile - 3.0 × IQR
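The outlying and extreme outlying rules can be applied together, as in this sketch. `classify` and the sample are hypothetical, and the quartiles come from the default rule of `statistics.quantiles`, which is an assumption and may not match a textbook's quartile definition exactly.

```python
from statistics import quantiles

def classify(values):
    """Label each value as 'regular', 'outlying', or 'extreme' using IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    labels = []
    for x in values:
        if x > q3 + 3.0 * iqr or x < q1 - 3.0 * iqr:
            labels.append("extreme")
        elif x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr:
            labels.append("outlying")
        else:
            labels.append("regular")
    return labels

# Hypothetical sample with two large values.
sample = [10, 12, 13, 15, 16, 18, 19, 21, 22, 40, 60]
labels = classify(sample)

for x, label in zip(sample, labels):
    print(x, label)
```

The 3.0 × IQR test is checked first because every extreme outlying value also satisfies the 1.5 × IQR rule.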