Chapter 2 - Descriptive Statistics Flashcards
1
Q
John Tukey
A
- 1915 - 2000
- pioneered exploratory data analysis (EDA), including boxplots and stem-and-leaf plots
- coined terms such as bit and software
2
Q
Features of a good numeric or graphic form of data summarization
A
- self-contained
- understandable without reading the text
- attributes clearly labeled with well-defined terms
- indicate principal trends in data
3
Q
Measures of location
A
- also known as measures of central tendency
- data must be summarized before any inferences can be made
- a measure of location summarizes the data by defining the center or middle of the sample
4
Q
Arithmetic mean limitation
A
- oversensitive to extreme values
- when extreme values are present, it may not be representative of the location of the majority of sample points
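Note: a minimal sketch of this limitation, using Python's standard `statistics` module. The sample values are hypothetical and chosen only to make the effect visible; the median is shown alongside for comparison.

```python
from statistics import mean, median

# Hypothetical sample: five typical values, then the same sample plus one extreme value.
sample = [2, 3, 3, 4, 5]
with_outlier = sample + [100]

mean_before = mean(sample)             # 17 / 5 = 3.4
mean_after = mean(with_outlier)        # 117 / 6 = 19.5
median_before = median(sample)         # middle value: 3
median_after = median(with_outlier)    # average of 3 and 4: 3.5

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The single extreme value pulls the mean far above five of the six points, while the median barely moves.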
5
Q
Symmetric distribution
A
arithmetic mean is approximately the same as the median
6
Q
Positively skewed distribution
A
- tail end is on the right side
- arithmetic mean tends to be larger than the median
7
Q
Negatively skewed distribution
A
- tail end is on the left side
- arithmetic mean tends to be smaller than the median
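Note: the mean-versus-median pattern for symmetric, positively skewed, and negatively skewed distributions can be checked on small samples. The three samples below are hypothetical illustrations, not data from the chapter.

```python
from statistics import mean, median

# Hypothetical samples chosen to illustrate each shape.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 4, 20]       # long tail on the right
left_skewed = [1, 17, 18, 18, 19, 19, 20]   # long tail on the left

summary = {
    name: (mean(values), median(values))
    for name, values in [
        ("symmetric", symmetric),
        ("right_skewed", right_skewed),
        ("left_skewed", left_skewed),
    ]
}

for name, (m, med) in summary.items():
    print(f"{name}: mean={m}, median={med}")
```

In these samples the mean equals the median for the symmetric case, exceeds it for the right-skewed case, and falls below it for the left-skewed case.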
8
Q
Mode
A
- the most frequently occurring value among all the observations in a sample
- data distributions may have one or more modes (unimodal, bimodal, trimodal, etc.)
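Note: a short sketch of finding one or more modes with `statistics.multimode` (available in Python 3.8 and later). The samples are hypothetical.

```python
from statistics import multimode

# Hypothetical samples: one with a single mode, one with two.
unimodal = [1, 2, 2, 3]
bimodal = [1, 1, 2, 3, 3]

unimodal_modes = multimode(unimodal)   # every value tied for the highest frequency
bimodal_modes = multimode(bimodal)

print(unimodal_modes)
print(bimodal_modes)
```

`multimode` returns a list, so a bimodal or trimodal sample yields two or three values rather than an arbitrary single one.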
9
Q
Range
A
- the difference between the largest and smallest observations in a sample
- range is very sensitive to extreme observations or outliers
- the larger the sample size n, the larger the range tends to be, which makes it difficult to compare ranges from data sets of different sizes
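Note: a sketch of why the range depends on sample size. It draws simulated values (an assumption for illustration only, using a seeded standard normal generator) and computes the range of the first 10, 100, and 1000 draws. Because each smaller sample is contained in the larger one, the range can only stay the same or grow.

```python
import random

def sample_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)

# Simulated data: 1000 draws from a standard normal distribution, seeded for repeatability.
rng = random.Random(0)
draws = [rng.gauss(0, 1) for _ in range(1000)]

# Range of nested samples of increasing size n.
ranges = {n: sample_range(draws[:n]) for n in (10, 100, 1000)}

for n, r in ranges.items():
    print(f"n={n}: range={r:.2f}")
```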
10
Q
Quantiles or percentiles
A
- percentiles (quantiles) are a better way than the range to quantify the spread of a data set
- percentiles are less sensitive to outliers and are not greatly affected by the sample size
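Note: a minimal comparison of quartiles and the range when one value becomes an outlier. It uses `statistics.quantiles` with its default method; other software may interpolate percentiles slightly differently. The data are hypothetical.

```python
from statistics import quantiles

# Hypothetical data: the integers 1..11, and the same data with the largest value made extreme.
clean = list(range(1, 12))
with_outlier = clean[:-1] + [1000]

# n=4 splits the data into quarters, giving the 25th, 50th, and 75th percentiles.
quartiles_clean = quantiles(clean, n=4)
quartiles_outlier = quantiles(with_outlier, n=4)

range_clean = max(clean) - min(clean)
range_outlier = max(with_outlier) - min(with_outlier)

print("quartiles:", quartiles_clean, "->", quartiles_outlier)
print("range:    ", range_clean, "->", range_outlier)
```

Here the quartiles are unchanged by the outlier, while the range jumps from 10 to 999.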
11
Q
Standard deviation
A
standard deviation is a reasonable measure of spread if the distribution is bell-shaped
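Note: a sketch of the sample standard deviation computed by hand and checked against `statistics.stdev`. It assumes the usual n - 1 divisor for a sample; the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1.
x_bar = mean(data)
squared_deviations = sum((x - x_bar) ** 2 for x in data)
manual_sd = sqrt(squared_deviations / (len(data) - 1))

library_sd = stdev(data)

print(f"manual: {manual_sd:.4f}, statistics.stdev: {library_sd:.4f}")
```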
12
Q
Grouped data
A
- when the sample size is too large to display all the raw data, data are frequently collected in grouped form
- the simplest way to display the data is to generate a frequency distribution using a statistical package
13
Q
Frequency distribution
A
- frequency distribution = ordered display of each value in a data set together with its frequency
- if the number of unique sample values is large, then a frequency distribution may still be too detailed
- if there are too many unique values, the data are categorized into broader groups
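Note: a sketch of a frequency distribution and of grouping values into broader classes, using `collections.Counter` in place of a statistical package. The values and the class width of 10 are hypothetical choices for illustration.

```python
from collections import Counter

# Frequency distribution: each distinct value with its count, in sorted order.
values = [3, 1, 2, 3, 3, 2, 5]
frequency = sorted(Counter(values).items())
print(frequency)

# Grouped form: hypothetical ages placed into classes of width 10,
# each class labeled by its lower bound (20 means 20-29, and so on).
ages = [23, 37, 31, 45, 29, 52, 38]
class_width = 10
grouped = sorted(Counter((age // class_width) * class_width for age in ages).items())

for lower, count in grouped:
    print(f"{lower}-{lower + class_width - 1}: {count}")
```

Once the ages are grouped, only the class counts remain; the individual values within each class are no longer shown.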
14
Q
Types of graphic displays for data
A
- bar graphs
- stem-and-leaf plots
- box-and-whisker plots
- scatter plots
- histograms
15
Q
Bar graphs
A
- the identity of the individual sample points within each group is lost