Chapter 3: Displaying and Summarizing Quantitative Data Flashcards
Define ‘Distribution’.
Slices up all the possible values of a quantitative variable into equal-width bins and gives the number of values (or counts) falling into each bin.
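As a minimal sketch of the binning idea, the Python below counts how many values land in each equal-width bin. The data values, the starting point, and the bin width are made up for illustration, and the helper name `bin_counts` is hypothetical. It also assumes that a value sitting exactly on a bin edge goes into the higher bin.

```python
def bin_counts(values, start, width, n_bins):
    """Count the values falling in each equal-width bin [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which bin this value belongs to
        if 0 <= index < n_bins:            # ignore values outside the bins
            counts[index] += 1
    return counts


data = [1, 2, 2, 3, 5, 7, 8, 8, 9]  # made-up values
counts = bin_counts(data, start=0, width=2, n_bins=5)
print(counts)
```

Every value is counted exactly once, so the counts add up to the number of data values.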
Define ‘Histogram (relative frequency histogram)’.
Uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency (or relative frequency) of values falling in each bin.
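A rough text-only sketch of the same idea, assuming the bin counts have already been worked out (the counts here are made up): each row of `#` characters stands in for one bar, and the relative frequency is the count divided by the total.

```python
# Made-up counts for five adjacent equal-width bins (an assumption for illustration).
counts = [1, 3, 1, 1, 3]

total = sum(counts)
# Relative frequency: the share of all values that falls in each bin.
relative = [count / total for count in counts]

# One row of '#' per bin stands in for a bar; longer rows mean more values.
for count, share in zip(counts, relative):
    print(f"{'#' * count:<5} {count} ({share:.0%})")
```

The shape of the display is the same whether the bars show counts or relative frequencies; only the scale changes.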
Define ‘Gap’.
A region of the distribution where there are no values.
Define ‘Stem-and-leaf display’.
Shows quantitative data values in a way that sketches the distribution of the data. It’s best described in detail by example.
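Here is one such example, sketched in Python under simplifying assumptions: the data are made-up non-negative whole numbers, the tens digit (and anything above it) is the stem, and the ones digit is the leaf. The function name `stem_and_leaf` is hypothetical.

```python
def stem_and_leaf(values):
    """Return display lines of the form 'stem | leaves' for whole numbers >= 0."""
    ordered = sorted(values)
    leaves_by_stem = {}
    for v in ordered:
        leaves_by_stem.setdefault(v // 10, []).append(v % 10)

    lines = []
    # Walk every stem from lowest to highest so empty stems still appear.
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


data = [12, 15, 21, 23, 23, 38, 50]  # made-up values
for line in stem_and_leaf(data):
    print(line)
```

The stem 4 prints with no leaves, which is how a region with no values shows up in this kind of display.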
Define ‘Dotplot’.
Graphs a dot for each case against a single axis.
Define ‘Quantitative data condition’.
The data are values of a quantitative variable whose units are known.
Define ‘Shape’.
To describe the shape of a distribution, look for:
- single versus multiple modes
- symmetry versus skewness
Define ‘Centre’.
The place in the distribution of a variable that you’d point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of centre include the mean and median.
Define ‘Spread’.
A numerical summary of how tightly the values are clustered around the “centre”. Measures of spread include the standard deviation and IQR.
Define ‘Mode’.
A hump or local high point in the shape of the distribution of a variable. The apparent location of modes can change as the scale of a histogram is changed.
Define ‘Unimodal’.
Having one mode. This is a useful term for describing the shape of a histogram when it’s generally mound-shaped. Distributions with two modes are called bimodal. Those with more than two are multimodal.
Define ‘Uniform’.
A distribution that’s roughly flat is said to be uniform.
Define ‘Symmetric’.
A distribution is symmetric if the two halves on either side of the centre look approximately like mirror images of each other.
Define ‘Tails’.
The parts of a distribution that typically trail off on either side. Distributions can be characterized as having long tails (if they straggle off for some distance) or short tails (if they don’t).
Define ‘Skewed’.
A distribution is skewed if it’s not symmetric and one tail stretches out farther than the other. Distributions are said to be skewed left (or negatively) when the longer tail stretches to the left (or in the negative direction), and skewed right (or positively) when it stretches to the right (or in the positive direction).
Define ‘Outliers’.
Extreme values that don’t appear to belong with the rest of the data. They may be unusual values that deserve further investigation or just mistakes; there’s no obvious way to tell. Don’t delete outliers automatically; you have to think about them. Outliers can affect many statistical analyses, so you should always be alert for them.
Define ‘Mean’.
Found by summing all the data values and dividing by the count:
y-bar = Total / n = sum(y) / n
It is usually paired with the standard deviation.
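A minimal Python sketch of this calculation, using made-up data and a hypothetical helper named `mean`:

```python
def mean(values):
    """Sum all the data values and divide by the count."""
    return sum(values) / len(values)


data = [3, 5, 8, 12]   # made-up values; the total is 28
y_bar = mean(data)     # 28 / 4
print(y_bar)
```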
Define ‘Median’.
The middle value with half the data above and half below it. If n is even, it is the average of the two middle values. It is usually paired with the IQR.
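The odd and even cases can be sketched in Python as follows. The data are made up, and the helper name `median` is chosen for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_case = median([7, 1, 4])        # sorted: 1, 4, 7
even_case = median([1, 4, 7, 10])   # two middle values: 4 and 7
print(odd_case, even_case)
```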
Define ‘Resistant’.
A calculated summary measure is said to be resistant if it is affected only a limited amount by any small portion of the data, such as outliers.
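One way to see resistance in action is to change a single value into an outlier and watch how far each summary moves. The sketch below uses made-up data and Python's standard `statistics` module; the median stays put while the mean shifts a long way.

```python
from statistics import mean, median

typical = [10, 12, 13, 15, 16]        # made-up values
with_outlier = [10, 12, 13, 15, 160]  # same data with the last value made extreme

# How far does each summary move when one value becomes an outlier?
mean_shift = mean(with_outlier) - mean(typical)
median_shift = median(with_outlier) - median(typical)
print(mean_shift, median_shift)
```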
Define ‘Range’.
The difference between the lowest and highest values in a data set.
Range = max - min
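In Python, with made-up data, the same formula is a one-liner:

```python
data = [4, 9, 1, 7]                 # made-up values
data_range = max(data) - min(data)  # highest value minus lowest value
print(data_range)
```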
Define ‘Variance’.
The sum of squared deviations from the mean, divided by the count minus one.
s^2 = sum((y - y-bar)^2) / (n - 1)
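A short Python sketch of the formula, step by step. The data are made up, and `sample_variance` is a hypothetical helper name.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    squared_deviations = [(y - y_bar) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)


data = [2, 4, 6, 8, 10]            # made-up values; the mean is 6
s_squared = sample_variance(data)  # (16 + 4 + 0 + 4 + 16) / 4
print(s_squared)
```

Python's standard `statistics.variance` function also divides by n - 1, so it should agree with this result.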
Define ‘Standard deviation’.
The square root of the variance.
s = sqrt(sum((y - y-bar)^2) / (n - 1))
It is usually reported along with the mean.
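A minimal Python sketch, with made-up data chosen so the arithmetic comes out evenly. The helper name `sample_standard_deviation` is an assumption for illustration.

```python
import math


def sample_standard_deviation(values):
    """Square root of the variance (the variance divides by n - 1)."""
    n = len(values)
    y_bar = sum(values) / n
    variance = sum((y - y_bar) ** 2 for y in values) / (n - 1)
    return math.sqrt(variance)


data = [1, 5, 9]                     # made-up values; the mean is 5
s = sample_standard_deviation(data)  # sqrt((16 + 0 + 16) / 2)
print(s)
```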
Define ‘Percentile’.
The ith percentile is the number that falls above i percent of the data.
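Software packages use several slightly different rules for computing percentiles, so the sketch below only checks the definition from the other direction: given a number, what percent of the data falls below it? The data are made up and `percent_below` is a hypothetical helper.

```python
def percent_below(values, x):
    """Percent of the data values that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


data = list(range(1, 21))        # made-up values: 1, 2, ..., 20
share = percent_below(data, 16)  # 15 of the 20 values are below 16
print(share)
```

Here 16 sits above 75 percent of the data, so under this definition it serves as the 75th percentile of this data set.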
Define ‘Quartile’.
The lower quartile (Q1) is the value with a quarter of the data below it. The upper quartile (Q3) has three quarters of the data below it. The median and quartiles divide the data into four parts with equal numbers of data values.
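There are several conventions for finding quartiles, and they can give slightly different answers. The Python sketch below uses one common convention as an assumption: split the sorted data at the median (leaving the median out of both halves when n is odd) and take the median of each half. The data are made up and the function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value so it belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


data = [2, 4, 4, 5, 7, 8, 9, 10]  # made-up values
q1, med, q3 = quartiles(data)
iqr = q3 - q1                     # the IQR is the distance between the quartiles
print(q1, med, q3, iqr)
```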