Chapter 4: Displaying and Summarizing Quantitative Data Flashcards

1
Q

Define ‘Distribution’.

A

The distribution of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values (or counts) falling into each bin.
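As a rough illustration of the binning idea, the sketch below counts values into equal-width bins. The function name `bin_counts`, its `start` parameter, and the `ages` data are all made up for this example; it is one simple way to tabulate a distribution, not a prescribed method.

```python
from collections import Counter

def bin_counts(values, width, start=None):
    """Count the values falling into each equal-width bin [left, left + width)."""
    if start is None:
        start = min(values)
    # Bin index 0 covers [start, start + width), index 1 the next bin, and so on.
    counts = Counter(int((v - start) // width) for v in values)
    n_bins = max(counts) + 1
    return [
        (start + i * width, start + (i + 1) * width, counts.get(i, 0))
        for i in range(n_bins)
    ]

# Hypothetical data: ages of eight people.
ages = [21, 22, 22, 25, 31, 33, 38, 58]
table = bin_counts(ages, width=10, start=20)

# Print a crude text histogram, one row per bin.
for left, right, count in table:
    print(f"[{left}, {right}): {'#' * count}")
```

The bin from 40 up to 50 comes out empty here, which is how a gap shows up in the counts.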

2
Q

Define ‘Histogram (relative frequency histogram)’.

A

Uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency (or relative frequency) of values falling in each bin.

3
Q

Define ‘Gap’.

A

A region of the distribution where there are no values.

4
Q

Define ‘Stem-and-leaf display’.

A

Shows quantitative data values in a way that sketches the distribution of the data. It’s best described in detail by example.
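Since the display is easiest to grasp from an example, here is a minimal sketch that builds one. It assumes non-negative whole numbers, uses the tens as the stem and the ones digit as the leaf, and the `pulses` data and the function name `stem_and_leaf` are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display for non-negative integers.

    The stem is the value with its last digit dropped; the leaf is the last digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {row}")
    return rows

# Hypothetical data: ten pulse rates in beats per minute.
pulses = [56, 60, 64, 68, 68, 72, 72, 76, 80, 88]
for row in stem_and_leaf(pulses):
    print(row)
```

Each row reads like a bar of a histogram turned on its side, but the individual values can still be recovered from the digits.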

5
Q

Define ‘Dotplot’.

A

Graphs a dot for each case against a single axis.

6
Q

Define ‘Quantitative data condition’.

A

The data are values of a quantitative variable whose units are known.

7
Q

Define ‘Shape’.

A

To describe the shape of a distribution, look for:

  • single versus multiple modes
  • symmetry versus skewness
8
Q

Define ‘Centre’.

A

The place in the distribution of a variable that you’d point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of centre include the mean and median.

9
Q

Define ‘Spread’.

A

A numerical summary of how tightly the values are clustered around the “centre”. Measures of spread include the standard deviation and IQR.

10
Q

Define ‘Mode’.

A

A hump or local high point in the shape of the distribution of a variable. The apparent location of modes can change as the scale of a histogram is changed.

11
Q

Define ‘Unimodal’.

A

Having one mode. This is a useful term for describing the shape of a histogram when it’s generally mound-shaped. Distributions with two modes are called bimodal. Those with more than two are multimodal.

12
Q

Define ‘Uniform’.

A

A distribution that’s roughly flat is said to be uniform.

13
Q

Define ‘Symmetric’.

A

A distribution is symmetric if the two halves on either side of the centre look approximately like mirror images of each other.

14
Q

Define ‘Tails’.

A

The parts of a distribution that typically trail off on either side. Distributions can be characterised as having long tails (if they straggle off for some distance) or short tails (if they don’t).

15
Q

Define ‘Skewed’.

A

A distribution is skewed if it’s not symmetric and one tail stretches out farther than the other. Distributions are said to be skewed left (or negatively) when the longer tail stretches to the left (or in the negative direction), and skewed right (or positively) when it stretches to the right (or in the positive direction).

16
Q

Define ‘Outliers’.

A

Extreme values that don’t appear to belong with the rest of the data. They may be unusual values that deserve further investigation or just mistakes; there’s no obvious way to tell. Don’t delete outliers automatically; you have to think about them. Outliers can affect many statistical analyses, so you should always be alert for them.

17
Q

Define ‘Mean’.

A

Found by summing all the data values and dividing by the count:
y-bar = Total / n = sum(y) / n
It is usually paired with the standard deviation.

18
Q

Define ‘Median’.

A

The middle value with half the data above and half below it. If n is even, it is the average of the two middle values. It is usually paired with the IQR.

19
Q

Define ‘Resistant’.

A

A calculated summary measure is said to be resistant if it is affected only a limited amount by any small portion of the data, such as outliers.
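A small numerical check can make resistance concrete. The sketch below uses Python's standard `statistics` module on made-up salary figures (the data and variable names are hypothetical) and compares the mean and median before and after adding one extreme value.

```python
from statistics import mean, median

# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 36, 40]
# The same data with one extreme value added.
with_outlier = salaries + [400]

print("mean:  ", mean(salaries), "->", mean(with_outlier))
print("median:", median(salaries), "->", median(with_outlier))
```

In this example the mean jumps from 34.6 to 95.5 while the median only moves from 35 to 35.5, which is the sense in which the median is resistant and the mean is not.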

20
Q

Define ‘Range’.

A

The difference between the lowest and highest values in a data set.
Range = max - min

21
Q

Define ‘Variance’.

A

The sum of squared deviations from the mean, divided by the count minus one.
s^2 = sum((y - y-bar)^2) / (n - 1)

22
Q

Define ‘Standard deviation’.

A

The square root of the variance.
s = sqrt(sum((y - y-bar)^2) / (n - 1))
It is usually reported along with the mean.
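The variance and standard deviation formulas can be worked through step by step. The sketch below does so on a small made-up data set (`y` is hypothetical) and compares the hand computation with `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical data values.
y = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(y)
y_bar = sum(y) / n                                  # mean: 40 / 8 = 5.0
squared_devs = [(v - y_bar) ** 2 for v in y]        # squared deviations from the mean
s2 = sum(squared_devs) / (n - 1)                    # variance: 32 / 7
s = sqrt(s2)                                        # standard deviation

print(f"by hand: s^2 = {s2:.4f}, s = {s:.4f}")
print(f"library: s^2 = {variance(y):.4f}, s = {stdev(y):.4f}")
```

Dividing by n instead of n - 1 would give the population versions (`statistics.pvariance` and `statistics.pstdev`), which are not what these cards define.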

23
Q

Define ‘Percentile’.

A

The ith percentile is the number that falls above i percent of the data.

24
Q

Define ‘Quartile’.

A

The lower quartile (Q1) is the value with a quarter of the data below it. The upper quartile (Q3) has three quarters of the data below it. The median and quartiles divide data into four parts with equal numbers of data values.

25
Q

Define ‘Interquartile range (IQR)’.

A

The difference between the first and third quartiles.
IQR = Q3 - Q1
It is usually reported along with the median and 5-number summary.

26
Q

Define ‘5-number summary’.

A

A useful summary of data consisting of minimum, Q1, median, Q3, and maximum. The quartiles, along with the extremes, show the difference in spread on each side of the median. The difference between the quartiles is the IQR.
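As an illustration, the sketch below computes a 5-number summary and the IQR for a small made-up data set. Textbooks and software use slightly different rules for locating quartiles; this example assumes the "inclusive" method of Python's `statistics.quantiles`, so other tools may report slightly different Q1 and Q3 values. The function name `five_number_summary` and the `times` data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, med, q3, data[-1])

# Hypothetical data: nine commute times in minutes.
times = [2, 4, 4, 5, 7, 8, 10, 12, 15]
summary = five_number_summary(times)
iqr = summary[3] - summary[1]   # IQR = Q3 - Q1

print("min, Q1, median, Q3, max:", summary)
print("IQR:", iqr)
```

Here the upper quartile sits 3 above the median and the maximum a further 5 above that, while the lower side spans 3 and 2, hinting at a longer upper tail.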

27
Q

Define ‘Boxplot’.

A

Displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values. Boxplots are particularly effective for comparing groups and for displaying possible outliers.

28
Q

What are some ways to display the distribution of a quantitative variable?

A

With a histogram, stem-and-leaf display, or dotplot.

29
Q

What factors should you use to describe distributions of a quantitative variable?

A

In terms of their shape, centre, and spread.

30
Q

What are the different possible shapes of a distribution?

A
  • A symmetric distribution has roughly the same shape reflected around the centre.
  • A skewed distribution extends farther on one side than on the other.
  • A unimodal distribution has a single major hump or mode; bimodal… multimodal…
  • Outliers are values that lie far from the rest of the data.
  • Report any other unusual feature of the distribution, such as gaps.
31
Q

When should you use the mean? The median? Standard deviation? 5-number summary?

A

The mean is suitable for the centre of a unimodal, symmetric distribution, as is the standard deviation.
The median is better for a skewed distribution or one that has outliers. A 5-number summary (which includes the median) should be used in these cases.

32
Q

How do you make a boxplot?

A

Use the 5-number summary to make a boxplot. A boxplot shows the quartiles as the upper and lower ends of a central box, the median as a line across the box, and “whiskers” that extend to the most extreme values that are not nominated as outliers. Boxplots display separately any case that is more than 1.5 × IQR beyond either quartile. These cases should be considered as possible outliers.
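The pieces of a boxplot can be computed without drawing anything. The sketch below finds the box, the whisker ends, and the cases flagged by the 1.5 × IQR rule. It assumes the "inclusive" quartile method of `statistics.quantiles` (quartile rules vary between textbooks and software), and the function name `boxplot_summary` and the `scores` data are made up for this example.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Return the box, whisker ends, and possible outliers for a boxplot."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # cases below this are possible outliers
    high_fence = q3 + 1.5 * iqr   # cases above this are possible outliers
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "box": (q1, med, q3),
        # Whiskers reach the most extreme values that are not flagged.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical data with one unusually large value.
scores = [2, 4, 4, 5, 7, 8, 10, 12, 30]
result = boxplot_summary(scores)
print(result)
```

With these numbers the fences fall at -5 and 19, so the upper whisker stops at 12 and the value 30 is plotted separately as a possible outlier.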

33
Q

For a right-skewed distribution, order the mean, median, and mode from largest to smallest. What about a left-skewed distribution?

A

Right: Mode < Median < Mean
Left: Mode > Median > Mean