chapter 2: describing distributions with numbers Flashcards by Brendan Hughes

measures

results of functions applied to the data

n

the number of observations in our dataset

mode

value that appears most often

we call the dataset “bimodal” or “multimodal” when…

two or more values tie for the highest number of appearances, so the dataset has two modes (bimodal) or several modes (multimodal)
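A minimal sketch of finding every mode, assuming the data fits in a plain Python list. The function name `modes` and the sample values are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [2, 3, 3, 5, 7, 7, 9]   # made-up example data
print(modes(data))             # [3, 7]: two modes, so this dataset is bimodal
```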

𝑥_𝑖

the value of the 𝑖th observation in an ordered dataset

The median M

the observation in the ordered dataset that has just as many observations to the left of it as to the right of it; in other words, the value that is greater than just as many values in our dataset as it is less than. To find its location (not its value) in the ordered list, use (n+1)/2.
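The location rule can be sketched in code as follows. This is only an illustration: the function name `median` and the sample data are assumptions, and the list is assumed to be non-empty. When (n+1)/2 is not a whole number, the sketch averages the two observations on either side of that location.

```python
def median(values):
    """Median via the (n + 1) / 2 location rule on the sorted data."""
    xs = sorted(values)
    n = len(xs)
    location = (n + 1) / 2            # 1-based position in the ordered list
    if n % 2 == 1:
        return xs[int(location) - 1]  # odd n: the location is a whole number
    # even n: the location falls halfway between two observations
    lower = xs[n // 2 - 1]
    upper = xs[n // 2]
    return (lower + upper) / 2

print(median([7, 1, 3, 9, 5]))   # 5   (location 3 of 1, 3, 5, 7, 9)
print(median([1, 3, 5, 7]))      # 4.0 (location 2.5, between 3 and 5)
```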

The minimum (or min) and maximum (or max)

the first and last values in the ordered list, that is, the smallest and the greatest values in our dataset, respectively

range

The difference between the max and the min

The first and third quartiles, (𝑄₁ and 𝑄₃)

the median of the values less than the median and the median of the values greater than the median, respectively. You calculate each quartile the same way you calculate the median M, applied to its half of the ordered data.
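A sketch of this definition, assuming at least two observations and that the median itself is left out of both halves when n is odd. The helper names and data are hypothetical, and statistical software often uses slightly different quartile conventions, so its results may not match exactly.

```python
def median(xs):
    """Median of an already sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # values left of the median's location
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values right of the median's location
    return median(lower), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # (2.5, 7.5)
```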

five-number summary

a listing of these five values: minimum, Q₁, median, Q₃, and maximum

box plot

A visual representation of the five-number summary

inter-quartile range (IQR)

the difference between the third and first quartiles,

IQR = 𝑄₃ − 𝑄₁

Outlier Rule

If an observation has a value greater than 𝑄₃ + (1.5 × IQR) or less than 𝑄₁ − (1.5 × IQR), then it can be considered an outlier
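The rule can be sketched as a small filter. The function name `outliers` and the data are made up for illustration, and the quartiles are passed in as given values rather than computed here (2.5 and 7.5 are the halves' medians for this particular dataset).

```python
def outliers(values, q1, q3):
    """Return the observations outside the 1.5 x IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # made-up example data
# With Q1 = 2.5 and Q3 = 7.5, IQR = 5, so the fences are -5 and 15.
print(outliers(data, q1=2.5, q3=7.5))  # [30]
```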

The five-number summary is ideal for [blank]

skewed data or data with outliers

True or false: Boxplots of multiple populations can be graphed together to compare their centers and spreads

True

mean

an alternate measure of center that has the same units as our observations. Notice that when a median or quartile falls between two observations, we use the mean of their values.

the mean x̄ formula

x̄ = (1/n) Σ 𝑥ᵢ. That is, x̄ (the mean) is the sum Σ of every observation 𝑥ᵢ in the dataset, divided by n (the number of observations). In other words, the average of the observations.
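A minimal sketch of the formula, with a hypothetical function name and made-up data, assuming a non-empty list:

```python
def mean(values):
    """Add up the observations and divide by how many there are."""
    n = len(values)
    return sum(values) / n

data = [2, 4, 4, 6, 9]   # made-up example data, sum is 25
print(mean(data))        # 5.0
```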

standard deviation s

an alternate measure of variability that also has the same units as the observations. Standard deviation is an “average” of how far observation values are from the mean of the dataset. It is the square root of the variance s².

the variance of the dataset s²

The variance s² of a set of observations is an average of the squares of the deviations of the observations from their mean.

the formula for standard deviation

s = √( (1/(n − 1)) Σ (𝑥ᵢ − x̄)² ). That is, subtract the mean x̄ from each observation 𝑥ᵢ, square each difference, add the squares up (Σ), divide the sum by n − 1 (one fewer than the number of observations), and take the square root.
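The same steps as a sketch in code, assuming at least two observations. The function names and data are made up for illustration.

```python
import math

def variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 6, 9]   # mean 5; squared deviations 9, 1, 1, 1, 16
print(variance(data))    # 7.0  (28 divided by 4)
print(std_dev(data))     # square root of 7, roughly 2.65
```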

The standard deviation. This measures variability about the mean and should be used only when the mean is chosen as the measure of center. s is always zero or greater than zero. s = 0 only when there is no variability. s has the same units of measurement as the original observations.

resistant measures

depend only upon the ordering of the data

non-resistant measures

depend on the particular values of the observations (mean, standard deviation, and variance)

the mean and the median will coincide if…

our distribution is symmetric

the mean will be further out in the tail if...

our distribution is skewed

proportions for our dataset

e.g. the number of observations with a value less than a given value, divided by the size of the dataset

*Σ*

(capital sigma) means add them all up

*x̄*

the mean, or numeric average of the observations, add the values and divide by number of observations

True or false: the mean x̄ is resistant to outliers

False. Because the mean cannot resist the influence of extreme observations, we say that it is not a resistant measure of center.
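A quick illustration with made-up data, using Python's standard `statistics` module: replacing one observation with an extreme value drags the mean along but leaves the median where it was.

```python
import statistics

original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]   # same data, largest value replaced by an extreme one

# The mean moves from 3 to 22; the median stays at 3 in both cases.
print(statistics.mean(original), statistics.median(original))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```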

*n* - 1

The **degrees of freedom** of the variance or standard deviation. When finding the variance, we divide the sum by one fewer than the number of observations. The reason is that the deviations *x_i - x̄* always sum to exactly 0 so that knowing *n* − 1 of them determines the last one.
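A small check of this claim on made-up data. The values are chosen so the arithmetic is exact; with other data, floating-point rounding may leave a sum that is only very close to 0.

```python
data = [2, 4, 4, 6, 9]                      # made-up example data
x_bar = sum(data) / len(data)               # mean is 5.0
deviations = [x - x_bar for x in data]      # [-3.0, -1.0, -1.0, 1.0, 4.0]

print(sum(deviations))                      # 0.0

# Knowing the first n - 1 deviations pins down the last one.
last = -sum(deviations[:-1])
print(last == deviations[-1])               # True
```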

choosing a summary (the *five number summary* vs. *x̄* and *s*)

The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use *x̄* and *s* only for reasonably symmetric distributions that are free of outliers.

what should you do when you find an outlier?

try to find an explanation for it

What is the four step process?

1. **STATE:** What is the practical question, in the context of the real-world setting?
2. **PLAN:** What specific statistical operations does this problem call for?
3. **SOLVE:** Make the graphs and carry out the calculations needed for this problem.
4. **CONCLUDE:** Give your practical conclusion in the setting of the real-world problem.

chapter 2: describing distributions with numbers Flashcards

(33 cards)