chapter 2: describing distributions with numbers Flashcards

1
Q

measures

A

results of functions applied to the data, i.e., numerical summaries such as the mean, median, or range

2
Q

n

A

the number of observations in our dataset

3
Q

mode

A

value that appears most often

4
Q

we call the dataset “bimodal” or “multimodal” when…

A

it has two modes (bimodal) or more than two modes (multimodal); that is, two or more values tie for appearing most often
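A minimal Python sketch of finding every mode, assuming the data fit in a list; `modes` is an illustrative name, not from the course. It returns all values tied for the highest count, so one result means one mode and two results mean the dataset is bimodal. It does not special-case data where every value appears once (some texts say such data have no mode). Python 3.8+ also ships `statistics.multimode` for the same job.

```python
from collections import Counter

def modes(data):
    """Return all values tied for the highest count, in sorted order."""
    if not data:
        raise ValueError("need at least one observation")
    counts = Counter(data)               # value -> number of times it appears
    top = max(counts.values())           # highest count in the dataset
    return sorted(value for value, count in counts.items() if count == top)

print(modes([2, 3, 3, 5, 7]))     # [3]     one mode
print(modes([1, 1, 2, 4, 4, 9]))  # [1, 4]  two modes: bimodal
```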

5
Q

𝑥𝑖

A

the value of the 𝑖th observation in an ordered dataset

6
Q

The median M

A

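the observation that has just as many observations to the left of it as to the right of it in the ordered dataset; equivalently, the value that is greater than just as many values in the dataset as it is less than. Its location (not its value) in the ordered list is (n + 1)/2. When n is even, this location falls between the two middle observations, and M is the mean of those two values.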
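A minimal Python sketch of the (n + 1)/2 location rule, assuming numeric data; `median` is an illustrative name, not from the course. The location picks the middle observation of the sorted list, or the two observations it falls between, and the function averages them, so one formula covers both odd and even n.

```python
def median(data):
    """Median of a non-empty list of numbers, via the (n + 1)/2 location rule."""
    xs = sorted(data)                 # the rule applies to the ordered list
    n = len(xs)
    if n == 0:
        raise ValueError("median() needs at least one observation")
    location = (n + 1) / 2            # 1-based position; ends in .5 when n is even
    lower = xs[int(location) - 1]     # observation at the whole-number part of the location
    upper = xs[n // 2]                # next observation when n is even; same one when n is odd
    return (lower + upper) / 2        # mean of the two (or of one value with itself)

print(median([7, 1, 3, 9, 5]))  # location 3 -> 5.0
print(median([4, 8, 1, 6]))     # location 2.5 -> (4 + 6) / 2 = 5.0
```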

7
Q

The minimum (or min) and maximum (or max)

A

the first and last values in the ordered list, i.e., the smallest and the greatest values in our dataset, respectively

8
Q

range

A

The difference between the max and the min

9
Q

The first and third quartiles (𝑄1 and 𝑄3)

A

the median of the values less than the median and the median of the values greater than the median, respectively. You calculate each quartile the same way you calculate the median M.

10
Q

five-number summary

A

a listing of these five values: minimum, Q1, median, Q3, and maximum
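A Python sketch of the five-number summary, assuming the quartile rule on the earlier card: Q1 and Q3 are the medians of the observations below and above the median's location, leaving out the middle observation when n is odd. The function names are hypothetical. Statistical software often uses other quartile rules (for example, interpolation), so its Q1 and Q3 can differ slightly from these.

```python
def _median_sorted(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max); Q1/Q3 are medians of the lower/upper halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]             # observations to the left of the median's location
    upper = xs[half + n % 2:]     # to the right; skips the middle value when n is odd
    return (xs[0], _median_sorted(lower), _median_sorted(xs), _median_sorted(upper), xs[-1])

summary = five_number_summary([13, 1, 9, 5, 15, 3, 11, 7])
print(summary)                    # (1, 4.0, 8.0, 12.0, 15)
print(summary[4] - summary[0])    # range = max - min = 14
```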

11
Q

box plot

A

A visual representation of the five-number summary

12
Q

inter-quartile range (IQR)

A

the difference between the third and first quartiles, IQR = 𝑄3 − 𝑄1

13
Q

Outlier Rule

A

If an observation has a value greater than 𝑄3 + (1.5 × 𝐼𝑄𝑅) or less than 𝑄1 − (1.5 × 𝐼𝑄𝑅), then it can be considered an outlier.
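A Python sketch of the 1.5 × IQR rule, assuming quartiles are found as medians of the lower and upper halves of the sorted data (other quartile conventions can shift the fences slightly). `quartiles` and `suspected_outliers` are illustrative names; the sample data are made up for the example.

```python
def _median_sorted(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(data):
    """(Q1, Q3) as medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    return _median_sorted(xs[:half]), _median_sorted(xs[half + n % 2:])

def suspected_outliers(data):
    """Observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]
print(quartiles(data))           # (4.5, 8.5), so IQR = 4.0
print(suspected_outliers(data))  # fences are -1.5 and 14.5 -> [30]
```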

14
Q

The five-number summary is ideal for [blank]

A

skewed data or data with outliers

15
Q

True or false: Boxplots of multiple populations can be graphed together to compare their centers (medians) and spreads

A

True

16
Q

mean

A

an alternate measure of center, equal to the sum of the observations divided by n; it has the same units as our observations. Note that when a median or quartile falls between two observations, we use the mean of those two values.

17
Q

the mean formula

A

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $\bar{x}$ ("x-bar") is the mean, $n$ is the number of observations, $\sum$ (capital sigma) means "the sum of", and $x_i$ is the $i$th observation. In other words, add up all the observations and divide by $n$.
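A minimal Python sketch of the mean formula; `mean` is an illustrative name. It sums the observations and divides by n, so the order of the data does not matter.

```python
def mean(data):
    """x-bar = (1/n) * (sum of the n observations)."""
    n = len(data)
    if n == 0:
        raise ValueError("mean() needs at least one observation")
    return sum(data) / n

print(mean([2, 4, 9]))  # (2 + 4 + 9) / 3 = 5.0
print(mean([9, 2, 4]))  # order does not matter: 5.0
```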

18
Q

standard deviation s

A

an alternate measure of variability, and will also have the same units as the observations. Standard deviation is an "average" of how far observation values are from the mean of the dataset. It is the square root of the variance s².

19
Q

the variance of the dataset s²

A

The variance s² of a set of observations is an average of the squares of the deviations of the observations from their mean, computed by dividing the sum of the squared deviations by n − 1 rather than by n.

20
Q

the formula for standard deviation

A

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, i.e., the square root of the sum of the squared deviations of each observation $x_i$ from the mean $\bar{x}$, divided by $n - 1$.
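A minimal Python sketch of the variance and standard deviation formulas, dividing by n − 1 as on these cards; the function names are illustrative. Python's standard `statistics.variance` and `statistics.stdev` follow the same n − 1 convention.

```python
import math

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

print(sample_variance([1, 3, 5]))  # deviations -2, 0, 2 -> (4 + 0 + 4) / 2 = 4.0
print(sample_sd([1, 3, 5]))        # 2.0
```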

21
Q

s

A

The standard deviation. This measures variability about the mean and should be used only when the mean is chosen as the measure of center. s is always zero or greater than zero. s = 0 only when there is no variability. s has the same units of measurement as the original observations.

22
Q

resistant measures

A

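depend mainly on the ordering of the data rather than on the exact values of extreme observations (e.g., the median, quartiles, and IQR), so a few extreme values cannot pull them far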

23
Q

non-resistant measures

A

depend on the particular values of the observations (mean, standard deviation, and variance)

24
Q

the mean and the median will coincide if…

A

our distribution is symmetric

25
Q

the mean will be farther out in the tail than the median if…

A

our distribution is skewed, because the extreme values in the long tail pull the mean toward them

26
Q

proportions for our dataset

A

e.g., the number of observations with a value less than a given value, divided by the size of the dataset (n)
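A minimal Python sketch of that proportion, assuming "less than" means strictly less; `proportion_below` is an illustrative name and the data are made up.

```python
def proportion_below(data, cutoff):
    """Share of observations strictly less than cutoff."""
    if not data:
        raise ValueError("need at least one observation")
    count = sum(1 for x in data if x < cutoff)   # how many values fall below the cutoff
    return count / len(data)                     # divide by the size of the dataset

print(proportion_below([3, 8, 1, 12, 5], 6))  # 3 of 5 values are below 6 -> 0.6
```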

27
Q

Σ

A

(capital sigma) means add them all up

28
Q

x̄

A

the mean, or numerical average, of the observations: add the values and divide by the number of observations

29
Q

True or false: the mean x̄ is resistant to outliers

A

False. Because the mean cannot resist the influence of extreme observations, we say that it is not a resistant measure of center.
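A small Python illustration using the standard `statistics` module; the two datasets are made up and differ only in their largest value. Replacing 8 with an extreme 80 drags the mean from 6 to 20.4, while the median stays at 6.

```python
import statistics

typical = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]   # same data, but the largest value is now extreme

print(statistics.mean(typical), statistics.median(typical))            # mean 6, median 6
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean 20.4, median 6
```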

30
Q

n - 1

A

The degrees of freedom of the variance or standard deviation. When finding the variance, we divide the sum of squared deviations by n − 1, one fewer than the number of observations. The reason is that the deviations xᵢ − x̄ always sum to exactly 0, so knowing any n − 1 of them determines the last one.
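A short Python check of the "deviations sum to 0" fact, using made-up data chosen so the floating-point arithmetic is exact (with other data the computed sum can be off by a tiny rounding error). The last two lines show that the first n − 1 deviations determine the last one.

```python
data = [3, 7, 8, 10, 12]
n = len(data)
x_bar = sum(data) / n                    # 40 / 5 = 8.0
deviations = [x - x_bar for x in data]   # [-5.0, -1.0, 0.0, 2.0, 4.0]

print(sum(deviations))                   # 0.0: deviations from the mean cancel out

# Knowing the first n - 1 deviations pins down the last one.
last = -sum(deviations[:-1])
print(last == deviations[-1])            # True
```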

31
Q

choosing a summary (the five-number summary vs. x̄ and s)

A

The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use x̄ and s only for reasonably symmetric distributions that are free of outliers.

32
Q

what should you do when you find an outlier?

A

try to find an explanation for it

33
Q

What is the four-step process?

A
  1. STATE: What is the practical question, in the context of the real-world setting?
  2. PLAN: What specific statistical operations does this problem call for?
  3. SOLVE: Make the graphs and carry out the calculations needed for this problem.
  4. CONCLUDE: Give your practical conclusion in the setting of the real-world problem.