Chapter 3 (3.1-3.4): Descriptive Statistics and Analytics: Numerical Methods Flashcards

1
Q

In addition to describing the shape of a distribution, we want to describe the data set’s central tendency. This includes what 3 things?

A

Mean, median, and mode

2
Q

A measure of central tendency represents the ________ (or middle) of the data.

A

center

3
Q

average of the population measurements

A

population mean

4
Q

a number calculated from all the population measurements that describes some aspect of the population

A

population parameter

(any number calculated from all the population measurements is called a population parameter)

5
Q

a number calculated using the sample measurements that describes some aspect of the sample

A

sample statistic

(when the mean, median, and mode are calculated from a sample, they are called sample statistics)

6
Q

What are the 3 measures of central tendency?

A

Mean, median, and mode

7
Q

the average or expected value

A

mean

8
Q

the value of the middle point of the ordered measurements

A

median (Md)

9
Q

the most frequent value

A

mode (Mo)

10
Q

What is the symbol for population mean? What about sample mean?

A
  • μ (the Greek letter mu)
    (the population mean is the value to expect, on average, in the long run)
  • x̄ (x with a bar over it, read "x-bar")
    (the sample mean is a point estimate of the population mean)
11
Q

How do you calculate mean?

A

Add all the measurements and then divide that total by the number of measurements

12
Q

Mean is also called what other two things (these words are interchangeable)?

A
  1. Average
  2. expected value
13
Q

For median, if the number of measurements is (odd/even), the median is the middlemost measurement in the ordering.
For median, if the number of measurements is (odd/even), the median is the average of the two middlemost measurements in the ordering.

A

odd; even

14
Q

What do you have to do before calculating median?

A

Arrange the numbers in numerical (increasing) order
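
(A minimal Python sketch of the mean and median rules from the cards above. The function names and the two small data sets are made up for illustration; they are not from the chapter.)

```python
def mean(values):
    """Sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered measurements.

    Odd count: the middlemost measurement.
    Even count: the average of the two middlemost measurements.
    """
    ordered = sorted(values)          # always order the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 3, 9, 5, 6]        # hypothetical data, n = 5 (odd)
even_data = [7, 3, 9, 5, 6, 12]   # hypothetical data, n = 6 (even)

print(mean(odd_data), median(odd_data))
print(mean(even_data), median(even_data))
```

(Sorting happens inside median() because the middle position only means something once the measurements are in increasing order.)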

15
Q

T or F: Modes are the values that are observed “most typically”.

A

True

16
Q

If there are two modes, the data is ________.

A

bimodal
(ex: 3,4,5,5,5,6,6,6,7,8,9… 5 and 6 are the two modes)

17
Q

If there are more than two modes, the data is _________.

A

multimodal
(ex: 1,1,1,2,2,2,3,3,3)
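
(A short Python sketch for finding every mode, using the two example data sets from these cards. The helper name modes() is an assumption made for illustration.)

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


bimodal = [3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9]
multimodal = [1, 1, 1, 2, 2, 2, 3, 3, 3]

print(modes(bimodal))      # two values tie for most frequent
print(modes(multimodal))   # more than two values tie
```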

18
Q

When data are in classes, the class with the (highest/lowest) frequency is the modal class.

A

highest
(the tallest box in the histogram)

19
Q

Mean, median, and mode are _________.

A

descriptive statistics

20
Q

What are the 13 descriptive statistics?

A
  1. Mean
  2. Median
  3. Mode
  4. Standard Error
  5. Standard Deviation
  6. Sample Variance
  7. Kurtosis
  8. Skewness
  9. Range
  10. Minimum
  11. Maximum
  12. Sum
  13. Count
21
Q

If mean=median=mode, then the curve will be:
a. skewed to the right
b. symmetrical
c. skewed to the left

A

b. symmetrical

22
Q

If mode < median < mean, then the curve will be:
a. skewed to the right
b. symmetrical
c. skewed to the left

A

a. skewed to the right

23
Q

If mean < median < mode, then the data will be:
a. skewed to the right
b. symmetrical
c. skewed to the left

A

c. skewed to the left

24
Q

T or F: A population parameter describes some aspect of the population and is a number calculated using all population measurements. Its point estimate is calculated from a sample of measurements rather than all the population measurements.

A

True

25
Q

What 3 things tell us the variation in our data (what are the 3 measures of variation)?

A
  1. Range
  2. Standard Deviation
  3. Variance
26
Q

How do you calculate range?

A

Highest number - smallest number (in our data)

27
Q

the average of the squared deviations of all the population measurements from the population mean

A

Population variance

28
Q

the square root of the population variance

A

population standard deviation

29
Q

What two measures of variation measure the spread of the data from the mean (how far our data is spread from the mean)?

A

Variance and standard deviation

30
Q

What are the 2 steps to calculate variance?

A
  1. Calculate mean
  2. Subtract the mean from each data point separately (one data point at a time) and square each difference, then add all of the squared differences together and divide by the total number of observations (for a sample variance, divide by n − 1 instead)

(look at camera roll for example of this)
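
(Since the camera-roll example is not part of these cards, here is a minimal Python sketch of the two steps. The data set and function names are hypothetical. Dividing by n − 1 for a sample is the usual convention, stated here as an assumption about what the course uses.)

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean (divide by N)."""
    mu = sum(values) / len(values)                      # step 1: the mean
    squared_devs = [(x - mu) ** 2 for x in values]      # step 2: squared deviations
    return sum(squared_devs) / len(values)


def sample_variance(values):
    """Same squared deviations, but divided by n - 1 (assumed sample convention)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements

pop_var = population_variance(data)
pop_sd = math.sqrt(pop_var)       # standard deviation = square root of variance
print(pop_var, pop_sd, sample_variance(data))
```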

31
Q

How do you calculate standard deviation?

A

Just take the square root of the variance

32
Q

What is the standard deviation symbol?
What is the variance symbol?

A
  • σ (the lowercase Greek letter sigma)
  • σ² (sigma squared)

(look at camera roll for what population standard deviation and sample standard deviation look like)
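
(For reference, the standard textbook forms of these symbols written out in LaTeX. The sample versions, s² and s with the n − 1 divisor, are the usual convention and are an assumption about what the camera-roll picture shows.)

```latex
% Population (N measurements, mean \mu)
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}

% Sample (n measurements, mean \bar{x})
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}
```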

33
Q

The Empirical Rule for Normal Populations:
If a population has mean μ and standard deviation σ and is described by a normal curve, then:
1. _______ of the population measurements lie within one standard deviation of the mean: [μ − σ, μ + σ]
2. _______ lie within two standard deviations of the mean: [μ − 2σ, μ + 2σ]
3. ________ lie within three standard deviations of the mean: [μ − 3σ, μ + 3σ]

A
  1. 68.26%
  2. 95.44%
  3. 99.73%

(make sure you know these percentages; picture of this explained in camera roll)
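
(A quick way to check these percentages, sketched with Python's standard-library NormalDist. The helper name share_within() is made up. The computed values come out near 68.27%, 95.45%, and 99.73%; the last-digit differences from the card's 68.26% and 95.44% are most likely rounding in printed normal tables.)

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```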

34
Q

What is the formula for calculating z-scores?

A

z = (x - mean) / standard deviation

(For any x in a population or sample, the associated z score is z = (x-mean) / standard deviation)
(example of calculating z-scores in camera roll)
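
(A minimal Python sketch of the z-score formula. The exam-score numbers, mean 70 and standard deviation 10, are hypothetical.)

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Hypothetical exam: mean 70, standard deviation 10
above = z_score(85, 70, 10)    # positive: x is above the mean
below = z_score(60, 70, 10)    # negative: x is below the mean
at_mean = z_score(70, 70, 10)  # zero: x equals the mean

print(above, below, at_mean)
```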

35
Q

the number of standard deviations that x is from the mean; indicates the relative location of a value within a population or sample

A

z scores (standardized value)

36
Q

If z score is positive, then our number is (greater than/less than/equal to) the mean.

A

greater than

37
Q

If z score is negative, then our number is (greater than/less than/equal to) the mean.

A

less than

38
Q

If z-score is 0, then….

A

our number (x) is equal to the mean

39
Q

measures the size of the standard deviation relative to the size of the mean

A

coefficient of variation

40
Q

What is the formula for calculating the coefficient of variation?

A

(Standard deviation / mean) x 100%

(example of calculating this in camera roll)
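
(A small Python sketch of the coefficient of variation. The two "funds" and their numbers are invented for illustration only.)

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


# Hypothetical investments: (mean return, standard deviation)
fund_a = coefficient_of_variation(std_dev=2, mean=10)
fund_b = coefficient_of_variation(std_dev=4, mean=25)

print(fund_a, fund_b)
```

(Fund B has the larger standard deviation but the smaller coefficient of variation, so relative to its mean it is the less variable of the two.)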

41
Q

What is the coefficient of variation used for?

A

To measure risk

(as well as compare the relative variabilities of values about the mean, and compare the relative variability of populations or samples with different means and different standard deviations)

42
Q

If standard deviation is high, our data (is/is not) widely spread around the mean, which means it has (high/low) risk.

A

is; high

43
Q

If standard deviation is small, our data is (spread out from/near) the mean, which means it has (high/low) risk.

A

near; low (or less)

44
Q

pth percentile:
About p% of the measurements are (above/below) the pth percentile and about (100-p)% are (above/below) it.

A

below; above

(ex: if your score is at the 90th percentile, that means 90% of scores are below yours and (100-90) = 10% of scores are above yours)

45
Q
  • The first quartile (Q1) is the _____ percentile.
  • The second quartile (Q2) (median) is the ______ percentile.
  • The third quartile (Q3) is the _____ percentile.
  • The interquartile range (IQR) is ______.
A
  • 25th
  • 50th (denoted Md)
  • 75th
  • Q3-Q1
46
Q

What are the 3 steps for calculating percentiles?

A
  1. Arrange the measurements in increasing (lowest to highest) order.
  2. Calculate the index i= (p/100) x n where p is the percentile to find. (n= count of data points)
  3. (a) if i is not an integer (whole number), round up: the next integer greater than i gives the position of the pth percentile in the ordered list
    (b) if i is an integer, the pth percentile is the average of the measurements in the i and i+1 ordered positions. (ex: if i=2, average the 2nd and 3rd ordered measurements)
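
(The three steps above, sketched in Python. The function name and the 12 data points are hypothetical. Note that software packages often use other percentile conventions, so their answers can differ slightly from this method.)

```python
import math


def percentile(values, p):
    """pth percentile using the index i = (p/100) * n, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)              # step 1: increasing order
    n = len(ordered)
    i = p * n / 100                       # step 2: the index
    if i != int(i):                       # step 3(a): not an integer, round up
        return ordered[math.ceil(i) - 1]  # positions are 1-based, lists 0-based
    i = int(i)                            # step 3(b): average positions i and i+1
    return (ordered[i - 1] + ordered[i]) / 2


data = [12, 15, 11, 20, 18, 25, 22, 30, 27, 14, 16, 19]   # hypothetical, n = 12

print(percentile(data, 25), percentile(data, 50),
      percentile(data, 75), percentile(data, 90))
```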
47
Q

What is the formula for calculating the index when calculating percentiles?

A

i = (p/100) x n

(example of calculating percentiles in camera roll)

48
Q

The 5 Number Summary is used to create what type of graph?

A

Box and whisker plot

49
Q

What is the 5 Number Summary?

A
  1. The smallest measurement
  2. The first quartile, Q1
  3. The median, Md (Q2)
  4. The third quartile (Q3)
  5. The largest measurement

(Once you have these numbers you can display this info visually using a box-and-whiskers plot)

50
Q

a convenient way of visually displaying the data through quartiles, and is easy to read and summarize

A

box and whisker plot

51
Q

The inner fences of a box-and-whiskers plot are located ____x_____ away from the quartiles

A

1.5 x IQR

(lower fence: Q1 − (1.5 x IQR); upper fence: Q3 + (1.5 x IQR))

52
Q

What is the formula to calculate the lower limit for a box and whiskers plot?

A

Q1-1.5(IQR)

53
Q

What is the formula to calculate the upper limit for a box and whiskers plot?

A

Q3 + 1.5(IQR)
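
(A Python sketch that puts the 5 Number Summary, the IQR, and the two limits together. The quartiles use the index method i = (p/100) x n from the percentile cards. The data set, with one unusually large value, and all function names are hypothetical.)

```python
import math


def percentile(values, p):
    """pth percentile by the index method i = (p/100) * n (0 < p < 100)."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


def box_plot_summary(values):
    """Five-number summary, IQR, limits, and measurements beyond the limits."""
    q1 = percentile(values, 25)
    md = percentile(values, 50)
    q3 = percentile(values, 75)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    beyond = [x for x in sorted(values) if x < lower_limit or x > upper_limit]
    return {
        "five_number": (min(values), q1, md, q3, max(values)),
        "iqr": iqr,
        "limits": (lower_limit, upper_limit),
        "beyond_limits": beyond,
    }


data = [11, 12, 14, 15, 16, 18, 19, 20, 22, 25, 27, 60]   # hypothetical
summary = box_plot_summary(data)
print(summary)
```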

54
Q

T or F: If there is a long whisker (line) on right side, it is left-skewed.

A

False; right-skewed

55
Q

T or F: If there is a long whisker (line) on left side, it is left-skewed.

A

True

56
Q

measurements that are very different from other measurements; they are either much larger or much smaller than most of the other measurements

A

outliers

57
Q

________ lie beyond the limits of the box-and-whiskers plot; measurements less than the lower limit or greater than the upper limit

A

Outliers

58
Q

T or F: Outliers skew our data.

59
Q

the length of the interval that contains the middle 50% of the data; is a single number, not a range nor an interval of numbers

A

interquartile range (Q3-Q1)

60
Q

a value below which lie the specified percentage of the measurements in the population or in the sample

A

percentile

61
Q

When points on a scatter plot seem to fluctuate around a straight line, there is a _______ relationship between x and y.

A

linear

62
Q

T or F: A positive covariance indicates a positive linear relationship between x and y.

A

True (as x increases, y increases)

63
Q

T or F: A negative covariance indicates a negative linear relationship between x and y

A

True (as x increases, y decreases)
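
(A minimal Python sketch of the sign of the covariance. The cards do not give the formula, so the usual sample covariance with an n − 1 divisor is assumed here. The data sets are hypothetical.)

```python
def sample_covariance(xs, ys):
    """Sum of (x - x_bar)(y - y_bar) divided by n - 1 (assumed convention)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]       # tends to rise as x rises
y_down = [10, 8, 6, 4, 2]    # falls as x rises

print(sample_covariance(x, y_up))     # positive covariance
print(sample_covariance(x, y_down))   # negative covariance
```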

64
Q

T or F: A box and whiskers plot is used to study the relationships between 2 quantitative variables.

A

False; scatter plot

65
Q

What is the correlation coefficient called?

66
Q

When r > 0, this indicates a (positive/negative) relationship.

A

positive

67
Q

When r < 0, this indicates a (positive/negative) relationship.

A

negative

68
Q

When r = 0, this indicates (positive/negative/no) linear relationship.

A

no

69
Q

What does the correlation coefficient tell us?

A

How strong the linear relationship is between 2 variables (r has no units, so the strength of the relationship does NOT depend on the magnitude of the data)

70
Q

a. sample correlation coefficient (r) is always between…
b. values near ___ show strong negative correlation.
c. values near ____ show no correlation
d. values near _____ show strong positive correlation

A

a. -1 and 1
b. -1
c. 0
d. 1
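
(A Python sketch of the sample correlation coefficient r, computed as the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The function name and data are hypothetical.)

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]

r_pos = correlation(x, [2, 4, 6, 8, 10])   # perfect positive line
r_neg = correlation(x, [10, 8, 6, 4, 2])   # perfect negative line
r_none = correlation(x, [2, 1, 0, 1, 2])   # V shape: no linear trend

print(r_pos, r_neg, r_none)
```

(The third data set follows a clear V-shaped pattern, yet r is essentially 0. r only measures how close the points are to a straight line.)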

71
Q

If there is a linear relationship between x and y, you might wish to predict y on the basis of x. This requires the equation of a line describing the linear relationship. The line used is the _____ _____ line. What is the formula for this?

A
  • least squares
  • ŷ = b0 + b1(x)

(b0 = y-intercept, b1 = slope, and ŷ = the predicted value of y)

(example of this in camera roll; will be on exam!)
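
(A Python sketch of fitting the least squares line, using the standard formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. These formulas are not written on the cards, so treat them as the usual textbook ones rather than a copy of the camera-roll example. The data are hypothetical.)

```python
def least_squares_line(xs, ys):
    """Return (b0, b1): the intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # y-intercept
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
predicted_at_6 = b0 + b1 * 6     # predict y for a new x value

print(b0, b1, predicted_at_6)
```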