Chpt 3 - Numerical Descriptive Measures Flashcards

1
Q

How can we organize numerical data?

A

Graphical Methods

Numerical Methods

2
Q

How does a histogram compare to a bar chart in what data they are representing?

A

They are similar but bar charts are for categorical data and histograms are for numerical data

3
Q

How does a histogram compare to a bar chart in how close the bars are to each other?

A

Bars are touching in a histogram, but not a bar chart

4
Q

How does a histogram compare to a bar chart in what each bar represents?

A

In a bar chart each bar represents a different category of the variable, but in a histogram each bar represents a group of values that the variable can take

5
Q

How does a histogram compare to a bar chart in the height of each bar?

A

In a bar chart, the height of a bar is determined by frequency or relative frequency.

In a histogram, the height of the bar is the frequency or relative frequency of the group of values that the bar represents

6
Q

How should we group the values when making a histogram for discrete data with only a small number of distinct values?

A

Single value grouping

7
Q

When should single value grouping be applied to a histogram?

A

When using discrete data with only a small number of distinct values

8
Q

What is single value grouping for a histogram?

A

Each bar represents a distinct value (similar to bar charts)

The height of the bar is determined by the frequency or relative frequency of the corresponding values in the sample

These would be called a frequency histogram or a relative frequency histogram respectively.
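
A minimal Python sketch of single value grouping, assuming the email-address counts from the mode card later in this deck (3 students with 1 address, 4 with 2, 7 with 3, 2 with 4). The function name `single_value_table` is made up for illustration. Each distinct value gets its own bar, with height f or f/n:

```python
from collections import Counter

# Number of email addresses reported by 16 students (counts taken from this deck).
emails = [1] * 3 + [2] * 4 + [3] * 7 + [4] * 2

def single_value_table(values):
    """Return {value: (frequency, relative frequency)} for each distinct value."""
    counts = Counter(values)
    n = len(values)
    return {v: (f, f / n) for v, f in sorted(counts.items())}

table = single_value_table(emails)
for value, (f, rel) in table.items():
    print(f"{value}: f = {f}, f/n = {rel:.4f}")
```

Plotting the f column gives a frequency histogram; plotting the f/n column gives a relative frequency histogram.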

9
Q

What type of histogram uses the height of the bar to represent relative frequency?

A

relative frequency histogram :)

10
Q

How should we group the values when making a histogram for discrete data with many distinct values?

A

Limit grouping

11
Q

What are the steps to making a histogram using limit grouping?

A
  1. Choose an appropriate range which includes all the distinct values
  2. Divide the range into sub-intervals of equal length
  3. Summarize the data using a frequency (f) or relative frequency (f/n) table. Here a frequency is the number of individuals falling into a sub-interval
12
Q

When should limit grouping be applied to a histogram?

A

When using discrete data with many distinct values

13
Q

What is the number of sub-intervals that work best for limit grouping? Explain

A

Should be between 5-20

Otherwise the histogram won’t tell us much about the data. Imagine a histogram with only one bar, or one with a separate bar for each of 100 distinct values. Gross lol

14
Q

Let’s say we want to analyze how many hours per week students are studying. A survey of 20 people gave answers ranging from 5 hours to 96 hours. How would you choose sub-intervals to make the limit grouping histogram?

A

Option A:
0-19
20-39
40-59
60-79
80-99

Option B:
0-9
10-19
etc. (would give 10 sub-intervals)
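
A rough Python sketch of building equal-length limit classes like the two options above and counting a sample into them. The helper names and the `hours` list are hypothetical; the card gives only the range of the survey answers, so the 20 values below are made up for illustration:

```python
def limit_classes(low, high, width):
    """Build (lower limit, upper limit) pairs of equal width covering low..high."""
    return [(start, start + width - 1) for start in range(low, high + 1, width)]

def limit_frequencies(values, classes):
    """Count how many whole-number values fall into each class."""
    return {c: sum(c[0] <= v <= c[1] for v in values) for c in classes}

option_a = limit_classes(0, 99, 20)   # 0-19, 20-39, 40-59, 60-79, 80-99
option_b = limit_classes(0, 99, 10)   # 0-9, 10-19, ..., 90-99

# Hypothetical survey answers (hours per week), NOT the real survey data.
hours = [5, 12, 18, 20, 25, 25, 30, 33, 38, 40,
         41, 47, 52, 55, 60, 64, 71, 80, 88, 96]
freq_a = limit_frequencies(hours, option_a)
```

Both options stay within the suggested 5 to 20 sub-intervals; which one reads better depends on how much detail the data can support.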

15
Q

What grouping is applied to continuous data when making a histogram?

A

Cutpoint grouping

16
Q

When is cutpoint grouping used in a histogram?

A

When using continuous data

17
Q

What is cutpoint grouping?

A

Used for continuous data. It defines sub-intervals such that any value (decimal or whole number) in the overall interval is assigned to one, and only one, sub-interval. This is needed because a continuous variable can take any number in an interval

18
Q

What are the steps to creating a histogram using cutpoint grouping?

A
  1. Choose the whole interval which includes all of the data values
  2. Divide this whole interval into sub-intervals of equal length (e.g. 6 sub-intervals: 0 to under 10, 10 to under 20, etc.)
  3. Count the number of individuals falling into each sub-interval and summarize in a frequency or relative frequency table
  4. Plot the histogram with 1 bar corresponding to a sub-interval and the height of the bar = frequency or relative frequency as desired
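
The counting step above can be sketched in Python as follows. This assumes each sub-interval includes its lower cutpoint and excludes its upper one ("0 to under 10"); the function name and the sample measurements are hypothetical, made up for illustration:

```python
def cutpoint_frequencies(values, start, width, k):
    """Count values in k classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for v in values:
        i = int((v - start) // width)   # index of the class containing v
        if not 0 <= i < k:
            raise ValueError(f"{v} lies outside the chosen interval")
        counts[i] += 1
    return counts

# Hypothetical continuous measurements, NOT from the deck.
data = [3.2, 9.99, 10.0, 14.5, 27.1, 30.0, 45.8, 59.9]
freq = cutpoint_frequencies(data, start=0, width=10, k=6)
rel_freq = [f / len(data) for f in freq]
```

Note that 9.99 lands in "0 to under 10" while 10.0 lands in "10 to under 20", so every value belongs to exactly one sub-interval.
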
19
Q

What is the purpose of organizing data?

A

To analyze the distribution of the data

20
Q

What is a distribution and what are its 2 important features?

A

Distribution of a variable is a table, graph, or formula that provides

  1. All the possible values that this variable can take
  2. How often these values occur
21
Q

Why is it important to determine the shape of the distribution of a variable?

Give an example

A

Plays a role in determining the appropriate inferential methods to analyze its data

If the distribution of a variable is bell shaped, a lot of inferential methods can be applied to analyze its data

22
Q

What are the 3 important aspects when describing the shape of a distribution?

A

Symmetry

Skewness

Modality

23
Q

What is symmetry in regards to distribution shape?

A

The left side of the distribution mirrors the right side, such as a bell-shape

24
Q

What is skewness in regards to distribution shape?

A

Describes an asymmetric shape, in which the distribution has a longer tail on one side

25
Q

If a distribution has a longer left tail, what is this called?

A

Left skewed, or negatively skewed

26
Q

What is it called when the distribution has a longer right tail?

A

Right skewed or positively skewed

27
Q

What is left skewed distribution?

A

When the left has a longer tail (so the peak is to the right)

28
Q

What is right skewed distribution?

A

When the right has a longer tail (so the peak is to the left)

29
Q

What is modality in regards to distribution shape?

A

It’s the number of peaks in a distribution. A distribution may have one peak (unimodal), two (bimodal), or many (multimodal)

30
Q

What is a unimodal distribution?

A

There is only one peak in the distribution

31
Q

What is it called when there are many peaks in the distribution?

A

multimodal

32
Q

What is bimodal distribution?

A

When there are 2 peaks in the distribution

33
Q

What are 2 well-known distribution shapes?

A

Bell-shaped

Uniform

34
Q

What are the features of a bell-shaped distribution?

A

Unimodal
Symmetric

35
Q

What is another name for a bell-shaped distribution?

A

Normal distribution

36
Q

What are the features of a uniform model of distribution?

A
  1. If all the possible values that a variable can take have equal chance to happen, the distribution of this variable is a uniform distribution
  2. Uniform distributions have no mode and are symmetric
37
Q

Give examples of graphical methods for organizing numerical data (4)

A

-histogram graph
-stem-and-leaf diagram
-dot-plot
-boxplot

38
Q

Give examples of numerical methods for organizing numerical data (2)

A

-calculating center of data (mode, mean, median)
-calculating spread (range, IQR, standard deviation)

39
Q

What is the leaf?

A

The rightmost digit of the data value

2005 - leaf is 5

34 - leaf is 4

40
Q

What is a stem?

A

All data values except the rightmost digit

2005 - stem is 200

34 - stem is 3

41
Q

What are the stem and leaf values of 15?

A

Leaf - 5
Stem - 1

42
Q

What are the stem and leaf values of 183?

A

Leaf - 3
Stem - 18

43
Q

What are the steps to creating a stem-and-leaf diagram?

A
  1. Identify stem and leaf of each data value
  2. Draw a vertical line, write the stems from the smallest to largest in the vertical column to the left of the vertical line
  3. Write each leaf to the right of the vertical line in the same row as its corresponding stem
  4. Arrange the leaves in each row from the smallest to the largest
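
The steps above can be sketched in Python roughly as follows, assuming non-negative whole-number data. The function name and the sample data set are hypothetical, made up for illustration. This simple version lists only stems that actually occur:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its sorted leaves, for non-negative whole numbers."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # leaf = rightmost digit, stem = the rest
        rows[stem].append(leaf)
    # Stems from smallest to largest, leaves in each row from smallest to largest.
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}

# Hypothetical data set, NOT from the deck.
diagram = stem_and_leaf([34, 15, 21, 38, 15, 27, 42, 30])
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The `divmod(v, 10)` split matches the earlier cards: 183 has stem 18 and leaf 3, and 2005 has stem 200 and leaf 5.
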
44
Q

How is a dot plot read?

A

Each point corresponds to a data value. Points of the same value are stacked

45
Q

What are descriptive measures?

A

Numerical methods for summarizing numerical data, which include finding the center of a numerical data set and describing its spread

46
Q

What is the center of a data set?

A

The most typical value of the data set

47
Q

What is the most typical value of a data set called?

A

Center

48
Q

What are the 3 options for the center of a data set?

A

Mode, mean, median

49
Q

20 students are asked who they are going to vote for in the next election. These are the results:

UCP - 8
Liberal - 5
NDP - 3
Green - 4

What is the mode?

A

UCP

50
Q

What is the mode of a data set?

A

The value that occurs most frequently

51
Q

20 students are asked who they are going to vote for in the next election. These are the results:

UCP - 8
Liberal - 5
NDP - 3
Green - 4

What type of data is this?

A

Categorical data

52
Q

What value occurs most frequently in a data set?

A

The mode

53
Q

16 students were asked how many email addresses they had and below are the results

1 email - 3
2 emails - 4
3 emails - 7
4 emails - 2

What is the mode?

A

3 emails

54
Q

What is the mode in this data set?

{2, 4, 1, 6, 5, 7}

A

There is no mode in this example as no value occurs more than once

55
Q

What is the mode in this data set?

{2, 4, 1, 2, 4, 6, 5}

A

Two modes: 2, 4
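
A small Python sketch of finding the mode, following this deck's convention that a data set has no mode when no value occurs more than once. The function name `modes` is made up for illustration, and it assumes a non-empty data set:

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # deck convention: nothing repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 1, 6, 5, 7]))      # no value repeats
print(modes([2, 4, 1, 2, 4, 6, 5]))   # 2 and 4 both occur twice
```

Because it only counts occurrences, the same function works for categorical data such as the voting example.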

56
Q

What does this symbol mean?

x̄

A

Pronounced X Bar

Denotes the mean of a data set

57
Q

What does this symbol mean?

∑

A

Summation (or add up the included values)

58
Q

What is the mean for the following data set?

{5, 7, 10, 13, 15}

A

x̄ = ∑x / n

∑x = 5+7+10+13+15 = 50

n = 5

x̄ = 50/5 = 10

59
Q

How do we denote a sample mean?

A

x̄

Pronounced X Bar

60
Q

How do we denote a population mean?

A

μ

Pronounced mu

61
Q

How do you find the mean of the population?

A

μ = ∑x / N

So you add up all of the individual values of the whole population, and then divide that by the number of individuals in the entire population

62
Q

Is the sample mean the same as the population mean?

A

No, the sample mean is only an estimate of the population mean

63
Q

Because the sample mean is only an estimate of the population mean, what do we introduce?

A

Error, or sampling error

64
Q

How can we measure a sampling error?

A

By using statistical inferential methods (if we learn this later, I don’t know it yet lol)

65
Q

What are the steps to finding the median?

A

Sort the data values from the smallest to largest

If the number of data values is odd, the median is the middle value of the sorted data

If the number of the data values is even, the median is the average of the two values in the middle of the sorted data

66
Q

What is the median?

A

A numerical value separating the higher half of values in a data set from the lower half

67
Q

What is the numerical value separating the higher half of values in a data set from the lower half?

A

Median

68
Q

How do you determine the median if the number of data values in a set is odd?

A

It is the middle value of the sorted data

69
Q

How do you determine the median if the number of data values in a set is even?

A

It is the average of the two values in the middle of the sorted data

70
Q

Find the median in the data set

{4, 7, 9, 12, 101}

A

9

It’s just the middle number in the ordered data set

71
Q

Find the median of the following data set

{1, 5, 2, 7, 9}

A

reorder to

1, 2, 5, 7, 9

Median is 5

72
Q

Find the median of the following data set

{3, 6, 2, 8, 4, 7}

A

reorder to

2, 3, 4, 6, 7, 8

Median is the average of 4 and 6

(4+6)/2 = 5
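
The median steps can be sketched in Python like this, checked against the three data sets on the cards above. The function name is chosen for illustration, and it assumes a non-empty data set:

```python
def median(values):
    """Median via the sort-then-pick-the-middle steps."""
    ordered = sorted(values)          # step 1: sort from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([4, 7, 9, 12, 101]))
print(median([1, 5, 2, 7, 9]))
print(median([3, 6, 2, 8, 4, 7]))
```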

73
Q

What can be used to describe the center of a data set?

A

mode, mean, median

74
Q

How do we describe the center of categorical data?

A

Mode

75
Q

What are the most common ways to find the center of a data set?

A

The mean and median are used more commonly than the mode

76
Q

If a data set does not have outliers and its distribution is symmetric, what method should be used for describing the center of the data?

A

Mean

77
Q

If a data set has outliers, what method should be used for describing the center of the data?

A

Median

78
Q

How do we determine which method should be used for describing the center of data?

A

Mode - used for categorical data

Mean - used for numerical sets that have a symmetric distribution and no outliers

Median - used for numerical sets that have outliers
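
A quick Python illustration of why the median is preferred when there are outliers. It reuses the data set {4, 7, 9, 12, 101} from the median card; the version without 101 is an assumption added here purely for comparison:

```python
from statistics import mean, median

with_outlier = [4, 7, 9, 12, 101]   # from the median card; 101 is far from the rest
without_outlier = [4, 7, 9, 12]     # same data with the outlier dropped (for comparison only)

print(mean(without_outlier), median(without_outlier))
print(mean(with_outlier), median(with_outlier))
```

Adding the single value 101 pulls the mean from 8 up to 26.6, while the median only moves from 8 to 9.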

79
Q

What is an outlier in a data set?

A

Observations very far away from most data values

80
Q

What can be used to describe the spread of a numerical data set?

A

Range
Interquartile range (IQR)
Standard deviation

81
Q

How do we calculate range?

A

Range = maximum-minimum

82
Q

Determine the range of the following data set:

{2, 8, 12, 38, 58}

A

Range = max - min

Range = 58 - 2 = 56

83
Q

Determine the range of the following data set:

{38, 12, 39, 24, 24, 5}

A

Range = max - min

Range = 39 - 5 = 34

84
Q

What equation is used to determine the IQR?

A

IQR = Q3 - Q1

85
Q

What does IQR stand for?

A

Interquartile Range

86
Q

How much of the data set is included in the IQR?

A

The middle 50% of the data values

87
Q

What does a small/large IQR tell us about the data?

A

Small IQR - small spread of the middle data values

Large IQR - Large spread of the middle data values

88
Q

What is Q2 equivalent to?

A

The median

89
Q

What are the steps to determining the IQR?

A
  1. Arrange data values in increasing order and determine the median (Q2)
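  2. Split the sorted data into a lower half and an upper half (the worked examples in this deck include the median in both halves when the number of values is odd)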
  3. Find Q1, which is the median of the lower half, and Q3 which is the median of the upper half
  4. IQR = Q3-Q1
90
Q

Determine the IQR of the following data set:

{13, 15, 21, 25, 26, 27, 30,
32, 34, 35, 38, 41, 43, 236}

A

Q2 (median) = 31
Q1 = 25
Q3 = 38

IQR = Q3-Q1
IQR = 38-25 =13

91
Q

Determine the IQR of the following data set:

{13, 15, 16, 20, 21,
25, 26, 27, 30, 31,
32, 32, 34, 35,
38, 38, 41, 43, 46}

A

Q2 (median) = 31
Q1 = 23
Q3 = 36.5

IQR = Q3 - Q1
IQR = 36.5-23 = 13.5
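
A Python sketch of the IQR steps, written to reproduce the two worked answers above. It assumes that when the number of values is odd, the median is included in both halves; that assumption is inferred from the second example, and other textbooks and software packages use different quartile conventions that can give slightly different results. The function names are made up for illustration:

```python
def median_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd count, the median belongs to both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2               # size of each half
    lower, upper = ordered[:half], ordered[n - half:]
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)

first = [13, 15, 21, 25, 26, 27, 30, 32, 34, 35, 38, 41, 43, 236]
second = [13, 15, 16, 20, 21, 25, 26, 27, 30, 31,
          32, 32, 34, 35, 38, 38, 41, 43, 46]

for data in (first, second):
    q1, q2, q3 = quartiles(data)
    print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```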

92
Q

What is the best way to describe the spread of a data set when there are outliers?

A

IQR

93
Q

While the range of a data set is easy to find, what is it very sensitive to?

A

Extreme values or outliers

94
Q

What does xi mean?

A

Data values in a set

95
Q

What is a standard deviation?

A

The “average” distance between data values and the sample mean

96
Q

What value determines the “average” distance between data values and the sample mean?

A

Standard deviation

97
Q

What is the notation for sample standard deviation?

A

s

98
Q

What does s stand for?

A

Standard deviation in a sample

99
Q

Which standard deviation equation needs to be used if you only have the sums but not the individual values?

A

The computing formula

100
Q

Which standard deviation equation should be used if you have all the individual values?

A

Either the defining formula or computing formula

101
Q

What is the difference in outcome (or answer) between the defining and computing formulas of standard deviation?

A

Nothing, the answers are the same, they just get you there a different way

102
Q

What is the defining formula for standard deviation?

A

The square root of the sum of squared deviations from the sample mean, divided by n - 1:

s = √[ ∑(xi - x̄)² / (n - 1) ]
103
Q

What is the computing formula for standard deviation?

A

The square root of

[ (∑xi squared) - (∑xi) squared / n ] / (n - 1)

that is, s = √[ ( ∑(xi²) - (∑xi)²/n ) / (n - 1) ]
104
Q

What is the difference between (∑xi) squared and (∑xi squared)

A

(∑xi) squared = the values are added and then squared

(∑xi squared) = the values are squared and then added
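
A short Python check that the defining and computing formulas give the same sample standard deviation, using the data set {5, 7, 10, 13, 15} from the mean card. The function names are made up for illustration:

```python
from math import sqrt, isclose

def sd_defining(xs):
    """s from the defining formula: deviations from the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def sd_computing(xs):
    """s from the computing formula: only the two sums and n are needed."""
    n = len(xs)
    sum_x = sum(xs)                      # add first; squared below gives (∑xi) squared
    sum_x_sq = sum(x ** 2 for x in xs)   # (∑xi squared): square each value, then add
    return sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))

data = [5, 7, 10, 13, 15]
print(sum(data) ** 2)              # (∑xi) squared = 50 squared
print(sum(x ** 2 for x in data))   # (∑xi squared) = 25 + 49 + 100 + 169 + 225
print(sd_defining(data), sd_computing(data))
```

For this data both formulas reduce to the square root of 68/4 = 17, so s is about 4.12.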

105
Q

What does the value of s tell us about the spread of a set of data values?

A

It tells us the “average” distance between data values and the sample mean. If the s value is large, the spread is large; if the s value is small, the spread is small

106
Q

Generally speaking, if a data set has no outliers and is not skewed, what methods should be used to describe its center and spread?

A

Mean and standard deviation, respectively

107
Q

What is standard deviation sensitive to?

A

Outliers

108
Q

If a data set has outliers and is skewed, what methods should be used to describe its center and spread?

A

Median and IQR, respectively

109
Q

What is μ?

A

The population mean

Pronounced mu

110
Q

What is σ?

A

The population standard deviation

Pronounced sigma

111
Q

What is the population standard deviation denoted by?

A

σ

Pronounced sigma

112
Q

Why are the population mean (μ) and population standard deviation (σ) usually unknown?

A

Because ALL of the population values are needed, but this is often impossible to obtain

113
Q

What is a parameter?

A

A descriptive measure for a population, such as the population mean (μ) or the population standard deviation (σ)

114
Q

Is a parameter fixed or variable?

A

It is fixed, for example, a population has only one mean (μ)

115
Q

What are statistics?

A

Descriptive measures for a sample such as sample mean (x̄) and sample standard deviation (s)

116
Q

Are statistics fixed or variable?

A

They are variable; each sample is going to have slightly different values and therefore slightly different sample means (x̄) and sample standard deviations (s)

117
Q

What are the properties of parameters?

A

fixed

usually unknown

118
Q

What are the properties of statistics?

A

easily calculated from a given sample

varies from sample to sample

119
Q

What is the five-number summary of a data set?

A

Minimum
Q1
Q2 (median)
Q3
Maximum

120
Q

What is a boxplot used for?

A

Provide a graphical display of the center and variation of a numerical data set

121
Q

What is a boxplot based on?

A

The five-number summary

122
Q

What are the steps to creating a box plot?

A
  1. Draw short horizontal lines at Q1, Q2, Q3. Then connect them with vertical lines to form a box
  2. Find potential outliers which are data values < lower limit or > upper limit and denote these outliers by dots in the boxplot
  3. Find the max and min of the data values that are NOT outliers and draw short horizontal lines at these values; draw a “whisker” from the box to these lines
123
Q

How do you find the upper and lower limits of a box plot?

A

Lower limit = Q1 - 1.5 × IQR

Upper limit = Q3 + 1.5 × IQR
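
A minimal Python sketch of the limit calculation, taking the lower limit as Q1 - 1.5 × IQR and the upper limit as Q3 + 1.5 × IQR. It uses the first IQR example from this deck (Q1 = 25, Q3 = 38); the function and variable names are made up for illustration:

```python
def boxplot_limits(q1, q3):
    """Return (lower limit, upper limit) used to flag potential outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [13, 15, 21, 25, 26, 27, 30, 32, 34, 35, 38, 41, 43, 236]
q1, q3 = 25, 38   # quartiles from the IQR card for this data set

lower, upper = boxplot_limits(q1, q3)
outliers = [x for x in data if x < lower or x > upper]

# Whiskers end at the smallest and largest values that are NOT outliers.
inside = [x for x in data if lower <= x <= upper]
whisker_low, whisker_high = min(inside), max(inside)

print(lower, upper, outliers, whisker_low, whisker_high)
```

Here the limits are 5.5 and 57.5, so 236 is drawn as a dot and the whiskers stop at 13 and 43.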

124
Q

What can we tell about the data set distribution when a boxplot has an upper whisker that is longer than the lower whisker and there is a large distance between the Q2-Q3 with a small distance between Q1-Q2?

A

It is right skewed

125
Q

How can you tell that a data set has a right skewed distribution when looking at a boxplot?

A
  • upper whisker is longer than lower whisker
  • large distance between Q2-Q3; small distance between Q1-Q2
126
Q

How can you tell that a data set has a left skewed distribution when looking at a boxplot?

A
  • lower whisker is longer than upper whisker
  • large distance between Q1-Q2; small distance between Q2-Q3
127
Q

How can you tell that a data set has a bell shaped distribution when looking at a boxplot?

A
  • upper and lower whiskers have equal lengths
  • the box in the middle is divided into 2 equal parts
128
Q

What can we tell about the data set distribution when a boxplot has a lower whisker that is longer than the upper whisker and there is a large distance between the Q1-Q2 with a small distance between Q2-Q3?

A

Left skewed distribution

129
Q

What can we tell about the data set distribution when a boxplot has upper and lower whiskers of equal length and the box in the middle is divided into 2 equal parts?

A

It has a bell shaped distribution