Chpt 3 - Numerical Descriptive Measures Flashcards
How can we organize numerical data?
Graphical Methods
Numerical Methods
How does a histogram compare to a bar chart in what data they are representing?
They are similar but bar charts are for categorical data and histograms are for numerical data
How does a histogram compare to a bar chart in how close the bars are to each other?
Bars are touching in a histogram, but not a bar chart
How does a histogram compare to a bar chart in what each bar represents?
In a bar chart each bar represents a different category of the variable, but in a histogram each bar represents a group of values that the variable can take
How does a histogram compare to a bar chart in the height of each bar?
In a bar chart, the height of a bar is the frequency or relative frequency of the category that the bar represents.
In a histogram, the height of the bar is the frequency or relative frequency of the group of values that the bar represents
How should we group the values when making a histogram for discrete data with only a small number of distinct values?
Single value grouping
When should single value grouping be applied to a histogram?
When using discrete data with only a small number of distinct values
What is single value grouping for a histogram?
Each bar represents a distinct value (similar to bar charts)
The height of the bar is determined by the frequency or relative frequency of the corresponding values in the sample
These would be called a frequency histogram or a relative frequency histogram respectively.
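A minimal Python sketch of single value grouping, assuming a small made-up sample (the data and the names `sample`, `freq`, and `rel_freq` are illustrative, not from the course). The counts are the bar heights of a frequency histogram, and the proportions are the bar heights of a relative frequency histogram:

```python
from collections import Counter

# Hypothetical sample: number of siblings reported by 10 students.
sample = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

# Frequency (f): how many times each distinct value occurs.
freq = Counter(sample)

# Relative frequency (f/n): frequency divided by the sample size.
n = len(sample)
rel_freq = {value: count / n for value, count in freq.items()}

# One row per distinct value, i.e. one bar per distinct value.
for value in sorted(freq):
    print(value, freq[value], rel_freq[value])
```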
What type of histogram uses the height of the bar to represent relative frequency?
relative frequency histogram :)
How should we group the values when making a histogram for discrete data with many distinct values?
Limit grouping
What are the steps to making a histogram using limit grouping?
- Choose an appropriate range which includes all the distinct values
- Divide the range into sub-intervals of equal length
- Summarize the data using a frequency (f) or relative frequency (f/n) table. Here a frequency is the number of individuals falling into a sub-interval
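The steps above can be sketched in Python roughly as follows. This is only an illustration: the function name `limit_grouping`, its parameters, and the `scores` data are assumptions made up for the example.

```python
def limit_grouping(data, start, width, num_classes):
    """Count how many values fall in each class lower..upper (both included)."""
    classes = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1  # upper class limit for whole-number data
        count = sum(1 for x in data if lower <= x <= upper)
        classes.append((lower, upper, count))
    return classes

# Hypothetical whole-number data (e.g. test scores out of 50).
scores = [3, 12, 17, 21, 25, 25, 29, 33, 38, 41, 44, 47]
table = limit_grouping(scores, start=0, width=10, num_classes=5)

# Print the f and f/n table, one row per sub-interval.
n = len(scores)
for lower, upper, f in table:
    print(f"{lower}-{upper}: f = {f}, f/n = {f / n:.2f}")
```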
When should limit grouping be applied to a histogram?
When using discrete data with many distinct values
What is the number of sub-intervals that work best for limit grouping? Explain
Should be between 5 and 20
Otherwise the histogram won’t tell us much about the data. Imagine a histogram with only one bar, or one with 100 bars, each corresponding to a distinct value. Gross lol
Let’s say we want to analyze how many hours per week students are studying. A survey of 20 people gave answers ranging from 5 hours to 96 hours. How would you choose sub-intervals to make the limit grouping histogram?
Option A:
0-19
20-39
40-59
60-79
80-99
Option B:
0-9
10-19
etc. (would give 10 sub-intervals)
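As a rough check on the two options, the class limits for the range 0 to 99 can be generated in Python. The helper name `class_limits` is hypothetical and assumes whole-number data:

```python
def class_limits(start, stop, width):
    """Return (lower, upper) class limits covering start..stop inclusive."""
    return [(lower, min(lower + width - 1, stop))
            for lower in range(start, stop + 1, width)]

option_a = class_limits(0, 99, 20)  # sub-intervals of length 20
option_b = class_limits(0, 99, 10)  # sub-intervals of length 10

print(len(option_a), option_a)
print(len(option_b), option_b)
```

Both options fall inside the recommended 5 to 20 sub-intervals, and both cover the smallest (5) and largest (96) answers.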
What grouping is applied to continuous data when making a histogram?
Cutpoint grouping
When is cutpoint grouping used in a histogram?
When using continuous data
What is cutpoint grouping?
Used for continuous data, it defines sub-intervals such that any value (decimal or whole number) can be assigned to one, and only one, sub-interval. This is needed because a continuous variable can take any number in an interval
What are the steps to creating a histogram using cutpoint grouping?
- Choose the whole interval which includes all of the data values
- Divide this whole interval into sub-intervals of equal length (e.g. 0 to under 10, 10 to under 20, etc.)
- Count the number of individuals falling into each sub-interval and summarize in a frequency or relative frequency table
- Plot the histogram with 1 bar corresponding to a sub-interval and the height of the bar = frequency or relative frequency as desired
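A sketch of the counting step for cutpoint grouping, under the assumption that each sub-interval includes its lower cutpoint and excludes its upper cutpoint ("10 to under 20"). The function name `cutpoint_grouping` and the `times` data are made up for illustration:

```python
def cutpoint_grouping(data, start, width, num_classes):
    """Count values in each class [lower, upper): lower included, upper excluded."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - start) // width)  # which sub-interval x falls into
        if 0 <= index < num_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(num_classes)]

# Hypothetical continuous data (e.g. commute times in minutes).
times = [2.5, 9.9, 10.0, 14.2, 19.99, 20.0, 27.3, 31.8, 45.1, 59.0]
table = cutpoint_grouping(times, start=0, width=10, num_classes=6)

for lower, upper, f in table:
    print(f"{lower} to under {upper}: f = {f}, f/n = {f / len(times):.1f}")
```

Note that 10.0 is counted in "10 to under 20" and not in "0 to under 10", so every value lands in exactly one sub-interval.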
What is the purpose of organizing data?
To analyze the distribution of the data
What is a distribution and what are its 2 important features?
The distribution of a variable is a table, graph, or formula that provides:
- All the possible values that this variable can take
- How often these values occur
Why is it important to determine the shape of the distribution of a variable?
Give an example
Plays a role in determining the appropriate inferential methods to analyze its data
If the distribution of a variable is bell shaped, a lot of inferential methods can be applied to analyze its data
What are the 3 important aspects when describing the shape of a distribution?
Symmetry
Skewness
Modality
What is symmetry in regards to distribution shape?
The left side of the distribution mirrors the right side, such as a bell-shape
What is skewness in regards to distribution shape?
Describes an asymmetric shape, where the distribution has a longer tail on one side
If a distribution has a longer left tail, what is this called?
Left skewed, or negatively skewed
What is it called when the distribution has a longer right tail?
Right skewed or positively skewed
What is left skewed distribution?
When the left has a longer tail (so the peak is to the right)
What is right skewed distribution?
When the right has a longer tail (so the peak is to the left)
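One simple way to check the direction of skew numerically is the sign of the third central moment. This rule is not from the flashcards; it is an assumed, commonly used heuristic, and the function name `skew_direction` and the sample data are hypothetical:

```python
from statistics import mean

def skew_direction(data):
    """Classify skew by the sign of the third central moment (a simple heuristic)."""
    m = mean(data)
    third_moment = sum((x - m) ** 3 for x in data) / len(data)
    if third_moment > 0:
        return "right skewed"   # longer right tail
    if third_moment < 0:
        return "left skewed"    # longer left tail
    return "no skew detected"

long_right_tail = [1, 2, 2, 3, 3, 3, 10]
long_left_tail = [1, 8, 8, 8, 9, 9, 10]
mirrored = [1, 2, 3, 4, 5]

print(skew_direction(long_right_tail))
print(skew_direction(long_left_tail))
print(skew_direction(mirrored))
```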
What is modality in regards to distribution shape?
It's the number of peaks in a distribution. A distribution may have one (unimodal), two (bimodal), or many (multimodal)
What is a unimodal distribution?
There is only one peak in the distribution
What is it called when there are many peaks in the distribution?
multimodal
What is bimodal distribution?
When there are 2 peaks in the distribution
What are 2 well-known distribution shapes?
Bell-shaped
Uniform
What are the features of a bell-shaped distribution?
Unimodal
Symmetric
What is another name for a bell-shaped distribution?
Normal distribution
What are the features of a uniform model of distribution?
- If all the possible values that a variable can take are equally likely, the distribution of this variable is a uniform distribution
- Uniform distributions have no mode and are symmetric
Give examples of graphical methods for organizing numerical data (4)
-histogram graph
-stem-and-leaf diagram
-dot-plot
-boxplot
Give examples of numerical methods for organizing numerical data (2)
-calculating center of data (mode, mean, median)
-calculating spread (range, IQR, standard deviation)
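A short sketch of these numerical measures using Python's standard `statistics` module, on a made-up sample. Quartile conventions differ between textbooks and software, so the IQR below (using the module's default "exclusive" method) may not match a hand calculation done with the course's rule:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample

# Measures of center.
center = {
    "mode": statistics.mode(data),
    "mean": statistics.mean(data),
    "median": statistics.median(data),
}

# Measures of spread.
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
spread = {
    "range": max(data) - min(data),
    "IQR": q3 - q1,
    "standard deviation": statistics.stdev(data),  # sample standard deviation
}

print(center)
print(spread)
```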
What is the leaf?
The rightmost digit of the data value
2005 - leaf is 5
34 - leaf is 4
What is a stem?
All digits of the data value except the rightmost digit
2005 - stem is 200
34 - stem is 3
What are the stem and leaf values of 15?
Leaf - 5
Stem - 1
What are the stem and leaf values of 183?
Leaf - 3
Stem - 18
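For non-negative whole numbers, splitting a value into stem and leaf is the same as dividing by 10 and keeping the quotient and the remainder. A minimal sketch (the function name `stem_and_leaf` is just for illustration):

```python
def stem_and_leaf(value):
    """Split a non-negative whole number into (stem, leaf)."""
    stem, leaf = divmod(value, 10)  # quotient is the stem, remainder is the leaf
    return stem, leaf

# The four values used in the flashcards above.
for value in (2005, 34, 15, 183):
    stem, leaf = stem_and_leaf(value)
    print(f"{value}: stem = {stem}, leaf = {leaf}")
```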
What are the steps to creating a stem-and-leaf diagram?
- Identify stem and leaf of each data value
- Draw a vertical line, write the stems from the smallest to largest in the vertical column to the left of the vertical line
- Write each leaf to the right of the vertical line in the same row as its corresponding stem
- Arrange the leaves in each row from the smallest to the largest
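The steps above can be sketched in Python as a small text diagram. The function name `stem_and_leaf_diagram` and the `values` data are hypothetical, and the sketch assumes non-negative whole numbers:

```python
from collections import defaultdict

def stem_and_leaf_diagram(data):
    """Return the rows of a text stem-and-leaf diagram."""
    rows = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)  # step 1: identify stem and leaf
        rows[stem].append(leaf)
    lines = []
    for stem in sorted(rows):  # stems from smallest to largest
        leaves = " ".join(str(leaf) for leaf in sorted(rows[stem]))  # leaves in order
        lines.append(f"{stem} | {leaves}")
    return lines

values = [34, 15, 27, 31, 18, 15, 22, 40]
diagram = stem_and_leaf_diagram(values)
print("\n".join(diagram))
```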
How is a dot plot read?
Each point corresponds to a data value. Points of the same value are stacked
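To see how the stacking works, here is a rough text-only dot plot. The function name `dot_plot` and the sample are illustrative, and the column alignment assumes single-digit whole numbers:

```python
from collections import Counter

def dot_plot(data):
    """Return the rows of a text dot plot: one '*' per data value, stacked by value."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)  # the horizontal axis
    height = max(counts.values())             # tallest stack of points
    rows = []
    for level in range(height, 0, -1):        # draw from the top row down
        row = " ".join("*" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in values))  # axis labels
    return rows

plot = dot_plot([1, 2, 2, 3, 3, 3, 5])
print("\n".join(plot))
```

In the printed plot, the value 3 has three stacked points, the value 2 has two, and the value 4 has none.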
What are descriptive measures?
Using numerical methods to summarize numerical data, which includes finding the center of a data set and describing its spread
What is the center of a data set?
The most typical value of the data set
What is the most typical value of a data set called?
Center
What are the 3 options for the center of a data set?
Mode, mean, median
20 students are asked who they are going to vote for in the next election, these are the results
UCP - 8
Liberal - 5
NDP - 3
Green - 4
What is the mode?
UCP
What is the mode of a data set?
The value that occurs most frequently
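For the election survey above, the mode can be found by picking the category with the highest count. A small sketch using the counts from the flashcard (the variable names are illustrative):

```python
from collections import Counter

# Survey results from the flashcard: party -> number of students.
votes = Counter({"UCP": 8, "Liberal": 5, "NDP": 3, "Green": 4})

# most_common(1) returns the single most frequent category with its count.
mode, frequency = votes.most_common(1)[0]

print(mode, frequency)
print(sum(votes.values()))  # total number of students surveyed
```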
20 students are asked who they are going to vote for in the next election, these are the results
UCP - 8
Liberal - 5
NDP - 3
Green - 4
What type of data is this?
Categorical data