Numeric summaries of data Flashcards
What do measures of location indicate?
They tell us where the data cluster and which values lie in the middle of the distribution.
What are the measures of location?
- Mean
- Median
- Mode
What is the mode?
The mode is the value that occurs most frequently in a dataset. It’s useful when you want to identify the most common or popular item, especially in categorical data.
How do you calculate the mode for numeric data?
For numeric data, you look for the value that appears most often. If two or more values tie for the highest frequency, the dataset has more than one mode (bimodal or multimodal data).
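Here is a minimal sketch using Python's standard-library `statistics.multimode` (Python 3.8+). It returns every value tied for the highest count, so a tie shows up as more than one mode. The sample numbers are made up for illustration.

```python
from statistics import multimode

# One value (3) clearly occurs most often: a single mode.
single = multimode([2, 3, 3, 5, 7])
print(single)    # [3]

# 1 and 2 each occur twice, tying for the highest frequency: bimodal.
tied = multimode([1, 1, 2, 2, 4])
print(tied)      # [1, 2]
```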
What is the median?
The median is the middle value of a dataset when all the numbers are arranged in order from smallest to largest. It represents the central point where half the data lies below and half lies above.
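It is a good choice when the data contain outliers, because it depends only on the middle of the ordered data, not on how extreme the largest or smallest values are.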
How do you find the median for an even number of data points?
When the dataset has an even number of values, the median is the average of the two middle values.
Example: For the dataset 18, 19, 20, 21, the median is (19 + 20)/2 = 19.5.
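The arithmetic in this example can be checked with Python's `statistics` module. `median_low` and `median_high` return the two middle values of an even-length dataset, and `median` returns their average.

```python
from statistics import median, median_low, median_high

data = [18, 19, 20, 21]

# With an even count there are two middle values...
low, high = median_low(data), median_high(data)   # 19 and 20
# ...and the median is their average.
mid = median(data)
print(low, high, mid)   # 19 20 19.5
```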
What is the mean?
The mean (or average) is calculated by adding up all the values in a dataset and dividing by the number of data points. It represents the overall “balance point” of the data.
It is sensitive to outliers, which can pull it away from the typical values in the data.
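A small illustration of that sensitivity, using Python's `statistics` module with hypothetical values. Adding a single extreme value moves the mean far more than the median.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]      # hypothetical values, in thousands
with_outlier = incomes + [400]      # the same data plus one extreme value

# Without the outlier, mean and median agree (both 35).
print(mean(incomes), median(incomes))

# With the outlier, the mean jumps to about 95.8 while the median only moves to 36.5.
print(round(mean(with_outlier), 1), median(with_outlier))
```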
Why do the mode, median, and mean sometimes give different values?
The mode, median, and mean often differ because they each describe the center of the data in different ways.
* The mode shows the most frequent value.
* The median is the middle value when the data is ordered.
* The mean is the average of all values.
When a dataset is skewed (meaning it has a long tail of unusually high or low values), the mean is affected the most, while the median and mode stay closer to where most of the data lie. Very large or very small numbers can pull the mean away from the other two.
How does skewness affect these measures of location?
Skewness means the data are spread unevenly around the center, with a longer tail on one side than the other.
* In positively skewed data (with high outliers), the mean is usually higher than the median and mode.
* In negatively skewed data (with low outliers), the mean is usually lower than the median and mode.
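A sketch comparing the three measures on a small hypothetical dataset with a tail of high values and on its mirror image (a tail of low values). These orderings are typical of skewed data but are not guaranteed for every dataset.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]   # long tail of high values
left_skewed = [16 - x for x in right_skewed]     # mirror image: long tail of low values

for name, data in [("positive", right_skewed), ("negative", left_skewed)]:
    # positive: mean 4.6, median 3.0, mode 2
    # negative: mean 11.4, median 13.0, mode 14
    print(name, mean(data), median(data), mode(data))
```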
How does the mode apply to categorical data?
For categorical data (like types of fruits or nationalities), the mode is the category that occurs most often. This is called the modal response.
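For categorical data, counting each category is enough. This sketch uses `collections.Counter` on hypothetical survey answers. Note that `most_common(1)` returns only one category even when several tie, so `statistics.multimode` is the safer choice if ties are possible.

```python
from collections import Counter

fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]  # hypothetical answers

counts = Counter(fruits)
modal_response, freq = counts.most_common(1)[0]
print(modal_response, freq)   # apple 3
```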
How do you find the mode for grouped data?
In grouped data, where values are placed into ranges (such as test scores from 50–59, 60–69, etc.), you find the mode by identifying the group (or class) with the highest frequency. This group is called the modal class.
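A minimal sketch of finding the modal class, assuming equal-width classes of 10 starting at multiples of 10 (50–59, 60–69, ...). The scores and the helper `class_label` are hypothetical, chosen for illustration.

```python
from collections import Counter

scores = [52, 55, 61, 63, 64, 67, 68, 71, 74, 79, 85]  # hypothetical test scores

def class_label(score, width=10):
    """Hypothetical helper: map a score to its class, e.g. 63 -> '60-69'."""
    low = (score // width) * width
    return f"{low}-{low + width - 1}"

# Count how many scores fall into each class, then take the most frequent one.
freq = Counter(class_label(s) for s in scores)
modal_class, count = freq.most_common(1)[0]
print(modal_class, count)   # 60-69 5
```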
What is bimodal data?
Bimodal data is when a dataset has two modes, meaning there are two distinct values or groups that appear with the highest frequency.
Example: If the number of times an ATM is used per day falls into two common ranges, 60–69 and 80–89, and both have the same frequency, the data is bimodal because there are two “peaks.”
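The ATM example can be sketched by collecting every class that ties for the highest frequency. The class frequencies below are hypothetical.

```python
# Hypothetical frequencies: number of days on which ATM use fell in each range.
class_freq = {"50-59": 3, "60-69": 8, "70-79": 4, "80-89": 8, "90-99": 2}

top = max(class_freq.values())
modal_classes = [c for c, f in class_freq.items() if f == top]
print(modal_classes)   # ['60-69', '80-89'] -> two peaks, so the data are bimodal
```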
What are some problems with the mode?
- The mode is not useful when a dataset has many unique values and no value occurs noticeably more often than the others.
- The mode is sensitive to how data is grouped in continuous datasets. Changing the grouping intervals can lead to a different mode, making it unreliable.
- In datasets with a large range of values, the mode only tells you the most frequent value but not the overall pattern or distribution of the data.
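A sketch of the grouping problem: the same hypothetical data, grouped into classes of width 10 but with the class boundaries shifted by 5, gives a different modal class. The function `modal_class` is a hypothetical helper written for this illustration.

```python
from collections import Counter

def modal_class(values, width, start=0):
    """Hypothetical helper: label of the most frequent class [low, low + width)."""
    freq = Counter((v - start) // width for v in values)
    k, _ = freq.most_common(1)[0]
    low = start + k * width
    return f"{low}-{low + width - 1}"

data = [48, 49, 51, 52, 53, 64, 65, 66, 67]   # hypothetical measurements

print(modal_class(data, 10))            # classes 40-49, 50-59, 60-69 -> '60-69'
print(modal_class(data, 10, start=5))   # classes 45-54, 55-64, 65-74 -> '45-54'
```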
What is the median?
The median is the middle value of a dataset when the data is arranged in order. It’s useful because it resists extremely high or low values (outliers): making an extreme value even larger or smaller does not change it.
How do you calculate the median?
Step 1: Rank the Data: Arrange the dataset in order from smallest to largest.
Example: For the dataset 7, 3, 5, 1, 2, you first order it: 1, 2, 3, 5, 7.
Step 2: Find the Middle:
* If the dataset has an odd number of values, the median is the exact middle value.
Example: For 1, 2, 3, 5, 7, the median is 3 because it’s the middle number.
* If the dataset has an even number of values, the median is the average of the two middle numbers.
Example: For the dataset 18, 19, 20, 21, the median is (19 + 20)/2 = 19.5.
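The two steps above can be sketched as a short Python function. `median_by_steps` is a hypothetical name used for illustration; in practice `statistics.median` does the same job.

```python
def median_by_steps(values):
    """Median via the two steps: rank the data, then find the middle."""
    if not values:
        raise ValueError("median of empty data")
    ranked = sorted(values)              # Step 1: rank the data
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                       # Step 2, odd count: the exact middle value
        return ranked[mid]
    # Step 2, even count: average of the two middle values
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median_by_steps([7, 3, 5, 1, 2]))     # 3
print(median_by_steps([18, 19, 20, 21]))    # 19.5
```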