Chapter 3: Summarising Data Flashcards
What is a measure of central tendency?
A single value that represents the ‘centre’ of a set of data; the mode, median, and mean are all measures of central tendency.
Define mode in data.
The value that appears most often; the most common value.
What is a modal class?
The class with the highest frequency.
What is the median?
The middle value of a dataset.
How do you find the median of discrete data?
1. Put the numbers in order from smallest to largest.
2. Find the (n + 1)/2 th value; the value at this position is the median.
What is the formula to find the median position?
(n + 1) / 2.
What should you do if the median position is a decimal?
Find the two surrounding values and average them.
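The position rule above can be sketched in Python. This is a minimal illustration, not a prescribed method; the function name `median_discrete` is a hypothetical helper introduced here.

```python
def median_discrete(values):
    """Median of discrete data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2              # 1-based position; ends in .5 when n is even
    lower = ordered[int(position) - 1]  # value at the whole-number part of the position
    if position.is_integer():
        return lower
    upper = ordered[int(position)]      # next value along
    return (lower + upper) / 2          # average the two surrounding values


print(median_discrete([7, 3, 9, 1, 5]))  # 5
print(median_discrete([7, 3, 9, 1]))     # 5.0
```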
How do you find the median in grouped data?
Identify the median class which contains the median position.
What is the estimated median using linear interpolation?
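Use ½n as the median position, then interpolate within the median class: lower class boundary + ((½n - cumulative frequency before the class) / class frequency) × class width.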
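As a rough sketch of linear interpolation for grouped data, the Python below assumes each class is given as (lower boundary, upper boundary, frequency) and that classes are listed in order. The function name `grouped_median` and the example table are illustrative assumptions.

```python
def grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower_boundary, upper_boundary, frequency), in order.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no data")
    target = n / 2                      # median position for grouped data
    cumulative = 0                      # cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            # This is the median class: move the right fraction of the way through it.
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f


# Hypothetical table: 0-10 (f=4), 10-20 (f=10), 20-30 (f=6)
print(grouped_median([(0, 10, 4), (10, 20, 10), (20, 30, 6)]))  # 16.0
```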
What is the mean (arithmetic mean)?
The sum of all values divided by the number of values.
Provide the formula for mean.
𝑥̅ = ∑𝑥 / n.
How do you calculate the mean from a frequency table?
Add an extra column for f × x, sum it, and divide by total frequency.
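The extra f × x column can be mimicked in a few lines of Python. This sketch assumes the table is a list of (value, frequency) pairs; the name `mean_from_frequency_table` and the example data are made up for illustration.

```python
def mean_from_frequency_table(table):
    """Mean from a frequency table given as (value, frequency) pairs."""
    fx_column = [x * f for x, f in table]      # the extra f × x column
    total_frequency = sum(f for _, f in table)
    return sum(fx_column) / total_frequency


# Hypothetical table: value 1 occurs twice, 2 occurs five times, 3 occurs three times
print(mean_from_frequency_table([(1, 2), (2, 5), (3, 3)]))  # 2.1
```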
What is the formula for weighted mean?
Weighted Mean = ∑(weight × value) / ∑weights.
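A short Python sketch of the weighted mean formula, assuming values and weights are supplied as two lists of equal length. The function name and the example marks are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight × value) divided by sum of weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: a mark of 60 with weight 2 and a mark of 80 with weight 3
print(weighted_mean([60, 80], [2, 3]))  # 72.0
```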
What is the geometric mean?
The nth root of the product of all values.
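The geometric mean can be sketched directly from its definition, assuming all values are positive. `geometric_mean` here is an illustrative helper; Python's standard library also provides `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    if n == 0 or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.prod(values) ** (1 / n)


print(geometric_mean([2, 8]))  # 4.0
```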
Why is transforming data useful?
To simplify calculations with large numbers.
What happens to the mode when new values are added?
It could change if the new value affects which value appears most.
How does adding a value greater than the median affect the median?
The median might increase.
What is the range in statistics?
The difference between the largest and smallest values.
What is the formula for range?
Range = Largest Value - Smallest Value.
Define interquartile range (IQR).
The range of the middle 50% of the data when it is in order.
What is the formula for the interquartile range?
IQR = Upper Quartile - Lower Quartile.
What is the lower quartile (LQ)?
The value at 25% of the way through the data.
True or False: The mean is always affected by extreme values.
True.
List the advantages of using mode.
- Easy to use
- Always a value in the data
- Unaffected by extreme values
- Can be used with quantitative and qualitative data
List the disadvantages of using mode.
- May not exist or may have multiple modes
- Cannot be used to calculate measures of spread
- Not always representative of the data.
What are the advantages of using median?
- Easy to find when data is in order
- Unaffected by outliers
- Best with skewed data
- Can calculate quartiles, IQR, and skew.
What are the disadvantages of using median?
- May not be a data value
- Not always representative of the data.
What are the advantages of using mean?
- Uses all the data
- Can calculate standard deviation and skew.
What are the disadvantages of using mean?
- May not be a data value
- Affected by extreme values or outliers
- Can be distorted by open-ended classes.
Define Lower Quartile (LQ).
The value ¼ of the way through the data; 25% of the data is less than the LQ.
Define Upper Quartile (UQ).
The value ¾ of the way through the data; 25% of the data is above the UQ.
How is LQ calculated for discrete data?
LQ = ¼(n+1)th value
How is UQ calculated for discrete data?
UQ = ¾(n+1)th value
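The quartile position rules can be sketched as follows. One assumption to flag: when a position is not a whole number, this sketch interpolates linearly between the two neighbouring values, which is one common convention and may differ from the rounding rule a particular course uses. The helper names are hypothetical.

```python
def value_at(ordered, position):
    """Value at a 1-based position; fractional positions are interpolated (assumption)."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


def quartiles(values):
    """Return (LQ, UQ, IQR) using the 1/4(n + 1) and 3/4(n + 1) positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for these position rules")
    lq = value_at(ordered, (n + 1) / 4)
    uq = value_at(ordered, 3 * (n + 1) / 4)
    return lq, uq, uq - lq


print(quartiles([2, 4, 4, 5, 7, 8, 10]))  # (4, 8, 4)
```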
What is the Interpercentile Range (IPR)?
The difference between two percentiles.
What does a Box Plot represent?
It shows the important features of the data and gives a summary of its spread and skew.
What are the five pieces of information included in a Box Plot?
- Minimum Value
- Lower Quartile (LQ)
- Median
- Upper Quartile (UQ)
- Maximum Value
What is the formula for calculating standard deviation (SD) using discrete data?
σ = √(1/n ∑(x - x̅)²) or σ = √(∑x²/n - (∑x)²/n²)
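Both forms of the standard deviation formula give the same result, which a quick Python check can illustrate. The function names and the example data are assumptions made for this sketch.

```python
import math


def sd_from_deviations(values):
    """σ = sqrt( (1/n) × Σ(x − mean)² )"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_sums(values):
    """σ = sqrt( Σx²/n − (Σx/n)² )"""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data with mean 5
print(sd_from_deviations(data))      # 2.0
print(sd_from_sums(data))            # 2.0
```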
What does a smaller standard deviation (SD) indicate?
The data is closer to the mean.
What does a larger standard deviation (SD) indicate?
The data is more spread out from the mean.
What is the Interdecile Range?
The difference between the first and ninth deciles.
How are outliers defined in relation to IQR?
Values that are more than 1.5 × IQR above the UQ or below the LQ.
What is the formula to identify outliers?
Outliers are values > UQ + (1.5 × IQR) or < LQ - (1.5 × IQR)
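A minimal Python sketch of the outlier rule, assuming the quartiles have already been found and are passed in. The function name `iqr_outliers` and the example data are hypothetical.

```python
def iqr_outliers(values, lq, uq):
    """Values above UQ + 1.5 × IQR or below LQ − 1.5 × IQR."""
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical data with LQ = 10 and UQ = 16, so the fences are 1 and 25
print(iqr_outliers([2, 10, 12, 13, 15, 16, 40], lq=10, uq=16))  # [40]
```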
How can outliers be identified using mean and standard deviation?
Values more than 3 SD away from the mean.
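The 3 SD rule can be sketched in the same way. This example assumes the population standard deviation (dividing by n), matching the SD formula used in these cards; the function name and data are illustrative.

```python
import math


def sd_outliers(values, limit=3):
    """Values more than `limit` standard deviations away from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)  # population SD
    return [x for x in values if abs(x - mean) > limit * sd]


# Hypothetical data: nineteen values of 10 and a single value of 100
print(sd_outliers([10] * 19 + [100]))  # [100]
```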
What does skewness describe?
The shape of the distribution and how the data is spread out.
What indicates a positive skew?
Most values are at the lower end of the data set, with the tail extending in the positive direction of the x-axis.
What indicates a negative skew?
Most values are at the upper end of the data set, with the tail extending in the negative direction of the x-axis.
What does a symmetrical distribution mean?
The data is evenly distributed on both sides of the median.
What is the formula for calculating skewness?
Skewness = 3(mean - median) / standard deviation
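The skewness formula can be tried out with Python's standard `statistics` module. This sketch assumes the population standard deviation (`statistics.pstdev`), and the function name `skewness` and both data sets are made up for illustration.

```python
import statistics


def skewness(values):
    """Skewness = 3 × (mean − median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)     # population SD (divides by n)
    return 3 * (mean - median) / sd


print(skewness([1, 2, 3, 4, 5]))       # 0.0, symmetrical data
print(skewness([1, 2, 2, 3, 12]) > 0)  # True, the tail points in the positive direction
```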
When comparing data sets, what should be considered?
- Measure of average (mean/median/mode)
- Measure of spread (range/IQR/SD)
- Skewness
What is the first step in drawing Box Plots?
Calculate LQ, UQ, median, and identify minimum and maximum values.
What is the significance of the median in a Box Plot?
It marks the middle of the data, with 50% of the data above and below this value.
What is the relationship between mean, median, and mode when comparing two data sets?
The mean/median/mode for data set A is larger than the mean/median/mode for data set B, so on average the values in data set A are higher than those in data set B.
What does a larger range/IQR/SD indicate when comparing two data sets?
The range/IQR/SD for data set A is larger than that of data set B, so the results of data set A are more spread out/less consistent than those of data set B.
What does a smaller range/IQR/SD imply about a data set?
Data set A has a smaller range/IQR/SD than data set B, which means the results for data set A are more consistent.
How does standard deviation relate to the closeness of values to the mean?
A lower SD means the values are closer to the mean; a higher SD means the values are more spread out from the mean.
What does it mean if a box plot for a data set is positively skewed?
Box plot for data set A is positively skewed, indicating that the majority of results were lower with few higher results.
What does it mean if a box plot for a data set is negatively skewed?
Box plot for data set A is negatively skewed, indicating that the majority of results were higher with few lower results.
What should always be referenced when interpreting data?
Always refer to individual values and state clearly which data set is larger or smaller than the other.
When comparing averages, what terms should be used?
Comparing averages involves using mean, median, and mode.
What should be included when interpreting data comparisons?
Always interpret in context and link back to the scenario in the question and labels on axes.
What is the importance of pairing appropriate values when comparing data?
When comparing data, make sure to pair the appropriate values of average and spread.
List the measures of average.
- Mode
- Median
- Mean
List the measures of spread.
- Range
- IQR
- SD