Numeric summaries of data Flashcards

1
Q

What do measures of location indicate?

A

Measures of location tell us where the data cluster and which values lie in the middle.

2
Q

What are the measures of location?

A
  • Mean
  • Median
  • Mode
3
Q

What is the mode?

A

The mode is the value that occurs most frequently in a dataset. It’s useful when you want to identify the most common or popular item, especially in categorical data.

4
Q

How do you calculate the mode for numeric data?

A

For numeric data, you look for the value that appears most often. If multiple values appear with the same frequency, you may have multiple modes (bimodal or multimodal data).
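
As a minimal sketch of the tie case, Python's standard `statistics.multimode` returns every value that shares the highest frequency. The two sample lists are made up for illustration.

```python
from statistics import multimode

# Hypothetical sample data, chosen only for illustration.
single_mode = [2, 3, 3, 5, 7]
two_modes = [1, 1, 2, 4, 4, 6]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modes_single = multimode(single_mode)
modes_double = multimode(two_modes)

print(modes_single)  # [3]
print(modes_double)  # [1, 4]  (bimodal)
```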

5
Q

What is the median?

A

The median is the middle value of a dataset when all the numbers are arranged in order from smallest to largest. It represents the central point where half the data lies below and half lies above.

It is useful when the data contain outliers, because extreme values barely affect it.

6
Q

How do you find the median for an even number of data points?

A

When the dataset has an even number of values, the median is the average of the two middle values.
Example: For the dataset 18, 19, 20, 21, the median is (19 + 20)/2 = 19.5.

7
Q

What is the mean?

A

The mean (or average) is calculated by adding up all the values in a dataset and dividing by the number of data points. It represents the overall “balance point” of the data.

The mean is sensitive to outliers, which can pull it away from the bulk of the data.
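
A small sketch of this sensitivity, using Python's standard `statistics` module. The values are illustrative, and 36 is assumed to play the role of the outlier.

```python
from statistics import mean, median

# Illustrative values; 36 plays the role of an outlier.
values = [18, 19, 20, 21]
with_outlier = values + [36]

mean_before = mean(values)           # (18 + 19 + 20 + 21) / 4
mean_after = mean(with_outlier)      # (18 + 19 + 20 + 21 + 36) / 5
median_before = median(values)
median_after = median(with_outlier)

print(mean_before, mean_after)       # 19.5 22.8
print(median_before, median_after)   # 19.5 20
```

Adding one extreme value moves the mean by 3.3 but the median by only 0.5.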

8
Q

Why do the mode, median, and mean sometimes give different values?

A

The mode, median, and mean often differ because they each describe the center of the data in different ways.
* The mode shows the most frequent value.
* The median is the middle value when the data is ordered.
* The mean is the average of all values.

When a dataset is skewed (its values trail off further on one side, often because of extremely high or low values), the mean is affected the most, while the median and mode stay closer to where most of the data lie. Very large or very small numbers can pull the mean away from the other two.

9
Q

How does skewness affect these measures of location?

A

Skewness means the data are spread asymmetrically, with a longer tail on one side of the center than on the other.
* In positively skewed data (with high outliers), the mean is typically higher than the median and mode.
* In negatively skewed data (with low outliers), the mean is typically lower than the median and mode.
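
The ordering can be checked on small examples. The sketch below uses two invented datasets, one with a single high outlier and one with a single low outlier; other skewed datasets need not follow the pattern this cleanly.

```python
from statistics import mean, median, mode

# Hypothetical datasets, invented for illustration only.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 20]        # one high outlier (20)
left_skewed = [1, 16, 17, 18, 18, 19, 19, 19, 20]  # one low outlier (1)

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    # Print the three measures of location side by side.
    print(name, "mean:", round(mean(data), 2),
          "median:", median(data), "mode:", mode(data))
```

In the first dataset the mean exceeds the median, which exceeds the mode; in the second the order is reversed.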

10
Q

How does the mode apply to categorical data?

A

For categorical data (like types of fruits or nationalities), the mode is the category that occurs most often. This is called the modal response.
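
One way to find the modal response is to count the categories, sketched here with `collections.Counter`. The list of fruit responses is hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (categorical data).
responses = ["apple", "banana", "apple", "cherry", "banana", "apple"]

# Count how often each category occurs, then take the most common one.
counts = Counter(responses)
modal_response, frequency = counts.most_common(1)[0]

print(modal_response, frequency)  # apple 3
```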

11
Q

How do you find the mode for grouped data?

A

In grouped data, where values are placed into ranges (such as test scores from 50–59, 60–69, etc.), you find the mode by identifying the group (or class) with the highest frequency. This group is called the modal class.
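
A rough sketch of finding the modal class, assuming classes of width 10 (50-59, 60-69, and so on). The scores and the helper `class_label` are invented for illustration.

```python
from collections import Counter

# Hypothetical test scores.
scores = [52, 55, 61, 63, 64, 68, 71, 75, 78, 83]
width = 10  # assumed class width, giving classes 50-59, 60-69, ...

def class_label(score):
    """Return the label of the class (range) that a score falls into."""
    lower = (score // width) * width
    return f"{lower}-{lower + width - 1}"

# Count how many scores fall into each class, then pick the largest count.
class_counts = Counter(class_label(s) for s in scores)
modal_class, frequency = class_counts.most_common(1)[0]

print(modal_class, frequency)  # 60-69 4
```

Changing `width` regroups the data and can change which class is modal.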

12
Q

What is bimodal data?

A

Bimodal data is when a dataset has two modes, meaning there are two distinct values or groups that appear with the highest frequency.
Example: If the number of times an ATM is used per day falls into two common ranges, 60–69 and 80–89, and both have the same frequency, the data is bimodal because there are two “peaks.”

13
Q

What are some problems with the mode?

A
  • The mode is not useful when a dataset has many unique values and no frequently occurring ones.
  • The mode is sensitive to how data is grouped in continuous datasets. Changing the grouping intervals (ranges) can lead to a different mode, making it unreliable.
  • In datasets with a large range of values, the mode only tells you the most frequent value, not the overall pattern or distribution of the data.
14
Q

What is the median?

A

The median is the middle value of a dataset when the data is arranged in order. It’s useful because it is barely affected by extremely high or low values (outliers).

15
Q

How do you calculate the median?

A

Step 1: Rank the Data: Arrange the dataset in order from smallest to largest.
Example: For the dataset 7, 3, 5, 1, 2, you first order it: 1, 2, 3, 5, 7.

Step 2: Find the Middle:
* If the dataset has an odd number of values, the median is the exact middle value.
Example: For 1, 2, 3, 5, 7, the median is 3 because it’s the middle number.
* If the dataset has an even number of values, the median is the average of the two middle numbers.
Example: For the dataset 18, 19, 20, 21, the median is (19 + 20)/2 = 19.5.
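
The two steps above can be sketched in Python as follows. The function name `median_by_steps` is made up for this example; the standard library's `statistics.median` does the same job.

```python
def median_by_steps(values):
    """Return the median by ranking the data and taking the middle."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")

    # Step 1: rank the data from smallest to largest.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2

    # Step 2: find the middle.
    if n % 2 == 1:
        # Odd count: the exact middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_steps([7, 3, 5, 1, 2]))    # 3
print(median_by_steps([18, 19, 20, 21]))   # 19.5
```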

16
Q

What is the formula for the median position?

A

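To find the position of the median, use the formula:
Median position = (n + 1)/2

where n is the number of data points. This formula tells you where the median is located in the ordered list, not its value.
Example: For n = 5 the position is 3, so the median is the 3rd value. For n = 4 the position is 2.5, so the median is the average of the 2nd and 3rd values.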

17
Q

What is the measure of spread?

A

A measure of spread (also known as a measure of variability or dispersion) refers to a statistic that describes how much the values in a dataset vary or spread out from the central value (like the mean or median). It helps to show whether the data points are closely clustered together or widely dispersed.

18
Q

What are the measures of spread for the median?

A
  • Range: One way to understand the spread around the median is by calculating the range, which is the difference between the maximum and minimum values.
    Example: In the dataset 18, 19, 20, 21, 36, the range is:
    Range = 36 − 18 = 18
    This range tells us how spread out the data is, but it can be misleading if there are outliers (like 36 in this example).
  • Interquartile Range (IQR): A more robust way of measuring the spread is the interquartile range (IQR), which focuses on the middle 50% of the data, ignoring extreme values. The IQR is the difference between the third and first quartiles (IQR = Q3 − Q1). Together with the median, these quartiles split the data into four equal parts.
    Example: For a dataset, if Q1 = 18 and Q3 = 21, then the IQR is:
    IQR = 21 − 18 = 3

This gives a better idea of how the data clusters around the median.
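
A sketch comparing the two measures on an invented dataset, with and without one extreme value (60). It assumes Python 3.8+ and uses `statistics.quantiles`, whose default `"exclusive"` method places the quartiles at positions i(n + 1)/4 and interpolates between neighbouring values when a position is fractional.

```python
from statistics import quantiles

# Hypothetical data; 60 is an outlier.
without_outlier = [18, 19, 19, 20, 20, 21, 21, 22, 22, 23]
with_outlier = without_outlier + [60]

def data_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

print(data_range(without_outlier), iqr(without_outlier))  # 5 3.0
print(data_range(with_outlier), iqr(with_outlier))        # 42 3.0
```

The outlier raises the range from 5 to 42, while the IQR stays at 3 for this data.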

19
Q

What are quartiles?

A

Quartiles are values that divide a dataset into four equal parts, each containing 25% of the data. They help in understanding the distribution and spread of the data by showing how it is divided into quarters.

20
Q

What are the 4 quartiles?

A
  • Q1 (First Quartile): The value below which 25% of the data lies.
  • Q2 (Second Quartile or Median): The middle value of the data (50% below and 50% above).
  • Q3 (Third Quartile): The value below which 75% of the data lies.
  • Q4: Strictly, there are only three quartiles (Q1, Q2, Q3), which split the data into four parts. “Q4” is sometimes used for the maximum value, the 100th percentile at the end of the data, but it is not usually referred to as a quartile.
21
Q

How to calculate the first quartile?

A

Q1 position = (n + 1)/4

where n is the number of data points. The result is the position of Q1 in the ordered data, not its value.

22
Q

How to calculate the second quartile (median)?

A

Median (Q2) position = (n + 1)/2

The result is the position of the median in the ordered data, not its value.

23
Q

How to calculate the third quartile?

A

Q3 position = 3(n + 1)/4

The result is the position of Q3 in the ordered data, not its value.
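
The three position formulas can be turned into values with a short sketch. The helper names are hypothetical, and linear interpolation for fractional positions is an assumption here (a common convention, but not the only one). The sketch requires at least three data points so that every position falls inside the list.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    Assumption: a fractional position is resolved by linear
    interpolation between the two neighbouring values.
    """
    lower = int(position)           # whole part of the position
    fraction = position - lower     # fractional part
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def quartiles(values):
    """Return [Q1, Q2, Q3] using positions (n+1)/4, (n+1)/2, 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch needs at least three data points")
    positions = [(n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4]
    return [value_at_position(ordered, p) for p in positions]


print(quartiles([12, 3, 14, 7, 5, 13, 8]))  # positions 2, 4, 6 -> [5, 8, 13]
print(quartiles([18, 19, 20, 21]))          # positions 1.25, 2.5, 3.75 -> [18.25, 19.5, 20.75]
```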

24
Q

What is the five-figure summary?

A

The five-figure summary is a quick way to describe the distribution of a dataset by highlighting five key values. These five values help understand the spread and shape of the data.

25
Q

What does the five-figure summary consist of?

A
  1. Minimum: The smallest value in the dataset.
  2. Q1 (First Quartile): The 25th percentile, where 25% of the data lies below this value.
  3. Median (Q2): The middle value or the 50th percentile, dividing the data into two equal parts.
  4. Q3 (Third Quartile): The 75th percentile, where 75% of the data lies below this value.
  5. Maximum: The largest value in the dataset.
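
As a minimal sketch, the five values can be collected with Python's standard library (3.8+). The function name `five_figure_summary` and the sample data are invented for this example; `statistics.quantiles` with its default `"exclusive"` method is assumed to be an acceptable quartile convention.

```python
from statistics import quantiles

def five_figure_summary(values):
    """Return minimum, Q1, median, Q3 and maximum as a dict."""
    ordered = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = quantiles(ordered, n=4)
    return {
        "minimum": ordered[0],
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "maximum": ordered[-1],
    }

summary = five_figure_summary([12, 3, 14, 7, 5, 13, 8])
print(summary)
# {'minimum': 3, 'Q1': 5.0, 'median': 8.0, 'Q3': 13.0, 'maximum': 14}
```
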
26
Q

What is a box plot?

A

A box and whisker plot, often called a box plot, is a visual representation of a dataset’s distribution. It provides a quick summary of the data’s central tendency, spread, and outliers using the five-figure summary (minimum, Q1, median, Q3, and maximum).

27
Q

What are the components of a box plot?

A
  1. The Box: A rectangular box spanning the IQR (Q1 to Q3). A narrow box indicates that the data is tightly clustered, while a wide box shows more variability.
  2. The Median line: A line inside the box represents the median. If the median is closer to Q1, this suggests the data is right-skewed. If the median is closer to Q3, it suggests the data is left-skewed.
  3. The Whiskers: Lines extending from the edges of the box to the minimum and maximum values or, when outliers are drawn separately, to the most extreme values that are not outliers.
  4. Outliers: Data points that lie outside the whiskers, often marked with dots or small circles.
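
The numbers behind these components can be sketched without drawing anything. The cards do not say how outliers are decided; the sketch below assumes the common 1.5 × IQR convention, where points more than 1.5 × IQR below Q1 or above Q3 are treated as outliers. The data are invented.

```python
from statistics import quantiles

# Hypothetical data; 60 is far from the rest.
data = sorted([18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 60])

# The box: Q1 to Q3, with the median line inside it.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Outliers lie beyond the fences; whiskers end at the most
# extreme values that are still inside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, "to", q3, "median:", q2)        # box: 19.0 to 22.0 median: 21.0
print("whiskers:", whisker_low, "to", whisker_high)  # whiskers: 18 to 23
print("outliers:", outliers)                       # outliers: [60]
```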