Chapter 3: Summarising Data Flashcards

1
Q

What is a measure of central tendency?

A

A single value that represents the ‘centre’ of a set of data; the mode, median, and mean are all measures of central tendency.

2
Q

Define mode in data.

A

The value that appears most often; the most common value.

3
Q

What is a modal class?

A

The class with the highest frequency.

4
Q

What is the median?

A

The middle value of a dataset.

5
Q

How do you find the median of discrete data?

A
  1. Put the numbers in order from smallest to largest.
  2. Find the ½(n + 1)th value; this position gives the median.
6
Q

What is the formula to find the median position?

A

(n + 1) / 2.

7
Q

What should you do if the median position is a decimal?

A

Find the two surrounding values and average them.
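
As a rough illustration, the (n + 1) / 2 position rule and the "average the two surrounding values" step can be sketched in Python. The function name `median_discrete` is made up for this example and is not part of the course material.

```python
def median_discrete(values):
    """Median via the (n + 1) / 2 position rule (positions count from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2
    if position.is_integer():
        # Whole-number position: the median is that data value.
        return ordered[int(position) - 1]
    # Decimal position (always ends in .5): average the two surrounding values.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2


print(median_discrete([7, 3, 9, 1, 5]))      # position 3   -> 5
print(median_discrete([7, 3, 9, 1, 5, 11]))  # position 3.5 -> 6.0
```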

8
Q

How do you find the median in grouped data?

A

Identify the median class which contains the median position.

9
Q

What is the estimated median using linear interpolation?

A

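Use ½n as the median position, find the median class that contains it, then interpolate within that class: median ≈ lower class boundary + ((½n − cumulative frequency before the class) ÷ class frequency) × class width.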
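
A minimal sketch of the interpolation, assuming each class is given as a hypothetical `(lower_boundary, upper_boundary, frequency)` tuple and that values are spread evenly within a class. The function name and the sample table are illustrative only.

```python
def estimate_grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower_boundary, upper_boundary, frequency), in order.
    """
    n = sum(freq for _, _, freq in classes)
    position = n / 2  # grouped data uses n / 2, not (n + 1) / 2
    cumulative = 0    # total frequency of the classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # This is the median class: go the right fraction of the way across it.
            fraction = (position - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("no data in the table")


table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]  # made-up example data
print(estimate_grouped_median(table))  # n = 20, position 10, estimate about 16
```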

10
Q

What is the mean (arithmetic mean)?

A

The sum of all values divided by the number of values.

11
Q

Provide the formula for mean.

A

𝑥̅ = ∑𝑥 / n.

12
Q

How do you calculate the mean from a frequency table?

A

Add an extra column for f × x, sum it, and divide by total frequency.
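
The f × x column method can be sketched as below, assuming the table is held as hypothetical `(x, f)` pairs; the names and data are for illustration only.

```python
def mean_from_frequency_table(table):
    """Mean from a frequency table given as (value, frequency) pairs."""
    total_fx = sum(x * f for x, f in table)  # sum of the extra f × x column
    total_f = sum(f for _, f in table)       # total frequency
    if total_f == 0:
        raise ValueError("total frequency must be greater than zero")
    return total_fx / total_f


# Made-up table: value 0 occurs twice, 1 occurs five times, 2 occurs three times.
print(mean_from_frequency_table([(0, 2), (1, 5), (2, 3)]))  # 11 / 10
```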

13
Q

What is the formula for weighted mean?

A

Weighted Mean = ∑(weight × value) / ∑weights.
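
A short sketch of the weighted mean formula; the function name and the exam-style example (a paper worth 40% and a paper worth 60%) are invented for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(weight × value) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * v for v, w in zip(values, weights)) / total_weight


# Scores of 60 and 80 with weights 40 and 60: (2400 + 4800) / 100.
print(weighted_mean([60, 80], [40, 60]))
```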

14
Q

What is the geometric mean?

A

The nth root of the product of all values.
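
A minimal sketch of the geometric mean, assuming all values are positive (the nth root of a negative product is not generally meaningful here). It uses `math.prod`, which needs Python 3.8 or later; the function name is illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    if n == 0 or any(v <= 0 for v in values):
        raise ValueError("this sketch expects a non-empty list of positive values")
    return math.prod(values) ** (1 / n)


print(geometric_mean([2, 8]))     # square root of 16
print(geometric_mean([1, 3, 9]))  # cube root of 27 (may show a tiny rounding error)
```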

15
Q

Why is transforming data useful?

A

To simplify calculations with large numbers.

16
Q

What happens to the mode when new values are added?

A

It could change if the new value affects which value appears most.

17
Q

How does adding a value greater than the median affect the median?

A

The median might increase or stay the same; it cannot decrease.

18
Q

What is the range in statistics?

A

The difference between the largest and smallest values.

19
Q

What is the formula for range?

A

Range = Largest Value - Smallest Value.

20
Q

Define interquartile range (IQR).

A

The range of the middle 50% of the data when in order.

21
Q

What is the formula for the interquartile range?

A

IQR = Upper Quartile - Lower Quartile.

22
Q

What is the lower quartile (LQ)?

A

The value at 25% of the way through the data.

23
Q

True or False: The mean is always affected by extreme values.

A

True.

24
Q

List the advantages of using mode.

A
  • Easy to use
  • Always a value in the data
  • Unaffected by extreme values
  • Can be used with quantitative and qualitative data
25
Q

List the disadvantages of using mode.

A
  • May not exist or may have multiple modes
  • Cannot be used to calculate measures of spread
  • Not always representative of the data.
26
Q

What are the advantages of using median?

A
  • Easy to find when data is in order
  • Unaffected by outliers
  • Best with skewed data
  • Can calculate quartiles, IQR, and skew.
27
Q

What are the disadvantages of using median?

A
  • May not be a data value
  • Not always representative of the data.
28
Q

What are the advantages of using mean?

A
  • Uses all the data
  • Can calculate standard deviation and skew.
29
Q

What are the disadvantages of using mean?

A
  • May not be a data value
  • Affected by extreme values or outliers
  • Can be distorted by open-ended classes.
30
Q

What is the formula for Interquartile Range (IQR)?

A

IQR = UQ - LQ

31
Q

Define Lower Quartile (LQ).

A

The value ¼ of the way through the data; 25% of the data is less than the LQ.

32
Q

Define Upper Quartile (UQ).

A

The value ¾ of the way through the data; 25% of the data is above the UQ.

33
Q

How is LQ calculated for discrete data?

A

LQ = ¼(n+1)th value

34
Q

How is UQ calculated for discrete data?

A

UQ = ¾(n+1)th value
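
The ¼(n + 1) and ¾(n + 1) position rules can be sketched as follows. One assumption is labelled in the code: when a position is not a whole number, this sketch interpolates linearly between the two neighbouring values, which is one common convention and may differ from the one your course uses. All names are illustrative.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data.

    ASSUMPTION: fractional positions are linearly interpolated.
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


def quartiles(values):
    """Return (LQ, UQ, IQR) using the (n + 1) position rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for these position rules")
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    return lq, uq, uq - lq


# n = 7, so LQ is the 2nd value and UQ is the 6th value.
print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # (3, 11, 8)
```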

35
Q

What is the Interpercentile Range (IPR)?

A

The difference between two percentiles.

36
Q

What does a Box Plot represent?

A

It shows important features of the data and gives a summary of its spread and skew.

37
Q

What are the five pieces of information included in a Box Plot?

A
  • Minimum Value
  • Lower Quartile (LQ)
  • Median
  • Upper Quartile (UQ)
  • Maximum Value
38
Q

What is the formula for calculating standard deviation (SD) using discrete data?

A

σ = √(1/n ∑(x - x̅)²) or σ = √(∑x²/n - (∑x)²/n²)
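
Both forms of the formula give the same result, since (∑x)²/n² is the square of the mean. The sketch below checks this on a small made-up data set; the function names are illustrative.

```python
import math


def sd_from_deviations(values):
    """sigma = sqrt( (1/n) * sum((x - mean)^2) )"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_sums(values):
    """sigma = sqrt( sum(x^2)/n - (sum(x)/n)^2 )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, sum of squared deviations 32
print(sd_from_deviations(data))   # sqrt(32 / 8) = 2.0
print(sd_from_sums(data))         # sqrt(232 / 8 - 25) = 2.0
```

The second form is usually quicker by hand because it only needs ∑x and ∑x².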

39
Q

What does a smaller standard deviation (SD) indicate?

A

The data is closer to the mean.

40
Q

What does a larger standard deviation (SD) indicate?

A

The data is more spread out from the mean.

41
Q

What is the Interdecile Range?

A

The difference between the first and ninth deciles.

42
Q

How are outliers defined in relation to IQR?

A

Values that are more than 1.5 × IQR above the UQ or below the LQ.

43
Q

What is the formula to identify outliers?

A

Outliers are values > UQ + (1.5 × IQR) or < LQ − (1.5 × IQR)
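
A small sketch of the 1.5 × IQR rule, assuming the quartiles have already been found and are passed in. The function name and the data are invented for illustration.

```python
def iqr_outliers(values, lq, uq):
    """Return the values beyond LQ - 1.5*IQR or UQ + 1.5*IQR."""
    iqr = uq - lq
    lower_boundary = lq - 1.5 * iqr
    upper_boundary = uq + 1.5 * iqr
    return [x for x in values if x < lower_boundary or x > upper_boundary]


# With LQ = 10 and UQ = 16, IQR = 6, so the boundaries are 1 and 25.
print(iqr_outliers([2, 10, 12, 13, 15, 16, 40], lq=10, uq=16))  # [40]
```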

44
Q

How can outliers be identified using mean and standard deviation?

A

Values more than 3 SD away from the mean.
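
The 3 SD rule can be sketched in the same way. This assumes the population form of the standard deviation (dividing by n); the function name and data are illustrative.

```python
import math


def sd_outliers(values, threshold=3):
    """Return values more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [x for x in values if abs(x - mean) > threshold * sd]


data = [10] * 20 + [100]  # twenty values of 10 and one extreme value
print(sd_outliers(data))  # [100]
```

In a small data set no value can be 3 SD from the mean, so this rule is only useful with larger samples.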

45
Q

What does skewness describe?

A

The shape of the distribution and how the data is spread out.

46
Q

What indicates a positive skew?

A

Most values are at the beginning of the data set, with the tail going in the positive direction of the x-axis.

47
Q

What indicates a negative skew?

A

Most values are at the end of the data set, with the tail pointing towards the negative direction of the x-axis.

48
Q

What does a symmetrical distribution mean?

A

The data is evenly distributed on both sides of the median.

49
Q

What is the formula for calculating skewness?

A

Skewness = 3(mean − median) / standard deviation
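
A minimal sketch of the formula 3(mean − median) / standard deviation using Python's standard `statistics` module. It assumes the population standard deviation (`pstdev`) and data that are not all identical, since a standard deviation of zero would cause a division by zero. The function name is illustrative.

```python
import statistics


def skewness(values):
    """Skewness = 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population SD (divides by n)
    return 3 * (mean - median) / sd


print(skewness([1, 2, 3, 4, 10]))  # mean 4 > median 3, so the result is positive
print(skewness([1, 2, 3, 4, 5]))   # mean equals median, so the result is zero
```

A positive result suggests positive skew, a negative result suggests negative skew, and a result near zero suggests a roughly symmetrical distribution.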

50
Q

When comparing data sets, what should be considered?

A
  • Measure of average (mean/median/mode)
  • Measure of spread (range/IQR/SD)
  • Skewness
51
Q

What is the first step in drawing Box Plots?

A

Calculate LQ, UQ, median, and identify minimum and maximum values.

52
Q

What is the significance of the median in a Box Plot?

A

It marks the middle of the data, with 50% of the data above and below this value.

53
Q

What is the relationship between mean, median, and mode when comparing two data sets?

A

If the mean/median/mode for data set A is larger than the mean/median/mode for data set B, then on average the values in data set A are higher than those in data set B.

54
Q

What does a larger range/IQR/SD indicate when comparing two data sets?

A

If the range/IQR/SD for data set A is larger than that of data set B, the results of data set A are more spread out (less consistent) than those of data set B.

55
Q

What does a smaller range/IQR/SD imply about a data set?

A

If data set A has a smaller range/IQR/SD than data set B, the results for data set A are more consistent.

56
Q

How does standard deviation relate to the closeness of values to the mean?

A

A lower SD means the values are closer to the mean; a higher SD means the values are more spread out from the mean.

57
Q

What does it mean if a box plot for a data set is positively skewed?

A

Box plot for data set A is positively skewed, indicating that the majority of results were lower with few higher results.

58
Q

What does it mean if a box plot for a data set is negatively skewed?

A

Box plot for data set A is negatively skewed, indicating that the majority of results were higher with few lower results.

59
Q

What should always be referenced when interpreting data?

A

Always make reference to individual values and mention which data set is larger/smaller than the other clearly.

60
Q

When comparing averages, what terms should be used?

A

Comparing averages involves using mean, median, and mode.

61
Q

What should be included when interpreting data comparisons?

A

Always interpret in context and link back to the scenario in the question and labels on axes.

62
Q

What is the importance of pairing appropriate values when comparing data?

A

When comparing data, make sure to pair the appropriate values of average and spread.

63
Q

List the measures of average.

A
  • Mode
  • Median
  • Mean
64
Q

List the measures of spread.

A
  • Range
  • Interquartile range (IQR)
  • Standard deviation (SD)