Section 2.1: Examining Numerical Data Flashcards

1
Q

What is a scatterplot used for?

A

Visualizing the relationship between two numerical variables

2
Q

What does a dot plot visualize?

A

One numerical variable. Each observation is drawn as a dot along a number line; when the dots are semi-transparent, darker areas indicate more observations

3
Q

What does a stacked dot plot represent?

A

Observations with the same value are stacked on top of each other, so taller stacks indicate areas with more observations. This makes it easier to judge the center and shape of the distribution

4
Q

What is the purpose of a histogram?

A

Provides a view of the data density: observations are grouped into bins, and taller bars show where data are relatively more common

5
Q

What does the term ‘center’ refer to in statistics?

A

The typical or middle value of a distribution, commonly measured by the mean (average) or the median

6
Q

What is the formula for the sample mean?

A

x̄ = (x₁ + x₂ + x₃ + … + xₙ) / n
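
A minimal Python sketch of this formula, using a made-up sample of five values (the data are hypothetical). The standard library's `statistics.mean` is used only as a cross-check.

```python
import statistics

# Hypothetical sample of n = 5 observations (made-up values)
data = [2, 4, 4, 5, 10]

# Sample mean: add up all observations, then divide by how many there are
x_bar = sum(data) / len(data)

print(x_bar)                           # 5.0
print(x_bar == statistics.mean(data))  # True
```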

7
Q

How is the population mean computed?

A

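Computed the same way as the sample mean, but using every observation in the population. It is usually impossible to calculate because we rarely have access to the entire population, so the sample mean x̄ is used to estimate μ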

8
Q

What does x̄ represent?

A

Sample mean

9
Q

What does μ represent?

A

Population mean

10
Q

Define unimodal

A

A distribution with a single peak

11
Q

What is the difference between bimodal and multimodal?

A

A bimodal distribution has two prominent peaks, while a multimodal distribution has more than two prominent peaks

12
Q

What characterizes a uniform distribution?

A

No apparent peaks; observations are spread roughly evenly across the range of values

13
Q

What does ‘right skewed’ refer to?

A

A distribution whose longer tail extends to the right, toward larger values

14
Q

What does ‘left skewed’ mean?

A

A distribution whose longer tail extends to the left, toward smaller values

15
Q

What is the formula for the sample variance?

A

s² = Σ(xᵢ − x̄)² / (n − 1), where the sum runs over all n observations
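
The sample variance formula (dividing by n − 1) can be sketched in Python as below. The data are hypothetical, made-up values, and `statistics.variance` is used only to confirm the hand computation.

```python
import statistics

# Hypothetical sample (made-up values)
data = [2, 4, 4, 5, 10]
n = len(data)
x_bar = sum(data) / n  # sample mean, 5.0 here

# Sum the squared deviations from the mean, then divide by n - 1
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(s_squared)                               # 9.0
print(s_squared == statistics.variance(data))  # True
```

Dividing by n instead of n − 1 gives the population version, which the standard library provides as `statistics.pvariance`.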

16
Q

How is standard deviation calculated?

A

s = √(s²)
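
A short sketch showing that the standard deviation is the square root of the sample variance, with hypothetical, made-up data; `statistics.stdev` is used as a cross-check.

```python
import math
import statistics

# Hypothetical sample (made-up values)
data = [2, 4, 4, 5, 10]

s_squared = statistics.variance(data)  # sample variance (divides by n - 1)
s = math.sqrt(s_squared)               # standard deviation: square root of the variance

print(s)                                        # 3.0
print(math.isclose(s, statistics.stdev(data)))  # True
```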

17
Q

What is the median in a dataset?

A

The middle value when the data are sorted in ascending order, so it splits the data in half; if n is even, the median is the average of the two middle values
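
A sketch of the median rule in Python. `median` is a hypothetical helper written for illustration (the standard library's `statistics.median` follows the same rule). When n is even there is no single middle value, so the two middle values are averaged.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))      # 3
print(median([5, 1, 3, 10]))  # 4.0
```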

18
Q

What does Q1 represent?

A

25th percentile, also called the first quartile

19
Q

What is the 50th percentile also known as?

A

The median

20
Q

What does Q3 represent?

A

75th percentile, also called the third quartile

21
Q

Define interquartile range (IQR)

A

The length of the interval containing the middle 50% of the data, calculated as IQR = Q3 − Q1
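
Quartiles can be computed in more than one way. This sketch assumes the common textbook rule of taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd); statistical software, including Python's `statistics.quantiles`, may interpolate and give slightly different values. The helpers `median` and `quartiles` and the data are hypothetical.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values when n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves; needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(upper)


# Hypothetical data (made-up values), n = 9
data = [1, 3, 4, 6, 7, 8, 9, 12, 20]
q1, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q3, iqr)  # 3.5 10.5 7.0
```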

22
Q

What is the maximum upper whisker reach?

A

Q3 + 1.5 × IQR

23
Q

What is the maximum lower whisker reach?

A

Q1 − 1.5 × IQR

24
Q

Define an outlier

A

An observation beyond the maximum reach of the whiskers, that is, greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR
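
A sketch of the whisker limits and the outlier rule, assuming Q1 = 3.5 and Q3 = 10.5 for the made-up data below (these are the medians of its lower and upper halves). Following the usual box plot convention, each whisker stops at the most extreme observation that falls inside its limit.

```python
# Hypothetical data (made-up values); Q1 and Q3 assumed already computed
# as medians of the lower and upper halves
data = [1, 3, 4, 6, 7, 8, 9, 12, 30]
q1, q3 = 3.5, 10.5
iqr = q3 - q1  # 7.0

# Maximum reach of the whiskers
upper_limit = q3 + 1.5 * iqr  # 21.0
lower_limit = q1 - 1.5 * iqr  # -7.0

# Whiskers stop at the most extreme observations inside the limits
upper_whisker = max(x for x in data if x <= upper_limit)  # 12
lower_whisker = min(x for x in data if x >= lower_limit)  # 1

# Observations beyond the whisker reach are flagged as outliers
outliers = [x for x in data if x < lower_limit or x > upper_limit]
print(outliers)  # [30]
```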

25
Q

What are robust statistics?

A

Statistics that are not strongly affected by extreme observations, such as the median and IQR

26
Q

Which statistics are not robust?

A

Mean and variance (standard deviation), which can be strongly affected by extreme observations

27
Q

When describing distributions, what three aspects do we focus on?

A

Center, shape, and spread of distributions

28
Q

Which plot is used to show the relationship between two numerical variables?

A

Scatterplot

29
Q

Which plots are used to show the distribution of a single numerical variable?

A

Dot plot, stacked dot plot, histogram, box plot

30
Q

Why are histograms important?

A

They are especially convenient for seeing the shape of a distribution (modality and skew) and where data are relatively more common

31
Q

How can the chosen bin width affect a histogram?

A

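It can change the story the histogram tells: bins that are too wide can hide features such as multiple peaks, while bins that are too narrow can make the histogram look noisy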
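
To see the effect of bin width, this sketch counts the same made-up data with two bin widths and prints a text histogram. The helpers `bin_counts` and `show` are hypothetical and assume non-negative integer data. With width 2 the two clusters show up as two peaks; with width 6 they merge into two equal bars and the bimodal shape is hidden.

```python
from collections import Counter


def bin_counts(data, width):
    """Count observations per bin [0, width), [width, 2*width), ...; assumes non-negative integers."""
    counts = Counter(x // width for x in data)
    return [counts[k] for k in range(max(counts) + 1)]


def show(data, width):
    """Print a text histogram: one row per bin, one '*' per observation."""
    for k, count in enumerate(bin_counts(data, width)):
        print(f"[{k * width:>2}, {(k + 1) * width:>2}) {'*' * count}")


# Hypothetical data (made-up values) with one cluster near 3 and another near 9
data = [1, 2, 2, 3, 3, 3, 4, 7, 8, 8, 9, 9, 9, 10]

show(data, 2)  # narrow bins: two peaks are visible
print()
show(data, 6)  # wide bins: two equal bars, the peaks are hidden
```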

32
Q

What does the median represent in relation to data values?

A

50% of the values are below it and 50% are above

33
Q

What are ways to measure center?

A
  • Mean (average)
  • Median
34
Q

What are ways to measure shape?

A
  • Modality
  • Skewness
35
Q

What are ways to measure spread?

A
  • Variance (standard deviation)
  • IQR
36
Q

For skewed distributions, which measures are more helpful?

A

Median and IQR to describe center and spread

37
Q

For symmetric distributions, which measures are more helpful?

A

Mean and standard deviation (SD) to describe center and spread

38
Q

Which variable is expected to be uniformly distributed: (a) heights of KSU students, (b) salaries of a random sample of people from North Carolina, (c) house prices in America, (d) birthdays of classmates (day of the month)?

A

(d) Birthdays of classmates (day of the month)

39
Q

Why is it important to look for outliers?

A
  • Identify extreme skew in the distribution
  • Identify data collection and entry errors
  • Provide insight into interesting features of the data
40
Q

How would replacing the largest value with $10 million affect the mean, median, standard deviation, and IQR of household income?

A
  • Mean: increase
  • Median: may not change much
  • Standard deviation (variance): increase
  • IQR: stay the same
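
A sketch of this scenario with hypothetical household incomes (made-up values), computing all four statistics before and after the largest value is replaced with $10 million. The IQR uses `statistics.quantiles(..., n=4)`, whose default quartile method may differ slightly from hand calculations; for this data, neither method's quartiles depend on the single largest value.

```python
import statistics


def summarize(incomes):
    """Return mean, median, standard deviation, and IQR of a list of incomes."""
    q1, _, q3 = statistics.quantiles(incomes, n=4)  # Python's default ("exclusive") method
    return {
        "mean": statistics.mean(incomes),
        "median": statistics.median(incomes),
        "sd": statistics.stdev(incomes),
        "iqr": q3 - q1,
    }


# Hypothetical household incomes in dollars (made-up values, already sorted)
incomes = [28_000, 35_000, 42_000, 50_000, 61_000, 75_000, 90_000, 150_000]

# Replace the largest value with $10 million
modified = sorted(incomes)
modified[-1] = 10_000_000

before, after = summarize(incomes), summarize(modified)
for stat in before:
    print(f"{stat:>6}: {before[stat]:>14,.0f} -> {after[stat]:>14,.0f}")
```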
41
Q

If the smallest value in household income is replaced with $10 million, how does it affect the mean and median?

A
  • Mean: increase
  • Median: stay the same or change only slightly
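
A sketch with hypothetical incomes (made-up values): after the smallest value is replaced with $10 million, the mean jumps, while the median only moves to the neighboring value in the ordered data.

```python
import statistics

# Hypothetical household incomes in dollars (made-up values)
incomes = [28_000, 35_000, 42_000, 50_000, 55_000, 58_000, 75_000, 90_000, 150_000]

# Replace the smallest value with $10 million
modified = sorted(incomes)
modified[0] = 10_000_000

print(statistics.mean(incomes), statistics.mean(modified))      # mean jumps by over $1 million
print(statistics.median(incomes), statistics.median(modified))  # 55000 58000
```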
42
Q

For estimating typical household income for a student, is the mean or median more relevant?

A

The median, because household income distributions tend to be right skewed and the median is robust to a few very large incomes