section 2.2: considering categorical data Flashcards

1
Q

what is a contingency table?

A

a table that summarizes data for two categorical variables
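
As an illustration, a contingency table can be tallied from paired observations. The sketch below uses only the Python standard library; the data and the names `records` and `table` are made up for the example.

```python
from collections import Counter

# Hypothetical paired observations: (homeownership, application type).
records = [
    ("rent", "individual"), ("rent", "joint"), ("own", "individual"),
    ("rent", "individual"), ("own", "joint"), ("own", "individual"),
]

# Count how often each combination of the two categories occurs.
table = Counter(records)

rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Print the table, one row per homeownership level, with row totals.
print("".ljust(8) + "".join(c.ljust(12) for c in cols) + "total")
for r in rows:
    counts = [table[(r, c)] for c in cols]
    print(r.ljust(8) + "".join(str(n).ljust(12) for n in counts) + str(sum(counts)))
```

Each cell holds the number of observations that share one value of the first variable and one value of the second.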

2
Q

what is a bar plot?

A

a common way to display the distribution of a single categorical variable, with one bar per category whose height is the frequency (count)

3
Q

what is a relative-frequency bar plot?

A

a bar plot where proportions instead of frequencies are shown
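
To make the difference between frequencies and proportions concrete, here is a minimal sketch using only the standard library. The observations and the text-based "plot" are illustrative assumptions, not part of the original cards.

```python
from collections import Counter

# Hypothetical observations of one categorical variable.
homeownership = ["rent", "mortgage", "own", "rent",
                 "mortgage", "mortgage", "rent", "mortgage"]

counts = Counter(homeownership)   # frequencies: bar heights in a bar plot
n = len(homeownership)

# Proportions: bar heights in a relative-frequency bar plot.
proportions = {category: count / n for category, count in counts.items()}

# Crude text rendering, roughly one '#' per 5 percentage points.
for category, p in sorted(proportions.items()):
    print(f"{category:<9}{p:5.2f} " + "#" * round(p * 20))
```

The bars keep the same relative heights in both plots; only the scale of the vertical axis changes.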

4
Q

how are bar plots different from histograms?

A

Bar plots are used for displaying distributions of categorical variables, while
histograms are used for numerical variables. The x-axis in a histogram is a
number line, hence the order of the bars cannot be changed, while in a bar plot
the categories can be listed in any order

5
Q

what is variance?

A

the standard deviation squared; roughly the average squared distance of the observations from the mean

6
Q

what is the equation for variance?

A

s^2 = Σ (x_i - x̄)^2 / (n - 1), where the sum runs over all n observations
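
A worked sketch of the formula, step by step, checked against Python's `statistics` module. The sample `x` is hypothetical.

```python
import math
import statistics

# Hypothetical sample.
x = [2, 4, 4, 5, 10]

n = len(x)
mean = sum(x) / n                               # x̄
squared_devs = [(xi - mean) ** 2 for xi in x]   # (x - x̄)^2 for each observation
s2 = sum(squared_devs) / (n - 1)                # sample variance
s = math.sqrt(s2)                               # standard deviation

print(squared_devs)
print(s2, s)

# The standard library computes the same quantities.
print(statistics.variance(x), statistics.stdev(x))
```

In this sample the observation 10 contributes most of the sum of squared deviations, which is why points far from the mean have the largest effect on the variance.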

7
Q

what points make a larger difference in variance?

A

points that are far away from the mean

8
Q

Why do we use the squared deviation in the calculation of variance?

A

To get rid of negatives, so that observations equally distant from the mean are weighted equally.
To weight larger deviations more heavily.

9
Q

what is standard deviation?

A

the square root of the variance; it has the same units as the data

10
Q

what is the median?

A

the value that splits the data in half when ordered in ascending order; with an even number of observations, it is the average of the two middle values

11
Q

what is the 50th percentile?

A

the median

12
Q

what is the 25th percentile?

A

the first quartile, Q1

13
Q

what is the 75th percentile?

A

the third quartile, Q3

14
Q

what is interquartile range (IQR)?

A

the width of the range that contains the middle 50% of the data; a measure of spread

15
Q

what is the equation for IQR?

A

IQR = Q3 - Q1
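
A minimal sketch of the calculation with Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software; the `method="inclusive"` option used here is one choice and is an assumption, not necessarily the convention the course uses. The data are hypothetical.

```python
import statistics

# Hypothetical sample (does not need to be sorted beforehand).
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("Q1 =", q1, "median =", q2, "Q3 =", q3)
print("IQR =", iqr)
```

The middle cut point is the 50th percentile, so it agrees with `statistics.median(data)`.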

16
Q

what does the box in a box plot represent?

A

the box spans Q1 to Q3, so it represents the middle 50% of the data; the thick line inside the box is the median

17
Q

what is the max upper whisker reach of a box plot?

A

Q3 + 1.5 × IQR

18
Q

what is the max lower whisker reach of a box plot?

A

Q1 - 1.5 × IQR

19
Q

what is an outlier?

A

an observation that lies beyond the maximum reach of the whiskers
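
The whisker rule can be sketched as follows. The data are hypothetical, and the quartiles again use the `method="inclusive"` convention of `statistics.quantiles`, which is an assumption; other conventions can give slightly different cutoffs.

```python
import statistics

# Hypothetical sample; 40 is unusually large.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 40]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Maximum reach of the whiskers.
lower_reach = q1 - 1.5 * iqr
upper_reach = q3 + 1.5 * iqr

# Observations beyond the reach are flagged as outliers.
outliers = [x for x in data if x < lower_reach or x > upper_reach]

# Each whisker stops at the most extreme observation still within reach.
inside = [x for x in data if lower_reach <= x <= upper_reach]
whisker_low, whisker_high = min(inside), max(inside)

print("reach:", lower_reach, "to", upper_reach)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

Note that the whiskers end at actual observations, so they are usually shorter than the maximum reach.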

20
Q

why is it important to look for outliers?

A

Identify extreme skew in the distribution.

Identify data collection and entry errors.

Provide insight into interesting features of the data.

21
Q

what are the robust statistics?

A

median and IQR

22
Q

what are the non-robust statistics?

A

mean, variance (standard deviation)
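
The contrast between robust and non-robust statistics can be seen by replacing one value with an extreme one. This is a sketch on hypothetical data; the helper `summary` is a name introduced here for illustration, and the quartile convention (`method="inclusive"`) is an assumption.

```python
import statistics

def summary(data):
    """Return the four summary statistics discussed on the cards."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "median": statistics.median(data),
        "IQR": q3 - q1,
        "mean": statistics.mean(data),
        "SD": statistics.stdev(data),
    }

original = [3, 5, 7, 8, 12, 13, 14, 18, 21]
modified = original[:-1] + [210]   # largest value replaced by an extreme one

before, after = summary(original), summary(modified)
for name in before:
    print(f"{name:<7}{before[name]:8.2f}{after[name]:8.2f}")
```

In this example the median and IQR are unchanged, while the mean and SD both increase sharply.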

23
Q

for skewed distributions it is often more helpful to use ___________ to describe the center and spread

A

median and IQR

24
Q

for symmetric distributions it is often more helpful to use __________ to describe the center and spread

A

the mean and SD

25
Q

if a distribution is symmetric, the center is defined as _______

A

the mean (for a symmetric distribution, mean ≈ median)

26
Q

if a distribution is skewed or has extreme outliers, the center is defined as _______

A

the median

27
Q

if a distribution is right-skewed, the mean is

A

greater than the median

28
Q

if a distribution is left-skewed, the mean is

A

less than the median
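
A small numeric check of the two cards above, on made-up data. The left-skewed sample is simply the mirror image of the right-skewed one. This shows the usual pattern in one example and is not a proof.

```python
import statistics

# Hypothetical sample with a long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15]

# Mirror image: the long tail now points to the left.
left_skewed = [16 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:<13} mean = {mean:.2f}  median = {median}")
```

The mean is pulled toward the long tail, while the median stays near the bulk of the data.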

29
Q

what is a side by side bar plot?

A

a bar plot that displays the same information as a stacked bar plot by placing the bars for each group next to, instead of on top of, each other

30
Q

what is a standardized stacked bar plot?

A

a stacked bar plot in which every bar is scaled to the same height, so each segment shows a proportion of its group rather than a count
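
The standardization step amounts to dividing each count by its group total. A minimal sketch, assuming a hypothetical contingency table stored as a nested dictionary named `counts`:

```python
# Hypothetical contingency table: counts[group][category].
counts = {
    "rent":     {"individual": 30, "joint": 10},
    "mortgage": {"individual": 35, "joint": 15},
    "own":      {"individual": 8,  "joint": 2},
}

# Standardize each group so that its segments sum to 1.
proportions = {
    group: {cat: n / sum(row.values()) for cat, n in row.items()}
    for group, row in counts.items()
}

for group, row in proportions.items():
    rounded = {cat: round(p, 2) for cat, p in row.items()}
    print(group, rounded, "group size:", sum(counts[group].values()))
```

After standardizing, the group sizes (40, 50, and 10 here) are no longer visible in the bars, which is the information a mosaic plot retains.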

31
Q

what is a mosaic plot?

A

a visualization technique suitable for contingency tables that resembles a standardized stacked bar plot, with the benefit that we still see the relative group sizes of the primary variable

32
Q

what are the ways to measure center?

A

mean (average) and median; a histogram can show where the center lies, but it is a plot rather than a measure

33
Q

what are the ways to measure shape?

A

modality, skewness

34
Q

what are the ways to measure spread?

A

variance (standard deviation), IQR

35
Q

If you would like to estimate the typical household income for a student, would you be more interested in the mean or median income?

A

the median, because household income distributions are typically right-skewed and a few very high incomes pull the mean upward