Module 2: Summarizing Data Flashcards

1
Q

scatterplots

A

-useful for visualizing the relationship between 2 numerical variables
-each point is a single case

2
Q

dot plots

A

useful for visualizing one numerical variable

3
Q

what is one way to measure the centre of a distribution of data?

A

the mean (average)

4
Q

sample statistic

A

a value computed from sample data; e.g. the sample mean is a point estimate of the population mean

5
Q

histograms

A

-view of data density
-convenient for describing the shape of the data distribution

6
Q

what do higher bars on histograms represent?

A

where the data are relatively more common

7
Q

4 types of modality

A

unimodal, bimodal, multimodal, and uniform

8
Q

3 types of skewness

A

right skewed, left skewed, or symmetric

9
Q

2 measures of variability

A

variance and standard deviation

10
Q

deviation

A

distance of an observation from the mean

11
Q

variance

A

-roughly the average squared deviation from the mean (the sample variance divides the sum of squared deviations by n - 1 rather than n)
-tells you the amount of spread in the data

12
Q

standard deviation

A

-square root of the variance and has the same units as the data
-useful for considering how far data are distributed around the mean
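The deviation, variance, and standard deviation cards above can be worked through on a small example. This is a minimal sketch: the data values are made up for illustration, and it assumes the usual n - 1 divisor for the sample variance.

```python
import math

# Made-up sample, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Centre: the mean
mean = sum(data) / n

# Deviation: distance of each observation from the mean
deviations = [x - mean for x in data]

# Squared deviations, summed
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance divides by n - 1; dividing by n gives the population version
sample_variance = sum_of_squares / (n - 1)
population_variance = sum_of_squares / n

# Standard deviation: square root of the variance, in the same units as the data
sample_sd = math.sqrt(sample_variance)

print(mean, sample_variance, sample_sd)
```

The standard library's statistics.variance and statistics.stdev compute the same sample quantities.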

13
Q

box plot

A

summarizes a data set using 5 statistics while also plotting unusual observations

14
Q

5 statistics (plus 1 optional one) used for box plots

A

upper whisker, Q3, median, Q1, lower whisker, mean (optional)

15
Q

median

A

value that splits the data in half when ordered in ascending order

16
Q

what is the median when there are an even number of observations?

A

the average of the 2 values in the middle
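The two median cards above (odd and even number of observations) can be sketched as a short function. The function name and the example values are illustrative, not from the course material.

```python
def median(values):
    """Return the middle value of the data once sorted in ascending order."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value
        return ordered[mid]
    # Even number of observations: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 3])       # middle of 1, 3, 7
even_result = median([7, 1, 3, 5])   # average of 3 and 5
print(odd_result, even_result)
```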

17
Q

25th percentile =

A

first quartile Q1

18
Q

50th percentile =

A

median

19
Q

75th percentile =

A

third quartile Q3

20
Q

outlier

A

-observation beyond the max reach of the whiskers
-appears extreme relative to the rest of the data

20
Q

interquartile range, IQR

A

Q3 - Q1: the length of the box, which spans the middle 50% of the data

20
Q

whiskers

A

-capture data outside of the IQR box, conventionally reaching no more than 1.5 x IQR beyond Q1 and Q3
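The quartile, IQR, whisker, and outlier cards above fit together as in the sketch below. Two assumptions are worth flagging: the maximum whisker reach is taken to be 1.5 x IQR, a common convention that the cards do not state, and quartiles are computed with the "inclusive" method of Python's statistics.quantiles (other software may use slightly different quartile rules). The data are made up.

```python
import statistics

# Made-up data with one extreme value
data = sorted([7, 3, 1, 30, 5, 2, 8, 4, 6])

# Q1, median, Q3 (25th, 50th, 75th percentiles)
q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: length of the box
iqr = q3 - q1

# Assumed convention: whiskers may reach at most 1.5 x IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)

# Observations beyond the reach of the whiskers are plotted as outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_whisker, q1, med, q3, upper_whisker, outliers)
```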

20
Q

T or F: median and IQR are more robust to skewness and outliers than mean and SD

A

true
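A quick way to see this robustness is to replace one observation with an extreme value and compare how the mean and the median respond. The numbers below are made up for illustration.

```python
import statistics

# Made-up data: the second list replaces the largest value with an extreme one
original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean is pulled toward the extreme value; the median does not move
print(mean_before, mean_after)
print(median_before, median_after)
```

The same comparison can be run with statistics.stdev against the IQR to see the effect on the measures of spread.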

20
Q

if a distribution is skewed or has extreme outliers, centre is often defined as _________

A

the median

20
Q

if distribution is symmetric, centre is often defined as _______

A

the mean

21
Q

contingency table

A

summarizes data for 2 categorical variables

21
Q

name 2 plots that combine numerical and categorical data to compare numerical data across groups

A

side-by-side box plots and multiple histograms

21
Q

bar plot

A

displays a single categorical variable

22
Q

relative frequency bar plot

A

bar plot where there are proportions instead of frequencies

23
Q

stacked bar plot

A

graphical display of contingency table info for counts

24
Q

side-by-side bar plot

A

same info as stacked bar plot but has info beside each other instead of on top

25
Q

frequency

A

shows the count in each category
*difficult to interpret if groups have unequal numbers

26
Q

row proportion

A

shows the proportion of the row total
*easier to compare between rows

27
Q

column proportion

A

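shows the proportion of the column total
*useful for comparing across columns, e.g. across levels of the explanatory variable when it defines the columns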
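The frequency, row proportion, and column proportion cards above can be illustrated on a small contingency table. This is a sketch only: the variable names, group labels, and counts are all made up.

```python
# Made-up contingency table of counts: rows are groups, columns are outcomes
counts = {
    "treatment": {"improved": 30, "not improved": 10},
    "control": {"improved": 20, "not improved": 40},
}
columns = ["improved", "not improved"]

# Row and column totals
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {col: sum(counts[row][col] for row in counts) for col in columns}

# Row proportions: each count divided by its row total
row_props = {
    row: {col: counts[row][col] / row_totals[row] for col in columns}
    for row in counts
}

# Column proportions: each count divided by its column total
col_props = {
    row: {col: counts[row][col] / col_totals[col] for col in columns}
    for row in counts
}

# Groups have unequal sizes (40 vs 60), so raw counts are hard to compare;
# the row proportions put both groups on the same scale
print(row_props["treatment"]["improved"], row_props["control"]["improved"])
print(col_props["treatment"]["improved"])
```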