Module 2: Summarizing Data Flashcards

1
Q

scatterplots

A

-useful for visualizing the relationship between 2 numerical variables
-each point is a single case

2
Q

dot plots

A

useful for visualizing one numerical variable

3
Q

what is one way to measure the centre of a distribution of data?

A

the mean (average)
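
As a quick illustration of the answer above, the mean is the sum of the observations divided by their count. The values below are made up for the example and do not come from the course material.

```python
from statistics import mean

# Hypothetical sample of five observations (illustrative values only)
observations = [2, 4, 4, 5, 10]

# The mean is the sum of the observations divided by how many there are
sample_mean = sum(observations) / len(observations)

# The standard library's statistics.mean gives the same result
assert sample_mean == mean(observations)
print(sample_mean)  # 5.0
```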

4
Q

sample statistic

A

a value computed from sample data, e.g. the sample mean, which is a point estimate of the population mean
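
A minimal sketch of the idea, assuming a simulated population (in practice the full population is rarely observed). The population size, its mean of 50, its spread of 10, and the sample size of 100 are all arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values (assumed, for illustration only)
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = mean(population)  # the population parameter

# Draw a simple random sample of 100 cases and compute the sample statistic
sample = random.sample(population, 100)
sample_mean = mean(sample)  # point estimate of population_mean

print(round(population_mean, 2), round(sample_mean, 2))
```

The two printed values should be close but not identical: the sample mean estimates the population mean and varies from sample to sample.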

5
Q

histograms

A

-view of data density
-convenient for describing the shape of the data distribution

6
Q

what do higher bars on histograms represent?

A

where the data are relatively more common
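
To make the link between bar height and data density concrete, here is a rough text histogram. The data values and the bin width of 2 are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical observations (illustrative values only)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
bin_width = 2  # assumed bin width; bins are [0, 2), [2, 4), [4, 6), ...

# Count how many observations fall in each bin, keyed by the bin's left edge
counts = Counter((x // bin_width) * bin_width for x in data)

# Print one row per bin; a longer row of '#' corresponds to a higher bar
for left in range(0, max(data) + 1, bin_width):
    print(f"[{left:>2}, {left + bin_width:>2})  {'#' * counts[left]}")
```

The bin [2, 4) gets the longest row because five of the ten observations fall there, while the lone value 9 leaves a short bar far to the right.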

7
Q

4 types of modality

A

unimodal, bimodal, multimodal, and uniform

8
Q

3 types of skewness

A

right skewed, left skewed, or symmetric

9
Q

2 measures of variability

A

variance and standard deviation

10
Q

deviation

A

distance of an observation from the mean

11
Q

variance

A

-roughly the average squared deviation from the mean (the sample variance divides by n - 1 rather than n)
-tells you the amount of spread in the data

12
Q

standard deviation

A

-square root of the variance and has the same units as the data
-useful for considering how far data are distributed around the mean
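
The last three cards (deviation, variance, standard deviation) can be worked through step by step. This sketch uses made-up data and assumes the sample versions of the formulas, which divide by n - 1.

```python
from math import isclose, sqrt
from statistics import stdev, variance

data = [2, 4, 4, 5, 10]  # hypothetical observations
n = len(data)
mean = sum(data) / n  # 5.0

# Deviation: distance of each observation from the mean
deviations = [x - mean for x in data]  # [-3.0, -1.0, -1.0, 0.0, 5.0]

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance, in the same units as the data
sample_sd = sqrt(sample_variance)

print(sample_variance, sample_sd)  # 9.0 3.0

# The standard library agrees with the hand computation
assert sample_variance == variance(data)
assert isclose(sample_sd, stdev(data))
```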

13
Q

box plot

A

summarizes a data set using 5 statistics while also plotting unusual observations

14
Q

5 statistics (plus 1 optional one) used for box plots

A

upper whisker, Q3, median, Q1, lower whisker, mean (optional)

15
Q

median

A

value that splits the data in half when ordered in ascending order

16
Q

what is the median when there are an even number of observations?

A

the average of the 2 values in the middle
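
Both median cards can be captured in one small function. The helper name `median` and the sample values are hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def median(values):
    """Return the middle value of the data (hypothetical helper for illustration)."""
    ordered = sorted(values)  # order the data ascending
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```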

17
Q

25th percentile =

A

first quartile Q1

18
Q

50th percentile =

A

median (second quartile Q2)

19
Q

75th percentile =

A

third quartile Q3
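
The percentile cards can be checked with Python's standard library. This is a sketch on made-up data; software packages use slightly different interpolation rules, so quartiles from other tools (or computed by hand) may differ a little on small data sets.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical observations

# n=4 splits the ordered data into four equal groups and returns the
# three cut points: the 25th, 50th, and 75th percentiles
q1, q2, q3 = quantiles(data, n=4)

print(q1, q2, q3)  # 2.0 4.0 6.0

# The 50th percentile is the median
assert q2 == median(data)
```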

20
Q

outlier

A

-observation beyond the max reach of the whiskers
-appears extreme relative to the rest of the data

20
Q

interquartile range, IQR

A

difference between the third and first quartiles: IQR = Q3 - Q1

20
Q

whiskers

A

-capture data outside of the IQR box
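
The outlier, IQR, and whisker cards fit together as follows. The cards do not say how far the whiskers may reach; this sketch assumes the common convention of at most 1.5 × IQR beyond the box, and the data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 30]  # hypothetical observations; 30 looks extreme

q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the length of the box

# Assumed convention: whiskers reach at most 1.5 * IQR beyond the box
lower_reach = q1 - 1.5 * iqr
upper_reach = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside that reach
lower_whisker = min(x for x in data if x >= lower_reach)
upper_whisker = max(x for x in data if x <= upper_reach)

# Anything beyond the reach is plotted individually as an outlier
outliers = [x for x in data if x < lower_reach or x > upper_reach]

print(iqr, lower_whisker, upper_whisker, outliers)  # 4.5 1 7 [30]
```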

20
Q

T or F: median and IQR are more robust to skewness and outliers than mean and SD

A

T (true)

20
Q

if distribution is skewed or has extreme outliers, centre is often defined as _________

A

the median
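
A small experiment shows why the median (and IQR) are preferred when there are extreme values. The helper `summarize` and both data sets are hypothetical, made up for this sketch.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return (mean, SD, median, IQR) for a list of numbers (hypothetical helper)."""
    q1, _, q3 = quantiles(values, n=4)
    return mean(values), stdev(values), median(values), q3 - q1

original = [1, 2, 3, 4, 5, 6, 7]
# Same data, but the largest value is replaced by an extreme one
with_outlier = [1, 2, 3, 4, 5, 6, 70]

print(summarize(original))
print(summarize(with_outlier))
```

In this example the median stays at 4 and the IQR at 4.0, while the mean moves from 4 to 13 and the SD grows substantially.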

20
Q

if distribution is symmetric, centre is often defined as _______

A

the mean

21
Q

contingency table

A

summarizes data for 2 categorical variables

21
Q

name 2 plots that combine numerical and categorical data to compare numerical data across groups

A

side-by-side box plots and multiple histograms

21
Q

bar plot

A

displays a single categorical variable

22
Q

relative frequency bar plot

A

bar plot that shows proportions instead of frequencies

23
Q

stacked bar plot

A

graphical display of contingency table info for counts

24
Q

side-by-side bar plot

A

same info as a stacked bar plot, but with the bars placed beside each other instead of stacked on top

25
Q

frequency

A

-shows the count in each category
-difficult to interpret if groups have unequal numbers

26
Q

row proportion

A

-shows the proportion of the row total
-easier to compare between rows

27
Q

column proportion

A

-shows the proportion of the column total
-useful for comparing between columns, e.g. across levels of an explanatory variable
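
The contingency table, frequency, row proportion, and column proportion cards can be illustrated together. The group and outcome labels and all counts below are invented for this sketch.

```python
from collections import Counter

# Hypothetical cases: (group, outcome) pairs for two categorical variables
cases = [
    ("treatment", "improved"), ("treatment", "improved"), ("treatment", "improved"),
    ("treatment", "not improved"),
    ("control", "improved"), ("control", "improved"),
    ("control", "not improved"), ("control", "not improved"),
    ("control", "not improved"), ("control", "not improved"),
]

counts = Counter(cases)                        # frequency of each cell
row_totals = Counter(row for row, _ in cases)  # totals per group
col_totals = Counter(col for _, col in cases)  # totals per outcome

# Row proportion: cell count divided by its row total
row_props = {cell: n / row_totals[cell[0]] for cell, n in counts.items()}
# Column proportion: cell count divided by its column total
col_props = {cell: n / col_totals[cell[1]] for cell, n in counts.items()}

for cell in sorted(counts):
    print(cell, counts[cell], round(row_props[cell], 2), round(col_props[cell], 2))
```

The raw counts of "improved" (3 versus 2) are hard to compare because the groups have 4 and 6 cases; the row proportions (0.75 versus about 0.33) compare directly.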