Module 2: Summarizing Data Flashcards
scatterplots
-useful for visualizing the relationship between 2 numerical variables
-each point is a single case
dot plots
useful for visualizing one numerical variable
what is one way to measure the centre of a distribution of data?
the mean (average)
sample statistic
a value computed from sample data; for example, the sample mean is a point estimate of the population mean
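A minimal sketch of the sample mean as a point estimate, assuming a small made-up sample (the values and the names `sample` and `sample_mean` are illustrative only, not from the module):

```python
# Hypothetical sample of 5 observations (illustrative values only)
sample = [2, 4, 4, 5, 10]

# Sample mean: the sum of the observations divided by the number of observations
sample_mean = sum(sample) / len(sample)

# This single number is our point estimate of the (unknown) population mean
print(sample_mean)  # 5.0
```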
histograms
-provide a view of data density by counting the observations that fall into each bin
-convenient for describing the shape of the data distribution
what do higher bars on histograms represent?
where the data are relatively more common
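One way to see why higher bars mean more common values is to count the observations per bin by hand. This is a rough sketch with made-up data and an assumed bin width of 2; the names `data`, `bin_width`, and `counts` are illustrative:

```python
from collections import Counter

# Hypothetical observations (illustrative values only)
data = [1, 2, 2, 3, 3, 3, 4, 4, 7]
bin_width = 2  # assumed bin width; bins are [0, 2), [2, 4), [4, 6), ...

# Label each observation by the left edge of the bin it falls in, then count
counts = Counter((x // bin_width) * bin_width for x in data)

# Print a text histogram: a longer bar means more observations in that bin
for left in sorted(counts):
    print(f"[{left}, {left + bin_width}): " + "#" * counts[left])
```

With these values the bin [2, 4) gets the longest bar, because five of the nine observations fall in it.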
4 types of modality
unimodal, bimodal, multimodal, and uniform
3 types of skewness
right skewed, left skewed, or symmetric
2 measures of variability
variance and standard deviation
deviation
distance of an observation from the mean
variance
-roughly the average squared deviation from the mean (the sample variance divides the sum of squared deviations by n - 1 rather than n)
-tells you the amount of spread in the data
standard deviation
-square root of the variance and has the same units as the data
-useful for considering how far the data are spread from the mean
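The deviation, variance, and standard deviation cards can be tied together in a short worked example. This sketch assumes the sample versions of the formulas (dividing by n - 1) and uses made-up data; the variable names are illustrative:

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical sample (illustrative values only)
data = [2, 4, 4, 5, 10]

mean = sum(data) / len(data)                 # 5.0
deviations = [x - mean for x in data]        # [-3.0, -1.0, -1.0, 0.0, 5.0]
sum_sq = sum(d * d for d in deviations)      # 9 + 1 + 1 + 0 + 25 = 36.0

# Sample variance: sum of squared deviations divided by n - 1
sample_var = sum_sq / (len(data) - 1)        # 9.0

# Standard deviation: square root of the variance, in the same units as the data
sample_sd = sqrt(sample_var)                 # 3.0

# The standard library's sample formulas give the same results
assert abs(variance(data) - sample_var) < 1e-9
assert abs(stdev(data) - sample_sd) < 1e-9
```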
box plot
summarizes a data set using 5 statistics while also plotting unusual observations
5 statistics (plus 1 optional one) used for box plots
upper whisker, Q3, median, Q1, lower whisker, mean (optional)
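A sketch of how the box plot statistics might be computed. Two assumptions here are not stated on the card: quartiles are computed with Python's "inclusive" method (other software may give slightly different quartiles), and whiskers extend to the most extreme observations within 1.5 x IQR of the box, a common convention. The data are made up:

```python
from statistics import quantiles

# Hypothetical data with one unusually large value (illustrative only)
data = sorted([1, 3, 4, 5, 6, 7, 8, 9, 30])

# Q1, median, Q3 (quartile conventions vary; "inclusive" is one choice)
q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: whiskers reach no farther than 1.5 * IQR beyond the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
lower_whisker = min(x for x in data if x >= lower_limit)
upper_whisker = max(x for x in data if x <= upper_limit)

# Observations beyond the whiskers are plotted individually as unusual points
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(lower_whisker, q1, med, q3, upper_whisker, outliers)
```

For this sample the value 30 lies beyond the upper limit of 14, so it is plotted on its own and the upper whisker stops at 9.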
median
the middle value when the data are sorted in ascending order; half of the observations fall below it and half above
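A minimal sketch of finding the median by hand. The function name `median_of` is hypothetical, and averaging the two middle values for an even number of observations is the usual convention, assumed here:

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([5, 1, 3]))      # 3
print(median_of([4, 1, 3, 2]))   # 2.5
```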