Module 3: Data Visualization Flashcards
Data visualization
The distribution of a variable is a description of the values it can take on & a count of how often each value occurs
Sample distributions
The distribution of a sample drawn from a population: it lists the possible outcomes & the number of times each one occurs in that sample
One observation
One unit in the sample (e.g., a single coin-flip result)
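A minimal Python sketch of a sample distribution, assuming a hypothetical sample of 10 coin flips. The data and the helper name sample_distribution are illustrative, not from the course. Each list entry is one observation, and counting how often each outcome occurs gives the sample distribution.

```python
from collections import Counter

# Hypothetical sample of 10 coin flips; each entry is one observation.
sample = ["H", "T", "H", "H", "T", "H", "T", "H", "H", "T"]

def sample_distribution(observations):
    """Return each observed outcome with the number of times it occurs."""
    return dict(Counter(observations))

dist = sample_distribution(sample)
print(dist)  # {'H': 6, 'T': 4}
```

The counts always add up to the sample size (here 6 + 4 = 10).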
Assess the distribution
Describe the distribution in a way that supports subsequent analysis: its shape, extreme values, centre, & spread
Shape of the distribution
Left/negatively skewed: the mean is typically less than the median; right/positively skewed: the mean is typically greater than the median; symmetric: the mean & the median are (approximately) equal
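The mean-versus-median rule of thumb can be sketched in Python as below. It is only a heuristic. The function name skew_direction and the tolerance parameter tol are hypothetical choices used to decide when the mean and median count as "equal".

```python
from statistics import mean, median

def skew_direction(values, tol=1e-9):
    """Classify shape by comparing the mean with the median (a rule of thumb)."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tol:
        return "symmetric"
    return "right/positively skewed" if m > md else "left/negatively skewed"

print(skew_direction([1, 2, 3, 4, 5]))    # symmetric
print(skew_direction([1, 2, 2, 3, 10]))   # right/positively skewed (mean 3.6 > median 2)
print(skew_direction([1, 8, 9, 9, 10]))   # left/negatively skewed (mean 7.4 < median 9)
```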
Centre of the distribution
Use a measure of central tendency: the mean, median, or mode. Report the median for a skewed distribution & the mean for a symmetric one
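A short Python sketch of the three measures of central tendency, using Python's standard statistics module and a hypothetical right-skewed sample. The single large value 30 pulls the mean well above the median, which is why the median is the better summary here.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: the value 30 drags the mean upward.
values = [2, 3, 3, 4, 5, 6, 30]

print(round(mean(values), 2))  # 7.57
print(median(values))          # 4
print(mode(values))            # 3 (the most frequent value)
```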
Spread of the distribution
Use a measure of variation: the range, mean absolute deviation (MAD), variance, or standard deviation; for a boxplot, use the interquartile range (IQR)
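A minimal Python sketch of the measures of variation on a hypothetical sample. One assumption to flag is that it uses the sample variance and standard deviation, which divide by n - 1. Swap in statistics.pvariance / statistics.pstdev if your course uses the population versions, which divide by n.

```python
from statistics import mean, stdev, variance

values = [4, 8, 6, 5, 3, 10]  # hypothetical sample

data_range = max(values) - min(values)               # largest minus smallest
m = mean(values)
mad = sum(abs(x - m) for x in values) / len(values)  # mean absolute deviation
var = variance(values)  # sample variance (divides by n - 1)
sd = stdev(values)      # square root of the sample variance

print(data_range, mad, var)  # 7 2.0 6.8
```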
Spread (visually)
Evaluated visually in a histogram by looking at the relative heights of the bars
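To see how bar heights reflect spread, here is a rough text-only bar chart in Python. For simplicity it assumes one bar per distinct value, with no binning. The function name text_histogram and both samples are hypothetical. Concentrated data produce a few tall bars. Spread-out data produce many short bars.

```python
from collections import Counter

def text_histogram(values):
    """One row per distinct value; the bar length ('#' count) is that value's frequency."""
    counts = Counter(values)
    return [f"{v:>3} | {'#' * counts[v]}" for v in sorted(counts)]

# Hypothetical samples: the first is concentrated, the second is spread out.
for row in text_histogram([5, 5, 5, 5, 6, 6, 4]):
    print(row)
print()
for row in text_histogram([1, 2, 3, 5, 5, 7, 9]):
    print(row)
```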
The five-number summary
- the minimum (min) - Q1 (the 25th percentile) - the median (Q2, the 50th percentile) - Q3 (the 75th percentile) - the maximum (max)
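A minimal Python sketch of the five-number summary. It assumes statistics.quantiles with method="exclusive", which places quantiles at rank p x (n + 1) and so follows the same rule as Excel's percentile.exc. The function name and the data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'exclusive' quartile method."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return data[0], q1, q2, q3, data[-1]

print(five_number_summary([12, 1, 7, 20, 3, 15, 4, 10, 8]))  # (1, 3.5, 8.0, 13.5, 20)
```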
Excel percentiles
Q1 = percentile.exc(values, 0.25); Q2 = percentile.exc(values, 0.50); Q3 = percentile.exc(values, 0.75)
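To show what percentile.exc computes, here is a hand-rolled Python sketch. It assumes Excel's exclusive rule: the rank is k x (n + 1) in the sorted data, and linear interpolation fills the gap between neighbouring values. The function name percentile_exc is hypothetical. The ValueError stands in for the #NUM! error Excel returns when the rank falls outside 1..n, as we understand its documented limits.

```python
import math

def percentile_exc(values, k):
    """Sketch of the exclusive percentile rule: rank = k * (n + 1), then interpolate."""
    data = sorted(values)
    n = len(data)
    rank = k * (n + 1)  # 1-based position in the sorted data
    if not 0 < k < 1 or rank < 1 or rank > n:
        raise ValueError("k is outside the range the exclusive method supports for this n")
    lower = math.floor(rank)
    frac = rank - lower
    if lower == n:  # rank landed exactly on the last value
        return float(data[-1])
    # Move frac of the way from the lower neighbour to the upper one.
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

values = [12, 1, 7, 20, 3, 15, 4, 10, 8]
print(percentile_exc(values, 0.25), percentile_exc(values, 0.50), percentile_exc(values, 0.75))
# 3.5 8.0 13.5
```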
Percentiles
The pth percentile of a data set is the value such that p percent of the observations are less than or equal to the value
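The definition above can be checked directly in Python by computing the share of observations at or below a value. The helper name share_at_or_below is hypothetical. Note that interpolating methods such as percentile.exc can return a slightly different value for the same p. For the data 1 to 10, the exclusive rule gives 7.7 for p = 70, while this counting check identifies 7.

```python
def share_at_or_below(values, x):
    """Fraction of observations less than or equal to x (p / 100 in the definition)."""
    return sum(1 for v in values if v <= x) / len(values)

data = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
print(share_at_or_below(data, 7))  # 0.7, so 7 is the 70th percentile by this definition
```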
Quartile
The quartiles of a data set are its 25th, 50th, & 75th percentiles (Q1, Q2, Q3); they split the ordered data into four parts of roughly 25% each
The interquartile range (IQR)
A measure of the spread of a data set: the difference between Q3 & Q1 (IQR = Q3 - Q1)
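A minimal Python sketch of the IQR. It assumes the same exclusive quartile method as percentile.exc, via statistics.quantiles. The helper name iqr and the data are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range Q3 - Q1, with quartiles from the 'exclusive' method."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

print(iqr([12, 1, 7, 20, 3, 15, 4, 10, 8]))  # 10.0 (13.5 - 3.5)
```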
The inner fence
1.5 x IQR. If a sample value is greater than Q3 + inner fence, or less than Q1 - inner fence, it is considered extreme
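The inner-fence rule can be sketched in Python as below. As before, it assumes exclusive-method quartiles, and the function name extreme_values is hypothetical. In the example, adding the hypothetical value 60 to the sample pushes it above Q3 + 1.5 x IQR.

```python
from statistics import quantiles

def extreme_values(values):
    """Return the values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    inner_fence = 1.5 * (q3 - q1)
    low, high = q1 - inner_fence, q3 + inner_fence
    return [v for v in values if v < low or v > high]

print(extreme_values([12, 1, 7, 20, 3, 15, 4, 10, 8, 60]))  # [60]
print(extreme_values([12, 1, 7, 20, 3, 15, 4, 10, 8]))      # []
```

For the first sample Q1 = 3.75 and Q3 = 16.25, so the cut-offs are -15 and 35.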
Lower whisker
Covers the interval (Q1 - IF, Q1), where IF is the inner fence (1.5 x IQR)
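The card gives the lower whisker's range as (Q1 - IF, Q1). A common boxplot convention (Tukey's) draws the whisker only down to the smallest observation inside that range. The Python sketch below assumes that convention and exclusive-method quartiles. The function name lower_whisker and the data are hypothetical.

```python
from statistics import quantiles

def lower_whisker(values):
    """Return (start, end) of the lower whisker: from the smallest observation
    not below Q1 - IF up to Q1, where IF = 1.5 * IQR (the inner fence)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    inner_fence = 1.5 * (q3 - q1)
    lowest_inside = min(v for v in values if v >= q1 - inner_fence)
    return lowest_inside, q1

# Hypothetical sample with one very low value (-40) that falls outside the fence.
print(lower_whisker([12, 1, 7, 20, 3, 15, 4, 10, 8, -40]))  # (1, 2.5)
```

Here Q1 = 2.5 and Q1 - IF = -12.875, so -40 is left out of the whisker and would be shown as an extreme point.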