Chapter 1 ~ Exploring Data Flashcards
Distribution
Indicates what values a variable takes on and the frequency (i.e. how often) with which it takes on these values
Outlier
An individual observation that falls outside the overall pattern of the graph
Relative frequency histogram
Has the same shape as a histogram with the exception that the vertical axis measures relative frequencies instead of frequencies
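Example (a minimal sketch, not part of the original cards): a relative frequency is a count divided by the total number of observations, so the relative frequencies always sum to 1. The function name `relative_frequencies` and the data in `scores` are made up for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to the fraction of observations equal to it."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}

# Hypothetical data: the frequencies of 1, 2, 3, 4 are 1, 2, 3, 2.
scores = [1, 2, 2, 3, 3, 3, 4, 4]
rel = relative_frequencies(scores)
print(rel)  # {1: 0.125, 2: 0.25, 3: 0.375, 4: 0.25}
```

Dividing every bar height by the same n rescales the vertical axis only, which is why the shape matches the ordinary histogram.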
What are the key features of a histogram?
Centre
Spread
Shape
What are the three basic shapes of histograms?
Symmetric
Skewed right
Skewed left
What are the three measures of centre?
Mean
Median
Mode
Sample mean
Arithmetic average or arithmetic mean: the sum of the observations divided by the number of observations (n)
Mode
Element or elements that occur most often.
Median
“Middle number” of the data when it has been arranged in increasing order. When the number of observations is even, it is the average of the two middle numbers.
Bimodal
A data set with two modes
Median position formula
(n + 1)/2
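Example (a sketch, assuming the usual reading of the position formula): when (n + 1)/2 is a whole number, the median is the value at that position in the sorted data; when it ends in .5, the median is the average of the two values on either side. The helper `median_by_position` and the data are made up for illustration; `mean` and `multimode` come from Python's standard `statistics` module.

```python
from statistics import mean, multimode

def median_by_position(values):
    """Find the median using the (n + 1)/2 position formula (1-based position)."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the position points at a single middle value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two middle values.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

odd = [7, 1, 3, 9, 5]   # n = 5, position 3 in [1, 3, 5, 7, 9]
even = [4, 8, 1, 6]     # n = 4, position 2.5 in [1, 4, 6, 8]

print(median_by_position(odd))      # 5
print(median_by_position(even))     # 5.0
print(mean(odd))                    # 5
print(multimode([1, 2, 2, 3, 3]))   # [2, 3], so this data set is bimodal
```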
What are the measures of spread?
Range
Interquartile range (IQR)
Five number summary
Variance and standard deviation
Range
Largest value – smallest value
Interquartile range (IQR)
Q3 – Q1
Five number summary
Minimum, Q1, median, Q3, maximum
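Example (a sketch under a stated assumption): textbooks and software use several slightly different quartile conventions. This one takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd. The function name and data are made up for illustration, and the function expects at least two observations.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves, excluding the overall median when n is odd. Needs n >= 2.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [2, 4, 4, 5, 7, 8, 9, 10, 12]
minimum, q1, med, q3, maximum = five_number_summary(data)

data_range = maximum - minimum   # largest value minus smallest value
iqr = q3 - q1                    # Q3 minus Q1

print((minimum, q1, med, q3, maximum))  # (2, 4.0, 7, 9.5, 12)
print(data_range, iqr)                  # 10 5.5
```

Other conventions (for example the ones built into calculators or spreadsheet software) can give slightly different Q1 and Q3 for the same data.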
Standard deviation
Measures how far the numbers are spread out from the mean.
Not resistant to outliers: a single extreme value can change it substantially.
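Example (a sketch, assuming the sample standard deviation with divisor n - 1, which is an assumption since the cards do not state the formula): adding one extreme value to a small data set inflates the standard deviation many times over, while the median barely moves. The helper `sample_sd` and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, median, stdev

def sample_sd(values):
    """Sample standard deviation: sqrt of summed squared deviations over (n - 1)."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 5, 7]
with_outlier = data + [50]   # same data plus one extreme value

print(sample_sd(data), sample_sd(with_outlier))   # the second is far larger
print(median(data), median(with_outlier))         # 4 4.5
```

The hand-written formula agrees with `statistics.stdev`, which uses the same n - 1 divisor.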
What should be used to describe a symmetric distribution?
Mean and standard deviation
What should be used to describe a skewed distribution?
Median, IQR, five number summary
1.5 IQR Rule
A data point is considered to be an outlier if it lies more than 1.5 IQRs below Q1 or 1.5 IQRs above Q3
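Example (a sketch, assuming quartiles are found as medians of the lower and upper halves of the sorted data; other quartile conventions can move the cutoffs slightly): the rule amounts to computing two cutoffs, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and flagging anything beyond them. The function names and data are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values lying more than 1.5 IQRs below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [x for x in values if x < low_cutoff or x > high_cutoff]

data = [2, 4, 4, 5, 7, 8, 9, 10, 30]
# Q1 = 4.0, Q3 = 9.5, IQR = 5.5, cutoffs are -4.25 and 17.75
print(find_outliers(data))  # [30]
```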
Boxplot
(Also known as a box and whisker plot)
A graph which displays the five number summary of a set of data
Modified boxplot
A graph that displays the five number summary of a data set and also shows any outliers it contains (identified with the 1.5 IQR Rule)
Side by side boxplots
Can be used to compare the distributions of two data sets