Week 5 - Visualising Variability Flashcards
What is a random variable?
A quantity with values not known with certainty.
Define variation in statistics.
The differences in a variable's values from one observation to the next.
What does a frequency distribution describe?
The values of a variable and how often they appear in the data.
What is a categorical variable?
Data consisting of labels or names for which arithmetical manipulation is impossible.
What is a quantitative variable?
Data consisting of numerical values for which arithmetical manipulation is possible.
What is a sample?
A subset of a population.
What is relative frequency?
The proportion of items belonging to a class.
How is percent frequency calculated?
Relative frequency multiplied by 100.
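As a rough illustration of the frequency, relative frequency and percent frequency cards, here is a minimal Python sketch. The function name `frequency_table` and the colour sample are made up for the example and do not come from the course material.

```python
from collections import Counter


def frequency_table(values):
    """Map each class to (count, relative frequency, percent frequency)."""
    counts = Counter(values)
    n = len(values)
    return {
        label: (count, count / n, 100 * count / n)
        for label, count in counts.items()
    }


# Hypothetical sample of a categorical variable.
colours = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
table = frequency_table(colours)

for label, (count, rel, pct) in sorted(table.items()):
    print(f"{label:<6} count={count} relative={rel:.3f} percent={pct:.1f}%")
```

The relative frequencies of all classes sum to 1, and the percent frequencies sum to 100.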
What does a probability distribution characterize?
The variability of a random variable.
What does Benford’s law state?
In many data sets, the proportion of observations whose first digit is d (d = 1, 2, ..., 9) is approximately log10(1 + 1/d), so smaller leading digits are more common: about 30.1% for a leading 1, falling to about 4.6% for a leading 9.
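The expected first-digit proportions under Benford's law can be tabulated with the usual formula, P(d) = log10(1 + 1/d). The sketch below only prints the theoretical proportions; the helper name `benford_probability` is chosen for this example.

```python
import math


def benford_probability(d):
    """Expected proportion of values whose first digit is d (1 to 9)."""
    return math.log10(1 + 1 / d)


expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"first digit {d}: {100 * p:.1f}%")
```

Comparing these proportions with the observed first-digit frequencies of a data set is one common way to use the law, for example when screening figures for irregularities.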
Define a histogram.
A column chart with no gaps between the columns, in which each column's height shows the frequency of observations in a bin.
What is the recommended number of bins in a histogram?
Between 5 and 20, depending on the number of observations.
How is approximate bin width calculated?
(Largest value minus smallest value) divided by the number of bins.
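The bin-width rule can be sketched in a few lines of Python. The data values and the choice of 5 bins are hypothetical, and placing the largest value in the last bin is an assumption of this sketch rather than something the cards specify.

```python
def approximate_bin_width(values, num_bins):
    """(largest value - smallest value) / number of bins."""
    return (max(values) - min(values)) / num_bins


def bin_counts(values, num_bins):
    """Count observations per equal-width bin, starting at the smallest value."""
    lowest = min(values)
    width = approximate_bin_width(values, num_bins)
    counts = [0] * num_bins
    for x in values:
        # Assumption: the largest value is placed in the last bin.
        index = min(int((x - lowest) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical observations.
data = [12, 15, 21, 22, 27, 30, 34, 41, 45, 52]
width = approximate_bin_width(data, 5)
counts = bin_counts(data, 5)
print(width, counts)
```

In practice the computed width is often rounded to a convenient number so that the bin boundaries are easy to read.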
What is a kernel density chart?
A continuous alternative to a histogram that uses kernel density estimation.
What does skewness represent in a distribution?
The lack of symmetry in a quantitative distribution.
What is a frequency polygon?
A visualization tool that uses lines to connect the counts of observations in bins.
What is a trellis display?
A vertical or horizontal arrangement of individual charts that differ only by the data they display.
Define mean.
Sum of the values divided by the sample size.
How is the median determined?
Sort the values first. The median is the middle value if the sample size is odd, or the average of the two middle values if it is even.
What is mode?
The most frequent value in a data set.
How is range calculated?
Largest value minus smallest value in the set.
What is standard deviation?
The square root of the variance: a measure of how far values typically lie from the mean, computed from the squared deviations.
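The measures of centre and spread on the cards above can be checked on a small sample. This sketch assumes the sample standard deviation (dividing by n - 1); a population standard deviation would divide by n instead. The data values are made up.

```python
import math
import statistics

# Hypothetical sample, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 5, 7, 9, 11]

mean = sum(data) / len(data)          # sum of the values / sample size
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
value_range = max(data) - min(data)   # largest minus smallest

# Sample standard deviation: sum the squared deviations from the mean,
# divide by n - 1, then take the square root.
squared_deviations = [(x - mean) ** 2 for x in data]
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(mean, median, mode, value_range, round(sample_sd, 3))
```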
Define percentile.
The pth percentile is a value such that approximately p% of the observations in the set fall at or below it.
What is the interquartile range?
Q3 minus Q1.
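Quartiles are the 25th, 50th and 75th percentiles, so the interquartile range can be sketched with Python's `statistics.quantiles`. Percentile conventions differ between textbooks and software; this example uses that function's default method, which may not match the convention used in the course, and the data are hypothetical.

```python
import statistics

# Hypothetical sample.
data = [2, 4, 4, 5, 7, 9, 11]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The middle cut point, `q2`, is the median.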
What is statistical inference?
The process of collecting sample data to make estimates or draw conclusions about a population.
What is a confidence interval?
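A range of values, built around a parameter estimate such as a sample mean or proportion, that is expected to contain the population parameter at a stated confidence level.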
What does margin of error represent?
The uncertainty on the parameter estimate at a given confidence level, expressed as the amount added to and subtracted from the estimate to form the confidence interval.
What factors influence the margin of error for a confidence interval on a mean?
- The confidence level
- The variability of sample values (s.d.)
- The sample size
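The effect of these three factors can be seen in a short sketch. It assumes the large-sample normal approximation, margin of error = z × s / √n; for small samples a t-distribution critical value would normally replace z. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_sd, n, confidence=0.95):
    """Margin of error for a mean, assuming a normal approximation."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sample_sd / math.sqrt(n)


# Hypothetical sample summary: mean 50, standard deviation 10, n = 100.
sample_mean = 50
moe = margin_of_error(sample_sd=10, n=100, confidence=0.95)
interval = (sample_mean - moe, sample_mean + moe)

print(round(moe, 3), interval)
```

Raising the confidence level or the standard deviation widens the interval, while a larger sample size narrows it.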
What is time series data?
A sequence of observations on a variable measured at successive points in time.
What is a time series chart?
A line chart with time units on the horizontal axis and variable values on the vertical axis.