Week 5 Data Visualisation Flashcards
What is a random variable?
A quantity whose values are not known with certainty.
Define variation in the context of data.
The differences in the values of a variable across observations.
What does a frequency distribution describe?
The values of a variable and how often they appear in the data.
What is a categorical variable?
Data consisting of labels or names used to identify categories; arithmetic operations on them are not meaningful.
What is a quantitative variable?
Data consisting of numerical values that indicate magnitude; arithmetic operations on them are meaningful.
What is a sample in statistics?
A subset of the population, studied because collecting data on the entire population is often infeasible.
What is the relative frequency of a bin?
The proportion of observations belonging to that bin (class): the bin's frequency divided by the total number of observations.
How is percent frequency calculated?
Relative frequency multiplied by 100.
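A minimal Python sketch of a frequency distribution with relative and percent frequencies, assuming the data are held in a plain list; the function name frequency_table and the sample responses are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return {value: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(values)   # how often each value appears
    n = len(values)
    table = {}
    for value, freq in counts.items():
        rel = freq / n                          # proportion of observations in this class
        table[value] = (freq, rel, rel * 100)   # percent = relative x 100
    return table

# Hypothetical example data
responses = ["A", "B", "A", "C", "A", "B", "A", "C", "A", "B"]
for value, (freq, rel, pct) in sorted(frequency_table(responses).items()):
    print(f"{value}: frequency={freq}, relative={rel:.2f}, percent={pct:.0f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a quick sanity check on a hand-built table.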
What does a probability distribution characterize?
The variability of a random variable: its possible values and how likely each one is.
What is a histogram?
A column chart with no spaces between the columns, used for quantitative data.
What is the recommended number of bins for a histogram?
Between 5 and 20, depending on the number of observations.
What should the width of bins in a histogram be?
The same for all bins.
What is the first bin in a histogram supposed to include?
The smallest value in the data.
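The three histogram rules above (a chosen number of bins, equal widths, first bin starting at the smallest value) can be sketched as follows. This assumes integer-valued data and uses the common rule of thumb width = (largest value - smallest value) / number of bins, rounded up to a whole number; the function name histogram_bins and the example scores are hypothetical.

```python
import math

def histogram_bins(data, num_bins):
    """Equal-width bins whose first bin starts at the smallest value.

    Assumption: width = (max - min) / num_bins rounded up to a whole number,
    so the last bin still covers the largest value.
    """
    lo, hi = min(data), max(data)
    width = max(1, math.ceil((hi - lo) / num_bins))   # same width for every bin
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in data:
        # Bin index for x; clamp so the largest value lands in the last bin
        i = min(int((x - lo) // width), num_bins - 1)
        counts[i] += 1
    return edges, counts

# Hypothetical example data
scores = [12, 15, 18, 22, 25, 27, 31, 34, 38, 41]
edges, counts = histogram_bins(scores, 5)
print(edges)
print(counts)
```

Each bin here includes its lower edge and excludes its upper edge, a convention chosen for the sketch; plotting tools may treat bin edges differently.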
What is a frequency polygon?
A chart that plots bin frequencies as points joined by lines instead of columns; useful for comparing distributions.
What is a trellis display?
A vertical or horizontal arrangement of individual charts that differ only in the data they display.
What is a strip chart used for?
Displaying individual data values, with each observation plotted as a separate point along one axis.
What is the mean in statistics?
The average value of a dataset: the sum of the values divided by the number of observations.
What does the median represent?
The middle value in a dataset when the values are ordered (the average of the two middle values when the number of observations is even).
What is the mode?
The value that appears most frequently in a dataset.
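A small Python sketch of the three measures of location defined above, written out by hand so each definition is visible; the function names and the example data are hypothetical. Python's standard statistics module also provides mean, median and mode.

```python
from collections import Counter

def mean(xs):
    # Sum of the values divided by the number of observations
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # Odd count: the middle value; even count: average of the two middle values
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def modes(xs):
    # Every value tied for the highest frequency (a dataset can have several modes)
    counts = Counter(xs)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

# Hypothetical example data
data = [3, 7, 7, 2, 9, 4]
print(mean(data), median(data), modes(data))
```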
What is the range in a dataset?
The largest value minus the smallest value.
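How is standard deviation defined?
The square root of the variance, where the variance is based on the squared deviations of the values from the mean (divided by n - 1 for a sample, or by n for a population).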
What does the pth percentile indicate?
A value such that approximately p% of the observations are less than or equal to it.
What is the interquartile range (IQR)?
Q3 minus Q1, representing the middle 50% of a dataset.
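A Python sketch of percentiles and the IQR. It assumes the (n + 1) location rule with linear interpolation between neighbouring ordered values; this is only one of several conventions, so spreadsheets and libraries may give slightly different answers. The function names percentile and iqr are hypothetical.

```python
def percentile(xs, p):
    """pth percentile via the (n + 1) location rule with linear interpolation."""
    s = sorted(xs)
    n = len(s)
    loc = (p / 100) * (n + 1)      # 1-based position in the ordered data
    if loc <= 1:
        return s[0]
    if loc >= n:
        return s[-1]
    lower = int(loc)               # whole part of the position
    frac = loc - lower             # fractional part used for interpolation
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])

def iqr(xs):
    # Q3 minus Q1: the spread of the middle 50% of the data
    return percentile(xs, 75) - percentile(xs, 25)

# Hypothetical example data
values = [10, 20, 30, 40]
print(percentile(values, 25), percentile(values, 75), iqr(values))
```

Python's statistics.quantiles uses the same (n + 1) convention by default (method="exclusive"), which makes it a convenient cross-check.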
What is a confidence interval?
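An interval estimate of a population parameter, such as the mean or the proportion of a population of interest, constructed so that it contains the true value with a stated level of confidence (for example, 95%).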
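A rough Python sketch of a confidence interval for a population mean. It assumes a large sample and uses the normal (z) approximation via the standard library's statistics.NormalDist; for small samples a t critical value is normally used instead, which gives a wider interval. The function name mean_confidence_interval and the sample data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(xs, confidence=0.95):
    """Approximate CI for a population mean using the normal (z) approximation."""
    n = len(xs)
    m = mean(xs)                                     # point estimate of the mean
    se = stdev(xs) / math.sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical example data
sample = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]
print(mean_confidence_interval(sample))
print(mean_confidence_interval(sample, confidence=0.99))
```

Raising the confidence level widens the interval, which is the trade-off between confidence and precision.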