Module 2A Visualising Variability Flashcards
What is variation in the context of data analysis?
The spread or difference between data points in a dataset, showing how much they differ from each other.
What is a random variable?
A quantity whose value is uncertain and can vary based on chance.
What does a frequency distribution describe?
The values of a variable and how often each value appears in the data
What is a categorical variable?
Data that consists of labels or names, for which arithmetic operations are not meaningful
Examples include gender, color, or brand names.
Define a quantitative variable.
Data that consists of numerical values, for which arithmetic operations are meaningful
Examples include age, height, or income.
What is a sample in statistics?
A subset of the population that makes data collection feasible
Samples are used to infer characteristics about larger populations.
What is relative frequency?
The proportion of times a value occurs in a dataset, calculated as: Frequency of a value / Total number of values
How is percent frequency calculated?
(Frequency of a value / Total number of values) × 100
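The frequency, relative frequency, and percent frequency cards can be illustrated with a minimal sketch. The list `colors` is a made-up sample, not data from the module.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made up for illustration).
colors = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

counts = Counter(colors)   # frequency distribution: value -> count
total = len(colors)        # total number of values

# Relative frequency = frequency / total; percent frequency = relative * 100.
relative = {value: n / total for value, n in counts.items()}
percent = {value: 100 * n / total for value, n in counts.items()}

for value, n in counts.items():
    print(f"{value:>5}: freq={n}, rel={relative[value]:.3f}, pct={percent[value]:.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any frequency table.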
What is a probability distribution?
It shows how the possible values of a random variable are distributed and the likelihood of each value occurring.
What does Benford’s Law state?
In many naturally occurring data sets, the first digits 1 through 9 are not equally likely: the proportion of observations with first digit d is log10(1 + 1/d), so 1 leads about 30.1% of the time and 9 only about 4.6%
This law is often used in fraud detection and data analysis.
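A minimal sketch of how Benford's Law might be checked, using the standard formula log10(1 + 1/d) for the expected share of leading digit d. The helper `leading_digit` and the choice of powers of 2 as test data are assumptions made for illustration; they are not part of the module.

```python
import math
from collections import Counter

# Expected share of each leading digit d under Benford's Law.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """Return the first non-zero digit of a non-zero number (hypothetical helper)."""
    return int(f"{abs(x):.15e}"[0])   # scientific notation starts with that digit

# Illustrative data set: the powers of 2 from 2^1 to 2^100.
data = [2 ** k for k in range(1, 101)]
digit_counts = Counter(leading_digit(x) for x in data)
observed = {d: digit_counts.get(d, 0) / len(data) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: expected {benford[d]:.3f}, observed {observed[d]:.3f}")
```

In a fraud-detection setting, the observed shares would come from real figures such as invoice amounts, and large gaps from the expected shares would be a prompt for closer review rather than proof of fraud.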
What does skewness represent in a quantitative distribution?
The lack of symmetry in a quantitative distribution
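It indicates the direction and degree of asymmetry: right (positive) skew means a longer right tail, left (negative) skew a longer left tail. A symmetric distribution, such as the normal distribution, has zero skewness.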
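One common way to put a number on skewness is the moment coefficient: the average cubed deviation from the mean divided by the cube of the population standard deviation. The sketch below assumes that definition, which is one of several in use, and the data sets are made up.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population version)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / math.pow(m2, 1.5)

# Hypothetical data sets (made up for illustration).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]    # mirror image

print(skewness(symmetric))      # zero for a symmetric data set
print(skewness(right_skewed))   # positive
print(skewness(left_skewed))    # negative
```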
What is a frequency polygon?
A line graph that shows the distribution of data by plotting the midpoints of each class interval and connecting them with straight lines
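The points of a frequency polygon can be computed before anything is drawn. This sketch assumes made-up exam scores and class intervals of width 10; a plotting library would then connect the resulting (midpoint, frequency) pairs with straight lines.

```python
# Hypothetical exam scores (made up for illustration).
scores = [52, 55, 61, 64, 67, 68, 71, 73, 74, 75, 78, 82, 85, 88, 93]

# Class boundaries: [50, 60), [60, 70), ..., [90, 100).
edges = [50, 60, 70, 80, 90, 100]
classes = list(zip(edges, edges[1:]))

# x-coordinates: class midpoints; y-coordinates: class frequencies.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
frequencies = [sum(lo <= s < hi for s in scores) for lo, hi in classes]

points = list(zip(midpoints, frequencies))
print(points)
```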
What is a Trellis Display?
A grid of small graphs that shows how data patterns change across different categories or conditions. Each panel uses the same chart type and axes but displays a different subset of the data.
What is the first quartile?
The 25th percentile: the value below which about 25% of the data fall
Quartiles divide the data set into four equal parts.
What is the second quartile also known as?
The median
It represents the middle value of the data set.
How is the interquartile range calculated?
3rd quartile minus 1st quartile
It measures the spread of the middle 50% of the data.
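A short sketch of the quartile and interquartile range cards using Python's standard `statistics` module. The data are made up, and `method="inclusive"` is one of several quartile conventions; other methods and other tools can return slightly different values for the same data.

```python
import statistics

# Hypothetical data set (made up for illustration).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1   # spread of the middle 50% of the data

print(f"Q1={q1}, Q2 (median)={q2}, Q3={q3}, IQR={iqr}")
```

Here the second quartile matches `statistics.median(data)`, as the second-quartile card says it should.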
What does the mean represent?
The sum of the values divided by the sample size
It is a measure of central tendency.
How is the median defined?
The middle value of the data once sorted; if the sample size is even, take the average of the two middle values
It is less affected by outliers compared to the mean.
What is the mode?
The most frequent value(s) in the data set
A data set can have multiple modes or none at all.
How is the range calculated?
Largest value minus smallest value in the set
It gives a measure of the spread of the data.
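The mean, median, mode, and range cards can be tried on a small made-up sample. The second part adds one extreme value, a hypothetical outlier, to show why the median is described as less affected by outliers than the mean.

```python
import statistics

# Hypothetical sample (made up for illustration).
data = [3, 7, 7, 2, 9, 4]

mean = statistics.mean(data)         # sum of values / sample size
median = statistics.median(data)     # average of the two middle values (n is even)
modes = statistics.multimode(data)   # list of the most frequent value(s)
data_range = max(data) - min(data)   # largest minus smallest

print(mean, median, modes, data_range)

# Add one extreme value and compare how far each measure moves.
with_outlier = data + [100]
mean_shift = statistics.mean(with_outlier) - mean
median_shift = statistics.median(with_outlier) - median
print(mean_shift, median_shift)
```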
What does standard deviation measure?
Roughly the typical distance of the values from the mean; formally, the square root of the average squared deviation from the mean
It quantifies the amount of variation or dispersion in a set of values.
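A step-by-step sketch of the standard deviation calculation on made-up values. It shows both the population version (divide by n) and the sample version (divide by n - 1); which one a course uses is an assumption to confirm against its own formula sheet.

```python
import math
import statistics

# Hypothetical values (made up for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
squared_devs = [(x - mean) ** 2 for x in values]   # squared distance from the mean

# Population standard deviation: average squared deviation, then square root.
population_sd = math.sqrt(sum(squared_devs) / len(values))

# Sample standard deviation: same idea, but divide by n - 1.
sample_sd = math.sqrt(sum(squared_devs) / (len(values) - 1))

print(population_sd, sample_sd)
```

The squaring step matters: the plain deviations from the mean always sum to zero, so averaging them directly would give no information about spread.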
What is the Empirical Rule for bell-shaped distributions regarding data values within one standard deviation?
68% of the data values lie within one standard deviation of the mean
This rule provides a quick estimation of data spread.
What percentage of data values lie within two standard deviations of the mean according to the Empirical Rule?
95%
This helps in understanding the distribution of data points.
What percentage of data values lie within three standard deviations of the mean according to the Empirical Rule?
99.7%
This is known as the 68-95-99.7 rule.
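The 68-95-99.7 rule can be checked by simulation. This sketch draws made-up values from a normal distribution with an arbitrary mean of 100 and standard deviation of 15 (both assumptions) and counts the share that falls within one, two, and three standard deviations of the mean.

```python
import random
import statistics

random.seed(0)   # fixed seed so the simulation is repeatable

# Simulated bell-shaped data (parameters chosen arbitrarily for illustration).
data = [random.gauss(100, 15) for _ in range(20_000)]

mean = statistics.fmean(data)
sd = statistics.pstdev(data)

def share_within(k):
    """Share of values within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sd: {share:.1%}")
```

The simulated shares should land close to 68%, 95%, and 99.7%, though not exactly on them, because the rule describes the ideal bell curve and any sample varies.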
What does a Box-and-Whisker Plot use to display data?
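It uses measures of position and variability: the box spans the first to third quartile (the interquartile range), and the whiskers extend toward the smallest and largest values.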
It shows the median, quartiles, and potential outliers.
What is a Violin Chart?
An advanced visualization that combines a box and whisker chart with a rotated and mirrored kernel density chart
It provides a richer representation of data distribution.
What is statistical inference?
The process of using data from a sample to make conclusions or predictions about a larger population.
What is a confidence interval?
A range of values, computed from sample data, within which the true population parameter is expected to lie at a stated confidence level (for example, 95%).
How is a confidence interval on a mean calculated?
Sample mean ± margin of error
It reflects the uncertainty associated with the sample mean.
What does the margin of error represent?
The maximum expected difference between the sample estimate and the true population value at the chosen confidence level; it expresses the uncertainty on the parameter.
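A minimal sketch of "sample mean ± margin of error" for a 95% confidence interval. It assumes a large-sample normal approximation, where the margin of error is z × s / √n; for small samples a t critical value is usually used instead. The sample is simulated and the names are illustrative.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical sample of 40 measurements (simulated for illustration).
sample = [random.gauss(50, 5) for _ in range(40)]

confidence = 0.95
n = len(sample)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)   # sample standard deviation (n - 1)

# Critical value from the standard normal distribution (about 1.96 for 95%).
z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)

margin = z * sample_sd / math.sqrt(n)  # margin of error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"{sample_mean:.2f} ± {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
```

A larger sample shrinks the margin of error, because n appears under the square root in the denominator.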
What is time series data?
Data collected or recorded at regular time intervals, showing how values change over time.
It is often used for forecasting and trend analysis.
What is a time series chart?
A line graph that shows how data points change over time, with time on the x-axis and the measured values on the y-axis.