Topic 1 Descriptive Statistics Flashcards
Data are time-consuming, ____, and of varying quality.
commercially sensitive
A data set of size __ is denoted as: {xi}ᵢ₌₁,…,ₙ
n
What’s the difference between a dot plot and a histogram?
Dot plots show individual data points (multi-frequency data set) while histograms group data into bins and show frequencies.
What affects the appearance of a histogram the most?
A. Sample size
B. Axis label
C. Bin size
D. Title
C. Bin size
Relative frequency = frequency / ____
total number of data points (n)
What is plotted on the vertical axis of a histogram?
The absolute (or relative) frequency is plotted on the vertical axis.
What does F(x) in a CDF plot represent?
The relative frequency of data ≤ x
What is the downside of using histograms?
The appearance of a histogram changes with the chosen bin size.
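A minimal Python sketch of the bin-size effect described in the histogram cards above. The data values and the helper names `histogram` and `relative_frequencies` are made up for illustration; only the standard library is used.

```python
import math

def histogram(data, bin_width, start=0.0):
    """Count values per bin; bin k covers [start + k*w, start + (k+1)*w)."""
    counts = {}
    for x in data:
        k = math.floor((x - start) / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))

def relative_frequencies(counts):
    """Relative frequency = frequency / total number of data points (n)."""
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

# Hypothetical data set, chosen only to make the effect visible.
data = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 7.0, 7.5]

narrow = histogram(data, bin_width=1.0)  # {1: 2, 2: 2, 3: 2, 7: 2}
wide = histogram(data, bin_width=4.0)    # {0: 6, 1: 2}

print(narrow, relative_frequencies(narrow))
print(wide, relative_frequencies(wide))
```

The same eight values look like four equal groups with a bin width of 1 and like one dominant group with a bin width of 4.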
F(x) =
- 0 if x < x₍₁₎
- j/n if x₍ⱼ₎ ≤ x < x₍ⱼ₊₁₎
- 1 if x ≥ x₍ₙ₎
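The piecewise definition of F(x) can be sketched in Python as follows. The function name `ecdf` and the sample values are assumptions for illustration; the sketch counts how many sorted data points are ≤ x and divides by n.

```python
from bisect import bisect_right

def ecdf(data):
    """Return F, where F(x) is the relative frequency of data <= x."""
    ordered = sorted(data)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(ordered)

    def F(x):
        # bisect_right gives j, the number of sorted values <= x,
        # so this returns 0 below x_(1), j/n in between, and 1 from x_(n) on.
        return bisect_right(ordered, x) / n

    return F

F = ecdf([3, 1, 2, 5])  # hypothetical data
print(F(0), F(1), F(2.5), F(5), F(10))
```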
Formula for Arithmetic Mean
x̄ = (1/n) ∑xi
Which measure is affected most by outliers?
Mean
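A short illustration of why the mean is the measure most affected by outliers, using Python's standard `statistics` module. The two data sets are invented for the example.

```python
import statistics

clean = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 100]  # largest value replaced by an outlier

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

# The mean jumps from 4 to 22.8, while the median stays at 4.
print(mean_clean, mean_outlier)
print(median_clean, median_outlier)
```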
Which is not a measure of central tendency?
A. Median
B. Mode
C. Range
D. Mean
C. Range
The geometric mean is only used for ____ data.
positive (non-zero)
What does interquartile range (IQR) measure?
The spread between Q3 and Q1 (middle 50% of data)
Formula for Sample variance (unbiased)
s² = (1/(n−1)) ∑(xi − x̄)²
Formula for Sample Standard Deviation (unbiased)
s = √(s²) = √[(1/(n−1)) ∑(xi − x̄)²]
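The two formulas above translate almost directly into Python. This is a sketch with hypothetical function names and data; the results are cross-checked against `statistics.variance` and `statistics.stdev`, which also use the n − 1 denominator.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = (1/(n-1)) * sum((x_i - mean)^2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """s = sqrt(s^2)."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
s2 = sample_variance(data)       # 32/7
s = sample_std(data)

print(s2, statistics.variance(data))
print(s, statistics.stdev(data))
```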
What measures asymmetry?
Skewness
The coefficient of variation is given by: vx = ____ / x̄
standard deviation (sx)
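As a small worked example of the coefficient of variation, the sketch below divides the sample standard deviation by the mean. The function name and data are assumptions for illustration.

```python
import statistics

def coefficient_of_variation(xs):
    """v_x = s_x / mean, using the sample (n - 1) standard deviation."""
    return statistics.stdev(xs) / statistics.mean(xs)

data = [10, 12, 8, 10]  # hypothetical data, mean = 10
cv = coefficient_of_variation(data)
print(cv)  # sqrt(8/3) / 10, roughly 0.163
```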
Why is the mean absolute deviation more robust than standard deviation?
It is less influenced by outliers
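The robustness claim can be checked numerically. The sketch assumes the mean absolute deviation is defined as (1/n) ∑|xi − x̄| (the cards do not give the formula) and compares how much each measure grows when one value is replaced by an outlier. Data and function names are invented.

```python
import statistics

def mean_abs_deviation(xs):
    """Assumed definition: (1/n) * sum(|x_i - mean|)."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

clean = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 100]

# Growth factor of each spread measure caused by the single outlier.
mad_ratio = mean_abs_deviation(with_outlier) / mean_abs_deviation(clean)
std_ratio = statistics.stdev(with_outlier) / statistics.stdev(clean)

print(mad_ratio, std_ratio)
```

Because the standard deviation squares each deviation, the outlier inflates it more than it inflates the mean absolute deviation.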
What does a positive skew indicate about the data?
Mean > Median > Mode
Formula
Biased skewness:
g₁(x) = (1/n) ∑(xi − x̄)³ / ωₓ³
Skewness is a ____ quantity (unit-less).
non-dimensional
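A sketch of the biased skewness in Python, assuming that ωₓ in the formula above denotes the biased standard deviation (denominator n rather than n − 1). That reading, the function name, and the data are assumptions. The last lines illustrate that skewness is unit-less: rescaling the data leaves it unchanged.

```python
import math

def biased_skewness(xs):
    """g1 = (1/n) * sum((x_i - mean)^3) / omega^3, with omega the 1/n std (assumed)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased variance
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / math.sqrt(m2) ** 3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]               # long tail to the right
rescaled = [1000 * x for x in right_tail]   # same data in different units

print(biased_skewness(symmetric))   # zero for symmetric data
print(biased_skewness(right_tail))  # positive skew
print(biased_skewness(rescaled))    # same value up to rounding
```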
What is the median if n is odd?
median(x) = x₍ₖ₎ with k = (n + 1)/2
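For odd n the median is the middle order statistic, at position (n + 1)/2 in the sorted data. A minimal sketch (the function name `median_odd` is hypothetical, and even n is deliberately not handled):

```python
import statistics

def median_odd(xs):
    """Median for odd n: the sorted value at 1-based position (n + 1) / 2."""
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch covers odd n only")
    ordered = sorted(xs)
    k = (n + 1) // 2        # 1-based position of the middle value
    return ordered[k - 1]   # Python lists are 0-based

data = [7, 1, 5, 3, 9]
print(median_odd(data), statistics.median(data))
```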
What is the mode?
The most common value
Formula for geometric mean:
x* = (∏ⁿᵢ₌₁ xᵢ)¹⁄ⁿ
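The geometric mean formula can be sketched as the n-th root of the product. The function name and data are illustrative; the check for positive values is an assumption added here so that the root is well defined.

```python
import math

def geometric_mean(xs):
    """x* = (product of x_i) ** (1/n), for positive data."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean in this sketch needs positive values")
    return math.prod(xs) ** (1 / len(xs))

data = [1, 10, 100]  # hypothetical data
gm = geometric_mean(data)
print(gm)  # close to 10, up to floating-point rounding
```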
What is used to test if two datasets have a linear relationship?
Covariance and correlation coefficient
Formula
Sample covariance:
cov(x,y) = (1/(n−1)) ∑(xi − x̄)(yi − ȳ)
Formula
Correlation coefficient:
cₓᵧ = (1/(n−1)) ∑(xi − x̄)(yi − ȳ) / (sx·sy)
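The covariance and correlation formulas above can be sketched in plain Python as follows. Function names and data are made up; the two y series are exact linear functions of x, so the coefficient should come out at the ends of its range.

```python
import math

def sample_cov(xs, ys):
    """cov(x, y) = (1/(n-1)) * sum((x_i - mean_x) * (y_i - mean_y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """c_xy = cov(x, y) / (s_x * s_y), with sample standard deviations."""
    sx = math.sqrt(sample_cov(xs, xs))  # cov(x, x) is the sample variance
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # y = 2x
y_down = [10, 8, 6, 4, 2]   # y = 12 - 2x

print(sample_cov(x, y_up))     # 5.0
print(correlation(x, y_up))    # close to +1
print(correlation(x, y_down))  # close to -1
```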
A correlation coefficient of 0 means ____ correlation.
no (linear)
What is the range of correlation coefficient cₓᵧ?
A. [−2, 2]
B. [0, 1]
C. [−1, 1]
D. [−∞, ∞]
C. [−1, 1]
What does a negative skew indicate about the data?
mode > median > mean
What does zero skew (a symmetric distribution) indicate about the data?
mode = median = mean
The quantile value for the i-th data point is given by:
yᵢ = (i − 0.5)/n
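A one-line illustration of the quantile values yᵢ = (i − 0.5)/n, here for a hypothetical sample size of n = 4 (the function name is an assumption):

```python
def quantile_values(n):
    """y_i = (i - 0.5) / n for i = 1, ..., n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

ys = quantile_values(4)
print(ys)  # [0.125, 0.375, 0.625, 0.875]
```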
To determine the percentile qₚ(x), which condition must j satisfy?
A. j < p
B. (j − 0.5)/n < p/100 ≤ (j + 0.5)/n
C. j/n > p
D. j = p + 0.5
B. (j − 0.5)/n < p/100 ≤ (j + 0.5)/n
To estimate qₚ(x), take the mean of data values at positions xⱼ and ____.
xⱼ₊₁
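A rough sketch of the percentile estimate as a literal reading of the last two cards: find j with (j − 0.5)/n < p/100 ≤ (j + 0.5)/n, then average the j-th and (j+1)-th sorted values. The function name, the data, and the choice to raise an error when no valid pair of neighbours exists are assumptions, not part of the cards.

```python
import math

def percentile_estimate(xs, p):
    """Estimate q_p as the mean of the sorted values at positions j and j + 1."""
    ordered = sorted(xs)
    n = len(ordered)
    # (j - 0.5)/n < p/100 <= (j + 0.5)/n  is equivalent to  j = ceil(n*p/100 - 0.5)
    j = math.ceil(n * p / 100 - 0.5)
    if j < 1 or j >= n:
        raise ValueError("p is too close to 0 or 100 for this sample size")
    return (ordered[j - 1] + ordered[j]) / 2  # 1-based positions j and j + 1

data = [40, 10, 30, 20]  # hypothetical data
print(percentile_estimate(data, 50))  # 25.0
print(percentile_estimate(data, 25))  # 15.0
```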