Topic 1 Descriptive Statistics Flashcards

1
Q

Data are time-consuming, ____, and of varying quality.

A

commercially sensitive

2
Q

A data set of size __ is denoted as: {xi}ᵢ₌₁,…,ₙ

A

n

3
Q

What’s the difference between a dot plot and a histogram?

A

Dot plots show individual data points (multi-frequency data set) while histograms group data into bins and show frequencies.

4
Q

What affects the appearance of a histogram the most?
A. Sample size
B. Axis label
C. Bin size
D. Title

A

C. Bin size
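A minimal sketch of why bin size matters, using only the standard library. The helper `histogram_counts` and the sample data are hypothetical and are not part of the original deck; the helper assumes every value is at least `start`.

```python
def histogram_counts(data, bin_width, start):
    """Count values per bin, where bin k covers [start + k*w, start + (k+1)*w)."""
    counts = {}
    for x in data:
        k = int((x - start) // bin_width)  # index of the bin containing x
        counts[k] = counts.get(k, 0) + 1
    # Return the counts as a list, keeping empty bins as zeros
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

narrow = histogram_counts(data, bin_width=1, start=1)
wide = histogram_counts(data, bin_width=4, start=1)

print(narrow)  # [1, 2, 3, 2, 1, 0, 0, 0, 1]
print(wide)    # [8, 1, 1]
```

The same ten values look like a peaked shape with an isolated outlier at width 1, and like a single dominant bar at width 4.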

5
Q

Relative frequency = frequency / ____

A

total number of data points (n)
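As a small illustration of this formula, the sketch below divides each count by n. The sample data and variable names are made up for the example.

```python
from collections import Counter

data = ["a", "b", "a", "c", "a", "b"]
n = len(data)

# Absolute frequency: how often each value occurs
frequency = Counter(data)

# Relative frequency = frequency / n
relative_frequency = {value: count / n for value, count in frequency.items()}

print(relative_frequency["a"])  # 0.5
```

The relative frequencies always sum to 1, which makes histograms of data sets with different sizes comparable.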

6
Q

What is plotted on the vertical axis of a histogram?

A

The absolute (or relative) frequency is plotted on the vertical axis.

7
Q

What does F(x) in a CDF plot represent?

A

The relative frequency of data ≤ x
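One possible way to compute F(x) for a sample is sketched below. The function name `ecdf` and the sample values are assumptions made for illustration; the sketch counts how many sorted values are ≤ x and divides by n.

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function F where F(x) is the fraction of data values <= x."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(xs, x) / n

    return F


F = ecdf([3, 1, 4, 1, 5])

print(F(0))    # 0.0  (below the smallest value)
print(F(1))    # 0.4  (two of five values are <= 1)
print(F(3.5))  # 0.6
print(F(5))    # 1.0  (at or above the largest value)
```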

8
Q

What is the downside of using histograms?

A

The appearance of a histogram depends on the chosen bin size, so the same data can look different.

9
Q

F(x) =

A

  • 0 if x < x₍₁₎
  • j/n if x₍ⱼ₎ ≤ x < x₍ⱼ₊₁₎
  • 1 if x ≥ x₍ₙ₎

(x₍ⱼ₎ denotes the j-th smallest value of the sorted data.)

10
Q

Formula for Arithmetic Mean

A

x̄ = (1/n) ∑xi

11
Q

Which measure is affected most by outliers?

A

Mean
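A quick check of this claim with made-up numbers, using Python's `statistics` module: adding one extreme value moves the mean a lot while the median stays put.

```python
from statistics import mean, median

data = [10, 12, 11, 13, 12]
with_outlier = data + [100]  # one extreme value added

print(mean(data), median(data))                  # mean 11.6, median 12
print(mean(with_outlier), median(with_outlier))  # mean jumps above 26, median is still 12
```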

12
Q

Which is not a measure of central tendency?
A. Median
B. Mode
C. Range
D. Mean

A

C. Range

13
Q

The geometric mean is only used for ____ data.

A

positive (in particular, non-zero)

14
Q

What does interquartile range (IQR) measure?

A

The spread between Q3 and Q1 (middle 50% of data)
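A hedged sketch of computing the IQR with `statistics.quantiles`. Quartile conventions vary between textbooks and libraries, so the default method used here is an assumption and may not match the percentile rule used elsewhere in this deck; the data are made up.

```python
from statistics import quantiles

data = [7, 1, 5, 3, 2, 6, 4]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 2.0 6.0 4.0
```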

15
Q

Formula for Sample variance (unbiased)

A

s² = (1/(n−1)) ∑(xi − x̄)²

16
Q

Formula for Sample Standard Deviation (unbiased)

A

s = √(s²) = √[(1/(n−1)) ∑(xi − x̄)²]
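The two formulas above can be written out directly, as in the sketch below. The function names and sample data are illustrative only; the result is cross-checked against `statistics.variance`, which also divides by n − 1.

```python
import math
from statistics import variance


def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

s2 = sample_variance(data)  # 32 / 7
s = sample_std(data)

print(s2, s)
```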

17
Q

What measures asymmetry?

A

Skewness

18
Q

The coefficient of variation is given by: vx = ____ / x̄

A

standard deviation (sx)
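A short sketch of the coefficient of variation, assuming sx is the sample standard deviation as returned by `statistics.stdev`. The height data are invented; they show that changing units leaves the ratio unchanged.

```python
from statistics import mean, stdev


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the arithmetic mean."""
    return stdev(xs) / mean(xs)


heights_m = [1.60, 1.70, 1.80]
heights_cm = [160, 170, 180]  # same data in different units

cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)

print(cv_m, cv_cm)  # both are about 0.0588
```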

19
Q

Why is the mean absolute deviation more robust than standard deviation?

A

It is less influenced by outliers

20
Q

What does a positive skew indicate about the data?

A

Mean > Median > Mode

21
Q

Formula
Biased skewness:

A

g₁(x) = (1/n) ∑(xi − x̄)³ / ωₓ³
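A sketch of this formula under one assumption: ωₓ is taken to be the biased standard deviation, that is, the square root of (1/n) ∑(xi − x̄)². The deck does not define ωₓ, so treat that reading, the function name, and the data as illustrative.

```python
def biased_skewness(xs):
    """Third central moment divided by the cube of the biased standard deviation."""
    n = len(xs)
    x_bar = sum(xs) / n
    m2 = sum((x - x_bar) ** 2 for x in xs) / n  # biased variance (assumed to be omega_x squared)
    m3 = sum((x - x_bar) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]
left_tail = [-x for x in right_tail]

print(biased_skewness(symmetric))   # 0.0
print(biased_skewness(right_tail))  # positive
print(biased_skewness(left_tail))   # negative
```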

22
Q

Skewness is a ____ quantity (unit-less).

A

non-dimensional

23
Q

What is the median if n is odd?

A

median(x) = x₍(n+1)/2₎, i.e. the ((n+1)/2)-th value of the sorted data

24
Q

What is the mode?

A

The most common value

25
Q

Formula for geometric mean:

A

x* = (∏ⁿᵢ₌₁ xᵢ)¹⁄ⁿ
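The formula can be sketched with `math.prod`. The function name is hypothetical, and the check for positive values is an added assumption reflecting that the n-th root of a non-positive product is not meaningful here.

```python
import math


def geometric_mean(xs):
    """n-th root of the product of n positive values."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean requires positive data")
    return math.prod(xs) ** (1 / len(xs))


growth_factors = [1, 3, 9]
g = geometric_mean(growth_factors)

print(g)  # approximately 3
```

For long data sets the product can overflow, so an equivalent form averages the logarithms and exponentiates the result.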

26
Q

What is used to test if two datasets have a linear relationship?

A

Covariance and correlation coefficient

27
Q

Formula
Sample covariance:

A

cov(x,y) = (1/(n−1)) ∑(xi − x̄)(yi − ȳ)

28
Q

Formula
Correlation coefficient:

A

cₓᵧ = cov(x,y) / (sx·sy) = [(1/(n−1)) ∑(xi − x̄)(yi − ȳ)] / (sx·sy)

29
Q

A correlation coefficient of 0 means ____ correlation.

A

no linear

30
Q

What is the range of correlation coefficient cₓᵧ?
A. [−2, 2]
B. [0, 1]
C. [−1, 1]
D. [−∞, ∞]

A

C. [−1, 1]
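The covariance and correlation formulas from the earlier cards can be sketched as follows. Function names and data are illustrative; the last data set shows that a coefficient of 0 only rules out a linear relationship, since y = x² clearly depends on x.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with the 1/(n - 1) factor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_cov(xs, xs))  # cov(x, x) is the sample variance
    s_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)


x = [-2, -1, 0, 1, 2]
y_up = [2 * v for v in x]     # perfectly linear, increasing
y_down = [-2 * v for v in x]  # perfectly linear, decreasing
y_square = [v * v for v in x] # dependent on x, but not linearly

print(correlation(x, y_up))      # approximately 1
print(correlation(x, y_down))    # approximately -1
print(correlation(x, y_square))  # 0.0
```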

31
Q

What does a negative skew indicate about the data?

A

mode > median > mean

32
Q

What does a symmetric distribution (zero skew) indicate about the data?

A

mode = median = mean

33
Q

The quantile value for the i-th data point is given by:

A

yᵢ = (i − 0.5)/n
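A minimal sketch of this rule: sort the data, then pair the i-th smallest value with (i − 0.5)/n. The function name and data are made up for the example.

```python
def quantile_positions(data):
    """Pair each sorted value with its quantile value (i - 0.5) / n, i = 1..n."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i - 0.5) / n) for i, x in enumerate(xs, start=1)]


pairs = quantile_positions([7, 3, 5, 9])

print(pairs)  # [(3, 0.125), (5, 0.375), (7, 0.625), (9, 0.875)]
```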

34
Q

To determine the percentile qₚ(x), which condition must j satisfy?
A. j < p
B. (j − 0.5)/n < p/100 ≤ (j + 0.5)/n
C. j/n > p
D. j = p + 0.5

A

B. (j − 0.5)/n < p/100 ≤ (j + 0.5)/n

35
Q

To estimate qₚ(x), take the mean of the data values xⱼ and ____.

A

xⱼ₊₁