Week 12 - Descriptive statistics Flashcards

Question 1

Q

What are numerical measures of descriptive statistics?

Answer

A

measures of central tendency (location) and measures of dispersion (variability)

Question 2

Q

What are sample statistics?

Answer

A

Numerical measures that are computed from sample data

Question 3

Q

What are population parameters?

Answer

A

Numerical measures that are computed from data for an entire population

Question 4

Q

What is a sample statistic referred to as?

Answer

A

as the point estimator of the corresponding population parameter

Question 5

Q

What are the 7 measures of location?

Answer

A

Mean
Median
Mode
Weighted Mean
Geometric Mean
Percentiles
Quartiles

Question 6

Q

What is the mean of a data set?

Answer

A

the average of all the data values

Question 7

Q

What is the sample mean?

Answer

A

The sample mean x̄ is a point estimate of the population mean µ

Question 8

Q

What is the mean equation?

Answer

A

x̄ = ∑x_i/ n

numerator - sum of the values of the n observations
denominator - number of observations in the sample
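
As a minimal sketch of the formula above, the sample mean can be computed directly from its definition. The function name and the observations are illustrative assumptions, not values from these cards.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


# Hypothetical sample (assumed for illustration only).
observations = [12, 15, 18, 19, 26]
mean_value = sample_mean(observations)  # (12 + 15 + 18 + 19 + 26) / 5 = 18.0
```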

Question 9

Q

What is the median of a data set?

Answer

A

is the value in the middle when the data items are arranged in ascending order

Question 10

Q

When is the median the preferred measure of central location?

Answer

A

Whenever a data set has extreme values

Question 11

Q

For which kinds of data is the median the most often reported measure of location?

Answer

A

annual income and property value data
A few extremely large incomes or property values can inflate the mean

Question 12

Q

How do we calculate the median for an odd number of observations?

Answer

A

Say we have the following 7 observations:
Sort them in ascending order:
Median is the middle value: 19

Question 13

Q

How do we calculate the median for an even number of observations?

Answer

A

Even number of observations:
Say we have 8 observations:
Sort them in ascending order:

Median is the average of the middle two values: (19 + 26)/2 = 22.5
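
The odd and even cases can be sketched in a few lines of Python. The cards do not list the original observations, so the two samples below are assumed values chosen only so that the results match the medians quoted above (19 and 22.5).

```python
def median(values):
    """Return the middle value of the sorted data (or the mean of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value exists.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Assumed, illustrative samples (not taken from the cards).
odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

odd_median = median(odd_sample)    # sorted: 12 14 18 19 26 27 27 -> 19
even_median = median(even_sample)  # sorted: 12 14 18 19 26 27 27 30 -> 22.5
```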

Question 14

Q

Where are the mean and median on a symmetrical diagram?

Answer

A

the mean and median are equal, and both sit at the centre of the distribution

Question 15

Q

Where are the mean and median on a left skew diagram?

Answer

A

the mode is at the peak; moving down the left tail, the median comes next and then the mean (typically mean < median < mode)

Question 16

Q

Where are the mean and median on a right skew diagram?

Answer

A

the mode is at the peak; moving down the right tail, the median comes next and then the mean (typically mode < median < mean)

Question 17

Q

What is the mode?

Answer

A

The mode of a data set is the value that occurs with greatest frequency.
The greatest frequency can occur at two or more different values
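
A short sketch of how the mode (or modes) could be found by counting frequencies. The helper name `modes` and the data are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, sorted."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data sets.
one_mode = modes([1, 2, 2, 3])       # [2]
two_modes = modes([1, 1, 2, 3, 3])   # [1, 3]: the greatest frequency occurs at two values
```

Returning a list rather than a single value keeps the case of two or more modes visible.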

Question 18

Q

What is bimodal data?

Answer

A

If the data have exactly two modes

Question 19

Q

What is multimodal data?

Answer

A

If the data have more than two modes

Question 20

Q

What is the weighted mean?

Answer

A

When the mean is computed by giving each data value a weight that reflects its importance

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value

Question 21

Q

What is the weighted mean equation?

Answer

A

x̄ = (∑ w_i × x_i) / (∑ w_i)

x_i = value of observation i
w_i = weight for observation i
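
The weighted mean formula can be sketched as follows. The scores and weights are hypothetical numbers picked to keep the arithmetic easy to follow.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: two scores, the second counting three times as much.
scores = [80, 90]
weights = [1, 3]
result = weighted_mean(scores, weights)  # (1*80 + 3*90) / (1 + 3) = 87.5
```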

Question 22

Q

What is value weighted?

Answer

A

a type of weighted mean in which each item's weight is its own size or value (for example, the market value of each holding) rather than a weight assigned separately

Question 23

Q

What is equal weighted return?

Answer

A

The simple average of all returns, giving each asset or component the same importance regardless of size or value. This contrasts with a value-weighted return, where larger values (e.g., market capitalization) carry more weight

Question 24

Q

What is the value-weighted return equation?

Answer

A

(value_x × r_x + value_y × r_y) / (value_x + value_y)

Question 25

Q

What is the equal-weighted return equation?

Answer

A

Equal-weighted return = ∑X_i / n

X_i = individual returns
n = number of assets or components
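
To see how the equal-weighted and value-weighted returns can differ, here is a small sketch with two hypothetical assets. The asset values and returns are assumptions for illustration; returns are expressed in percent.

```python
def equal_weighted_return(returns):
    """Simple average of the individual returns."""
    return sum(returns) / len(returns)


def value_weighted_return(values, returns):
    """Average of the returns, weighted by each asset's value."""
    return sum(v * r for v, r in zip(values, returns)) / sum(values)


# Hypothetical portfolio: asset X is worth 1000, asset Y is worth 3000.
asset_values = [1000, 3000]
asset_returns = [10, 2]  # returns in percent

equal_weighted = equal_weighted_return(asset_returns)                # (10 + 2) / 2 = 6.0
value_weighted = value_weighted_return(asset_values, asset_returns)  # (1000*10 + 3000*2) / 4000 = 4.0
```

The larger asset has the lower return in this made-up case, so the value-weighted figure comes out below the equal-weighted one.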

Question 26

Q

What is a portfolio return?

Answer

A

A portfolio return is the weighted average return of individual assets in the portfolio

It usually equals the value-weighted return

Question 27

Q

When is the geometric mean most appropriate to use?

Answer

A

most appropriate in situations where the data items to be summarised result from a ratio-type calculation, such as with growth rates or index numbers

calculated by multiplying all the numbers together and then taking the nth root of the product, where n is the total number of values
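
A minimal sketch of the calculation described above: multiply the values, then take the nth root. The growth factors are hypothetical, and the sketch assumes Python 3.8 or later for `math.prod`.

```python
import math


def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors: +25% in year 1, then -20% in year 2.
growth_factors = [1.25, 0.80]
geo = geometric_mean(growth_factors)                 # sqrt(1.25 * 0.80) = 1.0, i.e. no net growth
arith = sum(growth_factors) / len(growth_factors)    # 1.025, which overstates the growth
```

The standard library also offers `statistics.geometric_mean`, which should give the same result for positive inputs.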

Question 28

Q

What is a percentile?

Answer

A

provides information about how the data are spread over the interval from the smallest value to the largest value

Admission test scores for colleges and universities are frequently reported in terms of percentiles

Question 29

Q

What is the p-th percentile of a data set?

Answer

A

a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

For example, the 10th percentile of a data set is a value such that at least 10% of the items are less than or equal to it and at least 90% of the items are greater than or equal to it

Question 30

Q

How to calculate a percentile?

Answer

A

Arrange the Data: Sort the data set in ascending order.

Determine the Position (i):
Calculate the position using the formula:
𝑖 = (p/100) x n where p is the desired percentile and n the number of observations

Locate the Percentile:
If 𝑖 is an integer, the p-th percentile is the average of the values at positions 𝑖 and 𝑖 +1
If 𝑖 is not an integer, round up to the next whole number, and the p-th percentile is the value at this position.

Question 31

Q

Example of percentile calculation

Answer

A

Consider a data set: 7, 10, 15, 20, 25.

To find the 40th percentile:
Arrange the Data: The data is already in ascending order.
Determine the Position (i):
p=40
n=5
𝑖 = (40/100)×5 = 2

Locate the Percentile:
Since 𝑖=2 is an integer, the 40th percentile is the average of the values at positions 2 and 3.
Values at positions 2 and 3 are 10 and 15, respectively.
40th percentile = (10 + 15)/2 = 12.5

Therefore, the 40th percentile of this data set is 12.5.
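
The percentile procedure from the previous card can be sketched in Python and checked against this worked example. The function name is an assumption, and the sketch assumes p is a whole number strictly between 0 and 100 so that the integer test on i is exact.

```python
def percentile(values, p):
    """p-th percentile using the rule i = (p/100) * n from the cards."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the data set is empty")
    # i = p * n / 100; divmod splits it into its whole part and a remainder.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is an integer: average the values at positions i and i + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: round up and take the value at that position (1-based).
    return ordered[whole]


data = [7, 10, 15, 20, 25]
p40 = percentile(data, 40)  # i = 2, so (10 + 15) / 2 = 12.5
p50 = percentile(data, 50)  # i = 2.5, round up to position 3 -> 15
```

Other percentile conventions exist (statistical software often interpolates), so library functions may return slightly different values for the same data.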

Question 32

Q

What are quartiles?

Answer

A

specific percentiles
first quartile = 25th percentile
second quartile = 50th percentile = median
third quartile = 75th percentile

Question 33

Q

What do measures of variability (dispersion) help us to understand?

Answer

A

how data points spread out from the centre (mean or median). This is useful in decision-making, such as evaluating supplier delivery times, stock price volatility, or quality control in manufacturing.

Question 34

Q

What are the 5 main measures of variability (dispersion)?

Answer

A

Range
Interquartile Range (IQR)
Variance
Standard Deviation
Coefficient of Variation (CV%)

Question 35

Q

What is the range?

Answer

A

The range of a data set is the difference between the largest and smallest data values.

It is the simplest measure of variability.

It is very sensitive to the smallest and largest data values.

Question 36

Q

How to calculate the range?

Answer

A

Range = largest value - smallest value

Question 37

Q

What is the interquartile range?

Answer

A

The interquartile range of a data set is the difference between the third quartile and the first quartile.

It is the range for the middle 50% of the data.

It overcomes the sensitivity to extreme data values.

Question 38

Q

How to calculate the interquartile range?

Answer

A

IQR = 3rd quartile - 1st quartile

Question 39

Q

How is a box plot drawn?

Answer

A

A box is drawn with its ends located at the 1st and 3rd quartiles

a vertical line is drawn in the box at the location of the median (second quartile)

Dashed lines are drawn from the ends of the box to the smallest and largest data values inside the limits.

Data outside these limits are considered outliers
The location of each outlier is shown with the symbol * .

Question 40

Q

How to calculate the lower limit and upper limit for a box plot for outliers?

Answer

A

the lower limit is located 1.5(IQR) below Q1
the upper limit is located 1.5(IQR) above Q3
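
A short sketch of how the limits could be used to flag outliers. The data are hypothetical; Q1 = 20 and Q3 = 30 are the quartiles of this made-up data set under the percentile rule from the earlier cards and are passed in directly to keep the example short.

```python
def box_plot_limits(q1, q3):
    """Return (lower limit, upper limit) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical, already sorted data with Q1 = (19 + 21)/2 = 20 and Q3 = (29 + 31)/2 = 30.
data = [2, 19, 21, 24, 26, 29, 31, 60]
limits = box_plot_limits(20, 30)        # IQR = 10 -> (5.0, 45.0)
outliers = find_outliers(data, 20, 30)  # [2, 60]
```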

Question 41

Q

What is the variance?

Answer

A

The variance is the average of the squared differences between each data value and the mean.

The variance is a measure of variability that utilises all the data.

It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, µ for a population).

Question 42

Q

What is the variance equation?

Answer

A

s^2 = [∑(x_i − x̄)^2] / (n − 1)
for a sample
x_i - each individual data point
x̄ - sample mean
n - sample size

σ^2 = [∑(x_i − µ)^2] / N
for a population
x_i - each individual data point
𝜇 - population mean
𝑁 - total number of data points in the population

Question 43

Q

What is the standard deviation?

Answer

A

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Question 44

Q

How to calculate standard deviation?

Answer

A

s = √(s^2) = √{[∑(x_i − x̄)^2] / (n − 1)}
for a sample
x_i - each individual data point
x̄ - sample mean
n - sample size

σ = √(σ^2) = √{[∑(x_i − µ)^2] / N}
for a population
x_i - each individual data point
𝜇 - population mean
𝑁 - total number of data points in the population
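
The sample and population versions of the variance and standard deviation differ only in the divisor. The sketch below uses hypothetical data and assumed function names to show both side by side.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("the population is empty")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


# Hypothetical data: mean 5, sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)          # 32 / 7
sigma2 = population_variance(data)  # 32 / 8 = 4.0
s = math.sqrt(s2)                   # sample standard deviation
sigma = math.sqrt(sigma2)           # population standard deviation = 2.0
```

The standard library functions `statistics.variance`, `statistics.pvariance`, `statistics.stdev` and `statistics.pstdev` should agree with these hand-written versions.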

Question 45

Q

What is the coefficient of variation?

Answer

A

how large the standard deviation is in relation to the mean

Question 46

Q

How do you calculate the coefficient of variation?

Answer

A

CV = (s / x̄) × 100%
for a sample
s - sample standard deviation
x̄ - sample mean

CV = (σ / µ) × 100%
for a population
σ - population standard deviation
µ - population mean

Question 47

Q

Show an example of variance, standard deviation and coefficient of variation linked together

Answer

A

Variance: s^2 = [∑(x_i − x̄)^2] / (n − 1) = 2,996.16

Standard deviation: s = √(s^2) = √2,996.16 = 54.74

Coefficient of variation: (s / x̄) × 100% = (54.74 / 490.84) × 100% = 11.15%

the standard deviation is about 11% of the mean
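
The chain from variance to standard deviation to coefficient of variation can be re-checked in a few lines. Only the two summary figures quoted above (the variance and the mean) are used; the variable names are assumptions.

```python
import math

variance = 2996.16  # sample variance quoted on the card
mean = 490.84       # sample mean quoted on the card

std_dev = math.sqrt(variance)     # about 54.74
cv_percent = std_dev / mean * 100  # about 11.15 (percent)

std_dev_rounded = round(std_dev, 2)
cv_rounded = round(cv_percent, 2)
```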

Question 48

Q

What are the 2 measures of association between 2 variables?

Answer

A

covariance
correlation coefficient

Question 49

Q

What is the covariance a measure of?

Answer

A

a measure of the linear association between two variables.

Positive values indicate a positive relationship. Negative values indicate a negative relationship.

Question 50

Q

How do you calculate the covariance?

Answer

A

𝑠_XY= [ ∑(𝑥_𝑖 − x̄)(y_i - ȳ)]/ (𝑛−1)

for samples
x_i, y_i - individual data points for variables
x̄, ȳ - means of variables X and Y
n - sample size

σ_XY = [∑(x_i − µ_x)(y_i − µ_y)] / n

for populations
µ_x, µ_y - populations means of X and Y
n - population size
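
Both covariance formulas can be sketched with one function and a switch for the divisor. The function name, the `population` flag and the paired data are assumptions for illustration.

```python
def covariance(xs, ys, population=False):
    """Sum of products of deviations, divided by n - 1 (sample) or n (population)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    divisor = n if population else n - 1
    return total / divisor


# Hypothetical paired data: means are 3 and 4, sum of products of deviations is 6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sample_cov = covariance(x, y)                       # 6 / 4 = 1.5
population_cov = covariance(x, y, population=True)  # 6 / 5 = 1.2
```

Both results are positive, which matches the upward tendency of y as x increases in this made-up data.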

Question 51

Q

What is the correlation coefficient?

Answer

A

quantifies the strength and direction of the linear relationship between two variables. Correlation does not imply causation: two variables being highly correlated does not mean that one causes the other.

The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship

Question 52

Q

How to calculate correlation coefficient?

Answer

A

r_XY = s_XY / (s_X × s_Y)
= [∑(x_i − x̄)(y_i − ȳ)] / √[(∑(x_i − x̄)^2)(∑(y_i − ȳ)^2)]

for samples
x_i, y_i - individual data points for variables
x̄, ȳ - means of variables X and Y
n - number of data points

ρ_XY = σ_XY / (σ_X × σ_Y)
= [∑(x_i − μ_x)(y_i − μ_y)] / √[(∑(x_i − μ_x)^2)(∑(y_i − μ_y)^2)]

for populations
x_i, y_i - individual data points for variables X and Y
μ_x, μ_y - population means for X and Y
n - population size (number of data points)
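
A sketch of the sum-of-deviations form of the correlation coefficient. In that form the divisor (n − 1 or n) cancels, so the same code serves the sample and population cases. The function name and data are assumptions for illustration.

```python
import math


def correlation(xs, ys):
    """Sum of cross-deviations divided by the root of the product of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data.
x = [1, 2, 3, 4, 5]
r_moderate = correlation(x, [2, 4, 5, 4, 5])         # 6 / sqrt(10 * 6), about 0.77
r_perfect_pos = correlation(x, [3, 5, 7, 9, 11])     # y = 2x + 1 -> 1.0
r_perfect_neg = correlation(x, [-1, -2, -3, -4, -5])  # y = -x -> -1.0
```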

Question 53

Q

What are the different correlation coefficients?

Answer

A

Positive Correlation: If r>0, as one variable increases, the other tends to increase.

Negative Correlation: If r<0, as one variable increases, the other tends to decrease.

No Correlation: If r=0, there is no linear relationship between the two variables.

Strength:
Strong: r near 1 or -1
Weak: r near 0

Question 54

Q

What do perfect (and zero) correlations look like?

Answer

A

Perfect Positive Correlation (r=1): A straight line with a positive slope (both variables increase together in perfect proportion).

Perfect Negative Correlation (r=−1): A straight line with a negative slope (one variable increases as the other decreases in perfect proportion).

No Correlation (r=0): No linear pattern in the data.

Question 55

Q

Example of covariance and correlation coefficient calculation linked together

Answer

A

Sample covariance: s_XY = [∑(x_i − x̄)(y_i − ȳ)] / (n − 1) = -35.4 / (6 − 1) = -7.08

Sample correlation coefficient: r_XY = s_XY / (s_X × s_Y) = -7.08 / (8.2192 × 0.8944) = -0.9631
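
The arithmetic in this example can be re-checked from the summary figures quoted above. The underlying observations are not listed on the card, so only those figures are used; the variable names are assumptions.

```python
sum_of_products = -35.4  # sum of (x_i - x_bar)(y_i - y_bar), as quoted
n = 6                    # sample size, as quoted
s_x = 8.2192             # sample standard deviation of X, as quoted
s_y = 0.8944             # sample standard deviation of Y, as quoted

sample_cov = sum_of_products / (n - 1)  # about -7.08
r_xy = sample_cov / (s_x * s_y)          # about -0.9631

sample_cov_rounded = round(sample_cov, 2)
r_xy_rounded = round(r_xy, 4)
```

A value this close to -1 indicates a strong negative linear relationship between the two variables.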