Week 12 - Descriptive statistics Flashcards

1
Q

What are numerical measures of descriptive statistics?

A

measures of central tendency (location) and measures of dispersion (variability)

2
Q

What are sample statistics?

A

Numerical measures computed from data for a sample

3
Q

What are population parameters?

A

Numerical measures computed from data for an entire population

4
Q

What is a sample statistic referred to as?

A

The point estimator of the corresponding population parameter

5
Q

What are the 7 measures of location?

A
  1. Mean
  2. Median
  3. Mode
  4. Weighted Mean
  5. Geometric Mean
  6. Percentiles
  7. Quartiles
6
Q

What is the mean of a data set?

A

the average of all the data values

7
Q

What is the sample mean?

A

The sample mean x̄ is a point estimator of the population mean μ

8
Q

What is the mean equation?

A

x̄ = (Σ x_i) / n

numerator - sum of the values of the n observations
denominator - number of observations in the sample
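
As a quick illustration of the formula, here is a minimal Python sketch. The function name `sample_mean` and the data values are made up for this example and are not part of the course material.

```python
def sample_mean(values):
    """Return the arithmetic mean: sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


# Illustrative data only (assumed values, not from the flashcards).
data = [12, 15, 19, 26, 28]
print(sample_mean(data))  # 100 / 5 = 20.0
```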

9
Q

What is the median of a data set?

A

The value in the middle when the data items are arranged in ascending order

10
Q

When is the median the preferred measure of central location?

A

Whenever a data set has extreme values, because the median is not pulled towards them the way the mean is

11
Q

For which kinds of data is the median the most commonly reported measure of location?

A

Annual income and property value data.
A few extremely large incomes or property values can inflate the mean, but they barely move the median.

12
Q

How do we calculate the median for an odd number of observations?

A

Say we have 7 observations.
Sort them in ascending order.
The median is the middle value (the 4th of the 7): 19 in this example

13
Q

How do we calculate the median for an even number of observations?

A

Even number of observations:
Say we have 8 observations.
Sort them in ascending order.

The median is the average of the middle two values (the 4th and 5th of the 8): (19 + 26)/2 = 22.5
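
The odd and even cases can be sketched in Python as follows. The observations below are made up (the card's own data values are not shown here); they were chosen only so that the results match the 19 and 22.5 quoted on the cards.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value exists.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical observations, for illustration only.
odd_data = [27, 12, 19, 14, 32, 18, 26]   # 7 observations
even_data = odd_data + [40]               # 8 observations

print(median(odd_data))   # 19
print(median(even_data))  # (19 + 26) / 2 = 22.5
```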

14
Q

Where are the mean and median on a symmetrical diagram?

A

They are equal, and both sit at the middle of the distribution

15
Q

Where are the mean and median on a left skew diagram?

A

The mode is at the peak; moving down the left tail comes the median and then the mean, so typically mean < median < mode

16
Q

Where are the mean and median on a right skew diagram?

A

The mode is at the peak; moving down the right tail comes the median and then the mean, so typically mode < median < mean

17
Q

What is the mode?

A

The mode of a data set is the value that occurs with greatest frequency.
The greatest frequency can occur at two or more different values
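
A small Python sketch of finding the mode(s), using `statistics.multimode` from the standard library (Python 3.8+). The helper name `describe_modes`, the labels, and the data are assumptions made for this example.

```python
from statistics import multimode


def describe_modes(values):
    """Return the list of modes and a label based on how many there are."""
    modes = multimode(values)
    if not modes:
        raise ValueError("an empty data set has no mode")
    # One mode: unimodal; exactly two: bimodal; more than two: multimodal.
    # Note: if every value occurs once, all values tie for "most frequent".
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label


# Illustrative data: 3 and 7 both occur twice.
print(describe_modes([2, 3, 3, 5, 7, 7, 9]))  # ([3, 7], 'bimodal')
```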

18
Q

What is bimodal data?

A

If the data have exactly two modes

19
Q

What is multimodal data?

A

If the data have more than two modes

20
Q

What is the weighted mean?

A

A mean computed by giving each data value a weight that reflects its importance

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value

21
Q

What is the weighted mean equation?

A

x̄ = Σ(w_i × x_i) / Σ w_i

x_i = value of observation i
w_i = weight for observation i
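
A minimal Python sketch of the weighted mean formula. The function name `weighted_mean` and the example scores and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: an exam (weight 3) counts three times as much as a quiz (weight 1).
scores = [80, 90]
weights = [1, 3]
print(weighted_mean(scores, weights))  # (80 + 270) / 4 = 87.5
```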

22
Q

What is value weighted?

A

A type of weighted mean in which each item's weight is based on its own value (its size, e.g. market value) rather than being assigned separately

23
Q

What is equal weighted return?

A

A simple average of all returns, giving each asset or component the same importance, regardless of size or value. This is in contrast to a value-weighted return, where larger values (e.g., market capitalization) carry more weight.

24
Q

What is value weighted return equation?

A

(value_x × r_x + value_y × r_y) / (value_x + value_y)

25
Q

What is equal weighted return equation?

A

Equal-weighted return = (Σ X_i) / n

X_i = individual returns
n = number of assets or components
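
To see how the two return measures differ, here is a small Python sketch. The function names and the two-asset numbers are assumptions chosen for illustration, not data from the course.

```python
def equal_weighted_return(returns):
    """Simple average: every asset counts the same."""
    return sum(returns) / len(returns)


def value_weighted_return(values, returns):
    """Weighted average of returns, with each asset's value as its weight."""
    if len(values) != len(returns):
        raise ValueError("values and returns must have the same length")
    return sum(v * r for v, r in zip(values, returns)) / sum(values)


# Hypothetical two-asset example: the larger asset earned the higher return.
values = [300.0, 100.0]
returns = [0.10, 0.02]

print(f"{equal_weighted_return(returns):.4f}")          # about 0.06
print(f"{value_weighted_return(values, returns):.4f}")  # about 0.08
```

The value-weighted figure is higher here because the asset with the larger value also had the larger return.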

26
Q

What is a portfolio return?

A

A portfolio return is the weighted average return of individual assets in the portfolio

It usually equals the value-weighted return

27
Q

When is the geometric mean most appropriate to use?

A

It is most appropriate in situations where the data items to be summarised result from a ratio-type calculation, such as with growth rates or index numbers.

It is calculated by multiplying all the numbers together and then taking the nth root of the product, where n is the total number of values.
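
A minimal Python sketch of the calculation. It assumes growth rates have already been converted to growth factors (1 + rate), which is the usual way to apply a geometric mean to growth data; the function name and the numbers are illustrative only.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if not values:
        raise ValueError("the geometric mean of an empty data set is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors for three periods: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]
average_factor = geometric_mean(growth_factors)
print(f"average growth per period: {(average_factor - 1) * 100:.2f}%")
```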

28
Q

What is a percentile?

A

provides information about how the data are spread over the interval from the smallest value to the largest value

Admission test scores for colleges and universities are frequently reported in terms of percentiles

29
Q

What is the p-th percentile of a data set?

A

a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

For example, the 10th percentile of a data set is a value such that at least 10% of the items are less than or equal to it and at least 90% of the items are greater than or equal to it

30
Q

How to calculate a percentile?

A

Arrange the Data: Sort the data set in ascending order.

Determine the Position (i):
Calculate the position using the formula:
i = (p/100) × n, where p is the desired percentile and n is the number of observations

Locate the Percentile:
If 𝑖 is an integer, the p-th percentile is the average of the values at positions 𝑖 and 𝑖 +1
If 𝑖 is not an integer, round up to the next whole number, and the p-th percentile is the value at this position.

31
Q

Example of percentile calculation

A

Consider a data set: 7, 10, 15, 20, 25.

To find the 40th percentile:
Arrange the Data: The data is already in ascending order.
Determine the Position (i):
p = 40
n = 5
i = (40/100) × 5 = 2

Locate the Percentile:
Since 𝑖=2 is an integer, the 40th percentile is the average of the values at positions 2 and 3.
Values at positions 2 and 3 are 10 and 15, respectively.
40th percentile = (10 + 15)/2 = 12.5

Therefore, the 40th percentile of this data set is 12.5.
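
The same rule can be sketched in Python and checked against this example. The function name `percentile` is made up for this sketch, and it assumes p is a whole number between 1 and 100. Software packages often use other interpolation rules, so their results can differ slightly from this one.

```python
def percentile(values, p):
    """p-th percentile using the position rule i = (p/100) * n."""
    if not values:
        raise ValueError("the data set is empty")
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    n = len(ordered)
    # divmod avoids floating-point trouble when testing whether i is an integer.
    position, remainder = divmod(p * n, 100)
    position = int(position)
    if remainder == 0:
        if position == n:
            # p = 100: there is no value at position i + 1, so use the largest value.
            return ordered[-1]
        # i is an integer: average the values at positions i and i + 1 (1-based).
        return (ordered[position - 1] + ordered[position]) / 2
    # i is not an integer: round up and take the value at that position.
    return ordered[position]


data = [7, 10, 15, 20, 25]
print(percentile(data, 40))  # 12.5, matching the worked example
print(percentile(data, 50))  # i = 2.5, round up to position 3: 15
```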

32
Q

What are quartiles?

A

specific percentiles
first quartile = 25th percentile
second quartile = 50th percentile = median
third quartile = 75th percentile

33
Q

What do measures of variability (dispersion) help us to understand?

A

how data points spread out from the centre (mean or median). This is useful in decision-making, such as evaluating supplier delivery times, stock price volatility, or quality control in manufacturing.

34
Q

What are the 5 main measures of variability (dispersion)?

A
  1. Range
  2. Interquartile Range (IQR)
  3. Variance
  4. Standard Deviation
  5. Coefficient of Variation (CV%)
35
Q

What is the range?

A

The range of a data set is the difference between the largest and smallest data values.

It is the simplest measure of variability.

It is very sensitive to the smallest and largest data values.

36
Q

How to calculate the range?

A

Range = largest value - smallest value

37
Q

What is the interquartile range?

A

The interquartile range of a data set is the difference between the third quartile and the first quartile.

It is the range for the middle 50% of the data.

It overcomes the sensitivity to extreme data values.

38
Q

How to calculate the interquartile range?

A

IQR = 3rd quartile - 1st quartile

39
Q

How is a box plot drawn?

A

A box is drawn with its ends located at the 1st and 3rd quartiles.

A vertical line is drawn in the box at the location of the median (second quartile).

Dashed lines are drawn from the ends of the box to the smallest and largest data values inside the limits.

Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol * .

40
Q

How to calculate the lower limit and upper limit for a box plot for outliers?

A

The lower limit is located 1.5(IQR) below Q1: lower limit = Q1 - 1.5 × IQR
The upper limit is located 1.5(IQR) above Q3: upper limit = Q3 + 1.5 × IQR
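
A short Python sketch of the limits and the outlier check. The quartiles are passed in rather than computed, because different quartile conventions give slightly different values; the function names and the data are hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the box plot limits."""
    lower, upper = outlier_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles and data, for illustration only.
print(outlier_limits(20, 30))                             # (5.0, 45.0)
print(find_outliers([3, 18, 22, 25, 31, 50], 20, 30))     # [3, 50]
```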

41
Q

What is the variance?

A

The variance is the average of the squared differences between each data value and the mean.

The variance is a measure of variability that utilises all the data.

It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, μ for a population).

42
Q

What is the variance equation?

A

s^2 = [Σ(x_i - x̄)^2] / (n - 1)
for a sample
x_i - each individual data point
x̄ - sample mean
n - sample size

σ^2 = [Σ(x_i - μ)^2] / N
for a population
x_i - each individual data point
μ - population mean
N - total number of data points in the population
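
The only difference between the two formulas is the divisor, which a minimal Python sketch makes visible. The function names and the data are assumptions for this example; the standard library's `statistics.variance` and `statistics.pvariance` compute the same quantities.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    if big_n == 0:
        raise ValueError("the population is empty")
    mean = sum(values) / big_n
    return sum((x - mean) ** 2 for x in values) / big_n


# Illustrative data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 32 / 8 = 4.0
print(sample_variance(data))      # 32 / 7, slightly larger
```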

43
Q

What is the standard deviation?

A

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

44
Q

How to calculate standard deviation?

A

s = √(s^2) = √{[Σ(x_i - x̄)^2] / (n - 1)}
for a sample
x_i - each individual data point
x̄ - sample mean
n - sample size

σ = √(σ^2) = √{[Σ(x_i - μ)^2] / N}
for a population
x_i - each individual data point
μ - population mean
N - total number of data points in the population

45
Q

What is the coefficient of variation?

A

It indicates how large the standard deviation is in relation to the mean, usually expressed as a percentage

46
Q

How do you calculate the coefficient of variation?

A

CV = (s / x̄) × 100%
for a sample
s - sample standard deviation
x̄ - sample mean

CV = (σ / μ) × 100%
for a population
σ - population standard deviation
μ - population mean

47
Q

Show an example of variance, standard deviation and coefficient of variation linked together

A

Variance: s^2 = [Σ(x_i - x̄)^2] / (n - 1) = 2,996.16

Standard deviation: s = √(s^2) = √2996.16 = 54.74

Coefficient of variation: (s / x̄) × 100% = (54.74 / 490.84) × 100% = 11.15%

The standard deviation is about 11% of the mean
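
The chain from variance to standard deviation to coefficient of variation can be re-checked with a few lines of Python. The variance (2,996.16) and mean (490.84) are the figures quoted on the card; the function names are made up for this sketch.

```python
import math


def std_from_variance(variance):
    """The standard deviation is the positive square root of the variance."""
    return math.sqrt(variance)


def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


s = std_from_variance(2996.16)
cv = coefficient_of_variation(s, 490.84)
print(f"{s:.2f}")    # 54.74
print(f"{cv:.2f}%")  # 11.15%
```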

48
Q

What are the 2 measures of association between 2 variables?

A
  1. covariance
  2. correlation coefficient
49
Q

What is the covariance a measure of?

A

a measure of the linear association between two variables.

Positive values indicate a positive relationship. Negative values indicate a negative relationship.

50
Q

How do you calculate the covariance?

A

s_XY = [Σ(x_i - x̄)(y_i - ȳ)] / (n - 1)

for samples
x_i, y_i - individual data points for variables X and Y
x̄, ȳ - means of variables X and Y
n - sample size

σ_XY = [Σ(x_i - μ_X)(y_i - μ_Y)] / N

for populations
μ_X, μ_Y - population means of X and Y
N - population size
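
A minimal Python sketch of both formulas. The function names and the paired data are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def _sum_of_cross_products(xs, ys):
    """Sum of (x_i - mean of x) * (y_i - mean of y) over all pairs."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))


def sample_covariance(xs, ys):
    """Divide by n - 1 for a sample."""
    if len(xs) < 2:
        raise ValueError("the sample covariance needs at least two pairs")
    return _sum_of_cross_products(xs, ys) / (len(xs) - 1)


def population_covariance(xs, ys):
    """Divide by N for a population."""
    return _sum_of_cross_products(xs, ys) / len(xs)


# Illustrative data: y rises with x, so the covariance is positive.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(sample_covariance(xs, ys))      # 20 / 4 = 5.0
print(population_covariance(xs, ys))  # 20 / 5 = 4.0
```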

51
Q

What is the correlation coefficient?

A

It quantifies the strength and direction of the linear relationship between two variables. Correlation does not imply causation: two variables being highly correlated does not mean that one variable is the cause of the other.

The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship

52
Q

How to calculate correlation coefficient?

A

r_XY = s_XY / (s_X × s_Y)
= [Σ(x_i - x̄)(y_i - ȳ)] / √[Σ(x_i - x̄)^2 × Σ(y_i - ȳ)^2]

for samples
x_i, y_i - individual data points for variables X and Y
x̄, ȳ - means of variables X and Y
n - number of data points

ρ_XY = σ_XY / (σ_X × σ_Y)
= [Σ(x_i - μ_X)(y_i - μ_Y)] / √[Σ(x_i - μ_X)^2 × Σ(y_i - μ_Y)^2]

for populations
x_i, y_i - individual data points for variables X and Y
μ_X, μ_Y - population means for X and Y
N - population size (number of data points)
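
A small Python sketch of the sums-of-deviations form of the formula. The function name `correlation` and the data are assumptions for this example. Because the n - 1 (or N) terms cancel between numerator and denominator, the same function serves for a sample or a population.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient from sums of deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have equal length of at least 2")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: perfectly linear relationships.
xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(correlation(xs, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```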

53
Q

What are the different correlation coefficients?

A

Positive Correlation: If r>0, as one variable increases, the other tends to increase.

Negative Correlation: If r<0, as one variable increases, the other tends to decrease.

No Correlation: If r=0, there is no linear relationship between the two variables.

Strength:
Strong: r near 1 or -1
Weak: r near 0

54
Q

What makes correlation coefficients perfect?

A

Perfect Positive Correlation (r=1): A straight line with a positive slope (both variables increase together in perfect proportion).

Perfect Negative Correlation (r = -1): A straight line with a negative slope (one variable increases as the other decreases in perfect proportion).

No Correlation (r=0): No linear pattern in the data.

55
Q

Example of covariance and correlation coefficient calculation linked together

A

Sample covariance: s_XY = [Σ(x_i - x̄)(y_i - ȳ)] / (n - 1) = -35.4 / (6 - 1) = -7.08

Sample correlation coefficient: r_XY = s_XY / (s_X × s_Y) = -7.08 / (8.2192 × 0.8944) = -0.9631
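
The arithmetic in this example can be re-checked in Python. All the numbers are the ones quoted on the card (sum of cross products -35.4, n = 6, s_X = 8.2192, s_Y = 0.8944); the function names are made up for this sketch.

```python
def covariance_from_sum(sum_cross_products, n):
    """Sample covariance given the sum of cross products and the sample size."""
    return sum_cross_products / (n - 1)


def correlation_from_parts(s_xy, s_x, s_y):
    """Sample correlation given the covariance and both standard deviations."""
    return s_xy / (s_x * s_y)


s_xy = covariance_from_sum(-35.4, 6)
r_xy = correlation_from_parts(s_xy, 8.2192, 0.8944)
print(f"{s_xy:.2f}")  # -7.08
print(f"{r_xy:.4f}")  # -0.9631
```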