Week 2 Flashcards

1
Q

What is the arithmetic mean?

A
  • The arithmetic mean of a set of data is the sum of the data values divided by the number of observations.
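A minimal Python sketch of this definition. The function name `arithmetic_mean` and the sample values are made up for illustration.

```python
def arithmetic_mean(values):
    """Sum of the data values divided by the number of observations."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

print(arithmetic_mean([2, 4, 4, 5, 10]))  # 5.0
```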
2
Q

What is the median?

A

The middle observation of a set of observations that are arranged in increasing (or decreasing) order
If the sample size is an even number, the median is the average of the two middle observations
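The odd/even rule can be sketched in Python as below. The function name and data are illustrative assumptions; Python's built-in `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)          # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd sample size: single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two middle ones

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```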

3
Q

What is the mode?

A

The most frequently occurring value
1 mode = unimodal
2 modes = bimodal
3+ modes = multimodal
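A short sketch that finds every mode and applies the unimodal/bimodal/multimodal labels. `modes` and `modality` are hypothetical helper names, and the data is made up. Note that this sketch labels data in which every value appears exactly once as multimodal, whereas many texts would say such data has no mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def modality(values):
    """Label the data by its number of modes."""
    k = len(modes(values))
    return {1: "unimodal", 2: "bimodal"}.get(k, "multimodal")

print(modes([1, 2, 2, 3, 3, 4]))     # [2, 3]
print(modality([1, 2, 2, 3, 3, 4]))  # bimodal
```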

4
Q

Which measures of central tendency best describe categorical data?

A

Categorical data is best described by the median or mode, not the mean.
However, the mode may not represent the true center of the numerical data. For this reason, the mode is used less frequently than either the mean or the median in business applications

5
Q

What data is best described by the mean?

A
  • Numerical data
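  • However, in addition to the type of data, another factor to consider is the presence of outliers: extreme values pull the mean towards them, so with strong outliers the median is often the better measure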
6
Q

What is skewness?

A

Skewness is the degree of asymmetry observed in a probability distribution

7
Q

When is a distribution skewed?

A

When data points are not distributed symmetrically to the left and right sides of the median, the distribution is skewed

8
Q

What are the types of skewness?

A
  • Distributions can be positively skewed (right-skewed) or negatively skewed (left-skewed)
  • A normal distribution exhibits zero skewness
9
Q

What do the different types of skewness mean?

A
  • Negative (left) skewness refers to a longer or fatter tail on the left side of the distribution, while positive (right) skewness refers to a longer or fatter tail on the right
10
Q

What happens with the mean and median when the data is positively skewed?

A

The mean of positively skewed data will typically be greater than the median

11
Q

What happens with the mean and median when the data is negatively skewed?

A

The mean of negatively skewed data will typically be less than the median

12
Q

When the distribution is right-skewed, what do we know about the mean?

A
  • A right-skewed (positively skewed) distribution has a tail that is more pronounced on the right side than on the left
  • The long right tail pulls the mean to the right, so the mean is typically greater than the median
  • As such, most of the values end up to the left of the mean
  • This means that some of the most extreme values are on the right side
13
Q

What about when the distribution is left-skewed?

A
  • Negative or left-skewed means the tail is more pronounced on the left rather than the right
  • Most values are found on the right side of the mean in negative skewness
14
Q

How can you measure skewness?

A

Two common methods for measuring skewness are Pearson’s first and second coefficients of skewness

15
Q

What is Pearson’s first coefficient of skewness?

A

Subtracts the mode from the mean and divides the difference by the standard deviation
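A minimal sketch of Pearson’s first coefficient, (mean - mode) / standard deviation. It assumes the sample standard deviation (`statistics.stdev`) and a single clear mode; the function name and data are illustrative.

```python
import statistics

def pearson_first_skewness(values):
    """(mean - mode) / standard deviation, using the sample standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)   # returns the first mode found if there are ties
    sd = statistics.stdev(values)
    return (mean - mode) / sd

# Hypothetical data with one large value on the right
print(round(pearson_first_skewness([2, 2, 2, 3, 4, 5, 10]), 3))  # 0.693
```

The positive result reflects the long right tail pulling the mean above the mode.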

16
Q

What is Pearson’s second coefficient of skewness?

A

Subtracts the median from the mean, multiplies the difference by 3 and divides the product by the standard deviation
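The same idea for Pearson’s second coefficient, 3 x (mean - median) / standard deviation. Again this is a sketch assuming the sample standard deviation, with a hypothetical function name and made-up data.

```python
import statistics

def pearson_second_skewness(values):
    """3 * (mean - median) / standard deviation, using the sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return 3 * (mean - median) / sd

# Hypothetical data with one large value on the right
print(round(pearson_second_skewness([2, 2, 2, 3, 4, 5, 10]), 3))  # 1.039
```

Because it needs no mode, this version still works when the data has no strong mode.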

17
Q

When would you use the 2 different coefficients?

A
  • Pearson’s first coefficient is used if the data exhibits a strong mode
  • Pearson’s second coefficient may be preferable if the data has a weak mode or multiple modes
18
Q

What does skewness tell investors?

A
  • Investors value skewness because it highlights extremes, which are important for short- and medium-term decisions.
  • Using standard deviation alone to predict returns implicitly treats them as normally distributed; because few return distributions are normal, skewness adds information about asymmetry that standard deviation misses
  • Skewness risk arises when models underestimate the chance of extreme outcomes in skewed data
19
Q

What is the geometric mean?

A

‘The nth root of the product of n numbers’
- Unlike the arithmetic mean, which adds values and divides by the number of values, the geometric mean multiplies them and then takes the nth root
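A minimal sketch of the multiply-then-take-the-nth-root rule. The function name is hypothetical; it assumes positive inputs, and Python's `statistics.geometric_mean` can be used as a cross-check.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    product = math.prod(values)               # multiply the values together
    return product ** (1 / len(values))       # then take the nth root

print(geometric_mean([2, 8]))  # 4.0
```

For long lists the raw product can overflow; averaging logarithms is a common workaround.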

20
Q

Why is the geometric mean useful?

A

It is very useful for calculating portfolio performance because it takes compounding into account
The calculation is based solely on the return figures and provides a direct, “apples-to-apples” comparison when evaluating two investment options across multiple time periods.
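A sketch of why compounding matters: apply the geometric mean to the growth factors (1 + r) and subtract 1. The function name and the +50% / -50% returns are hypothetical.

```python
import math

def geometric_mean_return(returns):
    """Per-period return that compounds to the same total growth.

    `returns` are decimal period returns, e.g. 0.10 for +10%.
    """
    if any(r <= -1 for r in returns):
        raise ValueError("a return of -100% or worse wipes out the investment")
    growth = math.prod(1 + r for r in returns)     # total growth factor
    return growth ** (1 / len(returns)) - 1

returns = [0.50, -0.50]                  # hypothetical: +50% then -50%
print(sum(returns) / len(returns))       # arithmetic mean: 0.0
print(geometric_mean_return(returns))    # about -0.134 per period
```

100 grows to 150 and then falls to 75, a real loss that the arithmetic mean of 0% hides and the geometric mean captures.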

21
Q

What are percentiles and quartiles?

A

These are measures that indicate the location, or position, of a value relative to the entire set of data

22
Q

Why are percentiles and quartiles used?

A

They are generally used to describe large data sets, for example surveys covering a nation

23
Q

How do you find percentiles?

A

Data must be arranged in order from the smallest to the largest values
The ‘P’th percentile is a value such that approximately P% of the observations are at or below that number

24
Q

Percentile formula?

A

Rank = P / 100 * (N + 1)
P = desired percentile
N = Number of data points in your set
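A sketch of the rank formula. The card does not say what to do with a fractional rank; this assumes linear interpolation between the neighbouring ordered values (a common textbook convention) and clamps ranks outside 1..N. The function name and data are illustrative.

```python
import math

def percentile(values, p):
    """P-th percentile using Rank = P / 100 * (N + 1)."""
    ordered = sorted(values)                 # smallest to largest
    n = len(ordered)
    rank = p / 100 * (n + 1)
    rank = min(max(rank, 1), n)              # assumption: clamp to the data
    lower = math.floor(rank)                 # 1-based position at or below the rank
    frac = rank - lower
    low_value = ordered[lower - 1]
    if frac == 0:
        return low_value
    # assumption: interpolate between the two neighbouring values
    return low_value + frac * (ordered[lower] - low_value)

data = [15, 20, 35, 40, 50]
print(percentile(data, 50))            # 35
print(round(percentile(data, 40), 2))  # 26.0 (rank 2.4)
```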

25
Q

What are quartiles?

A
  • Descriptive measures that separate large data sets into four quarters
26
Q

How are quartiles created?

A

The first quartile, Q1, separates approximately the smallest 25% of the data from the rest
Q2 (the median) separates the smallest 50%, and Q3 separates the smallest 75% of the data
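A sketch placing the quartiles at positions 0.25(n + 1), 0.50(n + 1) and 0.75(n + 1), matching the percentile rank formula. Interpolating fractional positions is an assumption, and the helper names are hypothetical. Other conventions, such as Excel's QUARTILE.INC, can give slightly different Q1 and Q3.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    n = len(ordered)
    rank = min(max(rank, 1), n)
    lower = int(rank)
    frac = rank - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def quartiles(values):
    """Return (Q1, Q2, Q3) using the (n + 1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_rank(ordered, q * (n + 1)) for q in (0.25, 0.50, 0.75))

print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # (3, 7, 11)
```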

27
Q

What is the five-number summary?

A
  • A simple way to describe the distribution of a data set
  • Consists of:
    • Minimum, Q1, Median (Q2), Q3, Maximum
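A sketch that assembles the five-number summary with Python's `statistics.quantiles`, whose default `method="exclusive"` uses (n + 1)-based positions like the percentile rank formula. The function name and data are illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, q2, q3 = statistics.quantiles(values, n=4)   # sorts the data internally
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([13, 1, 9, 3, 11, 5, 7]))  # (1, 3.0, 7.0, 11.0, 13)
```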
28
Q

What is the range?

A

Difference between largest and smallest observations
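A one-line sketch; the name `data_range` (chosen to avoid shadowing Python's built-in `range`) and the values are made up.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 25, 9]))   # 22
print(data_range([12, 7, 3, 250, 9]))  # 247
```

Changing a single value to an extreme one changes the range dramatically, which is why the range is sensitive to outliers.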

29
Q

Why is the range important?

A
  • It shows data spread
  • Quick summary of variability
  • Can hint at outliers, since a single extreme value makes the range much larger
30
Q

What is the interquartile range?

A
  • The IQR shows the spread of the middle 50% of the data set
31
Q

How do you calculate IQR?

A

Q3 - Q1
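A sketch of IQR = Q3 - Q1 using `statistics.quantiles` for the quartiles (its default method is an assumption about which quartile convention you want). The function name and data are illustrative.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

print(iqr([1, 3, 5, 7, 9, 11, 13]))   # 8.0
print(iqr([1, 3, 5, 7, 9, 11, 130]))  # 8.0
```

Replacing 13 with 130 leaves the IQR unchanged, showing why it is robust to outliers.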

32
Q

Why is the IQR important?

A
  • Measures spread: helps us understand how spread out the central portion of the data is, without being affected by extreme values
  • Robust to outliers
  • Helps detect outliers
33
Q

What is a box and whisker plot?

A

A graph that describes the shape of a distribution in terms of the five number summary

34
Q

How does a box and whisker plot work?

A

The Box:
- Spans from the first quartile to the third quartile (IQR)
- The line inside the box indicates the median
Whiskers:
- Extend from the edges of the box to the most extreme data points lying within 1.5 times the IQR below Q1 and above Q3
- Any points beyond whiskers are considered outliers
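A sketch of the 1.5 x IQR whisker rule. The parameter `k` and the function name are hypothetical, quartiles come from `statistics.quantiles`, and the data is made up.

```python
import statistics

def box_plot_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers using the k * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),    # whiskers stop at the last point inside the fences
        "whisker_high": max(inside),
        "outliers": outliers,
    }

stats = box_plot_stats([1, 3, 5, 7, 9, 11, 40])
print(stats["whisker_high"], stats["outliers"])  # 11 [40]
```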

35
Q

How would we interpret the length of a box in a B&W plot?

A
  • Shows how spread out the middle of the data is.
  • A longer box means more variability in data
36
Q

What can we derive from the length of the whiskers?

A

The length of the whiskers gives an idea of the spread of the data outside the interquartile range
Short whiskers indicate that most data points are close to Q1 and Q3, while longer whiskers suggest a broader range

37
Q

Can you tell symmetry/skewness from a box plot?

A
  • Yes
  • If the box and whiskers are roughly symmetric, the data is likely symmetrically distributed
  • If the median line sits closer to one end of the box or the whiskers are uneven in length, it indicates skewness in the data
38
Q

Why are B&W plots useful?

A
  • Summarises data visually
  • Identifies outliers
  • Compares multiple data sets
39
Q

What is variance?

A
  • Variance is the average of the squared deviations of each data point from the mean, so it measures how spread out the data is around the mean (in squared units)
40
Q

What is population variance?

A
  • Measure of variability for an entire population
41
Q

What is sample variance?

A
  • Measure of variability within a sample drawn from a larger population
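  • Because deviations are measured from the sample mean rather than the unknown population mean, dividing by n tends to underestimate the population variance; to correct this bias we divide by n - 1 instead of n, which gives a slightly larger variance (Bessel’s correction)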
42
Q

How do you calculate population variance?

A
  • Find the mean
  • Subtract the mean from each data point
  • Square each deviation
  • Sum the squared deviations
  • Divide by the total number of data points
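The five steps translate directly into Python. A sketch with a hypothetical function name and made-up data; it should agree with `statistics.pvariance` and Excel's VAR.P.

```python
def population_variance(values):
    n = len(values)
    mean = sum(values) / n                       # 1. find the mean
    deviations = [x - mean for x in values]      # 2. subtract the mean from each point
    squared = [d ** 2 for d in deviations]       # 3. square each deviation
    total = sum(squared)                         # 4. sum the squared deviations
    return total / n                             # 5. divide by N

print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```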
43
Q

How do you calculate sample variance?

A
  • Find the sample mean
  • Subtract the mean from each data point
  • Square each deviation
  • Sum the squared deviations
  • Divide by n - 1
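The sample version differs only in the final divisor. A sketch with a hypothetical name and the same made-up data; it should agree with `statistics.variance` and Excel's VAR.S.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
    return ss / (n - 1)                         # Bessel's correction

print(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 32 / 7, about 4.571
```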
44
Q

What is standard deviation?

A
  • This is the square root of variance
  • The standard deviation measures the typical spread of the data around the mean, in the same units as the data
45
Q

What can we derive from standard deviation?

A
  • A low standard deviation means that the data points tend to be closer to the mean
  • A high standard deviation indicates that the data points are spread out over a wider range of values
46
Q

How do you calculate standard deviation?

A
  • Calculate the mean
  • Subtract the mean from each data point and square the result
  • Find the average of these squared differences (divide by N for a population, or by n - 1 for a sample)
  • Take the square root of this average
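A sketch of the steps. The `sample` flag is a hypothetical addition that switches the divisor to n - 1; the data is made up.

```python
import math

def standard_deviation(values, sample=False):
    """Square root of the variance; divide by n - 1 when `sample` is True."""
    n = len(values)
    mean = sum(values) / n                          # calculate the mean
    squared = [(x - mean) ** 2 for x in values]     # squared differences from the mean
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared) / divisor)        # square root of their average

print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```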
47
Q

What is the Mean Absolute Deviation (MAD)?

A
  • MAD is a measure of the average distance between each data point and the mean of the dataset
  • Unlike SD, it focuses on the absolute differences between each value and the mean, making it simpler and less sensitive to extreme values
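A sketch of MAD about the mean, the same quantity Excel's AVEDEV returns. The function name and data are illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each data point from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```

For the same data the standard deviation is 2.0. Squaring gives the largest deviations more weight, which is why MAD is less sensitive to extreme values.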
48
Q

What is the coefficient of variation?

A
  • Measure of relative variability
  • Compares the standard deviation to the mean
  • Shows how much variation exists in relation to the mean of the dataset
  • Expressed as a percentage
49
Q

How do you calculate the coefficient of variation?

A

(Standard deviation / mean) x 100
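A sketch of the CV formula, assuming the sample standard deviation (`statistics.stdev`). The function name and both data sets are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """(standard deviation / mean) x 100, using the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

print(round(coefficient_of_variation([100, 110, 90]), 2))  # 10.0
print(round(coefficient_of_variation([1, 1.1, 0.9]), 2))   # 10.0
```

The two sets differ in scale by a factor of 100, yet both have a CV of about 10%, so relative to their means they are equally variable.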

50
Q

What are the uses of CV?

A
  • Comparing variability in datasets that have different units or scales (like comparing prices of products in different currencies)
  • Assessing relative risk (comparing the risk of two assets with different expected returns)
51
Q

Example of CV?

A

If asset A has a CV of 15% and asset B has a CV of 5%, asset A is considered riskier relative to its mean return than asset B

52
Q

What is Kurtosis?

A
  • A statistical measure that describes the shape of a distribution’s tails in relation to its overall shape.
  • It helps to understand how data is concentrated around the mean and whether there are extreme outliers or not
53
Q

What are the different types of kurtosis?

A
  • Mesokurtic
  • Leptokurtic
  • Platykurtic
54
Q

What does mesokurtic mean?

A

A normal distribution (bell curve) has a kurtosis of 3. Distributions with kurtosis near 3 are called mesokurtic

55
Q

What is leptokurtic?

A

Distributions with kurtosis > 3 are called leptokurtic. They have heavy tails and more extreme outliers, meaning data points are concentrated more in the tails and around the mean

56
Q

What is platykurtic?

A

Distributions with kurtosis < 3 are called platykurtic. These have lighter (thinner) tails, meaning the distribution is more spread out and there are fewer extreme values or outliers compared to a normal distribution

57
Q

How do you find kurtosis?

A
  • Find the mean of the data
  • Subtract the mean from each data point
  • Raise each deviation to the power of 4
  • Sum these values
  • Divide the sum by the number of data points
  • Divide this by the square of the variance
  • Subtract 3 to get the excess kurtosis (on this scale a normal distribution scores 0 rather than 3)
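A sketch of these steps, dividing by the squared population variance. The function name and data are illustrative. Excel's KURT applies a small-sample adjustment, so it will give a somewhat different number for the same small data set.

```python
def excess_kurtosis(values):
    """Population excess kurtosis, following the steps above."""
    n = len(values)
    mean = sum(values) / n                               # find the mean
    deviations = [x - mean for x in values]              # subtract the mean
    m4 = sum(d ** 4 for d in deviations) / n             # 4th powers, summed, divided by n
    variance = sum(d ** 2 for d in deviations) / n       # population variance
    return m4 / variance ** 2 - 3                        # divide by variance squared, subtract 3

print(excess_kurtosis([2, 4, 4, 4, 5, 5, 7, 9]))  # -0.21875
```

The slightly negative result suggests this small data set is mildly platykurtic.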
58
Q

What is covariance?

A

A measure of how 2 random variables change together. It indicates the direction of relationship between them

59
Q

What happens if covariance is positive?

A

Means that when one variable increases, the other tends to increase as well. They move in the same direction

60
Q

What happens if covariance is negative?

A

Means that when one variable increases, the other tends to decrease. They move in opposite directions

61
Q

How do you calculate covariance?

A
  • Find the mean of both variables
  • Subtract the mean from each value
  • Multiply the deviations for each pair of data points (deviation of X × deviation of Y)
  • Find the average of these products (divide by N for a population, or by n - 1 for a sample)
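A sketch of population covariance following these steps. The function name and paired data are made up.

```python
def population_covariance(xs, ys):
    """Average of the products of paired deviations from each variable's mean."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / n        # use n - 1 instead for a sample covariance

print(population_covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5 (move together)
print(population_covariance([1, 2, 3, 4], [8, 6, 4, 2]))  # -2.5 (move oppositely)
```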
62
Q

What is the correlation coefficient?

A

Measure that describes the strength and direction of a relationship between 2 variables
It tells you how closely the data points in a scatter plot fit a straight line and whether the variables move together or in opposite directions.

63
Q

What does the correlation coefficient range from?

A

From -1 to 1
r = 1 : Perfect positive correlation (as one variable increases, so does the other)
r = 0 : No linear correlation
r = -1: Perfect negative correlation (as one variable increases, the other decreases)

64
Q

How do you find population correlation coefficient?

A
  • Find the population means of X and Y
  • Calculate the deviations
  • Multiply the deviations of X and Y for each data point
  • Find the covariance between X and Y
  • Find the population standard deviations
  • Divide the covariance by the product of the standard deviations
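A sketch of the steps: covariance divided by the product of the population standard deviations. The function name and data are hypothetical.

```python
import math

def population_correlation(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]                       # deviations of X
    dy = [y - mean_y for y in ys]                       # deviations of Y
    cov = sum(a * b for a, b in zip(dx, dy)) / n        # population covariance
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sd_x * sd_y)

print(round(population_correlation([1, 2, 3, 4], [2, 4, 6, 8]), 10))  # 1.0
```

The divisor n cancels between numerator and denominator, so the sample and population formulas give the same r.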
65
Q

How do you calculate the average in Excel?

A

=AVERAGE

66
Q

How do you find the geometric mean in Excel?

A

=GEOMEAN

67
Q

How do you find the median in Excel?

A

=MEDIAN

68
Q

How do you find the mode in Excel?

A

=MODE (or =MODE.SNGL in newer versions; =MODE.MULT returns every mode)

69
Q

How do you find variance based on a sample in Excel?

A

=VAR.S

70
Q

How do you find population variance in Excel?

A

=VAR.P

71
Q

How do you find sample standard deviation in Excel?

A

=STDEV.S

72
Q

How do you find population standard deviation in Excel?

A

=STDEV.P

73
Q

How do you find MAD in Excel?

A

=AVEDEV

74
Q

How do you find skewness in Excel?

A

=SKEW

75
Q

How do you find kurtosis in Excel?

A

=KURT (returns excess kurtosis, so a normal distribution gives roughly 0)

76
Q

How do you find covariance in Excel?

A

=COVARIANCE.P (population) or =COVARIANCE.S (sample)

77
Q

How do you find the correlation coefficient in Excel?

A

=CORREL