Organizing, Visualizing and Describing Data Flashcards

1
Q

Calculate the Arithmetic mean (average)

A

Dividing the sum of all values in a data set by the number of values

2
Q

Calculate the Sample mean

A

Dividing the sum of all observations in the sample by the number of observations in the sample

3
Q

Calculate and explain the Geometric mean

Why are there differences between the arithmetic and geometric mean?

A

The geometric mean is the average of a set of numbers that are multiplied together: the nth root of their product.

In simple terms, it is used to calculate the effective (compound) rate per period from a multi-period holding period return.

To calculate the geometric mean of returns, follow these steps:

  1. Turn each percentage return into a growth factor (1 + return as a decimal). This is essential when any return is negative.
  2. Multiply all the growth factors together.
  3. Take the nth root of the product, where n is the total number of values.
  4. Subtract 1 from the result to turn it back into a return.

The two means differ because of variability. If all the values were the same, the means would match. Otherwise the geometric mean is lower than the arithmetic mean, and the higher the variability, the larger the difference between the two means.
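The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `geometric_mean_return` and the three returns are hypothetical, and returns are assumed to be decimals (0.10 for 10%).

```python
import math

def geometric_mean_return(returns):
    """Compound per-period return from periodic returns given as decimals."""
    # Steps 1-2: turn each return into a growth factor and multiply them.
    product = math.prod(1.0 + r for r in returns)
    # Steps 3-4: take the nth root, then subtract 1.
    return product ** (1.0 / len(returns)) - 1.0

returns = [0.10, -0.05, 0.20]  # hypothetical annual returns
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)
print(f"arithmetic: {arithmetic:.4%}, geometric: {geometric:.4%}")
```

Because the returns vary, the geometric figure comes out below the arithmetic one; with identical returns the two would match.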

4
Q

Calculate the Harmonic mean

A

The Harmonic mean tells you the average price you would pay for a share of stock over multiple periods if you invested the same amount each period.

To calculate the harmonic mean, you need to follow these steps:

  1. Add up the reciprocals of all the numbers.
  2. Divide the total number of values by the sum obtained in step 1.

Let’s use an example to understand how the harmonic mean works. Suppose we have three numbers: 2, 4, and 8. To find the harmonic mean, we’ll first find the reciprocals of these numbers and then add them together:

1/2 + 1/4 + 1/8 = 7/8

Next, we divide the total number of values (which is 3 in this case) by the sum of the reciprocals:

3 ÷ (7/8) = 24/7 ≈ 3.43

Equivalently, divide the sum of the reciprocals by the number of values and take the reciprocal of the result.
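As a rough check of the 2, 4, 8 example, here is a small Python sketch. The helper `harmonic_mean` and the 1000 invested per period are assumptions made for illustration; `statistics.harmonic_mean` is the standard-library equivalent.

```python
import statistics

def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1.0 / v for v in values)

prices = [2, 4, 8]
hm = harmonic_mean(prices)  # 3 / (7/8) = 24/7

# Investing the same (hypothetical) amount at each price gives the same figure.
amount = 1000.0
shares_bought = sum(amount / p for p in prices)
average_cost = amount * len(prices) / shares_bought

print(round(hm, 2), round(average_cost, 2), round(statistics.harmonic_mean(prices), 2))
```

The average cost per share from equal-amount purchases equals the harmonic mean of the prices, which is why it is used for that purpose.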
5
Q

Define the uses for the various means

A

Arithmetic mean. Estimate the next observation; expected value of a distribution.
Geometric mean. Compound rate of return over multiple periods.
Trimmed mean. Estimate the mean without the effects of a given percentage of outliers.
Winsorized mean. Decrease the effect of outliers on the mean. A 95% winsorized mean replaces the lowest 2.5% of values with the 2.5th percentile value and the highest 2.5% with the 97.5th percentile value.
Harmonic mean. Calculate the average share cost from periodic purchases of a fixed monetary amount.
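To make the trimmed and winsorized means concrete, here is a simplified Python sketch. It assumes the cut is expressed as a count `k` of observations at each end rather than a percentage, and the data set is hypothetical; exam-style questions usually define the cut by percentile instead.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k lowest and k highest observations."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, k):
    """Mean after replacing the k lowest and k highest observations
    with the nearest remaining value at each end."""
    ordered = sorted(values)
    n = len(ordered)
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / n

data = [1, 10, 11, 12, 13, 14, 15, 16, 20, 100]  # hypothetical, two outliers
plain = sum(data) / len(data)
trimmed = trimmed_mean(data, 1)
winsorized = winsorized_mean(data, 1)
print(plain, trimmed, winsorized)
```

Both adjusted means land much closer to the bulk of the data than the plain arithmetic mean does.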

6
Q

Calculate the Weighted mean

A

Calculating the weighted average involves:

  1. Multiplying each data point by its weight and summing those products.
  2. Then summing the weights for all data points.
  3. Finally, dividing the sum of the weight*value products by the sum of the weights.

E.g. in a portfolio, stocks are 50% of the portfolio and their return is 12%.

0.5 x 12% = 6% is the weighted contribution of stocks. Do the same for every other asset class (bonds etc.) and sum the contributions. Because portfolio weights sum to 1, that sum is the answer.
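A short Python sketch of the portfolio example follows. Only the 50% stock weight and 12% stock return come from the card; the bond and cash figures are made up to complete the illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight*value products divided by the sum of the weights."""
    total = sum(w * v for v, w in zip(values, weights))
    return total / sum(weights)

# Stocks from the card; bonds and cash are hypothetical.
asset_returns = [0.12, 0.05, 0.02]
asset_weights = [0.50, 0.40, 0.10]

portfolio_return = weighted_mean(asset_returns, asset_weights)
print(f"{portfolio_return:.2%}")
```

Dividing by the sum of the weights keeps the function correct even when the weights are not already scaled to sum to 1.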

7
Q

What is a quantile, and what are the different quantiles?

A

Quantile is the general term for a value at or below which a stated proportion of the data in a distribution lies. Examples of quantiles include the following:

Quartile. The distribution is divided into quarters.
Quintile. The distribution is divided into fifths.
Decile. The distribution is divided into tenths.
Percentile. The distribution is divided into hundredths (percents).

Any quantile can also be expressed as a percentile (e.g. the 1st quartile is the 25th percentile).

The difference between the 3rd and 1st quartiles is the interquartile range.

8
Q

Calculate the position of the observation at a given percentile

What do you need to remember?

A
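  1. Ly = (n + 1)y / 100, with the observations sorted in ascending order
  2. For fractional results such as 8.4:
  • count to the 8th value
  • subtract the 8th value from the 9th value and multiply the difference by 0.4

Sum the two figures above (the 8th value plus the interpolated amount)

Ly = position of the observation at the yth percentile

n = number of observations

y = percentile (e.g. 40 for the 40th percentile)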
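The position formula and the interpolation step can be sketched in Python as follows. The function name `percentile_value`, the 20-value data set, and the choice to clamp positions that fall outside the data are assumptions made for this illustration.

```python
def percentile_value(data, y):
    """Value at the y-th percentile using position L = (n + 1) * y / 100."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * y / 100
    lower = int(position)          # whole part, e.g. 8 for 8.4
    fraction = position - lower    # fractional part, e.g. 0.4
    # Assumption: positions outside the data are clamped to the end values.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    lower_value = ordered[lower - 1]   # positions are 1-based
    upper_value = ordered[lower]
    return lower_value + fraction * (upper_value - lower_value)

data = list(range(10, 210, 10))   # hypothetical: 10, 20, ..., 200 (n = 20)
p40 = percentile_value(data, 40)  # position 21 * 0.4 = 8.4
print(round(p40, 2))
```

With these numbers the position is 8.4, so the result is the 8th value (80) plus 0.4 of the gap to the 9th value (90).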

9
Q

Calculate ‘Range’

A

Maximum Value - Minimum Value

10
Q

Calculate the Mean Absolute Deviation (MAD)

A

The mean absolute deviation (MAD) is the average of the absolute values of the deviations of individual observations from the arithmetic mean.

  1. Calculate the mean
  2. Subtract the mean from each value and take the absolute value of each difference
  3. Sum those absolute values together
  4. Divide by the number of observations

aka: Find the average of the absolute differences from the mean
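A minimal Python sketch of these four steps, using an illustrative three-value data set (the function name is hypothetical):

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)                # step 1
    abs_devs = [abs(v - mean) for v in values]      # step 2
    return sum(abs_devs) / len(values)              # steps 3-4

mad = mean_absolute_deviation([10, 15, 20])  # (5 + 0 + 5) / 3
print(round(mad, 4))
```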

11
Q

Calculate Sample Variance

How does it differ from population variance?

A
  1. Calculate the mean
  2. Subtract the mean from each value, e.g. 11 - mean
  3. Square each difference and take the sum of the squared differences
  4. Divide by n - 1

Population variance follows the same steps but uses the population mean and divides by N rather than n - 1.

e.g.:
10, 15, 20
n = 3
Mean = 15

(10 - 15)^2 + (15 - 15)^2 + (20 - 15)^2 =

25 + 0 + 25 = 50

50 / 2 = 25
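The worked example can be checked with a few lines of Python. The hand-written `sample_variance` is only a sketch of the steps; `statistics.variance` and `statistics.pvariance` are the standard-library sample and population versions.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

data = [10, 15, 20]
s2 = sample_variance(data)                    # 50 / 2
s = math.sqrt(s2)                             # sample standard deviation
population_var = statistics.pvariance(data)   # 50 / 3, divides by N

print(s2, s, round(population_var, 4))
```

The population figure is smaller because it divides the same sum of squares by 3 instead of 2.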

12
Q

Calculate sample standard deviation

A

It is the Square Root of the Sample Variance

13
Q

Explain how issues arise comparing two measures of dispersion

A

A direct comparison between two or more measures of dispersion may be difficult. For instance, suppose you are comparing the annual returns distribution for retail stocks with a mean of 8% and an annual returns distribution for a real estate portfolio with a mean of 16%. A direct comparison between the dispersion of the two distributions is not meaningful because of the relatively large difference in their means. To make a meaningful comparison, a relative measure of dispersion must be used. Relative dispersion is the amount of variability in a distribution relative to a reference point or benchmark. Relative dispersion is commonly measured with the coefficient of variation (CV)

14
Q

Calculate relative dispersion / Coefficient of variation

How do you interpret the results?

A

CV = Standard Deviation of X / Average Value of X

Higher value = higher dispersion (risk) per unit of the mean (e.g. per unit of expected return)
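As a quick illustration in Python: the 8% and 16% means echo the retail stock and real estate example used earlier in this deck, while the standard deviations are hypothetical numbers chosen only to show the comparison.

```python
def coefficient_of_variation(std_dev, mean):
    """Dispersion per unit of the mean."""
    return std_dev / mean

# Standard deviations below are hypothetical.
retail_cv = coefficient_of_variation(0.04, 0.08)
real_estate_cv = coefficient_of_variation(0.06, 0.16)

print(round(retail_cv, 3), round(real_estate_cv, 3))
```

Under these assumed figures real estate has the larger standard deviation but the smaller CV, so it carries less dispersion per unit of return.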

15
Q

Calculate and interpret target downside deviation

A

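The calculation is similar to standard deviation, but deviations are measured from the ‘target’ value rather than the mean, and you ONLY include values at or below the target.

Remember to still use all observations for n in the division (n - 1), not just the number of values below the target.

Interpretation: it measures downside risk, i.e. the dispersion of outcomes that fall short of the target.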
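A small Python sketch of this calculation, assuming deviations are taken from the target and the divisor is n - 1 over all observations. The returns and the 3% target are hypothetical.

```python
import math

def target_downside_deviation(returns, target):
    """Root of summed squared shortfalls below the target, divided by n - 1."""
    n = len(returns)  # all observations, not just those below the target
    shortfall_sq = sum((r - target) ** 2 for r in returns if r <= target)
    return math.sqrt(shortfall_sq / (n - 1))

returns = [0.10, -0.02, 0.04, 0.01, 0.07]  # hypothetical periodic returns
tdd = target_downside_deviation(returns, target=0.03)
print(f"{tdd:.4%}")
```

Only -2% and 1% fall below the 3% target, so only their shortfalls (5% and 2%) enter the sum, yet the divisor is still 5 - 1 = 4.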

16
Q

Describe Skewness, and explain the effects on the: Mean, Median, Mode

A

For a symmetrical distribution, the mean = median = mode.

For a positively skewed, unimodal distribution, the mean > median > mode. The mean is affected by outliers; in a positively skewed distribution, there are large, positive outliers, which will tend to pull the mean upward, or more positive. An example of a positively skewed distribution is that of housing prices. Suppose you live in a neighborhood with 100 homes; 99 of them sell for $100,000 and one sells for $1,000,000. The median and the mode will be $100,000, but the mean will be $109,000. Hence, the mean has been pulled upward (to the right) by the existence of one home (outlier) in the neighborhood.

For a negatively skewed, unimodal distribution, the Mean < Median < Mode. In this case, there are large, negative outliers that tend to pull the mean downward (to the left).
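The housing example can be reproduced with Python's standard `statistics` module; this is just a check of the arithmetic in the card.

```python
import statistics

# 99 homes at $100,000 and one outlier at $1,000,000.
prices = [100_000] * 99 + [1_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
mode_price = statistics.mode(prices)

print(mean_price, median_price, mode_price)
```

Only the mean moves in response to the single expensive home; the median and mode stay at $100,000.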

17
Q

Explain Kurtosis, and the three different types of Kurtosis

A

Kurtosis is a measure of the degree to which a distribution is more or less peaked than a normal distribution. It is a shape parameter.

Leptokurtic describes a distribution that is more peaked than a normal distribution
Platykurtic refers to a distribution that is less peaked, or flatter than a normal distribution
Mesokurtic if it has the same kurtosis as a normal distribution

To interpret kurtosis, note that it is measured relative to the kurtosis of a normal distribution, which is 3. Positive values of excess kurtosis indicate a distribution that is leptokurtic (more peaked, fat tails), whereas negative values indicate a platykurtic distribution (less peaked, thin tails). We can calculate kurtosis relative to that of a normal distribution as:

excess kurtosis = sample kurtosis − 3
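For illustration, excess kurtosis can be sketched in Python using the simple moment-based definition (fourth central moment over squared variance). This is an assumption: textbook sample formulas often add small-sample corrections, and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3.0

flat = [1, 2, 3, 4, 5]                        # evenly spread values
peaked = [3, 3, 3, 3, 3, 3, 3, 3, -7, 13]     # tight center, two extremes

print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(peaked), 2))
```

The evenly spread data give negative excess kurtosis (platykurtic), while the data with a tight center and two extreme values give positive excess kurtosis (leptokurtic).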

18
Q

What are time series, cross-sectional and panel data?

A

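Time series - observations of one variable across time. E.g. one firm's EPS across quarters.

Cross-sectional - observations across many entities at a single point in time. E.g. EPS for 12 firms for the last quarter.

Panel - both: several entities observed across several time periods.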

19
Q

What is numerical and categorical data?

A

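Numerical = numbers (measured or counted quantities); you can perform calculations on them

Categorical = labels that describe a quality or group rather than a quantity; arithmetic on them is not meaningful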

20
Q

Structured vs unstructured data?

A

Structured
Organized in a defined format (e.g. rows and columns), so analysis can be performed on it directly. E.g. COGS

Unstructured
No defined format: text, audio, video

21
Q

Discrete vs continuous?

What does this data relate to?

A

Discrete = numerical data limited to a countable number of values

Continuous = numerical data that can take any value within a range (infinitely many possible values)

Both relate to numerical data

22
Q

Nominal vs ordinal?

What does this data relate to?

A

Nominal = names or labels where order doesn’t make sense

Ordinal = categories that can be ranked in a logical order

Both relate to categorical data

23
Q

What are the mean, median and mode?

What is a benefit of using the median as opposed to the mean?

A

Mean - average
Median - middle number of the sorted data
Mode - most frequent outcome (there can be more than one when outcomes occur equally often: one mode = unimodal, two modes = bimodal)

Benefit: the median is less affected by extreme values

24
Q

What are the different types of frequencies?

What should you remember about cumulative frequency?

A

Frequency - count of values in a single bracket
Relative frequency - frequency as a % of all values

Cumulative frequency - count of values in a bracket and in all brackets below it
Cumulative relative frequency - cumulative frequency as a % of all values
Remember: cumulative frequency counts all values ‘less than’ the upper limit of the bracket
Remember: Z tables are cumulative relative frequencies
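The four frequency types can be sketched in Python on a small hypothetical data set in which each distinct value is treated as its own bracket.

```python
from collections import Counter
from itertools import accumulate

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical observations

counts = Counter(scores)
brackets = sorted(counts)
freq = [counts[b] for b in brackets]          # frequency per bracket
n = sum(freq)
rel_freq = [f / n for f in freq]              # relative frequency
cum_freq = list(accumulate(freq))             # running total of frequencies
cum_rel_freq = [c / n for c in cum_freq]      # cumulative relative frequency

for row in zip(brackets, freq, rel_freq, cum_freq, cum_rel_freq):
    print(row)
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1 (100%).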

25
Q

What is a contingency table?

What are the totals on a contingency table called?

A

A two-dimensional array that displays the joint frequencies of two variables

E.g. street names and days (car accidents).

Totals: marginal frequencies (the row and column totals)

26
Q

How do you calculate the nth root on a BA II plus?

A
  1. Input the number you want to take the nth root of
  2. Press y^x, then enter n
  3. Press 1/x (this turns n into 1/n), then press =
  4. Subtract 1 (if you are calculating the geometric mean of returns).
27
Q

Explain what a box and whisker plot shows you

A
  • The range (the whiskers run from the lowest to the highest value)
  • The interquartile range (the box runs from the 1st to the 3rd quartile)
  • The median (the line inside the box)
28
Q

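What is a calculator shortcut for calculating the harmonic mean?

A

Enter each value and press 1/x, adding the reciprocals together: x1 -> 1/x + x2 -> 1/x + ...

Then divide the number of values by that sum.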

29
Q

What is skew? What values of skew are significant?

A

It is a measure of the degree to which a distribution lacks symmetry.

A symmetrical distribution has skew = 0.

Absolute values of skew above 0.5 are considered significant.

30
Q

What is Kurtosis?

What is the kurtosis of a normal distribution?

What does positive excess kurtosis mean?

What is the significance of fat tails? What type of kurtosis is this?

What is excess kurtosis?

A

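Kurtosis measures the degree to which a distribution is more or less peaked than a normal distribution.

The kurtosis of a normal distribution is 3.

Positive excess kurtosis means the distribution is more peaked, with fatter tails, than a normal distribution (leptokurtic).

Fat tails mean extreme outcomes are more likely than under a normal distribution; this is leptokurtic. Negative excess kurtosis means the tails are thinner (platykurtic).

Excess kurtosis is kurtosis - 3.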
