Quantitative methods (legacy PREREQ1-LM1) Flashcards

1
Q

What are the 3 rules of money?

A
  1. Money soon is worth more than money later
  2. Larger cash flows are worth more than smaller
  3. Less risky cash flows are worth more than more risky
2
Q

What are the 3 ways of thinking about interest rates?

A
  1. Required rate of return: RoR required by an investor or lender.
    Money today * (1 + r) = money tomorrow
  2. Discount rate: rate at which some future value is discounted to arrive at a value today
    Money tomorrow / (1 + r) = Money today
  3. Opportunity cost: the value an investor or lender forgoes by choosing a particular action.
    I.e., r is the opportunity cost of current consumption

Typically required rate of return = discount rate = opportunity cost

3
Q

What 4 premiums will be built into the rate of return if I lend someone money, on top of the risk-free rate?

A
  1. Inflation premium: compensates for expected inflation (π^e)
  2. Default risk premium: compensates lender for credit risk
  3. Liquidity premium: compensation for risk of loss versus fair value if an investment needs to be converted to cash quickly
  4. Maturity premium: greater interest rate risk (i.e., price risk) with longer maturities. This is because as yields increase, bond prices fall, and longer-maturity bonds are more sensitive to this. So if yields increase, your bond may be devalued.
    This will also include a premium for inflation.
    It is ultimately due to uncertainty: the longer the time period, the more uncertain we are about the level of expected inflation

Ideally these would be multiplicative rather than additive, but additive is just fine

4
Q

What is the nominal risk-free rate?

A

r_f + π^e = nominal risk-free rate

Where r_f is the real risk-free rate
and π^e is the inflation premium

The nominal risk-free rate might be measured by something like the return on a US Treasury 3-month T-bill
It builds in an inflation premium as well as the underlying real risk-free rate

5
Q

What does it mean to say that r must be in the same periodicity as N when calculating the future value of a single cash flow?

A

r represents the interest rate, N represents the number of periods

If the interest rate is 6% per year over 10 years with annual periodicity, the future value of 100 is 100(1.06)^10

If it had semi-annual periodicity it would be 100(1.03)^20

If it had quarterly periodicity it would be 100(1.015)^40

These will result in different values so we need to match the periodicity
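
A quick sketch of the periodicity point in Python. The function name `future_value` and its parameters are illustrative (not from the course material), and it assumes the 6% is a stated annual rate split evenly across the compounding periods.

```python
def future_value(pv, annual_rate, years, periods_per_year=1):
    """Future value of a single cash flow with r and N in matching periodicity."""
    periodic_rate = annual_rate / periods_per_year  # r per compounding period
    n_periods = years * periods_per_year            # N counted in those same periods
    return pv * (1 + periodic_rate) ** n_periods


# Same 6% stated rate and 10 years, three different periodicities
for label, m in (("annual", 1), ("semi-annual", 2), ("quarterly", 4)):
    print(label, round(future_value(100, 0.06, 10, m), 2))
```

More frequent compounding gives a slightly higher future value, which is why r and N must be converted together.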

6
Q

How do we calculate FV?

A

Future Value = Present Value x (1 + r)^N

Where r = interest rate
N = number of periods

7
Q

What is simple interest?

A

Interest calculated on the original amount
Contrast with compound interest, which is calculated on the balance at the end of the previous period (principal plus accumulated interest)

E.g., 5% simple interest on £1000 over 20 years would earn 0.05 * 1000 * 20 = £1000 of interest, for a total of £2000
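
A minimal sketch contrasting the two, using the same £1000 at 5% for 20 years. Both function names are made up for illustration.

```python
def simple_interest_total(principal, rate, years):
    """Principal plus interest earned on the original amount only."""
    return principal * (1 + rate * years)


def compound_interest_total(principal, rate, years):
    """Principal plus interest earned on the previous period's balance."""
    return principal * (1 + rate) ** years


print(simple_interest_total(1000, 0.05, 20))               # interest on the original 1000 only
print(round(compound_interest_total(1000, 0.05, 20), 2))   # interest on interest as well
```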

8
Q

How do you calculate future value of £10m you receive in 5yrs and invest at 9% RoR for 10 years?

A

It doesn’t matter when you receive it: it is still money invested for 10 years. The future value is as of year 15 (5 years until receipt plus 10 years invested).
Method 1: FV = 10m(1.09)^10 = 23.7m
Method 2: N=10, I/Y = 9, PMT = 0
PV = -10m
CPT FV = 23.7m

To calculate value of the 10m today using this interest rate, we can discount it by (1.09)^5 and divide 10m by this amount
10m / (1.09)^5 = 6.5m

9
Q

How are interest rates stated?

A

Rates are ALWAYS quoted annually
That means if you see a 3-month T-bill yielding 3%, you do not get 3%, only 1/4 of 3% across the 3 months (which is 0.75%)

rs = stated interest rate

10
Q

How do we calculate value of $1m held over 1 year with a rate of 3% that is compounded monthly?

A

FV = 1m (1 + (3% / 12)) ^ (12 x 1) ≈ $1,030,416

11
Q

What is continuous compounding?

A

Continuous compounding is the limit of compounding more and more frequently. In practice it is an easier way of implementing the idea of daily compounding, which can get clunky to use (dividing rates by 365)

We have to use Euler’s number, e, for continuous compounding. We multiply the present value by e to the power of rate x number of periods

FV = PV x e ^(r x N)
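
A small sketch showing that daily compounding lands very close to the continuous formula. The £1000 at 5% for 10 years is a made-up example, and the function names are illustrative.

```python
import math


def fv_continuous(pv, stated_rate, years):
    """FV = PV x e^(r x N)."""
    return pv * math.exp(stated_rate * years)


def fv_discrete(pv, stated_rate, years, periods_per_year):
    """FV with the stated rate compounded a fixed number of times per year."""
    return pv * (1 + stated_rate / periods_per_year) ** (periods_per_year * years)


continuous = fv_continuous(1000, 0.05, 10)
daily = fv_discrete(1000, 0.05, 10, 365)
print(round(continuous, 2), round(daily, 2))
```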

12
Q

What do we press on the calculator to calculate using continuous compounding the future value of 50,000 at an interest rate of 7% held for 3 years?

A

On the calculator we do:
0.07 x 3 = , then 2nd LN (which gives e^x), then x 50 000 =
We must press equals first because 0.07 x 3 should effectively be in brackets
The result is about 61 684

13
Q

How to calculate stated rate if we know EAR?

A

If we know the Effective Annual Rate (EAR), we can work backwards to find the stated annual rate when we also know the periodicity.

Let’s say that we have an EAR of 10% with monthly compounding

0.1 = (1 + rs/12)^12 - 1
1.1 = (1 + rs/12)^12
(1.1)^(1/12) = 1 + rs/12
(1.1)^(1/12) - 1 = rs/12
((1.1)^(1/12) - 1) x 12 = rs
0.0957 = 9.57% = rs

9.57% = stated rate
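
The same algebra as a sketch in Python. The function names are illustrative, and the second function is only there to check the round trip.

```python
def stated_from_ear(ear, periods_per_year):
    """Stated annual rate implied by an EAR at a given compounding frequency."""
    return periods_per_year * ((1 + ear) ** (1 / periods_per_year) - 1)


def ear_from_stated(stated_rate, periods_per_year):
    """EAR = (1 + rs/m)^m - 1."""
    return (1 + stated_rate / periods_per_year) ** periods_per_year - 1


rs = stated_from_ear(0.10, 12)
print(round(rs, 4))                       # stated rate for a 10% EAR, monthly
print(round(ear_from_stated(rs, 12), 4))  # back to the EAR we started from
```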

14
Q

How do we calculate stated rate if we have EAR using continuous compounding?

EAR = 5.5%

A

0.055 = e^rs - 1
1.055 = e^rs
ln(1.055) = rs
0.0535 = rs
5.35% = rs = stated return
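
A one-line check of the continuous case, as a sketch. The function name is made up for illustration.

```python
import math


def stated_from_ear_continuous(ear):
    """Invert EAR = e^rs - 1 to get rs = ln(1 + EAR)."""
    return math.log(1 + ear)


rs = stated_from_ear_continuous(0.055)
print(round(rs, 4))
print(round(math.exp(rs) - 1, 4))  # recovers the 5.5% EAR
```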

15
Q

What is an annuity?

A

A finite set of level sequential cash flows

Something cannot be an annuity if:
- the cash flow differs
- some years are missed out
- the cash flows do not have an end date

16
Q

What is an ordinary annuity?
What is an annuity due?

A

Ordinary annuity: where the first cash flow happens at the end of the first period
Annuity due: where the first cash flow happens at the beginning of the first period

This matters because each annuity due payment arrives one period earlier, so it earns one extra period of interest

17
Q

How do you calculate the future value of an ordinary annuity?

A

Enter number of years and press N
Enter payment amount and press PMT
Enter rate and press I/Y
Enter present value and press PV (0 if there is no starting balance)
Press CPT FV to calculate

18
Q

How do you calculate the future value of an annuity due?

A

An annuity due starts paying in from the beginning of the first year, rather than the end
This means that interest can accrue over the year starting from t=0
The first payment can therefore grow for all n periods of the annuity, rather than n-1

You can calculate the future value of an annuity due by calculating the future value of an ordinary annuity and multiplying it by (1+r) to account for that additional period of compounding

You could also enter Begin mode (BGN) on your financial calculator to perform the annuity due calculation. However it might be sometimes inconvenient or lead to errors if you keep flipping back and forth. Therefore MM keeps his calculator in END mode and just multiplies at the end
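
A sketch of the "END mode, then multiply by (1+r)" approach. The closed-form formula is the standard one for an ordinary annuity; the function names and the 1000 / 7% / 6-year inputs are illustrative assumptions.

```python
def fv_ordinary_annuity(pmt, r, n):
    """FV at the date of the last payment, payments at the END of each period."""
    return pmt * ((1 + r) ** n - 1) / r


def fv_annuity_due(pmt, r, n):
    """Payments at the START of each period: one extra period of compounding."""
    return fv_ordinary_annuity(pmt, r, n) * (1 + r)


print(round(fv_ordinary_annuity(1000, 0.07, 6), 2))
print(round(fv_annuity_due(1000, 0.07, 6), 2))
```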

19
Q

How do you calculate the future value of unequal cash flows?

A

It can be calculated manually or using the calculator functions
Calculating it manually involves compounding each cash flow forward over the number of years it has left to earn interest, i.e., multiplying it by (1 + r)^(years remaining). Then adding these future values together.

We can use the calculator function NFV to find the future value of a series of cashflows. However not all calculators have this function. If they do, it works the same as the NPV function in terms of inputs
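
The manual method as a sketch. It assumes cash flows arrive at the end of years 1, 2, 3, ... and that the future value is measured at the date of the last cash flow; the function name and the sample cash flows are illustrative.

```python
def fv_cash_flows(cash_flows, r):
    """Compound each end-of-year cash flow to the date of the final cash flow."""
    n = len(cash_flows)
    return sum(cf * (1 + r) ** (n - t)          # n - t years left to earn interest
               for t, cf in enumerate(cash_flows, start=1))


# 10 000 compounds for 2 years, 20 000 for 1 year, 30 000 not at all
print(round(fv_cash_flows([10_000, 20_000, 30_000], 0.04), 2))
```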

20
Q

How do you calculate the present value of a single cash flow?

A

Multiply the single cash flow (the future value) by:
(1 + r)^-N

Which is the same as:
FV / (1 + r)^N

21
Q

How do you calculate the present value of a series of cash flows for an ordinary annuity?

A

PV = A [(1 - 1 / {1 + r}^N) / r ]

Where PV = Present Value
A = annuity payment per period
r = rate of return / interest rate
N = number of years

i.e., if A = 1000 (payment in per year)
r = 0.07
N = 6

PV = 1000 * [(1 - 1 / 1.07^6) / 0.07]
PV = 4 767
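
The worked example above as a short sketch. The function name is illustrative.

```python
def pv_ordinary_annuity(pmt, r, n):
    """PV = A x [(1 - 1/(1+r)^N) / r], payments at the end of each period."""
    return pmt * (1 - (1 + r) ** -n) / r


print(round(pv_ordinary_annuity(1000, 0.07, 6)))
```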

22
Q

How do you calculate the present value of a series of cash flows for an annuity due?

A

It is the initial payment plus the PV of an ordinary annuity covering the remaining N - 1 payments

I.e.,

PV = A + A [(1 - 1 / {1 + r}^[N-1]) / r ]

Where PV = Present Value
A = annuity payment per period
r = rate of return / interest rate
N = number of payments

Equivalently, PV of an annuity due = PV of an ordinary annuity x (1 + r)
You can also use the time value of money keys to calculate this
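
A sketch checking that "first payment plus an N-1 ordinary annuity" agrees with "ordinary annuity x (1 + r)". Function names and the 1000 / 7% / 6-payment inputs are illustrative.

```python
def pv_ordinary_annuity(pmt, r, n):
    """PV of n end-of-period payments."""
    return pmt * (1 - (1 + r) ** -n) / r


def pv_annuity_due(pmt, r, n):
    """First payment today, plus an ordinary annuity of the other n - 1 payments."""
    return pmt + pv_ordinary_annuity(pmt, r, n - 1)


due = pv_annuity_due(1000, 0.07, 6)
shortcut = pv_ordinary_annuity(1000, 0.07, 6) * 1.07
print(round(due, 2), round(shortcut, 2))
```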

23
Q

What is a perpetuity?

A

An annuity that pays out forever.

The cash flows from a perpetuity are:
- level
- sequential
- infinite

We can find the present value of a perpetuity by dividing the amount the perpetuity pays per year by r
where r is the discount rate

I.e., if our perpetuity pays out £100 per year and the discount rate is 5%, the present value is 100/0.05 = £2000

Valued one period before its first payment, a perpetuity is always worth (payment / r), no matter when it starts. If it starts later, discount that amount back to today to get its present value.

24
Q

How can we create a 7-year annuity from perpetuities?

A

We find 2 perpetuities that are identically matched. They pay out the same amount each period. However, one starts at t=0 (first payment at t=1) and the other starts at t=7 (first payment at t=8). We go long the first one and short the second.

Through t=7, we are only exposed to cash flows from the long perpetuity. This gives us the 7 annuity payments. From t=8 onwards, we pay what we owe on the short perpetuity using the cash from the long perpetuity. These balance out, and we are left with net zero cash flows.
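
A sketch of the long/short construction in present-value terms, checked against the ordinary annuity formula. It assumes each perpetuity's first payment comes one period after its start date; the function names and the £100 at 5% inputs are illustrative.

```python
def pv_perpetuity(payment, r):
    """Value one period before the first payment."""
    return payment / r


def pv_annuity_from_perpetuities(payment, r, n):
    long_leg = pv_perpetuity(payment, r)                     # starts at t=0
    short_leg = pv_perpetuity(payment, r) / (1 + r) ** n     # starts at t=n, discounted to t=0
    return long_leg - short_leg


def pv_ordinary_annuity(payment, r, n):
    return payment * (1 - (1 + r) ** -n) / r


print(round(pv_annuity_from_perpetuities(100, 0.05, 7), 2))
print(round(pv_ordinary_annuity(100, 0.05, 7), 2))
```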

25
Q

How do we calculate the present value of a series of unequal cash flows using the NPV function on a financial calculator?

A

First clear the calculator pressing 2nd CF 2nd CE/C in order.

Then your screen will say CF0. This is the cash flow at the beginning.
Where there is no value press 0 and then the down arrow
In years where there is a value write the amount, press enter, and down arrow twice
At the last cash flow, press the down arrow once and then hit NPV
Then I will be displayed. This is the discount rate. Enter a number, then press enter and the down arrow.
Then press NPV and CPT

Your value will be displayed

26
Q

Why might you calculate the present value of a series of unequal cash flows manually rather than using the calculator’s NPV function?

A

It’s almost the same number of keystrokes
You could simply divide each amount by (1 + the discount rate) to the power of the number of years of discounting

i.e., 10 000 / (1.04)
+ 20 000 / (1.04)^2
+ 30 000 / (1.04)^3
= PV
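
The manual sum above as a sketch, assuming the three cash flows arrive at the end of years 1, 2 and 3. The function name is illustrative.

```python
def pv_cash_flows(cash_flows, r):
    """Discount each end-of-year cash flow back to today and add them up."""
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows, start=1))


print(round(pv_cash_flows([10_000, 20_000, 30_000], 0.04), 2))
```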

27
Q

How do we determine a growth rate given FV, PV, and N?

A

r = (FV/PV)^(1/N) - 1

I.e., let’s say future value = 2 000 000
present value = 450 000
N = 20 years

(2 000 000 / 450 000)^0.05 - 1 = 0.077 = 7.7%

By contrast, if FV = 1 500 000, PV = 550 000, N = 25 years:

(1 500 000 / 550 000)^0.04 - 1 = 4.1%

We can also use this to determine growth rate per year of a financial metric of a company found on its financial statements
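
Both examples above as a sketch. The function name is illustrative.

```python
def growth_rate(pv, fv, n):
    """r = (FV/PV)^(1/N) - 1."""
    return (fv / pv) ** (1 / n) - 1


print(f"{growth_rate(450_000, 2_000_000, 20):.1%}")
print(f"{growth_rate(550_000, 1_500_000, 25):.1%}")
```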

28
Q

How do we solve for N? I.e., how long would it take to turn £100 into £500 at 10% compounded annually?

A

Solve for N:
FV = PV (1+r)^N
(1+r)^N = FV/PV
N ln (1+r) = ln (FV / PV)
N = ln (FV / PV) / ln (1 + r)

In this case:
N = ln (500 / 100) / ln (1+0.1)
N = 16.89
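
The same rearrangement as a sketch. The function name is illustrative.

```python
import math


def periods_needed(pv, fv, r):
    """N = ln(FV/PV) / ln(1 + r)."""
    return math.log(fv / pv) / math.log(1 + r)


n = periods_needed(100, 500, 0.10)
print(round(n, 2))
print(round(100 * 1.10 ** n, 2))  # compounding for N periods gets back to 500
```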

29
Q

How would you determine what your monthly payment for a £500,000 mortgage would be at a 4% interest rate compounded monthly over 20 years?

A

You can simply use the annuity formula!

If PV = A [(1 - 1 / {1 + r}^N) / r ]
then A = PV / [(1 - 1 / {1 + r}^N) / r ]

Make sure to modify the periodicity by dividing the interest rate by 12 and multiplying N by 12:

A = 500 000 / [(1 - 1 / {1 + 0.04/12}^(12*20)) / (0.04/12) ]
A = 500 000 / 165.022
A = £3 030 per month
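
The mortgage calculation as a sketch, assuming level end-of-month payments. The function name and parameter names are illustrative.

```python
def level_payment(loan, annual_rate, years, periods_per_year=12):
    """Rearranged ordinary annuity formula: A = PV / annuity factor."""
    r = annual_rate / periods_per_year        # periodic rate
    n = years * periods_per_year              # number of payments
    annuity_factor = (1 - (1 + r) ** -n) / r
    return loan / annuity_factor


print(round(level_payment(500_000, 0.04, 20)))
```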

30
Q

How do we solve for a payment to meet a retirement goal?
I just turned 23.
At age 53, I want to retire.
I expect to live for 30 years, until 83.
I want to receive £40,000 per year during this time period.
For the next 5 years, I can save £2000 per year
From 28 onwards, how much do I need to save per year to hit my retirement goal?
Assume our return is 6.25% (1/16)

Solve by bringing the value of the retirement income back.

A

First let’s calculate the value of the initial payments on my 28th birthday:
FV5: N=5, PMT=2000, I/Y=6.25, PV=0 CPT FV
= £11,331

Then let’s calculate the value of the future retirement income when I hit 53:
PV30: N=30, PMT=40 000, I/Y=6.25, FV=0 CPT PV = 536 173

Third, let’s discount the value of the future retirement income back to my 28th birthday:
PV5 = PV30 / (1.0625)^25
PV5 = £117 783

Now let’s see how far short I am:
117 783 - 11 331 = 106 452

The PV (at 28) of my savings from 28 to 53 must therefore equal £106,452

N=25, FV = 0, I/Y = 6.25, PV = 106,452 CPT PMT = 8 526

So I would need to pay in £8 526 per year from 28 to 53 to hit my retirement goal
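
A sketch that replays the steps of this "bring the retirement income back" method. It assumes all payments are end-of-year; the helper and variable names are illustrative.

```python
def fv_annuity(pmt, r, n):
    return pmt * ((1 + r) ** n - 1) / r


def pv_annuity(pmt, r, n):
    return pmt * (1 - (1 + r) ** -n) / r


r = 0.0625
saved_at_28 = fv_annuity(2000, r, 5)                # first 5 years of saving
needed_at_53 = pv_annuity(40_000, r, 30)            # value of retirement income at 53
needed_at_28 = needed_at_53 / (1 + r) ** 25         # discounted back 25 years
shortfall_at_28 = needed_at_28 - saved_at_28
required_saving = shortfall_at_28 / pv_annuity(1, r, 25)  # level payment, ages 28 to 53

print(round(required_saving))
```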

31
Q

How do we solve for a payment to meet a retirement goal?
I just turned 23.
At age 53, I want to retire.
I expect to live for 30 years, until 83.
I want to receive a nominal £100,000 per year during this time period.
For the next 10 years, I can save a nominal £4000 per year
From 33 onwards, how much do I need to save per year to hit my retirement goal?
Assume our return is 8%

Solve by bringing the value of initial payments forward.

A

First let’s calculate the value of the initial payments on my 33rd birthday:
FV10: N=10, PMT=4000, I/Y=8, PV=0 CPT FV
= £57,946

Second, let’s calculate the value of the future retirement income when I hit 53:
PV30: N=30, PMT=100 000, I/Y=8, FV=0 CPT PV = £1,125,778

Third, let’s compound the value of the initial savings forward to when I retire:
FV at 53 = FV10 x (1.08)^20 = £270 083

Fourth, let’s find the difference:
1,125,778 - 270 083 = 855 695

Fifth, let’s calculate what annual payment I would need to make over the last 20 years of working to reach this figure:
N=20, FV=855 695, I/Y=8, PV=0 CPT PMT = £18,699

Thus I would need to save £18,699 (nominal) every year whilst working from 33 to 53 to meet my retirement goals.
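
A sketch of the "bring the initial payments forward" method, assuming end-of-year payments throughout. Helper and variable names are illustrative.

```python
def fv_annuity(pmt, r, n):
    return pmt * ((1 + r) ** n - 1) / r


def pv_annuity(pmt, r, n):
    return pmt * (1 - (1 + r) ** -n) / r


r = 0.08
saved_at_33 = fv_annuity(4000, r, 10)               # first 10 years of saving
saved_at_53 = saved_at_33 * (1 + r) ** 20           # compounded forward to retirement
needed_at_53 = pv_annuity(100_000, r, 30)           # value of retirement income at 53
gap_at_53 = needed_at_53 - saved_at_53
required_saving = gap_at_53 / fv_annuity(1, r, 20)  # level payment, ages 33 to 53

print(round(required_saving))
```

Small differences from hand calculations come from rounding intermediate values.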

32
Q

What is data?

A

A collection of numbers, characters, words or text that represents FACTS or INFORMATION

Thus,
1. Data is not knowledge
Analysis or interpretation brought to data brings knowledge
2. Data does not have to be numbers

33
Q

What are the four types of data?

A

NOIR

Categorical data:
- Values that describe a quality or characteristic
- Mutually exclusive labels or groups (something cannot belong to more than one category)

Numerical data:
- Measured or counted quantities
- Quantitative

Categorical:
N for Nominal (no logical order)
O for Ordinal (has a logical order or rank, with gaps or groups of any size)

Numerical:
I for Integer (Discrete): limited to a finite or countable number of values
R for Ratio (Continuous): can take on any value within a range

34
Q

What is the difference between cross-sectional, time series, and panel data?

A

Cross-sectional data involves multiple observations of a particular variable across different observational units at a single point in time. I.e., the stock prices of 60 companies on one date. In this case N=60

Time series data involves multiple observations of a particular variable for the same observational unit over time. For example, GM’s stock price over the last 60 months

Panel data is a combination of cross sectional and time series data. It might involve multiple observations of a particular variable (stock price of 60 companies) across a period of time (60 months).
Putting time down the rows (y axis) and companies along the columns (x axis) creates a data table of panel data.

35
Q

What is a variable?

A

A particular quality or characteristic we are tracking, like stock price or height

36
Q

What is an observation?

A

The value of a specific variable. E.g., GM at $53.50 (where the variable is stock price)
Tom at 93kg (where the variable is a person’s mass)

37
Q

What is the difference between structured and unstructured data?

A

Structured data is highly organised in a pre-defined manner. I.e., stock prices, returns, earnings per share

Unstructured data has no organised form. E.g., news, social media posts, company filings, audio/video

Unstructured data is also sometimes called alternative data. It can be produced by individuals, business processes (credit card transactions), or generated by sensors. To be useful in data analysis, it must be transformed into structured data. This is one of the things machine learning is used for: it adds structure to unstructured data and gets progressively better at doing this.

38
Q

What is a one dimensional array? What is a two dimensional array?

A
  • A 1D array is a column of a spreadsheet showing observations for 1 variable. It could be cross sectional or time series data.
  • A 2D array is a rectangular array showing two or more variables. It is also known as a data table. It could be cross sectional or panel data.
39
Q

What is frequency distribution in the context of one way tables?

A

The number of observations of a specific value or group of a variable. I.e., how many are there in that category. Frequency could also be relative, i.e., number in that category as a % of number in all categories.
It is sorted in ascending or descending order

40
Q

How do we assess frequency when dealing with numerical data?

A

We create bins (aka non overlapping intervals)

  1. Sort data in ascending order
  2. Find the range: max minus min
  3. Decide on the number of intervals (k)
  4. Interval width = range / k (we always round up)

Be careful when choosing k. Too few intervals leads to too much aggregation and loss of info
Too many results in insufficient aggregation, and too much noise included (i.e., only one observation in each interval)
You may have to play around with different k values to choose a good one that gives you the right amount of information. The ML algorithm is only as good as the data you give it
Interval no.1 runs from the min value to the min value + width. When we specify intervals, a square bracket means it includes the value adjacent to it, a round bracket means it does not. i.e., (0,5] will include 5 but not 0.

41
Q

How do we determine the size of the bins when attempting to turn continuous data into discrete data?

A
  1. Arrange the data in ascending order
  2. Subtract low from high to get range
  3. Let k = a chosen number. Divide the range by k to get bin size
  4. Sort the data into each bin. Count the number falling into each.

If the data is too concentrated (ie a majority falling into one bin) or too spaced out (i.e. most bins have nothing or only one data point in) adjust k accordingly
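
The four steps above can be sketched as follows. This is only one way to do it: the function name and sample data are made up, the bin width is rounded up to a whole number, and bins are treated as [low, high) with the last bin also including its upper edge (an assumption of this sketch, so the maximum always gets counted).

```python
import math


def frequency_table(values, k):
    """Return [(low, high, count), ...] for k equal-width bins. Needs max > min."""
    data = sorted(values)                        # step 1: ascending order
    value_range = data[-1] - data[0]             # step 2: high minus low
    width = math.ceil(value_range / k)           # step 3: bin size, rounded up
    counts = [0] * k
    for x in data:                               # step 4: count per bin
        index = min(int((x - data[0]) // width), k - 1)
        counts[index] += 1
    return [(data[0] + i * width, data[0] + (i + 1) * width, c)
            for i, c in enumerate(counts)]


for low, high, count in frequency_table([1, 3, 4, 7, 8, 9, 12, 15, 18, 20], k=4):
    print(low, high, count)
```

Re-running with a different k is a quick way to see whether the bins are too concentrated or too sparse.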

42
Q

What is a cumulative frequency?

A

A sequence of partial sums that sum to N or 100%

So when you move from one bin to the next you add all those in previous bins (i.e., when going from bottom to top value)
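
A sketch of the running total, using made-up bin counts listed from the lowest bin to the highest. The function name is illustrative.

```python
from itertools import accumulate


def cumulative_frequencies(counts):
    """Return (absolute, relative) cumulative frequencies for ordered bin counts."""
    absolute = list(accumulate(counts))          # partial sums, ending at N
    total = absolute[-1]
    relative = [c / total for c in absolute]     # partial sums, ending at 100%
    return absolute, relative


absolute, relative = cumulative_frequencies([3, 3, 2, 2])
print(absolute)
print(relative)
```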

43
Q

What is a contingency table?

A

Summarises data for 2 or more categorical variables
Helps us visually find patterns
A 2 way table will have 2 variables

One variation might see 3 bins for small, mid, and large market capitalisation. These labels will be shown across the top
Then down the y axis we would see the categories of different sectors (nominal data) like communication services, consumer staples, energy etc. That way we could see where the concentration is within our data set. Along the right hand side and bottom we might see totals to compare
Every entry in the table is called a joint frequency
You could also express each item as a percentage of row total, column total, or overall total for comparability

We can pull a lot of information about a portfolio just by breaking it down and using a contingency table like this

44
Q

What are some applications of contingency tables?

A
  1. Confusion matrix
    Used to help assess the precision of a classification model i.e. in ML in Level
  2. Identify potential association between 2 categorical variables

For example, we can use a contingency table to help conduct a “chi square test of independence”

We would develop 2 tables, one where we just input actual values (i.e. low or high risk across, growth or value stock down), and one where we write down what we would expect to see for each value in this matrix

Then we would do the sum of [(observed - expected)^2 / expected]

The greater the Chi squared value, the stronger the evidence that there is an association between the tested variables (i.e., that they are not independent)
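
A sketch of the chi-square sum for a 2-way table. It assumes the usual expected value under independence, (row total x column total) / overall total, which the card does not spell out; the function name and the observed counts are made up.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # assumes independence
            statistic += (obs - expected) ** 2 / expected
    return statistic


# rows: growth, value; columns: low risk, high risk (hypothetical counts)
print(round(chi_square_statistic([[30, 10], [20, 40]]), 2))
```

The statistic would then be compared with a critical value for the chosen significance level.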

45
Q

What is a histogram?

A

Used to present distributions of numerical data. Useful when we want to compare to a distribution with well defined properties, such as a normal or lognormal distribution, and look for kurtosis and skewness

46
Q

What is a frequency polygon?

A

Created by joining together the midpoints of the tops of the histogram bars, giving you an understanding of the distribution without calculations.

Can also be in the form of cumulative frequency. Adds from low to high. Can see where the most observations are

47
Q

What is a bar chart?

A

For categorical rather than numerical data. Can be horizontal or vertical, stacked showing decomposition, or grouped when there are 2 variables (including a nominal and a numerical observation)

48
Q

What is a tree map?

A

A set of coloured rectangles used to represent groups. Area = % of that group. We can have nested rectangles to decompose.

49
Q

What is a word cloud?

A

Depicts frequency of words in unstructured data, i.e., text. Colour can be used to display sentiment or simply distinguish between words

50
Q

What is a line chart?

A
  • Line chart. Used to visualise ordered observations. Typically used for time series data, to show changes and underlying trends. We could add other characteristics by also adding bubbles (i.e. EPS along stock price), in colours to show positive or negative EPS
51
Q

What is a scatter plot?

A

Used to visualise joint variation in 2 numerical variables. There may be no relationship, a linear relationship, or a non-linear relationship. A scatter plot matrix can be used to assess pairwise association between many variables. Many scatter plots will be laid next to one another so you can spot if there are any trends

52
Q

What is a heat map?

A

A contingency table with colour coded cells
Can be generated in a BB terminal
Can also be used to visualise the degree of correlation among different variables

53
Q

How should we select among visualisation types?

A

Decide whether you are looking to explore/present a relationship, distribution, or comparison

If a relationship, look toward scatter plots, heat maps

If a distribution of numerical data, look at histograms, frequency polygons, and cumulative distribution charts
If a distribution of categorical data, look at bar charts, tree maps, and heat maps
If a distribution of unstructured data, use a word cloud

If a comparison among categories, try a bar chart, tree map, or heat map
If a comparison over time, try a line chart

Heat maps are particularly versatile!

54
Q

What are the potential pitfalls when selecting visualisation type?

A
  1. Selecting an improper chart type that hinders accurate interpretation of the data
  2. Selecting data that favours a particular partisan conclusion
  3. Truncating the range of data (so you don’t get the full picture)
55
Q

What is a measure of central tendency?

A

A measure that specifies where data are centred.
Examples include:
- Arithmetic mean
- Median
- Mode
- Weighted mean
- Geometric mean
- Harmonic mean

The geometric mean is used a lot at level 3 and quite a lot at level 2. To really understand the Black-Scholes options pricing model and how Z-values are generated, you need to understand it as well

56
Q

What is the difference between population and sample?

A

The population is everything that we might want to look at. Measures that describe a population are called parameters, e.g., the mean and the standard deviation (a measure of dispersion)

A sample is a subset of the population; it might only be the last two years. Sample statistics include x bar and s (these are descriptive statistics). s is the measure of dispersion for a sample.

We use Greek letters for population parameters

Later on we’ll look at inferential statistics where we can say something about the population based on the sample

57
Q

What is the arithmetic mean?

A

The classic mean.
Sum of all the values you have divided by the number of values
You could have a cross sectional mean (i.e., average sales of 50 companies)
Or a time-series mean (average sales for GM over 10 years)

When you sum all the deviations of values from the arithmetic mean you should get 0

We can calculate:
- variance by taking the deviations squared
- skew by taking the deviations cubed
- kurtosis by taking the deviations to the power of 4

58
Q

What is the disadvantage of arithmetic mean and what options do we have for dealing with it

A

The disadvantage of arithmetic mean is that it is highly sensitive to outliers.
The AM of 1,2,3,4,5,6,1000 is 145.86. This is not representative of any value of the set!

Options
  1. Do nothing: AM may be appropriate if the value is legitimate and correct. It may contain meaningful information
  2. Use a trimmed mean. Exclude a small % of the lowest and highest values. I.e., a 5% trimmed mean would delete top 2.5% and bottom 2.5% of the data
  3. Replace outliers with another value. We can use a winsorized mean.
    A 95% winsorized mean would be one where the top 2.5% of values are replaced by the 97.5th percentile value (the value below which all the others lie). The bottom 2.5% are replaced with the 2.5th percentile value (the value above which all the others lie).
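
A rough sketch of options 2 and 3. The names are made up, `tail_share` is the total fraction in both tails (0.10 corresponds to a 10% trimmed mean or a 90% winsorized mean), and this simple version works on whole observations rather than interpolated percentiles.

```python
def _tail_count(n, tail_share):
    # observations affected in EACH tail; round() absorbs floating-point noise
    return int(round(n * tail_share / 2, 9))


def trimmed_mean(values, tail_share):
    """Drop the lowest and highest observations, then average the rest."""
    data = sorted(values)
    k = _tail_count(len(data), tail_share)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, tail_share):
    """Replace the extremes with the nearest retained value, then average."""
    data = sorted(values)
    k = _tail_count(len(data), tail_share)
    low, high = data[k], data[-k - 1]
    clipped = [min(max(x, low), high) for x in data]
    return sum(clipped) / len(clipped)


data = list(range(1, 20)) + [1000]      # 1..19 plus one large outlier
print(sum(data) / len(data))            # arithmetic mean, pulled up by the outlier
print(trimmed_mean(data, 0.10))
print(winsorized_mean(data, 0.10))
```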
59
Q

What are the pros and cons of using the median to measure central tendency?

A

The median is the middlemost value of a set of observations. If there are 11 values the median is the 6th, if there are 10 then we find the mean of the 5th and 6th.

Pros:
- not affected by extreme values (outliers).
- It is thus useful for describing central tendency for non symmetrical distributions
- It can also describe symmetrical distributions. In a perfectly symmetrical one mean = median

60
Q

What are the characteristics of the mode?

A

The mode is the most frequently occurring value in a distribution.
Unimodal selects only 1 value that is most frequent
Bimodal selects two values that have the highest frequency
Trimodal selects three
And we can keep going.
There will be no mode if there is no value that occurs more frequently than any other. This would be a uniform distribution.

Pros:
- Only measure of central tendency that can be used with nominal (non-numerical, non-ordered) data.

For a symmetrical unimodal distribution, mode = median = mean

61
Q

What are the characteristics of a weighted mean?

A

Weighted mean is often used in calculating the return of a portfolio or expected return given a set of asset classes and weightings on those asset classes

We write it out as x-bar sub-w

When there is equal weighting weighted mean equals arithmetic mean

When a weighting is greater than 0 in a portfolio context we have a long position
When a weighting is less than 0 in a portfolio context we have a short position

WM weights can also be probabilities. We can multiply the probability of bullish, neutral, and bearish scenarios by the expected return in each scenario to calculate the expected return of the S&P500 for example.
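
A sketch of the scenario-weighted case. The probabilities and returns are invented for illustration, and the function assumes the weights already sum to 1.

```python
def weighted_mean(values, weights):
    """Sum of weight x value; weights are assumed to sum to 1."""
    return sum(w * x for w, x in zip(weights, values))


scenario_returns = [0.12, 0.05, -0.08]   # bullish, neutral, bearish (hypothetical)
probabilities = [0.3, 0.5, 0.2]
print(round(weighted_mean(scenario_returns, probabilities), 4))

# With equal weights the weighted mean collapses to the arithmetic mean
print(round(weighted_mean(scenario_returns, [1 / 3] * 3), 4))
```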

62
Q

What are the characteristics of geometric mean?

A

GM is used with rates of change over time or to compute growth rates. You can use GM to find compound growth rates whereas using AM over multiple periods would not actually tell you the compound growth rate. Thus AM is fine for 1 period. But if across multiple periods we use GM. I.e., to calculate CAGR of a company’s sales based on Y1 and Y6 sales.
GM is calculated by taking the nth root of all the values multiplied together, where n is the number of values.
GM is thus always less than or equal to AM. It is only equal to AM if all the values are the same. AM and GM diverge, with GM getting relatively smaller, as variability increases
When we calculate returns we use 1 + the % return for each period so we don’t get neg values
I.e., +5% becomes 1.05, -6% becomes 0.94.
We then subtract 1 at the very end to get a %

GM can also be calculated as
e^(ln(multiplied values)/n)
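
A sketch of both ways of computing the geometric mean return. The three returns are made up, and `math.prod` needs Python 3.8 or later.

```python
import math


def geometric_mean_return(returns):
    """nth root of the product of (1 + r), minus 1."""
    growth_factors = [1 + r for r in returns]
    return math.prod(growth_factors) ** (1 / len(growth_factors)) - 1


def geometric_mean_return_logs(returns):
    """Same thing via logs: e^(mean of ln(1 + r)) - 1."""
    mean_log = sum(math.log(1 + r) for r in returns) / len(returns)
    return math.exp(mean_log) - 1


returns = [0.05, -0.06, 0.10]
print(f"{geometric_mean_return(returns):.4%}")
print(f"{geometric_mean_return_logs(returns):.4%}")
print(f"{sum(returns) / len(returns):.4%}")   # arithmetic mean, for comparison
```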

63
Q

What is the relationship between AM and GM?

A

GM is also written as x-bar sub-g
AM is also written as x-bar sub-a

GM ≈ AM - (sigma^2) / 2

GM is approximately the AM minus half the variance of the observations
(variance is SD squared)
We have to multiply all of this by t if over multiple periods
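
A quick numerical check of this relationship on made-up returns. Using the population variance here is an assumption of the sketch, and the match is approximate rather than exact.

```python
import math
import statistics

returns = [0.05, -0.06, 0.10]

am = statistics.mean(returns)
gm = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1
approx_gm = am - statistics.pvariance(returns) / 2   # AM minus half the variance

print(round(gm, 5), round(approx_gm, 5))
```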

64
Q

What is the harmonic mean?

A

The AM upside down
We take the AM of the reciprocals of the observations, then turn the result upside down too

Instead of sum of observations / n, we invert to n / sum
Instead of summing the observations, we sum their reciprocals: HM = n / sum of (1 / observation)

The end result is this gives much less weight to large outliers
It is appropriate for averaging ratios when the ratios are repeatedly applied to a fixed quantity to yield a variable no of units
I.e., dollar cost averaging
You are buying a variable number of units each month for a fixed amount of money.
You can calculate the average price paid in a much more concise fashion using HM

If you buy the same amount, say 1000 worth of shares, each month for 2 months with share price first at 10 and then at 15:

x-bar sub-h = 2/((1/10)+(1/15)) = 12
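
A sketch confirming that the harmonic mean matches the long-hand average cost, assuming 1000 is invested in each of the two months. `statistics.harmonic_mean` is part of the Python standard library.

```python
import statistics

prices = [10, 15]
monthly_spend = 1000

# Long-hand: total money spent divided by total shares bought
shares_bought = sum(monthly_spend / p for p in prices)
average_cost = monthly_spend * len(prices) / shares_bought

print(round(average_cost, 2))
print(round(statistics.harmonic_mean(prices), 2))
```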

65
Q

When will we use HM, GM, AM?

A

AM is used when including all values including outliers (so maybe when we think the outliers are important)

GM is used when compounding is involved

HM is used to avoid outliers

When variance increases, the means spread out, such that:
AM > GM > HM

However, AM x HM ~= GM^2
This approximation is closest when variance is low, and it holds exactly for two observations
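
A short check of the ordering and of AM x HM against GM^2, on made-up values. `statistics.geometric_mean` needs Python 3.8 or later.

```python
import statistics


def three_means(values):
    """Return (AM, GM, HM) for positive values."""
    return (statistics.mean(values),
            statistics.geometric_mean(values),
            statistics.harmonic_mean(values))


for values in ([10, 15], [10, 15, 20]):
    am, gm, hm = three_means(values)
    print(values, round(am, 3), round(gm, 3), round(hm, 3),
          round(am * hm, 3), round(gm ** 2, 3))
```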

66
Q

How do we calculate a percentile value?

A

First find the location of the percentile by doing (n+1) x (y/100)
Where y is the percentile we are looking for
And n is the number of values in our data set
Our data must be sorted by size.

When location ends up as an integer value, we are done
When location is a decimal, we use interpolation

So for example if location ends up as 6.8, we multiply the difference between location 6 and 7 by 0.8. Then we add this value to location 6.

The larger the dataset the more accurate the percentile value is
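
The location-and-interpolation steps above can be sketched in Python as follows. The function name is illustrative, and clamping the location to stay inside the data is an assumption for extreme percentiles, not something the card specifies.

```python
def percentile(values, y):
    """Value at the y-th percentile using location L = (n + 1) * y / 100."""
    data = sorted(values)                 # data must be sorted by size
    n = len(data)
    location = (n + 1) * y / 100
    # Assumption: keep the location inside positions 1..n
    location = max(1.0, min(float(n), location))
    lower_position = int(location)        # position counted from 1
    fraction = location - lower_position
    lower_value = data[lower_position - 1]
    if fraction == 0:
        return lower_value                # integer location: we are done
    upper_value = data[lower_position]
    # Decimal location: interpolate between the two neighbouring values
    return lower_value + fraction * (upper_value - lower_value)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
p68 = percentile(data, 68)   # location 6.8: 60 + 0.8 * (70 - 60)
p50 = percentile(data, 50)   # location 5: the 5th value
```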

67
Q

What is a box and whisker plot used for?

A

The whiskers show the highest and lowest value
The box shows interquartile range
There will be a line through it to show median and often an x or dot to show arithmetic average

It can be used to
- rank performance of portfolios and investment managers in terms of the percentile or quartile in which they fall
- Perform investment research looking at the bottom and top return decile

68
Q

What are upper and lower fence?

A

Additional information sometimes included in a boxplot.
We find the upper fence by multiplying IQR by 1.5. We then add it to Q3 (the top of the box)
We find the lower fence by multiplying IQR by 1.5, and subtracting it from Q1 (the bottom of the box)
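
A small Python sketch of the fence calculation: upper fence = Q3 + 1.5 x IQR, lower fence = Q1 - 1.5 x IQR. It assumes quartiles are taken with `statistics.quantiles` and its "exclusive" method; the data set is made up for illustration.

```python
import statistics

def fences(values):
    """Return (lower_fence, upper_fence) based on 1.5 x IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]
lower_fence, upper_fence = fences(data)
# Observations outside the fences are candidates for outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```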

69
Q

What are deciles used for?

A

Rank performance of portfolios and investment managers in terms of the percentile or quartile in which they fall
In investment research we can find the bottom return decile for example and take a short position, and take a long position on the top return decile. This is something that a hedge fund might do.
This is also a typical way to isolate a factor as discussed in L3

70
Q

What is dispersion?

A

Captures the variability around the central tendency
Measures of absolute dispersion include range, mean absolute deviation, variance, and standard deviation

71
Q

What are the measures of dispersion commonly used?

A
  1. Range: max value less min value
    This uses only two observations and tells us nothing about the shape of the distribution however, so its simplicity cuts both ways. It could simply take a range between outliers or return the same range for wildly different skew, kurtosis, or variance
  2. Mean absolute deviation.
    We take the absolute value of each deviation from the arithmetic mean, sum them, and divide by n
    Taking absolute values rather than signed deviations means positives and negatives don’t cancel out
    All the observations have equal weight - actually observations further from the mean should have more weight, which is why we turn to…
  3. Variance (s^2 or sigma^2) and standard deviation (s or sigma)
    We use s when talking about a sample and the Greek sigma when talking about a population
    Variance is just standard deviation squared
    Sample variance is the sum of squared deviations from the arithmetic mean, divided by (n-1)
    The square root of this is the s.d.
    We use n-1 because if we have n observations we can actually only take n-1 as random: the nth is constrained (we can calculate it using the n-1 observations plus the mean)
    Therefore we do not have n independent observations, we have n-1
    Thus we actually lose 1 degree of freedom
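
The four measures above, computed in Python on a small made-up sample. `statistics.variance` and `statistics.stdev` use the n-1 (sample) divisor.

```python
import statistics

data = [4.0, 7.0, 1.0, 8.0, 5.0]
n = len(data)
mean = statistics.mean(data)

value_range = max(data) - min(data)                            # 1. range
mean_abs_deviation = sum(abs(x - mean) for x in data) / n      # 2. MAD
sample_variance = statistics.variance(data)                    # 3. s^2, divides by n-1
sample_sd = statistics.stdev(data)                             #    s = sqrt(s^2)

print(value_range, mean_abs_deviation, sample_variance, sample_sd)
```
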
72
Q

Why is standard deviation more useful than variance?

A

sd is expressed in the same units as a mean
variance is more difficult to interpret since it’s in the units squared

However, variance can still be useful:

GM ~= AM - S^2 /2
S^2 x t = geometric variance
S sqrt(t) = geometric sample standard deviation

This last one is important - you see it in level 2 in the Black-Scholes-Merton model

73
Q

What is target semideviation and why is it useful?

A

It is a measure of dispersion below the target figure
As an investor we are not concerned with volatility on the upside but rather only downside risk
Thus we can set a target and find variance below this target

We calculate S sub-target as:

sqrt( (sum of squared deviations of all x-sub i that are less than B) / (n-1) )

We use the n of the whole dataset not just those that are less than B.
This means when we change B, only the numerator changes, not the denominator
Thus when we change the target the measure of target deviation changes satisfyingly as well

This is technically because figures that are above the target are simply entered as a deviation of 0. So technically they are counted in this formula, just as 0.
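
A minimal Python sketch of target semideviation. Only observations below the target B contribute to the numerator, while the denominator uses n-1 for the whole data set. The returns and target are made up.

```python
import math

def target_semideviation(values, target):
    """Dispersion below the target, using n-1 of the full data set."""
    n = len(values)
    # Observations at or above the target contribute a deviation of 0
    squared_shortfalls = [(x - target) ** 2 for x in values if x < target]
    return math.sqrt(sum(squared_shortfalls) / (n - 1))

returns = [0.10, -0.05, 0.02, 0.08, -0.01]
semidev = target_semideviation(returns, target=0.03)
```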

74
Q

What is a measure of relative dispersion, and why is it useful?

A

Coefficient of variation
CV = S / x-bar
Sample standard deviation over the arithmetic mean
where x-bar >0

For returns, CV measures the risk per unit of return
It allows for direct comparison of dispersion across different datasets (different orders of magnitude)

75
Q

How are probabilities estimated?

A

Empirical probabilities are based on historical observation. The past is assumed to be representative of the future (not necessarily true). The historical period must include occurrences of the event

Subjective probabilities involve adjusting an empirical probability based on an intuition or experience.
We may do this when there is a lack of empirical observations, or to make a personal assessment

A priori probabilities involve arriving at a conclusion based on deductive reasoning. I.e., if a die has 6 sides the probability of rolling 6 is 1/6. This is perhaps the most objective method of estimation

76
Q

How are probabilities stated?

A

2 ways:
if there is a 10% chance:

odds:
1 to 9 chance (for every 1 occurrence we expect 9 non-occurrences)
9 to 1 chance of non-occurrence.
probability:
1 in 10 chance (out of 10 instances we expect 1 occurrence)

77
Q

What are conditional and unconditional probability?

A

Unconditional probability is P(A)
Conditional probability is P(A │ B)

Probability of A occurring given B
We could also illustrate this as a Venn diagram. Conditional probability is the intersection between the A circle and B circle, taken as a share of the B circle.
Unconditional probability would just be the value of the whole A circle

The multiplication rule means that P(AB) = P(A│B) x P(B)

Also therefore P(A│B) = P(AB) / P(B)
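
A small worked example in Python using one roll of a fair die, with exact fractions. The events A (roll is even) and B (roll is greater than 3) are made up for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}   # {2, 4, 6}
B = {x for x in outcomes if x > 3}        # {4, 5, 6}

def prob(event):
    """Probability of an event when all 6 outcomes are equally likely."""
    return Fraction(len(event), 6)

p_a = prob(A)                 # unconditional P(A)
p_b = prob(B)
p_ab = prob(A & B)            # joint probability P(AB)
p_a_given_b = p_ab / p_b      # conditional P(A|B) = P(AB) / P(B)

print(p_a, p_ab, p_a_given_b)
```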

78
Q

What is complement?

A

The complement of A is “not A”
The probability of A plus the probability of not A covers every outcome, therefore P(A) + P(A complement) = 1

79
Q

What is the addition rule?

A

P (A or B) = P (A v B) = P(A) + P(B) - P(AB)
Since we are double counting otherwise

80
Q

What are dependent and independent events?

A

2 events are independent iff:
P (A│B) = P(A)
or P(B│A) = P(B):
Knowing B tells us nothing about A

A dependent event is where P(A) is related to P(B)
e.g., A = stock Q rises
B = SP500 rises
A is most likely dependent on B

81
Q

What is the difference between combination and permutation?

A

A combination is the number of ways of selecting r objects from n where order does not matter
nCr = (^n r) = n! / ((n - r)! x r!)

A permutation is the number of ways of selecting r objects from n where order does matter
nPr = n! / (n - r)!
So we don’t divide by r! for permutations since there are more possibilities

A recombining lattice is like a probability tree that joins up
This is used for permutations
And in FA in asset price moves
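
The nCr and nPr formulas above map directly onto Python's `math.comb` and `math.perm`. The values n = 5 and r = 2 are chosen only for illustration.

```python
import math

n, r = 5, 2

combinations = math.comb(n, r)   # order does not matter: n! / ((n - r)! r!)
permutations = math.perm(n, r)   # order matters: n! / (n - r)!

# Each combination can be ordered in r! ways, which is the only difference
ratio = permutations // combinations
print(combinations, permutations, ratio)
```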

82
Q

What is a probability distribution?

A

Specifies the probabilities associated with the possible outcomes of a random variable

83
Q

What are the 7 common probability distributions and why are they useful to know?

A

Uniform, binomial, normal, lognormal, Student’s t (named after “Student”, the pen name of William Sealy Gosset), chi-square, and F-distribution

Most distributions will look like one of these 7
So when we see a distribution we can say it is an “approximately normal” or “approximately chi square” distribution
This is useful because each of these common distributions has well-defined mathematical properties, which we can then use to analyse and interpret our data

84
Q

What is a random variable and what are the two forms it can take?

A

A random variable is a quantity whose future outcomes are uncertain.

It can be either
- Discrete: take on at most a countable number of possible values (possibly infinite)
- Continuous: cannot count the possible values

Every random variable is associated with a probability distribution that describes the variable completely

85
Q

What is a probability function?

A

Specifies the probability that a random variable takes on a particular value

For discrete variables we would use p(x)
For continuous variables we would use the probability density function

The probability function has two key properties.
1. 0 =< p(x) =< 1 (any given probability must lie between 0 and 1 inclusive)
2. sum p(x) over all values of x equals 1. That is, if you add up all the values beneath the probability function they should add to 1

86
Q

What is a CDF?

A

Cumulative distribution function
Gives the probability that a variable X is less than or equal to a particular value x
Can be used for percentile rank for example
It is a slope that goes from 0 to 1 (0 to 1 being on the y axis)

87
Q

What is a discrete uniform distribution?

A

All outcomes are equally likely
The probability distribution is a rectangle
Thus length x width = 1

It will look like stairs of equal height and width as a CDF

88
Q

What is a continuous uniform distribution?

A

The same as a discrete uniform distribution but with a continuous random variable
Also a rectangular probability distribution
An even slope upwards as a cumulative distribution function

89
Q

What is a Bernoulli random variable?

A

One based on the outcome of a trial which produces one of two outcomes (binary outcomes), interpreted as 1 or 0
p(1) = p
p(0) = 1 - p

In n trials, we can have 0 to n successes
If each trial is a random variable, then the number of successes in n trials is also a random variable, known as a binomial random variable

90
Q

What is a binomial random variable?

A

The number of successes in n Bernoulli trials
Assumption:
1. p is constant for all trials
2. Trials are independent

A binomial random variable has a distribution completely described by 2 parameters
x ~ B(n, p)

  • To count the number of ways x successes can occur in n trials we can use nCx
    Because the order doesn’t matter

  • The probability of any one particular sequence with x successes in n trials is:
    p^x (1 - p)^(n - x)

  • We multiply nCx by this to get the probability function for a binomial random variable

p(x) = n! / ((n - x)! x!) * p^x (1 - p)^(n - x)
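
A minimal Python sketch of the binomial probability function. It also checks numerically that the probabilities sum to 1 and that the mean and variance come out as np and np(1 - p). The values n = 5 and p = 0.5 are made up.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                              # should be 1
mean = sum(x * px for x, px in enumerate(pmf))                # should be n * p
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))  # n * p * (1 - p)

print(pmf[2], total, mean, variance)
```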

91
Q

Why when we’re calculating probability are we only interested in the tails?

A

If we continue counting up past the mid point of the probability distribution we would misinterpret it
Such that we would deduce that achieving the top figure has a 100% chance
We have to count in the direction from the centre toward the tail

92
Q

How do we calculate mean and variance for Bernoulli and binomial distributions?

A

For Bernoulli,
mean = p
variance = p (1 - p)

For Binomial,
mean = np
variance = np (1 - p)

93
Q

What is the central limit theorem?

A

The sum (or mean) of a large number of independent random variables with finite variance is approximately normally distributed

Let’s say we take a whole bunch of samples of random variables that are not related to each other and find their means
The distribution of these means will be approximately normal

The central limit theorem tells us that because of this result a lot of data tends to be normally distributed

94
Q

What is a standard normal distribution?

A

A distribution where we have set the mean to 0 and standard deviation to 1
We may want to standardise our values (if they fall into an approximately normal distribution) and turn it into a standard normal distribution to allow data processing (using things like ML) and cross comparison

95
Q

Why do we use a normal distribution to model asset returns but not asset prices?

A

We use a normal distribution to model continuously compounded asset returns
We do not use it to model asset prices because the left tail of a nd goes to negative infinity, whereas asset prices go to 0

Asset returns are approximately normally distributed, so we can use nd to model (“close enough”)
However asset returns tend to be more kurtotic than normal (longer tails), and options add skew (pos/neg)
There is a lot more of this at L3

96
Q

What are the characteristics of nd?

A

A normal distribution has these 3 characteristics:
  1. Described by 2 parameters, mu and sigma squared (population variance). The notation is X ~ N(mu, sigma squared)
  2. Skew = 0 and kurtosis = 3 (so excess kurtosis, K sub-c, = 0). Therefore mean = median = mode
  3. A linear combination of 2 or more normal random variables is also normally distributed.
    So R sub-p = w sub-1 R sub-1 + w sub-2 R sub-2 + w sub-3 R sub-3 …. is also nd, although it is multivariate. Each of these terms is a univariate random variable
97
Q

What 3 lists of variables define a multivariate normal distribution in a portfolio management context?

A
  1. All the mean returns of all the individual securities (n returns)
  2. All the securities’ variances (n variances)
  3. All pairwise correlations. There are (n^2 - n)/2 unique correlations

Usually in PM we do this at the asset class level, because if we did this at the level of individual securities it could quickly become unmanageable

98
Q

For a nd how many sd are required to capture 95% and 99% of outcomes?

A

95% = 1.96 standard deviations for a normal distribution
99% = 2.58 standard deviations for a normal distribution

99
Q

How do we standardise a normal distribution?

A

We need to set mean to 0 and standard deviation to 1

Let’s say we have a distribution of n=30
Our mean is 4.7
Our standard deviation is 3

For each observation, we calculate z (the standardised value) as:
z = (x sub-i - x-bar) / sigma

For an observation of 7.2:
z = (7.2 - 4.7) / 3 = 0.8333
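
The same standardisation in Python, first for the single observation above and then for a whole (made-up) data set. After standardising, the data has mean 0 and standard deviation 1.

```python
import statistics

# Single observation from the example: mean 4.7, sd 3
z = (7.2 - 4.7) / 3

def standardise(values):
    """Convert each observation to a z-value using the sample mean and sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

z_scores = standardise([2.0, 4.0, 6.0, 8.0])
print(round(z, 4), z_scores)
```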

100
Q

How do we calculate probabilites on a normal distribution?

A

If we have a z-value and excel it’s easy
We can use the function NORM.S.DIST(z, 1)

Where z is the z-value
And 1 means that we use a cumulative probability function rather than a probability density function

The output will be the probability from 0 to 1

If we want a z-value out, we can use NORM.S.INV(probability). This will output a z value anywhere from negative infinity to infinity, but most z values fall between -3 and 3. Probabilities below 0.5 give negative z-values, and because the normal distribution is symmetrical these are just the mirror image of the positive ones

101
Q

How are NORM.S.INV(0.95) and NORM.S.INV(0.05) related?

A

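NORM.S.INV(0.95) will return the z-value for the 95th percentile (about 1.645)
NORM.S.INV(0.05) = -NORM.S.INV(0.95), the z-value for the 5th percentile (about -1.645), because the distribution is symmetrical around 0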
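
Outside Excel, the same check can be done in Python with `statistics.NormalDist`, whose `inv_cdf` method plays the role of NORM.S.INV for the standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1

z_95 = std_normal.inv_cdf(0.95)      # z-value for the 95th percentile
z_05 = std_normal.inv_cdf(0.05)      # z-value for the 5th percentile

# By symmetry the two are mirror images around 0
print(z_95, z_05)
```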

102
Q

How do we find the probability that the return on a portfolio is greater than or equal to 12%, but less than or equal to 20%?

Mean return is 12%
SD is 22%

A

We can express this as:
P(12% =< R sub-p =< 20%)

We calculate the z-value as:
z = (x sub-i - x-bar) / sd

so z = ((20-12)/22) = 0.3636
and z = ((12-12)/22) = 0

Then to find the probabilities we use the NORM.S.DIST function and subtract one from the other:

NORM.S.DIST(0.3636, 1) - NORM.S.DIST(0,1)
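
The same calculation sketched in Python, where `NormalDist().cdf` stands in for NORM.S.DIST(z, 1). The result is roughly 14%.

```python
from statistics import NormalDist

mean_return = 12.0   # %
sd_return = 22.0     # %

z_upper = (20.0 - mean_return) / sd_return   # 0.3636
z_lower = (12.0 - mean_return) / sd_return   # 0

std_normal = NormalDist()
probability = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(round(probability, 4))
```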

103
Q

Why might we use a t-test over a z-test?

A

Student’s t distribution has fatter tails than the normal distribution (excess kurtosis / leptokurtic)
Therefore if something is significant in a t-test it will definitely be significant in a z-test

104
Q

What are degrees of freedom?

A

Sample size minus 1 (or n - 1)

As degrees of freedom increases the tails of the t distribution are pulled in and added to the head, such that it converges to a normal distribution over n=200
Thus theoretically we would use a t test for small n values (below 200) and a z-test or normal distribution test for values above 200
However in practice we just use t really

105
Q

What are the test statistics for z-test and t-test?

A

z = (x-bar - mu) / (sigma / sqrt(n))
where mu and sigma are population parameters. As such, only 1 estimate is used (x-bar). The denominator is the standard error

t = (x-bar - mu) / (s / sqrt(n))
Where x-bar and s are sample statistics. As such, 2 estimates are used

T-tests are used for hypothesis testing since they are more conservative, more stringent, and produce wider confidence intervals

106
Q

What is the chi-squared distribution?

A

A distribution of variance
The interesting thing about variance is you can’t have a negative value, because it’s deviations squared.
Like log normal, it is bounded below by 0.
Variance follows a very particular distribution, depending on the number of parameters used to arrive at the distribution.
The distribution of variances is: the sum of the squares (of deviations) of k independent standard normally distributed random variables.

Degrees of freedom is n - 1, same as t-distribution
Because variance cannot be negative, the distribution flattens out. And with low degrees of freedom (2, 3) the distribution gets pushed up against the y-axis
As such, as degrees of freedom increase, the distribution becomes more symmetrical and bell-shaped (though flattening)

107
Q

What is the F-distribution?

A

Bounded below by 0 like the chi square distribution
Because it is the ratio of 2 chi square variables

F = ((chi square sub1) / (n sub1 - 1)) / ((chi square sub2) / (n sub2 - 1))

By convention, the larger figure is used as the numerator on top

F test is used in regression to test the significance of the whole regression. It is explained variance divided by unexplained variance. The higher the number is, the more of the total variance the model is explaining.

108
Q

What are the excel functions for chi square distribution?

A

CHISQ.DIST(chi squared value, degrees of freedom, 1)
Use 1 to specify cumulative distribution function.
Input is chi square value, output is a probability

CHISQ.INV(p, degrees of freedom)
Input is a probability, output is a chi square value

109
Q

What are the excel functions for cumulative t distribution?

A

T.DIST(t-value, degrees of freedom, 1)
Use 1 to specify cumulative distribution function.
Input is a t-value, output is a probability

T.INV(p, degrees of freedom)
Input is a probability, output is a t-value

110
Q

What are the excel functions for f distribution?

A

F.DIST(F-value, df for numerator, df for denominator, 1)

Input an f value and the degrees of freedom of the two variables, output is a probability

F.INV(p, df1, df2)

Input a probability and degrees of freedom for the variables, output is an f-value

111
Q

What is an estimator?

A

A formula used to estimate a population parameter from sample data (e.g. the sample variance formula as an estimator of population variance)

112
Q

What are the desirable properties for estimators?

A
  1. Unbiasedness: an unbiased estimator is one whose expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate

An unbiased estimator of the mean would be xbar = sum of xsubi / n
xbar = sum of xsubi / (n-1) would be biased: dividing by the smaller number scales every estimate by n/(n-1), so for a positive mean the estimate is pushed upwards

  2. Efficiency: an unbiased estimator is efficient if no other unbiased estimator has a sampling distribution with smaller variance

A more efficient estimator will have a taller head and thinner tails (even though both are unbiased)

  3. Consistency: a consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as the sample size increases

For example, the sample mean is a consistent estimator: its standard error is SE = S/sqrt(n),
so as n increases the standard error decreases and estimates cluster more tightly around the population mean

113
Q

What is a confidence interval?

A

A range for which one can assert with a given probability (1-alpha), called the degree of confidence, that it will contain the parameter it is intended to estimate

I.e., lower limit <- xbar -> upper limit
This is a two sided confidence interval

114
Q

What is a point estimate?

A

An estimate for what a parameter is

115
Q

What are the two interpretations of a confidence interval?

A
  1. Probabilistic: in repeated sampling, 95% (for example) of such CIs will in the long run include or bracket the population mean
  2. Practical: 95% confident that a given CI contains the population mean
116
Q

How do we construct a CI?

A

Take the point estimate (xbar)
Add or subtract the reliability factor, multiplied by the standard error

The reliability factor can be based on a z value or a t value
The standard error is sigma / sqrt(n) or s / sqrt(n) if you only have sample variance

If you multiply reliability factor x standard error by 2 you get the width of the confidence interval, as it is plus minus
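
A minimal Python sketch of the construction: point estimate plus or minus reliability factor x standard error. It assumes a z-based reliability factor taken from `statistics.NormalDist` (the standard library has no t distribution), and the inputs are made up.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Two-sided CI using a z reliability factor (an assumption; t is also valid)."""
    alpha = 1 - confidence
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    standard_error = sample_sd / math.sqrt(n)
    margin = reliability_factor * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_interval(sample_mean=4.7, sample_sd=3.0, n=30)
width = upper - lower     # 2 x reliability factor x standard error
print(lower, upper, width)
```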

117
Q

What are the most common reliability factors?

A

90% confidence interval: 1.65 rf
95%: 1.96
99%: 2.58

118
Q

Do we use z or t to find our confidence interval if we have a large sample with variance unknown?

A

z, because as sample size increases the t distribution converges to z (the t reliability factor shrinks towards the z value)
i.e., if n=400 we would just use z
The reading tends to say over n=30 we would stop using t, but over 200 or 300 is where they converge. A “large sample size” is not really 50.
You can never be WRONG when using the t value because of the convergence

119
Q

How do we find t-value in excel?

A

=T.INV(probability, degrees of freedom)
gives you the t value or the negative t value

120
Q

Under what conditions would we use the z value?

A
  • You have to know the population variance
  • The population has to be either normally distributed OR your sample large
121
Q

How do we determine what sample size is required to obtain a confidence interval of a given width (e.g. +/- 1%)?

A

A confidence interval is xbar +/- ( t x s/sqrt(n) )
Let’s call the +/- part E: E = t x s/sqrt(n)

The width of the confidence interval will be 2E
Thus we can rearrange to:

n = [ (t x s) / E]^2

We would not expect standard deviation for the sample to change as n changes, but we would expect standard error to change.
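
The rearranged formula n = [(t x s) / E]^2 as a short Python sketch. The inputs (reliability factor 1.96, standard deviation 20%, E = 1%) are made up, and using a fixed reliability factor is a simplification, since a t value itself depends on n.

```python
import math

def required_sample_size(reliability_factor, sd, margin):
    """Smallest n for which reliability_factor * sd / sqrt(n) <= margin."""
    return math.ceil((reliability_factor * sd / margin) ** 2)

n_required = required_sample_size(reliability_factor=1.96, sd=0.20, margin=0.01)

# Check: the margin achieved with this n is no larger than the target E
achieved_margin = 1.96 * 0.20 / math.sqrt(n_required)
print(n_required, achieved_margin)
```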

122
Q

What is a data snooping bias?

A

The bias of searching a data set for statistical patterns or relationships. This is also known as data mining.

If alpha = 5%, testing 100 different variables, on average, will produce 5 significant relationships

Data snooping is typically not theory-driven, and lacks an economic rationale behind it.

123
Q

How do we minimise or avoid data snooping bias?

A
  1. To combat data snooping bias we must have a clear, well-formulated hypothesis. It must have an economic rationale and accompanying theory behind it.
  2. We split our data set into a training data set, a validation data set, and test data.
    - The training data is used to build and fit a model
    - The validation data set is used to validate and tune the model.
    - The test data is used as an out-of-sample test to evaluate model fit. If data snooping is present, there will be insignificant model fit!
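
A simple sketch of a three-way split in Python. The 60/20/20 proportions, the seed and the function name are assumptions for illustration; for time series data a chronological split may be more appropriate than a random shuffle.

```python
import random

def split_dataset(rows, train_share=0.6, validation_share=0.2, seed=42):
    """Shuffle the rows and cut them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    train_end = int(len(shuffled) * train_share)
    validation_end = train_end + int(len(shuffled) * validation_share)
    train_set = shuffled[:train_end]                     # build and fit the model
    validation_set = shuffled[train_end:validation_end]  # validate and tune
    test_set = shuffled[validation_end:]                 # out-of-sample evaluation
    return train_set, validation_set, test_set

train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```
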
124
Q

What is sample selection bias?

A

Excluding some observations or time periods (basically choosing non-random samples)

i.e., survivorship bias: historical data may only include data for companies that survived
This would overstate the performance.

Another example would be using hedge fund indexes. Since they self-report, only well-performing funds may opt to report.

125
Q

What is look ahead bias?

A

Using information that was not available on the observation date.
I.e., models that use price and accounting data from the historical record, when the accounting data may not have been available on the same date.

For example, we can observe the price on Dec 31st, and book value on Dec 31st, but in fact BV may not have been reported until mid February. Linking BV and price on Dec 31st would be look ahead bias.

126
Q

What is time period bias?

A

Results in one time period may be specific to that time period.
Time period bias is typical of SHORT time series
However, time series that are too long risk including more than one regime or distribution

127
Q

What is statistical inference?

A

The process of making judgements about a larger group (population) based on a smaller group (sample)

E.g. hypothesis testing. Test to see whether a sample statistic is likely to come from a population with the hypothesised value of the population parameter.
i.e., is xbar consistent with mu = musub0 ?

128
Q

What is a hypothesis?

A

A statement about one or more populations that are tested using sample statistics

Process:
1. State the hypothesis
2. Identify the appropriate test statistic
3. Specify the level of significance
4. State the decision rule
5. Collect data and calculate the test statistic
6. Make a decision

129
Q

How do we state the hypothesis?

A

Null (Hsub0) is assumed to be true, unless:
Alternative (Hsuba)

Typically we WANT to reject Hsub0
So we may hypothesise that the population mean is greater than some value: Hsuba is mu > musub0
What we do is try to rule out the null that mu =< musub0
If we are successful then we reject Hsub0, thus “proving” our hypothesis

A two-sided test (two-tailed) is one where we can reject the null either side.
An Hsuba where mu /= 6% would be a two tailed test. We might reject mu = 6% if the sample mean is, say, 5.5% or less, or 6.5% or greater.

A one-sided test (left or right tailed test) is one where we can reject the null on one side only.
A one sided test where Hsuba is mu > 6%, for example, would be a right tailed test
A one sided test where Hsuba is mu < 6% would be a left tailed test

The null will always contain the equality sign. We test at the point of equality

130
Q

How do we identify the appropriate test statistic?

A

If population variance is known (sigma squared) then we can use the z test

z = [xbar - musub0] / [sigma / sqrt(n)]

If population variance is unknown, we will default to a t test

t = [xbar - musub0] / [s / sqrt(n)]
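
Both test statistics in one short Python sketch. If a population standard deviation is supplied the result is a z statistic, otherwise the sample standard deviation is used and the result is a t statistic. The function name and the sample returns are made up.

```python
import math
import statistics

def compute_test_statistic(sample, mu_0, population_sd=None):
    """(xbar - mu_0) / (sd / sqrt(n)), with sigma if known, otherwise s."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    sd = population_sd if population_sd is not None else statistics.stdev(sample)
    return (x_bar - mu_0) / (sd / math.sqrt(n))

sample = [0.07, 0.05, 0.09, 0.06, 0.08]
t_stat = compute_test_statistic(sample, mu_0=0.06)                      # sigma unknown
z_stat = compute_test_statistic(sample, mu_0=0.06, population_sd=0.02)  # sigma known
print(t_stat, z_stat)
```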

131
Q

How do we specify the level of significance?

A

The level of significance depends on the SERIOUSNESS of making a mistake.
Usually we use 5% or 1%. We might use higher in social sciences and lower in physics

There are two types of mistake you can make:
Type 1 error (false positive): rejecting a true null. If I determine someone is pregnant when they’re not
Type 2 error (false negative): failing to reject a false null. If I determine someone is NOT pregnant when they in fact are.

We can always decrease the likelihood of a Type 1 error by decreasing alpha (the significance level).
However, as alpha decreases, beta increases.
If we reduce the likelihood of Type 1 errors, we increase the chance of Type 2 errors (false negative)
The ONLY WAY to reduce both is to increase n (sample size). This is because a larger n shrinks the standard error, which is the denominator of the test statistic. Thus the t-stat becomes larger.

To understand type 1, type 2, reject, do not reject, true hypothesis, false hypothesis, beta, and alpha, just draw out a little grid on your piece of paper. This makes things much easier.