Quantitative methods Flashcards
cross sectional data
> many observations across different units (a subset of the population)
same time period
time series data
> many observations
different time periods
panel data
> different time periods
many observations for each time period
combo of cross sectional and time series data
strong positive corr
steep positive line
most appropriate functional form of regression by inspecting the residuals
want residuals to be random
permissionless distributed ledger technology (DLT) networks
> No centralised place of authority exists
all users (nodes) within the network have a matching copy of the blockchain
DLT that could facilitate the ownership of physical assets
Tokenization
Tokenization
> representing ownership rights to physical assets e.g. real estate
creating a single digital record of ownership to verify ownership title and authenticity
applications of DLT in investment management
> cryptocurrencies
tokenization
compliance
post-trade clearing
settlement
type of asset manager making use of fintech in investment decision making
> quants
fundamental asset mngrs
data processing methods
- capture
- curate
- storage
- search
- transfer
fintech
technological innovation in the design and delivery of financial services and products
what is fintech
> analysis of large databases (traditional , non-traditional data)
analytical tools (AI for complex non-linear relationships)
automated trading (algorithms - lower costs, anonymity, liquidity)
automated advice (robo-advisers - may not incorporate whole information in their recommendations)
financial record keeping (DLT)
Big data characteristics
volume
velocity (real-time)
variety (structured, semi-structured and unstructured data)
veracity (important for inference or prediction, credibility and reliability of various data sources)
sources of big data
financial markets
businesses
governments
individuals
sensors
internet of things
main sources of alternative data
businesses
individuals
sensors
types of machine learning
- supervised learning (inputs and outputs labelled, local market performance)
- unsupervised learning (no data labelled, grouping of firms into peer groups based on characteristics)
- deep learning (multi-stage, non-linear processing to identify patterns; draws on supervised + unsupervised ML approaches)
Determinants of Interest Rates
r = Real risk-free interest rate + Inflation premium + Default risk premium + Liquidity premium + Maturity premium.
1 + nominal risk-free rate
(1 + real risk-free rate)(1 + inflation premium)
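A minimal Python sketch of this multiplicative relation (the 3% real rate and 2% inflation premium are hypothetical figures, and the function name is my own):

```python
def nominal_risk_free(real_rate: float, inflation_premium: float) -> float:
    """(1 + nominal) = (1 + real)(1 + inflation premium), solved for nominal."""
    return (1 + real_rate) * (1 + inflation_premium) - 1

# hypothetical: 3% real risk-free rate, 2% inflation premium
nominal = nominal_risk_free(0.03, 0.02)  # about 5.06%, slightly above the 5% sum
```

Note the product term: simply adding the two rates understates the nominal rate by the cross term.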
increased sensitivity of the market value of debt to a change in market interest rates as maturity is extended
maturity premium
defined benefit pension plans and retirement annuities
over the life of a beneficiary
MWRR & TWRR
1) cash flows where inflows = outflows
2) HPR : (change in value of share + dividend)/initial value
annualised compounding rate of growth
r annual
(1+r weekly)^52 -1
gross return
excl : mngmnt , taxes , custodial fees
incl : trading expenses
net return large vs small fund
small fund at disadvantage due to fixed administration costs
return on leverage portfolio
R_p + (V_d/V_e)(R_p - r_d)
cash flows associated with fixed income
> discount e.g. zero coupon bond (FV-PV)
periodic interest e.g. bonds w coupons
level payments : pay price + pay cash flows at intervals both interest and principal ( amortizing loans)
ordinary annuity
r(PV) / (1-(1+r)^(-t))
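The annuity-payment formula above can be sketched in Python; the 30-year, 6% mortgage figures are a hypothetical example:

```python
def annuity_payment(pv: float, r: float, t: int) -> float:
    """Ordinary annuity payment: A = r * PV / (1 - (1 + r)^(-t)),
    with r the periodic rate and t the number of periods."""
    return r * pv / (1 - (1 + r) ** (-t))

# hypothetical: $100,000 loan, 6% annual rate compounded monthly, 30 years
payment = annuity_payment(100_000, 0.06 / 12, 360)  # roughly $599.55 per month
```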
forward P/E
payout / (r-g)
trailing P/E
(p*(1+g))/(r-g)
(1 + spot rate_n)^n
(1 + spot rate_i)^i * (1 + forward rate)^(n-i)
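Solving that no-arbitrage relation for the implied forward rate, as a small sketch (the 3%/4% spot rates are hypothetical):

```python
def implied_forward(z_i: float, z_n: float, i: int, n: int) -> float:
    """Forward rate from period i to n implied by spot rates:
    (1 + z_n)^n = (1 + z_i)^i * (1 + f)^(n - i), solved for f."""
    return ((1 + z_n) ** n / (1 + z_i) ** i) ** (1 / (n - i)) - 1

# hypothetical: 1y spot 3%, 2y spot 4% -> 1y rate one year forward of about 5.01%
f = implied_forward(0.03, 0.04, 1, 2)
```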
IRP
> spot FX * IR = forward FX
continuous compounding
> FV = PV * e^(r*t)
percentile
(n+1)*(y/100)
mean absolute deviation
> dispersion
(sum abs(x-xavg))/n
sample target semi-deviation formula
((SUM_(x<=B)(X-B)^2)/(n-1))^(1/2)
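The formula above, as a direct Python translation (the return series and 0% target are hypothetical):

```python
import math

def target_semideviation(returns, target):
    """Sample target semi-deviation: sqrt of the sum of squared deviations
    below (or at) a target B, divided by n - 1, where n is the full sample size."""
    n = len(returns)
    ss = sum((x - target) ** 2 for x in returns if x <= target)
    return math.sqrt(ss / (n - 1))

# hypothetical sample of four returns, target B = 0
sd_below = target_semideviation([0.05, -0.02, 0.03, -0.01], 0.0)
```

Only observations at or below B enter the numerator, but the denominator still uses the full n − 1.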
coefficient of variation
sample st dev / sample mean
skewness
positive:
> losses small and likely
> profits large and unlikely
> investors prefer distributions with a large frequency of unusually large payoffs
kurtosis
> how much of the distribution sits in its tails relative to the normal distribution
> platykurtic (thin tails, flat peak)
> mesokurtic (normal distr)
> leptokurtic (fat tails, tall peak)
high kurtosis
higher chance of extremes in the tails
- kurtosis < 3, excess kurtosis negative
- kurtosis = 3, excess kurtosis 0
- kurtosis > 3, excess kurtosis positive
spurious correl
> correlation arising purely by chance
correlation induced when two variables are each divided by a third
correlation between two variables that each have a relation to a third variable
updated probability
(prob of new info given event / unconditional prob of new info) * prior prob of event
p(event|info)
[P(info|event)/P(info)]*P(event)
P(F|E)
P(F)P(E|F)/[P(F)P(E|F)+P(Fnot)*P(E|Fnot)]
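Bayes' formula in that total-probability form, as a short sketch (the 0.4 prior and the two likelihoods are hypothetical numbers):

```python
def bayes_update(prior, p_info_given_event, p_info_given_not_event):
    """P(event | info) = P(info | event) P(event) /
    [P(info | event) P(event) + P(info | not event) P(not event)]."""
    p_info = (p_info_given_event * prior
              + p_info_given_not_event * (1 - prior))
    return p_info_given_event * prior / p_info

# hypothetical: prior 0.4, P(info|event) = 0.75, P(info|not event) = 0.25
posterior = bayes_update(0.4, 0.75, 0.25)  # prior revised upward to 2/3
```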
odds for event
P(E) / [1 - P(E)]
odds against event
[1 - P(E)] / P(E)
Empirical
> Probability - relative frequency
historical data
Does not vary from person to person
objective probabilities
A priori
> Probability - logical analysis or reasoning
Does not vary from person to person
Objective probabilities
Subjective
> Probability - personal or subjective judgment
No particular reference to historical data
used in investment decisions
A&B mutually exclusive and exhaustive events
P(C) = P(C and A) + P(C and B) = P(C|A)P(A) + P(C|B)P(B)
P(B or C) (non-mutually exclusive events)
P(B or C) = P(B) + P(C) – P(B and C)
P(B and C), dependent events
P(B and C) = P(B) x P(C|B)
P(C) unconditional probability
P(C) = P(B) x P(C given B) + P(Bnot) x P(C given Bnot) = P(C and B) + P(C and Bnot)
No. of ways the k tasks can be done
= (n1)(n2)...(nk)
Combination (binomial) formula
sequence does not matter: nCr = n! / ((n-r)! r!)
cov
sum over states of P(state) * (r_a - E(r_a))(r_b - E(r_b))
shortfall risk
return below min level
(E(R_p)- R_l) / sigma_p
Roy’s safety-first criterion
- Optimal portfolio: minimizes the probability that portfolio returns fall below a specified level
- If returns are normally distributed, optimal portfolio maximizes safety-first ratio
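Roy's criterion in code, using two hypothetical portfolios and a 3% threshold return:

```python
def safety_first_ratio(expected_return, threshold, stdev):
    """Roy's safety-first ratio: (E(Rp) - RL) / sigma_p.
    Under normality, the portfolio with the highest ratio minimizes
    the probability that returns fall below the threshold RL."""
    return (expected_return - threshold) / stdev

# hypothetical: A offers 10% expected return at 15% st dev,
# B offers 8% at 10% st dev; threshold RL = 3%
sf_a = safety_first_ratio(0.10, 0.03, 0.15)  # ~0.467
sf_b = safety_first_ratio(0.08, 0.03, 0.10)  # 0.500 -> B is preferred
```

Despite its lower expected return, B wins because each unit of risk buys more cushion above RL.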
Measuring and controlling financial risk
- Stress testing and scenario analysis
- Value-at-Risk (VaR) - value of losses expected over a specified time period at a given level of probability
Bootstrapping
> no knowledge of population
sample of size n
Unlike the CLT, which considers all samples of size n from the population, bootstrapping draws samples of size n from the known sample (which also has size n)
Each data item in our known sample can appear once or more or not at all in each
resample (due to replacement)
computer simulation to mimic the process of CLT : randomly drawn sample as if population
Easy to perform but only provides statistical estimates not exact results
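A minimal bootstrap of the standard error of the mean, using only the standard library (the ten-point sample, seed, and resample count are arbitrary choices for illustration):

```python
import random
import statistics

def bootstrap_se_of_mean(sample, n_resamples=2000, seed=42):
    """Bootstrap SE of the sample mean: draw n_resamples samples of size n
    WITH replacement from the observed sample, then take the st dev
    of the resampled means."""
    random.seed(seed)
    n = len(sample)
    means = [statistics.mean(random.choices(sample, k=n))
             for _ in range(n_resamples)]
    return statistics.stdev(means)

se = bootstrap_se_of_mean(list(range(1, 11)))  # close to s / sqrt(n)
```

Because draws are with replacement, any data item can appear once, many times, or not at all in a given resample.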
Resampling
repeatedly draws samples from one observed sample to make statistical inferences about
population parameters.
Monte Carlo Simulation
> large number of random samples : represent the role of risk in the system
> specified probability distribution
e.g. pension assets with reference to pension liabilities
> Produces a frequency distribution for changes in portfolio value
> Tool for valuing complex securities
Limitations of Monte Carlo simulation
- Complement to analytical methods
- Only provides statistical estimates, not exact results
- Analytical methods provide more insight to cause-and-effect relationships
Historical simulation
- Sample from a historical record of returns or other underlying variables
- Underlying rationale is that the historic record provides the best evidence
of distributions - Limited by the actual events in the historic record used
- Does not lend itself to ‘what if’ analysis like Monte Carlo simulation
sampling error
difference between the sample statistic and the population parameter it estimates
Stratified random sampling
- divided into strata
- simple random samples taken from each
e.g. bond indices - Guarantees population subdivisions are represented
Cluster sampling
- divided into clusters – mini-representation of the entire population
- certain clusters chosen as a whole using simple random sampling
- if all members in each sampled cluster are sampled: one-stage cluster sampling
- if a subsample is randomly selected from each selected cluster: two-stage cluster sampling
- time-efficient and cost-efficient but the cluster might be less representative of the population
Convenience sampling
Might be used for a pilot study before testing a large-scale and more representative
sample
Judgmental sampling
Sample could be affected by the bias of the researcher
Properties of Central Limit Theorem
- Assuming any type of distribution and a large sample
- Distribution of sample mean is approximately normal
- Mean of the distribution of sample mean will be equal to population mean
- Variance of distribution of sample mean equals population variance divided by the
sample size
Jackknife
> no knowledge of what the population looks like
sample of size n which is assumed to be a good representation of the population
unlike bootstrapping items are not replaced
bootstrapping gives B resamples, but the jackknife gives n resamples, each of size n-1 (each resample leaves out one observation)
For a sample of size n, jackknife resampling usually requires n repetitions. In contrast, with bootstrap resampling, we are left to determine how many repetitions are appropriate
used to reduce the bias of an estimator and to find the standard error and confidence interval of an estimator
Bootstrapping and Jackknife
- Jackknife tends to produce similar results for each run whereas bootstrapping usually gives different results because resamples are drawn randomly
- Both can be used to find the standard error or construct confidence intervals for
the statistic of other population parameters
> such as the median which could not be done using the Central Limit Theorem.
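The leave-one-out jackknife from the cards above, sketched in Python (the five-point sample is hypothetical; for the mean, the jackknife SE reduces exactly to s / sqrt(n)):

```python
import math
import statistics

def jackknife_se_of_mean(sample):
    """Jackknife SE of the mean: n resamples, each of size n - 1
    (each leaves out exactly one observation, with no replacement)."""
    n = len(sample)
    loo_means = [statistics.mean(sample[:i] + sample[i + 1:])
                 for i in range(n)]
    grand = statistics.mean(loo_means)
    return math.sqrt((n - 1) / n * sum((m - grand) ** 2 for m in loo_means))

se = jackknife_se_of_mean([2, 4, 6, 8, 10])
```

Unlike the bootstrap, this produces the same answer on every run, since the n resamples are fixed by the data.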
Bernoulli and Binomial properties
mean : p , var: p(1-p)
mean : np , var: np(1-p)
Discrete and continuous uniform distribution (random # for Monte Carlo sim)
discrete: f(x) = 1/k for k possible outcomes
continuous: f(x) = 1/(b-a)
multivariate distribution pairwise corr
> n*(n-1)/2
feature for the multivariate normal distr
99% : +-2.58
95% : +-1.96
68% : +-1
90% : +-1.65
t-distr
n-1 df
as n becomes large (n > 30), t approaches the normal distribution
> fatter tails and less peaked than the normal curve
chi-squared and F distr
> asymmetrical and bounded below by 0
family of distributions
chi-square: 1 parameter (df)
F: 2 parameters (numerator and denominator df)
as df tends to infinity the probability density functions become more bell-shaped
properties of an estimator
unbiased - sample mean = population mean
efficient - no other unbiased estimator has a sampling distribution with smaller variance
consistent - improves w sample size increase
Point estimate is not likely to equal population parameter in any given sample
CI
Confidence intervals
Point estimate +/- (Reliability factor (z_(a/2))x Standard error (sigma/(n)^(1/2))
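The interval formula as a small sketch (mean 25, sigma 10, n = 100, and the 95% z reliability factor of 1.96 are a hypothetical worked example):

```python
import math

def confidence_interval(point_estimate, reliability, stdev, n):
    """Point estimate +/- reliability factor * standard error,
    where the standard error is sigma / sqrt(n)."""
    se = stdev / math.sqrt(n)
    return (point_estimate - reliability * se,
            point_estimate + reliability * se)

# hypothetical: sample mean 25, known sigma 10, n = 100, 95% level (z = 1.96)
lo, hi = confidence_interval(25, 1.96, 10, 100)  # (23.04, 26.96)
```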
increase in reliability e.g. from 90% - 95%
wider CI
- If the population's standard deviation is not known, use the t-statistic with the sample standard deviation s
Normal distribution with a
known variance
sample < 30 - z-stat
sample > 30 - z-stat
Normal distribution with
an unknown variance
sample < 30 - t-stat
sample > 30 - t-stat or z-stat
Non-normal distribution
with a known variance
sample < 30 - N/A
sample > 30 - z-stat
Non-normal distribution
with unknown variance
sample < 30 - N/A
sample > 30 - t-stat or z-stat
What affects the width of the confidence interval
- Choice of statistic (z or t)
- Choice of degree of confidence
- Choice of sample size
- Larger sample size decreases width
- Larger sample size reduces standard error
- Big sample means t-calcs closer to z-calcs
- Same for at least 30 observations
Problems with larger sample size
cost
cross-population data
Two-sided (or two-tailed) hypothesis test
Not equal to alternative hypothesis
* H0 : ϴ = ϴ0 versus Ha : ϴ ≠ ϴ0
One-sided hypothesis test
- A greater than alternative hypothesis
- H0 : ϴ ≤ ϴ0 versus Ha : ϴ > ϴ0
- A less than alternative hypothesis
- H0 : ϴ ≥ ϴ0 versus Ha : ϴ < ϴ0
t-stat z-score
(sample mean - hypothesized mean) / standard error
2-tail or 1-tail significance level
for a two-tailed test, place alpha/2 in each tail when finding the z critical value
Type II error (β) + Type I error (α)
accept false null + reject true null
Decrease in significance level (incr in confidence levels)
Reduces Type I error, but increases chances of Type II error
Reduce both Type I and Type II errors
- Increase sample size
Power of a test
- Probability of correctly rejecting H0 when it is false
- 1-β
Type I error
false discovery rate
BH adjusted critical value = alpha * (rank of i / number of tests); compare each p-value with its BH value and reject the null if the p-value is less
t-stat > critical value
rej H0
Test the difference between two population means
- State the hypotheses
- Null hypothesis is stated as
H0: μd = 0
- i.e. there is no difference in the populations' mean daily returns (variances unknown but assumed equal)
- Identify the appropriate test statistic and its probability distribution
- t-test statistic and t-distribution
- Specify the significance level
- 5% significance level
- State the decision rule
- If the test statistic > critical value, reject the null hypothesis
Test of a single variance (if sample var known can test for population var)
chi-squared distributed with n-1 degrees of
freedom
two-tailed because distrib not symmetrical
chi^2_(n-1)=((n-1)s^2)/sigma^2_0
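Computing that test statistic in Python (the sample variance of 0.0009 against a hypothesized 0.0004 with n = 25 is a hypothetical example; the critical value would still come from a chi-square table with n − 1 df):

```python
def chi2_variance_stat(sample_var, hypothesized_var, n):
    """Test statistic for a single variance: (n - 1) s^2 / sigma0^2,
    chi-square distributed with n - 1 degrees of freedom."""
    return (n - 1) * sample_var / hypothesized_var

# hypothetical: s^2 = 0.0009, sigma0^2 = 0.0004, n = 25
stat = chi2_variance_stat(0.0009, 0.0004, 25)  # 54, df = 24
```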
Hypothesis Tests Concerning the Variance
Assumptions (chi square)
- Normally distributed population
- Random sample
- Chi-square test is sensitive to violations of its assumptions
Testing the equality of the variances of two populations
- Using sample variances to determine whether the population var are equal
- F-distribution
- Asymmetrical and bounded by zero
- one-tailed
- Calculation of F test statistic
F = s1^2 / s2^2
>= 1 as the larger sample variance is the numerator
> df: n1-1 / n2-1
Parametric tests
- assumptions about the distribution of the population
- E.g., z-test, t-test, chi-square test, or F-test
Non-parametric tests are used in four situations
- Data does not meet distributional assumptions
- not normally distributed + small sample
- Outliers that affect a parametric statistic (the mean) but not a nonparametric statistic (the median)
- Data is given in ranks
- Characteristics being tested is not a population parameter
Tests concerning a single mean
Parametric:
t-distributed test
z-distributed test
Non-Parametric:
Wilcoxon signed-rank
test
Tests concerning
differences between
means
Parametric:
t-distributed test
Non-Parametric:
Mann-Whitney U test
(Wilcoxon rank sum test)
Tests concerning mean differences (paired
comparison tests)
A paired comparisons test is appropriate to test the mean differences of two samples believed to be dependent.
Parametric:
t-distributed test
Non-Parametric:
Wilcoxon signed-rank test
Sign test
Testing the significance of a correlation coefficient
both variables are distributed normally
parametric test
t tables (two-tailed p/2) and n-2 degrees of freedom:
t= r(n-2)^(1/2) / (1-r^2)^(1/2)
As n increases we are more likely to reject a false null: testing the significance of a correlation coefficient
- Degrees of freedom increases and critical statistic falls
- Numerator increases and test statistic rises
The Spearman Rank Correlation Coefficient
nonnormal distribution
nonparametric test
1. Rank observations on X from largest to smallest assigning 1 to the largest, 2 to the second, etc. Do the same for Y.
2. Calculate the difference, di, between the ranks for each pair of observations and square answer
=1 - (6*sum(d^2)/n(n^2-1))
sample size is large (n>30) we can conduct a t-test : df: n-2
= r((n-2)^(1/2))/(1-r^2)^(1/2)
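The two steps and the formula above, sketched for tie-free data (the helper and its ranking convention of 1 = largest follow the card; the sample values are hypothetical):

```python
def spearman_rho(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n (n^2 - 1)),
    where d is the rank difference per pair. Assumes no ties;
    rank 1 is assigned to the largest value, as on the card."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# hypothetical: perfectly monotone pairs give rho = +1, reversed pairs give -1
rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
```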
Ordinary Least Squares Regression
The estimated intercept, b0, and slope, b1, are such that the sum of the squared vertical distances from the observations to the fitted line is minimized.
covariance
sum((x-xbar)(y-ybar))/(n-1)
slope coefficient
covariance(x,y)/var(x)
intercept
b0bar = Ybar - b1Xbar
Assumptions of the Simple Linear Regression Model
- Linear relationship – might need transformation to make linear
- Independent variable is not random – assume expected values of independent
variable are correct - Variance of error term is same across all observations (homoskedasticity)
- Independence – The observations, pairs of Y’s and X’s, are independent of one another. error terms are uncorrelated (no serial correlation) across observations
- Error terms normally distributed
SST =
SSR + SSE
Total Variation
sum(y-ybar)^2
Sum of the squared differences between the actual value of the dependent variable and the mean value of the dependent variable
Explained Variation
sum(yhat-ybar)^2
Sum of the squared differences between the predicted value of the dependent variable based on the regression line and the mean value of the dependent variable.
Unexplained Variation
sum(y-yhat)^2
Sum of the squared differences between the actual value of the dependent variable and the predicted value of the dependent variable based on the regression line
Coefficient of Determination – R2
> SSE =0 and RSS = TSS - perfect fit
percentage variation in the dependent variable explained by movements in the independent variable
R^2 = RSS / TSS or (1-(SSE/TSS))
r = sign of b1*(R^2)^(1/2)
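The slope, intercept, and R-squared cards above can be combined into one sketch (the four data points lie exactly on y = 2x + 1, a hypothetical perfect fit):

```python
def simple_ols(x, y):
    """Simple linear regression from the card formulas:
    b1 = cov(x, y) / var(x), b0 = ybar - b1 * xbar, R^2 = RSS / TSS."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    tss = sum((yi - ybar) ** 2 for yi in y)          # total variation
    rss = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)  # explained variation
    return b0, b1, rss / tss

# hypothetical data exactly on y = 2x + 1
b0, b1, r2 = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
```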
Regression : DF, SS, MS
k =1 indep var (measures the number of independent var)
sum(yhat-ybar)^2
MSR = SSR / DF
Residual : DF, SS, MS
n-k-1
sum(y-yhat)^2
MSE = SSE/ DF
standard error of the estimate (SEE)
MSE^(1/2)
= (SSE/(n-2))^(1/2)
ANOVA
F-distributed Test Statistic
H0: b0=b1=…=0
F-test = (SSR/k) / (SSE/(n-k-1)) = MSR / MSE
> df = k , df = n-k-1
Hypothesis Test of the Slope Coefficient
- H0: b1 = 0
tcalc = (bhat - b)/SE
SE = (MSE)^(1/2) / (SUM(X-Xbar)^2)^(1/2)
or H0: b1 <= 0
or H0: b1 = 1
Hypothesis Test of the Intercept
H0: b0 = specified value
tcalc = (bhat0- b0) / SE
df= n-k-1
SE = SEE * (1/N + Xbar^2/sum(x-xbar)^2)^(1/2)
Level of Significance and p-Values
- Smallest level of significance at which the null hypothesis can be rejected
- Smaller the p-value, stronger the evidence against the null hypothesis
- The smaller the p-value, the smaller the chance of making a Type I error (rejecting the null when, in fact, it is true), but increases the chance of making a Type II error (failing to reject the null when, in fact, it is false)
Prediction (confidence) intervals on the dependent variable
Y = Yhat_f +/- t_c * s_f
s_f^2 = SEE^2 [1 + 1/n + (X_f - Xbar)^2 / ((n-1) s_x^2)]
for y predicted need to plug value into linear equation
Log-lin model
Slope coefficient represents the relative change in the dependent variable for an absolute change in the independent variable
Lin-log model
Slope coefficient gives the absolute change in the dependent variable for a relative change in the independent variable
Log-log (double-log) model
Slope coefficient gives the relative change in the dependent variable for a relative change in the independent variable and is useful for calculating elasticities
hedged portfolio using
long underlying and short calls
to find c0
V0 = hS0 - c0
V1 +/- = hS1+/- -c1+/-
because we are hedged
V1+ = V1-
h (hedge ratio) = (c1+ - c1-) / (S1+ - S1-)
return = V1+ / V0 = V1- / V0 = 1+ R
hS0 - c0 = V1+ / (1+R)
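The hedge argument above, worked end to end (the 100/110/90 lattice, 100 strike, and 5% rate are hypothetical numbers):

```python
def binomial_call_price(s0, s_up, s_down, strike, r):
    """One-period binomial call value via the riskless hedge:
    h = (c+ - c-) / (S+ - S-); since the hedged portfolio pays the
    same in both states, hS0 - c0 = V1 / (1 + R)."""
    c_up = max(s_up - strike, 0.0)
    c_down = max(s_down - strike, 0.0)
    h = (c_up - c_down) / (s_up - s_down)   # hedge ratio
    v1 = h * s_up - c_up                    # equals h * s_down - c_down
    return h * s0 - v1 / (1 + r)

# hypothetical: S0 = 100, up to 110 or down to 90, K = 100, R = 5%
c0 = binomial_call_price(100, 110, 90, 100, 0.05)  # about 7.14
```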
a parameter
refers to any descriptive measure of a population characteristic
normal distribution z-score rejection points
Two-sided
One-sided
10%
1.645
1.28
5%
1.96
1.645
1%
2.58
2.33
quintiles
1st
2nd
3rd
4th
5th
1st = 1/5
2nd = 2/5 etc
e.g. want the 4th quintile
(4/5)*(n+1); with n = 10 this gives 8.8
so the answer lies between positions 8 and 9
set numbers into ascending order
and interpolate e.g.
X8 + (8.8 − 8) × (X9 − X8)
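The location-and-interpolate recipe above, sketched in Python (the function name and the data 1..10 are my own illustration):

```python
def quantile_position(data, pct):
    """Percentile by the L = (n + 1) * y/100 location rule with linear
    interpolation between the bracketing order statistics."""
    s = sorted(data)                 # ascending order first
    n = len(s)
    loc = (n + 1) * pct / 100
    k = int(loc)                     # integer part: 1-indexed position
    if k < 1:
        return s[0]
    if k >= n:
        return s[-1]
    frac = loc - k
    return s[k - 1] + frac * (s[k] - s[k - 1])

# hypothetical: data 1..10, 80th percentile -> L = 8.8 -> X8 + 0.8 (X9 - X8)
q = quantile_position(list(range(1, 11)), 80)
```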
When working backward from the nodes on a binomial tree diagram, the analyst is most likely attempting to calculate:
In a tree diagram, a problem is worked backward to formulate an expected value as of today
when the test statistic > critical statistic
reject H0
> the correl coefficient is statistically significant
remember: for a two-tailed test, use the significance level divided by 2 in each tail
Ln(1+ discrete return)
= continuous return
null hypothesis must always include the
equal sign
a test of independence using a nonparametric test statistic that is chi-square distributed
χ2 = ∑ from i=1 to m of (Oij − Eij)^2 / Eij
> m = the number of cells in the table, which is the number of groups in the first class multiplied by the number of groups in the second class;
Oij = the number of observations in each cell of row i and column j (i.e., observed frequency); and
Eij = the expected number of observations in each cell of row i and column j, assuming independence (i.e., expected frequency).
(r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns
Eij=(Total row i)×(Total column j)/ Overall total
Standardized residual = (Oij − Eij) / √Eij
ML model that has been overfitted is not able
to accurately predict outcomes using a different dataset and might be too complex
‘overtrained’
> treating true parameters as if they are noise is most likely a result of underfitting the model
correlation coeff
(sign of b1) * sqrt(R^2)
cash return on assets
= (Cash flow from operations/Average total assets)
F-distributed test statistic to test whether the ? in a regression are equal to zero,
slopes
H0: b1 = 0. Ha: b1 ≠ 0
Arithmetic mean x Harmonic mean =
Geometric mean^2 (holds for two values)
Arithmetic mean
≥ Geometric mean ≥ Harmonic mean
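Both relations can be checked numerically; the two-value sample [4, 16] is a hypothetical example for which AM = 10, GM = 8, HM = 6.4:

```python
import math
import statistics

def mean_relations(data):
    """Arithmetic, geometric, and harmonic means of positive data.
    AM >= GM >= HM always; AM * HM = GM^2 holds exactly for two values."""
    am = statistics.mean(data)
    gm = math.prod(data) ** (1 / len(data))
    hm = len(data) / sum(1 / x for x in data)
    return am, gm, hm

am, gm, hm = mean_relations([4, 16])
```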