Quant + ECO Flashcards
numerical data
quantitative
categorical data
qualitative
discrete data
quantitative - countable
Continuous data
quantitative - fractional
Nominal data
qualitative - cannot be ordered
Ordinal data
qualitative - can be ranked
structured data
organised in a defined way
unstructured data
not organised in a defined way (e.g., text commentary, other big data)
Winsorised mean
replace the highest and lowest x% of observations with the cutoff values, then average
Harmonic Mean
N / Σ(1/Xi)
Percentile location calculation
Ly = (n + 1)(y/100), the position of the yth percentile in the data sorted in ascending order
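A minimal Python sketch of these three calculations on a made-up data set:

```python
# Winsorised mean, harmonic mean, and percentile location on a made-up,
# sorted data set of n = 10 observations.

data = sorted([2, 4, 4, 5, 7, 9, 12, 15, 21, 40])
n = len(data)

# 10% winsorised mean: replace the lowest and highest 10% of observations
# with the nearest remaining values, then take the arithmetic mean.
k = int(n * 0.10)
winsorised = [data[k]] * k + data[k:n - k] + [data[n - k - 1]] * k
winsorised_mean = sum(winsorised) / n

# Harmonic mean: N / sum(1 / X_i)
harmonic_mean = n / sum(1 / x for x in data)

# Location of the yth percentile: L_y = (n + 1) * (y / 100)
y = 75
location = (n + 1) * (y / 100)   # 8.25 -> interpolate between the 8th and 9th values

print(winsorised_mean, harmonic_mean, location)
```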
Coefficient of Variation (CV)
standard deviation of x / average value of x
Positive skewness
mean > median > mode
Negative skewness
mode > median > mean
Degree of skewness
zero for a symmetric (e.g., normal) distribution; an absolute value greater than 0.5 is considered significant
Leptokurtic
fat tails (more peaked)
Platykurtic
thin tails (less peaked)
Mesokurtic
Normal distribution
Excess kurtosis
= sample kurtosis - 3
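A quick check of the skewness and kurtosis cards using scipy.stats on made-up return data (scipy reports excess kurtosis by default):

```python
# Skewness and kurtosis of made-up return data via scipy.stats.
# stats.kurtosis() returns EXCESS kurtosis by default (fisher=True).
from scipy import stats

returns = [0.01, 0.02, -0.03, 0.04, 0.10, -0.02, 0.01, 0.00, 0.03, -0.15]

skew = stats.skew(returns)                        # > 0 right skew, < 0 left skew
excess_kurt = stats.kurtosis(returns)             # sample kurtosis - 3
raw_kurt = stats.kurtosis(returns, fisher=False)  # = 3 for a normal distribution

print(skew, excess_kurt, raw_kurt)
```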
Random variable
uncertain quantity/number
Outcome
an observable value of a random variable
Event
single outcome or a set of outcomes
Mutually exclusive events
are events that cannot happen at the same time
Exhaustive events
include all possible outcomes
Empirical probability
analysing past data
A priori probability
formal reasoning
subjective probability
personal judgement
Addition rule
P(A or B) = P(A) + P(B) - P(AB)
Multiplication rule
P(AB) = P(A|B) x P(B)
total probability rule
P(A) = P(A|B) x P(B) + P(A|Bc) x P(Bc)
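A worked example of the addition, multiplication, and total probability rules, with made-up probabilities:

```python
# Worked example of the addition, multiplication, and total probability rules.
# The probabilities are made up for illustration.

p_b = 0.60              # P(B), e.g. probability the economy expands
p_not_b = 1 - p_b       # P(B^c)
p_a_given_b = 0.70      # P(A | B), e.g. stock rises given expansion
p_a_given_not_b = 0.30  # P(A | B^c)

# Multiplication rule: P(AB) = P(A | B) * P(B)
p_ab = p_a_given_b * p_b                              # 0.42

# Total probability rule: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b   # 0.42 + 0.12 = 0.54

# Addition rule: P(A or B) = P(A) + P(B) - P(AB)
p_a_or_b = p_a + p_b - p_ab                           # 0.54 + 0.60 - 0.42 = 0.72

print(p_ab, p_a, p_a_or_b)
```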
Variance of a two-asset portfolio (uses covariance)
w1²σ1² + w2²σ2² + 2w1w2Cov1,2
Bayes formula
P(event | new info) = [P(new info | event) / P(new info)] x P(event), i.e. the prior probability of the event updated for the new information
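A worked Bayes update with made-up inputs:

```python
# Worked Bayes example: update P(Event) after seeing new information.
# Numbers are made up for illustration.

p_event = 0.30                 # prior probability of the event
p_info_given_event = 0.80      # P(new info | event)
p_info_given_no_event = 0.20   # P(new info | no event)

# Unconditional probability of the new info (total probability rule)
p_info = p_info_given_event * p_event + p_info_given_no_event * (1 - p_event)

# Bayes' formula: P(event | info) = [P(info | event) / P(info)] * P(event)
p_event_given_info = (p_info_given_event / p_info) * p_event

print(p_event_given_info)   # 0.24 / 0.38 ≈ 0.632
```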
Types of counting problems
Labelling (order doesn't matter / nCr function)
Permutation (order does matter / nPr function)
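Both counts are available in the Python standard library (Python 3.8+):

```python
# Labelling (combinations) vs. permutation counts from the standard library.
import math  # math.comb and math.perm require Python 3.8+

n, r = 8, 3

combinations = math.comb(n, r)   # nCr: order doesn't matter -> 56
permutations = math.perm(n, r)   # nPr: order matters        -> 336

print(combinations, permutations)
```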
Probability distribution
describes the probabilities of all the possible outcomes for a random variable. The probabilities of all possible outcomes must sum to 1.
discrete random variable
is one for which the number of possible outcomes can be counted, and for each possible outcome, there is a measurable and positive probability.
probability function
denoted p(x), specifies the probability that a random variable is equal to a specific value.
continuous random variable
one for which the number of possible outcomes is infinite, even if lower and upper bounds exist.
cumulative distribution function
simply distribution function, defines the probability that a random variable, X, takes on a value equal to or less than a specific value, x.
discrete uniform random variable
one for which the probabilities for all possible outcomes for a discrete random variable are equal (X = {1, 2, 3, 4, 5}, p(x) = 0.2.)
continuous uniform distribution
is defined over a range that spans between some lower limit, a, and some upper limit, b, which serve as the parameters of the distribution. The probability of an outcome falling within a sub-range is the width of that sub-range divided by (b – a).
binomial random variable
may be defined as the number of “successes” in a given number of trials, whereby the outcome can be either “success” or “failure.” The probability of success, p, is constant for each trial, and the trials are independent. A binomial random variable for which the number of trials is 1 is called a Bernoulli random variable.
binomial random variable calc
p(x) = nCx x p^x x (1 – p)^(n – x), i.e. the number of ways to get x successes times the probability of each such sequence
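A short sketch of the binomial probability function with assumed values for n, p, and x:

```python
# Binomial probability: P(x successes in n trials)
#   p(x) = nCx * p**x * (1 - p)**(n - x)
# Numbers are made up for illustration.
import math

n, p = 10, 0.4   # 10 independent trials, P(success) = 0.4 each
x = 3            # desired number of successes

prob = math.comb(n, x) * p**x * (1 - p)**(n - x)
print(prob)      # ≈ 0.215
```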
normal distribution
completely described by its mean, μ, and variance, σ2
* Skewness = 0,
* Kurtosis = 3
* A linear combination of normally distributed random variables is also normally distributed
* Unbounded (ranges from –∞ to +∞)
univariate distributions
distribution of a single random variable
multivariate distribution
probabilities associated with a group of random variables (correlations = key difference)
confidence interval
Range of values around the expected outcome within which we expect the actual outcome to be some specified percentage of the time
confidence intervals to remember
- The 90% confidence interval for X is X − 1.65s to X + 1.65s.
- The 95% confidence interval for X is X − 1.96s to X + 1.96s.
- The 99% confidence interval for X is X − 2.58s to X + 2.58s.
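A small sketch applying these reliability factors to an assumed mean and standard deviation:

```python
# Confidence intervals for a normally distributed variable X, using the
# reliability factors above. Mean and standard deviation are assumed values.

x_bar = 0.08   # expected (mean) value
s = 0.20       # standard deviation

for label, z in [("90%", 1.65), ("95%", 1.96), ("99%", 2.58)]:
    print(f"{label}: {x_bar - z * s:.3f} to {x_bar + z * s:.3f}")
```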
standard normal distribution
z = (observation - population mean) / standard deviation
Roy’s safety-first criterion
sub in min level of required return for the population mean in the z equation. Then choose the portfolio that minimises the likelihood of going below the minimum threshold
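For normally distributed returns, minimising the shortfall probability is equivalent to maximising SFRatio = (E(Rp) – RL) / σp; a sketch with assumed portfolio figures:

```python
# Roy's safety-first: SFRatio = (E(Rp) - R_L) / sigma_p; the portfolio with
# the highest SFRatio has the lowest shortfall probability (normal returns).
# Portfolio figures below are assumptions for illustration.

r_l = 0.03   # minimum acceptable (threshold) return

portfolios = {            # name: (expected return, standard deviation)
    "A": (0.09, 0.12),
    "B": (0.11, 0.20),
    "C": (0.07, 0.06),
}

sf_ratios = {name: (er - r_l) / sd for name, (er, sd) in portfolios.items()}
best = max(sf_ratios, key=sf_ratios.get)
print(sf_ratios, "-> choose", best)   # C has the highest SFRatio here
```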
Log-normal distribution
generated by the function e^x, where x is normally distributed (if ln Y is normal, Y is lognormal)
* The lognormal distribution is skewed to the right.
* The lognormal distribution is bounded from below by zero so that it is useful for modeling asset prices, which never take negative values.
Continuous compounding
EAR = e^Rcc – 1
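A two-line check of the relationship (and its inverse) in Python:

```python
# EAR from a continuously compounded rate, and back again.
import math

r_cc = 0.10                 # stated continuously compounded annual rate
ear = math.exp(r_cc) - 1    # ~0.1052, i.e. 10.52% effective annual rate
back = math.log(1 + ear)    # recovers 0.10

print(ear, back)
```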
Student’s t-distribution
- It is symmetrical.
- It is defined by a single parameter, the degrees of freedom (df), where the degrees of freedom are equal to the number of sample observations minus 1, n – 1, for sample means.
- It has more probability in the tails (“fatter tails”) than the normal distribution.
- As the degrees of freedom (the sample size) gets larger, the shape of the t-distribution more closely approaches a standard normal distribution.
Chi-squared distribution
- Distribution of the sum of the squared values of n independent standard normal random variables
- Bounded from below by zero
- Asymmetric
- Degrees of freedom = n – 1
- As df increase, approaches normal distribution
F-distribution
- Quotient of two chi-square distributions with m and n degrees of freedom
- Bounded from below by zero
- Asymmetric
- As df increase, approaches normal distribution
Monte Carlo simulation
technique based on the repeated generation of one or more risk factors that affect security values (only as good as inputs)
Monte Carlo simulation method
- Specify distributions of random variables, such as interest rates and underlying stock prices
- Use computer random generation of variables
- Price the derivatives using those values
- Repeat steps 2 and 3 thousands of times
- Calculate mean/variance of distribution
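A deliberately simplified sketch of these steps for a single risk factor (terminal stock price) and a call-option payoff; all parameters are assumptions and discounting is omitted for brevity:

```python
# Simplified Monte Carlo sketch of the steps above for a single risk factor:
# simulate lognormal terminal stock prices and value a call-option payoff.
# All parameters are assumptions; discounting is omitted for brevity.
import math
import random

s0, strike = 100.0, 105.0        # current price and strike (assumed)
mu, sigma, t = 0.05, 0.20, 1.0   # drift, volatility, horizon in years (assumed)
n_sims = 100_000

payoffs = []
for _ in range(n_sims):
    z = random.gauss(0.0, 1.0)                   # step 2: draw the random variable
    s_t = s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
    payoffs.append(max(s_t - strike, 0.0))       # step 3: value the derivative

mean_payoff = sum(payoffs) / n_sims              # step 5: mean of the distribution
variance = sum((p - mean_payoff) ** 2 for p in payoffs) / (n_sims - 1)
print(mean_payoff, variance)
```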
Monte Carlo simulation uses
- Value complex securities.
- Simulate the profits/losses from a trading strategy.
- Calculate estimates of value at risk (VaR) to determine the riskiness of a portfolio of assets and liabilities.
- Simulate pension fund assets and liabilities over time to examine the variability of the difference between the two.
- Value portfolios of assets that have nonnormal returns distributions.
Probability sampling
sampling when we know the probability in the population of each sample member
(Simple) random sampling
every population member has an equal probability of being selected
Non-probability sampling
use judgement of researcher or low-cost/readily available data, to select sample items
Sampling error
difference between a sample statistic and the true population parameter (e.g., x̄ – μ for the mean)
Non-probability sampling may lead to greater sampling error than probability sampling
Stratified random sampling
- Create subgroups from population based on important characteristics
- Select samples from each subgroup in proportion to size of the subgroup
Cluster Sampling
Create subsets (clusters), each of which is representative of an overall population (e.g., personal incomes of residents by county)
One-stage cluster sampling
take random sample of clusters and include all data from those clusters
Two-stage cluster sampling
select clusters and take random samples from each
Non-probability sampling methods
- Convenience sampling – use readily available data
- Judgemental sampling – select observations from population based on analyst’s judgement
central limit theorem
For any population with mean μ and variance σ2, as the size of a random sample gets large, the distribution of sample means approaches a normal distribution with mean μ and variance σ2/n.
This allows for inferences about and confidence intervals for population means, based on sample means. Sample needs to be greater than 30 as a general rule.
standard error of the sample mean
standard deviation of population / square root of the sample size
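A simulation sketch illustrating both cards: sample means drawn from a skewed population are roughly normal, with spread close to the theoretical standard error (population and sample size are assumed):

```python
# Simulation sketch: means of repeated samples from a skewed (exponential)
# population are approximately normal, with spread close to sigma / sqrt(n).
import random
import statistics

population = [random.expovariate(1.0) for _ in range(100_000)]  # non-normal
n = 50                                                          # sample size

sample_means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

print(statistics.mean(sample_means))            # close to the population mean (~1.0)
print(statistics.stdev(sample_means))           # observed spread of sample means
print(statistics.pstdev(population) / n**0.5)   # theoretical standard error
```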
Unbiased
expected value equal to the parameter it estimates
Efficient
sampling distribution has smallest variance of all unbiased estimators
Consistent
larger sample -> better estimator
dist = normal, variance = known, sample = small
Z stat
dist = normal, variance = unknown, sample = small
t stat
dist = nonnormal, variance = known, sample = small
NA
dist = nonnormal, variance = unknown, sample = small
NA
dist = normal, variance = known, sample = large
Z stat
dist = normal, variance = unknown, sample = large
t stat
dist = nonnormal, variance = known, sample = large
Z stat
dist = nonnormal, variance = unknown, sample = large
t stat
Jackknife
calculate multiple sample means, each with one observation removed
Bootstrap
take many samples of size n, calculate their sample means and calculate standard deviations of these means
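A sketch contrasting the two resampling ideas on a made-up sample:

```python
# Jackknife vs. bootstrap on a made-up sample of n = 10 observations.
import random
import statistics

sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 7.2, 3.3, 5.8, 4.6]
n = len(sample)

# Jackknife: n sample means, each computed with one observation removed.
jackknife_means = [statistics.mean(sample[:i] + sample[i + 1:]) for i in range(n)]

# Bootstrap: many resamples of size n drawn WITH replacement; the spread of
# their means estimates the standard error of the sample mean.
bootstrap_means = [statistics.mean(random.choices(sample, k=n)) for _ in range(5_000)]

print(statistics.stdev(jackknife_means), statistics.stdev(bootstrap_means))
```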
Sample selection bias
sample not really random
Survivorship bias
sampling only surviving firms, mutual funds, hedge funds
Look-ahead bias
using information not available at the time to construct sample
Time-period bias
relationship exists only during the time period of sample data
Hypothesis test method
- State the hypothesis – relation to be tested
- Select a test statistic
- Specify the level of significance
- State the decision rule for the hypothesis
- Collect the sample and calculate statistics
- Make a decision about the hypothesis
- Make an economic or investment decision based on the test results
Type I error
rejecting a true null hypothesis (H0); the significance level is the probability of a Type I error
Type II error
failing to reject false null hypothesis (H0)
power of a test
1 – P(Type II error), i.e. the probability of correctly rejecting a false null hypothesis
standard error
sample standard deviation / square root of the number of observations (n)
p-value
smallest level of significance at which the null can be rejected
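A one-sample t-test sketch tying these cards together (made-up return data; H0: mean return = 0):

```python
# One-sample t-test: H0 is that the mean return equals zero.
# Return data are made up for illustration.
from scipy import stats

returns = [0.01, 0.03, -0.02, 0.04, 0.02, 0.05, -0.01, 0.03, 0.02, 0.01]

t_stat, p_value = stats.ttest_1samp(returns, popmean=0.0)

alpha = 0.05                  # significance level = P(Type I error)
reject_h0 = p_value < alpha   # reject H0 when the p-value is below alpha

print(t_stat, p_value, reject_h0)
```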