Statistics/Probability Flashcards
sample space
the set of all possible sample points for an experiment, e.g. S = {HH, TT, HT, TH} for two coin flips (H = heads, T = tails)
dependent events (probability)
events where the outcome of one changes the probability of the other, e.g. drawing marbles from a bag without replacement
Covariance
- When calculated between two variables, X and Y, it indicates how much the two variables change together.
- Cov(X,Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
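A quick numerical check of the identity in R (illustrative data; note that R's built-in cov() uses the n − 1 sample denominator, so it is rescaled below to match the population form):

  x <- c(1, 2, 3, 4, 5)
  y <- c(2, 4, 5, 4, 5)
  mean(x * y) - mean(x) * mean(y)            # E[XY] - E[X]E[Y]
  mean((x - mean(x)) * (y - mean(y)))        # E[(X - EX)(Y - EY)]: same value
  cov(x, y) * (length(x) - 1) / length(x)    # cov() rescaled from n-1 to n: same value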
P–P plot
probability–probability plot or percent–percent plot or P value plot: probability plot for assessing how closely two data sets agree, or for assessing how closely a dataset fits a particular model.
It works by plotting the two cumulative distribution functions against each other; if they are similar, the plotted points will lie close to a straight line.
For input z the output is the pair of numbers giving what percentage of f and what percentage of g fall at or below z.
Q–Q plot
quantile–quantile plot: for comparing two probability distributions by plotting their quantiles against each other. A point (x, y) on the plot corresponds to one of the quantiles of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate).
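A minimal base-R sketch with simulated data: qqnorm() compares a sample against Normal quantiles, qqplot() compares two samples directly:

  set.seed(1)
  x <- rt(200, df = 3)   # heavy-tailed sample
  qqnorm(x)              # sample quantiles vs. Normal quantiles
  qqline(x)              # reference line; heavy tails bend away from it
  y <- rnorm(200)
  qqplot(x, y)           # two-sample Q-Q plot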
PMF (Probability Mass Function)
A probability mass function (PMF) is a mathematical function that gives the probability that a discrete random variable takes a specific value. It assigns a particular probability to every possible value of the variable.
Can be shown as a table with one row per outcome and its probability, as in the sketch below.
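For example, the PMF of the number of heads in two fair coin flips, tabulated in R:

  outcomes <- 0:2
  probs <- dbinom(outcomes, size = 2, prob = 0.5)   # 0.25 0.50 0.25
  data.frame(outcome = outcomes, probability = probs)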
the conditional probability of a cancellation given snow:
P(Cancel∣Snow); the ‘∣’ is short for ‘given’
An event happens independently of a condition if
P(event∣condition)=P(event)
Kolmogorov-Smirnov (K-S) Test
non-parametric test that compares the empirical distribution of the data with a theoretical distribution.
It helps determine how well the theoretical distribution fits the data.
K-S Statistic
The K-S statistic measures the maximum distance between the empirical cumulative distribution function (ECDF) of your data and the cumulative distribution function (CDF) of the theoretical distribution.
In simpler terms, it quantifies the biggest difference between what you observed (your data) and what you would expect if the data followed the theoretical distribution.
The K-S statistic ranges from 0 to 1:
A smaller K-S statistic indicates that the empirical distribution is very close to the theoretical distribution.
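In R, ks.test() reports the K-S statistic D together with a p-value; a short sketch with simulated data:

  set.seed(1)
  x <- rnorm(100)
  ks.test(x, "pnorm", mean = 0, sd = 1)   # one-sample: data vs. theoretical N(0,1)
  y <- runif(100)
  ks.test(x, y)                           # two-sample: compares two ECDFs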
outcome = model + error –> what are the parts called?
model = systematic part, error = unsystematic part
Descriptive Statistics
collect, organize, display, analyze, etc.
Inferential Statistics
- Predict and forecast values of population parameters
- Test hypotheses and draw conclusions about values of population parameters
- Make decisions
Central Tendency
1st moment - mean, median, mode
Spread
2nd moment - MAD, Variance, SD, coefficient of variation (CV = SD/mean), range, IQR
Skewness
3rd moment - measure of asymmetry: positive skew (tail pointing to high values, body of the distribution to the left), negative skew (tail pointing to low values, body to the right)
Kurtosis
4th moment - Measure of heaviness of the tails, leptokurtic (heavy tails), platykurtic (light tails)
What kurtosis does a normal distribution have?
3 (mesokurtic)
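The four moments computed by hand in R (population versions, dividing by n; data simulated for illustration):

  set.seed(1)
  x <- rexp(1000)                  # right-skewed sample
  m <- mean(x)                     # 1st moment: central tendency
  v <- mean((x - m)^2)             # 2nd moment: spread (population variance)
  skew <- mean((x - m)^3) / v^1.5  # 3rd moment: positive here (right tail)
  kurt <- mean((x - m)^4) / v^2    # 4th moment: 3 for a Normal, > 3 = leptokurtic
  c(mean = m, var = v, skew = skew, kurt = kurt)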
statistical properties of prices vs. returns:
prices are not predictable (they are non-stationary); returns are “stationary”, and hence predictable in the statistical sense
Standard Error calculation & meaning
- SE = SD / √n
- The standard deviation measures the dispersion of the data around the mean; the standard error can be thought of as the dispersion of the sample-mean estimates around the true population mean (see the sketch below)
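A simulation sketch of that interpretation (parameters are illustrative): the formula SD/√n matches the observed dispersion of many simulated sample means:

  set.seed(1)
  n <- 25
  means <- replicate(10000, mean(rnorm(n, mean = 0, sd = 2)))
  sd(means)      # dispersion of the sample means...
  2 / sqrt(n)    # ...matches SE = SD / sqrt(n) = 0.4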
Sample standard deviation
𝑠
Population standard deviation
𝜎 (sigma)
Central Limit Theorem
states that the distribution of the sample mean, X̄, will approach a Normal distribution as the sample size 𝑛 increases (𝑛 ≥ 30)
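A quick illustration in R: even for a skewed parent distribution (exponential), the means of samples of size 30 look approximately Normal:

  set.seed(1)
  means <- replicate(10000, mean(rexp(30, rate = 1)))
  hist(means, breaks = 50)       # roughly bell-shaped, centred on 1
  qqnorm(means); qqline(means)   # points hug the line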
Sample variance - do you use n or n-1?
n-1
Random variable:
𝑋
Cumulative Distribution Function of the Standard Normal:
Φ(z)
Pivotal distribution
N(0,1)
Population mean - greek letter:
μ (mu)
Confidence interval
x̄ ± z-critical value × σ/√n (use s and the t-critical value when σ is unknown); σ/√n is the standard error of the mean
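A worked sketch in R with made-up data (σ unknown, so s and the t-critical value are used):

  x <- c(12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9)
  n <- length(x)
  t_crit <- qt(0.975, df = n - 1)                # 95% two-sided
  mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)  # lower and upper bounds
  t.test(x)$conf.int                             # same interval from base R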
In the sample, you approximate mu and sigma with…
x̄ (sample mean) and s (sample standard deviation)
Particular observation of a Standard Normal (also known as ‘z-critical value’):
z
Parameter of the 𝒕-distribution (also known as ‘degrees of freedom’):
𝜈 (nu)
t-critical value
t
Important: are you given sigma or s?
Even if n < 30, if you are given σ (the population SD), you can use σ with the z-statistic
t-distribution
- Has thicker tails than the Normal (i.e. larger chance of extreme events).
- Its shape depends on a single parameter “nu”, 𝜈 = 𝑛 − 1, where n is the number of observations.
- Assumption: the t-distribution assumes that the data originates from a Normal Distribution.
3 main types of distribution
Gaussian, Poisson, Chi-square
Statistical stationarity:
A stationary time series is one whose statistical properties such as mean, variance, autocorrelation, etc. are all constant over time.
get the probability at or below a z-value - function
xpnorm(q, mean = …, sd = …), which returns P(X ≤ q); q is the value, not a probability (xpnorm is from the mosaic package)
given a particular probability of 𝑍 < 𝑧, what is the corresponding value 𝑧?
qnorm(p, mean = …, sd = …), where p is the given probability
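In base R the same pair is pnorm()/qnorm() (xpnorm is the mosaic package's annotated wrapper around pnorm); a quick round-trip sketch:

  pnorm(1.96, mean = 0, sd = 1)    # P(Z <= 1.96) = 0.975
  qnorm(0.975, mean = 0, sd = 1)   # inverse: returns 1.96
  qnorm(pnorm(1.3))                # round trip recovers 1.3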
Standard deviation - what it measures; population vs sample calculation
Measures the amount of variability within a single dataset. The population SD divides by n; the sample SD divides by n − 1 (see the sketch below).
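A small comparison in R (illustrative numbers); R's sd() uses the n − 1 denominator:

  x <- c(4, 8, 6, 5, 3)
  n <- length(x)
  sd(x)                          # sample SD (divides by n - 1)
  sqrt(mean((x - mean(x))^2))    # population SD (divides by n)
  sd(x) * sqrt((n - 1) / n)      # rescaling one into the other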
What is variance
the expected value of the squared deviation from the mean of a random variable
Null Hypothesis vs Alternative Hypothesis
- Null Hypothesis 𝐻0: belief about the true population parameter value –> rejected if the difference between sample means (or between the sample mean and the hypothesised value) is bigger than would be expected by chance
- Alternative Hypothesis 𝐻1: the competing claim, supported when 𝐻0 is rejected
Significance Level - letter
alpha –> probability of rejecting the null hypothesis when it is true
Critical value / Cutoff point
z-critical value or t-critical value –> ±z/t values which act as cutoff points beyond which the null hypothesis should be rejected
p-value: 𝑝
probability of obtaining a value of the test statistic as extreme as, or more extreme than, the actual value obtained, when the null hypothesis is true
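For a two-sided z-test this is computed as (test-statistic value is illustrative):

  z <- 2.1
  2 * pnorm(-abs(z))   # = 0.036 < 0.05, so reject H0 at alpha = 0.05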
How to report results for statistical hypothesis testing:
- do not say: “we accept the null hypothesis as truth”
- say instead: “we cannot reject the null hypothesis”
Hypothesis:
is a statement of assertion about the true value of an unknown population parameter (e.g. 𝜇 = 100)
Null Hypothesis Test - performed using a test statistic, which is a standardized value derived from sample data, e.g. the standardized value of the sample mean
When should you use the z-statistic vs. the t-statistic in hypothesis testing?
Use the t-statistic when 𝑛 < 30 and 𝜎 is unknown; otherwise the z-statistic (see ‘Calculating CI Cutoff Points’ below).
Types of Statistical Errors
- Type I error: rejecting 𝐻0 when it is true (probability α)
- Type II error: failing to reject 𝐻0 when it is false (probability β)
Conventions in Your Industry regarding alpha
Calculating CI Cutoff Points
Use the t-distribution when 𝑛 < 30 and 𝜎 is unknown; otherwise use 𝑧. In practice, we can use 𝑡 for all cases.
R-functions for the Standard Normal Distribution and t-Distribution
qnorm()/pnorm() for the Normal; qt()/pt() for the t-distribution (see the sketch below)
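A short sketch; the t-critical value is larger than z for small n and approaches it as n grows:

  qnorm(0.975)                    # z-critical for 95% two-sided: 1.96
  qt(0.975, df = 9)               # t-critical for n = 10: 2.26 (thicker tails)
  qt(0.975, df = 200)             # large n: close to the z value
  pnorm(1.96); pt(1.96, df = 9)   # the corresponding probabilities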
3 equivalent ways of testing a hypothesis:
- compare the test statistic with the critical value
- compare the p-value with α
- check whether the hypothesised value lies inside the confidence interval
Does correlation reflect nonlinear relationships?
No - (Pearson) correlation captures only linear relationships
True Dependent Variable and Estimated Dependent Variable
y (true) and ŷ (‘y-hat’, fitted/estimated)
True Coefficient and Estimated Coefficient
𝛽 (true) and 𝛽̂ (‘beta-hat’, estimated)
Residual Error and Residual Standard Error
e_i = y_i − ŷ_i (residual error) and RSE (the typical size of the residuals)
Number of observations and number of independent variables
𝑛 and 𝑝
Coefficient of Determination / R Squared and Adjusted R Squared
𝑅2 and 𝑅2adj
Coefficient 𝒊’s Standard Error
SE(𝛽̂i), the standard error of the estimated coefficient 𝛽̂i
What do regression diagnostics look for?
Violations of the error assumptions (see ‘Possible Reasons For Systematic Errors’ below) and testing for “significant” relationships.
assumed true model vs fitted model - how do you write the coefficients in the regression equations?
True model: 𝑦 = 𝛽0 + 𝛽1𝑥 + 𝜀 (Greek letters, with an error term). Fitted model: ŷ = 𝛽̂0 + 𝛽̂1𝑥 (hats on the estimates, no error term).
Different names for Y and x
Y: dependent / response / explained variable; x: independent / explanatory / predictor variable (regressor)
Fitted Model: Time-Series With Lagged Variables and Fitted Model: Autoregression - examples of how they could look:
- lagged variables (distributed lag): ŷ_t = 𝛽̂0 + 𝛽̂1𝑥_t + 𝛽̂2𝑥_(t−1)
- autoregression: ŷ_t = 𝛽̂0 + 𝛽̂1𝑦_(t−1)
OLS minimizes…
the Sum of Squared Errors (SSE) with respect to regression coefficients 𝛽0, 𝛽1
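A sketch with simulated data: minimizing the SSE numerically reproduces lm()'s coefficients:

  set.seed(1)
  x <- runif(50)
  y <- 2 + 3 * x + rnorm(50, sd = 0.5)
  sse <- function(b) sum((y - b[1] - b[2] * x)^2)   # SSE as a function of (b0, b1)
  optim(c(0, 0), sse)$par   # numeric minimizer: approx. (2, 3)
  coef(lm(y ~ x))           # OLS closed form: same values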
Residual Standard Error (RSE) - calculation/formula
- RSE = √(SSE / (𝑛 − 𝑝 − 1)), i.e. roughly the square root of the average squared residual
- where 𝑛 is the number of observations and 𝑝 the number of independent variables
What does 𝑅2 show?
The proportion of total variation of Y that is explained by the model (i.e. by the independent variable(s))
𝑅2 - calculation
𝑅2 = 1 − SSE/SST = (SST − SSE)/SST, the fraction of the total variation of Y explained by the model
Adjusted R2 - calculation/meaning
- 𝑅2adj = 𝑅2 − (1 − 𝑅2) · 𝑝/(𝑛 − 𝑝 − 1)
- If 𝑛 is very large and 𝑝 is very small, the ratio 𝑝/(𝑛 − 𝑝 − 1) is close to zero, and 𝑅2 ≈ 𝑅2adj
- As the number of inputs (𝑥’s) increases, 𝑅2 typically increases regardless of whether the variables are useful for prediction
- Adjusted 𝑅2 will only increase if the new 𝑥 variable improves the model more than would be expected by chance.
linear regression model
lm(y ~ x, data = name)
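A fuller sketch with simulated data, showing where the quantities from the cards above appear in the output:

  set.seed(1)
  d <- data.frame(x = runif(100))
  d$y <- 1 + 2 * d$x + rnorm(100, sd = 0.3)
  fit <- lm(y ~ x, data = d)
  summary(fit)   # coefficients, their SEs, t-ratios, RSE, R^2, adjusted R^2
  sqrt(sum(resid(fit)^2) / (100 - 1 - 1))   # RSE by hand; equals summary(fit)$sigma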
Assumptions for errors in linear regression models:
independent, Normally distributed, mean zero, constant (homoscedastic) variance
What is constant, time-independent variance of the errors in a linear regression model called?
homoscedasticity
What can non-normal residuals indicate?
nonlinearity present, interactions between independent variables, outliers
Possible Reasons For Systematic Errors in linear regression models:
- Nonlinearity: systematic pattern in the residuals
- Heteroscedasticity: variance of errors changes across levels of independent variable
- Autocorrelation: errors in one period are correlated with errors in another period
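Base R draws the standard diagnostic plots for any fitted lm object; a sketch using the built-in cars dataset:

  fit <- lm(dist ~ speed, data = cars)
  plot(fit, which = 1)   # residuals vs fitted: look for patterns / funnel shapes
  plot(fit, which = 2)   # Normal Q-Q of residuals: check normality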
Common Nonlinear Transformations
- 1/𝑥 Relationship
- square root(𝑥) Relationship
- x^2 Relationship
- Power (𝑥^𝑏) Relationship
We could say that the regression line has reduced our uncertainty, as measured by variances, from … to …
- from the unconditional variance s²y
- to the conditional variance s²e
- That is, a reduction of s²y − s²e
- The reduction expressed as a fraction of s²y is called R-squared or R²
A regression output usually reports the ratios between â and its standard deviation (standard error), and between b̂ and its standard deviation, which are referred to as “t-ratios”, i.e. t = â / SE(â) and t = b̂ / SE(b̂).
If the residual distribution has very “fat tails”, i.e. many more extreme values than you would like to see, it may be appropriate to think of using an alternative estimation technique, such as:
- Least Absolute Value rather than Least Squares
- This approach will weight extreme values less, although a different set of diagnostics will then have to be used
Common Nonlinear Transformations
- 𝑥2 Relationship - Example: return from an investment (Y) increases quadratically with an increasing investment (x). This could happen due to aggressive reinvestment and compounding returns.
- square root(𝑥) Relationship - Example: stock volatility (𝑌) increases with volume (𝑥) at a decreasing rate, i.e. y = square root(x)
What is an Interaction Term?
- an independent variable in a regression model that is a product of two independent variables
- Sometimes the partial effect of the dependent variable with respect to one independent variable can depend on the magnitude of another independent variable (see the sketch below)
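In R's formula syntax, x1:x2 is the product term and x1 * x2 expands to x1 + x2 + x1:x2 (keeping the main effects, per the hierarchy principle below). A sketch with simulated data:

  set.seed(1)
  d <- data.frame(x1 = runif(80), x2 = runif(80))
  d$y <- 1 + 2 * d$x1 + 0.5 * d$x2 + 3 * d$x1 * d$x2 + rnorm(80, sd = 0.2)
  coef(lm(y ~ x1 * x2, data = d))   # recovers all four coefficients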
Hierarchy Principle - Interaction Term
If an interaction term 𝑥1𝑥2 is included in the model, the main effects 𝑥1 and 𝑥2 should also be kept, even if their own coefficients are not significant.
What is Multicollinearity? Effects?
- appears when independent variables used inside the regression equation are highly correlated
- Effects: fit is not improved much; additional variables add little information; two or more variables may appear insignificant together, yet highly significant if one of them is dropped; parameter estimates are unreliable
Dummy Variable:
- A variable that takes on a value of 0 or 1
- Example: 1 = war year, 0 = no war
- When a categorical variable has k categories (we call them ‘levels’), we include only k − 1 dummy variables in the regression model.
- The category that is left out is usually the one with the most frequent observations and it acts as a reference.
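R handles this automatically for factors: a k-level factor enters the model as k − 1 dummies, with the first level as the reference (data below are made up):

  d <- data.frame(
    year_type = factor(c("peace", "war", "peace", "peace", "war")),
    gdp = c(2.1, 1.4, 2.3, 2.0, 1.2)
  )
  model.matrix(~ year_type, data = d)   # intercept + a single 0/1 dummy for "war"
  # relevel(d$year_type, ref = "war") would change the reference group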
Distributed lag model
A model for time-series data in which a regression equation predicts current values of a dependent variable from the current values of an independent variable and/or its lagged (past-period) values.
Nonparametric statistics
statistical method in which the data are not assumed to come from prescribed models that are determined by a small number of parameters, such as the normal distribution model and the linear regression model
RISK analysis toolbox in Excel - overview
RISK built-in correlation
What is linear programming?
Linear programming is a special type of optimization model that maximizes or minimizes a linear objective function subject to constraints expressed as linear equations or inequalities, solved simultaneously (see the sketch below).
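A minimal sketch assuming the lpSolve package (the objective and constraints are made-up numbers):

  library(lpSolve)
  # maximize 3x + 5y subject to x + 2y <= 14, 3x - y >= 0, x - y <= 2
  res <- lp("max", c(3, 5),
            matrix(c(1, 2, 3, -1, 1, -1), nrow = 3, byrow = TRUE),
            c("<=", ">=", "<="), c(14, 0, 2),
            compute.sens = TRUE)
  res$solution   # optimal (x, y) = (6, 4)
  res$duals      # shadow prices of the constraints (see the cards below)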
How do you calculate the portfolio variance for 3 stocks given weights, correlations, and SDs? What does the matrix multiplication look like?
σ²p = wᵀΣw, where w is the weight vector and Σ the covariance matrix with entries Σij = ρij σi σj (see the sketch below).
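The computation in R with illustrative numbers:

  w <- c(0.5, 0.3, 0.2)                    # weights
  s <- c(0.20, 0.15, 0.10)                 # standard deviations
  R <- matrix(c(1.0, 0.3, 0.1,
                0.3, 1.0, 0.4,
                0.1, 0.4, 1.0), nrow = 3)  # correlation matrix
  Sigma <- diag(s) %*% R %*% diag(s)       # covariance: Sigma_ij = rho_ij * s_i * s_j
  t(w) %*% Sigma %*% w                     # portfolio variance w' Sigma w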
What is the shadow price?
The amount of profit that an additional unit of available resources would yield
What shadow price will a non-binding constraint have?
A shadow price of 0 because there is already an excess of the resource
An inequality constraint is binding if…
…the solution makes it an equality. Otherwise, it is nonbinding.
Optimization model: The positive difference between the two sides of the constraint is called…
the slack
Slack in an optimization model
Surplus amount of a limited resource. Constraints with zero slack are binding. These are the important constraints that influence the optimal solution
Shadow (Dual) Price in an optimization model:
The amount by which the objective value will improve if we relax the constraint by one unit. It is the most we are willing to pay to obtain an additional unit of that resource.
Reduced Cost in an optimization model:
- The deterioration in the objective function if we force one unit of a sub-optimal activity (zero level) into the optimal solution. Alternatively, the amount by which we need to improve the contribution of sub-optimal activities before they could enter the optimal solution on their own merit.
- A reduced cost for a product not in the optimal mix indicates how much greater its margin would have to be before it would enter the optimal mix.