QM PREREQ4 – Common Probability Distributions Flashcards
What is a probability distribution?
Specifies the probabilities associated with the possible outcomes of a random variable
What are the 7 common probability distributions and why are they useful to know?
Uniform, binomial, normal, lognormal, Student’s t (named after “Student”, the pseudonym of William Sealy Gosset), chi-square, and F-distribution
Most distributions will look like one of these 7
So when we see a distribution we can say it is “approximately normal” or “approximately chi-square”
This is useful because each of these common distributions has well-defined mathematical properties, which we can then use to analyse and interpret our data
What is a random variable and what are the two forms it can take?
A random variable is a quantity whose future outcomes are uncertain.
It can be either
- Discrete: take on at most a countable number of possible values (possibly infinite)
- Continuous: takes values over a continuum, so the possible values cannot be counted
Every random variable is associated with a probability distribution that describes the variable completely
What is a probability function?
Specifies the probability that a random variable takes on a specific value
For discrete variables we would use p(x)
For continuous variables we would use the probability density function
The probability function has two key properties.
1. 0 <= p(x) <= 1 (every probability must lie between 0 and 1, inclusive)
2. The sum of p(x) over all values of x equals 1. That is, if you add up the probabilities of all possible outcomes, they total 1
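The two properties can be checked mechanically. The sketch below is illustrative only: the fair six-sided die and the helper name `is_valid_probability_function` are assumptions, not part of the original cards.

```python
from fractions import Fraction

# Hypothetical example: probability function of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def is_valid_probability_function(p):
    """Check both properties: every p(x) lies in [0, 1] and the p(x) sum to 1."""
    in_range = all(0 <= prob <= 1 for prob in p.values())
    sums_to_one = sum(p.values()) == 1
    return in_range and sums_to_one

print(is_valid_probability_function(die))
```

Exact fractions are used so that the sum-to-1 check is not upset by floating-point rounding.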
What is a CDF?
Cumulative distribution function
Gives the probability that a variable X is less than or equal to a particular value x
Can be used for percentile rank for example
Plotted, it is a curve that never decreases and rises from 0 to 1 (0 to 1 being on the y axis)
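As a rough illustration, a CDF for a discrete variable can be built by adding up the probability function. The fair die and the function name `cdf` are assumptions chosen for the example.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, so p(x) = 1/6 for x = 1..6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(p, x):
    """F(x) = P(X <= x): add up p(k) for every outcome k that is <= x."""
    return sum((prob for k, prob in p.items() if k <= x), Fraction(0))

for x in range(0, 7):
    print(x, cdf(die, x))
```

The printed values climb from 0 to 1 in equal steps, which is the "stairs" shape of a discrete uniform CDF.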
What is a discrete uniform distribution?
All outcomes are equally likely
The probability distribution is a rectangle
Thus height x width = 1 (the total probability)
It will look like stairs of equal height and width as a CDF
What is a continuous uniform distribution?
The same as a discrete uniform distribution but with a continuous random variable
Also a rectangular probability distribution
An even slope upwards as a cumulative distribution function
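A minimal sketch of the continuous case, assuming the variable is uniform on an interval [a, b] (the names `a`, `b`, `uniform_pdf` and `uniform_cdf` are illustrative):

```python
def uniform_pdf(x, a, b):
    """Height of the rectangle: 1 / (b - a) inside [a, b], 0 outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x): a straight line rising from 0 at a to 1 at b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0.0, 10.0
print(uniform_pdf(3.0, a, b) * (b - a))  # area of the rectangle
print(uniform_cdf(5.0, a, b))            # halfway along the interval
```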
What is a Bernoulli random variable?
One based on the outcome of a trial which produces one of two outcomes (binary outcomes), interpreted as 1 or 0
p(1) = p
p(0) = 1 - p
In n trials, we can have 0 to n successes
If each trial is a random variable, then the number of successes in n trials is also a random variable, known as a binomial random variable
What is a binomial random variable?
The number of successes in n Bernoulli trials
Assumptions:
1. p is constant for all trials
2. Trials are independent
A binomial random variable has a distribution completely described by 2 parameters
x ~ B(n, p)
- To count the ways x successes can be arranged among n trials we can use nCr (with r = x)
Because the order doesn’t matter
- The probability of any one particular sequence with x successes in n trials is:
p^x (1 - p)^(n - x)
- We multiply nCr by this to get the probability function for a binomial random variable
n! / ((n - x)! x!) * p^x (1 - p)^(n - x)
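The formula above translates directly into code. This is a sketch using the standard library's `math.comb` for nCr; the function name `binomial_pmf` and the example values are assumptions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))

# The probabilities over x = 0..n should add up to 1.
print(sum(binomial_pmf(x, 4, 0.5) for x in range(5)))
```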
Why when we’re calculating probability are we only interested in the tails?
If we continue counting up past the mid point of the probability distribution we would misinterpret it
Such that we would deduce that achieving the top figure has a 100% chance
We have to count in the direction from the centre toward the tail
How do we calculate mean and variance for Bernoulli and binomial distributions?
For Bernoulli,
mean = p
variance = p (1 - p)
For Binomial,
mean = np
variance = np (1 - p)
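These shortcuts can be checked against the definitions of mean and variance applied to the binomial probability function. The values n = 10 and p = 0.3 are arbitrary assumptions for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # assumed example values

# Mean and variance computed the long way, from the probability function.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)                # long way vs. shortcut np
print(variance, n * p * (1 - p))  # long way vs. shortcut np(1 - p)
```

Setting n = 1 gives the Bernoulli case, where the same check returns p and p (1 - p).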
What is the central limit theorem?
The sum (or mean) of a large number of independent random variables with finite variance is approximately normally distributed
Let’s say we take a whole bunch of samples of random variables that are not related to each other and find their means
The distribution of these means will be approximately normal
The central limit theorem tells us that because of this result a lot of data tends to be normally distributed
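A small simulation can make this concrete. The sketch below assumes samples drawn from a uniform(0, 1) distribution, which is flat rather than bell-shaped; the sample size, number of samples and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30    # observations per sample (assumed)
NUM_SAMPLES = 2000  # number of samples drawn (assumed)

# Take many independent samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.random() for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

centre = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# For a normal distribution roughly 68% of values lie within 1 standard deviation.
within_one_sd = sum(abs(m - centre) <= spread for m in sample_means) / NUM_SAMPLES

print(round(centre, 3), round(spread, 3), round(within_one_sd, 2))
```

The sample means cluster around 0.5 with roughly the share within one standard deviation that a normal distribution would give, even though the underlying data are uniform.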
What is a standard normal distribution?
A distribution where we have set the mean to 0 and standard deviation to 1
We may want to standardise our values (if they fall into an approximately normal distribution) and turn it into a standard normal distribution to allow data processing (using things like ML) and cross comparison
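Standardising is usually done with z = (x - mean) / standard deviation. A minimal sketch, assuming example values for the mean, standard deviation and observation, and using `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Convert x into a z-score: how many standard deviations x is from the mean."""
    return (x - mu) / sigma

mu, sigma = 8.0, 2.0  # assumed mean and standard deviation
x = 11.0              # assumed observation

z = standardise(x, mu, sigma)

# P(X <= x) under N(mu, sigma^2) matches P(Z <= z) under the standard normal.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, p_original, p_standard)
```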
Why do we use a normal distribution to model asset returns but not asset prices?
We use a normal distribution to model continuously compounded asset returns
We do not use it to model asset prices because the left tail of a normal distribution (nd) extends to negative infinity, whereas asset prices cannot fall below 0
Asset returns are approximately normally distributed, so we can use nd to model (“close enough”)
However asset returns tend to be more kurtotic than normal (longer tails), and options add skew (pos/neg)
There is a lot more of this at L3
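The contrast between returns and prices can be sketched with a simulation. It assumes continuously compounded returns are drawn from a normal distribution and that the end price is S0 * exp(return), which makes prices lognormal; the starting price, mean, volatility and seed are made-up values.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

S0 = 100.0               # assumed starting price
MU, SIGMA = 0.05, 0.20   # assumed mean and standard deviation of returns

# Continuously compounded returns: normal, so they can be negative.
returns = [rng.gauss(MU, SIGMA) for _ in range(10_000)]

# Prices implied by those returns: always positive, however bad the return.
prices = [S0 * math.exp(r) for r in returns]

print(min(returns) < 0)  # some returns are negative
print(min(prices) > 0)   # no price is zero or negative
```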
What are the characteristics of nd?
A normal distribution has these 3 characteristics:
1. Described by 2 parameters, mu and sigma squared (population variance). The formula is X ~ N(mu, sigma squared)
2. Skew = 0 and kurtosis = 3 (excess kurtosis, K sub-c, = 0). Therefore mean = median = mode
3. A linear combination of 2 or more normal random variables is also normally distributed.
So R sub-p = w sub-1 R sub-1 + w sub-2 R sub-2 + w sub-3 R sub-3 + … is also nd. Each term is a univariate normal random variable, while the returns taken together follow a multivariate normal distribution
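A rough sketch of the portfolio case with two assets. It assumes the two returns are independent, which is a simplification made here so that the variance is just the sum of (w x sigma) squared; with correlated returns a covariance term would be needed. The weights, means and standard deviations are made-up values.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

weights = (0.6, 0.4)   # assumed portfolio weights
means = (0.05, 0.02)   # assumed mean return of each asset
sds = (0.20, 0.10)     # assumed standard deviation of each asset

# Theoretical mean and variance of the weighted sum (independence assumed).
theory_mean = sum(w * m for w, m in zip(weights, means))
theory_var = sum((w * s) ** 2 for w, s in zip(weights, sds))

# Simulate the portfolio return as a weighted sum of normal draws.
portfolio = [
    sum(w * rng.gauss(m, s) for w, m, s in zip(weights, means, sds))
    for _ in range(20_000)
]

print(theory_mean, statistics.fmean(portfolio))
print(theory_var, statistics.variance(portfolio))
```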