Module 16: Statistical Distributions Flashcards
Binomial distribution description
Bin(n, p)
Models the number of successes in n independent trials, where p is the probability of success on each trial.
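A minimal Python sketch of the binomial probability function. It chooses which k of the n trials succeed and multiplies by the probability of that pattern. The function name `binomial_pmf` and the example values are illustrative only.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p): choose which k trials succeed, times their probability."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 3 successes in 10 trials with p = 0.2
print(round(binomial_pmf(3, 10, 0.2), 4))  # 0.2013
```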
Negative binomial distribution description
NBin(r, p)
- Models the number of trials needed until there have been r successes.
- if r=1, the distribution is known as the geometric distribution
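A minimal Python sketch of the negative binomial probability function, using the card's convention that X counts trials (not failures) until the r-th success. Some texts count failures before the r-th success instead, which shifts the distribution by r. The function names are hypothetical.

```python
from math import comb

def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k) where X is the trial on which the r-th success occurs (k >= r).

    The k-th trial must be a success, and exactly r - 1 of the first k - 1
    trials must also be successes.
    """
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def geometric_pmf(k: int, p: float) -> float:
    """Special case r = 1: the first success occurs on trial k."""
    return neg_binomial_pmf(k, 1, p)

# Mean number of trials is r / p; truncating the infinite sum at 400 is ample here
print(sum(k * neg_binomial_pmf(k, 3, 0.4) for k in range(3, 400)))  # close to 7.5
```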
Poisson distribution description
Poi(λ)
- Models the number of independent events occurring in a specified time period
- used as an approximation to the binomial distribution when n is large and p is small, taking λ = np
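A minimal Python sketch comparing Bin(n, p) with Poi(np) for large n and small p. The values n = 1000 and p = 0.002 (so λ = 2) are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 1000, 0.002  # large n, small p, so lambda = n * p = 2
for k in range(5):
    # The two columns agree to roughly three decimal places
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```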
Normal distribution
- mathematically tractable distribution (easy to parameterise and use), useful when little is known about the data
- used as an approximation to the binomial distribution when n is large, and to the Poisson distribution when λ is large
- used to model the error terms in a random walk
- symmetrical and mesokurtic
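A minimal Python sketch of the normal approximation to the binomial, matching the mean np and variance np(1 − p). The parameters n = 100, p = 0.3 and the continuity correction of 0.5 are illustrative choices; `statistics.NormalDist` is the standard-library normal distribution.

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.3
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5  # match the binomial mean and variance

# Exact P(X <= 35) for X ~ Bin(100, 0.3)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(36))

# Normal approximation with a continuity correction: P(X <= 35) ≈ P(Y <= 35.5)
approx = NormalDist(mu, sigma).cdf(35.5)

print(round(exact, 4), round(approx, 4))
```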
Central Limit Theorem
By the Central Limit Theorem, the average, X̄, of a large sample of n iid random variables with finite mean, μ, and finite variance, σ², is approximately Normally distributed:
X̄ ~ N(μ, σ²/n)
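A minimal simulation sketch of the Central Limit Theorem in Python. It assumes Exp(1) as the underlying (skewed, clearly non-normal) distribution, so μ = 1 and σ² = 1, and takes n = 50; these choices and the seed are illustrative.

```python
import random
import statistics

random.seed(1)
n = 50  # size of each sample

def sample_mean(n: int) -> float:
    # Exp(1) is positively skewed, with mu = 1 and sigma^2 = 1
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(n) for _ in range(5000)]

print(statistics.fmean(means))     # close to mu = 1
print(statistics.variance(means))  # close to sigma^2 / n = 0.02

# If X_bar is roughly N(mu, sigma^2 / n), about 95% of means lie within 1.96 standard errors
se = (1 / n) ** 0.5
inside = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)
print(inside)  # close to 0.95
```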
Two tests for normality
- QQ plots
- Jarque-Bera test
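The Jarque-Bera test combines the sample skewness S and kurtosis K into JB = (n/6)(S² + (K − 3)²/4), which is approximately chi-square with 2 degrees of freedom when the data are normal. A minimal Python sketch, assuming moment-based estimates of S and K; the function name and the test data are illustrative.

```python
import math
import random

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) using moment-based skewness and kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under normality JB is asymptotically chi-square with 2 df,
    # whose survival function is exp(-x / 2)
    return jb, math.exp(-jb / 2)

random.seed(0)
normal_data = [random.gauss(0, 1) for _ in range(2000)]
skewed_data = [random.expovariate(1.0) for _ in range(2000)]
print(jarque_bera(normal_data))  # p-value typically well above 0.05
print(jarque_bera(skewed_data))  # p-value essentially 0, so normality is rejected
```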
Generalised Student’s t-distribution
- Used to model symmetric data sets whose tails are fatter than those implied by a normal distribution (leptokurtic)
- important distribution for modelling risks
- Can be derived as a normal mean-variance mixture distribution
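A minimal Python sketch of the mixture construction: if W is inverse gamma with parameters (ν/2, ν/2) and Z is standard normal, then √W·Z has a Student's t-distribution with ν degrees of freedom. This is the symmetric special case (location 0, scale 1, no skewness term); ν = 5 and the seed are illustrative.

```python
import random
import statistics

random.seed(42)
nu = 5.0  # degrees of freedom

def t_via_mixture() -> float:
    # W ~ InverseGamma(nu/2, nu/2) is 1 / Gamma(shape=nu/2, rate=nu/2).
    # random.gammavariate(alpha, beta) treats beta as a *scale*, so scale = 2 / nu.
    w = 1.0 / random.gammavariate(nu / 2, 2 / nu)
    # Conditional on W, X is normal with mean 0 and variance W
    return (w ** 0.5) * random.gauss(0.0, 1.0)

xs = [t_via_mixture() for _ in range(100_000)]
m = statistics.fmean(xs)
var = statistics.pvariance(xs, m)
kurt = statistics.fmean((x - m) ** 4 for x in xs) / var ** 2

print(var)   # theory: nu / (nu - 2) = 5/3
print(kurt)  # theory: 3 + 6 / (nu - 4) = 9, well above the normal value of 3
```

The sample kurtosis of a t-distribution with few degrees of freedom is very noisy, so expect it to land well above 3 rather than exactly at 9.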
Lognormal distribution
- frequently used to model financial data that takes positive values only, eg asset prices, or insurance claim amounts
- positively skewed
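A minimal Python sketch: if ln X ~ N(μ, σ²) then X is lognormal, so simulated values are always positive and the mean exceeds the median (positive skew). The parameters μ = 0, σ = 0.5 and the seed are illustrative; `random.lognormvariate` is the standard-library sampler.

```python
import math
import random
import statistics

random.seed(7)
mu, sigma = 0.0, 0.5  # parameters of log X, not of X itself

# random.lognormvariate(mu, sigma) returns exp(N(mu, sigma^2)), so values are always > 0
xs = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

print(min(xs) > 0)                                     # True: positive values only
print(statistics.fmean(xs), math.exp(mu + sigma**2 / 2))  # mean ≈ 1.1331
print(statistics.median(xs), math.exp(mu))                # median ≈ 1, below the mean
```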
Wald (or inverse Gaussian) distribution
- models the time taken for a random walk with drift to reach a particular level
- positively skewed with useful properties in terms of aggregation
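For Brownian motion with drift ν > 0 and volatility σ started at 0, the first time it reaches a level a > 0 is inverse Gaussian with mean a/ν and variance aσ²/ν³. A minimal Python sketch that checks this by simulating a discretised random walk; the step size dt = 0.01 and the parameter values are illustrative assumptions, and the discretisation biases the mean slightly upward.

```python
import random
import statistics

random.seed(3)
drift, vol, level, dt = 1.0, 1.0, 5.0, 0.01

def first_passage_time() -> float:
    """Step a random walk with drift forward until it first reaches `level`."""
    x, t = 0.0, 0.0
    while x < level:
        x += drift * dt + vol * random.gauss(0.0, dt ** 0.5)
        t += dt
    return t

times = [first_passage_time() for _ in range(2000)]
# Inverse Gaussian theory: mean = level / drift = 5, variance = level * vol^2 / drift^3 = 5
print(statistics.fmean(times), statistics.variance(times))
```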
Chi-square distribution
- Used for goodness-of-fit tests
- represents the sum of the squares of v independent standard normal random variables, where v is the number of degrees of freedom
- positively skewed
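A minimal Python sketch of both bullets: a chi-square draw built as a sum of v squared standard normals (mean v, variance 2v), and Pearson's goodness-of-fit statistic Σ(O − E)²/E, which is compared with a chi-square distribution. The value v = 4, the seed and the function names are illustrative.

```python
import random
import statistics

random.seed(11)
v = 4  # degrees of freedom

def chi_square_draw(v: int) -> float:
    # Sum of the squares of v independent standard normal variables
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(v))

draws = [chi_square_draw(v) for _ in range(50_000)]
print(statistics.fmean(draws))     # theory: v = 4
print(statistics.variance(draws))  # theory: 2v = 8

def pearson_statistic(observed, expected):
    """Goodness-of-fit statistic; approximately chi-square with
    (number of cells - 1 - number of fitted parameters) degrees of freedom."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(pearson_statistic([10, 20, 30], [20, 20, 20]))  # 10.0
```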
Exponential Distribution
- Models the waiting time between successive events in a Poisson process; with event rate λ, the mean waiting time is 1/λ
- the density is monotonically decreasing and its tail decays exponentially; the distribution is positively skewed
- inflexible due to its single parameter, so unlikely to provide a good fit to real data
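A minimal Python sketch linking the exponential and Poisson distributions: adding exponential waiting times with rate λ and counting how many events fall in one unit of time gives a Poisson(λ) count. The rate λ = 3, the seed and the function name are illustrative.

```python
import random
import statistics

random.seed(5)
rate = 3.0  # lambda: expected number of events per unit time

def events_in_unit_time(rate: float) -> int:
    """Count events in [0, 1) by adding exponential waiting times until time 1 is passed."""
    t, count = random.expovariate(rate), 0
    while t < 1.0:
        count += 1
        t += random.expovariate(rate)
    return count

gaps = [random.expovariate(rate) for _ in range(100_000)]
counts = [events_in_unit_time(rate) for _ in range(50_000)]

print(statistics.fmean(gaps))       # theory: 1 / rate ≈ 0.333
print(statistics.fmean(counts))     # theory: rate = 3 (Poisson mean)
print(statistics.variance(counts))  # theory: also 3 (Poisson variance equals its mean)
```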
Gamma Distribution
- generalisation of the exponential distribution (an exponential is a gamma distribution with shape parameter 1)
- flexible and has useful properties in terms of aggregation
- if X has a gamma distribution then Y = 1/X has an inverse gamma distribution
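A minimal Python sketch of the aggregation property (the sum of k independent Exp(λ) variables is Gamma(k, λ)) and of the inverse gamma transformation. It assumes the rate parameterisation, so E[X] = k/λ, Var[X] = k/λ², and E[1/X] = λ/(k − 1) for k > 1; k = 3, λ = 2 and the seed are illustrative.

```python
import random
import statistics

random.seed(9)
k, rate = 3, 2.0

# Aggregation: the sum of k independent Exp(rate) variables is Gamma(shape=k, rate=rate)
sums = [sum(random.expovariate(rate) for _ in range(k)) for _ in range(50_000)]
print(statistics.fmean(sums), k / rate)          # means agree (theory 1.5)
print(statistics.variance(sums), k / rate ** 2)  # variances agree (theory 0.75)

# Python's gammavariate takes a *scale* parameter, so scale = 1 / rate
direct = [random.gammavariate(k, 1 / rate) for _ in range(50_000)]
print(statistics.fmean(direct))

# Y = 1 / X is inverse gamma; for shape k > 1 its mean is rate / (k - 1) = 1.0
inverse = [1 / x for x in direct]
print(statistics.fmean(inverse))
```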
Generalised Inverse Gamma distribution
- flexible, as it has three parameters, so can produce a wide range of shapes
Pareto distribution
- used for modelling variables where the probability of an event falls in proportion to the magnitude of the event raised to a power, eg the distribution of wealth or the population of cities
- the tail of the distribution follows a power law
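A minimal Python sketch, assuming the classical (Type I) Pareto form P(X > x) = (x_min/x)^α for x ≥ x_min. Other parameterisations, such as F(x) = 1 − (λ/(λ + x))^α, have the same power-law tail. Samples are drawn by inverse-transform sampling; α = 2, x_min = 1 and the function names are illustrative.

```python
import random

def pareto_survival(x: float, alpha: float, x_min: float) -> float:
    """P(X > x) for a Type I Pareto distribution: a power law in x."""
    return 1.0 if x < x_min else (x_min / x) ** alpha

def pareto_sample(alpha: float, x_min: float, rng: random.Random) -> float:
    # Inverse-transform sampling: solve u = (x_min / x)^alpha for x
    u = 1.0 - rng.random()  # in (0, 1], which avoids division by zero
    return x_min / u ** (1 / alpha)

alpha, x_min = 2.0, 1.0
# Power-law tail: doubling x always scales the tail probability by 2^-alpha
print(pareto_survival(20, alpha, x_min) / pareto_survival(10, alpha, x_min))  # 0.25 up to rounding

rng = random.Random(1)
samples = [pareto_sample(alpha, x_min, rng) for _ in range(100_000)]
print(sum(s > 10 for s in samples) / len(samples))  # close to (1/10)^2 = 0.01
```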
Generalised Pareto Distribution
Flexible distribution used in extreme value theory
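In extreme value theory the generalised Pareto distribution is typically fitted to exceedances over a high threshold. A minimal Python sketch of its CDF, assuming the common shape ξ and scale β parameterisation; the function name is illustrative.

```python
import math

def gpd_cdf(x: float, xi: float, beta: float) -> float:
    """CDF of the generalised Pareto distribution for exceedances x >= 0.

    xi is the shape (tail index) and beta > 0 the scale. xi > 0 gives a
    power-law (Pareto-type) tail, xi = 0 the exponential distribution, and
    xi < 0 a distribution with a finite upper end point at -beta / xi.
    """
    if x < 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-x / beta)
    z = 1.0 + xi * x / beta
    if z <= 0:  # only possible for xi < 0: x is beyond the upper end point
        return 1.0
    return 1.0 - z ** (-1.0 / xi)

# A positive shape gives a much heavier tail than the exponential case
print(1 - gpd_cdf(10, 0.5, 1.0), 1 - gpd_cdf(10, 0.0, 1.0))
```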
Triangular Distribution
Useful when the following limited data is available:
- the minimum value
- the maximum value
- the mode
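A minimal Python sketch of the triangular distribution built from exactly the three inputs listed: minimum a, maximum b and mode c. Its mean is (a + b + c)/3. The values a = 10, b = 50, c = 20 are illustrative; note that `random.triangular` takes its arguments in the order (low, high, mode).

```python
import random
import statistics

def triangular_cdf(x: float, a: float, b: float, c: float) -> float:
    """CDF with minimum a, maximum b and mode c (a <= c <= b, a < b)."""
    if x <= a:
        return 0.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    if x < b:
        return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))
    return 1.0

a, b, c = 10.0, 50.0, 20.0
random.seed(4)
samples = [random.triangular(a, b, c) for _ in range(50_000)]

print(triangular_cdf(c, a, b, c))          # (c - a) / (b - a) = 0.25
print(statistics.fmean(samples), (a + b + c) / 3)  # both close to 26.67
```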
Multivariate Distribution
A way of modelling several random variables jointly, including the dependence between them