Distributions Flashcards
How are Y and y defined?
Y –> Actual outcome of an event
y –> One of the possible outcomes
Ways of writing the likelihood of a particular outcome y:
P(Y = y)
p(y)
What is p(y) called?
Since p(y) expresses the probability of each distinct outcome, we call this:
The probability function
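A minimal sketch of a probability function, assuming a fair six-sided die as the event. The die and the name `p` are illustrative choices, not part of the cards.

```python
from fractions import Fraction

# Possible outcomes y of one roll of a fair six-sided die (illustrative assumption).
outcomes = [1, 2, 3, 4, 5, 6]

def p(y):
    """Probability function p(y) = P(Y = y) for the fair die."""
    return Fraction(1, 6) if y in outcomes else Fraction(0)

# The probabilities of all distinct outcomes add up to 1.
total = sum(p(y) for y in outcomes)
print(p(3), total)
```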
With what do we define distributions?
2 characteristics:
Mean –> Average value –> μ
Variance –> How spread out the data is –> σ^2
How are population and sample data defined?
Population data –> All the data
Sample data –> Just a part of it
How are sample mean and variance denoted?
Sample mean symbol: x̄
Sample variance: s^2 (square)
How is the standard deviation defined/denoted?
Standard deviation –> Square root of variance:
σ = √(σ^2)
Standard deviation formula
Standard deviation:
Sx = σ = the standard deviation of the series of numbers x.
xi = the value of number i in the series.
μ = the mean of the series (sum of the numbers / count)
nx = the number of values in the data set.
σ = Sx = √( ∑ (xi - μ)^2 / nx )
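The formula above can be sketched in Python as follows. The function name `population_std` and the data values are made up for illustration.

```python
import math

def population_std(values):
    """σ = √( ∑ (xi - μ)^2 / n ), treating `values` as the whole population."""
    n = len(values)
    mu = sum(values) / n  # μ: sum of the numbers / count
    squared_dev = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_dev / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers for illustration
sigma = population_std(data)
print(sigma)
```

Python's standard library offers `statistics.pstdev` for this population formula, while `statistics.stdev` divides by n - 1 and is meant for sample data.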
Notation for distributions:
Variable name
Tilde sign
Type –> Capital letter
Characteristics (μ, σ^2)
X ~ N (μ, σ^2)
Discrete distribution when all outcomes are equally likely?
Equiprobable –> Uniform distribution
Discrete distribution with only two possible outcomes?
Follow a Bernoulli distribution
Single trial
Discrete distribution when carrying out a similar experiment several times in a row
Binomial Distribution
Two outcomes per iteration
Many iterations –> Multiple trials
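A small sketch of the binomial probability function, assuming 10 flips of a fair coin as the repeated experiment. The coin and the name `binomial_pmf` are illustrative. With a single trial (n = 1) it reduces to the Bernoulli case.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative assumption: 10 flips of a fair coin.
n, p = 10, 0.5
prob_5_heads = binomial_pmf(5, n, p)
print(prob_5_heads)
```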
Discrete distribution for the chance of a given number of events when the average rate is known?
Poisson distribution
How unusual is an event frequency for a given interval
Example: A team averages 35 points per game. What is the probability of it scoring exactly 12 points in the first quarter of the next game?
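The example can be worked through with the Poisson probability function. This is a sketch under one loud assumption that the card does not state: scoring is spread evenly over four quarters, so the average for one quarter is 35 / 4 = 8.75 points. The name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval whose average event count is lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Assumption: points are spread evenly over four quarters,
# so the average for one quarter is 35 / 4 = 8.75.
lam_quarter = 35 / 4
p_12 = poisson_pmf(12, lam_quarter)
print(round(p_12, 4))
```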
Characteristics of normal distribution?
Often observed in nature
Values far out in the tails are called outliers
When is Student’s t distribution used?
A small sample approximation of a Normal distribution
It accommodates extreme values much better
Curve has fatter tails than normal distribution
When is Chi-Squared distribution used?
Asymmetric continuous distribution
Only consists of non-negative values
Starts from 0
Often used in hypothesis testing
When is exponential distribution used?
Present with events that are rapidly changing early on
Example is how online news articles get hits –> Typically when topic is still fresh and then it dies off.
When is logistic distribution used?
Logistic distribution
Useful in forecast analysis
Useful for determining a cut-off point for a successful outcome
Discrete distributions characteristics
Finitely many distinct outcomes
Every unique outcome has a probability assigned to it
Example: Dartboard –> The possible scores for one dart run from 0 to 60, thus finite
How do continuous distributions differ from discrete?
Infinitely many consecutive outcomes
Therefore cannot record frequency of each distinct value
Cannot represent them with a table, only with a graph
f(y) >= 0
Characteristics of the PDF Graph?
The function is called PDF (Probability Density Function)
Curve is called the “Probability Distribution Curve” (PDC)
It is like a discrete distribution but with infinitely many possible outcomes
It gives the probability density on the y-axis for every possible value ‘y’ on the x-axis
Likelihood of each individual outcome in continuous distribution?
Likelihood of each individual one is infinitely small
favourable / sample space = 1 / infinite amount
Thus the probability for any individual value is equal to 0 –> P(X = x) = 0
Also: P(X > x) = P(X >= x)
Example: Finishing a run in exactly 6 minutes is extremely unlikely, so the probability of finishing in under 6 minutes is treated as equal to the probability of finishing in 6 minutes or less.
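A sketch of why a single exact value has probability 0. The assumption here (not from the cards) is that run times are normally distributed with a mean of 6.0 minutes and a standard deviation of 0.5 minutes.

```python
from statistics import NormalDist

# Illustrative assumption: run times are normal with mean 6.0 and std 0.5 minutes.
run_time = NormalDist(mu=6.0, sigma=0.5)

def prob_between(a, b):
    """P(a <= X <= b), computed as CDF(b) - CDF(a)."""
    return run_time.cdf(b) - run_time.cdf(a)

# Shrinking the window around exactly 6 minutes drives the probability toward 0.
for half_width in (0.5, 0.05, 0.005, 0.0005):
    print(half_width, prob_between(6 - half_width, 6 + half_width))
```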
What is the CDF? How is it built up?
When you integrate the PDF, you would get the CDF
When you differentiate the CDF, you get the PDF
Because the total area under the PDF is 1, the CDF runs from 0 to 1
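The link between PDF and CDF can be checked numerically. This sketch assumes a standard normal PDF and uses a simple trapezoid rule with a lower bound of -10 standing in for minus infinity. The function names and step count are illustrative choices.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Density f(y) of a normal distribution."""
    return exp(-((y - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def cdf_by_integration(y, lower=-10.0, steps=20_000):
    """Approximate F(y) as the area under the PDF from `lower` to y (trapezoid rule)."""
    width = (y - lower) / steps
    area = 0.0
    for i in range(steps):
        left = lower + i * width
        area += (normal_pdf(left) + normal_pdf(left + width)) / 2 * width
    return area

approx = cdf_by_integration(1.0)
exact = NormalDist().cdf(1.0)
print(approx, exact)
```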
What are the characteristics of the normal distribution?
Bell shaped
μ is the mean
Symmetric
Expected value E(X) = μ
How would you transform a distribution?
Plus/Minus –> Moves the graphs to the right/left
Multiply/Divide –> Stretches/shrinks the graph (for a constant greater than 1)
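A quick sketch of both transformations on made-up data: adding a constant moves the mean and leaves the spread alone, while multiplying by a constant scales both.

```python
from statistics import mean, pstdev

data = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values for illustration

shifted = [x + 10 for x in data]  # adding a constant moves the mean, spread unchanged
scaled = [x * 3 for x in data]    # multiplying by a constant scales mean and spread

print(mean(data), pstdev(data))
print(mean(shifted), pstdev(shifted))
print(mean(scaled), pstdev(scaled))
```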
What is standardizing?
Special kind of transformation where E(X) = 0
Var(X) = 1
The distribution we get after standardizing any normal distribution is called a ‘Standard normal distribution’
What is the standard normal distribution table?
There’s a CDF table with all the standardized values for this graph
Also called Z-score table
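Instead of a printed table, the same values can be looked up with Python's standard library. A minimal sketch, with the z values chosen only for illustration:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# A Z-table entry is the CDF value P(Z <= z) for a given z.
for z in (0.0, 1.0, 1.96):
    print(z, round(standard_normal.cdf(z), 4))
```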
What is the process of standardizing using transformation?
Move the vertical centerline onto the y-axis by subtracting the mean from every element –> -μ
Make sure the standard deviation is 1 –> Divide every element by the value of the standard deviation –> / σ
How to calculate the Z-score (based on transformation)?
Z = (Y - μ) / σ
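Applying Z = (Y - μ) / σ to every element can be sketched like this. The data values are made up, and the population standard deviation is used for σ.

```python
from statistics import mean, pstdev

data = [12.0, 15.0, 9.0, 18.0, 21.0]  # made-up values for illustration
mu = mean(data)
sigma = pstdev(data)

# Z = (Y - μ) / σ applied to every element
z_scores = [(y - mu) / sigma for y in data]
print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```

After the transformation the mean is 0 and the standard deviation is 1, up to floating-point rounding.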
Define Student’s t distribution
t (k) –> Where k is degrees of freedom
Y ~ t(3) –> Variable Y follows t distribution with 3 degrees of freedom
When do you use Student’s t distribution and what are the key differences with the normal distribution?
You use this when there’s not sufficient data for the normal distribution
Small sample size approximation of a Normal Distribution
Another key difference is that apart from mean and variance you must also define degrees of freedom for the distribution
Graph is also bell shaped but with fatter tails to accommodate the occurrence of values far away from the mean
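The fatter tails can be seen by comparing densities directly. This sketch writes out the standard density formula for Student's t by hand and compares t(3) with the standard normal. The function names and the x values are illustrative.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(x, k):
    """Density of Student's t distribution with k degrees of freedom."""
    coeff = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return coeff * (1 + x**2 / k) ** (-(k + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-(x**2) / 2) / sqrt(2 * pi)

# Near the mean the normal curve is higher; far from the mean t(3) is higher.
for x in (0.0, 2.0, 4.0):
    print(x, t_pdf(x, 3), normal_pdf(x))
```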
Difference statistics vs characteristics
Statistics:
Sample
60% of 1000 people have brown eyes –> Statistic
Characteristics:
Population
If 65% of people worldwide have brown eyes then that is a characteristic of the human population
How can you determine the distribution type and what can you do with it?
Shape of the curve
Characteristics mu and sigma
You can create models (like regression models)
How do statistics relate to data science?
An expansion of probability, statistics and programming that implements computational technology to solve more advanced questions
How does ML fit in the statistics world?
ML relies on expected values a lot.
ML is trial and error where a computer adjusts its expected value along the way
There is always a probability of failure due to unforeseen events (earthquakes etc)