Probability Basics Flashcards
What is Independent
each event is not affected by other events
Probability of an event happening =
Number of ways it can happen
/ Total number of outcomes
What is Dependent
also called “Conditional”, where an event is affected by other events
P(A and B) = P(A) × P(B | A)
“Probability of event A and event B equals
the probability of event A times the probability of event B given event A”
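A small worked example of the rule for dependent events. The bag of marbles (2 blue, 3 red, drawn without replacement) is a made-up illustration, not part of the original card.

```python
from fractions import Fraction

# Hypothetical bag: 2 blue and 3 red marbles, drawn without replacement.
p_a = Fraction(2, 5)          # P(A): the first marble is blue
p_b_given_a = Fraction(1, 4)  # P(B|A): the second is blue, given the first was blue

# P(A and B) = P(A) * P(B|A)
p_a_and_b = p_a * p_b_given_a
print(p_a_and_b)  # 1/10
```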
What is Mutually Exclusive
events can’t happen at the same time
Permutations with Repetition
n^r
where n is the number of things to choose from,
and we choose r of them,
repetition is allowed,
and order matters.
Combination
When the order doesn’t matter
Permutation
When the order does matter
Permutations without Repetition
n!
/ (n − r)!
where n is the number of things to choose from,
and we choose r of them,
no repetitions,
order matters.
Combinations without Repetition
n!/(r!(n-r)!)
It is often called "n choose r" where n is the number of things to choose from, and we choose r of them, no repetition, order doesn't matter.
Combinations with Repetition
(r+n-1)!/ (r!(n-1)!)
where n is the number of things to choose from,
and we choose r of them
repetition allowed,
order doesn’t matter.
This is the same as a combination without repetition with n replaced by r + n − 1, that is "(r + n − 1) choose r"
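The four counting formulas can be checked side by side. This is a sketch using Python's standard math module; the values n = 5 and r = 3 are arbitrary example numbers.

```python
import math

n, r = 5, 3  # hypothetical: 5 things to choose from, we choose 3 of them

perm_with_rep = n ** r                   # n^r
perm_without_rep = math.perm(n, r)       # n! / (n - r)!
comb_without_rep = math.comb(n, r)       # n! / (r! (n - r)!)
comb_with_rep = math.comb(r + n - 1, r)  # (r + n - 1)! / (r! (n - 1)!)

print(perm_with_rep, perm_without_rep, comb_without_rep, comb_with_rep)
# 125 60 10 35
```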
Standard Deviation definition (not formula!!)
The formula is easy: it is the square root of the Variance. So now you ask, “What is the Variance?”
“What is the Variance?”
The average of the squared differences from the Mean.
Var(X) = Σx^2p − μ^2
To calculate the Variance:
square each value and multiply by its probability
sum them up and we get Σx^2p
then subtract the square of the Expected Value μ^2
μ = expected value = Σxp
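As an illustration of the steps above, here is the variance of one roll of a fair six-sided die. The die is an assumed example; exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]  # faces of a fair die (assumed example)
p = Fraction(1, 6)           # every face is equally likely

mu = sum(x * p for x in values)                   # expected value, Σxp
sum_x2p = sum(x ** 2 * p for x in values)         # Σx^2p
variance = sum_x2p - mu ** 2                      # Σx^2p − μ^2

print(mu, variance)  # 7/2 35/12
```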
Formula for The “Population Standard Deviation”:
and
The “Sample Standard Deviation”:
Population Standard Deviation
square root of [ (1/N) Σ of (x - mu)^2 ]
Sample Standard Deviation
square root of [ (1/(N-1)) Σ of (x - x̄)^2 ]
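The two formulas differ only in dividing by N or by N − 1. A short sketch on a made-up data set, cross-checked against Python's statistics module:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
mean = sum(data) / n
squared_diffs = sum((x - mean) ** 2 for x in data)

population_sd = math.sqrt(squared_diffs / n)    # divide by N
sample_sd = math.sqrt(squared_diffs / (n - 1))  # divide by N - 1

print(population_sd, round(sample_sd, 3))
# The standard library gives the same results:
print(statistics.pstdev(data), round(statistics.stdev(data), 3))
```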
Weighted mean standard deviation
Square root of [Σ(x^2 p) − μ^2]
μ = expected value = Σxp
What is The “Bell Curve” or a Normal Distribution.
The Normal Distribution has:
1. mean = median = mode
2. symmetry about the center
3. 50% of values less than the mean and 50% greater than the mean
4. Standard deviations:
68% of values are within 1 standard deviation of the mean
95% of values are within 2 standard deviations of the mean
99.7% of values are within 3 standard deviations of the mean
“Standard Score”, “sigma” or “z-score”.
The number of standard deviations from the mean
z = (x − μ) / σ
z is the “z-score” (Standard Score)
x is the value to be standardized
μ (“mu”) is the mean
σ (“sigma”) is the standard deviation
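A minimal sketch of the z-score formula. The numbers (a value of 26 from a distribution with mean 20 and standard deviation 4) are invented for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

print(z_score(26, 20, 4))  # 1.5
```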
Correlation
When two sets of data are strongly linked together we say they have a High Correlation.
Correlation can have a value:
1 is a perfect positive correlation
0 is no correlation (the values don’t seem linked at all)
-1 is a perfect negative correlation
“Correlation Is Not Causation” - 4 reasons why
What it really means is that a correlation does not prove one thing causes the other:
One thing might cause the other
The other might cause the first to happen (reverse causation)
They may be linked by a different thing (a hidden third variable)
Or it could be random chance (a spurious correlation)
Pearson’s Correlation formula
Step 1: Find the mean of x, and the mean of y
Step 2: Subtract the mean of x from every x value (call them “a”), and subtract the mean of y from every y value (call them “b”)
Step 3: Calculate: ab, a^2 and b^2 for every value
Step 4: Sum up ab, sum up a^2 and sum up b^2
Step 5: Divide the sum of ab by the square root of [(sum of a2) × (sum of b2)]
r = (n Σxy − Σx Σy) / square root of [ (n Σx^2 − (Σx)^2) × (n Σy^2 − (Σy)^2) ]
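The five steps can be followed literally in code. This is a sketch on made-up data; the variable names a and b match the ones used in the steps.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

# Step 1: means of x and y
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Step 2: differences from the means
a = [xi - mean_x for xi in x]
b = [yi - mean_y for yi in y]

# Steps 3 and 4: products and squares, summed up
sum_ab = sum(ai * bi for ai, bi in zip(a, b))
sum_a2 = sum(ai ** 2 for ai in a)
sum_b2 = sum(bi ** 2 for bi in b)

# Step 5: divide by the square root of the product of the sums
r = sum_ab / math.sqrt(sum_a2 * sum_b2)
print(round(r, 4))  # 0.7746
```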
Bayes Theorem
“AB AB AB” then remember to group it like: “AB = A * BA / B”
P(A|B) = P(A) * P(B|A) / P(B)
Which tells us: how often A happens given that B happens, written P(A|B),
When we know: how often B happens given that A happens, written P(B|A)
and how likely A is on its own, written P(A)
and how likely B is on its own, written P(B)
Bayes Theorem applied to false positives and false negatives
P(A|B) =
P(A)P(B|A) /
( P(A)P(B|A) + P(not A)P(B|not A) )
= TP / (TP + FP), where TP is the rate of true positives and FP is the rate of false positives
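A worked example of this form of Bayes' Theorem. All three input rates are assumptions chosen for illustration: 1% of people have the condition (A), the test is positive (B) for 80% of those who have it, and for 10% of those who do not.

```python
from fractions import Fraction

p_a = Fraction(1, 100)             # P(A): hypothetical prevalence
p_b_given_a = Fraction(8, 10)      # P(B|A): hypothetical true-positive rate
p_b_given_not_a = Fraction(1, 10)  # P(B|not A): hypothetical false-positive rate

true_pos = p_a * p_b_given_a             # P(A) P(B|A)
false_pos = (1 - p_a) * p_b_given_not_a  # P(not A) P(B|not A)

p_a_given_b = true_pos / (true_pos + false_pos)
print(p_a_given_b, float(p_a_given_b))  # 8/107, roughly 7.5%
```

Even with a positive result, the chance of actually having the condition stays low because false positives outnumber true positives.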
Birthday statistics formula
Chance that at least two of n people share the same one of r equally likely random choices (r = 365 for birthdays)
P(no match) = r! / (r^n (r − n)!)
P(at least one match) = 1 − P(no match)
The closer n comes to r, the closer the probability of a match comes to 100%; once n is greater than r, a match is certain
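A sketch of the calculation for the classic case of 23 people and 365 possible birthdays. The function name is hypothetical; math.perm(r, n) is used because it equals r! / (r − n)!.

```python
import math

def probability_of_match(n: int, r: int) -> float:
    """Chance that at least two of n people share one of r equally likely choices."""
    if n > r:
        return 1.0  # more people than choices, so a match is certain
    p_no_match = math.perm(r, n) / r ** n  # r! / (r^n (r - n)!)
    return 1 - p_no_match

print(round(probability_of_match(23, 365), 3))  # about 0.507
```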
Confidence levels (share of the total area under the curve) and their Z values for a standard normal distribution
Confidence level    Z
68%                 1.0
80%                 1.282
85%                 1.440
90%                 1.645
95%                 1.960
99%                 2.576
99.5%               2.807
99.9%               3.291
How to calculate our Confidence Interval if we know our chosen Z value (the ± part is the Margin of Error).
use that Z value in this formula for the Confidence Interval
X ± Z * (s/√n)
Where:
• X is the mean
• Z is the chosen Z-value from the table in previous card
• s is the standard deviation
• n is the number of observations
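A sketch of the calculation for a 95% confidence level (Z = 1.960). The sample mean, standard deviation and number of observations are invented for illustration.

```python
import math

mean = 175.0  # X: sample mean (hypothetical)
z = 1.960     # Z value for a 95% confidence level
s = 20.0      # s: standard deviation (hypothetical)
n = 40        # n: number of observations (hypothetical)

margin = z * s / math.sqrt(n)  # the "± Z * (s/√n)" part
lower, upper = mean - margin, mean + margin

print(f"{mean} ± {margin:.2f}, i.e. from {lower:.2f} to {upper:.2f}")
```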
What is the formula for the confidence interval
The Confidence Interval is based on Mean and Standard Deviation. Its formula is:
X ± Z × (s/√n)
Where:
X is the mean
Z is the Z-value from the table below
s is the standard deviation
n is the number of observations
formula for Chi-Square:
Χ^2 = Σ { (O − E)^2 / E }
Σ means to sum up (see Sigma Notation)
O = each Observed (actual) value
E = each Expected value
So we calculate (O − E)^2 / E for each pair of observed and expected values then sum them all up.
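A minimal sketch of the sum, using two made-up categories with observed counts 20 and 30 where 25 were expected in each.

```python
observed = [20, 30]  # O: hypothetical observed counts
expected = [25, 25]  # E: hypothetical expected counts

# (O - E)^2 / E for each pair, then sum them all up
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # 2.0
```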
What is the Chi square test
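The chi square test gives us ‘p’: the probability of seeing differences at least this large if the variables really were independent.
p < 0.05 is conventionally taken as evidence that the variables are dependent!
p > 0.05 means there is not enough evidence to reject independence!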
Important points before we get started:
• This test only works for discrete categorical data (data in categories), such as Gender {Men, Women} or color {Red, Yellow, Green, Blue} etc, but not numerical data such as height or weight.
• The numbers must be large enough. Each entry must be 5 or more. In our example we have values such as 209, 282, etc, so we are good to go.
Formula for linear regression or line of Best fit
Calculate Slope m:
m = (N * Σ(xy) − Σx Σy) / (N * Σ(x^2) − (Σx)^2)
(N is the number of points.)
Calculate Intercept b:
b = (Σy − m * Σx) / N
Assemble the equation of a line
y = mx + b
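The slope and intercept formulas translate directly into code. This is a sketch on five made-up points; the names m, b and N follow the card.

```python
x = [2, 3, 5, 7, 9]    # hypothetical data points
y = [4, 5, 7, 10, 15]

N = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)  # slope
b = (sum_y - m * sum_x) / N                                   # intercept

print(f"y = {m:.3f}x + {b:.3f}")  # y = 1.518x + 0.305
```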
How does linear regression work?
It works by making the total of the square of the errors as small as possible (that is why it is called “least squares”):
The straight line minimizes the sum of squared errors
So, when we square each of those errors and add them all up, the total is as small as possible.
You can imagine (but not accurately) each data point connected to a straight bar by springs:
Be careful! Least squares is sensitive to outliers. A strange value will pull the line towards it.
Grouped Frequency Distribution definition
It is also possible to group the values of collected data into ranges (class intervals).
It is very useful when the scores have many different values, or when we want to go from continuous to discrete data.
How to determine Group Size or class intervals for Grouped Frequencies
Now calculate an approximate group size, by dividing the range by how many groups you would like.
Then round that group size up to some simple value (like 2 instead of 1.83 or 5 instead of 4.26).
Estimating the Mean from Grouped Data
Continuous to Discrete
We can estimate the Mean by using the midpoints. EG: The groups (51-55, 56-60, etc), also called class intervals, are of width 5. The midpoints are in the middle of each class: 53, 58, 63 and 68
Midpoint = (lowest value in the class + highest value in the class) / 2
Estimated Mean = Sum of (Midpoint × Frequency) / Sum of Frequency
Multiply each midpoint by the number of data points in its group and add up the results
Then divide that sum by N, the total number of data points
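A sketch using the class intervals from the card (51-55, 56-60, 61-65, 66-70). The frequencies are invented for illustration.

```python
groups = [(51, 55), (56, 60), (61, 65), (66, 70)]  # class intervals from the card
frequencies = [2, 7, 8, 4]                         # hypothetical counts per group

midpoints = [(low + high) / 2 for low, high in groups]
total = sum(frequencies)  # N, all data points

estimated_mean = sum(mid * f for mid, f in zip(midpoints, frequencies)) / total
print(midpoints, round(estimated_mean, 2))  # [53.0, 58.0, 63.0, 68.0] 61.33
```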
Grouped Data Frequency median
Median is value at n/2.
Estimated Median = L + [(n/2 − B) / G ] × w
where:
L is the lower class boundary of the group containing the median (This is tricky: it is the point from which values round up into the class, for example 50.5 for the class 51-55 when the data is rounded to whole numbers)
n is the total number of values
B is the cumulative frequency of the groups before the median group
G is the frequency of the median group
w is the group width or class interval
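A sketch of the formula with invented numbers: 21 values in total, a median group of 61-65 (lower class boundary 60.5, assuming whole-number data), 9 values in the groups before it, 8 values in it, and a group width of 5.

```python
def estimated_median(L: float, n: int, B: int, G: int, w: float) -> float:
    """Estimated Median = L + [(n/2 - B) / G] * w"""
    return L + ((n / 2 - B) / G) * w

print(estimated_median(L=60.5, n=21, B=9, G=8, w=5))  # 61.4375
```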
Estimating the Mode from Grouped Data
We can easily find the modal group (the group with the highest frequency),
Estimated Mode = L + [ (f_m − f_(m−1)) / ( (f_m − f_(m−1)) + (f_m − f_(m+1)) ) ] × w
or, with shorter names: L + [ (B − A) / ( (B − A) + (B − C) ) ] × w
where:
L is the lower class boundary of the modal group (This is tricky: it is the point from which values round up into the class, for example 50.5 for the class 51-55 when the data is rounded to whole numbers)
A (or f_(m−1)) is the frequency of the group before the modal group
B (or f_m) is the frequency of the modal group
C (or f_(m+1)) is the frequency of the group after the modal group
w is the group width
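A sketch of the formula with invented numbers: a modal group of 61-65 (lower class boundary 60.5, assuming whole-number data) with frequency 8, neighbours with frequencies 7 and 4, and a group width of 5. The parameter names are hypothetical.

```python
def estimated_mode(L: float, f_before: int, f_modal: int, f_after: int, w: float) -> float:
    """L + (B - A) / ((B - A) + (B - C)) * w, with A, B, C as on the card."""
    rise = f_modal - f_before  # B - A
    fall = f_modal - f_after   # B - C
    return L + rise / (rise + fall) * w

print(estimated_mode(L=60.5, f_before=7, f_modal=8, f_after=4, w=5))  # 61.5
```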
What is a Random Variable
It is a value we give to observed events that we are uncertain of beforehand
- A Random Variable is a set of possible values from a random experiment.
- The set of possible values is called the Sample Space.
- A Random Variable is given a capital letter, such as X or Z.
- Random Variables can be discrete or continuous.
EG:
We have an experiment (such as tossing a coin)
We give values to each event
The set of values is a Random Variable
Probability is defined as
How likely an event is to occur
Probability of an event happening = Number of ways it can happen / Total number of outcomes
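P(X = x) = (number of outcomes where X takes the value x) / n, where n is the total number of equally likely outcomes
P(X = value) = probability of that value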
(Notice the different uses of X and x:
X is the Random Variable “The sum of the scores on the two dice”.
x is a value that X can take.)
in Probability, An Experiment is a
Experiment: a repeatable procedure with a set of possible results.
in Probability, An Outcome is a
Outcome: A possible result of an experiment.
In probability, A Sample Space is a
And What are Sample Points?
Sample Space: all the possible outcomes of an experiment.
The Sample Space is made up of Sample Points:
Sample Point: just one of the possible outcomes
In Probability, an Event is
Event: one or more outcomes of an experiment
In probability
Analog or Continuous is
Random Variables can be either Discrete or Continuous:
Continuous Data can take any value within a range (such as a person’s height)
Digital or Discrete in Probability means
Random Variables can be either Discrete or Continuous:
Discrete Data can only take certain values (such as 1,2,3,4,5)
Mean or Expected Value: μ
When we know the probability p of every value x we can calculate the Expected Value (Mean) of X:
μ = Σxp
Note: Σ is Sigma Notation, and means to sum up.
To calculate the Expected Value:
- multiply each value by its probability
- sum them up
If all n values are equally likely (p = 1/n for each) then
μ = Σxp = (1/n) Σx
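As an illustration, here is the Expected Value of X = "the sum of the scores on two dice", assuming two fair six-sided dice. The probabilities are built by counting outcomes, then μ = Σxp is applied.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each sum.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
total = sum(counts.values())

# P(X = x) for each possible sum x
probabilities = {x: Fraction(c, total) for x, c in counts.items()}

# Multiply each value by its probability and sum them up.
mu = sum(x * p for x, p in probabilities.items())
print(mu)  # 7
```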
Variance: Var(X)
The Variance is:
Var(X) = Σx^2p − μ^2
To calculate the Variance:
square each value and multiply by its probability
sum them up and we get Σx^2p
then subtract the square of the Expected Value μ^2
Standard Deviation: σ
Standard Deviation is the square root of the Variance:
σ = √Var(X)
Var (X) = Σx^2p − μ^2
Biased choices or weighted combinations without repetition
N choose weighted r
But what if the coins are biased (land more on one side than another) or the choices are not 50/50?
Probability of each particular sequence with k of the choices we want = p^k (1-p)^(n-k)
Where
p is the probability of each choice we want
k is the number of choices we want
n is the total number of choices
p = 0.7 (chance of chicken)
k = 2 (chicken choices)
n = 3 (total choices)
So we get:
= p^k (1-p)^(n-k) = 0.7^2 (1-0.7)^(3-2) = 0.49 × 0.3 = 0.147
The General Binomial Probability Formula
Now we know how to calculate how many:
n!/(k!(n-k)!)
And the probability of each:
p^k (1-p)^(n-k)
When multiplied together we get:
Probability of k out of n ways:
P(k out of n) = n!/(k!(n-k)!) * p^k (1-p)^(n-k)
Important Notes:
The trials are independent,
There are only two possible outcomes at each trial,
The probability of “success” at each trial is constant.
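A sketch of the general formula, reusing the numbers from the biased-choice card (p = 0.7, k = 2, n = 3). The function name is hypothetical.

```python
import math

def binomial_probability(k: int, n: int, p: float) -> float:
    """P(k out of n) = n!/(k!(n-k)!) * p^k * (1-p)^(n-k)"""
    ways = math.comb(n, k)                # how many sequences have k successes
    each = p ** k * (1 - p) ** (n - k)    # probability of each such sequence
    return ways * each

print(round(binomial_probability(2, 3, 0.7), 3))  # 0.441
```

Here there are 3 ways to get 2 out of 3, each with probability 0.147, giving 3 × 0.147 = 0.441.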