Lec 3 Flashcards
Sample space
Event
Probability
Complement
Sample space: set of all possible outcomes of an experiment
Event: any set of outcomes (from sample space) of interest (Eg: a red card, card of diamonds)
Probability (of an event): the relative frequency of the set of outcomes (comprising the event) over an indefinitely large (infinite) # of repetitions of the experiment (trials)
Eg: In the long run, what portion of the time will you expect a diamond? 0.25
Complement: of an event A, the set of outcomes in the sample space that are not in the event A (Aka A’, Ā, Ac)
Union of two events
Union of more than 2 events
Intersection of 2 events
Intersection of more than 2 events
Mutually exclusive events
Union of two events: it means A or B (shown as A U B)
Union of more than 2 events: A1 or A2 or A3 … or Ak
Intersection of 2 events: it means A and B, or the outcomes that belong to both A and B
(shown as A ∩ B, AB, A and B)
Intersection of more than 2 events: A1 and A2 and A3 … and Ak
Mutually exclusive events: A and B cannot occur at the same time (so P(AB) = 0)
Exhaustive
Partition
Independence
Independence formula
Exhaustive: at least one of them must occur (Eg when you roll a die, one of {1,2,3,4,5,6} must occur)
Partition: a set of mutually exclusive events that together cover the whole sample space (image)
Independence: knowing one event happened doesn’t change the probability of the other event
Independence formula: P(AB) = P(A)P(B)
Formula for A U B
Formula for A U B, if A and B are mutually exclusive
Are mutually exclusive events independent?
What happens when we sum up events that are a partition of the sample space?
Formula for complementary
Formula for A U B: P(A U B) = P(A) + P(B) – P(AB)
Formula for A U B, if A and B are mutually exclusive: P(AUB) = P(A) + P(B)
Are mutually exclusive events independent? No: if A occurs, B cannot occur, so knowing A changes the probability of B
What happens when we sum up events that are a partition of the sample space?
P(S) = P(A1) + P(A2) + … + P(Ak) = 1
Formula for complementary: P(A’) = 1 – P(A)
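The union and complement formulas above can be checked directly on the card example, treating a standard 52-card deck as the sample space (a quick Python sketch; the suit/rank encoding is just one convenient choice):

```python
from fractions import Fraction

# Sample space: a standard 52-card deck.
suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = list(range(1, 14))  # 1 = ace ... 13 = king
deck = [(r, s) for r in ranks for s in suits]

# Events are subsets of the sample space.
diamonds = {c for c in deck if c[1] == "diamonds"}
aces = {c for c in deck if c[0] == 1}

def p(event):
    """Probability under equally likely outcomes: |event| / |sample space|."""
    return Fraction(len(event), len(deck))

# P(A U B) = P(A) + P(B) - P(AB)
p_union = p(diamonds | aces)
assert p_union == p(diamonds) + p(aces) - p(diamonds & aces)

# Complement: P(A') = 1 - P(A)
assert p(set(deck) - diamonds) == 1 - p(diamonds)
```

Using exact fractions keeps the identities exact (P(diamonds) comes out as 1/4, matching the "long run" card above).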
Formulas:
The conditional probability of B given A
The conditional probability of B given A if A and B are independent
The conditional probability of A given B
General formula for P(AB) or 2 ways to express P(AB)
Formula to infer independence
Formulas:
The conditional probability of B given A: P(B|A) = P(AB)/P(A)
The conditional probability of B given A if A and B are independent: P(B|A) = P(A)P(B)/P(A) = P(B)
The conditional probability of A given B: P(A|B) = P(AB)/P(B)
General formula for P(AB) or 2 ways to express P(AB): P(AB) = P(A) P(B|A) = P(A|B) P(B)
Independence: P(AB) = P(A)xP(B)
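A minimal numeric check of these identities, using made-up probabilities chosen so that A and B happen to be independent:

```python
# Conditional probability identities on hypothetical numbers:
# P(B|A) = P(AB)/P(A), P(A|B) = P(AB)/P(B), and P(AB) = P(A)P(B|A) = P(A|B)P(B).
p_A, p_B, p_AB = 0.5, 0.25, 0.125

p_B_given_A = p_AB / p_A
p_A_given_B = p_AB / p_B

# Both ways of expressing P(AB) agree.
assert abs(p_A * p_B_given_A - p_AB) < 1e-12
assert abs(p_A_given_B * p_B - p_AB) < 1e-12

# Here P(AB) = P(A)P(B) (0.125 = 0.5 * 0.25), so A and B are
# independent and conditioning on A leaves P(B) unchanged.
assert p_B_given_A == p_B
```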
Bayes Theorem
How many formulas
Sensitivity
Specificity
High sensitivity
High specificity
SPIN
SNOUT
Bayes’ theorem: helps us determine P(A|B) when we know P(B|A), P(A), and P(B)
Bayes’ theorem formula (there’s 3 forms): P(A|B) = P(AB)/P(B) = P(B|A)P(A)/P(B); expanding the denominator with the law of total probability gives P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A’)P(A’)]
Sensitivity: if you have the disease, what is the prob of +ve test
Specificity: if you do not have the disease, what is the prob of -ve test
high sensitivity: if you have disease, you have high prob of a +ve result
If you have a -ve result on a test with HIGH sensitivity, there’s a high chance you DON’T have the disease (good for ruling out)
High specificity: if you do not have the disease, high chance of a -ve result
So, a +ve result on a highly specific test strongly suggests you have the disease (good for ruling in)
(NOTE: In Epi, we want to know, given the test result, do we have covid/disease? We can determine this with SPIN and SNOUT)
SPIN: SPecific test, Positive result rules IN
SNOUT: SeNsitive test, Negative result rules OUT
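Bayes’ theorem ties sensitivity, specificity, and prevalence together. A sketch with hypothetical test numbers (95% sensitivity, 90% specificity, 1% prevalence):

```python
# Bayes' theorem for a diagnostic test (all numbers hypothetical).
sensitivity = 0.95   # P(+ | disease)
specificity = 0.90   # P(- | no disease)
prevalence = 0.01    # P(disease)

# P(+) by the law of total probability:
# P(+) = P(+|disease)P(disease) + P(+|no disease)P(no disease)
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# P(disease | +) = P(+ | disease) P(disease) / P(+)
ppv = sensitivity * prevalence / p_pos

# With low prevalence, most positives are false positives,
# so P(disease | +) stays under 10% despite the "good" test.
assert ppv < 0.10
```

This is the Epi point behind SPIN/SNOUT: the test’s sensitivity/specificity alone do not answer “given the result, do I have the disease?” without the prevalence.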
Random variable
Capital letters (X,Y)
Small letters (x,y)
Discrete random variable
Examples
Continuous random variable
Examples
Random variable: it assigns a real number to a point in the sample space
Capital letters (X,Y) = random variables
Small letters (x,y) = actual values
Discrete random variable: takes a countable set of possible values
Eg: # of heads from 4 coin tosses, # of particles from a radioactive source in 1min, # hospital admissions
Continuous random variable: not countable, usually measured (eg height, weight)
Probability mass function
Formula
Variable type
2 conditions for pmf
E
Probability mass function (pmf)
In a sample space, there is a discrete random variable “X”. The pmf gives the probability that X is EQUAL to a value “x”
(Denoted by small letter f(x))
Formula: f(x) = P(X = x)
Variable type: Only for discrete variables
2 conditions of pmf: pmf (a probability) is a value b/w 0 and 1; summation of all probabilities in the sample space = 1
The cumulative distribution function (CDF)
Formula
Cumulative distribution function: the probability that a random variable “X” has a value LESS THAN or EQUAL to “x”
Denoted by capital letter F(x)
Formula: F(x) = P (X ≤ x)
Variable type: discrete and continuous random variables
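Both pmf conditions and the pmf→CDF relationship can be verified on a fair six-sided die, using exact fractions to avoid float noise:

```python
from fractions import Fraction

# pmf of a fair die: f(x) = P(X = x) = 1/6 for x in 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition 1: every probability is between 0 and 1.
assert all(0 <= p <= 1 for p in pmf.values())
# Condition 2: the probabilities sum to 1 over the sample space.
assert sum(pmf.values()) == 1

# The CDF F(x) = P(X <= x) accumulates the pmf up to x.
def cdf(x):
    return sum(p for v, p in pmf.items() if v <= x)

assert cdf(6) == 1   # F at the largest value is always 1
```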
How to use pmf to find the expected value or mean for random variable X
How to use pmf to find the VARIANCE for random variable X (2 formulas)
Use pmf to find the expected value or mean for random variable X
(Eg: What is the expected value of a roll of a fair die? µ = E(X) = (1)1/6 + (2)1/6 + (3)1/6 + (4)1/6 + (5)1/6 + (6)1/6 = 3.5)
It’s the avg of #s 1 to 6
Use pmf to find the VARIANCE for random variable X (2 formulas): σ² = E[(X − µ)²] = Σ (x − µ)² f(x), or the shortcut σ² = E(X²) − µ²
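Both the mean and the two variance formulas can be checked on the fair-die pmf from the example above:

```python
from fractions import Fraction

# Mean and variance of a fair die roll, straight from the pmf f(x) = 1/6.
outcomes = range(1, 7)
f = Fraction(1, 6)

mean = sum(x * f for x in outcomes)                        # E(X)
var_def = sum((x - mean) ** 2 * f for x in outcomes)       # E[(X - mu)^2]
var_short = sum(x ** 2 * f for x in outcomes) - mean ** 2  # E[X^2] - mu^2

# The definition and the shortcut formula agree exactly.
assert var_def == var_short
```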
Factorial
Combination
formula
Factorial: n! = n × (n − 1) × … × 2 × 1 (with 0! = 1)
Combination: the number of ways to choose “x” items from a set of n items: C(n, x) = n! / (x!(n − x)!)
(eg how many ways to choose 12 jurors from a set or pool of 20 people)
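The jury example evaluates directly with Python’s built-in `math.comb`:

```python
import math

# Ways to choose 12 jurors from a pool of 20: C(20, 12) = 20! / (12! * 8!)
n_ways = math.comb(20, 12)

# Same answer from the factorial formula.
assert n_ways == math.factorial(20) // (math.factorial(12) * math.factorial(8))
```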
Define binomial distribution
Variable type
Notation and explanation for binomial distribution
Formula
pmf for binomial distribution formula
The pattern on graphs - if p increases, what happens to the distribution
mean of binomial distribution
variance of binomial distribution
Binomial distribution: the distribution of the number of “SUCCESS” outcomes in an experiment that is repeated for “n” independent trials, each with the same probability of success
Variables: discrete
Notation: X~ Bin(n,p)
The random variable “X” follows a binomial distribution with “n” experimental trials and “p” probability of success on each trial
pmf for binomial distribution formula: f(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, …, n (image)
Graphs: As probability goes up (p = 0.5 -> 0.75), it becomes more skewed to the left (the tail is on the left side)
mean of binomial distribution: E(X) = µ = np
variance of binomial distribution: σ² = np (1 – p)
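A sketch checking E(X) = np and Var(X) = np(1 − p) against the binomial pmf, with assumed values n = 10, p = 0.3:

```python
import math

# Binomial pmf: f(x) = C(n, x) p^x (1 - p)^(n - x)
n, p = 10, 0.3  # hypothetical example values

def binom_pmf(x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# The pmf sums to 1 over x = 0..n.
assert abs(sum(binom_pmf(x) for x in range(n + 1)) - 1) < 1e-9

mean = sum(x * binom_pmf(x) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x) for x in range(n + 1))

assert abs(mean - n * p) < 1e-9           # E(X) = np = 3
assert abs(var - n * p * (1 - p)) < 1e-9  # Var(X) = np(1 - p) = 2.1
```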
Define poisson distribution
Variable type
pmf for random variable X formula
mean for poisson distribution
variance for poisson distribution
Poisson distribution: the probability of a given number of events occurring in a fixed interval of time (or space), when events occur at a constant average rate λ
Variables: discrete
pmf for random variable X formula: f(x) = P(X = x) = λ^x e^(−λ) / x!, for x = 0, 1, 2, …
e ≈ 2.72
mean for poisson distribution: E(X) = µ = λ
variance for poisson distribution: σ² = λ
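The Poisson’s defining property (mean = variance = λ) can be checked numerically from the pmf; λ = 4 is an arbitrary example rate:

```python
import math

# Poisson pmf: f(x) = lambda^x e^(-lambda) / x!
lam = 4.0  # hypothetical rate

def pois_pmf(x):
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Sum far enough out that the tail is negligible.
xs = range(100)
total = sum(pois_pmf(x) for x in xs)
mean = sum(x * pois_pmf(x) for x in xs)
var = sum((x - mean) ** 2 * pois_pmf(x) for x in xs)

assert abs(total - 1) < 1e-9   # probabilities sum to 1
assert abs(mean - lam) < 1e-6  # E(X) = lambda
assert abs(var - lam) < 1e-6   # Var(X) = lambda
```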
Probability density function (pdf)
variable type:
pdf formula: what it does
Probability density function (pdf): the area under the curve b/w 2 values (eg a and b) on the horizontal axis is equal to the probability that “X” (the random variable) is b/w those 2 values
Variable type: continuous
pdf formula: integrate the pdf between a and b to compute the area under the curve, i.e. P(a ≤ X ≤ b)
mean for pdf: µ = E(X) = ∫ x f(x) dx
variance for pdf: σ² = ∫ (x − µ)² f(x) dx
Explain X ~ N (µ, σ2)
variable type for normal distribution
Notation for Standard normal distribution
Process to transform NORMAL distribution to STANDARD NORMAL distribution
X ~ N (µ, σ2): “X” (random variable) is located in a normal or Gaussian distribution that has a mean µ and variance σ2
variable type: continuous
pdf for normal distribution formula: f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))
Standard normal distribution: Z ~ N(0, 1)
pdf for standard normal distribution formula: f(z) = (1 / √(2π)) e^(−z² / 2)
Process to transform NORMAL distribution to STANDARD NORMAL distribution: subtract the mean, then divide by the standard deviation: Z = (X − µ) / σ (image)
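The standardization step Z = (X − µ)/σ as code, with a hypothetical IQ-style scale (µ = 100, σ = 15); Φ(z) comes from `math.erf`:

```python
import math

# Standardize X ~ N(mu, sigma^2) to Z ~ N(0, 1): Z = (X - mu) / sigma.
mu, sigma = 100.0, 15.0  # hypothetical mean and SD
x = 130.0
z = (x - mu) / sigma     # 130 is two SDs above the mean

# P(Z <= z) from the standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
p = 0.5 * (1 + math.erf(z / math.sqrt(2)))
assert abs(p - 0.97725) < 1e-4  # the familiar ~97.7% below +2 SD
```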
For W, the mean and variance after a Location shift ONLY
For W, the mean and variance after a Scale shift ONLY
For W, the mean and variance after a Scale AND location shift
Transformations:
Random variable = X
mean = µ
variance = σ²
constants = a, b
For W, the mean and variance after a Location shift ONLY
New variable W = X + b
mean: µ + b
variance: σ²
For W, the mean and variance after a Scale shift ONLY
New variable W = aX
mean: aµ
variance: a²σ²
For W, the mean and variance after a Scale AND location shift
New variable W = aX + b
mean: aµ + b
variance: a²σ²
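All three transformation cards can be verified at once on the fair-die pmf, with arbitrary constants a = 2, b = 5:

```python
from fractions import Fraction

# Scale-and-location shift W = aX + b on a fair die:
# check mean = a*mu + b and variance = a^2 * sigma^2.
outcomes = range(1, 7)
f = Fraction(1, 6)
mu = sum(x * f for x in outcomes)               # 7/2
var = sum((x - mu) ** 2 * f for x in outcomes)  # 35/12

a, b = 2, 5  # arbitrary constants
w_mean = sum((a * x + b) * f for x in outcomes)
w_var = sum((a * x + b - w_mean) ** 2 * f for x in outcomes)

assert w_mean == a * mu + b   # location shift moves the mean
assert w_var == a ** 2 * var  # but only the scale affects the variance
```

Setting a = 1 or b = 0 recovers the location-only and scale-only cases.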
Covariance
Variance
Notation for Covariance
If X and Y are independent, then Cov (X,Y) =?
Is the inverse true? Why?
Covariance: measures the relationship between X and Y (2 random variables), specifically how much they vary together
Variance: measures the spread of a data set around its mean
Notation: Cov(X, Y) = E[(X - µx)(Y - µY)]
Denoted as an expectation (E)
X and Y are random variables with means µx and µY respectively
If X and Y are independent, then Cov (X,Y) = 0
IMPORTANT: Cov (X,Y) = 0 does not imply X and Y are independent
IOW: inverse is not true
Covariance = how the two vary together (scale-dependent)
Correlation = covariance standardized to lie between −1 and 1 (strength of the linear relationship)
Covariance alone does not tell you the correlation (it must be standardized first)
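The “inverse is not true” card has a classic counterexample: take X uniform on {−1, 0, 1} and Y = X². Y is completely determined by X, yet Cov(X, Y) = 0:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2: dependent, but zero covariance.
xs = [-1, 0, 1]
f = Fraction(1, 3)

mu_x = sum(x * f for x in xs)       # E(X) = 0
mu_y = sum(x ** 2 * f for x in xs)  # E(Y) = 2/3

# Cov(X, Y) = E[(X - mu_x)(Y - mu_y)]
cov = sum((x - mu_x) * (x ** 2 - mu_y) * f for x in xs)
assert cov == 0

# Yet X and Y are clearly dependent: knowing X = 0 forces Y = 0.
```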
the mean and variance when we sum RANDOM variables
the variance when we sum RANDOM variables that are independent
the mean and variance when we sum NORMAL, BINOMIAL, POISSON variables that are independent
Sum of random variables
Random variable X: mean = µx, variance = σx²
Random variable Y: mean = µy, variance = σy²
Let W = X + Y
Now, W has
mean = µx + µy
variance = σx² + σy² + 2Cov(X, Y)
If X and Y are independent
variance of W = σx² + σy²
Special Case – Normal variables
Given
X ~ N(µx, σx²)
Y ~ N(µy, σy²)
X and Y are independent
Let W = X + Y
Then W ~ N(µx + µy, σx² + σy²)
Sums of independent binomials (with the same p) are also binomial, and sums of independent Poissons are also Poisson
IOW: normal + normal = normal
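The additivity of means and variances under independence can be checked exactly with two independent fair dice (Cov = 0, so the covariance term drops out):

```python
from fractions import Fraction

# W = X + Y for two independent fair dice: means and variances add.
outcomes = range(1, 7)
f = Fraction(1, 36)  # joint probability of each (x, y) pair under independence

mean_w = sum((x + y) * f for x in outcomes for y in outcomes)
var_w = sum((x + y - mean_w) ** 2 * f for x in outcomes for y in outcomes)

mu = Fraction(7, 2)     # mean of one die
var = Fraction(35, 12)  # variance of one die
assert mean_w == mu + mu  # E(W) = mu_x + mu_y
assert var_w == var + var  # Var(W) = var_x + var_y (Cov = 0)
```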
2 types of discrete probability distributions
2 types of continuous prob distributions
probability distribution for discrete and continuous
2 types of discrete probability distributions: binomial or Poisson; look at the probability mass function (pmf)
2 types of continuous prob distributions: normal/Gaussian distribution, standardized normal distribution; look at the probability density function (pdf)
probability distribution for both discrete and continuous: cumulative distribution function (CDF)
- 2 types of sampling: nonprob and prob
o Nonprobability sampling:
Involves convenience sampling and voluntary sampling
Both are susceptible to selection bias
o Prob sampling: allows us to get a sample that is representative, and the results produce valid inferences
It uses random sampling techniques:
- Simple random sampling
- Systematic sampling
- Stratified random sampling
- Cluster sampling
- Simple random sample
o Each subject in the pop has an equal chance of being selected
- Systematic sample
o Subjects from the pop are selected according to a random starting pt, and then at every fixed interval
o The interval is determined by dividing the size of the pop by the desired size of the sample
- Stratified sample
o A simple random sample is taken from each of a # of distinct strata of the pop
- Cluster sampling
o Used when natural gps exist in the pop
o The pop is divided into clusters, and a simple random selection is taken from each cluster
- Eqn for z statistic: z = (x̄ − µ) / (σ/√n)
o Assumes the value of the pop standard deviation σ is known
- When the # of trials n is large and p (prob of success for each trial) is near 0.5
o The binomial distribution is approx. equal to the normal distribution
- Prob distributions using sample stats are sampling distributions
- Sampling distribution: prob distribution of a stat for all possible samples of a given size from a pop
- Central limit theorem: the distribution of sample means approx. follows a normal distribution (for large samples, regardless of the pop distribution)
- The t-distribution also resembles the normal distribution
o Variability in the sampling distribution of t depends on the sample size n
- 2 other cont prob distributions: chi-square, F distributions