Stats Exam 2 Flashcards
probability (chance)
The likelihood that something will occur. Probability is a mathematical description of randomness and uncertainty. It is a way to measure or quantify uncertainty. The probability of an event ranges from 0 to 1 (0 ≤ P(A) ≤ 1). Probability can be expressed in decimals or percentages.
2 approaches: classical and relative frequencies
classical (theoretical) approach
predictable events
(rolling dice)
equally likely, predictable outcomes
P= # of ways to succeed / # of possible outcomes
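A quick check of the classical formula, using two fair six-sided dice as an assumed example. It counts the ways to roll a sum of 7 and divides by all 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (die1, die2) pair for two fair six-sided dice (assumed example)
sample_space = list(product(range(1, 7), repeat=2))

# Ways to "succeed": pairs whose sum is 7
successes = [roll for roll in sample_space if sum(roll) == 7]

# Classical probability = # of ways to succeed / # of possible outcomes
p_seven = Fraction(len(successes), len(sample_space))
print(p_seven)  # 1/6
```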
relative frequencies (empirical)
outcomes NOT inherently predictable
1) run a bunch of trials
2) count the number of successes
P = successful trials / all trials
(empirically derived information)
P=0
no chance of occurring
P=1
absolute certainty
event
A specific outcome (or set of outcomes) from a trial, defined by a scenario or question.
simple events
events that cannot be broken down further
sample space
the collection of all possible outcomes
equal likelihood rule
when all outcomes are equally likely, the probability of event A is the number of ways A can happen divided by the number of outcomes in the sample space
Non-disjoint
a double negative term that just means events CAN happen together
Addition Rule for non-disjoint events
P(A or B) = P(A) + P(B) - P(A and B)
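A sketch of the rule on an assumed example, a standard 52-card deck. Hearts (A) and face cards (B) are non-disjoint because the J, Q, and K of hearts belong to both, so the overlap must be subtracted once. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Standard 52-card deck as (rank, suit) pairs; every card equally likely
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def prob(event):
    """Classical probability: cards satisfying event / all cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_heart(card):  # event A
    return card[1] == "hearts"


def is_face(card):  # event B
    return card[0] in ("J", "Q", "K")


p_a = prob(is_heart)                                    # 13/52
p_b = prob(is_face)                                     # 12/52
p_a_and_b = prob(lambda c: is_heart(c) and is_face(c))  # 3/52, the overlap

# Addition rule: subtract the overlap so it is not counted twice
p_or_rule = p_a + p_b - p_a_and_b

# Direct count of "heart OR face card" gives the same answer
p_or_direct = prob(lambda c: is_heart(c) or is_face(c))
print(p_or_rule, p_or_direct)  # 11/26 11/26
```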
disjoint (mutually exclusive)
events that CANNOT co-occur
OR vs AND
OR = ADD
AND = MULTIPLY
Addition Rule for disjoint outcomes
P(A or B) = P(A) + P(B)
Multiply independent probabilities when...
- two or more conditions must exist
- 2 or more outcomes must occur together or sequentially
- the events are independent (and therefore non-disjoint)
complementary probabilities
two mutually exclusive outcomes with a combined probability of 1
P(A) + P(not A)=1
P(at least one)
Use complementary probabilities:
1) The complement of "at least one" is "none": P(at least one) + P(none) = 1
2) Calculate P(none) in one step
3) Subtract P(none) from 1: P(at least one) = 1 - P(none)
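A worked example of the complement shortcut, assuming four rolls of a fair die and asking for P(at least one six). P(none) is one multiplication because the rolls are independent; a brute-force count over all 6^4 sequences confirms the result.

```python
from fractions import Fraction
from itertools import product

n_rolls = 4
p_no_six_one_roll = Fraction(5, 6)

# Step 2: P(none) in one step; rolls are independent, so multiply
p_none = p_no_six_one_roll ** n_rolls

# Step 3: P(at least one) = 1 - P(none)
p_at_least_one = 1 - p_none
print(p_at_least_one)  # 671/1296

# Brute-force check over all 6**4 equally likely sequences
outcomes = list(product(range(1, 7), repeat=n_rolls))
count = sum(1 for seq in outcomes if 6 in seq)
assert Fraction(count, len(outcomes)) == p_at_least_one
```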
conditional probability
the probability of an event or condition given that another influential event or condition already occurred
conditional probability notation
P(B|A)= probability of B given that A has occurred or among subjects characterized by A
Benefits of 2-way table in probability
1) easy to set up
2) clearly display the sample space, event, & simple events
3) conditional probabilities can be found with fewer calculations & no formulas
important notes regarding conditional probabilities
1) P(A|B) is the inverse of P(B|A).
2) P(A|B) is NOT the complement of P(B|A)
3) Complementary probabilities have the same sample space: P(A|B) and P(not A|B)
4 tests to identify conditional probabilities
All are false or all are true. If ANY are false, the probabilities are conditional.
1) P(B | A) = P(B)
2) P(A | B) = P(A)
3) P(B | A) = P(B | not A)
4) P(A and B) = P(A) * P(B)
Only one test is needed.
General Addition Rule for conditional probabilities
P(A or B) = P(A) + P(B) – P(A and B)
General multiplication rule for conditional probabilities
P(A and B) = P(A)*P(B|A)
conditional probabilities formula
P(B|A) = P(A and B)/ P(A)
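A sketch of the formula applied to a two-way table. The counts (and the smoker/cough labels) are made-up numbers for illustration only. The answer from the formula matches reading the A row of the table directly.

```python
from fractions import Fraction

# Hypothetical two-way table of counts (made-up numbers):
#                 B (cough)   not B
# A (smoker)          30        20
# not A               30       120
table = {
    ("A", "B"): 30, ("A", "not B"): 20,
    ("not A", "B"): 30, ("not A", "not B"): 120,
}
total = sum(table.values())  # 200

p_a = Fraction(table[("A", "B")] + table[("A", "not B")], total)  # P(A) = 50/200
p_a_and_b = Fraction(table[("A", "B")], total)                     # P(A and B) = 30/200

# Formula: P(B|A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 3/5

# Same answer read straight from the A row: 30 of the 50 A subjects have B
assert p_b_given_a == Fraction(30, 50)
```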
Law of total probability
P(B) = P(A) * P(B | A) + P(not A) * P(B | not A)
Bayes Theorem
1) A method for mathematically relating inverse conditional probabilities
2) Applies to sequential events that are not independent (are conditional) and assumes that one event has already happened.
relates P(A|B) to P(B|A)
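Combining the general multiplication rule with the law of total probability gives the usual written form of Bayes' theorem, in the same A/B notation used on these cards:

```latex
P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}
            = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\text{not } A)\,P(B \mid \text{not } A)}
```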
Sensitivity
Test accuracy when a condition is present. P(S|D)
Specificity
Test accuracy when a condition is absent.
P(not S|not D)
False Positive
+ test, but no disease/condition
P(S|not D)
False Negative
- test, but disease/condition exists
P(not S|D)
Positive predictive value (PPV)
The probability a disease/condition is present, given a positive test.
P(D|S)
Negative Predictive Value (NPV)
The probability a disease/condition is absent, given a negative test.
P(not D|not S)
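PPV and NPV are Bayes' theorem applied to screening. This sketch assumes the prevalence P(D), sensitivity, and specificity are known; the function name `predictive_values` and the example numbers are hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) via Bayes' theorem.

    prevalence  = P(D)
    sensitivity = P(S | D)
    specificity = P(not S | not D)
    """
    false_pos_rate = 1 - specificity  # P(S | not D)
    false_neg_rate = 1 - sensitivity  # P(not S | D)

    # Law of total probability: P(S) and P(not S)
    p_pos = prevalence * sensitivity + (1 - prevalence) * false_pos_rate
    p_neg = (1 - prevalence) * specificity + prevalence * false_neg_rate

    ppv = prevalence * sensitivity / p_pos        # P(D | S)
    npv = (1 - prevalence) * specificity / p_neg  # P(not D | not S)
    return ppv, npv


# Hypothetical test: 1% prevalence, 90% sensitivity, 95% specificity
ppv, npv = predictive_values(0.01, 0.90, 0.95)
print(round(ppv, 3), round(npv, 3))  # 0.154 0.999
```

Even with a fairly accurate test, a rare condition gives a low PPV, because most positive results come from the much larger group without the condition.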
random variable
A variable that acquires unique numerical values determined by random trials.
Random trials have outcomes that are determined by chance and influenced by probability.
2 types of random variables
discrete & continuous
discrete random variables
- countable and determined by chance (“X”)
- quantified with whole numbers
continuous random variable
- Measurable and determined by chance
- Measured on interval or ratio scales
P(X=x)
X = big X, a discrete random variable
x = little x, a specific value of X
binomial variables
a variable that counts the number of "successes" in a fixed number of trials, where each trial has only two possible outcomes:
e.g., cancer/no cancer prognosis
4 Criteria for binomial variables
1) A fixed number of trials exists (n)
2) Each trial is independent
3) The same two possible outcomes per trial
“success” (the outcome of interest)
“failure” (the complementary outcome)
4) The probability of success (p), or failure (1-p) is the same for every trial.
The language of binomial variables
“X is Binomial with n = … and p = …”
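The cards do not list it, but the standard binomial probability formula is P(X = x) = C(n, x) p^x (1 - p)^(n - x). A minimal sketch, assuming "X is Binomial with n = 10 and p = 0.5" (e.g., heads in 10 fair coin flips):

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) when X is Binomial with n trials and success probability p."""
    # number of orderings with x successes * probability of any one such ordering
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example: X is Binomial with n = 10 and p = 0.5
n, p = 10, 0.5
print(binomial_pmf(5, n, p))  # 0.24609375
```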
density curve/ probability density function
-Total area = 1
-Continuous data (Interval or Ratio)
-Probability of value ranges = Area
Area under the curve equals probability
Understanding the likelihood of events boils down to:
1) knowing how many standard deviations away from the mean the events are.
2) assigning the events to tail areas.
Z score
= how many standard deviations a value is from the population mean
A measure of how many standard deviations a given value, sample mean, or sample proportion lies from the mean; for a single value, z = (x − μ) / σ. Z scores are used to find probabilities only with normal distributions. A Z score divides the area under the normal curve: the area left of the Z score is the probability associated with the range of values less than the one used in the Z score. The complement of this area, the area right of the Z score, is the probability associated with the range of values greater than the one used in the Z score.
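A sketch of computing a Z score and its left and right tail areas. The population (mean 100, SD 15) is an assumed example; `statistics.NormalDist().cdf` plays the role of a normal table lookup.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical normal population with mean 100 and SD 15
mu, sigma = 100, 15
z = z_score(130, mu, sigma)  # 2.0

left_area = NormalDist().cdf(z)  # P(value < 130): area left of z
right_area = 1 - left_area       # complement: area right of z
print(z, round(left_area, 4), round(right_area, 4))  # 2.0 0.9772 0.0228
```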
Likelihood =
Area
Rare Event Rule
If, under a given assumption, the probability of a particular observed event is very small, we conclude that the assumption is probably not correct.
Normal as Approximation to Binomial
1) The binomial is often close enough to a normal curve, such that we can use Z-scores.
Specifically, when:
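2) Continuity correction: discrete values made “continuous-like” by extending each whole number 0.5 below and 0.5 above it (e.g., X ≤ 55 becomes X < 55.5).
3) Calculate Z scores using the mean and standard deviation of the binomial: μ = np, σ = √(np(1 − p)).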
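A sketch comparing the exact binomial probability with the normal approximation, assuming "X is Binomial with n = 100 and p = 0.5" and asking for P(X ≤ 55). The two results should agree closely for a case like this one.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # hypothetical: X is Binomial with n = 100 and p = 0.5

# Exact binomial P(X <= 55): sum the probabilities of 0, 1, ..., 55
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, 56))

# Mean and SD of the binomial
mu = n * p                     # 50
sigma = sqrt(n * p * (1 - p))  # 5

# Continuity correction: "X <= 55" becomes "X < 55.5"
z = (55.5 - mu) / sigma        # 1.1
approx = NormalDist().cdf(z)

print(round(exact, 4), round(approx, 4))
```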
Law of Large numbers
The actual (or true) probability of an event (A) is estimated by the relative frequency with which the event occurs in a long series of trials. As the number of trials increases, the relative frequency approaches the actual probability. Thus, as the number of trials increases, the empirical probability gets closer and closer to the theoretical probability.
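A small simulation of the law of large numbers, assuming a trial that succeeds with probability 0.5 (like a fair coin). The seed only makes runs reproducible; the exact printed frequencies depend on it, but they should drift toward 0.5 as the number of trials grows.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible


def relative_frequency(n_trials, p=0.5):
    """Proportion of successes in n_trials simulated trials (success prob p)."""
    successes = sum(1 for _ in range(n_trials) if random.random() < p)
    return successes / n_trials


for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```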
Random experiment
An experiment that produces an outcome that cannot be predicted in advance (hence the uncertainty).
Venn diagram
A visual display of events within a sample space, showing complementary, disjoint, and non-disjoint events
tests for independence
If any of the 4 tests below are true, the variables are independent. They will either all be true, or they will all be false. When the tests are false, the variables A and B are dependent on each other, and their probabilities are conditional.
1) P(B | A) = P(B)
2) P(A | B) = P(A)
3) P(B | A) = P(B | not A)
4) P(A and B) = P(A) * P(B)
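A sketch that runs all four tests on a 2x2 table of counts. Both tables are made-up numbers chosen so that one is independent and one is not; `Fraction` keeps the equality checks exact instead of relying on floating-point comparison.

```python
from fractions import Fraction


def independence_tests(a_b, a_notb, nota_b, nota_notb):
    """Run the four tests on a 2x2 table of counts; return a list of True/False."""
    total = a_b + a_notb + nota_b + nota_notb
    p_a = Fraction(a_b + a_notb, total)
    p_b = Fraction(a_b + nota_b, total)
    p_a_and_b = Fraction(a_b, total)
    p_b_given_a = Fraction(a_b, a_b + a_notb)
    p_a_given_b = Fraction(a_b, a_b + nota_b)
    p_b_given_not_a = Fraction(nota_b, nota_b + nota_notb)
    return [
        p_b_given_a == p_b,              # 1) P(B|A) = P(B)
        p_a_given_b == p_a,              # 2) P(A|B) = P(A)
        p_b_given_a == p_b_given_not_a,  # 3) P(B|A) = P(B|not A)
        p_a_and_b == p_a * p_b,          # 4) P(A and B) = P(A)*P(B)
    ]


# Hypothetical tables (made-up counts)
print(independence_tests(10, 40, 30, 120))  # [True, True, True, True] -> independent
print(independence_tests(30, 20, 30, 120))  # [False, False, False, False] -> dependent
```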
probability tree
A diagram for showing probabilities for events that occur in stages and that involve conditional probabilities.
probability distribution
A distribution of all possible values and probabilities of a random variable. It represents a population of values, not a sample. Probabilities range between 0 and 1 and must sum to 1.
probability histogram
A histogram with discrete events on the X-axis and probability on the Y-axis. Each rectangle has a width of 1, and the areas of all rectangles sum to 1.
mean of a random variable
Average of the values of the random variable, weighted by their probabilities of occurring: μ = Σ x·P(x)
variance
Squared standard deviation (σ²).
standard deviation of a random variable
The typical (or long-run average) distance between the mean of the random variable and the values it takes.
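A sketch of the mean, variance, and standard deviation of a discrete random variable, assuming X = number of heads in 2 fair coin flips. The variance uses the standard formula σ² = Σ (x − μ)²·P(x).

```python
from fractions import Fraction
from math import sqrt

# Hypothetical probability distribution: X = number of heads in 2 fair coin flips
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
assert sum(dist.values()) == 1  # a valid distribution sums to 1

# Mean: each value weighted by its probability
mu = sum(x * p for x, p in dist.items())

# Variance: probability-weighted average squared distance from the mean
variance = sum((x - mu) ** 2 * p for x, p in dist.items())

# Standard deviation: square root of the variance
sd = sqrt(variance)
print(mu, variance, round(sd, 4))  # 1 1/2 0.7071
```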
probability density curve
A smooth curve showing the relationship between a continuous random variable and probability. The area under the curve = 1. The curve can be subdivided into ranges of values with a probability equal to their area, but a single exact value has no area under the curve, so its probability is 0.
Normal curve
The distribution of a continuous random variable that has a single peak and is symmetrical about the center
normal table
a table of z scores and probabilities. The table indicates the probability of a normal variable taking on any value less than the standardized z score provided.
unstandardizing a z score
When given a probability and asked to find the value associated with it, we find the z score corresponding to that probability and solve the z score formula for x: x = μ + zσ.
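A sketch of unstandardizing, assuming a normal population with mean 100 and SD 15 and asking for the value with 90% of the area to its left. `NormalDist().inv_cdf` does the "reverse" normal-table lookup.

```python
from statistics import NormalDist

# Hypothetical normal population with mean 100 and SD 15
mu, sigma = 100, 15
target_prob = 0.90  # value with 90% of the area to its left

# Step 1: z score whose left-tail area equals the probability
z = NormalDist().inv_cdf(target_prob)

# Step 2: solve z = (x - mu) / sigma for x
x = mu + z * sigma
print(round(z, 4), round(x, 2))  # 1.2816 119.22
```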