Probability Flashcards
Probability density plot
Histogram in terms of relative frequency
Basic properties of probability
Probabilities are always bounded between 0 and 1: 0 ≤ P(A) ≤ 1
Graphical displays of probabilities
Venn diagram: Classical visual interpretation of probabilities
Probability/Decision trees
Conditional probability
Probability is affected by another condition
P(A|B) probability of A occurring, given B has occurred
P(A|B) = P(A∩B) / P(B)
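A minimal sketch of the formula above, assuming a single roll of a fair six-sided die. The events A (roll is even) and B (roll is greater than 3) are illustrative choices, not part of the original card.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an assumed example).
outcomes = range(1, 7)

def prob(event):
    """Probability of an event (a set of outcomes) when all 6 outcomes are equally likely."""
    return Fraction(len(event), 6)

A = {x for x in outcomes if x % 2 == 0}   # A: roll is even -> {2, 4, 6}
B = {x for x in outcomes if x > 3}        # B: roll is greater than 3 -> {4, 5, 6}

# P(A|B) = P(A∩B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 2/3
```

Knowing B occurred raises the probability of A from 1/2 to 2/3 in this example.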
Random variable
Variable whose values are generated according to some probability function
Types of random variables
Qualitative: Take distinct forms that are non-numeric (like a nominal variable)
Quantitative: Take distinct values that are numeric
Discrete quantitative: Can assume only a countable number of values (e.g., integers); ex: # children in household
Continuous quantitative: Can take any value within an interval, including non-integer values (uncountably many possibilities); ex: 185.42 cm tall
Discrete probability functions
Probability distribution for that variable shows probability of getting each of the distinct levels of that variable
Binomial distribution
Used to generate discrete random variables that consist of:
-n random trials
-each of n trials can have 1 of 2 possible outcomes (like flipping a coin)
-each trial is independent of one another
Variable is the number of successes in the n trials
Mean: nπ; variance: nπ(1-π)
where π is the probability of success
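A short sketch, assuming n = 10 flips of a fair coin (π = 0.5). The binomial probability function C(n, k) π^k (1-π)^(n-k) is the standard form and is not stated on the card. The code computes the mean and variance directly from the distribution so they can be compared with nπ and nπ(1-π).

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(exactly k successes in n independent trials, each with success probability pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

n, pi = 10, 0.5  # assumed example: 10 flips of a fair coin

pmf = [binomial_pmf(k, n, pi) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

# Compare with the closed forms n*pi and n*pi*(1 - pi).
print(mean, n * pi)
print(variance, n * pi * (1 - pi))
```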
Poisson distribution
Commonly used to model counts of events that occur in discrete periods of time or space. Ex: # of children a couple has between 1990 & 2000.
Assumptions of Poisson distribution:
-Events occur 1 at a time (& not at exact same time/place/to the same person)
-Each occurrence at a given time or place is independent
-Expected # of events at any time or place is the same at all times/places
Probability function:
P(y) = (μ^y * e^-μ) / y!
where e is base of natural logarithm ~2.71828
μ is average value of y
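The probability function above can be evaluated directly. This sketch assumes μ = 2 events per period, a value chosen only for illustration.

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """P(y) = (mu^y * e^(-mu)) / y! for a count y = 0, 1, 2, ..."""
    return mu**y * exp(-mu) / factorial(y)

mu = 2.0  # assumed average number of events per period

for y in range(6):
    print(y, round(poisson_pmf(y, mu), 4))
```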
Distribution functions for discrete data
f(x) = Pr(X = x), probability (mass) function for x
F(x) = Pr(X ≤ x), cumulative distribution function for x
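For a discrete variable, the cumulative function F(x) = Pr(X ≤ x) is the running sum of the point probabilities f(x) = Pr(X = x). A small sketch, assuming X is the number of heads in two fair coin flips:

```python
from itertools import accumulate

# Assumed example: X = number of heads in two fair coin flips.
values = [0, 1, 2]
f = [0.25, 0.5, 0.25]        # f(x) = Pr(X = x)
F = list(accumulate(f))      # F(x) = Pr(X <= x), the running total of f

for x, fx, Fx in zip(values, f, F):
    print(x, fx, Fx)
```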
Continuous probability distributions
Unlike discrete, continuous distributions are defined for (theoretically) an infinite variety of values along the number line.
Probability of observing an occurrence of X at a particular value of x is 0, so we talk about probabilities for continuous variables in terms of intervals
Pr(X = x) = 0
If X is a continuous random variable, and a & b are real constants, then we can calculate the probabilities:
P(X < a), P(X > a), or P(a < X < b), where the density satisfies f(x) ≥ 0 for -∞ < x < ∞
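Interval probabilities for a continuous variable come from differences of the cumulative distribution function. This sketch assumes X follows a normal distribution with mean 0 and standard deviation 1, and uses `statistics.NormalDist` from the Python standard library (3.8+). The constants a and b are arbitrary.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution for illustration
a, b = -1.0, 1.0                   # arbitrary real constants

p_below_a = X.cdf(a)               # P(X < a)
p_above_a = 1 - X.cdf(a)           # P(X > a)
p_between = X.cdf(b) - X.cdf(a)    # P(a < X < b)

print(round(p_below_a, 4), round(p_above_a, 4), round(p_between, 4))
```

Because Pr(X = x) = 0, it makes no difference whether the interval endpoints are included.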
Empirical cumulative distribution function
Cumulative distribution function for real data.
Fn(x) = (# of observations with xi ≤ x) / n
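The empirical cumulative distribution function is the fraction of observations at or below x. A minimal sketch; the height data are hypothetical.

```python
def ecdf(data, x):
    """Fraction of observations xi in data with xi <= x."""
    return sum(1 for xi in data if xi <= x) / len(data)

heights_cm = [160.0, 172.5, 168.0, 185.42, 177.0]  # hypothetical sample

print(ecdf(heights_cm, 172.5))  # 3 of 5 observations are <= 172.5
```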
Gaussian, or Normal, distribution
Most frequently used distribution.
Probability density:
f(y) = (1/(σ*SQRT(2π))) * e^(-(y-μ)^2 / (2σ^2))
where μ is the mean and σ is the standard deviation
The standard normal distribution has mean=0 and variance=1.
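A sketch of the normal density, written with the normalizing constant 1/(σ·SQRT(2π)). The default arguments (μ = 0, σ = 1) are the standard normal case. The crude numerical integration is only a sanity check that the density integrates to about 1.

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((y - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Peak height of the standard normal curve, at y = mu.
print(round(normal_pdf(0.0), 4))

# Riemann-sum check over [-8, 8] with step 0.001: the total area should be close to 1.
step = 0.001
area = sum(normal_pdf(-8 + i * step) * step for i in range(16000))
print(round(area, 4))
```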
z-score
Number of standard deviations that a given value of y lies away from the mean.
z = (y-μ)/σ
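A small worked example of the z-score, assuming a hypothetical population of heights with mean 170 cm and standard deviation 10 cm:

```python
def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma

# Hypothetical values: height 185.42 cm, mean 170 cm, standard deviation 10 cm.
z = z_score(185.42, 170.0, 10.0)
print(round(z, 3))  # about 1.54 standard deviations above the mean
```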
Normal Quantile-Quantile (Q-Q) plot
Normal quantiles are represented by a straight line & the data are the points. A subjective visual test to examine the fit of the line to the data. If the points fall close to the line, the data are treated as “normally distributed”.
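A sketch of how the points of a normal Q-Q plot can be computed without a plotting library. The data are hypothetical, and the plotting positions (i - 0.5)/n are one common convention, assumed here. Other conventions exist.

```python
from statistics import NormalDist, mean, stdev

data = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.5]  # hypothetical sample
n = len(data)

# Sample quantiles: the observations in increasing order.
sample_q = sorted(data)

# Theoretical quantiles from a normal distribution fitted to the sample,
# evaluated at the assumed plotting positions (i - 0.5) / n.
fitted = NormalDist(mean(data), stdev(data))
theoretical_q = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point on the Q-Q plot; points near the line y = x suggest a good fit.
pairs = list(zip(theoretical_q, sample_q))
for t, s in pairs:
    print(round(t, 2), s)
```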