Chapter 4 - Discrete Distributions Flashcards
Random variable
- random variable = function that assigns numeric values to different events in a sample space
- two types of random variables = discrete, continuous
Discrete random variable
a random variable whose possible values form a discrete set of numeric values: either finite or countably infinite (e.g., 0, 1, 2, ...)
Continuous random variable
a random variable whose possible values cannot be enumerated, because they fill an interval of real numbers (uncountably many values)
Probability-mass function
- the values taken by a discrete random variable and their associated probabilities can be expressed by a rule or relationship called a probability-mass function (pmf)
- assigns to any possible value r of a discrete random variable X the probability P(X = r)
- this assignment is made for all values r that have positive probability
Expression of a probability-mass function
- a pmf can be displayed in a tabular form, or it can be expressed as a mathematical formula giving the probabilities of all possible values
- the probability of any particular value must be between 0 and 1, and the sum of the probabilities of all values must be exactly equal to 1
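A minimal Python sketch of the two conditions above, assuming the pmf is stored in tabular form as a dict that maps each value r to P(X = r). The numbers and the helper name is_valid_pmf are illustrative assumptions, not taken from the chapter.

```python
import math

# Illustrative pmf table (made-up numbers): value r -> P(X = r)
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if every probability lies in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

print(is_valid_pmf(pmf))
```

The tolerance is there because floating-point sums are rarely exactly 1 even when the table is correct.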
Frequency distributions
- a list of each value in the data set and a corresponding count of how frequently the value occurs
- the frequency distribution can be considered as a sample analog to a probability distribution
- when each count is divided by the sample size, the frequency distribution gives the actual proportion of points in a sample that correspond to specific values
Goodness-of-fit
the appropriateness of a model can be assessed by comparing the observed sample-frequency distribution with the probability distribution
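A rough Python sketch of this comparison, assuming the sample is a list of observed values and the model is a pmf stored as a dict. The sample, the model probabilities, and the name sample_proportions are made up for illustration. This only lines the two distributions up side by side and is not a formal goodness-of-fit test.

```python
from collections import Counter

def sample_proportions(data):
    """Frequency distribution as proportions: value -> count / sample size."""
    counts = Counter(data)
    n = len(data)
    return {value: counts[value] / n for value in sorted(counts)}

# Made-up sample and made-up model pmf, for illustration only
sample = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
model_pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

observed = sample_proportions(sample)
for r in sorted(model_pmf):
    # Values absent from the sample have an observed proportion of 0
    print(r, observed.get(r, 0.0), model_pmf[r])
```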
Expected value of a discrete random variable
- if a random variable has a large number of values with positive probability, then the pmf is not a useful summary
- measures of location and spread can be developed for a random variable in the same way as for samples
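- the expected value, written E(X) or μ, is the sum of each possible value multiplied by its probability
- expected value is also called the population mean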
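As a small worked example in Python, the expected value of a discrete random variable can be computed by summing r * P(X = r) over the pmf. The pmf values below are made up for illustration, and expected_value is a hypothetical helper name.

```python
def expected_value(pmf):
    """E(X) = sum of r * P(X = r) over all values r with positive probability."""
    return sum(r * p for r, p in pmf.items())

# Illustrative pmf (made-up numbers): value r -> P(X = r)
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

mu = expected_value(pmf)
print(round(mu, 3))
```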
Variance of a discrete random variable
- the analog of the sample variance for a random variable
- also called population variance
- the variance represents the spread, relative to the expected value, of all values that have positive probability
- as a rule of thumb (most accurate for roughly bell-shaped distributions), approximately 95% of the probability mass falls within two standard deviations of the mean of a random variable
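A short Python sketch of the population variance, computed as the sum of (r - μ)² * P(X = r), together with a check of how much probability mass lies within two standard deviations of the mean. The pmf values and the helper names are assumptions made for illustration.

```python
import math

def expected_value(pmf):
    """E(X) = sum of r * P(X = r)."""
    return sum(r * p for r, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (r - mu)^2 * P(X = r)."""
    mu = expected_value(pmf)
    return sum((r - mu) ** 2 * p for r, p in pmf.items())

def mass_within_two_sd(pmf):
    """Total probability of values within 2 standard deviations of the mean."""
    mu = expected_value(pmf)
    sd = math.sqrt(variance(pmf))
    return sum(p for r, p in pmf.items() if abs(r - mu) <= 2 * sd)

# Illustrative pmf (made-up numbers): value r -> P(X = r)
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

print(variance(pmf), mass_within_two_sd(pmf))
```

For this particular pmf the mass within two standard deviations is above 95%, which shows that the 95% figure is an approximation rather than an exact property.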
Cumulative-distribution function
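- the cdf of a random variable X, F(x), is the probability that X is less than or equal to x: F(x) = P(X ≤ x)
- for a discrete random variable, the cdf looks like a series of steps and is called a step function
- as the number of possible values increases, the steps become smaller and the cdf approaches a smooth curve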
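A minimal Python sketch of a discrete cdf, assuming the pmf is stored as a dict. The cdf at x adds up the probabilities of all values no larger than x, so it stays flat between possible values and jumps at each one. The pmf values and the name cdf are illustrative assumptions.

```python
def cdf(pmf, x):
    """F(x) = P(X <= x): add up the probabilities of all values r <= x."""
    return sum(p for r, p in pmf.items() if r <= x)

# Illustrative pmf (made-up numbers): value r -> P(X = r)
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

# The cdf is constant between possible values, so 1 and 1.5 give the same result
print(cdf(pmf, 1), cdf(pmf, 1.5))

# Height of each step, evaluated at the possible values in increasing order
steps = [(x, cdf(pmf, x)) for x in sorted(pmf)]
```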
Permutations
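- in a matched-pair design, each case is matched with a normal control of the same sex and age
- if controls are chosen in order from n candidates, the first control can be chosen in n ways; once the first control is chosen, the second control can be chosen in (n-1) ways
- the number of permutations of n things taken k at a time is n(n-1)...(n-k+1), the number of ways of selecting k items out of n where order matters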
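The counting argument above (n ways for the first choice, n-1 for the second, and so on) can be sketched in Python as a running product. The function name permutations is a hypothetical helper; math.perm is the standard-library equivalent in Python 3.8 and later.

```python
import math

def permutations(n, k):
    """Number of ordered selections of k things out of n: n(n-1)...(n-k+1)."""
    count = 1
    for i in range(k):
        count *= n - i  # n choices first, then n-1, then n-2, ...
    return count

# Two controls chosen in order from 5 candidates: 5 * 4 ordered pairs
print(permutations(5, 2), math.perm(5, 2))
```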
Combinations
- in an unmatched study design, cases and controls are selected in no particular order
- thus, the number of ways of selecting k things out of n without respect to order is referred to as the number of combinations of n things taken k at a time
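A brief Python sketch of how combinations relate to permutations: each unordered group of k things corresponds to k! different orderings, so the number of ordered selections is divided by k!. The name combinations is a hypothetical helper; math.comb gives the same result directly in Python 3.8 and later.

```python
import math

def combinations(n, k):
    """Number of unordered selections: ordered selections divided by k!."""
    return math.perm(n, k) // math.factorial(k)

# Choosing 2 controls out of 5 with no regard to order
print(combinations(5, 2), math.comb(5, 2))
```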
Binomial distribution
- a sample of n independent trials, each of which can have only two possible outcomes
- the probability of a success at each trial is assumed to be some constant p
- the probability of a failure at each trial is 1 - p = q
- number of trials n is finite, and the number of events can be no larger than n
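Under these assumptions, the probability of exactly k successes in n trials is C(n, k) * p^k * q^(n-k). A minimal Python sketch follows; the function name binomial_pmf and the example numbers are illustrative assumptions.

```python
import math

def binomial_pmf(n, k, p):
    """P(X = k) = C(n, k) * p^k * q^(n - k), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, k) * p ** k * q ** (n - k)

# Probability of exactly 2 successes in 10 independent trials with p = 0.3
print(binomial_pmf(10, 2, 0.3))

# The probabilities over k = 0, 1, ..., n add up to 1 (up to rounding)
total = sum(binomial_pmf(10, k, 0.3) for k in range(11))
```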
Calculating binomial probabilities
- for sufficiently large n, the normal distribution can be used to approximate the binomial distribution and tables of the normal distribution can be used to evaluate binomial probabilities
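- if the sample size is not large enough to use the normal approximation, then exact binomial probabilities can be obtained from an electronic table (for example, a spreadsheet or statistical software function)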
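A rough Python sketch comparing an exact binomial probability with its normal approximation. It assumes a normal distribution with mean np and variance npq, and it applies a continuity correction of 0.5 at each end; the continuity correction and the helper names are assumptions added for illustration, not details stated on this card.

```python
import math

def binomial_pmf(n, k, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_prob(n, p, a, b):
    """Exact P(a <= X <= b) by summing the binomial pmf."""
    return sum(binomial_pmf(n, k, p) for k in range(a, b + 1))

def normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_approx_prob(n, p, a, b):
    """Approximate P(a <= X <= b) with a normal(np, npq) distribution.
    Assumption: continuity correction of 0.5 at each end."""
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((b + 0.5 - mu) / sd) - normal_cdf((a - 0.5 - mu) / sd)

# n = 100 and p = 0.5: a case where n is large
print(exact_prob(100, 0.5, 45, 55), normal_approx_prob(100, 0.5, 45, 55))
```

Running the same comparison with a small n and a p far from 1/2 shows a larger gap, which is when the exact calculation is the safer choice.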
Expected value and variance of the binomial distribution
- the expected number of successes in n trials is the probability of success in one trial multiplied by n, which equals np
- the variance of the binomial distribution is npq
- for a given number of trials n, the binomial distribution has the highest variance when p = 1/2
- variance decreases as p moves away from 1/2, becoming 0 when p = 0 or p = 1
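A small Python sketch of the binomial mean np and variance npq = np(1 - p), evaluated over a few values of p to show where the variance peaks. The function names and the choice of n = 20 are illustrative assumptions.

```python
def binomial_mean(n, p):
    """Expected number of successes in n trials: np."""
    return n * p

def binomial_variance(n, p):
    """Variance of the number of successes: npq, with q = 1 - p."""
    return n * p * (1 - p)

# Variance for a fixed n = 20 at several values of p
variances = {p: binomial_variance(20, p) for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0)}
for p, var in variances.items():
    print(p, binomial_mean(20, p), var)
```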