CRP 109 stats Lecture 2 Flashcards
z Score
-The number of standard deviations that a given value x is above or
below the mean.
-Round z scores to two decimal places.
-It is expressed as numbers with no units of measurement.
-If an individual data value is less than the mean, its corresponding z
score is negative.
-Units have now been converted to “standard deviations away from the
mean,” so values from different data sets can be compared
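A minimal sketch of the z score calculation described on this card. The function name and the sample numbers are illustrative assumptions, not values from the lecture.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    # Rounded to two decimal places, as the card recommends.
    return round((x - mean) / std_dev, 2)

# Hypothetical data: a value of 76.2 in a data set with mean 68.6 and sd 2.8.
print(z_score(76.2, 68.6, 2.8))
# A value below the mean gives a negative z score.
print(z_score(50, 60, 5))
```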
Random Variable
A variable, typically represented by x, that has a single
numerical value, determined by chance, for each outcome of a
procedure
Discrete Random Variable
Has a collection of values that is finite or
countable (even theoretically)
Continuous Random Variable
Has infinitely many values, and the collection of values is not
countable.
Probability Distribution
gives the probability for each
value of the random variable
-We use 0+ to represent a probability value that is
positive but very small. Rounding to 0 would be
misleading because it would incorrectly suggest that the
event is impossible
Probability Distribution Requirements
-There is a numerical (not categorical) random variable x, and its
numerical values are associated with corresponding probabilities
-sum of P(x) = 1
-P(x) is between 0 and 1 inclusive for all values of x
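The three requirements can be checked mechanically. The sketch below assumes the distribution is stored as a dictionary mapping each value x to P(x); that representation and the function name are assumptions made for illustration. A small tolerance is allowed on the sum to absorb rounding.

```python
import math

def is_probability_distribution(dist):
    """dist maps each value x of the random variable to its probability P(x)."""
    # Requirement 1: the random variable is numerical, not categorical.
    all_numeric = all(isinstance(x, (int, float)) for x in dist)
    # Requirement 3: every P(x) is between 0 and 1 inclusive.
    in_range = all(0 <= p <= 1 for p in dist.values())
    # Requirement 2: the probabilities sum to 1 (within rounding tolerance).
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=1e-9)
    return all_numeric and in_range and sums_to_one

# Hypothetical example: number of heads in two fair coin tosses.
print(is_probability_distribution({0: 0.25, 1: 0.50, 2: 0.25}))
# Fails because the probabilities sum to 1.1.
print(is_probability_distribution({0: 0.5, 1: 0.6}))
```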
Probability Histogram
-vertical scale shows probabilities instead of relative frequencies based on actual sample results.
-The areas of the rectangles are the same as the probabilities from the
corresponding probability distribution table
-probability distribution can also be in the form of a formula
Expected Value (E)
-theoretical mean value of the outcomes for infinitely many trials
-Does not need to be a whole number
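For a discrete random variable, the expected value is commonly computed as the sum of x · P(x) over all values of x. The card does not state this formula, so the sketch below is a standard-formula illustration with hypothetical data.

```python
def expected_value(dist):
    """dist maps each value x to P(x); returns the sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

# Hypothetical example: one roll of a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}
# The result is not a whole number, even though every outcome is.
print(expected_value(die))
```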
Bernoulli Trial
-A Bernoulli trial is an experiment with only two possible outcomes:
success or failure
Binomial probability distribution
outcomes belong to two categories
1. The procedure has a fixed number of Bernoulli trials. One Bernoulli
trial is a single observation.
2. The trials must be independent, meaning that the outcome of any
individual trial does not affect the probabilities in the other trials.
3. Each trial must have all outcomes classified into exactly two categories,
commonly referred to as success and failure.
4. The probability of a success remains the same in all trials
Binomial probability distribution notation
-S (success) and F (failure)
p = probability of a success in one of the n trials
q = probability of a failure in one of the n trials = 1 − p
n = fixed number of Bernoulli trials
x = specific number of successes in n trials
P(x) = probability of getting exactly x successes among
the n trials
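The notation above plugs into the standard binomial probability formula, P(x) = C(n, x) · p^x · q^(n − x), which the card itself does not write out. A minimal sketch, with a hypothetical function name and example values:

```python
from math import comb

def binomial_probability(n, x, p):
    """P(exactly x successes in n independent trials, each with success probability p)."""
    q = 1 - p  # probability of failure in one trial
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical example: exactly 3 successes in 5 trials with p = 0.5.
print(binomial_probability(5, 3, 0.5))
```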
Sampling With/Without Replacement
-The binomial distribution will be applicable in cases where we sample
with replacement.
-If we sample from a small finite population without replacement, the
binomial distribution should not be used because the events are not
independent
Hypergeometric Distribution
If sampling is done without replacement and the outcomes belong to one of two types (success/failure), we can use the hypergeometric
distribution
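A sketch of the standard hypergeometric formula, which the card does not state: with A successes and B failures in the population, the probability of x successes in a sample of n drawn without replacement is C(A, x) · C(B, n − x) / C(A + B, n). The parameter names and the example are assumptions for illustration.

```python
from math import comb

def hypergeometric_probability(pop_successes, pop_failures, n, x):
    """P(exactly x successes) when n items are drawn without replacement."""
    ways_to_pick_successes = comb(pop_successes, x)
    ways_to_pick_failures = comb(pop_failures, n - x)
    all_possible_samples = comb(pop_successes + pop_failures, n)
    return ways_to_pick_successes * ways_to_pick_failures / all_possible_samples

# Hypothetical example: 4 defective and 6 good items, draw 3, exactly 1 defective.
print(hypergeometric_probability(4, 6, 3, 1))
```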
Poisson probability distribution
discrete probability distribution
that applies to occurrences of some event over a specified interval
1. The random variable x is the number of occurrences of an event in
some interval.
2. The occurrences must be random.
3. The occurrences must be independent of each other.
4. The occurrences must be uniformly distributed over the interval being
used
-determined only by the mean μ.
-The possible values of x have no upper limit
μ = mean number of occurrences of the event over the interval
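Because the distribution is determined only by μ, a probability can be computed from the standard Poisson formula P(x) = μ^x · e^(−μ) / x!, which the card does not write out. A minimal sketch with hypothetical values:

```python
from math import exp, factorial

def poisson_probability(x, mu):
    """P(exactly x occurrences in an interval whose mean number of occurrences is mu)."""
    return mu**x * exp(-mu) / factorial(x)

# Hypothetical example: mean of 2 occurrences per interval.
for x in range(4):
    print(x, round(poisson_probability(x, 2), 4))
```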
Poisson Distribution as Approximation to Binomial
Requirements:
1. n ≥ 100
2. np ≤ 10
Then for the Poisson distribution, we need parameter μ = np
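The sketch below compares an exact binomial probability with its Poisson approximation when both requirements hold. The function names and the example (n = 1000, p = 0.005) are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_probability(n, x, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_approximation(n, x, p):
    """Poisson approximation to the binomial, using mu = n * p."""
    if n < 100 or n * p > 10:
        raise ValueError("requirements n >= 100 and np <= 10 are not met")
    mu = n * p
    return mu**x * exp(-mu) / factorial(x)

n, p, x = 1000, 0.005, 5  # hypothetical values; np = 5
exact = binomial_probability(n, x, p)
approx = poisson_approximation(n, x, p)
print(f"binomial: {exact:.4f}  Poisson approximation: {approx:.4f}")
```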
Uniform Distribution
-random variable is continuous (although it can also be used for
discrete random variables).
-The values of the random variable are spread evenly over the range of
possibilities
Density Curves
-The graph of any continuous probability distribution is called a density curve.
Properties:
-The total area under the curve is 1.
-There is a correspondence between area and probability
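For a continuous uniform distribution the density curve is a rectangle, so the area-probability correspondence reduces to width times height. A minimal sketch, with a hypothetical function name and values:

```python
def uniform_probability(low, high, a, b):
    """P(a < x < b) for a uniform distribution on [low, high], with low <= a <= b <= high."""
    height = 1 / (high - low)  # constant density, so the total area is 1
    return (b - a) * height    # area of the rectangle between a and b

# Hypothetical example: waiting time spread evenly between 0 and 5 minutes.
print(uniform_probability(0, 5, 1, 3))
```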
Normal Distribution
-The random variable is continuous.
-Graph is symmetric and bell-shaped
-characterized by the population mean, μ, and the population standard deviation, σ
Standard Normal Distribution
-special normal distribution with the
following additional properties:
-Population mean, μ = 0.
-Population standard deviation, σ = 1.
-Commonly, the z score is used as the label for the horizontal axis of
the graph.
Table A-2: Standard Normal Distribution
can be used to determine the area (probability) when given a z
score, or to determine the z score when given an area (probability)
-It is designed only for the standard normal distribution
Finding the Area Between Two Values
The area corresponding to the region between two z scores can be found by
finding the difference between the two areas found in Table A-2 (z score table)
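In place of a printed table, Python's `statistics.NormalDist` gives cumulative areas to the left of a z score. The sketch below subtracts two such areas; the function name and the example z scores are assumptions.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z scores."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # cdf(z) is the cumulative area to the left of z, like a table lookup.
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(area_between(-1.00, 1.00), 4))
```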
Critical Values
For the standard normal distribution, a critical value is a z score on the
borderline separating those z scores that are significantly low or
significantly high
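A sketch of finding a critical value from an area, assuming "significantly high" is defined by a chosen area in the right tail. The 0.025 tail area used here is an illustrative assumption, not a value from the lecture.

```python
from statistics import NormalDist

def critical_value(right_tail_area):
    """z score that has the given area to its right under the standard normal curve."""
    # inv_cdf takes the cumulative area to the LEFT, so convert first.
    return NormalDist().inv_cdf(1 - right_tail_area)

print(round(critical_value(0.025), 2))
```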
Converting Distributions
We can perform a conversion that allows us to “standardize” any
normal distribution so that x values can be transformed to z scores
z = (x − μ) / σ
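The sketch below standardizes an x value and confirms that the area to its left is the same whether it is computed on the original normal distribution or on the standard normal distribution. The values of μ, σ, and x are hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert x from a normal distribution with mean mu and sd sigma to a z score."""
    return (x - mu) / sigma

mu, sigma, x = 100, 15, 120  # hypothetical population and value
z = standardize(x, mu, sigma)
area_original = NormalDist(mu, sigma).cdf(x)  # area left of x on the original scale
area_standard = NormalDist().cdf(z)           # area left of z on the standard scale
print(round(z, 2), round(area_original, 4), round(area_standard, 4))
```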
Sampling Distribution of a Statistic
-The distribution of all values of the
statistic when all possible samples of the same size n are taken from the same population.
-The statistic can refer to the sample proportion, sample mean, sample
variance, etc
Sampling Distribution of the Sample Proportion
p = population proportion
p̂ = sample proportion
-The distribution of sample proportions tends to approximate a normal
distribution.
-The mean of the sample proportions is the same as the population proportion
Sampling Distribution of the Sample Mean
-The distribution of sample means tends to approximate a normal
distribution.
-The mean of the sample means is the same as the population mean
Sampling Distribution of the Sample Variance
-The distribution of sample variances tends to be skewed
to the right.
-The mean of the sample variances is the same as the population variance
Estimators
-Estimator: A statistic used to infer (estimate) the value of a population
parameter.
-Unbiased Estimator: A statistic that targets the value of the corresponding
population parameter in the sense that the sampling distribution of the statistic has a mean that is equal to the
corresponding population parameter. Examples: p̂, x̄, s².
-Biased Estimator: A statistic that does not target the value of the
corresponding population parameter. Examples: median, range, s.
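One way to see the difference is to list every possible sample and average the statistic. The sketch below assumes a tiny hypothetical population and samples of size 2 drawn with replacement; it shows the sample variance targeting the population variance while the sample standard deviation falls short.

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

population = [4, 5, 9]  # hypothetical population
# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# variance() and stdev() use the sample formulas (division by n - 1).
mean_of_sample_variances = mean([variance(s) for s in samples])
mean_of_sample_stdevs = mean([stdev(s) for s in samples])

print("population variance:", pvariance(population))
print("mean of sample variances:", mean_of_sample_variances)
print("population sd:", pstdev(population))
print("mean of sample sds:", mean_of_sample_stdevs)
```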
Central Limit Theorem (CLT)
-For all samples of the same size n with n > 30, the sampling distribution of
x̄ can be approximated by a normal distribution with mean μ and standard deviation σ/√n
-Given any population with any distribution, the distribution of x̄ can be approximated by a normal distribution when the samples are large
enough with n > 30
Standard error of the mean, SEM
Standard deviation of all values of the sample mean
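A small simulation can illustrate the idea: sample means from a non-normal population cluster around μ with a spread close to σ/√n. The sketch assumes a uniform population on [0, 1) (μ = 0.5, σ = √(1/12)), a sample size of 36, and a fixed random seed; all of these are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)       # fixed seed so the run is repeatable
n = 36               # size of each sample (n > 30)
num_samples = 2000   # number of samples drawn

# Each entry is the mean of one simple random sample from a uniform population.
sample_means = [mean([random.random() for _ in range(n)]) for _ in range(num_samples)]

population_sd = sqrt(1 / 12)               # sd of the uniform distribution on [0, 1)
theoretical_sem = population_sd / sqrt(n)  # sigma / sqrt(n)
observed_sem = stdev(sample_means)         # sd of the simulated sample means

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"theoretical SEM: {theoretical_sem:.4f}  observed: {observed_sem:.4f}")
```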
Applying the CLT
- Population (with any distribution) has mean μ and standard deviation σ.
- Simple random samples all of the same size n are selected from the
population.
Requirement:
-Population has a normal distribution or n > 30
Considerations During Problem Solving
- Check Requirements: When working with the mean from a sample,
verify that the normal distribution can be used by confirming that the
original population has a normal distribution or the sample size is
n > 30.
- Individual Value or Mean from a Sample? Determine whether you
are using a normal distribution with a single value x or the mean x̄ from a
sample of n values
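The distinction matters because a single value uses standard deviation σ, while a sample mean uses σ/√n. The sketch below assumes a normally distributed population with hypothetical values μ = 100, σ = 15, n = 36, and a cutoff of 105.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, cutoff = 100, 15, 36, 105  # hypothetical values

# Individual value: use the population standard deviation sigma.
p_single = 1 - NormalDist(mu, sigma).cdf(cutoff)

# Mean of a sample of n values: use sigma / sqrt(n) instead.
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)

print(f"P(x > {cutoff}) = {p_single:.4f}")
print(f"P(sample mean > {cutoff}) = {p_mean:.4f}")
```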
Normal Quantile (Probability) Plot
A normal quantile plot is a graph of points (x, y) where each x value is from
the original set of sample data, and each y value is the corresponding z score that is expected from the standard normal distribution.
-If the points form (approximately) a straight line, then we can assume the data
arise from a normal distribution
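A sketch of how the plotted points might be computed. It assumes one common convention, not stated on the card: sort the data, assign the i-th sorted value the cumulative area (2i − 1)/(2n), and take the z score with that area to its left. The function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (x, y) pairs: sorted data values paired with expected z scores."""
    xs = sorted(data)
    n = len(xs)
    standard_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        cumulative_area = (2 * i - 1) / (2 * n)  # assumed convention: 1/(2n), 3/(2n), ...
        points.append((x, standard_normal.inv_cdf(cumulative_area)))
    return points

# Hypothetical sample of five values.
for x, y in normal_quantile_points([3, 1, 2, 5, 4]):
    print(x, round(y, 2))
```

Plotting these pairs and judging how close they fall to a straight line is left to any plotting tool.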
Sample Data From a Normally Distributed Population?
- Histogram: Construct a histogram. If the histogram departs
dramatically from a bell shape, conclude that the data do not have a
normal distribution.
- Outliers: Identify outliers. If there is more than one outlier present,
conclude that the data might not have a normal distribution.
- Normal quantile plot: If the histogram is basically symmetric and
the number of outliers is 0 or 1, look at a normal quantile plot.
The population is normal if the pattern of the points is reasonably
close to a straight line
lognormal distribution
-Many data sets have a distribution that is not normal, but we can
transform the data so that the modified values have a normal
distribution.
-One common transformation is to transform each value of x by taking
its logarithm.
-If the distribution of the logarithms of the values is a normal distribution,
the distribution of the original values is called a lognormal distribution
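A sketch of the log transformation using simulated data. It assumes values generated by `random.lognormvariate` with hypothetical parameters (2.0 and 0.5) and a fixed seed; taking natural logarithms should give values whose mean and standard deviation are close to those parameters.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable
# Simulated lognormal data; 2.0 and 0.5 are the mean and sd of the logarithms.
values = [random.lognormvariate(2.0, 0.5) for _ in range(5000)]

# Transform each value of x by taking its (natural) logarithm.
log_values = [math.log(v) for v in values]

print(f"mean of logs: {mean(log_values):.3f}  sd of logs: {stdev(log_values):.3f}")
```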
Normal Approximation to the Binomial Distribution: Requirements
- The sample is a simple random sample of size n from a population in
which the proportion of successes is p, or the sample is the result of
conducting n independent trials of a binomial experiment in which the
probability of success is p.
- np ≥ 5 and nq ≥ 5.
If the above requirements are satisfied, then the binomial probability
distribution of the random variable x can be approximated by a normal
distribution
Continuity Correction
When using the normal distribution (which is a continuous distribution) as
an approximation to the binomial distribution (which is a discrete
distribution), a continuity correction is made to a discrete whole number x in
the binomial distribution by representing the discrete whole number x by the
interval from x − 0.5 to x + 0.5
1. Check the requirements that np ≥ 5 and nq ≥ 5.
2. Find μ = np and σ = √(npq) to be used for the normal distribution.
3. Identify the discrete whole number x that is relevant to the binomial
probability problem being considered, and represent that value by the
region bounded by x − 0.5 and x + 0.5
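The three steps can be sketched as follows, with the exact binomial probability computed alongside for comparison. The function names and the example (n = 100, p = 0.5, x = 50) are assumptions for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_exactly(n, p, x):
    """Exact binomial probability of exactly x successes."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def normal_approx_exactly(n, p, x):
    """Normal approximation to P(exactly x successes), with continuity correction."""
    q = 1 - p
    # Step 1: check the requirements.
    if n * p < 5 or n * q < 5:
        raise ValueError("requirements np >= 5 and nq >= 5 are not met")
    # Step 2: mean and standard deviation for the normal distribution.
    mu = n * p
    sigma = sqrt(n * p * q)
    # Step 3: represent the whole number x by the interval x - 0.5 to x + 0.5.
    approximating_normal = NormalDist(mu, sigma)
    return approximating_normal.cdf(x + 0.5) - approximating_normal.cdf(x - 0.5)

n, p, x = 100, 0.5, 50  # hypothetical values
print(f"exact: {binomial_exactly(n, p, x):.4f}")
print(f"normal approximation: {normal_approx_exactly(n, p, x):.4f}")
```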