CRP 109 stats Lecture 2 Flashcards

1
Q

z Score

A

-The number of standard deviations that a given value x is above or
below the mean.
-Round z scores to two decimal places.
-It is expressed as numbers with no units of measurement.
-If an individual data value is less than the mean, its corresponding z
score is negative
-Units have now been converted to “standard deviations away from the
mean” and can thus be compared
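
As a minimal sketch of the definition above, a z score can be computed as (x − mean) / standard deviation and rounded to two decimal places. The function name and the exam and height figures are hypothetical, chosen only for illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Round to two decimal places, following the flashcard's convention.
    return round((x - mean) / sd, 2)


# Hypothetical data: an exam score of 82 (mean 70, SD 8)
# and a height of 160 cm (mean 170 cm, SD 10 cm).
exam_z = z_score(82, 70, 8)
height_z = z_score(160, 170, 10)

print(exam_z)    # positive: the score is above the mean
print(height_z)  # negative: the height is below the mean
```

Because both results are unitless, the exam score and the height can be compared directly even though they were measured in different units.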

2
Q

Random Variable

A

A variable, typically represented by x, that has a single
numerical value, determined by chance, for each outcome of a
procedure

3
Q

Discrete Random Variable

A

Has a collection of values that is finite or
countable (even theoretically)

4
Q

Continuous Random Variable

A

Has infinitely many values, and the collection of
values is not countable.

5
Q

Probability Distribution

A

gives the probability for each
value of the random variable
-We use 0+ to represent a probability value that is
positive but very small. Rounding to 0 would be
misleading because it would incorrectly suggest that the
event is impossible

6
Q

Probability Distribution Requirements

A

-There is a numerical (not categorical) random variable x, and its
number values are associated with corresponding probabilities
-ΣP(x) = 1, where the sum is taken over all possible values of x
-P(x) is between 0 and 1 inclusive for all values of x
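
The three requirements can be checked mechanically. The sketch below assumes the distribution is stored as a dictionary mapping each value of x to P(x); the function name and both example tables are hypothetical.

```python
import math


def is_probability_distribution(dist: dict, tol: float = 1e-9) -> bool:
    """Check the three requirements for a table of x -> P(x)."""
    # 1. The random variable must be numerical, not categorical.
    numerical = all(isinstance(x, (int, float)) for x in dist)
    # 2. The probabilities must sum to 1 (allowing for rounding error).
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
    # 3. Each probability must lie between 0 and 1 inclusive.
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return numerical and sums_to_one and in_range


# Hypothetical table: number of heads in two tosses of a fair coin.
heads_in_two_tosses = {0: 0.25, 1: 0.50, 2: 0.25}
# Hypothetical table that fails: the probabilities sum to 1.1.
bad_table = {0: 0.5, 1: 0.6}

print(is_probability_distribution(heads_in_two_tosses))
print(is_probability_distribution(bad_table))
```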

7
Q

Probability Histogram

A

-vertical scale shows probabilities instead of relative frequencies based on actual sample results.
-The areas of the rectangles are the same as the probabilities from the
corresponding probability distribution table
-probability distribution can also be in the form of a formula

8
Q

Expected Value (E)

A

-theoretical mean value of the outcomes for infinitely many trials
-Does not need to be a whole number
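
For a discrete random variable the expected value is computed as E = Σ[x · P(x)]. The card does not state this formula, so treat the sketch below as a supplement; the fair-die table is a hypothetical example, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def expected_value(dist: dict) -> Fraction:
    """E = sum of x * P(x) over every value x of the random variable."""
    return sum((x * p for x, p in dist.items()), Fraction(0))


# Hypothetical example: one roll of a fair six-sided die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expected_value(fair_die)

print(e_die)         # an exact fraction
print(float(e_die))  # not a whole number, even though every outcome is
```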

9
Q

Bernoulli Trial

A

-A Bernoulli trial is an experiment with only two possible outcomes:
success or failure

10
Q

Binomial probability distribution

A

outcomes belong to two categories
1. The procedure has a fixed number of Bernoulli trials. One Bernoulli
trial is a single observation.
2. The trials must be independent, meaning that the outcome of any
individual trial does not affect the probabilities in the other trials.
3. Each trial must have all outcomes classified into exactly two categories,
commonly referred to as success and failure.
4. The probability of a success remains the same in all trials

11
Q

Binomial probability distribution notation

A

-S (success) and F (failure)
p = probability of a success in one of the n trials
q = probability of a failure in one of the n trials = 1 − p
n = fixed number of Bernoulli trials
x = specific number of successes in n trials
P(x) = probability of getting exactly x successes among
the n trials
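
With this notation, the standard binomial probability formula is P(x) = C(n, x) · p^x · q^(n − x), where C(n, x) counts the ways to choose which x of the n trials are successes. The card lists only the notation, so the formula and the numbers below (n = 5, p = 0.2) are supplied here as an illustrative sketch.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent Bernoulli trials)."""
    q = 1 - p  # probability of failure in a single trial
    return math.comb(n, x) * p**x * q ** (n - x)


# Hypothetical example: n = 5 trials, p = 0.2, exactly x = 2 successes.
p_two = binomial_pmf(2, 5, 0.2)
# Summing P(x) over x = 0..n should give 1, as any probability distribution must.
total = sum(binomial_pmf(x, 5, 0.2) for x in range(6))

print(round(p_two, 4))
print(round(total, 10))
```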

12
Q

Sampling With/Without Replacement

A

-The binomial distribution will be applicable in cases where we sample
with replacement.
-If we sample from a small finite population without replacement, the
binomial distribution should not be used because the events are not
independent

13
Q

Hypergeometric Distribution

A

If sampling is done without replacement and the outcomes belong to one of two types (success/failure), we can use the hypergeometric
distribution
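
A sketch of the hypergeometric probability, assuming a population of A successes and B failures from which n items are drawn without replacement: P(x) = C(A, x) · C(B, n − x) / C(A + B, n). The formula is not on the card, and the names and numbers below are hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x: int, successes: int, failures: int, n: int) -> Fraction:
    """P(exactly x successes) when n items are drawn without replacement."""
    if not 0 <= x <= n:
        return Fraction(0)
    ways = comb(successes, x) * comb(failures, n - x)
    return Fraction(ways, comb(successes + failures, n))


# Hypothetical population: 4 successes and 6 failures; draw n = 3 items.
p_two = hypergeometric_pmf(2, 4, 6, 3)
total = sum(hypergeometric_pmf(x, 4, 6, 3) for x in range(4))

print(p_two)
print(total)
```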

14
Q

Poisson probability distribution

A

discrete probability distribution
that applies to occurrences of some event over a specified interval
1. The random variable x is the number of occurrences of an event in
some interval.
2. The occurrences must be random.
3. The occurrences must be independent of each other.
4. The occurrences must be uniformly distributed over the interval being
used
-The distribution is determined only by the mean μ.
-The possible values of x have no upper limit
μ = mean number of occurrences of the event in the interval
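
The Poisson probability of exactly x occurrences is P(x) = μ^x · e^(−μ) / x!. The card does not state the formula, so the sketch below is a supplement, with a hypothetical mean of μ = 2 occurrences per interval.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(exactly x occurrences in an interval whose mean count is mu)."""
    if x < 0:
        return 0.0
    return mu**x * math.exp(-mu) / math.factorial(x)


mu = 2.0  # hypothetical mean number of occurrences per interval
p_none = poisson_pmf(0, mu)
p_three = poisson_pmf(3, mu)
# x has no upper limit, but the probabilities shrink quickly,
# so a partial sum over x = 0..49 is already very close to 1.
partial_total = sum(poisson_pmf(x, mu) for x in range(50))

print(round(p_none, 4), round(p_three, 4), round(partial_total, 6))
```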

15
Q

Poisson Distribution as Approximation to Binomial

A

Requirements:
1. n ≥ 100
2. np ≤ 10
Then for the Poisson distribution, we need parameter μ = np
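
To see how close the approximation is, the sketch below checks both requirements and then compares the exact binomial probability with the Poisson value for μ = np. The values n = 1000 and p = 0.002 are hypothetical, and the helper names are made up for this example.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    return mu**x * math.exp(-mu) / math.factorial(x)


def poisson_is_reasonable(n: int, p: float) -> bool:
    """The two requirements from the card: n >= 100 and np <= 10."""
    return n >= 100 and n * p <= 10


n, p = 1000, 0.002  # hypothetical binomial setting
ok = poisson_is_reasonable(n, p)
mu = n * p  # Poisson parameter
exact = binomial_pmf(3, n, p)
approx = poisson_pmf(3, mu)

print(ok, round(exact, 4), round(approx, 4))
```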

16
Q

Uniform Distribution

A

-random variable is continuous (although it can also be used for
discrete random variables).
-The values of the random variable are spread evenly over the range of
possibilities

17
Q

Density Curves

A

-The graph of any continuous probability distribution is called a density curve.
Properties:
-The total area under the curve is 1.
-There is a correspondence between area and probability

18
Q

Normal Distribution

A

-The random variable is continuous.
-Graph is symmetric and bell-shaped
-characterized by the population mean, μ, and the population standard deviation, σ

19
Q

Standard Normal Distribution

A

-special normal distribution with the
following additional properties:
-Population mean, μ = 0.
-Population standard deviation, σ = 1.
-Commonly, the z score is used as the label for the horizontal axis of
the graph.

20
Q

Table A-2: Standard Normal Distribution

A

can be used to determine the area (probability) when given a z
score, or to determine the z score when given an area (probability)
-It is designed only for the standard normal distribution

21
Q

Finding the Area Between Two Values

A

The area corresponding to the region between two z scores can be found by
finding the difference between the two areas found in Table A-2 (z score table)
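
In software, the cumulative area that Table A-2 tabulates is available from a normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed table; the z scores are an arbitrary example.

```python
from statistics import NormalDist


def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Each cdf value is the cumulative area to the left of that z score,
    # so the difference is the area between them.
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


area = area_between(-1.0, 1.0)
print(round(area, 4))
```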

22
Q

Critical Values

A

For the standard normal distribution, a critical value is a z score on the
borderline separating those z scores that are significantly low or
significantly high
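
A critical value can be found by working backwards from an area to a z score. The sketch below assumes the borderline is defined by a right-tail area called alpha (a name not used on the card) and uses the inverse CDF from the standard library.

```python
from statistics import NormalDist


def critical_z(alpha: float) -> float:
    """z score with area alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


# Assumed example: 0.025 in each tail marks "significantly low/high".
z_high = critical_z(0.025)
z_low = -z_high  # by symmetry of the standard normal curve

print(round(z_low, 2), round(z_high, 2))
```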

23
Q

Converting Distributions

A

We can perform a conversion that allows us to “standardize” any
normal distribution so that x values can be transformed to z scores
z = (x − μ) / σ
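
A short sketch of the conversion z = (x − μ) / σ, showing that the area to the left of x in the original normal distribution equals the area to the left of z in the standard normal distribution. The values μ = 100, σ = 15 and x = 130 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical population mean and standard deviation
x = 130.0                # hypothetical individual value

z = (x - mu) / sigma     # standardize x

# The same probability, computed two ways.
p_direct = NormalDist(mu, sigma).cdf(x)  # using the original distribution
p_standard = NormalDist().cdf(z)         # using the standard normal distribution

print(z, round(p_direct, 4), round(p_standard, 4))
```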

24
Q

Sampling Distribution of a Statistic

A

-The distribution of all values of the
statistic when all possible samples of the same size n are taken from the same population.
-The statistic can refer to the sample proportion, sample mean, sample
variance, etc

25
Q

Sampling Distribution of the Sample Proportion

A

p = population proportion
p̂ = sample proportion
-The distribution of sample proportions tends to approximate a normal
distribution.
-The mean of the sample proportions is the same as the population proportion

26
Q

Sampling Distribution of the Sample Mean

A

-The distribution of sample means tends to approximate a normal
distribution.
-The mean of the sample means is the same as the population mean

27
Q

Sampling Distribution of the Sample Variance

A

-The distribution of sample variances tends to be a distribution skewed
to the right.
-The mean of the sample variances is the same as the population variance

28
Q

Estimators

A

-Estimator: A statistic used to infer (estimate) the value of a population
parameter.
-Unbiased Estimator: A statistic that targets the value of the corresponding
population parameter in the sense that the sampling distribution of the statistic has a mean that is equal to the
corresponding population parameter, such as p̂, x̄, s².
-Biased Estimator: A statistic that does not target the value of the
corresponding population parameter, such as median, range, s.
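
The difference between unbiased and biased estimators can be checked by listing every possible sample. The sketch below assumes a tiny hypothetical population of three values and samples of size 2 drawn with replacement, then averages each statistic over all nine samples.

```python
from itertools import product
from statistics import fmean, pstdev, pvariance, stdev, variance

population = [4, 5, 9]  # hypothetical population
# All possible samples of size n = 2, drawn with replacement (9 samples).
samples = list(product(population, repeat=2))

mean_of_sample_means = fmean([fmean(s) for s in samples])
mean_of_sample_variances = fmean([variance(s) for s in samples])  # s squared
mean_of_sample_sds = fmean([stdev(s) for s in samples])           # s

# Unbiased: the averages match the population parameters.
print(mean_of_sample_means, fmean(population))
print(mean_of_sample_variances, pvariance(population))
# Biased: the average of s does not match the population standard deviation.
print(mean_of_sample_sds, pstdev(population))
```

Here `variance` and `stdev` divide by n − 1 (sample statistics), while `pvariance` and `pstdev` divide by the population size.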

29
Q

Central Limit Theorem (CLT)

A

-For all samples of the same size n with n > 30, the sampling distribution of
x̄ can be approximated by a normal distribution with mean μ and standard deviation σ/√n
-Given any population with any distribution, the distribution of x̄ can be approximated by a normal distribution when the samples are large
enough with n > 30
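
A small simulation can illustrate the theorem. The sketch below assumes a strongly right-skewed population (exponential, with μ = 1 and σ = 1), draws many samples of size n = 40, and compares the mean and standard deviation of the sample means with μ and σ/√n. The seed, the population and the sample counts are arbitrary choices for the illustration.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 40              # sample size (greater than 30)
num_samples = 5000  # number of simulated samples
mu, sigma = 1.0, 1.0  # mean and SD of the exponential population used below

# Each entry is the mean of one random sample of size n.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)
predicted_sd = sigma / math.sqrt(n)  # standard deviation stated by the CLT

print(round(mean_of_means, 3), mu)
print(round(sd_of_means, 3), round(predicted_sd, 3))
```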

30
Q

Standard error of the mean, SEM

A

Standard deviation of all values of the sample mean

31
Q

Applying the CLT

A
  1. Population (with any distribution) has mean μ and standard deviation
    σ.
  2. Simple random samples all of the same size n are selected from the
    population.
    Requirement:
    -Population has a normal distribution or n > 30
32
Q

Considerations During Problem Solving

A
  1. Check Requirements: When working with the mean from a sample,
    verify that the normal distribution can be used by confirming that the
    original population has a normal distribution or the sample size is
    n > 30.
  2. Individual Value or Mean from a Sample? Determine whether you
    are using a normal distribution with a single value x or the mean x̄ from a
    sample of n values
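
The two considerations above can be sketched as follows: for an individual value use standard deviation σ, and for a sample mean use σ/√n. The population values (μ = 100, σ = 15), the sample size n = 36 and the cutoff 105 are hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100.0, 15.0, 36  # hypothetical population and sample size

# Consideration 1: the sample size exceeds 30, so the normal model applies to x-bar.
requirement_met = n > 30

# Consideration 2: choose the distribution that matches the question.
individual = NormalDist(mu, sigma)                  # a single value x
sample_mean = NormalDist(mu, sigma / math.sqrt(n))  # the mean of n values

p_individual = 1 - individual.cdf(105)  # P(x > 105)
p_mean = 1 - sample_mean.cdf(105)       # P(x-bar > 105)

print(requirement_met, round(p_individual, 4), round(p_mean, 4))
```

The sample mean is far less likely than a single value to exceed 105, because sample means vary less than individual values.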
33
Q

Normal Quantile (Probability) Plot

A

A normal quantile plot is a graph of points (x, y) where each x value is from
the original set of sample data, and each y value is the corresponding z score that is expected from the standard normal distribution.
-If the data forms (approximately) a straight line, then we can assume it
arises from a normal distribution
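
The points of a normal quantile plot can be computed without any plotting library. The sketch below assumes one common convention: sort the data, assign the i-th sorted value (counting from 1) the cumulative area (i − 0.5)/n, and convert each area to a z score. The function name and the sample data are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (x, y) pairs: sorted data values and their expected z scores."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # enumerate counts from 0, so (i + 0.5) / n gives 1/(2n), 3/(2n), 5/(2n), ...
    return [(x, std_normal.inv_cdf((i + 0.5) / n)) for i, x in enumerate(xs)]


sample = [63.2, 58.1, 70.4, 66.0, 61.5]  # hypothetical sample data
points = normal_quantile_points(sample)

for x, y in points:
    print(x, round(y, 2))
```

Plotting these pairs and judging whether they fall near a straight line is then the visual check the card describes.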

34
Q

Sample Data From a Normally Distributed Population?

A
  1. Histogram: Construct a histogram. If the histogram departs
    dramatically from a bell shape, conclude that the data do not have a
    normal distribution.
  2. Outliers: Identify outliers. If there is more than one outlier present,
    conclude that the data might not have a normal distribution.
  3. Normal quantile plot: If the histogram is basically symmetric and
    the number of outliers is 0 or 1, look at a normal quantile plot.
    The population is normal if the pattern of the points is reasonably
    close to a straight line
35
Q

lognormal distribution

A

-Many data sets have a distribution that is not normal, but we can
transform the data so that the modified values have a normal
distribution.
-One common transformation is to transform each value of x by taking
its logarithm.
-If the distribution of the logarithms of the values is a normal distribution,
the distribution of the original values is called a lognormal distribution

36
Q

approximate normal distribution requirements

A
  1. The sample is a simple random sample of size n from a population in
    which the proportion of successes is p, or the sample is the result of
    conducting n independent trials of a binomial experiment in which the
    probability of success is p.
  2. np ≥ 5 and nq ≥ 5.
    If the above requirements are satisfied, then the binomial probability
    distribution of the random variable x can be approximated by a normal
    distribution
37
Q

Continuity Correction

A

When using the normal distribution (which is a continuous distribution) as
an approximation to the binomial distribution (which is a discrete
distribution), a continuity correction is made to a discrete whole number x in
the binomial distribution by representing the discrete whole number x by the
interval from x − 0.5 to x + 0.5
1. Check the requirements that np ≥ 5 and nq ≥ 5.
2. Find μ = np and σ = √(npq) to be used for the normal distribution.
3. Identify the discrete whole number x that is relevant to the binomial
probability problem being considered, and represent that value by the
region bounded by x − 0.5 and x + 0.5
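
The three steps above can be sketched as follows, with the exact binomial probability computed alongside for comparison. The values n = 100, p = 0.5 and x = 55 are hypothetical.

```python
import math
from statistics import NormalDist

n, p, x = 100, 0.5, 55  # hypothetical binomial problem: P(exactly 55 successes)
q = 1 - p

# Step 1: check the requirements.
requirements_met = n * p >= 5 and n * q >= 5

# Step 2: parameters of the approximating normal distribution.
mu = n * p
sigma = math.sqrt(n * p * q)

# Step 3: represent the whole number x by the interval x - 0.5 to x + 0.5.
approx = NormalDist(mu, sigma)
p_normal = approx.cdf(x + 0.5) - approx.cdf(x - 0.5)

# Exact binomial probability, for comparison only.
p_exact = math.comb(n, x) * p**x * q ** (n - x)

print(requirements_met, round(p_normal, 4), round(p_exact, 4))
```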