unit 2 - chapter 8 - confidence intervals Flashcards

1
Q

we want…

A

We want full information, not partial information, but in practice we are never working with full information

2
Q

sampling, samples and statistics

A

Risk and uncertainty

CI are an assignment of risk: they quantify the risk attached to the uncertainty of any statistic

Risk ≠ injurious - if you can quantify the risk – which a CI can do – you can make decisions with some sense of awareness

3
Q

confidence intervals

A

CI are an assignment of risk: they quantify the risk attached to the uncertainty of any statistic

4
Q

We want the sample to represent the population so

A

we take a good sample

5
Q

the confidence interval gives us…

A

CI gives us CONTEXT

Context for interpreting our single statistic: how good is our sample, and which values of μ would be consistent with your data?

x̄ taking a selfie and μ shows up on the phone… the sample is a snapshot of the population average

6
Q

the confidence interval equation

A

CI for μ: x̄ ± z(α/2) · (s / √n)

x̄ = sample mean
z(α/2) = confidence level coefficient (z-score)
s / √n = standard error of the mean
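As a sketch of the equation above (the sample values here are hypothetical, and this assumes SciPy is available for the z critical value):

```python
from math import sqrt

from scipy.stats import norm  # assumption: SciPy is available


def z_confidence_interval(xbar, s, n, confidence=0.95):
    """CI for mu: xbar +/- z(alpha/2) * (s / sqrt(n))."""
    alpha = 1 - confidence
    z = norm.ppf(1 - alpha / 2)  # z(alpha/2); about 1.96 at 95% confidence
    se = s / sqrt(n)             # standard error of the mean
    return xbar - z * se, xbar + z * se


# Hypothetical sample: mean 50, standard deviation 10, n = 100
lo, hi = z_confidence_interval(50, 10, 100)
```

At 95% confidence this gives an interval of roughly 50 ± 1.96 · 1.0.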

7
Q

reporting a confidence interval

A
  1. CI = [LB, UB]
  2. LB ≤ μ ≤ UB
8
Q

the confidence interval equation and the law of large numbers

A
  1. As n increases, the observed stat moves closer to the true parameter
  2. As n increases, variation within a sample decreases
  3. Smaller samples are identified with greater variation, and therefore greater uncertainty
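A quick numeric sketch of point 2 (the standard deviation value is made up): the standard error s/√n shrinks as n grows.

```python
from math import sqrt

s = 10.0  # hypothetical sample standard deviation
standard_errors = {n: s / sqrt(n) for n in (25, 100, 400)}
# Quadrupling n halves the standard error: 2.0 -> 1.0 -> 0.5
```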
9
Q

comparison of z distributions

A
  • profile: symmetrical/bell curve
  • shape: static - the same for every n
  • mean: μ = 0
  • std deviation: σ = 1
10
Q

comparison of t distributions

A
  • profile: symmetrical/bell curve
  • shape: changes with n (degrees of freedom)
  • mean: μ = 0
  • std deviation: > 1, approaching 1 as n increases
11
Q

t distributions

A

distribution for smaller samples
Smaller n
If n = 30, then t and z are pretty close
The t-distribution accounts for the greater uncertainty associated with small samples (as n decreases, variation increases)

12
Q

what distribution should we use?

sample size vs do you know sigma

A

sample < 30 and unknown sigma
t

sample < 30 and known sigma
z

sample > 30 and unknown sigma
z

sample > 30 and known sigma
z
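The 2×2 rule above can be sketched as a small helper (the function name is my own):

```python
def which_distribution(n, sigma_known):
    """Pick t or z per the card's rule: t only when n < 30 AND sigma is unknown."""
    if n < 30 and not sigma_known:
        return "t"
    return "z"


# The four cells of the table: (n, is sigma known?)
cases = [(20, False), (20, True), (100, False), (100, True)]
answers = [which_distribution(n, known) for n, known in cases]
```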

13
Q

when do we use a t distribution

A

when we do not know sigma AND the sample size is less than 30

14
Q

t distribution and confidence interval

A
  • Larger t critical values mean the t-distribution gives wider confidence intervals than the z-distribution
  • We don’t want this: a wider interval reflects more risk and uncertainty
  • We don’t want a bigger confidence interval, but smaller samples force one
15
Q

never compromise vs trade offs

A

never compromise
1. Small sample
2. High confidence
3. Tight interval

trade-offs
- Built into the formula
- Ideally we want these 3 things, but we can’t have them all realistically…

16
Q

things to remember about confidence interval

A

“We are 95% CONFIDENT OUR CI CONTAINS μ”
THIS IS NOT RIGHT - do not write this
WE DO NOT KNOW IF OUR CI IS GREEN OR RED

More about sensing than knowing
We can imagine or consider values for μ occurring across a range of numbers

Numbers compatible with the CI are good money bets
A CI tempers that uncertainty under the idea of risk

Confidence addresses risk

Two different μ’s = two different populations

17
Q

inferential statistics

A

We use sample data to make generalizations about an unknown population. This part of statistics is called inferential statistics. The sample data help us to make an estimate of a population parameter.

We realize that the point estimate is most likely not the exact value of the population parameter, but close to it. After calculating point estimates, we construct interval estimates, called confidence intervals.

18
Q

point estimate

A

You could conduct a survey and calculate the sample mean, x̄, and the sample standard deviation, s. You would use x̄ to estimate the population mean and s to estimate the population standard deviation.

The sample mean, x̄, is the point estimate for the population mean, μ. The sample standard deviation, s, is the point estimate for the population standard deviation, σ.

19
Q

statistic

A

x̄ and s are each called a statistic.

20
Q

confidence interval

A

A confidence interval is another type of estimate but, instead of being just one number, it is an interval of numbers.

The interval of numbers is a range of values calculated from a given set of sample data. The confidence interval is likely to include the unknown population parameter.

21
Q

empirical rule

A

the Empirical Rule, which applies to the normal distribution, says that in approximately 95% of the samples, the sample mean, x̄, will be within two standard deviations of the population mean μ.

For our iTunes example, two standard deviations is (2)(0.1) = 0.2. The sample mean x̄ is likely to be within 0.2 units of μ.

Because x̄ is within 0.2 units of μ, which is unknown, then μ is likely to be within 0.2 units of x̄ with 95% probability. The population mean μ is contained in an interval whose lower number is calculated by taking the sample mean and subtracting two standard deviations, (2)(0.1), and whose upper number is calculated by taking the sample mean and adding two standard deviations. In other words, μ is between x̄ − 0.2 and x̄ + 0.2 in 95% of all the samples.

We say that we are 95% confident that the unknown population mean number of songs downloaded from iTunes per month is between 1.8 and 2.2. The 95% confidence interval is (1.8, 2.2).
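The arithmetic of the iTunes example, using the numbers from the text (x̄ = 2.0 is implied by the stated interval):

```python
xbar = 2.0        # sample mean implied by the (1.8, 2.2) interval
se = 0.1          # standard deviation of the sampling distribution
margin = 2 * se   # Empirical Rule: ~95% of sample means within 2 SDs of mu
ci = (xbar - margin, xbar + margin)
```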

22
Q

true mean

A

For this example, let’s say we know that the actual population mean number of iTunes downloads is 2.1. The true population mean falls within the range of the 95% confidence interval. There is absolutely nothing to guarantee that this will happen.

Further, if the true mean falls outside of the interval we will never know it. We must always remember that we will never ever know the true mean.

Statistics simply allows us, with a given level of probability (confidence), to say that the true mean is within the range calculated. This is what was called in the introduction, the “level of ignorance admitted”.

23
Q

(changing the confidence level or sample size) The Standard deviation of the sampling distribution is further affected by two things

A

the standard deviation of the population and the sample size we chose for our data. Here we wish to examine the effects of each of the choices we have made on the calculated confidence interval, the confidence level and the sample size.

In all other cases we must rely on samples.

24
Q

Another way to approach confidence intervals is through the use of something called the Error Bound…

A

the Error Bound gets its name from the recognition that it provides the boundary of the interval derived from the standard error of the sampling distribution. In the equations above it is seen that the interval is simply the estimated mean (the sample mean) plus or minus something. That something is the Error Bound, and it is driven by the probability we desire to maintain in our estimate, Z(α/2), times the standard deviation of the sampling distribution. The Error Bound for a mean is given the name Error Bound Mean, or EBM.

To construct a confidence interval for a single unknown population mean μ, where the population standard deviation is known, we need x̄ as an estimate for μ and we need the margin of error. Here, the margin of error is called the error bound for a population mean (abbreviated EBM). The sample mean x̄ is the point estimate of the unknown population mean μ.
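A minimal sketch of the EBM decomposition (the sample numbers are hypothetical; assumes SciPy for the z critical value):

```python
from math import sqrt

from scipy.stats import norm  # assumption: SciPy is available

sigma, n, xbar = 3.0, 36, 68.0       # hypothetical sample with known sigma
z = norm.ppf(0.975)                  # Z(alpha/2) for 95% confidence, ~1.96
ebm = z * sigma / sqrt(n)            # Error Bound for a Mean
interval = (xbar - ebm, xbar + ebm)  # x-bar plus or minus EBM
```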

25
Q

confidence level (abbreviated CL) and EBM and true parameter

A

The margin of error (EBM) depends on the confidence level (abbreviated CL). The confidence level is often considered the probability that the calculated confidence interval estimate will contain the true population parameter.

However, it is more accurate to state that the confidence level is the percent of confidence intervals that contain the true population parameter when repeated samples are taken.

26
Q

A confidence interval for a population mean with a known standard deviation is based on the fact that

A

the sampling distribution of the sample means follows an approximately normal distribution.

27
Q

t-score from the text

A

Scores follow a Student’s t-distribution with n – 1 degrees of freedom. The t-score has the same interpretation as the z-score: it measures how far, in standard deviation units, x̄ is from its mean μ. For each sample size n, there is a different Student’s t-distribution.

28
Q

degrees of freedom

A

The degrees of freedom, n – 1, come from the calculation of the sample standard deviation s. Remember, when we first calculated a sample standard deviation we divided the sum of the squared deviations by n − 1, but we used n deviations (the x − x̄ values) to calculate s.

Because the sum of the deviations is zero, we can find the last deviation once we know the other n – 1 deviations. The other n – 1 deviations can change or vary freely. We call the number n – 1 the degrees of freedom (df) in recognition that one is lost in the calculations. The effect of losing a degree of freedom is that the t-value increases and the confidence interval increases in width.
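The "last deviation is determined" point can be checked directly (the data values are made up):

```python
data = [4.0, 7.0, 9.0, 12.0]   # hypothetical sample
xbar = sum(data) / len(data)   # sample mean = 8.0
devs = [x - xbar for x in data]

# The deviations sum to zero, so knowing the first n - 1 of them
# pins down the last one: it can no longer vary freely.
last_dev = -sum(devs[:-1])
```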

29
Q

student’s t distribution

A

The graph for the Student’s t-distribution is similar to the standard normal curve, and at infinite degrees of freedom it is the normal distribution.

The mean for the Student’s t-distribution is zero and the distribution is symmetric about zero, again like the standard normal distribution.

The Student’s t-distribution has more probability in its tails than the standard normal distribution because the spread of the t-distribution is greater than the spread of the standard normal. So the graph of the Student’s t-distribution will be thicker in the tails and shorter in the center than the graph of the standard normal distribution.

The exact shape of the Student’s t-distribution depends on the degrees of freedom. As the degrees of freedom increases, the graph of Student’s t-distribution becomes more like the graph of the standard normal distribution.
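This convergence shows up in the 97.5th-percentile critical values (a sketch assuming SciPy is available):

```python
from scipy.stats import norm, t  # assumption: SciPy is available

z_crit = norm.ppf(0.975)  # standard normal critical value, ~1.96
t_crits = {df: t.ppf(0.975, df) for df in (5, 30, 1000)}
# t critical values exceed z but shrink toward it as df grows
```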

The underlying population of individual observations is assumed to be normally distributed with unknown population mean μ and unknown population standard deviation σ. This assumption comes from the Central Limit Theorem because the individual observations in this case are the x̄’s of the sampling distribution.