Week 8 - Point Estimation and Interval Estimation Flashcards

1
Q

What is point estimation? (2 points)

A

1) it provides a single value or point, based on a random sample, to estimate the population parameter of interest

2) “Best” guess about the value of the parameter

e.g. using the sample mean (y-bar) to estimate the population mean (E(Y) = μ)

2
Q

What is an estimator and an estimate?

A

Estimator: a summary statistic or a sample statistic used to estimate a population parameter

eg. y-bar is an estimator

Estimate: a specific value of the estimator computed from a given sample

eg. if y-bar = 0.48 in a given sample, 0.48 is an estimate (a point estimate)
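
A minimal Python sketch of the distinction, assuming a made-up binary sample (12 ones out of 25 observations, chosen only so the result matches the 0.48 above). The function is the estimator (a rule); the number it returns for this one sample is the estimate.

```python
from statistics import mean


def sample_mean(sample):
    """Estimator: a rule that turns any sample into one number (y-bar)."""
    return mean(sample)


# Hypothetical sample, not from the course: 12 ones and 13 zeros.
sample = [1] * 12 + [0] * 13

# Estimate: the specific value the estimator takes in this sample.
estimate = sample_mean(sample)
print(estimate)
```

A different sample would give a different estimate, but the estimator (the rule) stays the same.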

3
Q

What's the difference between “E(Yi|Xi) =” and “Y-hat =”?

A

The first is the population model: the true average of Yi for a given Xi, which we could only know in theory, if we could observe every single person in the population.

In practice we use the second, which is estimated from a sample and gives the predicted value of Yi given Xi, i.e. our best guess of that population average.

4
Q

Why can we use a sample mean as an estimator for the population mean (μ)?

A

because the sample mean (y-bar) is an unbiased and consistent estimator of μ

5
Q

are all possible estimators unbiased or consistent? Why?

A

No. Whether an estimator is unbiased or consistent depends on the characteristics of its sampling distribution, so we need to know those characteristics before we can say either.

Also, because a point estimate usually comes with a margin of error, we find the confidence interval to see how precise the point estimate is.

That is what interval estimation describes: the precision of the estimate, which depends on the confidence level and the sample size.

6
Q

define confidence interval

A

an interval of values that is believed to contain the population parameter of interest with a certain degree of confidence (= the confidence level)

7
Q

define confidence level

A

the confidence level is the probability that this method produces an interval containing the population parameter across repeated sampling

8
Q

How do you calculate the confidence interval?

A

confidence interval = point estimate (PE) +/- margin of error

  • upper bound: PE + Margin of error
  • lower bound: PE - Margin of error
9
Q

What is the margin of error?

A

Margin of Error = Z x SE

1) where Z is a critical value, found from the sampling distribution of an estimator given a specific confidence level

SIMPLIFIED:
It tells you: How confident do we want to be?

For example:

If you want 95% confidence → Z ≈ 1.96
If you want 99% confidence → Z ≈ 2.58

The higher the confidence, the wider the margin of error (to be more sure).

2) where SE is an estimate of the standard error of the sampling distribution of an estimator (R calculates this for us)

SIMPLIFIED:
It tells you: How much does our estimate jump around from sample to sample?

Bigger SE = more noise in your estimate.
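
A rough Python sketch of Margin of Error = Z x SE, assuming the sampling distribution is approximated by a normal distribution. The function names are made up for illustration, and the course itself uses R, so treat this only as a way to check the Z values quoted above.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Two-sided critical value Z from the standard normal distribution."""
    tail = (1 - confidence_level) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


def margin_of_error(confidence_level, standard_error):
    """Margin of Error = Z x SE."""
    return critical_value(confidence_level) * standard_error


print(round(critical_value(0.95), 2))  # 1.96
print(round(critical_value(0.99), 2))  # 2.58
```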

10
Q

What's the equation for the confidence interval?

A

y-bar +/- Z x σ/(square root of N)

Basically, the standard error is the estimated standard deviation (from the sample) divided by the square root of the number of observations.

Why do we need to know this?
To see how the SE changes as the sample size changes: a larger N gives a smaller SE.
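
A small Python sketch of the equation above, assuming the normal critical value Z is appropriate. The data values are invented for illustration, and with a sample this small a course may prefer a t critical value instead.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence_level=0.95):
    """Return (lower, upper) for y-bar +/- Z x s / sqrt(N)."""
    n = len(sample)
    y_bar = mean(sample)
    # Standard error: sample standard deviation over the square root of N.
    se = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return y_bar - z * se, y_bar + z * se


# Hypothetical continuous measurements (not from the course).
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.7]
lower, upper = mean_confidence_interval(sample)
print(lower, upper)
```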

11
Q

What is the equation for the confidence interval when Y is continuous vs when Y is binary (Y= 0 or 1)

A

when Y is continuous:
y-bar +/- Z x σ/(square root of N)

when Y is binary:
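σ (the estimated standard deviation) is replaced by the standard deviation of a binary Y, which is

square root of (y-bar x (1 - y-bar))
where y-bar is the sample proportion of Y = 1, so y-bar x (1 - y-bar) is the variance of a binary Y

so the CI is: y-bar +/- Z x square root of (y-bar x (1 - y-bar) / N)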

12
Q

Say we don't know the margin of error. Given that Y is binary and we are looking for the 95% CI for the population mean μ (= the population proportion of Y = 1), how would we solve for the margin of error?

  • say y-bar (the sample proportion) is known and it is 0.48
A

Use the CI equation for binary Y: sub in the sample proportion (0.48), the sample size N, and the critical value for a 95% CI (1.96, or approx. 2).
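
A worked sketch in Python of that calculation. The card does not give the sample size, so N = 1000 here is purely an assumption for illustration, and the function name is made up.

```python
import math
from statistics import NormalDist


def proportion_margin_of_error(sample_proportion, n, confidence_level=0.95):
    """Margin of error for a binary Y: Z x sqrt(p x (1 - p) / N)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(sample_proportion * (1 - sample_proportion) / n)
    return z * se


sample_proportion = 0.48  # from the card
n = 1000                  # hypothetical sample size (not given in the card)

moe = proportion_margin_of_error(sample_proportion, n)
print(round(moe, 3))  # 0.031
print(round(sample_proportion - moe, 3), round(sample_proportion + moe, 3))
```

With these assumed numbers the 95% CI would be roughly 0.48 +/- 0.03.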

13
Q

how can we interpret this:

“the 99% CI for the population mean μ (= the population proportion of Y = 1) is [0.44, 0.52] and our y-bar estimate is 0.48”

A

We are 99% confident that the true proportion of Y = 1 in the entire population is somewhere between 44% and 52%, where "99% confident" means the method captures the true value in 99% of repeated samples.

(i.e. the share of people in the entire population with Y = 1 is between 44% and 52%; e.g. Y = 1 could mean agreeing with a survey question)

14
Q

What will happen to the length of the confidence interval if we decrease the confidence level (eg from 99% to 95%)?

A

it will become shorter because as confidence level decreases, the critical value will decrease

therefore the margin of error will decrease, and the confidence interval will become shorter

A shorter interval is more precise, but we are less confident in it. E.g. a 90% confidence level gives a shorter CI than 95%, and across repeated sampling only 90% of those intervals will include the true value of μ.
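
To see the effect numerically, here is a short Python sketch that holds the standard error fixed (0.016 is an arbitrary assumed value) and changes only the confidence level.

```python
from statistics import NormalDist

standard_error = 0.016  # hypothetical SE, held fixed

interval_lengths = {}
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Length of the interval = 2 x margin of error = 2 x Z x SE.
    interval_lengths[level] = 2 * z * standard_error

for level, length in interval_lengths.items():
    print(level, round(length, 4))
```

The printed lengths grow with the confidence level, because only Z changes.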

15
Q

What happens to the confidence interval with a given confidence level as the sample size N increases?

A

the confidence interval will become shorter as N increases, meaning that a larger sample size leads to greater precision
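
A quick Python sketch of why: with the standard deviation and confidence level held fixed (both values below are assumptions), the interval length is proportional to 1 / sqrt(N), so quadrupling N halves the length.

```python
import math

Z_95 = 1.96               # critical value for a 95% confidence level
standard_deviation = 0.5  # hypothetical estimated standard deviation

lengths_by_n = {}
for n in (100, 400, 1600):
    se = standard_deviation / math.sqrt(n)
    lengths_by_n[n] = 2 * Z_95 * se  # full length of the interval

for n, length in lengths_by_n.items():
    print(n, round(length, 4))
```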

16
Q

When do we use a probability distribution in the process of statistical inference?

A

1) When we draw a random sample (a single individual or a set of individuals) from the whole population. Think of a joint probability distribution here.
- The chance of drawing a certain value, or a certain range of values, of Y in our sample is described by the probability distribution of the random variable Y.

2) When we look at a sample statistic/summary statistic such as the sample mean (y-bar), whose value varies from sample to sample by chance. Hence it is a random variable.
- The distribution of y-bar across repeated sampling (= the sampling distribution of y-bar) is the distribution of a random variable, and therefore a probability distribution.

17
Q

Why is a probability distribution not used for a given sample?

A

Because the values of Y in a given sample are already fixed. Their distribution doesn't tell us anything about how Y varies by chance from sample to sample.

18
Q

What does it mean that an estimator is unbiased?

A

An estimator is unbiased, if the mean of the sampling distribution of the estimator is the same as the value of the population parameter that the estimator is used to estimate.

Simplified: an estimator is unbiased if the AVERAGE of the distribution of THE SAMPLE AVERAGES ACROSS REPEATED SAMPLING is the same as the value of the population parameter (population average).

Basically, it's not biased if, after drawing many samples, we average the estimates and the result equals the population parameter.

19
Q

Why is a sample mean of y an unbiased estimator for the population mean of Y?

A

because we know that y-bar (sample mean) will give us the right answer ON AVERAGE across repeated sampling
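
A small simulation sketch in Python of "right ON AVERAGE across repeated sampling". The population proportion (0.48), sample size and number of repetitions are all assumed values picked for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

P_TRUE = 0.48       # hypothetical population proportion of Y = 1
N = 50              # observations per sample
REPETITIONS = 2000  # number of repeated samples


def draw_sample_mean(p, n):
    """Draw n binary observations and return their sample mean (y-bar)."""
    return mean([1 if random.random() < p else 0 for _ in range(n)])


sample_means = [draw_sample_mean(P_TRUE, N) for _ in range(REPETITIONS)]

# Individual y-bars miss 0.48, but their average should land very close to it.
average_of_means = mean(sample_means)
print(min(sample_means), max(sample_means), average_of_means)
```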

20
Q

What does it mean that an estimator is consistent?

A

An estimator is consistent, if its value gets closer and closer to the value of the population parameter that the estimator is used to estimate as the sample size (= the number of observations in a sample) increases.
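
A rough Python sketch of consistency, again assuming a made-up population proportion of 0.48. It draws one sample at each size and prints how far y-bar is from the true value; because each run is random, the error is not guaranteed to shrink at every step, but it tends to get small as N grows.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

P_TRUE = 0.48  # hypothetical population proportion of Y = 1


def sample_mean(n):
    """Sample mean of n binary draws with P(Y = 1) = P_TRUE."""
    return sum(random.random() < P_TRUE for _ in range(n)) / n


# Distance between y-bar and the population value at each sample size.
errors = {n: abs(sample_mean(n) - P_TRUE) for n in (10, 1000, 100000)}

for n, error in errors.items():
    print(n, round(error, 4))
```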

21
Q

True or false:
In the confidence interval estimated in a given example, the 95% confidence level was used. What this means is that in a given sample, the values of y of 95% of the observations in the sample fall in this confidence interval.

A

False!

The confidence level is the probability that the confidence interval would include the value of the population parameter that we try to estimate (the population mean in the current case) across repeated sampling. See the questions that follow for more about the interpretation of the confidence level.
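
A simulation sketch in Python of what "across repeated sampling" means for the confidence level. The population (normal with mean 10 and standard deviation 2), the sample size and the number of repetitions are all assumptions for illustration. It builds a 95% CI from each sample and counts how often the interval contains the population mean.

```python
import math
import random
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 2.0  # hypothetical population mean and standard deviation
N = 100                # observations per sample
REPETITIONS = 2000     # number of repeated samples
Z_95 = 1.96            # critical value for a 95% confidence level

covered = 0
for _ in range(REPETITIONS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    y_bar = mean(sample)
    se = stdev(sample) / math.sqrt(N)
    # Does this sample's interval contain the population mean?
    if y_bar - Z_95 * se <= MU <= y_bar + Z_95 * se:
        covered += 1

coverage = covered / REPETITIONS
print(coverage)  # should come out close to 0.95
```

Note that this counts intervals that contain µ, not observations that fall inside one interval.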

22
Q

True or false:
95% of the time across repeated sampling, the value of y-bar falls in the interval of µ ± 1.96 times the standard error.

A

True!

This interval corresponds to 95% probability in the sampling distribution of y-bar, approximated by a normal distribution, as you can see in the above figure. From this figure, we can see that the value of y-bar falls in the interval of µ ± 1.96 times the standard error 95% of the time across repeated sampling.
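
The same claim can be checked with a short Python simulation, assuming a normal population with made-up values (mean 10, standard deviation 2) so that the true standard error is known exactly.

```python
import math
import random

random.seed(3)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 2.0      # hypothetical population mean and standard deviation
N = 50                     # observations per sample
REPETITIONS = 2000         # number of repeated samples
SE = SIGMA / math.sqrt(N)  # true standard error of y-bar

inside = 0
for _ in range(REPETITIONS):
    y_bar = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    # Is this y-bar within mu +/- 1.96 standard errors?
    if abs(y_bar - MU) <= 1.96 * SE:
        inside += 1

share_inside = inside / REPETITIONS
print(share_inside)  # should come out close to 0.95
```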