Probability and Statistics - Part 2 Flashcards

1
Q

What is an estimator of a population parameter?

A

An estimator is:
-A random variable that depends on sample information
-Whose value provides an approximation to the unknown parameter.

A specific value of that random variable is called an estimate.

2
Q
A
3
Q

What are the 4 criteria we use when deciding which of two estimators to use?

A

1) Unbiasedness
2) Efficiency
3) Consistency
4) Mean Square Error

4
Q

What is unbiasedness?

A

An estimator will be unbiased if the expected value of the estimator is equal to the true population value.

5
Q

What is efficiency of an estimator?

A

Efficiency of an estimator refers to how reliable (precise) it is.
-If you have two unbiased estimators, each based on the same number of sample observations, then the more efficient estimator is the one that has the lower variance.

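NOTE: Efficiency can take priority over bias: an estimator with a small bias but a much lower variance may be preferred to an unbiased one (compare their MSE).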

6
Q

What is the consistency of estimators?

A

An estimator is consistent if the difference between the expected value of the estimator and the parameter decreases as the sample size increases. In other words, as the sample size approaches infinity, the bias shrinks towards zero.

7
Q

What is Mean Square Error (MSE)?

A

If the error is the difference between the estimated value of the parameter and its true value, the MSE is the mean of the squared error. This equals the variance of the estimator plus its squared bias: MSE = Var(estimator) + bias^2.

If unbiased, the MSE will equal the variance.
An estimator with a smaller MSE is said to be more efficient.
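
As an illustration of MSE = variance + bias^2, the sketch below simulates two estimators of a population variance: one dividing by n and one dividing by n - 1. The function name, the normal population and the parameter values are assumptions chosen for the example, not part of the cards.

```python
import random
import statistics

def simulate_variance_estimators(n=10, reps=5000, sigma=2.0, seed=0):
    """Estimate bias, variance and MSE of two variance estimators by simulation."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    estimates = {"divide by n": [], "divide by n-1": []}
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        estimates["divide by n"].append(statistics.pvariance(sample))
        estimates["divide by n-1"].append(statistics.variance(sample))
    results = {}
    for name, values in estimates.items():
        bias = statistics.fmean(values) - true_var           # E[estimator] - parameter
        var = statistics.pvariance(values)                    # spread of the estimator
        mse = statistics.fmean([(v - true_var) ** 2 for v in values])
        results[name] = {"bias": bias, "variance": var, "mse": mse}
    return results

for name, r in simulate_variance_estimators().items():
    # For each estimator, mse matches variance + bias**2 up to rounding.
    print(name, r, r["variance"] + r["bias"] ** 2)
```

The divide-by-n estimator is biased downwards, yet it can still end up with the smaller MSE because its variance is lower.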

8
Q

How can we find a confidence interval for a parameter?

A

We can do this thanks to the central limit theorem (CLT), as for large samples the standardised mean approaches a standard normal random variable.

-Depending on whether we are assessing the mean, variance or proportion, the calculation will be slightly different.

-There’s no guarantee the true parameter will actually fall in a given interval. How often such intervals capture it depends on our confidence level, and we can never be 100% confident.

9
Q

What is the difference between point and interval estimates?

A

A point estimate is a single value, whereas an interval estimate is a range of values. The interval has the advantage of carrying a stated level of confidence, which a point estimate does not. An interval estimate is also called a confidence interval.

10
Q

What is significance level?

A

If 100(1 - alpha)% is the confidence level, then alpha is the significance level, and it lies between 0 and 1.

11
Q

What do we do when we want to find the confidence interval, but the variance of the population is unknown?

A

Usually, we will replace it with the sample variance and use t-distribution tables (if looking at the mean).
If we are looking for the interval estimate of the variance, then we will use the chi-squared table.

12
Q

What assumptions do we use when finding the confidence interval?

A

-The population variance has to be known
-The population is normally distributed
-If the population isn’t normal, use a large sample.

For the mean, we use:
Sample mean ± Z x (s.d/sqrt(n)), where s.d is the population standard deviation. This is equal to the point estimate ± the margin of error.
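
A minimal sketch of this interval, assuming a known population standard deviation. The function name `z_interval` and the numbers in the usage line are made up for illustration; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population s.d. (sigma) is known."""
    alpha = 1 - confidence                      # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    margin = z * sigma / sqrt(n)                # margin of error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50 from n = 25 observations, sigma = 10.
print(z_interval(50, 10, 25))
```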

13
Q

How can we reduce the margin of error?

A

-Increase the sample size.
-Reduce the population standard deviation.
-Decrease the confidence level.
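
The three levers can be checked numerically. The sketch below uses a hypothetical helper, `margin_of_error`, with illustrative inputs; it assumes the known-variance (z) margin of error.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error z * sigma / sqrt(n) for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

base = margin_of_error(sigma=10, n=100, confidence=0.95)
print("baseline:          ", base)
print("larger sample:     ", margin_of_error(sigma=10, n=400, confidence=0.95))
print("smaller s.d.:      ", margin_of_error(sigma=5, n=100, confidence=0.95))
print("lower confidence:  ", margin_of_error(sigma=10, n=100, confidence=0.90))
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.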

14
Q

When do we use the t-test for a confidence interval rather than the z-test?

A

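-We don’t know the population variance, so we estimate it using the sample variance.
-This matters most when the sample size is small (less than 25), because the t-distribution is then noticeably wider than the normal. If the population variance is known and the population is normal, the z-interval can still be used.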

t = (xbar - mu)/(sample s.d/sqrt(n))
Then, the confidence interval is the same as for the z-test, just using t instead of z.

The degrees of freedom for a t-test are n - 1.
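
A small sketch of the t-based interval. Python's standard library has no t-distribution, so the critical value is passed in as read from a t-table; the function name, the data and the value 2.262 (9 degrees of freedom, 95% two-sided) are assumptions for the example.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_critical):
    """Confidence interval for a mean using the sample s.d. and a tabulated t value.

    t_critical must be looked up for len(data) - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample s.d. (divides by n - 1)
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 10 observations, so 9 degrees of freedom.
sample = [10, 12, 9, 11, 13, 8, 12, 10, 11, 14]
print(t_interval(sample, t_critical=2.262))
```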

15
Q

How do we find the confidence interval for population proportion?

A

If p is the sample proportion, its standard deviation (standard error) is sqrt(p(1-p)/n).

The confidence interval is then:
p ± Z x sqrt(p(1-p)/n)

NOTE: YOU ALWAYS USE A Z TEST FOR PROPORTION, NEVER A T TEST.
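
A minimal sketch of the proportion interval, assuming a large sample so the normal approximation applies. The function name and the counts in the usage line are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (z) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

# Hypothetical survey: 40 "yes" answers out of 100.
print(proportion_interval(40, 100))
```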

16
Q

How do we find the confidence interval for the population variance?

A

First we assume the population is normally distributed, and then the confidence interval is based on the sample variance.

The chi-squared distribution used has n-1 degrees of freedom.

Then each bound is found from: variance = (n-1) x sample variance/chi-squared value.

For the chi-squared value, the lower bound (LB) uses the value with upper-tail probability equal to the significance level/2, and the upper bound (UB) uses the value with upper-tail probability 1 - significance level/2.
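
The calculation can be sketched as below. The chi-squared values are supplied by the caller as read from a table, since the standard library has no chi-squared quantile function. The function and argument names are hypothetical; 19.023 and 2.700 are the tabulated values for 9 degrees of freedom with upper-tail probabilities 0.025 and 0.975.

```python
def variance_interval(sample_variance, n, chi2_for_lower, chi2_for_upper):
    """Confidence interval for a normal population variance.

    chi2_for_lower: table value with upper-tail probability alpha/2 (the larger value).
    chi2_for_upper: table value with upper-tail probability 1 - alpha/2 (the smaller value).
    Both use n - 1 degrees of freedom.
    """
    numerator = (n - 1) * sample_variance
    return numerator / chi2_for_lower, numerator / chi2_for_upper

# Hypothetical sample: n = 10, sample variance 4, 95% confidence.
print(variance_interval(4, 10, chi2_for_lower=19.023, chi2_for_upper=2.700))
```

Dividing by the larger chi-squared value gives the lower bound, which is why the two tail probabilities appear to be "swapped".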

17
Q

What are dependent samples?

A

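Dependent samples are two samples whose observations are related, so each value in one sample is matched with a value in the other. They are used in confidence interval estimation of the difference between two normal population means.
These two samples could be:
-Paired/matched samples
-Repeated measures

For each pair, the difference is d = x - y.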

18
Q

How do we calculate the confidence interval difference between two means?

A

If you have n pairs (Xi, Yi), whose observations must be related in some way, then Di = Xi - Yi and the mean difference is dbar = (sum of Di)/n.

The sample standard deviation of the differences is then S = sqrt(sum of squared differences from dbar/(n-1)).

To then find the confidence interval, use dbar ± t x (S/sqrt(n)), with n-1 degrees of freedom.
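
A short sketch of the paired-difference interval. The function name and data are invented for the example, and the t value (2.776 for 4 degrees of freedom, 95% two-sided) is assumed to come from a table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_interval(x, y, t_critical):
    """Confidence interval for the mean difference of paired observations."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]    # differences d_i = x_i - y_i
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                           # divides by n - 1
    margin = t_critical * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical before/after scores for 5 people (4 degrees of freedom).
before = [5, 7, 6, 9, 8]
after = [4, 5, 6, 7, 6]
print(paired_interval(before, after, t_critical=2.776))
```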

20
Q

What are independent samples?

A

This occurs when we have non-paired data, e.g. some units are only assigned treatment A and others only treatment B.
Another example is units in two different groups compared on some survey variable.

21
Q

How do we find the confidence interval for the difference between two means when the samples are independent and the variances are known?

A

The confidence interval for mu(x) - mu(y):

(Xbar - Ybar) ± Z x sqrt(var(popX)/n(x) + var(popY)/n(y)).

The part that comes after the Z value is the standard deviation (standard error) of Xbar - Ybar.
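
A minimal sketch of this interval with known population variances. The function name and the inputs in the usage line are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x_bar, y_bar, var_x, var_y, n_x, n_y, confidence=0.95):
    """Interval for mu(x) - mu(y): independent samples, known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(var_x / n_x + var_y / n_y)   # s.d. of Xbar - Ybar
    diff = x_bar - y_bar
    return diff - z * standard_error, diff + z * standard_error

# Hypothetical inputs.
print(two_mean_z_interval(20, 18, var_x=9, var_y=16, n_x=36, n_y=64))
```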

22
Q

How do we find the confidence interval when the two population variances are unknown and equal?

A

1) If they are equal, we can calculate the ‘pooled variance’:
= [(n(x) - 1)sampleVAR(x) + (n(y) - 1)sampleVAR(y)]/(n(x) + n(y) - 2)

2) Use the t-distribution with n(x) + n(y) - 2 degrees of freedom.

3) Compute the confidence interval: (Xbar - Ybar) ± t x sqrt(pooledVAR/n(x) + pooledVAR/n(y))
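
The three steps can be sketched as follows. The function names and inputs are hypothetical, and the t value (2.086 for 20 degrees of freedom, 95% two-sided) is assumed to be read from a table.

```python
from math import sqrt

def pooled_variance(var_x, var_y, n_x, n_y):
    """Step 1: weighted average of the two sample variances."""
    return ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)

def pooled_t_interval(x_bar, y_bar, var_x, var_y, n_x, n_y, t_critical):
    """Steps 2-3: interval for mu(x) - mu(y) with unknown but equal variances.

    t_critical must be looked up for n_x + n_y - 2 degrees of freedom.
    """
    sp2 = pooled_variance(var_x, var_y, n_x, n_y)
    standard_error = sqrt(sp2 / n_x + sp2 / n_y)
    diff = x_bar - y_bar
    return diff - t_critical * standard_error, diff + t_critical * standard_error

# Hypothetical samples: n_x = 10, n_y = 12, so 20 degrees of freedom.
print(pooled_variance(4, 6, 10, 12))
print(pooled_t_interval(15, 12, 4, 6, 10, 12, t_critical=2.086))
```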

23
Q

How do we calculate the confidence interval estimation of the difference between two population proportions?

A

If the samples are randomly and independently drawn, the samples are large enough to use the central limit theorem (and hence the normal distribution), and the population variances are unknown and assumed unequal, then the 100(1 - alpha)% confidence interval is:

(p(x) - p(y)) ± Z x sqrt(p(x)(1 - p(x))/n(x) + p(y)(1 - p(y))/n(y))

where p(x) and p(y) are the sample proportions.

NOTE: You will get two possible answers depending on which way round you set x and y. Both are correct unless the question specifies an order to use.
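
A sketch of the two-proportion interval, which also shows the effect of swapping x and y. The function name and counts are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(successes_x, n_x, successes_y, n_y, confidence=0.95):
    """Large-sample interval for the difference between two population proportions."""
    p_x = successes_x / n_x
    p_y = successes_y / n_y
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_x - p_y
    return diff - z * standard_error, diff + z * standard_error

# Hypothetical counts: 50 of 100 in group x, 80 of 200 in group y.
print(two_proportion_interval(50, 100, 80, 200))
print(two_proportion_interval(80, 200, 50, 100))   # x and y swapped
```

Swapping the groups mirrors the interval around zero, so both orderings describe the same difference.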

24
Q

When should you assume a sample is small?

A

Some say a sample is small when the small-sample statistics differ from the large-sample statistics, and others say to consider a sample small if it has 60 or fewer observations.

A good rule of thumb is to play it safe and assume a sample is small if unclear.

25
Q

How can we use confidence intervals for funding allocation?

A

We can use thresholds to ensure there is only an acceptable chance of error, where the error is providing supplementary funding to a district which doesn’t need it.
If we want to be 95% sure that such a district won’t receive funding, we can work backwards to find the income threshold which would give only a 5% chance of a district with a mean salary above the funding boundary receiving it.

This is the basis of a framework known as hypothesis testing.
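
One way to picture "working backwards" is sketched below. It assumes, beyond what the card states, that a district is funded when its sample mean income falls below a threshold, that the population standard deviation is known, and that the sample mean is approximately normal. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def funding_threshold(boundary, sigma, n, error_rate=0.05):
    """Sample-mean threshold below which a district is funded.

    Chosen so that a district whose true mean income equals the boundary
    has only `error_rate` probability of a sample mean below the threshold.
    """
    standard_error = sigma / sqrt(n)
    # inv_cdf(error_rate) is negative, so the threshold sits below the boundary.
    return boundary + NormalDist().inv_cdf(error_rate) * standard_error

# Hypothetical district: boundary 30,000, s.d. 5,000, sample of 100 salaries.
print(funding_threshold(30000, 5000, 100))
```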

26
Q

What is a null and alternative hypothesis?

A

A null hypothesis should be a statement which requires very strong evidence to reject, because wrongly rejecting it would likely be a costly error. It is less costly to accept the null hypothesis when it is false.
By deciding which error is more costly, we can decide what to make the null hypothesis.

We denote the null hypothesis H0, and the alternative hypothesis H1. The null hypothesis will always contain an equality (=, or "less than or equal to", or "greater than or equal to").
-Usually, it will be the alternative hypothesis that the researcher is trying to support.

27
Q

What is the difference between a one-tailed and a two-tailed test?

A

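If the alternative hypothesis allows the parameter to be above or below the null value, the test is two-tailed. If it allows only one direction, the test is one-tailed. This is relevant because it affects the significance level we use in each tail (alpha/2 in each tail for a two-tailed test, alpha in the single tail for a one-tailed test).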

28
Q

What are the two different types of errors in hypothesis testing?

A

Type 1 error:
-Rejecting the null hypothesis when it is true, which is a serious error. The probability of this error occurring is alpha.

Type 2 error:
-Failing to reject a null hypothesis which is false. The probability of this error is beta.

We cannot find the probability beta unless alpha is known (we also need a specific true value of the parameter under H1).

29
Q

What is the relationship between type 1 and type 2 errors?

A

-They cannot occur at the same time: a type 1 error can only occur if H0 is true, and a type 2 error only if H0 is false. All else being equal, if the probability of a type 1 error goes up, then the probability of a type 2 error goes down.

The probability of a type 2 error also increases when n falls, or when the standard deviation increases. Beta also increases as the difference between the hypothesised parameter and its true value falls.

30
Q

What is the power of a test?

A

This is the probability of rejecting a null hypothesis that is false (it equals 1 - beta).

Power = P(Reject H0 | H1 is true)

The power will increase as the sample size increases.
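
As a hedged illustration, the sketch below computes the power of an upper-tailed z-test of H0: mu <= mu0 against a specific true mean mu1, assuming a known population standard deviation. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean is mu1) for an upper-tailed z-test."""
    standard_error = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this critical value.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Probability that the sample mean lands in the rejection region under mu1.
    return 1 - NormalDist(mu1, standard_error).cdf(critical)

for n in (25, 100, 400):
    power = power_upper_tail(mu0=100, mu1=105, sigma=15, n=n)
    print(n, "power:", power, "beta:", 1 - power)
```

When mu1 equals mu0 the function returns alpha, which matches the definition of a type 1 error.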

31
Q

What is a p-value?

A

This is the probability of obtaining a test statistic at least as extreme as the observed sample value, given that H0 is true.
It is the smallest value of alpha for which the null hypothesis can be rejected.
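
A minimal sketch of the p-value for a z statistic, for one-tailed and two-tailed tests. The function name and the `tail` argument are assumptions made for the example.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value of an observed z statistic under H0.

    tail: "upper" (H1: parameter greater), "lower" (H1: parameter smaller),
          or "two" (H1: parameter different).
    """
    standard_normal = NormalDist()
    if tail == "upper":
        return 1 - standard_normal.cdf(z)
    if tail == "lower":
        return standard_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - standard_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

alpha = 0.05
p = z_p_value(2.1, tail="two")
print(p, "reject H0" if p < alpha else "do not reject H0")
```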

32
Q

What condition is necessary for a binomial distribution of the population to be approximated by a normal distribution?

A

nP(1-P) must be greater than 5, so n must be large enough that n > 5/(P(1-P)).

Once you have confirmed this, you can find the z-value of the population proportion using:

z = (p-hat - P0)/sqrt(P0(1 - P0)/n)
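
Both steps can be sketched as below: first the sample-size check, then the z statistic. The function names and example numbers are hypothetical.

```python
from math import sqrt

def normal_approx_ok(n, p0):
    """Check the rule of thumb n * P0 * (1 - P0) > 5."""
    return n * p0 * (1 - p0) > 5

def proportion_z(p_hat, p0, n):
    """z statistic for testing a population proportion against P0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Hypothetical test: H0 says P0 = 0.5, sample of 100 gives p-hat = 0.6.
n, p0, p_hat = 100, 0.5, 0.6
if normal_approx_ok(n, p0):
    print("z =", proportion_z(p_hat, p0, n))
else:
    print("sample too small for the normal approximation")
```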

33
Q

What are the 4 different ways we could use two sample tests?

A

1) Population means with dependent samples
2) Population means with independent samples
3) Proportion 1 vs Proportion 2
4) Variance 1 vs Variance 2

34
Q

What difference is there when comparing the sample means of dependent vs independent samples?

A

Because dependent samples are related, we can use di = xi - yi and reject H0 if d-bar/(sample difference s.d/sqrt(n)) > t-value, where the t-value has n-1 degrees of freedom.

We cannot do this for independent samples, as there is no link between the two groups. If the variances are known, we can use a z-test (formula on formula sheet).
If the variances are unknown but assumed equal, then we have to find the pooled variance and then conduct a t-test.

35
Q

When hypothesis testing for two population proportions, what do you have to do?

A

You have to first check that your sample size is large enough to be able to consider the binomial distribution as normal, and then use the formula in the formula book to carry out your test, using a z-test.
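
The card leaves the formula to the formula book. The sketch below assumes the commonly taught version for H0: P(x) = P(y), which pools the two samples into one common proportion; check it against your own formula book before relying on it. Function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes_x, n_x, successes_y, n_y):
    """z statistic for H0: P(x) = P(y), using a pooled proportion (assumed form)."""
    p_x = successes_x / n_x
    p_y = successes_y / n_y
    pooled = (successes_x + successes_y) / (n_x + n_y)   # common proportion under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_x + 1 / n_y))
    return (p_x - p_y) / standard_error

# Hypothetical counts: 60 of 100 versus 50 of 100, two-tailed test at alpha = 0.05.
z = two_proportion_z(60, 100, 50, 100)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)
print(z, "reject H0" if abs(z) > critical else "do not reject H0")
```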