EAB - Estimation and Significance Tests and P Values Flashcards
What is sampling error?
Samples provide an incomplete picture of the population.
Different samples will give different estimates, which is called ‘sampling error’.
What is a sampling distribution?
Sample estimates (e.g. means) are calculated from multiple samples from the same population.
These estimates will then have a distribution of differing values, which is known as the ‘sampling distribution’.
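A small simulation can make the idea concrete. The sketch below is illustrative only: the population (10,000 values around 120 with SD 15), the sample size of 50 and the function name `sample_means` are all assumptions, not part of the original cards.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population, made up purely for illustration
_rng = random.Random(42)
population = [_rng.gauss(120, 15) for _ in range(10_000)]

# Each sample gives a slightly different mean (sampling error);
# together the means form the sampling distribution.
means = sample_means(population, sample_size=50, n_samples=1000)

print("population mean:      ", round(statistics.mean(population), 2))
print("mean of sample means: ", round(statistics.mean(means), 2))
print("SD of sample means:   ", round(statistics.stdev(means), 2))
```

The sample means cluster around the population mean, and their spread is much narrower than the spread of the individual values.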
What are two measures we can introduce to deal with uncertainty in drawing conclusions?
- Confidence interval:
If we are estimating some quantity from our data, for example, the proportion of patients who have a particular attribute, then we can quantify the imprecision in the estimate using a confidence interval.
- Statistical significance test:
If we are testing a hypothesis, for example, comparing blood pressure in two groups, then we can do a statistical significance test which helps us to weigh the evidence that the sample difference we have observed is in fact a real difference.
What is the relationship between sample size and how close it is to the true mean?
The bigger the sample size, the closer the sample estimate tends to be to the true mean.
What is the relationship between spread of data and how close it is to the true mean?
The smaller the spread of data (standard deviation), the closer the sample estimate tends to be to the true mean.
What is a standard error?
A standard error (SE) is an indication of the extent of the sampling error.
Standard error tells us how much a sample mean tends to vary from the population mean (true mean). It provides an estimate of the precision of the sample mean.
How do you calculate standard error?
For a sample mean, it can be calculated from the standard deviation divided by the square root of the sample size.
(SE = SD / √n)
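As a rough sketch, the formula can be written in a few lines of Python. The function name `standard_error` and the readings are made up for illustration; `statistics.stdev` gives the sample standard deviation.

```python
import math
import statistics

def standard_error(values):
    """SE of the sample mean: sample SD divided by the square root of n."""
    sd = statistics.stdev(values)   # sample standard deviation
    return sd / math.sqrt(len(values))

# Hypothetical blood pressure readings, for illustration only
readings = [118, 125, 130, 122, 127, 121, 135, 119]
print("SE of the mean:", round(standard_error(readings), 2))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error, assuming the standard deviation stays the same.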
How can standard error be used to calculate a confidence interval?
The true (population) mean can be expected to lie in the range: (sample mean – 1.96 standard errors) to (sample mean + 1.96 standard errors) in 95% of calculations. This range is the 95% confidence interval.
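A minimal worked example of this range, working from summary statistics. The figures (sample mean 120, SD 15, n = 100) and the function name `mean_ci_95` are assumptions chosen for illustration.

```python
import math

def mean_ci_95(sample_mean, sd, n):
    """95% confidence interval: sample mean plus or minus 1.96 standard errors."""
    se = sd / math.sqrt(n)      # standard error of the mean
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics
lower, upper = mean_ci_95(sample_mean=120.0, sd=15.0, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")   # 95% CI: 117.06 to 122.94
```

Here SE = 15 / √100 = 1.5, so the interval is 120 ± 1.96 × 1.5 = 120 ± 2.94.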
What are our assumptions when calculating a 95% confidence interval for a population mean?
- this is normal data or a large sample (at least 60)
- the sample is chosen at random from the population
- the observations are independent of each other
What are our assumptions when calculating a 95% confidence interval for a population proportion?
- the sample is chosen at random from the population
- the observations are independent of each other
- the proportion with the characteristic is not close to 0 or 1
- np and n(1-p) are each greater than 5 (large sample)
How do you calculate the standard error for proportion?
Multiply the proportion with the characteristic by the proportion without the characteristic:
p(1-p)
Divide by the sample size:
p(1-p)/n
Take the square root to deduce the SE:
√[p(1-p)/n]
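The three steps above can be sketched as follows. The function names and the example values (p = 0.3, n = 100) are illustrative assumptions; `large_sample_ok` checks the np and n(1-p) condition from the assumptions card.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def large_sample_ok(p, n):
    """Check that np and n(1-p) are each greater than 5."""
    return n * p > 5 and n * (1 - p) > 5

# Hypothetical example: 30 of 100 patients have the characteristic
p, n = 0.3, 100
se = proportion_se(p, n)
print("large sample condition met:", large_sample_ok(p, n))
print("SE of the proportion:", round(se, 4))
```

With these values, p(1-p) = 0.21, dividing by n gives 0.0021, and the square root is about 0.046.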
What is a significance test (and its benefit)?
A significance test uses data from a sample to assess the strength of the evidence for or against a hypothesis about a population. There are always two mutually exclusive hypotheses since, if the hypothesis being tested is not true, then the opposite hypothesis must be true.
A measure of the evidence for or against the hypothesis is provided by a P value.
What is the null hypothesis?
The null hypothesis is the baseline hypothesis which is usually of the form ‘there is no difference’ or ‘there is no association’.
The corresponding alternative hypothesis is ‘there is a difference’ or ‘there is an association’.
What is a two-sided test (two-tailed test)?
It is known as a two-sided or two-tailed test when the alternative hypothesis is general and allows the difference to be in either direction.
What is a one-sided test (one-tailed test)?
It is known as a one-sided or one-tailed test when the alternative hypothesis is specific and allows the difference to be in only one direction.
Two-sided tests should always be used unless there is clear justification at the outset to use a one-sided test.
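To illustrate the difference between the two kinds of test, the sketch below computes both P values from the same test statistic. It assumes a test statistic z that is approximately standard normal under the null hypothesis. This z-test setting and the function names are assumptions added for illustration, not something the cards specify.

```python
import math

def two_sided_p(z):
    """P value when the difference may be in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_sided_p(z):
    """P value when only a positive difference counts as evidence."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.96  # hypothetical test statistic
print("two-sided P:", round(two_sided_p(z), 3))
print("one-sided P:", round(one_sided_p(z), 3))
```

For a difference in the expected direction, the one-sided P value is half the two-sided one, which is why a one-sided test needs to be justified at the outset rather than chosen after seeing the data.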