Statistics 2 Flashcards
The process of using what we know about a sample to make probabilistic statements about the broader population.
Statistical Inference
What does statistical inference rely on?
- Probability: it lets us estimate what is going on in the population based on a sample.
Population parameter
A numerical quantity describing the population, such as its mean or proportion.
Sample Statistic
A numerical quantity computed from the sample; it provides an estimate of the population parameter.
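A minimal Python sketch of the parameter/statistic distinction, using a made-up population of ages (the range 18 to 90 and the sizes are hypothetical, not real data). The population mean is the parameter; the mean of a random sample is the statistic that estimates it.

```python
import random
import statistics

# Hypothetical population of 10,000 ages (illustrative only, not real UK data).
pop_rng = random.Random(0)
population = [pop_rng.randint(18, 90) for _ in range(10_000)]

# Population parameter: mean of the whole population (normally unknown in practice).
mu = statistics.mean(population)

# Sample statistic: mean of a random sample of 200, used to estimate mu.
sample_rng = random.Random(42)
sample = sample_rng.sample(population, 200)
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

Changing the seed of `sample_rng` gives a different `x_bar` but leaves `mu` unchanged: the parameter is fixed, while the statistic varies from sample to sample.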
Distributions
Distributions are representations of how often each value occurs.
Probability distributions
- Lists the possible outcomes of an event and their probabilities.
- Assigns a probability to each possible value of a random variable.
- Each probability is a number between zero and one.
- The sum of the probabilities of all possible values equals 1.
Probability distributions: discrete variables
- Suppose 80% of the population lives in households without children.
- Then the probability that an adult selected at random from the population lives in a household without children is 80% (0.80).
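The two rules for a discrete probability distribution (each probability between 0 and 1, all probabilities summing to 1) can be checked mechanically. A small Python sketch; the function name `is_valid_distribution` is hypothetical, and the household example is the 80%/20% split described above.

```python
import math

def is_valid_distribution(probs: dict[str, float]) -> bool:
    """Check the two rules for a discrete probability distribution."""
    # Rule 1: every probability lies between 0 and 1.
    each_in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    # Rule 2: the probabilities of all possible values sum to 1
    # (math.isclose tolerates floating-point rounding).
    sums_to_one = math.isclose(sum(probs.values()), 1.0)
    return each_in_range and sums_to_one

households = {"no children": 0.8, "children": 0.2}
print(is_valid_distribution(households))            # True
print(is_valid_distribution({"a": 0.7, "b": 0.5}))  # False: sums to 1.2
```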
Significance tests
- Hypothesis: Prediction that the parameter takes a particular numerical value or falls in a certain range of values. It is a statement about a population.
- For example: the mean age of the UK population is 50.
- A statistical significance test uses data to summarize the evidence about a hypothesis, by comparing point estimates of the parameters with the values predicted by the hypothesis.
Five parts of a significance test
Assumptions
Hypotheses
Test statistic
p-value
Conclusion
Assumptions
- Type of data: quantitative or categorical?
- Randomization: the data are assumed to come from a randomized process, such as a random sample.
- Population distribution: Some tests assume a certain distribution.
- Sample size: Many tests use a t sampling distribution or an approximately normal one. If the sample size is large enough, the population distribution does not need to be normal.
Null hypothesis
- The null hypothesis, H0: a statement that the parameter takes a particular value.
- Example: the mean age of the UK population is 50.
Alternative hypothesis
Ha: the parameter falls in some alternative range of values, reflecting an effect of some type. This is the research hypothesis.
Example: the mean age of the UK population is higher than, lower than, or different from 50.
Hypothesis in a significance test analysis
- Analyses the sample evidence about H0 by investigating whether the data contradict H0, which would suggest that Ha is true.
- Proof by contradiction
- The null hypothesis is presumed to be true; if the observed data would be very unusual under this presumption, we reject H0.
Test statistic
The test statistic summarizes how far the estimate falls from the parameter value in H0.
It is the number of standard errors between the estimate and the H0 value.
P-value
The probability, presuming H0 is true, that the test statistic equals the observed value or a value even more extreme in the direction predicted by Ha.
- The smaller the p-value, the stronger the evidence against H0 and for Ha.
- A larger p-value means that the observed data would not be unusual if H0 were true.
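A sketch of turning a test statistic into a p-value, assuming a large sample so the statistic is approximately standard normal under H0 (small samples would use a t distribution, which the Python standard library does not provide). The function `p_value` and its `alternative` argument are hypothetical names; the direction of Ha decides which tail counts as "more extreme".

```python
import math

def p_value(z: float, alternative: str = "two-sided") -> float:
    """P-value for a z statistic, assuming a standard normal distribution under H0."""
    if alternative == "greater":     # Ha: parameter > H0 value, P(Z >= z)
        return 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "less":        # Ha: parameter < H0 value, P(Z <= z)
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if alternative == "two-sided":   # Ha: parameter != H0 value, P(|Z| >= |z|)
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError(f"unknown alternative: {alternative}")

print(round(p_value(1.96), 3))             # 0.05
print(round(p_value(1.96, "greater"), 3))  # 0.025
```

A test statistic further from 0 gives a smaller p-value, matching the rule that smaller p-values are stronger evidence against H0.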
Why do smaller p-values indicate stronger evidence against H0?
- Because the data would then be more unusual if H0 were true.
What is the α-value / α-level?
- The boundary value (for example 0.05) is called the α-value or α-level of the test.
- The α-level is thus a number such that we reject H0 if the p-value is less than or equal to it.
- The α-level is the significance level.
- Reject H0 if p ≤ α. Common levels are 0.05 and 0.01.
- The smaller the α-level, the stronger the evidence must be to reject H0.
- To avoid bias in the decision-making process you select πΌ before analysing the data.
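The decision rule is a single comparison once α has been fixed in advance. A Python sketch with a hypothetical helper `decide` and an illustrative p-value of 0.03, showing that the same evidence can be significant at α = 0.05 but not at the stricter α = 0.01.

```python
def decide(p: float, alpha: float = 0.05) -> str:
    """Reject H0 if p <= alpha; alpha should be chosen before seeing the data."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p <= alpha else "do not reject H0"

for alpha in (0.05, 0.01):
    print(alpha, decide(0.03, alpha))
# 0.05 reject H0
# 0.01 do not reject H0
```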
Conclusion of the test
- The p-value summarises the evidence against H0.
- To draw the conclusion of the test, we report and interpret p-values.
- If the p-value is sufficiently small, we reject H0 in favour of Ha.
- p ≤ 0.05: the results are significant at the 0.05 level.
- If H0 were true, the chance of getting sample data this extreme would be no greater than 0.05.
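One way to see what "significant at the 0.05 level" means is to simulate many samples from a population where H0 is true and count how often the test rejects it. A sketch under stated assumptions: ages normally distributed with mean 50 and standard deviation 18 (hypothetical values), σ treated as known so the z statistic is exactly standard normal, and a two-sided test. The function name `simulate_rejection_rate` is illustrative.

```python
import math
import random

def simulate_rejection_rate(trials: int = 2000, n: int = 100,
                            alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of samples that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu_0, sigma = 50.0, 18.0      # hypothetical population where H0 holds
    se = sigma / math.sqrt(n)     # sigma treated as known for simplicity
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu_0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu_0) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        if p <= alpha:
            rejections += 1
    return rejections / trials

print(simulate_rejection_rate())  # expected to be near 0.05
```

Roughly 5% of samples are "significant" even though H0 is true, which is exactly the risk accepted by choosing α = 0.05.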