Normal Distribution & Inferential Statistics Flashcards
Why is normal distribution important?
basis for statistical inference
good description of the distribution of many commonly studied quantitative variables
determines choice of statistical method for data
Normal distribution characteristics
mathematically defined theoretical distribution
symmetrical bell-shaped curve
approx. describes many quantitative variables eg height, birthweight
What does knowing that the data are normally distributed, together with the mean & SD, allow us to do?
Calculate the percentage of people that score below/above any value or within a range
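A minimal Python sketch of this, using the standard library's statistics.NormalDist. The mean of 170 cm and SD of 10 cm for adult height are made-up illustrative values, not real data.

```python
from statistics import NormalDist

# Hypothetical example: adult height assumed normal with mean 170 cm, SD 10 cm
height = NormalDist(mu=170, sigma=10)

below_180 = height.cdf(180)                  # proportion shorter than 180 cm
above_180 = 1 - height.cdf(180)              # proportion taller than 180 cm
between = height.cdf(180) - height.cdf(160)  # proportion between 160 and 180 cm

print(f"Below 180 cm: {below_180:.1%}")  # 84.1%
print(f"Above 180 cm: {above_180:.1%}")  # 15.9%
print(f"160-180 cm:   {between:.1%}")    # 68.3%
```

cdf gives the proportion below a value; subtracting two cdf values gives the proportion within a range.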
Normal distribution common characteristic values
68% of scores within 1 SD of the mean
95% within 1.96 SD of the mean
99.7% within 3 SD of the mean
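These percentages can be checked against the standard normal distribution (mean 0, SD 1) with a short Python sketch; the helper name proportion_within is just an illustrative choice.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, SD 1

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 1.96, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")  # 68.3%, 95.0%, 99.7%

# 1.96 is the cut-off that leaves 2.5% in each tail
z_95 = standard.inv_cdf(0.975)
print(f"z for the central 95%: {z_95:.2f}")  # 1.96
```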
How is 95% range for normally distributed data calculated?
lower bound of range = mean - 1.96 x SD
upper bound = mean + 1.96 x SD
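The same calculation as a small Python function; the birthweight mean of 3.5 kg and SD of 0.5 kg are hypothetical values chosen for illustration.

```python
def normal_range_95(mean, sd):
    """Range expected to contain about 95% of values of a normally distributed variable."""
    return mean - 1.96 * sd, mean + 1.96 * sd

# Hypothetical example: birthweight with mean 3.5 kg and SD 0.5 kg
low, high = normal_range_95(3.5, 0.5)
print(f"95% of birthweights between {low:.2f} kg and {high:.2f} kg")  # 2.52 kg and 4.48 kg
```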
Population, Sample & data definitions
Population - the whole group of people with a certain characteristic
Sample - the people from the population taking part in the study
Data - information collected from the sample
Parameters & estimates explained
data from the sample are used to obtain estimates of the true population values (parameters), eg the true mean or proportion
eg % of people with type II diabetes
Why do we need statistical methods?
the answer provided by sample data is rarely the correct answer in the population
uncertainty due to variability & the sample being only a subset of the population
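A rough Python simulation of this point, assuming a hypothetical population in which exactly 7% have type II diabetes (an illustrative figure, not a real prevalence). Each sample of 500 people gives a slightly different estimate, and few, if any, hit the true value exactly.

```python
import random

# Hypothetical population: assume 7% truly have type II diabetes (illustrative figure)
TRUE_PROPORTION = 0.07
SAMPLE_SIZE = 500

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_estimate():
    """Draw one sample and return the proportion with the condition in that sample."""
    cases = sum(rng.random() < TRUE_PROPORTION for _ in range(SAMPLE_SIZE))
    return cases / SAMPLE_SIZE

estimates = [sample_estimate() for _ in range(5)]
print("True proportion:", TRUE_PROPORTION)
print("Sample estimates:", estimates)
```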
What are descriptive statistics?
describing & summarising data from a sample
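For example, a sketch of common descriptive statistics in Python using the standard statistics module; the five birthweights are invented for illustration.

```python
import statistics

# Hypothetical sample of birthweights (kg), for illustration only
birthweights = [3.1, 3.4, 3.6, 2.9, 3.8]

mean = statistics.mean(birthweights)
sd = statistics.stdev(birthweights)  # sample SD (n - 1 denominator)
median = statistics.median(birthweights)

print(f"n = {len(birthweights)}, mean = {mean:.2f} kg, "
      f"SD = {sd:.2f} kg, median = {median:.2f} kg")
```

These numbers only describe the sample; saying anything about the wider population is the job of inferential statistics.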
What are inferential statistics?
using sample data to make inferences about characteristics & relationships in the wider population
What does a standard error measure?
How precise the sample estimate is: how much the estimate would vary around the true parameter value if the study were repeated with different samples
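A Python simulation sketch of this idea, assuming a hypothetical population (systolic BP with mean 130 mmHg and SD 15 mmHg) and a study of 25 people repeated 2000 times. The spread of the sample means comes out close to the standard error SD/√n = 15/5 = 3.

```python
import math
import random
import statistics

# Hypothetical population: systolic BP assumed normal, mean 130 mmHg, SD 15 mmHg
POP_MEAN, POP_SD = 130, 15
n = 25          # size of each study sample
repeats = 2000  # number of times the "study" is repeated

rng = random.Random(1)

sample_means = []
for _ in range(repeats):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

spread_of_estimates = statistics.stdev(sample_means)
print(f"SD of the {repeats} sample means: {spread_of_estimates:.2f}")
print(f"Standard error SD/sqrt(n):        {POP_SD / math.sqrt(n):.2f}")
```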
How can standard error be decreased?
using a larger sample size, as this gives a more precise estimate (SE = SD / √n)
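A small Python sketch of SE = SD / √n, keeping a hypothetical SD of 15 fixed while the sample size grows.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# Same (hypothetical) SD of 15, increasing sample sizes
for n in (25, 100, 400):
    print(f"n = {n:>3}: SE = {standard_error(15, n):.2f}")  # 3.00, 1.50, 0.75
```

Because of the square root, the sample size has to be quadrupled to halve the standard error.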
Why are inferential statistics required?
we can't draw conclusions about the population using just the mean difference from a sample
What does a 95% Confidence interval show?
A range of values, calculated from the sample data, within which we can be 95% confident the true population value lies
Why is a range of data from sample study given when using 95% confidence interval?
Because we don’t know the true population parameter value but can be 95% sure the parameter value will be within the given range
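One way to see what "95% sure" means is to simulate many repeats of a study and count how often the interval captures the truth. This is a Python sketch under stated assumptions: a hypothetical true mean of 130, an SD of 15 treated as known, and samples of 50. Roughly 95% of the intervals should contain the true mean.

```python
import math
import random
import statistics

# Hypothetical set-up: true mean 130, SD 15 (treated as known), samples of size 50
TRUE_MEAN, SD, n = 130, 15, 50
rng = random.Random(7)

def one_study_ci():
    """Run one simulated study and return its 95% CI for the mean (normal approximation)."""
    sample = [rng.gauss(TRUE_MEAN, SD) for _ in range(n)]
    m = statistics.mean(sample)
    se = SD / math.sqrt(n)
    return m - 1.96 * se, m + 1.96 * se

studies = 2000
hits = 0
for _ in range(studies):
    low, high = one_study_ci()
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / studies
print(f"{coverage:.1%} of the intervals contained the true mean")
```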
95% confidence interval definition
the range of values within which we are 95% certain the true parameter (eg the mean) lies
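A minimal Python sketch of a 95% CI for a mean, mean ± 1.96 × SE, using a large-sample normal approximation. The study figures (100 people, mean SBP 140 mmHg, SD 20 mmHg) are hypothetical. Unlike the 95% range for individual values, the CI uses the SE rather than the SD. For small samples a t-distribution multiplier would normally replace 1.96.

```python
import math

def ci95_mean(mean, sd, n):
    """Approximate 95% CI for a population mean: mean +/- 1.96 x SE."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical study: 100 people with diabetes, mean SBP 140 mmHg, SD 20 mmHg
low, high = ci95_mean(140, 20, 100)
print(f"95% CI: {low:.2f} to {high:.2f} mmHg")  # 136.08 to 143.92 mmHg
```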
Is a narrower or wider confidence interval (CI) better?
narrower because we are then more certain about what the truth is
What does p-value show?
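a value between 0 and 1: the probability of getting a result at least as extreme as the one observed if the null hypothesis were true. The lower the value, the stronger the evidence against the null hypothesis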
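As a sketch of where a p-value comes from, here is a simple two-sided z-test in Python. The choice of a z-test is an assumption for illustration, since the notes don't name a particular test. The further the test statistic is from 0, the smaller the p-value.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal null distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.5):
    print(f"z = {z}: p = {two_sided_p(z):.4f}")  # 0.6171, 0.0500, 0.0005
```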
What is hypothesis testing?
Concerned with what the true answer isn’t
assessing the extent to which the sample estimate provides evidence against the null hypothesis
What is a null hypothesis?
The most boring truth - that nothing is different between control & intervention groups in a test
eg Mean systolic blood pressure in diabetics is the same as in the healthy population
What is an alternative hypothesis?
the opposite of the null hypothesis
That there is a real difference between the two groups in the population
Why is a 0.05 p-value significant?
It is the conventional threshold for rejecting the null hypothesis
below 0.05 = reject; above 0.05 = not enough evidence to reject
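The 0.05 rule written out as a tiny Python function; the name decision and the example p-values are purely illustrative.

```python
ALPHA = 0.05  # conventional significance threshold

def decision(p, alpha=ALPHA):
    """Apply the conventional 0.05 rule to a p-value."""
    if p < alpha:
        return "reject the null hypothesis"
    return "not enough evidence to reject the null hypothesis"

print(decision(0.003))  # reject the null hypothesis
print(decision(0.2))    # not enough evidence to reject the null hypothesis
```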
P-value examples
< 0.001 = ?
0.05 = ?
> 0.1 = ?
< 0.001 = strong evidence against null hypothesis
0.05 = moderate
> 0.1 = little
What does a small p-value not prove?
That there is a large difference between the groups; it only shows that there is some difference
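A Python sketch of why: with very large groups, even a trivial difference gives a tiny p-value. It assumes a large-sample two-sample z-test with a common SD. The 0.5 mmHg difference, SD of 20 and 50,000 people per group are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups
    (large-sample z-test, assuming both groups share the same SD)."""
    se_diff = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: a 0.5 mmHg difference in SBP (SD 20) with 50,000 people per group
p = two_sample_z_test(0.5, 20, 50_000)
print(f"p = {p:.5f}")  # very small p, yet a clinically trivial difference
```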
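Why are CIs better than P-values (hypothesis testing)?
Because they give a range of plausible values for the true answer in the population (eg the size of a difference), not just whether there is evidence against the null hypothesis
However, a CI cannot always be calculated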
95% CI to 0.05 P-value link?
If 0 (the null hypothesis value) is within the 95% CI = P-value > 0.05
If it is excluded = P-value < 0.05
If it is exactly the upper or lower boundary = P-value of 0.05
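A Python sketch of this link, assuming the CI and the test both use the same SE and a normal approximation (the link holds in that case). The mean differences, all with a hypothetical SE of 2, are made up to show each of the three situations.

```python
from statistics import NormalDist

def ci_and_p(estimate, se):
    """95% CI and two-sided p-value (null value 0) for an estimate with a given SE."""
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    p = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return low, high, p

# Hypothetical mean differences, all with SE = 2
for diff in (2.0, 3.92, 6.0):
    low, high, p = ci_and_p(diff, 2)
    includes_zero = low <= 0 <= high
    print(f"diff = {diff}: 95% CI {low:.2f} to {high:.2f}, "
          f"0 in CI: {includes_zero}, p = {p:.3f}")
```

A difference of 2 gives a CI containing 0 and p > 0.05. A difference of 3.92 puts 0 exactly on the lower boundary with p = 0.05. A difference of 6 excludes 0 and gives p < 0.05.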