2 Normal Distribution & Inferences Flashcards
What are the properties of a normal distribution?
- bell shaped
- symmetrical
- mean, median and mode equal
What does the central limit theorem state?
If you take large enough random samples and average them, the distribution of those sample averages is approximately normal, whatever the shape of the underlying data
What 2 parameters determine the shape of a normal curve?
Mean - position of the centre (the line of symmetry)
SD - how spread out the data are
What is the empirical rule?
In a normal distribution curve:
About 68% of data lie within 1 SD of the mean
About 95% of data lie within 2 SD of the mean
About 99.7% of data lie within 3 SD of the mean
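The three percentages can be recovered from the standard normal curve. A minimal sketch using only Python's standard library; the helper name share_within is made up for this illustration:

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Probability that a normal value lies within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed values are roughly 68.3%, 95.4% and 99.7%, which the empirical rule rounds to 68%, 95% and 99.7%.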
If we measure student heights and we know height is normally distributed do we still have to check normality of the data?
Yes
Sample and population
What is a population and what is a sample?
Population: group of all items of interest
Sample: set of data drawn from population
What is a parameter and what is a statistic?
Parameter is a descriptive measure of a population
Statistic is a descriptive measure of a sample
What is inferential statistics?
Draw conclusions or inferences about characteristics of populations based on data from a sample
What is a statistical inference?
An estimate, prediction or decision about a population based on a sample
What is sampling variation?
How sample statistics can vary from sample to sample due to random chance
What is standard error?
Measure of sampling variability
What factors reduce standard error/sampling variability?
Increased sample size
Lower variability of outcome
How do we calculate standard error?
It is the standard deviation of the sample statistic; for a sample mean, SE = SD / √n
Sampling variability
Sampling variability = standard error = SD of sample statistic
Decreases with increasing sample size
Increases with increasing variability of outcome
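For a sample mean, the usual formula is SE = SD / √n. A small sketch of how the two factors act; the function names and the height values are invented for illustration:

```python
import math
import statistics

def standard_error_of_mean(sample):
    """SE of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def se_from_sd(sd, n):
    """Same formula when the SD and sample size are already known."""
    return sd / math.sqrt(n)

# Made-up heights in cm, for illustration only.
heights_cm = [162.0, 171.5, 168.0, 175.2, 159.8, 166.4, 170.1, 173.3]
print("SE of the mean:", round(standard_error_of_mean(heights_cm), 2))

# Holding the SD fixed at 10, quadrupling n halves the SE.
for n in (25, 100, 400):
    print(n, se_from_sd(10, n))
```

Because n sits under a square root, the sample has to be four times larger to halve the standard error.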
How do we calculate confidence intervals?
(Sample statistic) ± (multiplier for the confidence level) × SE
For 95% confidence the multiplier is 1.96
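If the 95% confidence interval for the mean is (29.6, 46.9), what does it mean?
We are 95% confident that the true population mean lies between 29.6 and 46.9: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean. It is the interval that varies from sample to sample, not the population mean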
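The long-run reading of a 95% interval can be checked by simulation. This is a sketch under stated assumptions: the population values (mean 50, SD 10), the sample size and the function name coverage are all invented, and the data are drawn from a normal population.

```python
import math
import random
import statistics

def coverage(true_mean=50.0, true_sd=10.0, n=40, repeats=2000, seed=1):
    """Fraction of repeated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # Interval = sample mean +/- 1.96 x SE
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / repeats

print("share of intervals containing the true mean:", coverage())
```

The share should come out close to 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the procedure across many samples.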
How can we check if our data is normally distributed?
Histogram- is it bell shaped?
Descriptive summary- are mean, median and mode similar?
Does 68% of data lie within 1 SD of the mean? 95% within 2 SDs?
Q-Q plot - linear
Normality test - sig>0.05
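The Q-Q check can be given a number by correlating the sorted data with the matching normal quantiles: the closer to 1, the straighter the plot. A rough sketch only; the function name qq_correlation, the simulated data and the plotting positions (i - 0.5) / n are assumptions made for illustration, and this is not a formal normality test.

```python
import math
import random
from statistics import NormalDist, fmean

def qq_correlation(data):
    """Correlation between sorted data and standard-normal quantiles."""
    n = len(data)
    observed = sorted(data)
    # (i - 0.5) / n is one common choice of plotting position.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = fmean(theoretical), fmean(observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, observed))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
normal_data = [rng.gauss(0, 1) for _ in range(500)]      # bell shaped
skewed_data = [rng.expovariate(1.0) for _ in range(500)]  # right skewed

print("normal data:", round(qq_correlation(normal_data), 3))
print("skewed data:", round(qq_correlation(skewed_data), 3))
```

The normal sample should give a value very near 1, while the skewed sample gives a visibly lower one, matching a Q-Q plot that bends away from the line.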
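Which normality tests can we run, and why do we have to be cautious?
Kolmogorov-Smirnov test - sig > 0.05 means no evidence against normality
Normal probability plot - if the data are normal it will be linear
Caution: test results are highly influenced by sample size (small samples can miss real departures; very large samples flag trivial ones)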
Why do we have confidence intervals and significance levels?
Conclusions and estimates based on sample statistics need measures of reliability
What does confidence level measure?
Proportion of times that an estimating procedure will be correct
What does the significance level measure?
How frequently we will wrongly conclude there is an effect when there is none, in the long run. 5% sig level = in that situation the conclusion will be wrong 5% of the time in the long run
How do we illustrate significance and confidence level?
Alpha = significance
Confidence = 1-alpha
Therefore significance + confidence level = 1
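The link between alpha and the 1.96 multiplier used for 95% confidence can be sketched with the standard normal curve. The helper name z_multiplier is made up here, and a two-sided interval is assumed (alpha split equally between the two tails):

```python
from statistics import NormalDist

def z_multiplier(confidence: float) -> float:
    """Two-sided normal multiplier for a given confidence level."""
    alpha = 1 - confidence  # significance level
    # Leave alpha / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_multiplier(level), 2))
```

For 95% confidence (alpha = 0.05) this gives 1.96; higher confidence gives a larger multiplier and so a wider interval.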
If the sample increases, what would you expect would happen to the confidence interval?
Sample size increases so standard error reduces so confidence interval becomes narrower
How do we calculate the sample proportion for categorical data like success/fail?
p̂ (p hat, the “sample proportion”) = number of successes in sample / total count in sample; this equals the mean of the outcomes when success is coded 1 and fail is coded 0
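A tiny worked example, with made-up outcomes coded 1 for success and 0 for fail, showing that the sample proportion is just the mean of the coded values:

```python
from statistics import fmean

# Made-up success/fail outcomes: 1 = success, 0 = fail.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

successes = sum(outcomes)
p_hat = successes / len(outcomes)  # successes / total count

print("successes:", successes, "of", len(outcomes))
print("p hat:", p_hat)
print("mean of coded outcomes:", fmean(outcomes))
```

Here 7 of the 10 outcomes are successes, so p hat is 0.7, the same as the mean of the 0/1 values.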
As the sample size increases, the sample proportion distribution becomes ?
Normal (approximately)
If we repeat samples again and again we will eventually get a normal distribution of the sample mean. What are the conditions for this to take place?
Random sampling - no selection bias
Sample is large enough (n > 10), and np and n(1-p) are both at least 5 (p is the proportion of successes)
What are the 2 methods of assessing normality?
Q-Q plot - straight line
Kolmogorov-Smirnov test (sig>0.05)
What is the central limit theorem?
For quantitative data, if the sample size is large enough (n>30), the distribution of the sample mean (if the survey were repeated 10,000 times) would be approximately normal.
For categorical data, the distribution of the sample proportion (if the survey were repeated tens of thousands of times) would be approximately normal, but only under certain conditions.
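The "repeat the survey 10,000 times" idea can be simulated. This sketch assumes a deliberately skewed population (exponential with mean 1 and SD 1); the function name sample_means and all the numbers are illustrative choices.

```python
import math
import random
import statistics

def sample_means(n=30, repeats=10_000, seed=42):
    """Means of many samples of size n from a right-skewed population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

means = sample_means()
centre = statistics.fmean(means)   # should sit near the population mean, 1
spread = statistics.stdev(means)   # should sit near 1 / sqrt(30)
within = sum(abs(m - centre) <= 1.96 * spread for m in means) / len(means)

print("mean of sample means:", round(centre, 3))
print("SD of sample means:", round(spread, 3), "vs", round(1 / math.sqrt(30), 3))
print("share within 1.96 SD of the centre:", round(within, 3))
```

Even though the individual values are strongly skewed, roughly 95% of the sample means fall within 1.96 SDs of their centre, as expected for an approximately normal distribution.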
What are the conditions necessary for the central limit theorem to be valid for categorical data (for the sample proportion distribution to be normal)?
Sample size must be large enough
np>=5 and n(1-p)>=5
Randomised selection of sample (no bias)
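The two counting conditions are easy to check before relying on the normal approximation. A minimal sketch; the function name proportion_clt_ok and the example numbers are made up, and random selection still has to be judged from how the sample was collected:

```python
def proportion_clt_ok(n: int, p: float, threshold: float = 5) -> bool:
    """True when both np and n(1-p) reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

# Illustrative cases only.
print(proportion_clt_ok(100, 0.5))   # 50 expected successes, 50 failures
print(proportion_clt_ok(50, 0.04))   # only 2 expected successes
```

A rare outcome (p close to 0 or 1) needs a much larger sample before both counts reach 5.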
How do we calculate the CI for 95% confidence level?
CI = sample statistic ± (1.96 × SE)
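Applied to a sample mean, the formula can be sketched as follows. The data values and the function name ci95_mean are invented for illustration; with small samples a slightly larger t-based multiplier is often used in place of 1.96.

```python
import math
import statistics

def ci95_mean(sample):
    """95% CI for the mean: sample mean +/- 1.96 x SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * se
    return mean - margin, mean + margin

# Made-up measurements, for illustration only.
values = [38.2, 41.5, 35.9, 44.1, 39.7, 36.8, 42.3, 40.0, 37.4, 43.6]
lower, upper = ci95_mean(values)
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

The interval is centred on the sample mean, and its width depends only on the multiplier and the standard error.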