Fundamental Assumptions of Parametric Models Flashcards
What are statistical models?
1) Models are approximations (representations) of reality focusing on practically relevant aspects.
2) “All models are wrong, but some are useful” (George Box)
3) Statistical models are also machines (intentionally overly complicated to carry out a simple function)
What do models help us do?
1) Models help us to understand a complex matter.
2) Ideally we want to be able to assess limitations to their usefulness.
What is a normal distribution of data also commonly called?
A Bell Curve
What do we use to visualise normal distribution of data?
1) Histograms: counts / frequency of observations x.
2) Density plots: a smooth, continuous bell-shaped line defined by the density function, which gives the relative likelihood of x taking on a certain value.
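A minimal sketch of the difference between the two views, using only the Python standard library. The data are hypothetical (1,000 simulated draws with mean 50 and SD 10), and the bin width of 10 is an arbitrary choice for illustration.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is reproducible
# Hypothetical data: 1,000 draws from a normal distribution (mean 50, SD 10)
data = [random.gauss(50, 10) for _ in range(1000)]

# Histogram: count the observations falling into each bin of width 10
bin_width = 10
counts = {}
for x in data:
    left_edge = (x // bin_width) * bin_width
    counts[left_edge] = counts.get(left_edge, 0) + 1

# Density: the smooth curve gives the relative likelihood at each x
curve = NormalDist(mu=50, sigma=10)
for left_edge in sorted(counts):
    midpoint = left_edge + bin_width / 2
    bar = "#" * (counts[left_edge] // 10)  # one '#' per 10 observations
    print(f"{left_edge:5.0f} | {bar:<40} density at midpoint = {curve.pdf(midpoint):.4f}")
```

The bars (counts) and the density values should rise and fall together: both are tallest near the mean and shrink towards the tails.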
What are the features of a normal distribution?
1) Symmetric: Left and right halves of data are mirror images of each other
2) Mean = Mode = Median
3) Tails approach zero but never reach it
4) They are characterised by mean and standard deviation
5) The ‘standard’ normal distribution has mean = 0 and SD = 1
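The features above can be checked on the standard normal distribution with a short sketch. It assumes Python 3.8 or later, where `statistics.NormalDist` is available; the specific points used (1.5 and 6 SDs from the mean) are arbitrary illustrative choices.

```python
from math import isclose
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # the standard normal: mean 0, SD 1

# Symmetric: equal density at equal distances either side of the mean
symmetric = isclose(z.pdf(-1.5), z.pdf(1.5))

# Mean = mode = median
centres_agree = z.mean == z.median == z.mode

# Tails approach zero but never reach it: density 6 SDs out is tiny but positive
tail_density = z.pdf(6.0)

# Mean and SD characterise the curve, e.g. the share of values within 1 SD
within_one_sd = z.cdf(1.0) - z.cdf(-1.0)

print(symmetric, centres_agree, tail_density > 0, round(within_one_sd, 3))
```

Roughly 68% of values lie within one SD of the mean, which follows from the mean and SD alone.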
What does the Central Limit Theorem (CLT) suggest?
The sampling distribution of the sample mean will be approximately normal for sufficiently large sample sizes, regardless of the type or shape of the distribution of the population we sample from.
What does the Central Limit Theorem (CLT) allow us to do in relation to parametric models?
We can use parametric statistical inference even if we are sampling from a population that is weird (i.e. not normally distributed).
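A small simulation can illustrate this. The sketch below assumes an exponential population (strongly right-skewed, mean 1) as the "weird" population; the sample size of 50 and the 2,000 repetitions are arbitrary illustrative choices, and `skewness` and `sample_mean` are hypothetical helper names.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

def skewness(values):
    # Simple moment-based skewness: 0 for a symmetric distribution
    m, s = mean(values), stdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

def sample_mean(n):
    # One sample of n independent draws from an exponential population (mean 1)
    return mean([random.expovariate(1.0) for _ in range(n)])

population_draws = [random.expovariate(1.0) for _ in range(2000)]
sample_means = [sample_mean(50) for _ in range(2000)]

print("skewness of raw draws:   ", round(skewness(population_draws), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
print("mean of sample means:    ", round(mean(sample_means), 3))
print("SD of sample means:      ", round(stdev(sample_means), 3))
```

The raw draws are heavily skewed, while the sample means should be much closer to symmetric and centred near the population mean of 1, with a spread close to 1 / sqrt(50).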
What is the mean from our sample?
The sample mean is an estimate of the population mean (μ; Greek mu). The mean of the sampling distribution of the sample mean equals μ.
Why are the model assumptions important?
1) Machines need input to perform operations on whatever is input.
2) Parametric statistical models make assumptions about the input they receive
3) They will always give some output.
4) Reliability of the output depends on how well the input ‘fits’ these assumptions
What is a parametric model?
1) A family of probability distributions with a finite number of parameters.
2) Statistical models that make parametric assumptions about their input. Non-parametric models do not make the same assumptions (e.g. Chi-squared test, Mann-Whitney U test, Spearman’s rank correlation).
What do parametric models assume about data distribution?
Normal distribution:
-The normal distribution has two parameters: mean (μ) and standard deviation (σ)
-Normal distribution is assumed in t-tests, ANOVAs and linear regression
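To illustrate what "two parameters" means in practice, here is a minimal sketch that estimates the mean and standard deviation from data and then uses only those two numbers to answer a probability question. The data are hypothetical (simulated heights in cm), and `NormalDist.from_samples` assumes Python 3.8 or later.

```python
import random
from statistics import NormalDist

random.seed(3)  # fixed seed so the sketch is reproducible
# Hypothetical measurements, e.g. heights in cm
heights = [random.gauss(170, 8) for _ in range(500)]

# Estimate the two parameters from the data
fitted = NormalDist.from_samples(heights)
print("estimated mean:", round(fitted.mean, 1))
print("estimated SD:  ", round(fitted.stdev, 1))

# With only those two numbers the model answers probability questions
print("P(height > 180):", round(1 - fitted.cdf(180), 3))
```

The answer is only as reliable as the assumption that the data really are normally distributed.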
Other than normal distribution, what do parametric models assume?
(hint: LINE)
1) Linearity (for continuous predictors in regression models)
2) Independence (observations are independent of one another)
3) Normality (data, or in regression the residuals, are normally distributed)
4) Equal variance (aka homogeneity of variance)
Mnemonic: LINE (Linearity, Independence, Normality, Equal variance)
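A rough sketch of how three of these assumptions might be eyeballed for a simple regression. It assumes the checks are run on the residuals of a least-squares fit, uses hypothetical simulated data, and relies on informal rules of thumb rather than formal tests. Independence usually cannot be checked this way, since it depends on how the data were collected.

```python
import random
from statistics import mean, stdev

random.seed(7)  # fixed seed so the sketch is reproducible
# Hypothetical data: y depends linearly on x, with normal noise of constant SD
x = [i / 10 for i in range(200)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Ordinary least squares fit for a single predictor
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Linearity: residuals should show no trend, so both halves average near zero
low, high = residuals[:100], residuals[100:]
print("mean residual (low x, high x):", round(mean(low), 3), round(mean(high), 3))

# Normality: about 68% of residuals should lie within one SD of zero
sd = stdev(residuals)
within_one_sd = sum(abs(r) <= sd for r in residuals) / len(residuals)
print("share within 1 SD:", round(within_one_sd, 3))

# Equal variance: spread should be similar for low and high x
sd_ratio = stdev(high) / stdev(low)
print("SD ratio (high / low):", round(sd_ratio, 3))
```

A share far from 68% or an SD ratio far from 1 would suggest the input does not fit the assumptions well, so the model output should be treated with more caution.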
What must observations be for the Central Limit Theorem (CLT) to apply?
Observations must be independent and identically distributed (IID) for the central limit theorem to apply
Normal distribution requires which kind of data?
Interval / Continuous data
What is the ‘Normality’ assumed by parametric models?
Parametric models assume that the data are normally distributed
-This requires interval / continuous data