Fundamental assumptions of parametric models Flashcards
reminders
- Models are approximations of reality focusing on practically relevant aspects.
- “All models are wrong, but some are useful.” – George Box
– Help us to understand a complex matter.
– Ideally we want to be able to assess limitations to their usefulness.
Why are the model assumptions important?
– Machines need input.
– Perform operations on input.
– Always give some output.
– Parametric statistical models make assumptions about the input they receive.
– Reliability of output depends on the fit of input and assumptions.
What is a parametric model?
A family of probability distributions with a finite number of parameters.
E.g. the normal distribution has two parameters: mean and standard deviation.
The normal distribution underlies the t-test, ANOVA, and linear regression.
Those models make parametric assumptions about their input.
Non-parametric models do not make the same assumptions: e.g. chi-squared (χ²) test, Mann–Whitney U test, Spearman’s rank correlation
What do parametric models assume?
– All parametric models make the same assumptions about their input.
– Normal distribution is at the heart of parametric models
• Interval / continuous data
• Central limit theorem
• Observations must be independent and identically distributed (i.i.d.) for the central limit theorem to apply.
– See also lecture and workshop week 6
– Homogeneity of variance
– Linearity (for continuous predictors in regression models)
What do parametric models assume?
– Linearity
– Independence
– Normality
– Equal variance (aka homogeneity)
density plots
– Relative likelihood of x taking on a certain value.
– The normal distribution is defined by its density function.
– We don’t need to worry about the maths here.
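For reference only (no need to memorise it), the density function of the normal distribution with mean μ and standard deviation σ is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The two parameters μ and σ are exactly the mean and standard deviation mentioned earlier; the exponential term is what produces the symmetric bell shape described below.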
symmetric
– Left and right half are mirror images of each other
– Mean = Mode = Median
Example for non-normal responses
– Psychometric scales are neither continuous nor linear (see the intro of Bürkner & Vuorre, 2019).
Psychometric scale; see Robinson (2018)
– Response categories: limited, discrete options (vs sliders)
– Ordinal: implicit order
– Not equidistant (vs, say, inches)
– See Liddell & Kruschke (2018)
– We will see why the use of linear models (LMs) is not unjustified.
Caveats of normal distributions
– Strictly speaking, nothing is really normally distributed.
– Most variables have an upper and lower bound; e.g., people can’t be faster than 0 secs or shorter than 0 inches.
– All observations are discrete in practice due to limitations of our measuring instruments.
– However, a normal distribution is often suitable for practical considerations.
– So we typically want data to be approximately normally distributed.
Interim summary
– Parametric models assume that the data are normally distributed.
– However, psychologists often obtain non-normally distributed data.
– Why do we bother with the normal distribution?
– We will see in the following that the data don’t need to come from a normal distribution at all.
– The reason is the central limit theorem.
Central limit theorem (CLT)
The sampling distribution of the mean will be approximately normal for large sample sizes, regardless of the (type / shape of the) distribution which we are sampling from.
We can use parametric statistical inference even if we are sampling from a population that is weird (i.e. not normally distributed).
From week 6: mean of sampling distribution is estimate of population mean (μ; Greek mu)
This also works for totals (e.g. IQ scores), SDs, etc.
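A minimal Python sketch of this idea (the demo below uses R): we draw many samples from a heavily skewed population, an exponential distribution with mean 1, and check that the sample means nonetheless pile up symmetrically around the population mean. The choice of distribution and sample sizes here is illustrative, not from the lecture.

```python
import random
import statistics

random.seed(1)

# Population: a heavily skewed (exponential) distribution with mean 1.
# Draw 2000 samples of n = 50 each and record each sample's mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

# CLT in action: the sampling distribution of the mean centres on the
# population mean (1) and, unlike the population itself, is nearly
# symmetric, so its mean and median almost coincide.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.median(sample_means), 2))
```

Increasing n (the per-sample size) makes the sampling distribution of the mean both narrower and more symmetric, even though individual observations remain strongly skewed.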
Demo of CLT
– CES-D scale: self-report depression (Radloff, 1977)
– 20 items to assess the degree of depression
– 5-point Likert scale: Strongly disagree - Strongly agree
– Item 1: I was bothered by things that usually don’t bother me.
– Item 2: I had a poor appetite.
– Item 3: I did not feel like eating, even though I should have been hungry.
Simulate one participant
ppt_1 <- sample(1:5, size = 20, replace = TRUE)  # one random response per item
Repeat for another participant
(ppt_2 <- sample(1:5, size = 20, replace = TRUE))  # outer parentheses print the result
Calculate means for each participant
mean(ppt_1); mean(ppt_2)
– The distribution of the participants’ means will approach normality as the number of participants increases.
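The same demo can be sketched in Python (the slides use R): simulate many participants, each answering every item at random on the 1–5 scale, and look at the distribution of the per-participant means. The participant count and the uniform-response assumption are illustrative choices, not from the lecture.

```python
import random
import statistics

random.seed(42)

N_ITEMS = 20           # CES-D scale length
N_PARTICIPANTS = 1000  # illustrative; more participants -> smoother shape

# Each participant answers every item at random on a 1-5 scale,
# mimicking ppt_1 / ppt_2 above, and we keep their mean score.
participant_means = [
    statistics.mean(random.randint(1, 5) for _ in range(N_ITEMS))
    for _ in range(N_PARTICIPANTS)
]

# Although single responses are discrete and uniform, the means cluster
# around the expected value (1 + 5) / 2 = 3 in a roughly bell-shaped way.
print(round(statistics.mean(participant_means), 2))
print(round(statistics.stdev(participant_means), 2))
```

Plotting `participant_means` as a histogram or density plot would show the approximately normal shape the slide describes, despite the ordinal, bounded raw responses.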