Fundamental assumptions of parametric models Flashcards

1
Q

reminders

A
  • Models are approximations of reality focusing on practically relevant aspects.
  • “All models are wrong, but some are useful.” – George Box
    – Help us to understand a complex matter.
    – Ideally we want to be able to assess limitations to their usefulness.
2
Q

Why are the model assumptions important?

A

– A statistical model works like a machine: it needs input, performs operations on that input, and always gives some output.
– Parametric statistical models make assumptions about the input they receive.
– The reliability of the output depends on how well the input fits those assumptions.

3
Q

What is a parametric model?

A

A family of probability distributions with a finite number of parameters.
E.g. the normal distribution has two parameters: mean and standard deviation.
The normal distribution underlies the t-test, ANOVA, and linear regression.
These models make parametric assumptions about their input.
Non-parametric models do not make the same assumptions: e.g. Chi-squared [χ^2] test, Mann-Whitney U test, Spearman’s rank correlation.
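As a rough illustration of the contrast, the sketch below runs a parametric and a non-parametric test on the same data. The group names, sample sizes, means, and SDs are made up for this example and are not from the lecture.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Hypothetical scores for two independent groups (values are assumptions)
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 110, sd = 15)

# Parametric: compares means and assumes approximately normal data
t_res <- t.test(group_a, group_b)

# Non-parametric: rank-based (Mann-Whitney U), no normality assumption
u_res <- wilcox.test(group_a, group_b)

t_res$p.value
u_res$p.value
```

In R the Mann-Whitney U test is run through `wilcox.test()`; with two unpaired samples it performs the rank-sum version of the test.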

4
Q

What do parametric models assume?

A

– All parametric models make the same assumptions about their input.
– Normal distribution is at the heart of parametric models
• Interval / continuous data
• Central limit theorem
• Observations must be independent and identically distributed (iid) for the central limit theorem to apply.
– See also lecture and workshop week 6
– Homogeneity of variance
– Linearity (for continuous predictors in regression models)

5
Q

What do parametric models assume?

A

– Linearity
– Independence
– Normality
– Equal variance (aka homogeneity)
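Two of these assumptions can be checked with standard tests in R. The sketch below is one possible way to do it, not necessarily the approach used in the lecture; the data frame `dat` and its columns are simulated and hypothetical.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical data: two groups of 40 with equal SDs (values are assumptions)
dat <- data.frame(
  group = factor(rep(c("a", "b"), each = 40)),
  score = c(rnorm(40, mean = 50, sd = 10), rnorm(40, mean = 55, sd = 10))
)

fit <- lm(score ~ group, data = dat)

# Normality: Shapiro-Wilk test on the model residuals
norm_check <- shapiro.test(residuals(fit))

# Equal variance: Bartlett test of homogeneity of variances across groups
var_check <- bartlett.test(score ~ group, data = dat)

norm_check$p.value
var_check$p.value
```

Independence follows from the study design rather than from a test, and linearity only becomes relevant with continuous predictors, so neither is checked here.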

6
Q

density plots

A

– Relative likelihood of x taking on a certain value.
– The normal distribution is defined by its density function.
– We don’t need to worry about the maths here.
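For anyone curious, R exposes the normal density function as `dnorm()`. The sketch below evaluates it on a grid of x values; the grid and the choice of mean 0 and SD 1 are assumptions made for illustration.

```r
# Grid of x values from -4 to 4 (an arbitrary range for illustration)
x <- seq(-4, 4, by = 0.1)

# Density of the standard normal distribution at each x
dens <- dnorm(x, mean = 0, sd = 1)

# The density is highest at the mean and equals 1 / sqrt(2 * pi) there
peak <- dnorm(0, mean = 0, sd = 1)
peak
```

Calling `plot(x, dens, type = "l")` afterwards draws the familiar bell curve.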

7
Q

symmetric

A

– Left and right half are mirror images of each other

– Mean = Mode = Median

8
Q

Example for non-normal responses

A

– Psychometric scales are neither continuous nor linear (see intro of Bürkner & Vuorre, 2019).

Psychometric scale; see Robinson (2018)
–	Response categories
–	Limited discrete options (vs sliders)
–	Ordinal: implicit order
–	Not equidistant (vs, say, inches)
–	See Liddell & Kruschke (2018)
–	We will see why using linear models (lms) is nevertheless not unjustified.
9
Q

Caveats of normal distributions

A

– Strictly speaking, nothing is really normally distributed.
– Most variables have an upper and lower bound, e.g., people can’t be faster than 0 seconds or shorter than 0 inches.
– All observations are discrete in practice due to limitations of our measuring instruments.
– However, a normal distribution is often suitable for practical purposes.
– So we typically want data to be approximately normally distributed.

10
Q

Interim summary

A

– Parametric models assume that the data are normally distributed.
– However, psychologists often obtain data that are not normally distributed.
– Why do we bother with the normal distribution?
– We will see in the following that the data don’t need to come from a normal distribution at all.
– The reason is the central limit theorem.

11
Q

Central limit theorem (CLT)

A

The sampling distribution of the mean will be approximately normal for large sample sizes, regardless of the (type / shape of the) distribution we are sampling from.
We can use parametric statistical inference even if we are sampling from a population that is weird (i.e. not normally distributed).
From week 6: the mean of the sampling distribution is an estimate of the population mean (μ; Greek mu).

Also works for totals (e.g. IQ), SDs, etc.
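A small simulation can make this concrete. The sketch below draws repeated samples from a strongly right-skewed population and collects their means; the exponential population, the sample size of 50, and the 5000 repetitions are all assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility

n_samples   <- 5000  # number of repeated samples (assumption)
sample_size <- 50    # observations per sample (assumption)

# Population: exponential with rate 1, which is right-skewed and has mean 1
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# The mean of the sampling distribution estimates the population mean
mean(sample_means)

# Its SD approximates the standard error, 1 / sqrt(sample_size)
sd(sample_means)
```

A histogram of `sample_means` (e.g. `hist(sample_means)`) looks roughly bell-shaped even though the population itself is heavily skewed.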

12
Q

Demo of CLT

A

– CES-D scale: self-report depression (Radloff, 1977)
– 22 items to assess the degree of depression
– 5-point Likert scale: Strongly disagree - Strongly agree
– Item 1: I was bothered by things that usually don’t bother me.
– Item 2: I had a poor appetite.
– Item 3: I did not feel like eating, even though I should have been hungry.

13
Q

Simulate one participant

A

ppt_1

14
Q

Repeat for another participant

A

(ppt_2

15
Q

Calculate means for each participant

A

mean(ppt_1); mean(ppt_2)

– The distribution of participant means will approach normality as the number of participants increases.
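The original simulation code is not preserved on these cards, so the following is only a sketch of how the demo might look. It assumes 22 items answered on a 5-point scale with every response option equally likely; the helper `simulate_ppt()` and the number of participants are hypothetical.

```r
set.seed(123)  # arbitrary seed for reproducibility

n_items <- 22    # items on the questionnaire, as stated in the demo
n_ppts  <- 1000  # number of simulated participants (assumption)

# Hypothetical helper: one participant's responses, with each of the
# five response options equally likely (an assumption of this sketch)
simulate_ppt <- function() sample(1:5, size = n_items, replace = TRUE)

ppt_1 <- simulate_ppt()
ppt_2 <- simulate_ppt()
mean(ppt_1); mean(ppt_2)

# Repeat for many participants and keep each participant's mean
ppt_means <- replicate(n_ppts, mean(simulate_ppt()))
mean(ppt_means)  # close to 3, the midpoint of the scale
```

Each single response is discrete and uniformly distributed, yet `hist(ppt_means)` shows a roughly bell-shaped distribution of participant means.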

16
Q

Independent and identically distributed (iid)

A

– One observation must be unrelated to the next.
– Assessing the spread of COVID infections: sample only one person per household

– Self-report depression
– Item 1: I was bothered by things that usually don’t bother me.
– Item 2: I had a poor appetite.
– Item 3: I did not feel like eating, even though I should have been hungry.
– Different questions related to the same psych phenomenon.
– Violations:
– repeating the same questions
– testing the same people multiple times
– not randomising the presentation order
– Consequence:
Unreliable / biased results

17
Q

Identical distribution

A

– Observations must come from the same distribution
– or family of distributions: e.g. normal, count, ordinal, binary
– Depression example: 22 items about depression, all 5-point Likert scale

18
Q

Violations of identical distribution

A
  • measuring responses on different scales (6-point Likert, continuous scale)
  • studying the effect of Snapchat on self-esteem but including people without Snapchat
  • asking questions about coffee preference to measure depression
19
Q

RStudio: create data that should be normally distributed

A

n

20
Q

rnorm

A

x
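The code on these last two cards is cut off, so the sketch below only shows how `rnorm()` is typically used. The names `n` and `x` follow the cards; the sample size, mean, and SD are assumptions.

```r
set.seed(2024)  # arbitrary seed for reproducibility

n <- 1000                        # number of observations (assumption)
x <- rnorm(n, mean = 0, sd = 1)  # n draws from a normal distribution

# Sample estimates should be close to the values passed to rnorm()
mean(x)
sd(x)
```

`hist(x)` or `qqnorm(x)` gives a quick visual check that the simulated values look normal.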