Fundamental Assumptions of Parametric Models Flashcards

1
Q

What are statistical models?

A

1) Models are approximations (representations) of reality focusing on practically relevant aspects.
2) “All models are wrong, but some are useful” (George Box)
3) Statistical models are also machines (intentionally overly complicated to carry out a simple function)

2
Q

What do models help us do?

A

1) Models help us to understand a complex matter.

2) Ideally we want to be able to assess limitations to their usefulness.

3
Q

What is a normal distribution of data also commonly called?

A

A Bell Curve

4
Q

What do we use to visualise normal distribution of data?

A

1) Histograms: counts (frequencies) of observations of x.

2) Density plots: defined by the density function, which gives the relative likelihood of x taking on a certain value (the smooth, continuous line of the bell curve).

5
Q

What are the features of a normal distribution?

A

1) Symmetric: Left and right halves of data are mirror images of each other
2) Mean = Mode = Median
3) Tails approach zero but never reach it
4) Characterised by its mean and standard deviation
5) The ‘standard’ normal distribution has mean = 0 and SD = 1
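
The link between any mean / SD pair and the ‘standard’ scale can be sketched in Python. This is only an illustration: the scores below are made-up values, and standardising rescales the data without making it normally distributed.

```python
import statistics

# Hypothetical reaction times in seconds (illustrative values, not real data).
scores = [0.42, 0.55, 0.61, 0.48, 0.70, 0.53, 0.58, 0.66]

mean = statistics.fmean(scores)
sd = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)

# z-score: distance of each observation from the mean, in SD units.
z_scores = [(x - mean) / sd for x in scores]

# After standardising, the mean is 0 and the SD is 1 (up to rounding error).
print(round(statistics.fmean(z_scores), 6))
print(round(statistics.stdev(z_scores), 6))
```

Standardising only changes the location and scale of the values; the shape of their distribution stays the same.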

6
Q

What does the Central Limit Theorem (CLT) suggest?

A

The sampling distribution of the mean will be approximately normal for sufficiently large sample sizes, regardless of the type / shape of the distribution of the population we sample from.
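
A small simulation can make this concrete. The sketch below assumes an exponential population (strongly right-skewed) purely for illustration; the seed, sample sizes, number of repetitions and the helper names `skewness` and `sample_means` are arbitrary choices, not part of the original cards.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def skewness(values):
    """Moment-based skewness: close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means(n, reps=5000):
    """Mean of each of `reps` samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means_small = sample_means(2)    # tiny samples: still clearly skewed
means_large = sample_means(100)  # larger samples: close to symmetric

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"skewness of sample means, n = 2:   {skew_small:.2f}")
print(f"skewness of sample means, n = 100: {skew_large:.2f}")
```

The skewness of the sample means should shrink towards 0 as the sample size grows, even though the population itself stays skewed.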

7
Q

What does the Central Limit Theorem (CLT) allow us to do in relation to parametric models?

A

We can use parametric statistical inference even if we are sampling from a population that is ‘weird’ (i.e. not normally distributed), provided the sample size is large enough.

8
Q

What is the mean from our sample?

A

The sample mean is an estimate of the population mean (μ; Greek mu). The mean of the sampling distribution of sample means equals μ.

9
Q

Why are the model assumptions important?

A

1) Machines need input to perform operations on whatever is input.
2) Parametric statistical models make assumptions about the input they receive
3) They will always give some output.
4) Reliability of the output depends on how well the input ‘fits’ these assumptions

10
Q

What is a parametric model?

A

1) A family of probability distributions with a finite number of parameters.
2) Statistical models that make parametric assumptions about their input. Non-parametric tests do not make the same assumptions (e.g. Chi-squared test, Mann-Whitney U test, Spearman’s rank correlation)

11
Q

What do parametric models assume about data distribution?

A

Normal distribution:
-Normal distribution has two parameters: mean (m) and standard deviation (sd)

-Normal distribution is assumed in t-tests, ANOVAs and linear regression

12
Q

Other than normal distribution, what do parametric models assume?

(hint: LINE)

A

1) Linearity (for continuous predictors in regression models)
2) Independence (observations are unrelated to one another)
3) Normality (normally distributed)
4) Equal variance (aka homogeneity of variance)

Mnemonic: LINE

13
Q

What must observations be for the Central Limit Theorem (CLT) to apply?

A

Observations must be independent and identically distributed (IID) for the central limit theorem to apply

14
Q

Normal distribution requires which kind of data?

A

Interval / Continuous data

15
Q

What is the ‘Normality’ assumed by parametric models?

A

Parametric models assume normal distribution of data

-Interval / Continuous data

16
Q

What are the properties of a Normal Distribution plot? (Histogram/ Density plot)

A

1) x is continuous
2) y is defined for every value of x
3) Ranges from −∞ to +∞
4) Area under the curve is 1 (100%)

17
Q

How is the area under the curve distributed (normally)?

A

Area under the curve is 1 (100%)

  • ≈68% within 1 SD of the mean
  • ≈95% within 2 SD of the mean
  • ≈99.7% within 3 SD of the mean
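
These percentages can be checked against the cumulative distribution function of the standard normal. The sketch below uses `NormalDist` from Python's standard library; the helper name `area_within` is a hypothetical choice for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The familiar 68 / 95 / 99.7 figures are rounded versions of these areas.
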
18
Q

Give an example of non-normal responses.

A

Psychometric scales are neither continuous nor linear (see intro Bürkner & Vuorre, 2019)

19
Q

Why are psychometric data (often) not continuous, normally distributed or linear? (4 reasons)

A

Psychometric scale (Robinson, 2018)

  • Response categories
  • Limited discrete options (vs sliders)
  • Ordinal: implicit order
  • Not equidistant: there is no standard unit of distance (such as cm or inches) between e.g. ‘very strongly’ and ‘strongly’ (Liddell & Kruschke, 2018)

However, the use of linear models (LMs) is sometimes justified

20
Q

What are the 5 caveats of normal distribution?

A

1) Strictly speaking, nothing is really normally distributed
2) While supposedly infinite, most variables do have an upper and lower bound (e.g., people can’t be faster than 0 seconds or shorter than 0 inches)
3) All observations are discrete in practice due to limitations of measuring instruments.
4) However, a normal distribution is often suitable for practical considerations.
5) So we typically want data to be approximately normally distributed.

21
Q

What kind of statistical analyses assume normal distribution? (and mention some that don’t?)

A

-Normal distribution is assumed in t-tests, ANOVAs and linear regression.

-Non-parametric models do NOT make the same assumptions: e.g. Chi-squared (χ²) test, Mann-Whitney U test, Spearman’s rank correlation

22
Q

How does the Central Limit Theorem (CLT) help us (when using discrete / non-normally distributed sample data)?

A

1) Even when sampling from discrete data, the distribution of sample means can approach a normal distribution.
2) The CLT suggests that the distribution of sample means approaches normality as the sample size (number of participants) increases. Sample size is the crux.
3) Also works for totals (e.g. IQ), SDs, etc.
4) The iid assumption must hold (independent and identically distributed).
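
As a rough sketch of point 1, the simulation below draws responses to a single 5-point item and checks how the sample means behave. The response probabilities, the sample size of 50, the seed and the helper name `mean_response` are all assumptions made for illustration; treating the categories 1 to 5 as numbers is itself a simplification.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical, skewed response probabilities for a 5-point item.
options = [1, 2, 3, 4, 5]
weights = [0.50, 0.25, 0.15, 0.07, 0.03]


def mean_response(n):
    """Mean of n independent responses drawn from the same distribution (iid)."""
    return statistics.fmean(rng.choices(options, weights=weights, k=n))


# 4000 simulated samples of 50 participants each.
means = [mean_response(50) for _ in range(4000)]

centre = statistics.fmean(means)
spread = statistics.pstdev(means)

# For a normal distribution, about 95% of values lie within 2 SD of the mean.
share_within_2sd = sum(abs(m - centre) <= 2 * spread for m in means) / len(means)

print(f"mean of sample means: {centre:.2f}")
print(f"share within 2 SD:    {share_within_2sd:.3f}")
```

Although single responses can only take five skewed values, the share of sample means within 2 SD should land close to the 95% expected under normality.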

23
Q

What does Independent and identically distributed (iid) mean?

A

1) The most fundamental assumption of the CLT, and therefore of statistical tests: it concerns how the data are sampled / obtained.
2) One observation must be unrelated to the next.
3) Sample is iid IF each observation comes from the same distribution as the others and all observations are mutually independent.

24
Q

Name some Independence violations and consequences (3 and 2)?

A

NB: Tests must include different questions related to the same psychological phenomenon.

Violations:

  • repeating the same questions
  • testing the same people multiple times
  • not randomising the presentation order

Consequences:

  • Unreliable results
  • Biased results
25
Q

What does IDENTICAL DISTRIBUTION require?

A

Observations must come from the same distribution
or family of distributions: e.g. normal, count, ordinal, binary

E.g., a depression questionnaire: 22 items about depression, all on a 5-point Likert scale

26
Q

Name some violations of IDENTICAL DISTRIBUTION?

A

Violations:

  • measuring responses on different scales (e.g. a 6-point Likert scale and a continuous scale)
  • e.g., studying the effect of Snapchat on self-esteem but including people without Snapchat
  • asking questions about coffee preference to measure depression