Week 3 - Parametric test assumptions Flashcards

1
Q

What are the features of a parametric test?

A
  • assess group means
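  • data (or, via the CLT, the sampling distribution) should be approximately normal
  • standard versions assume equal variances; unequal variances need an adjusted test (e.g. Welch's t-test)
  • more statistical power when the assumptions are met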
2
Q

What are the features of a non-parametric test?

A

e.g. rank-based correlation tests (such as Spearman's)

  • assess group MEDIANS
  • data doesn’t need to be normally distributed
  • can handle small sample size
3
Q

Questions to ask yourself when deciding to use a parametric test or not

A
  • sample size
  • best measure of central tendency (e.g. median or mean?)

4
Q

What are the parametric test assumptions? (4)

A
  1. Additivity and linearity
  2. Normality (Gaussian distribution/Bell curve)
  3. Homogeneity of variances
  4. Independence of observations
5
Q

Describe the assumption of Additivity and linearity

A

Involves a standard linear model/ equation (describing a straight line)

6
Q

What is the standard linear model equation?

A

Yi = b0 + b1X1 + Ei

Yi = the ith person’s score on the outcome variable

b0 = Y-intercept: the value of Y when X = 0, i.e. the point at which the regression line crosses the y-axis

b1 = regression coefficient for the first predictor (b2 for the second predictor)

  • Gradient (slope/ rise over run) of the regression line
  • Direction/ strength of relationship

Ei = the difference between the actual and predicted value of Y for the ith person
- residual/ error
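
As a rough illustration of the terms in this equation, the sketch below estimates b0 and b1 by ordinary least squares and then computes each residual (Ei). The data are made up, and the least-squares formulas are an assumption about how the line is fitted; the card itself only defines the terms.

```python
# Hypothetical data, made up for illustration: x is the predictor, y the outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates of the slope (b1) and intercept (b0).
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Predicted value and residual (Ei = actual - predicted) for each case.
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pred for yi, pred in zip(y, predicted)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

With a least-squares fit that includes an intercept, the residuals sum to (approximately) zero.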

7
Q

What does it mean for data to be linear and additive?

A
  • X1 and X2 predict Y.
  • The outcome is a linear function of the predictors (X1 + X2)
  • the effects of the predictors are added together; the effect of one predictor does not depend on the values of the others, as it would in a multiplicative model

The outcome Y is an additive combination of the effects of X1 and X2. e.g. if both coefficients are positive, then as X1 and X2 increase, Y increases also

8
Q

True or false:

The outcome Y is an additive combination of the effects of X1 and X2. e.g. with positive coefficients, as both X1 and X2 increase, Y increases also

A

true

9
Q

How can we assess linearity?

A
  • plot observed vs predicted values (points symmetrically distributed around the diagonal line)
  • plot residuals vs predicted values (points symmetrically distributed around a horizontal line at zero)
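
A minimal sketch of the residuals-vs-predicted check, printing numbers rather than plotting. It fits a straight line to deliberately curved, made-up data (y = x squared); the helper `fit_line` is hypothetical and written only for this example.

```python
def fit_line(x, y):
    """Return least-squares (b0, b1) for a straight line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical curved data: y is the square of x, so a straight line is the wrong model.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pred for yi, pred in zip(y, predicted)]

# Residuals are positive at both ends and negative in the middle:
# a systematic curve rather than a symmetric scatter around zero.
for pred, res in zip(predicted, residuals):
    print(f"predicted = {pred:6.1f}   residual = {res:5.1f}")
```

A curved pattern like this in the residuals is the kind of signal that suggests the linearity assumption is not met.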
10
Q

How can non-linearity be fixed?

A
  • apply a nonlinear transformation to the variables
  • add another regressor that is a nonlinear function of a predictor (e.g. a polynomial term)
  • examine moderators
11
Q

Describe the assumption of Normality

A

relevant to:

  • parameter estimates (their sampling distribution)
  • residuals/ error terms
  • -> needed for confidence intervals around parameters
  • -> needed for null hypothesis significance testing
12
Q

What is Central Limit Theorem (CLT)?

A

As the sample size increases toward infinity (gets larger), the sampling distribution of the mean approaches normal.

–> sample means will be approximately normally distributed, so you don’t need to worry too much about the distribution that the samples came from.

–> the sampling distribution is the distribution of means from many samples and re-samples

–> rule of thumb: sample size should be AT LEAST 30
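
The idea can be sketched with a small simulation. Everything here is an assumption for illustration: the exponential population, the seed, the number of samples, and the hypothetical `skewness` helper. It draws raw scores from a strongly skewed population and compares their skewness with that of the means of many samples of size 30.

```python
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

random.seed(0)  # fixed seed so the sketch is repeatable

# Raw scores from a strongly positively skewed (exponential) population.
raw_scores = [random.expovariate(1.0) for _ in range(5000)]

# Means of 2000 separate samples, each of size 30.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(30)])
    for _ in range(2000)
]

raw_skew = skewness(raw_scores)
means_skew = skewness(sample_means)
print(f"skewness of raw scores:   {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The sample means should come out much less skewed than the raw scores, although some skew remains at n = 30.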

13
Q

For CLT to apply, what size must the sample size be?

A

At least 30 (a common rule of thumb)

14
Q

True or false
According to CLT -
Even if the data is not normal, the sampling distribution of the mean will be approximately normal when the sample is large enough

A

True

15
Q

True or false

Positively skewed data gathers on the left side and scores bunch at the low values with tails pointing to high values

A

true

16
Q

True or false

Negatively skewed data gathers on the left side and scores bunch at the low values

A

false - it gathers on the right (e.g. as you grow conditions get “worse” in life)

they bunch at the high values with tails pointing to low values

17
Q

What is kurtosis?

A

The degree to which data cluster in either the tails (ends) or the peak (tallest part) of the distribution

  • heaviness of tails
18
Q

Draw the following:
Negative Kurtosis
Positive Kurtosis
Normal distribution

Leptokurtic (heavy tails)
Mesokurtic
Platykurtic (light tails)

A

draw on paper

19
Q

What are properties of frequency distributions?

A
  • Skewness
  • Kurtosis
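
Both properties can be computed from standardised scores. The sketch below uses moment-based formulas (mean cubed z-score for skewness, mean fourth-power z-score minus 3 for excess kurtosis); these are one common definition, assumed here, and statistical software may apply small-sample corrections. The function names and data are hypothetical.

```python
import statistics

def skewness(values):
    """Mean cubed z-score: > 0 is positive skew, < 0 is negative skew."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Mean fourth-power z-score minus 3: > 0 heavy tails, < 0 light tails."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 4 for v in values) / len(values) - 3

# Hypothetical score sets, made up for illustration.
symmetric = [1, 2, 3, 4, 5]
positive_skew = [1, 1, 2, 2, 3, 10]    # bunched low, tail towards high values
negative_skew = [1, 8, 9, 9, 10, 10]   # bunched high, tail towards low values

for name, data in [("symmetric", symmetric),
                   ("positive skew", positive_skew),
                   ("negative skew", negative_skew)]:
    print(f"{name:14s} skew = {skewness(data):5.2f}   "
          f"excess kurtosis = {excess_kurtosis(data):5.2f}")
```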

20
Q

Checking the distribution to determine if the assumption of normality is met is important. Which graphical displays are used to test for normality?

A

Q-Q plots (dots on straight line = normal)

Histograms
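
A Q-Q plot pairs each sorted score with the quantile expected under a normal distribution. The sketch below builds those pairs without plotting and summarises "dots on a straight line" as a correlation. The plotting positions (i + 0.5)/n, the simulated data, and the helper names are assumptions for illustration.

```python
import random
from statistics import NormalDist, fmean

def qq_points(data):
    """Pair each sorted score with the matching standard-normal quantile."""
    n = len(data)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sorted(data)

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (ss_a * ss_b) ** 0.5

random.seed(1)  # fixed seed so the sketch is repeatable
normal_scores = [random.gauss(50, 10) for _ in range(500)]
skewed_scores = [random.expovariate(1.0) for _ in range(500)]

# The closer r is to 1, the closer the Q-Q dots lie to a straight line.
r_normal = pearson_r(*qq_points(normal_scores))
r_skewed = pearson_r(*qq_points(skewed_scores))
print(f"normal data: r = {r_normal:.3f}")
print(f"skewed data: r = {r_skewed:.3f}")
```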

21
Q

What is the name of the statistical test for normality available in software (e.g. JASP)?

A

Shapiro-Wilk test

22
Q

Describe the Shapiro-Wilk test and what a p value of < 0.05 means

A
  • tests whether the data differ from a normal distribution
  • p < 0.05 = data vary significantly from a normal distribution, thus normality is violated

23
Q

In the Shapiro-Wilk test, what does a p value > 0.05 mean?

A

Data does not vary significantly from a normal distribution, thus the normality assumption is not violated

24
Q

Describe the assumption of homogeneity of variance

A

Assumes all groups or data points have the same or equal variances = the assumption of equal variances

25
Q

What does homoscedasticity mean?

A

All groups have equal/ similar variances

26
Q

What does heteroscedasticity mean?

A

The data points/ groups do NOT all have equal variances = unequal variances
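
As a rough descriptive check (not a formal test), the variance of each group can be computed and compared. The groups and the largest-to-smallest variance ratio below are assumptions added for illustration.

```python
from statistics import variance

# Hypothetical scores for two groups, made up for illustration.
groups = {
    "group_a": [4, 5, 6, 5, 4, 6],   # scores tightly clustered
    "group_b": [1, 9, 5, 2, 8, 5],   # scores widely spread
}

# Sample variance of each group.
variances = {name: variance(scores) for name, scores in groups.items()}

# Ratio of the largest to the smallest variance: values near 1 suggest
# similar spread; large values suggest unequal variances.
ratio = max(variances.values()) / min(variances.values())

for name, var in variances.items():
    print(f"{name}: variance = {var:.2f}")
print(f"largest / smallest variance = {ratio:.1f}")
```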

27
Q

Define the “error”

A

The deviation from the regression line (the residual)

The difference between the Y value we predicted from its X value and the Y value we actually observed in the data

28
Q

Describe the assumption of independence of observation

A

Assumes that you do not have repeated measures of data.

  • residuals (errors) are unrelated
  • assume based on study design
29
Q

According to the assumption of independence of observations, what happens when observations are non-independent?

A

results in downwardly biased standard errors (too small) and thus incorrect statistical inferences (p values < 0.05 when they should be > 0.05)
–> falsely significant p values

–> this is why it is important to know the study design

–> important for each value of the outcome to come from a different person or other unit (e.g. family, school)

30
Q

What is a univariate outlier?

A

outlier when considering only the distribution of the variable it belongs to

31
Q

What is a bivariate outlier?

A

outlier when considering the joint distribution of two variables
- breaking away from the pattern of the association between two variables

32
Q

What is a multivariate outlier?

A

outliers when simultaneously considering multiple variables.

33
Q

What type of outlier is difficult to assess using numbers or graphs?

A

multivariate outliers

34
Q

What types of outliers bias the mean and inflate the standard deviation?

A

Univariate outliers

35
Q

What types of outliers bias the RELATIONSHIP between two variables, e.g. change its strength?

A

bivariate outliers

36
Q

What are the three ways to deal with outliers?

A

REMOVE the case or trim the data

TRANSFORM the data

CHANGE the score (winsorizing): pull extreme scores in towards the rest of the data, e.g. with biological data (you must be transparent about this when reporting results)
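
Trimming and winsorizing can be sketched as follows. The helper names, the choice of k = 1 score per tail, and the data are assumptions for illustration, not a recommended cut-off.

```python
from statistics import fmean

def trim(values, k):
    """Remove the k lowest and k highest scores."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def winsorize(values, k):
    """Replace the k lowest/highest scores with the nearest remaining score."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical scores with one extreme value (40).
scores = [2, 3, 3, 4, 4, 5, 5, 6, 40]

trimmed = trim(scores, 1)
winsorized = winsorize(scores, 1)

print("original mean:  ", round(fmean(scores), 2))
print("trimmed:        ", trimmed, "mean =", round(fmean(trimmed), 2))
print("winsorized:     ", winsorized, "mean =", round(fmean(winsorized), 2))
```

Trimming discards cases, whereas winsorizing keeps the sample size the same and only changes the extreme scores.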

37
Q

What are some reasons for transforming data?

A
  1. ease of interpretation - standardisation e.g. z-scores allow for simpler comparisons
  2. reducing skewness - closer to normality
  3. equalising spread/ improving homogeneity of variances
  4. linearising relationships between variables - to fit non-linear relationships into linear models
  5. making relationships additive, therefore fulfilling assumptions for certain tests
38
Q

Do linear transformations change the shape of the distribution ?

What do they change?

A

No

Changes the value of the mean/ SD but shape remains unchanged

39
Q

How do linear transformations work?

A
  • adding constant to each number, x + 1
  • converting raw scores to z-scores (x-m)/SD
  • mean centring, x- m
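
A small sketch of these three linear transformations, checking that the shape of the distribution (summarised here by skewness) is unchanged while the mean and SD change. The data and the `skewness` helper are hypothetical.

```python
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Hypothetical positively skewed raw scores.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
m = statistics.fmean(raw)
sd = statistics.stdev(raw)

plus_constant = [x + 1 for x in raw]     # adding a constant
centred = [x - m for x in raw]           # mean centring
z_scores = [(x - m) / sd for x in raw]   # converting to z-scores

for name, data in [("raw", raw), ("x + 1", plus_constant),
                   ("x - m", centred), ("z", z_scores)]:
    print(f"{name:6s} mean = {statistics.fmean(data):6.2f}  "
          f"SD = {statistics.stdev(data):6.2f}  skew = {skewness(data):.2f}")
```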
40
Q

What type of transformation changes the shape of the distribution?

A

non-linear transformations

  • Log, log(X) or ln(x)
  • Square root of x
  • Reciprocal, 1/x
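
To illustrate the change in shape, the sketch below applies log and square-root transformations to made-up, positively skewed scores and compares the skewness before and after. The data and the `skewness` helper are assumptions for illustration.

```python
import math
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Hypothetical positively skewed scores (all greater than zero, as log requires).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]

logged = [math.log(x) for x in raw]    # natural log, ln(x)
rooted = [math.sqrt(x) for x in raw]   # square root of x

skew_raw = skewness(raw)
skew_sqrt = skewness(rooted)
skew_log = skewness(logged)

print(f"raw skew:  {skew_raw:.2f}")
print(f"sqrt skew: {skew_sqrt:.2f}")
print(f"log skew:  {skew_log:.2f}")
```

For these scores the log transformation pulls in the long right tail more strongly than the square root does.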
41
Q

Can you use a log transformation [log(x)] on data with only positive values when you want to reduce positive skew and stabilise variance?

A

yes

42
Q

When would you use a square root transformation?

A
  • reduce positive skew
  • stabilise variance
  • defined for zero/ positive values
43
Q

When would you use a reciprocal transformation?( 1/x)

A
  • reduce impact of large scores
  • stabilise variance
  • it reverses the order of the scores (large scores become small); this can be avoided by reversing the scores before transforming: 1/(Xhighest - Xi)
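
The order reversal, and the effect of reversing the scores first, can be seen in a short sketch. The data are made up, and the added constant of 1 in the denominator is an assumption introduced here so that the highest score does not cause a division by zero.

```python
# Hypothetical scores in ascending order.
scores = [1, 2, 4, 10]

# Plain reciprocal: the largest score becomes the smallest value.
reciprocal = [1 / x for x in scores]

# Reverse the scores first, then take the reciprocal.
# The + 1 is an assumption added here to avoid dividing by zero
# for the highest score.
highest = max(scores)
reversed_reciprocal = [1 / (highest + 1 - x) for x in scores]

print("scores:             ", scores)
print("1/x:                ", [round(v, 3) for v in reciprocal])
print("1/(highest + 1 - x):", [round(v, 3) for v in reversed_reciprocal])
```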
44
Q

What are the negatives of transforming data?

A
  • non-linear transformations (used to normalise the distribution, e.g. log, square root, reciprocal) CHANGE the data & results –> a 1 unit increase on the natural log scale does not mean the same thing as a 1 unit increase in the raw scores
  • Transformation can hinder if the wrong transformation is applied
  • Makes interpretation difficult (dealing with both raw scores and transformed scores)