Parametric test assumptions Flashcards
Define
Parametric test
tests that make assumptions about the parameters of the population distribution from which the sample is drawn
Define
Outlier
a data point that differs significantly from other observations
Define
Linear transformation
a function from one vector space to another that respects the underlying (linear) structure of each vector space
Define
Non-parametric test
tests that don’t assume that your data follow a specific distribution
Define
Central limit theorem
states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed
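The theorem can be illustrated with a quick simulation using only the Python standard library (the uniform population, the sample size of 30, and the seed are arbitrary choices for this sketch):

```python
import random
import statistics

random.seed(42)

# Population: uniform on [0, 1] (clearly non-normal), mean 0.5.
# Draw many samples of size n with replacement and collect the sample means.
n = 30
num_samples = 2000
sample_means = [
    statistics.mean(random.uniform(0, 1) for _ in range(n))
    for _ in range(num_samples)
]

# The distribution of sample means centres on the population mean and has
# standard deviation close to sigma / sqrt(n) = (1/sqrt(12)) / sqrt(30) ≈ 0.053.
print(round(statistics.mean(sample_means), 2))   # close to 0.5
print(round(statistics.stdev(sample_means), 3))  # close to 0.053
```

Plotting a histogram of `sample_means` would show an approximately normal bell shape, even though the underlying population is flat.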
Define
Normality
the assumption that the sampling distribution of the mean (i.e., the distribution of means across samples) is normal
Define
Homogeneity of variance
the assumption that all groups have the same or similar variance
Define
Independence
means that your data isn’t connected in any way (at least, in ways that you haven’t accounted for in your model)
Define
Residual
The difference between the observed value of the dependent variable (y) and the predicted value (ŷ)
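A minimal sketch of computing residuals, assuming a hypothetical model ŷ = 2 + 3x and made-up observations:

```python
# Hypothetical fitted model: y-hat = 2 + 3x (coefficients are illustrative).
def predict(x):
    return 2 + 3 * x

observations = [(1, 5.5), (2, 7.6), (3, 11.2)]  # (x, observed y) pairs

# Residual = observed y minus predicted y-hat.
residuals = [round(y - predict(x), 2) for x, y in observations]
print(residuals)  # [0.5, -0.4, 0.2]
```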
Define
Kurtosis
a measure of the combined weight of a distribution’s tails relative to the center of the distribution
Define
Leptokurtic
having greater kurtosis than the normal distribution; more concentrated about the mean
Define
Mesokurtic
having the same kurtosis as the normal distribution
Define
Platykurtic
having less kurtosis than the normal distribution; the excess kurtosis value is negative
Define
Shapiro-Wilk Test
a test that examines if a variable is normally distributed in a population
Define
Q-Q Plot
a scatterplot created by plotting two sets of quantiles against one another
Define
Univariate outlier
outlier when considering only the distribution of the variable it belongs to
Define
Bivariate outlier
outlier when considering the joint distribution of two variables
Define
Multivariate outlier
outliers when simultaneously considering multiple variables
Define
Log transformation
A type of transformation that can be used to reduce positive skew and stabilise variance. It is only defined for values greater than zero
Define
Square root transformation
A type of transformation that can be used to reduce positive skew and stabilise variance. It is defined for zero and positive values
Define
Reciprocal transformation
A type of transformation that can reduce the impact of large scores and stabilise variance. The transformation reverses the order of the scores, but this can be avoided by reversing the scores before transforming
What is the difference between parametric and non-parametric tests?
Parametric:
- Assess group means
- Require that your data follow the normal distribution (except with large sample sizes, where the central limit theorem applies)
- Can deal with unequal variances across groups
- More powerful
Non-parametric:
- Assess group medians
- Don’t require that your data follow the normal distribution
- Can deal with small sample sizes
When deciding between a parametric and non-parametric test what questions should you ask yourself?
What is the best central tendency measure for your data?
What is your sample size?
Parametric tests are based on the normal distribution and have what assumptions?
- Additivity and linearity
- Normality
- Homogeneity of variance
- Independence
What is the standard linear model equation and what do the variables represent?
Yi = b0 + b1x1 + b2x2 + ei
- Yi = outcome variable
- b0 = y-intercept
- x1 & x2 = predictor variables
- b1 & b2 = slopes of the predictors
- ei = error
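The additivity of the model can be sketched in code (the coefficient values below are arbitrary): increasing x1 by one unit always changes the prediction by b1, whatever the value of x2.

```python
# Hypothetical coefficients for Yi = b0 + b1x1 + b2x2 + ei
b0, b1, b2 = 1.0, 2.0, 0.5

def y_hat(x1, x2):
    """Predicted value of the outcome (error term omitted)."""
    return b0 + b1 * x1 + b2 * x2

# Additivity: raising x1 by one unit changes the prediction by exactly b1,
# regardless of the value of x2.
effect_at_x2_0 = y_hat(3, 0) - y_hat(2, 0)
effect_at_x2_10 = y_hat(3, 10) - y_hat(2, 10)
print(effect_at_x2_0, effect_at_x2_10)  # both equal b1 = 2.0
```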
True or False:
In the standard linear model, the slope (effect) of one predictor does not depend on the values of the other variables.
True
What does linear and additive mean about variables x1, x2 and y?
The relationship between each predictor (x1, x2) and y is linear (a straight line), and the outcome y is an additive combination of the effects of x1 and x2; y increases as both x1 and x2 increase.
How can we assess linearity?
- Plot of observed vs predicted values (symmetrically distributed around diagonal line)
- Plot of residuals vs predicted values (symmetrically distributed around horizontal line)
- Look out for a bow shape, which indicates that the assumption has been violated
How do you fix violations of additivity and linearity?
- Apply nonlinear transformation to variables
- Add another regressor that is a nonlinear function – polynomial curve
- Examine moderators
What sample size is large enough for the central limit theorem to apply?
>30 participants
Is this positively or negatively skewed?

Negative
What are the three types of kurtosis (in order of increasing central value height)?
Platykurtic (negative)
Mesokurtic (normal distribution)
Leptokurtic (positive)
When assessing normality what graphical displays do we use to check data or residuals?
Q-Q plot
Histogram
What are the two main tests of normality?
Shapiro-Wilk test
Q-Q plot
What does a Shapiro-Wilk test do?
- Tests whether data differ from a normal distribution
- Statistically significant (p < .05) → data differ significantly from a normal distribution (i.e., the normality assumption is violated)
- Not statistically significant (p ≥ .05) → data do not differ significantly from a normal distribution (i.e., the normality assumption is not violated)
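The decision rule can be sketched as a small helper (the function name and return strings are illustrative; in practice the p-value would come from a routine such as `scipy.stats.shapiro`):

```python
ALPHA = 0.05  # conventional significance level

def interpret_shapiro(p_value, alpha=ALPHA):
    """Apply the Shapiro-Wilk decision rule to a p-value (illustrative helper)."""
    if p_value < alpha:
        return "normality assumption violated"
    return "no significant departure from normality"

print(interpret_shapiro(0.01))  # normality assumption violated
print(interpret_shapiro(0.40))  # no significant departure from normality
```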
What does a Q-Q plot do?
- Plots the quantiles of your data against the quantiles of a theoretical (e.g., normal) distribution
- If the points fall close to the diagonal line, the data are approximately normally distributed; systematic departures from the line indicate non-normality
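The quantile pairs behind a normal Q-Q plot can be computed with the standard library (the data values below are made up, and the (i + 0.5)/n plotting positions are one common convention):

```python
import statistics

data = [2.1, 1.9, 2.4, 2.0, 2.6, 1.8, 2.2, 2.3]  # made-up sample
n = len(data)
sample_q = sorted(data)  # sample quantiles are just the ordered data

# Theoretical standard-normal quantiles at plotting positions (i + 0.5) / n.
norm = statistics.NormalDist()
theoretical_q = [norm.inv_cdf((i + 0.5) / n) for i in range(n)]

# Each (theoretical, sample) pair is one point on the Q-Q plot;
# points lying near a straight line suggest approximate normality.
for t, s in zip(theoretical_q, sample_q):
    print(round(t, 2), s)
```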
What are the three types of outliers?
Univariate
Bivariate
Multivariate
What type of outlier is this?

Univariate for both variables
What type of outlier is this?

Bivariate
How do you deal with outliers?
- Remove the case or trim the data
- Transform the data
- Change the score (known as winsorizing):
- Change the score to the next highest value plus some small number (e.g., 1, or whatever is appropriate to the scale of the data)
- Convert the score to that expected for a z-score of ±3.29
- Convert the score to the mean plus 2 or 3 standard deviations
- Convert the score to a percentile of the distribution (e.g., 0.5th or 99.5th percentile)
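One of the winsorizing options above, replacing the outlier with the next highest value plus a small number, can be sketched as follows (the scores are made up, and the outlier is assumed to be the single maximum):

```python
scores = [10, 12, 11, 13, 12, 11, 10, 12, 90]  # 90 is a clear outlier

# Winsorize: replace the outlier with the next highest score plus 1.
next_highest = sorted(scores)[-2]  # 13
winsorized = [next_highest + 1 if x == max(scores) else x for x in scores]
print(winsorized)  # [10, 12, 11, 13, 12, 11, 10, 12, 14]
```

The outlier keeps its rank as the highest score, but no longer distorts the mean and variance.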
Why is it a good idea to transform data?
- For convenience or ease of interpretation – standardisation, e.g. z scores allow for simpler comparisons
- Reducing skewness – help get closer to meeting normality assumption
- Equalising spread or improving homogeneity of variance – produce approximately equal spreads
- Linearising relationships between variables – to fit non-linear relationships into linear models
- Making relationships additive and therefore fulfilling assumptions for certain tests
What is the difference between a linear and non-linear transformation?
Linear transformations do not change the shape of the distribution. They may change the mean and/or standard deviation, but the shape of the distribution remains unchanged.
Non-linear transformations change the shape of the distribution
What are some examples of linear transformations?
Adding a constant to each number, x + 1
Converting raw scores to z-scores, (x – m)/SD
Mean centring, x – m
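A sketch showing that a linear transformation (here, z-scoring) changes the mean and standard deviation but leaves the shape, measured by skewness, unchanged (the data and the skewness formula are illustrative):

```python
import statistics

x = [1, 2, 2, 3, 3, 3, 10]  # positively skewed data (made up)

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    n = len(values)
    return sum(((v - m) / s) ** 3 for v in values) * n / ((n - 1) * (n - 2))

m, sd = statistics.mean(x), statistics.stdev(x)
z = [(v - m) / sd for v in x]  # linear transformation: z-scores

# Mean and SD change (to 0 and 1), but skewness -- the shape -- does not.
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
print(round(skewness(x), 3), round(skewness(z), 3))
```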
What are some examples of non-linear transformations?
Log, log(x) or ln(x)
Square root, √x
Reciprocal, 1/x
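A sketch showing how these non-linear transformations reduce positive skew (the data are made up; skewness is computed with the adjusted Fisher-Pearson sample formula):

```python
import math
import statistics

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    n = len(values)
    return sum(((v - m) / s) ** 3 for v in values) * n / ((n - 1) * (n - 2))

x = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]  # positively skewed, all values > 0

log_x = [math.log(v) for v in x]    # log transformation
sqrt_x = [math.sqrt(v) for v in x]  # square root transformation
recip_x = [1 / v for v in x]        # reciprocal (note: reverses the order of scores)

# Skewness shrinks as the transformation gets stronger: sqrt < raw, log < sqrt.
print(round(skewness(x), 2), round(skewness(sqrt_x), 2), round(skewness(log_x), 2))
```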