Week 3 & 4 Flashcards
Parametric
assesses group means
requires normally distributed data
can deal with unequal variances across groups (e.g., Welch's t-test)
generally more powerful
non-parametric
assesses group medians
doesn’t require normally distributed data
can handle small sample sizes
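A minimal sketch of this contrast using SciPy: Welch's t-test as the parametric option and Mann-Whitney U as the non-parametric one. The data and group sizes below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.8, scale=2.0, size=30)

# Parametric: Welch's t-test compares means and tolerates unequal variances.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric: Mann-Whitney U compares distributions (often framed as
# medians) without assuming normality.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

print(f"Welch t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Mann-Whitney U = {u_stat:.2f}, p = {u_p:.3f}")
```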
parametric test assumptions
additivity and linearity
normality
homogeneity of variance
independence of observations
additivity and linearity
outcome is a linear function of the predictors X1 and X2, and the predictors are added together
outcome y is an additive combination of the effects of X1 and X2
Assessing linearity
observed vs predicted values (symmetrically distributed around diagonal line)
residuals vs predicted values (symmetrically distributed around horizontal line)
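A sketch of both diagnostic plots, assuming a simple one-predictor model fit with NumPy on simulated data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

slope, intercept = np.polyfit(x, y, 1)   # simple linear regression
predicted = intercept + slope * x
residuals = y - predicted

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(predicted, y)
ax1.axline((y.min(), y.min()), slope=1)  # diagonal reference line
ax1.set(xlabel="predicted", ylabel="observed")
ax2.scatter(predicted, residuals)
ax2.axhline(0)                           # horizontal reference line
ax2.set(xlabel="predicted", ylabel="residual")
plt.show()
```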
fixing non-linearity
apply a non-linear transformation to the variables
add another regressor that is a nonlinear function (polynomial curve)
examine moderators
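A sketch of the polynomial-regressor fix: adding an x squared column to the design matrix and fitting by least squares with NumPy (data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 200)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 1, 200)

# Design matrix with an intercept, a linear term, and a quadratic term.
X = np.column_stack([np.ones_like(x), x, x**2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, linear, quadratic:", coefs)
```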
central limit theorem
as the sample size increases towards infinity, the sampling distribution of the mean (NOT the data) approaches a normal distribution
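A quick simulation of this: the raw scores come from a skewed (exponential) distribution, yet the distribution of sample means becomes less skewed, i.e. closer to normal, as n grows. Values chosen here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (2, 10, 100):
    # 10,000 samples of size n; take the mean of each sample.
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    # Skewness of the sampling distribution shrinks toward 0 as n grows.
    centred = means - means.mean()
    skew = (centred**3).mean() / means.std() ** 3
    print(f"n = {n:4d}: skew of sample means = {skew:.2f}")
```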
skewness
how symmetrical the data is
positive: scores bunched at low values, tail pointing to high values
negative: scores bunched at high values, tail pointing to low values
kurtosis
how much the data cluster at the tails/ends versus the peak of the distribution
leptokurtic
heavy tails
platykurtic
light tails
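A sketch computing both statistics with SciPy, on samples simulated to match the conventions above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
pos_skewed = rng.exponential(size=5_000)         # tail points to high values
heavy_tailed = rng.standard_t(df=3, size=5_000)  # leptokurtic

print("skewness:", stats.skew(pos_skewed))               # positive
print("excess kurtosis:", stats.kurtosis(heavy_tailed))  # > 0 for heavy tails
```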
normality checks
Q-Q plot compares sample quantiles to the quantiles of a normal distribution; normal = points form a straight line
Shapiro-Wilk test: tests if the data differ from a normal distribution; normal = p > .05, the data do not vary significantly from a normal distribution
histogram
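A sketch of all three checks with SciPy and matplotlib (simulated data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
sample = rng.normal(size=200)

# Shapiro-Wilk: p > .05 suggests no significant departure from normality.
w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

fig, (ax1, ax2) = plt.subplots(1, 2)
stats.probplot(sample, dist="norm", plot=ax1)  # Q-Q plot: look for a line
ax2.hist(sample, bins=20)                      # histogram: look for a bell
plt.show()
```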
homogeneity of variance
all groups or data points have the same or similar variance
equal spread above and below the horizontal line on the residuals vs predicted plot = homoscedasticity
heteroscedasticity appears as a cone/funnel shape
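Beyond the visual check, one common numeric test is Breusch-Pagan; the statsmodels version below is an addition to these notes, not something they name, and the data are simulated to produce a cone shape:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3 * x)  # error SD grows with x: a "cone"

X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(model.resid, X)
print(f"Breusch-Pagan LM p = {lm_p:.4f}")  # small p suggests heteroscedasticity
```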
Independence
residuals unrelated
if non-independent: downwardly biased SEs (too small) and incorrect statistical inference (p values < .05 when they should be > .05)
Univariate outlier
outlier when considering only the distribution of the variable it belongs to
bias mean and inflate SD
Bivariate outlier
outlier when considering the joint distribution of two variables
Multivariate outlier
outliers when simultaneously considering multiple variables, difficult to assess using numbers or graphs
biases the relationship between two variables, e.g., changes its strength
changing the data = winsorizing; replace the outlier with:
the next highest value plus some small number
the raw score equivalent to a z-score of +/- 3.29
the mean plus 2 or 3 SDs
a given percentile of the distribution (e.g., the 5th or 95th)
winsorizing
a predefined proportion of the smallest and/or largest values is replaced by less extreme values
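A sketch of winsorizing with SciPy, clamping the lowest and highest 5% of scores (the 5% limit is an arbitrary choice for illustration):

```python
import numpy as np
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(7)
data = np.append(rng.normal(size=98), [15.0, -12.0])  # two extreme scores

clean = winsorize(data, limits=[0.05, 0.05])  # replace lowest/highest 5%
print("before:", data.min(), data.max())
print("after:", clean.min(), clean.max())
```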
linear transformations
adding a constant to each value
converting to z-scores: (x - M) / SD
mean centering: x - M
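A sketch of the three linear transformations in NumPy; note they shift or rescale scores without changing the shape of the distribution:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

shifted = x + 10                    # adding a constant to each value
z = (x - x.mean()) / x.std(ddof=1)  # z-scores: (x - M) / SD
centred = x - x.mean()              # mean centering: x - M
print(shifted, z, centred, sep="\n")
```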
non linear transformations
logarithm, log(x) or ln(x)
square root of x
reciprocal, 1/x
log(x)
reduce positive skew and stabilise variance
requires positive values (x > 0)
square root of x
reduce positive skew and stabilise variance
requires zero or positive values (x >= 0)
1/x
reduce impact of large scores and stabilise variance
the reciprocal reverses the order of scores; this can be avoided by reversing before transforming: 1/(x_highest - x)
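A sketch of the non-linear transformations in NumPy; x is kept positive so log(x) and 1/x are defined, and a constant of 1 is added in the reversal formula to avoid dividing by zero at the highest score (a practical tweak, not part of the notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 64.0])  # positively skewed scores

log_x = np.log(x)                      # needs x > 0
sqrt_x = np.sqrt(x)                    # needs x >= 0
recip_x = 1.0 / x                      # reverses the ordering of scores
no_reverse = 1.0 / (x.max() - x + 1)   # reverse first so order is preserved
print(log_x, sqrt_x, recip_x, no_reverse, sep="\n")
```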
variance
average squared distance from mean
linked to sum of squares
covariance
how much two variables vary together (the average cross-product of their deviations from their means)
linked to sum of cross products
correlation coefficient
standardised version of covariance
divide the covariance by the product of the two variables' SDs
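A sketch confirming the relationship between the three quantities: Pearson's r is the covariance divided by the product of the two SDs (simulated data):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(size=100)

cov_xy = np.cov(x, y, ddof=1)[0, 1]           # sum of cross products / (n - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))  # standardise by both SDs
print(np.isclose(r, np.corrcoef(x, y)[0, 1])) # True
```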
pearsons correlation assumptions
interval/ratio variables
normality
linearity
coefficient of determination
r^2: the proportion of variance the two variables share
r^2 in spearmans correlation
proportion of variance in the ranks that the two variables share
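A sketch showing why this holds: Spearman's rho equals Pearson's r computed on the ranks, so rho squared is shared variance in the ranks (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(size=50)
y = x**3 + rng.normal(scale=0.5, size=50)  # monotonic but non-linear

rho, _ = stats.spearmanr(x, y)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
print(np.isclose(rho, r_on_ranks), rho**2)  # True, shared variance in ranks
```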
partial correlations
measure the association between two variables, controlling for the effects that a third variable has on them both
semi-partial correlations
part correlation
measures the relationship between two variables, controlling for the effect that a third variable has on only one of them
zero order correlations
measuring correlation between two variables when not controlling for anything
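A sketch of all three using the residual approach (regress out the control variable, then correlate what remains); this is one standard way to compute them, shown here on simulated data:

```python
import numpy as np

rng = np.random.default_rng(10)
z = rng.normal(size=200)
x = 0.5 * z + rng.normal(size=200)
y = 0.5 * z + 0.4 * x + rng.normal(size=200)

def residualise(v, z):
    """Residuals of v after a simple regression on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (intercept + slope * z)

x_res, y_res = residualise(x, z), residualise(y, z)
zero_order = np.corrcoef(x, y)[0, 1]        # controls for nothing
partial = np.corrcoef(x_res, y_res)[0, 1]   # z removed from both variables
semi_partial = np.corrcoef(x, y_res)[0, 1]  # z removed from y only
print(zero_order, partial, semi_partial)
```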
excluding cases pairwise
for each correlation, exclude participants who do not have a score for both variables
excluding cases listwise
across all correlations, exclude participants who do not have a score for every variable
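A sketch of the difference using pandas, whose DataFrame.corr() is pairwise-complete by default; dropping incomplete rows first gives listwise exclusion:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [2.0, 1.0, np.nan, 5.0, 4.0],
    "c": [1.0, 3.0, 2.0, 4.0, 5.0],
})

pairwise = df.corr()           # each r uses all cases complete for that pair
listwise = df.dropna().corr()  # every r uses only fully complete cases
print(pairwise, listwise, sep="\n\n")
```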
linear regression looks at
direction (unstandardised B)
magnitude (standardised beta)
significance (p<.05)
Standardised coefficients (beta)
show the expected SD change in the DV for a 1 SD change in the IV
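A sketch contrasting B and beta in a simple regression: beta is just the slope after z-scoring both variables, and in the one-predictor case it equals Pearson's r (simulated data):

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(loc=50, scale=10, size=100)
y = 3.0 * x + rng.normal(scale=20, size=100)

B, intercept = np.polyfit(x, y, 1)        # unstandardised slope (B)
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta, _ = np.polyfit(zx, zy, 1)           # SD change in y per SD change in x
print(f"B = {B:.2f}, beta = {beta:.2f}")  # beta equals Pearson's r here
```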