1.8 Preparing for analysis Flashcards
Before conducting a statistical analysis you need to check your data for eight things:
- Accuracy of data entry,
- Missing data,
- Outliers,
- Normality,
- Linearity, homoscedasticity, and homogeneity of variance,
- Independence,
- Multicollinearity and singularity (MANOVA and multiple regression),
- Other assumptions.
Missing data may be addressed through a range of approaches such as
list-wise deletion, mean substitution, expectation-maximization, and multiple imputation
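The two simplest approaches named above, list-wise deletion and mean substitution, can be sketched in a few lines. This is a minimal illustration using only the Python standard library. The data, the column names, and the helper functions `listwise_delete` and `mean_substitute` are hypothetical and chosen for the example. Expectation-maximization and multiple imputation need dedicated statistical software and are not shown.

```python
from statistics import mean

# Hypothetical cases; None marks a missing value.
rows = [
    {"age": 25, "score": 10.0},
    {"age": 31, "score": None},
    {"age": 40, "score": 14.0},
    {"age": None, "score": 12.0},
]

def listwise_delete(rows):
    """Keep only cases that have no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_substitute(rows, column):
    """Replace missing values in one column with that column's observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

complete_cases = listwise_delete(rows)      # two cases survive
imputed = mean_substitute(rows, "score")    # the missing score becomes 12.0
```

List-wise deletion halves this small sample, while mean substitution keeps every case but shrinks the variability of `score`, which is one reason the more elaborate methods exist.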
As defined by Tabachnick and Fidell (2019, p. 63), an outlier is
“a case with such an extreme value on one variable (a univariate outlier) or such a strange combination of scores on two or more variables (multivariate outlier) that it distorts statistics”
If not identified and processed, outliers can lead to
Both Type I and Type II errors.
There are several ways that outliers can be addressed that include
- ignoring (non-influential) data points (univariate, multivariate),
- deleting individual data points, if the sample size can accommodate this (univariate, multivariate),
- running the analysis with and without the outlier/s to justify keeping the outlier/s (univariate, multivariate),
- modification to reduce the bias of the data through winsorizing or trimming data (univariate), and
- transforming data for large data sets (univariate, can be extremely complex for multivariate).
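Trimming and winsorizing, mentioned in the list above, differ in what happens to the extreme cases: trimming drops them, winsorizing pulls them in to the nearest retained value. The sketch below assumes a symmetric proportion cut from each tail. The function names and the scores are hypothetical.

```python
def trim(values, proportion):
    """Drop the lowest and highest `proportion` of cases."""
    data = sorted(values)
    k = int(len(data) * proportion)
    return data[k:len(data) - k]

def winsorize(values, proportion):
    """Replace the lowest and highest `proportion` of cases
    with the nearest value that is retained."""
    data = sorted(values)
    k = int(len(data) * proportion)
    if k == 0:
        return data
    low, high = data[k], data[-k - 1]
    return [min(max(v, low), high) for v in data]

scores = [2, 4, 5, 5, 6, 6, 7, 7, 8, 40]
trimmed = trim(scores, 0.10)          # [4, 5, 5, 6, 6, 7, 7, 8]
winsorized = winsorize(scores, 0.10)  # [4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
```

Winsorizing keeps the sample size at ten, whereas trimming reduces it to eight.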
Occasionally, new multivariate outliers are identified after the original outliers have been deleted. This happens because once an outlier is removed, the data set becomes more consistent and data points that previously looked unremarkable can become
extreme points
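The same effect is easy to see in a univariate analogue. In the sketch below, an extreme score inflates the standard deviation so much that a second unusual score goes unflagged until the first is removed. The cutoff of 2.5 standard deviations, the function name `flag_outliers`, and the scores are illustrative assumptions, not values from the source. A multivariate screen would typically use a distance measure such as Mahalanobis distance instead of z-scores.

```python
from statistics import mean, stdev

def flag_outliers(values, cutoff=2.5):
    """Return the values whose absolute z-score exceeds `cutoff` (assumed threshold)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > cutoff]

scores = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11, 20, 60]

first_pass = flag_outliers(scores)                        # only 60 is flagged
remaining = [v for v in scores if v not in first_pass]    # delete the flagged case
second_pass = flag_outliers(remaining)                    # now 20 is flagged
```

With 60 in the data, the score of 20 sits about 0.3 standard deviations from the mean. Once 60 is deleted, the standard deviation shrinks and the same score sits about 2.9 standard deviations away.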
Distributional information, such as skewness and kurtosis values, can provide indicators of
symmetry and peakedness of a variable’s distribution
Skewness relates to the
symmetry of the distribution
Positive skew is depicted when most scores are clustered at the
lower end of the distribution, with the tail extending toward the higher values
Kurtosis refers to the
peakedness of the distribution
Positive kurtosis is described as ________ and negative kurtosis is described as ________:
leptokurtic; platykurtic
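Skewness and kurtosis can be estimated from a variable's central moments. The sketch below uses the simple moment-based formulas, with kurtosis expressed as excess kurtosis so that a normal distribution scores zero. Statistical packages often apply small-sample corrections, so their output may differ slightly. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Positive when the tail extends toward high values, negative toward low."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Positive = leptokurtic (peaked), negative = platykurtic (flat)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

right_skewed = [1, 1, 1, 2, 2, 3, 4, 9]   # scores clustered at the low end
flat = [1, 2, 3, 4, 5]                    # evenly spread, symmetric

skew_right = skewness(right_skewed)   # greater than zero
skew_flat = skewness(flat)            # zero: the distribution is symmetric
kurt_flat = excess_kurtosis(flat)     # below zero: platykurtic
```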
Screening the residuals for normality is common practice when conducting data analyses for
ungrouped data
Linearity (straight-line relationships between variables) can be observed graphically through
bivariate scatterplots
For ungrouped data, the assumption of homoscedasticity refers to
the assumption in regression analysis that the residuals have similar variances across the whole continuum of scores on the predictor variable
For grouped data, homogeneity of variance is
the assumption that the variance of one variable is stable (i.e. relatively similar) at all levels of another variable
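One rough way to inspect homogeneity of variance for grouped data is to compare the largest group variance with the smallest. The sketch below computes that ratio. The group names, the scores, and the helper `variance_ratio` are hypothetical, and how large a ratio is tolerable depends on the analysis and on how equal the group sizes are. A formal test such as Levene's test is usually run in statistical software.

```python
from statistics import variance

# Hypothetical scores on one variable at two levels of a grouping variable.
groups = {
    "control": [12, 14, 15, 13, 16],
    "treatment": [10, 18, 14, 22, 6],
}

def variance_ratio(groups):
    """Return (largest variance / smallest variance, per-group variances)."""
    variances = {name: variance(scores) for name, scores in groups.items()}
    return max(variances.values()) / min(variances.values()), variances

ratio, variances = variance_ratio(groups)
# variances: control 2.5, treatment 40; ratio: 16.0
```

Both groups have the same mean of 14, yet the treatment group is sixteen times more variable, so the variance is clearly not stable across the levels of the grouping variable.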