1.8 Preparing for analysis Flashcards

1
Q

Before conducting a statistical analysis you need to check your data for eight things:

A
  1. Accuracy of data entry,
  2. Missing data,
  3. Outliers,
  4. Normality,
  5. Linearity, homoscedasticity, and homogeneity of variance,
  6. Independence,
  7. Multicollinearity and singularity (MANOVA and multiple regression), and
  8. Other assumptions.
2
Q

Missing data may be addressed through a range of approaches such as

A

list-wise deletion, mean substitution, expectation-maximization, multiple imputation

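A minimal sketch of the two simplest approaches in pandas (the data frame and column names are invented for illustration); expectation-maximization and multiple imputation require dedicated tooling and are not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical data set with a missing score (names are illustrative)
df = pd.DataFrame({"id": [1, 2, 3, 4],
                   "score": [10.0, np.nan, 14.0, 12.0]})

# List-wise deletion: drop any case with a missing value
deleted = df.dropna()

# Mean substitution: replace missing scores with the column mean
substituted = df.fillna({"score": df["score"].mean()})

print(len(deleted))              # 3 cases remain
print(substituted["score"][1])   # 12.0, the mean of 10, 14 and 12
```

Note that list-wise deletion shrinks the sample, while mean substitution keeps the sample size but reduces the variance of the imputed variable.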
3
Q

As defined by Tabachnick and Fidell (2019, p. 63), an outlier is

A

“a case with such an extreme value on one variable (a univariate outlier) or such a strange combination of scores on two or more variables (multivariate outlier) that it distorts statistics”

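For univariate outliers, one common screen is the standardized score; a frequently cited cut-off is |z| > 3.29 (p < .001, two-tailed). A sketch with invented data (multivariate outliers are typically screened with Mahalanobis distance instead, not shown here):

```python
import numpy as np
from scipy import stats

# Invented sample: twenty ordinary scores plus one extreme case
scores = np.array([12.0, 13.0, 14.0, 15.0, 16.0] * 4 + [80.0])

# Standardize and flag cases beyond the |z| > 3.29 cut-off
z = stats.zscore(scores)
outliers = np.where(np.abs(z) > 3.29)[0]

print(outliers)  # [20]: only the score of 80 is flagged
```

With very small samples this screen loses power, because a single extreme value inflates the standard deviation used to compute the z-scores.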
4
Q

If not identified and addressed, outliers can lead to

A

Both Type I and Type II errors.

5
Q

There are several ways that outliers can be addressed that include

A
  • ignoring (non-influential) data points (univariate, multivariate),
  • deleting individual data points, if sample size can accommodate for this (univariate, multivariate),
  • running the analysis with and without the outlier/s to justify keeping the outlier/s (univariate, multivariate),
  • modification to reduce the bias of the data through winsorizing or trimming data (univariate), and
  • transforming data for large data sets (univariate, can be extremely complex for multivariate).
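Winsorizing and trimming from the list above can be sketched with SciPy (the sample and the limits are invented for illustration):

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Illustrative sample with one extreme high score
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])

# Winsorize the top 10% of scores: the most extreme value is replaced
# with the next-most-extreme value (100 becomes 9)
w = np.asarray(winsorize(data, limits=(0, 0.1)))

# Trimming instead discards the extreme values outright
trimmed = np.sort(data)[:-1]

print(w.max())        # 9.0
print(len(trimmed))   # 9
```

Winsorizing preserves the sample size; trimming does not, so the choice depends on how much the design can tolerate losing cases.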
6
Q

Occasionally, new multivariate outliers may be identified following deletion of the original outliers. This happens because once an outlier is removed, the data set becomes more consistent, so previously unremarkable data points can emerge as

A

extreme points

7
Q

Distributional information, such as skewness and kurtosis values, can provide indicators of

A

symmetry and peakedness of a variable’s distribution

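These indicators can be computed directly; a sketch with invented samples (note that SciPy's kurtosis() reports excess kurtosis, i.e. 0 for a normal distribution):

```python
import numpy as np
from scipy.stats import skew, kurtosis

# A positively skewed sample: most scores cluster at the low end,
# with a long right tail
positively_skewed = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 15.0])

# A perfectly symmetric sample
symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(skew(positively_skewed) > 0)  # True: long right tail
print(skew(symmetric))              # 0.0: symmetric distribution
# Negative excess kurtosis indicates a flatter (platykurtic) shape
print(kurtosis(symmetric) < 0)      # True
```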
8
Q

Skewness relates to the

A

symmetry of the distribution

9
Q

Positive skew occurs when most scores are clustered at the

A

lower end of the distribution

10
Q

Kurtosis refers to the

A

peakedness of the distribution

11
Q

Positive kurtosis is described as ________ and negative kurtosis is described as:

A

leptokurtic; platykurtic

12
Q

Screening the residuals for normality is common practice when conducting data analyses for

A

ungrouped data

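A sketch of this practice for a simple regression on simulated ungrouped data, using the Shapiro-Wilk test to screen the residuals (the data and noise level are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated ungrouped data: a linear relationship plus normal noise
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=100)

# Fit a simple regression and extract the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Least-squares residuals (with an intercept) average to zero;
# the Shapiro-Wilk test then screens them for normality
stat, p = stats.shapiro(residuals)
print(p > 0.0)  # a low p-value would suggest non-normal residuals
```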
13
Q

Linearity (straight-line relationships between variables) can be observed graphically through

A

bivariate scatterplots

14
Q

For ungrouped data, the assumption of homoscedasticity refers to

A

the assumption in regression analysis that the residuals across the continuum of scores on the predictor variable are fairly consistent and, as such, have similar variances

15
Q

For grouped data, homogeneity of variance is

A

the assumption that the variance of one variable is stable (i.e. relatively similar) at all levels of another variable

16
Q

There are two types of independence assumptions often referred to in statistics, which are

A

Independence of Observations and Independence of Residuals/Errors

17
Q

Independence of Observations requires each participant to

A

participate only once in the research and as such only contribute one set of data

18
Q

The assumption of Independence of Residuals/Errors is the assumption that

A

errors in your model are not related to each other

19
Q

The Durbin-Watson test statistic is used to

A

assess for serial correlations (autocorrelation) of errors

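statsmodels exposes this as statsmodels.stats.stattools.durbin_watson; a minimal hand-rolled version makes the formula explicit (the residual series below are invented):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no autocorrelation,
    values toward 0 suggest positive and toward 4 negative serial
    correlation of the errors."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Perfectly alternating errors show strong negative autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0

# Constant (maximally positively correlated) errors give 0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0
```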
20
Q

Multicollinearity and singularity are

A

problems with a correlation matrix that occur when variables are too highly correlated. With multicollinearity, the variables are very highly correlated (say, above .80); with singularity, the variables are redundant: one of the variables is a combination of two or more of the others

21
Q

Investigation of Tolerance and Variance Inflation Factors can help determine

A

whether multicollinearity is a problem within your sample

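With only two predictors, Tolerance is 1 − r² and VIF is its reciprocal, so both can be computed by hand; for the general case, statsmodels provides variance_inflation_factor. The data below are invented for illustration:

```python
import numpy as np

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one
    predictor on the other; with two predictors, R^2 = r^2."""
    r = np.corrcoef(x1, x2)[0, 1]
    return 1.0 / (1.0 - r ** 2)

# Uncorrelated predictors: Tolerance (1 - R^2) is 1, so VIF = 1
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, -1.0, -1.0, 1.0])
print(vif_two_predictors(x1, x2))  # 1.0

# Near-collinear predictors inflate the VIF far past the common
# rule-of-thumb cut-off of 10
x3 = 2.0 * x1 + np.array([0.0, 0.01, -0.01, 0.0])
print(vif_two_predictors(x1, x3) > 10)  # True
```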
22
Q

The assumption of sphericity relates to

A

repeated measures ANOVA and mixed model ANOVA designs

23
Q

Sphericity assumes that

A

variances of the differences between data taken from the same participant are equal

24
Q

Field (2018, p. 283) suggests that nonparametric statistics based on ranks are not affected by

A

small sample sizes, extreme scores, and outliers, and they do not require a normally distributed sample

25
Q

Allen, Bennett & Heritage (2018) suggest that non-parametric tests should be used with

A

ordinal data, and/or where the sample is not normally distributed
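As a sketch, the Mann-Whitney U test, the rank-based alternative to the independent-samples t-test, on invented ordinal data:

```python
import numpy as np
from scipy import stats

# Two small independent groups of ordinal ratings (invented data)
group_a = np.array([1, 2, 2, 3, 3])
group_b = np.array([4, 4, 5, 5, 5])

# Mann-Whitney U compares ranks rather than means, so extreme scores
# and non-normality matter far less than in a t-test
res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(res.statistic)  # 0.0: every group_a score ranks below group_b
```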