Page 8 - Preparing for Analysis Flashcards
Before conducting a statistical analysis you need to check your data for…
- Accuracy of data entry
- Missing data
- Outliers
- Normality
- Linearity, homoscedasticity, homogeneity of variance
- Independence
- Multicollinearity and singularity (MANOVA and multiple regression)
- Other assumptions
What can you use to check data entry?
SPSS procedures, such as Frequencies
You must consider the … and … of missing data
Amount and pattern
Amount of data missing
If more than 5% of the data are missing, check the patterns
What kind of patterns of missing data?
MCAR (missing completely at random)
MAR (Missing at random)
MNAR (Missing not at random)
MAR
A pattern of missingness predictable from other variables in the data set
MNAR
A pattern of missingness related to the variable itself
Little’s MCAR test - when is data MCAR?
If the p value is above 0.05 (non-significant, i.e., the missingness does not differ from a completely random pattern)
Missing data can be handled by?
List-wise deletion
Mean substitution
Expectation maximisation
Multiple imputation
List-wise deletion is used when
Few cases are missing
Variables are not critical to your analysis
Data are missing at random
Missing data on a different variable
Mean substitution
Replacing the missing value with the mean of cases across items
Not highly recommended; imputed values sit at the mean, which can distort the distribution
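A minimal sketch of mean substitution in Python, using made-up item scores (None marks a missing case):

```python
# Mean substitution: fill each missing value (None) with the mean of the
# observed cases. Made-up item scores for illustration.
scores = [4, 5, None, 3, None, 5]

observed = [s for s in scores if s is not None]
mean = sum(observed) / len(observed)

filled = [mean if s is None else s for s in scores]
# Note: every imputed case sits exactly at the mean, shrinking the variance.
```

This makes the card's caveat concrete: the imputed cases add no spread of their own.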
Expectation maximisation
Estimates the shape of the distribution and infers the likelihood of the missing value falling within that distribution
The simplest and most reasonable approach when data are missing at random
Multiple imputation
Uses regression to predict values based on other variables in your dataset
Most respectable; can be used with MNAR and MCAR data
More difficult
An outlier is…
A case with such an extreme value on one variable (univariate) or such a strange combination of scores on two or more variables (multivariate) that it distorts statistics
Can lead to Type I (false positive) and Type II (false negative) errors
When can an outlier occur?
Participant interpreted the question incorrectly
Experimenter error
Participant’s answer comes from a different population
Population of participants has extreme values and is not normally distributed
Checking univariate outliers
Frequency distribution in histogram
Box-plots
Normal probability plots
Calculating standardised scores (z-scores beyond ±3.29)
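The z-score check from the last bullet can be sketched by hand in Python (made-up data; the ±3.29 cutoff is the one the card uses):

```python
import statistics

# Flag univariate outliers via standardised scores: |z| > 3.29.
# Made-up data: 28 typical scores plus one extreme value.
data = [10, 11, 12, 13] * 7 + [60]

mean = statistics.mean(data)
sd = statistics.stdev(data)   # sample standard deviation

z_scores = [(x - mean) / sd for x in data]
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3.29]
```

Note that in very small samples the maximum possible z-score is bounded, so a sizeable n is needed for the 3.29 rule to flag anything.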
Before checking univariate outliers, determine…
Ungrouped (correlations, regression, and factor analysis) or,
Split by group (t-tests, ANOVAs, ANCOVAs, MANOVAs, logistic regression, and discriminant analysis)
Checking multivariate outliers
Mahalanobis distance (evaluated against the chi-squared (χ²) distribution)
Leverage
Discrepancy
Influence (Cook’s distance)
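A pure-Python sketch of the Mahalanobis distance for the two-variable case, with the 2×2 covariance inverse written out by hand (made-up data; df for the χ² comparison equals the number of variables, and 13.82 is the df = 2 critical value at p < .001):

```python
# Made-up bivariate data: the last case is not extreme on either variable
# alone, but its *combination* of scores is unusual.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 2]
ys = [2, 3, 4, 5, 6, 7, 8, 9, 10, 9]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Sample covariance matrix entries (divide by n - 1).
sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
syy = sum((y - my) ** 2 for y in ys) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

det = sxx * syy - sxy ** 2            # determinant of the 2x2 matrix
inv00, inv01, inv11 = syy / det, -sxy / det, sxx / det  # inverse covariance

# Squared Mahalanobis distance for each case: d' S^-1 d.
d2 = []
for x, y in zip(xs, ys):
    dx, dy = x - mx, y - my
    d2.append(inv00 * dx * dx + 2 * inv01 * dx * dy + inv11 * dy * dy)

# Each D^2 is compared against a chi-squared critical value with
# df = number of variables (df = 2 here; 13.82 at p < .001).
```

Here the last case has by far the largest D², even though neither of its scores is extreme on its own.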
Five methods of addressing Outliers
Ignoring data points
Deleting individual Data points
Running analysis with and without outlier/s
Modification to reduce the bias through winsorising or trimming (univariate only)
Transforming data for large data sets (univariate; normally too complex for multivariate)
Kurtosis
Peakedness of distribution
Positive (Leptokurtic) - High
Negative (Platykurtic) - Flat (looks like flat platypus)
https://img.tfd.com/mk/K/X2604-K-11.png
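Excess kurtosis can be computed from the central moments; a short sketch with made-up samples:

```python
# Excess kurtosis from the moment formula: m4 / m2**2 - 3.
# Positive -> leptokurtic (peaked); negative -> platykurtic (flat).
def excess_kurtosis(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance (population form)
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # uniform-like -> platykurtic
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]   # heavy centre -> leptokurtic
```

A normal distribution has excess kurtosis 0 under this formula.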
What tests used for normality?
Kolmogorov-Smirnov (outdated)
Shapiro-Wilk
Weakness of normality tests
As sample size and statistical power increase, the tests can appear statistically significant (suggesting skewness or other non-normality) despite the data being approximately normally distributed
What does Field recommend when testing for normality?
You assess the extent of non-normality in your data using ‘converging evidence’ from multiple techniques
Box-plots, histograms, and/or normality tests
Non-normal data is more likely to result in Type 1, or Type 2 errors?
Type 1 (False Positive, incorrect rejection of the H0)
Is there a normality test for multivariate normality in SPSS?
No; however, if Shapiro-Wilk is non-significant, multivariate normality can be assumed
What is normality of residuals
Used for ungrouped data
The differences between observed and expected values on a variable should be normally distributed
What is linearity
A straight-line relationship between variables
*Seen via a bivariate scatterplot, or residual plots (multiple regression / linearity of residuals)
Field (2018) states that linearity is one of the most important assumptions to meet for your analyses, as it underpins the process that you want to model. True or False?
True
What is Homoscedasticity
Used for ungrouped data
The assumption in regression analysis that the residuals along the continuum of scores for the predictor variable are fairly consistent and have similar variances
What is Homogeneity of Variance
Same as homoscedasticity, but for grouped data
‘The assumption that the variance of one variable is stable at all levels of another variable.’ A linear relationship, not exponential, etc.
What is independence of observations
Each participant only participates once in the research + no influence of participants on other participants
What is independence of residuals/errors
Errors in your model are not related to each other
The Durbin-Watson test statistic is used to check this; values between 1 and 3 are preferred (2 = no autocorrelation)
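The Durbin-Watson statistic is simple enough to compute by hand; a sketch with two made-up residual series:

```python
# Durbin-Watson statistic for a series of model residuals:
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# DW near 2 indicates independent errors; values toward 0 or 4 indicate
# strong positive or negative autocorrelation respectively.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

trending = [1, 2, 3, 4, 5, 6]        # positively autocorrelated -> DW near 0
alternating = [1, -1, 1, -1, 1, -1]  # negatively autocorrelated -> DW toward 4
```

Both made-up series fall outside the 1-3 rule of thumb, so both would fail the independence check.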
What are multicollinearity and singularity
Problems with a correlation matrix that occur when variables are too highly correlated
Multicollinearity
Variables are very highly correlated (>0.8)
Singularity
Variables are redundant, one variable is a combination of two or more other variables
What is Additivity?
The combined effect of individual predictors on an outcome variable is best represented by adding these individual effects together
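The additive model can be shown in a few lines of Python; the coefficients are made up for illustration:

```python
# Additivity in a linear model: the prediction is the sum of each predictor's
# separate contribution. Hypothetical intercept and slopes.
b0, b1, b2 = 2.0, 0.5, 1.5

def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Raising x1 by 1 adds b1; raising x2 by 1 adds b2;
# raising both adds exactly b1 + b2 - there is no interaction term.
combined = predict(1, 1) - predict(0, 0)
separate = (predict(1, 0) - predict(0, 0)) + (predict(0, 1) - predict(0, 0))
```

If the combined effect did not equal the sum of the separate effects, an interaction term would be needed and additivity would fail.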
What test is used to check for homegeneity of variance?
Levene’s test
Non-significant = homogeneity of variance
Significant = heterogeneity of variance
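The Levene statistic itself is a one-way ANOVA on absolute deviations from the group means; a pure-Python sketch with made-up groups (SPSS converts the statistic to a p value via the F distribution, which is omitted here):

```python
# Levene's test statistic (mean-centred form): an ANOVA carried out on the
# absolute deviations z_ij = |x_ij - group mean|. Large W suggests
# heterogeneity of variance.
def levene_w(*groups):
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

narrow = [1, 2, 3, 4, 5]       # small spread
wide = [10, 20, 30, 40, 50]    # much larger spread -> large W
```

Two groups with identical spreads give W = 0, i.e., perfect homogeneity of variance.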
Trimming Data
Deletion of cases with extreme values
- Percentage based
- Standard deviation based
Winsorising
Extreme scores are replaced with a value that is not as extreme
- The next highest or lowest score
- Replace with the next highest/lowest score that is not an outlier
- Replace with a score ±3.29 standard deviations from the mean
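Percentage-based trimming and winsorising can be contrasted in a short sketch (made-up sample with two extreme scores):

```python
# Percentage-based trimming vs winsorising.
def trim(data, pct):
    """Drop the lowest and highest pct proportion of cases."""
    s = sorted(data)
    k = int(len(s) * pct)
    return s[k:len(s) - k]

def winsorise(data, pct):
    """Replace the lowest/highest pct of cases with the nearest kept score."""
    s = sorted(data)
    k = int(len(s) * pct)
    low, high = s[k], s[-k - 1]
    return [min(max(x, low), high) for x in data]

data = [1, 12, 13, 14, 15, 16, 17, 18, 19, 90]  # 1 and 90 are extreme
```

Trimming shrinks the sample; winsorising keeps every case but pulls the extremes in to the nearest retained score.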
Non-parametric statistics
Do not rely on normally distributed data - e.g., Spearman’s correlation
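Spearman's correlation works on ranks rather than raw scores; a sketch using the rank-difference formula (valid when there are no tied scores):

```python
# Spearman's rho via the rank-difference formula (no tied scores):
# rho = 1 - 6 * sum(d_i**2) / (n * (n**2 - 1)), d_i = rank difference.
def spearman_rho(xs, ys):
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# A monotonic but non-linear relationship (e.g., y = x**2) still gives rho = 1,
# which is why no normality or linearity assumption is needed.
```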
Robust methods
Trimmed mean/M-Estimator
Bootstrapping (estimates parameters of the sampling distribution based on the sample data)
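A minimal percentile-bootstrap sketch for a confidence interval around the mean (made-up scores; the resample count and seed are arbitrary choices):

```python
import random

# Percentile bootstrap: resample with replacement many times; the spread of
# the resampled statistic (here the mean) estimates its sampling distribution.
def bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)   # fixed seed for reproducibility
    means = sorted(
        sum(rng.choices(data, k=len(data))) / len(data)
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]              # 2.5th percentile
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]    # 97.5th percentile
    return lo, hi

sample = [4, 5, 6, 5, 7, 4, 6, 5, 8, 5]   # made-up scores, mean 5.5
lo, hi = bootstrap_ci(sample)
```

Because no distributional shape is assumed, the interval comes entirely from the resampled data, which is what makes the method robust.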