Page 8 - Preparing for Analysis Flashcards

1
Q

Before conducting a statistical analysis, you need to check your data for…

A
  1. Accuracy of data entry
  2. Missing data
  3. Outliers
  4. Normality
  5. Linearity, homoscedasticity, homogeneity of variance
  6. Independence
  7. Multicollinearity and singularity (MANOVA and multiple regression)
  8. Other assumptions
2
Q

What can you use to check data entry?

A

SPSS Procedures

or, Frequencies

3
Q

You must consider the … and … of missing data

A

Amount and pattern

4
Q

Amount of data missing

A

If more than 5% of the data are missing, check the pattern of missingness
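
As a rough illustration of the 5% rule of thumb, the sketch below computes the percentage of missing values per variable and lists the variables whose missingness pattern should be examined. The data set, the variable names and the use of `None` as the missing-value marker are all made up for this example.

```python
# Hypothetical toy data set: None marks a missing response.
data = {
    "age":    [23, 31, None, 45, 28, 39, 52, 33, 27, 41],
    "income": [None, 42, 38, None, 51, 47, 60, 44, None, 55],
    "score":  [7, 8, 6, 9, 7, 8, 6, 7, 9, 8],
}

THRESHOLD = 5.0  # the 5% rule of thumb from the card


def percent_missing(values):
    """Return the percentage of entries that are missing (None)."""
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


# Percentage missing per variable, and the variables above the threshold.
report = {name: percent_missing(col) for name, col in data.items()}
needs_pattern_check = [name for name, pct in report.items() if pct > THRESHOLD]

for name, pct in report.items():
    print(f"{name}: {pct:.1f}% missing")
print("Check missingness pattern for:", needs_pattern_check)
```

Here `age` and `income` exceed the threshold, so their missingness would be examined further (for example with Little's MCAR test in SPSS).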

5
Q

What kind of patterns of missing data?

A

MCAR (missing completely at random)
MAR (missing at random)
MNAR (missing not at random)

6
Q

MAR

A

A pattern of missingness predictable from other variables in the data set

7
Q

MNAR

A

A pattern of missingness related to the variable itself

8
Q

Little’s MCAR test - when is data MCAR?

A

If the p-value is above 0.05 (a non-significant result, meaning the missingness pattern does not differ significantly from MCAR)

9
Q

Missing data can be handled by?

A

List-wise deletion
Mean substitution
Expectation maximisation
Multiple imputation

10
Q

List-wise deletion is used when

A

Few cases missing
Variables not critical to your analysis
Data are missing at random
Missing data on a different variable

11
Q

Mean substitution

A

Replacing the missing value with the mean of cases across items

Not highly recommended, as it can bias results (for example, by shrinking the variance)
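
A small sketch of why mean substitution is treated with caution. It assumes the simplest variant, where each missing value on a variable is replaced by the mean of the observed values on that variable; the scores are invented. Filling the gaps this way leaves the mean where it was but makes the variable look less variable than it is.

```python
import statistics

# Hypothetical scores; None marks a missing value.
scores = [4, 8, None, 6, 10, None, 2, 6]

observed = [s for s in scores if s is not None]
observed_mean = statistics.mean(observed)

# Mean substitution: every missing value becomes the observed mean.
imputed = [observed_mean if s is None else s for s in scores]

print("mean before:", observed_mean, "after:", statistics.mean(imputed))
print("variance before:", statistics.variance(observed))
print("variance after: ", statistics.variance(imputed))
```

The variance drops after imputation because the substituted values sit exactly on the mean and add no spread.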

12
Q

Expectation maximisation

A

Estimates the shape of the distribution and infers the likelihood of the value falling within that distribution
Most simple and reasonable with random missing data

13
Q

Multiple imputation

A

Uses regression to predict values based on other variables in your dataset
Most respectable; can be used with MNAR and MCAR data
More difficult

14
Q

An outlier is…

A

A case with such an extreme value on one variable (univariate) or such a strange combination of scores on two or more variables (multivariate) that it distorts statistics
Can lead to Type 1 (false positive) and Type 2 (false negative) errors

15
Q

When can an outlier occur?

A

Participant interpreted question incorrectly
Experimenter error
Participant’s answer comes from a different population
Population of participants has extreme values and is not normally distributed

16
Q

Checking univariate outliers

A

Frequency distribution in histogram
Box-plots
Normal probability plots
Calculating standardised scores (z-scores beyond ±3.29)
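
The z-score check can be sketched as follows. The scores are invented, and the sample standard deviation (n - 1 denominator) is assumed; any case whose absolute z-score exceeds 3.29 is flagged.

```python
import statistics

CUTOFF = 3.29  # |z| beyond this value is flagged, as on the card


def z_scores(values):
    """Standardise values using the sample mean and sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical scores with one suspicious entry at the end.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 52, 48, 120]

# (index, raw score) for every case beyond the cutoff.
flagged = [
    (i, v)
    for i, (v, z) in enumerate(zip(scores, z_scores(scores)))
    if abs(z) > CUTOFF
]
print(flagged)
```

Note that a single extreme score inflates the standard deviation it is judged against, so in very small samples no case can reach ±3.29 at all.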

17
Q

Before checking univariate outliers, determine…

A

Ungrouped (Correlations, regression and factor analysis) or,

Split by group (t-tests, ANOVAs, ANCOVAs, MANOVAs, logistic regression and discriminant analysis)

18
Q

Checking multivariate outliers

A

Mahalanobis distance (evaluated against the chi-squared, χ², distribution)
Leverage
Discrepancy
Influence (Cook’s distance)
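
A minimal sketch of the Mahalanobis check for the two-variable case, using only the standard library. The data are invented, the 2x2 covariance matrix is inverted by hand, and the p < .001 criterion is an assumption (a common convention, not something stated on the card). The closed-form p-value in `chi2_p_df2` is valid only for 2 degrees of freedom, i.e. two variables.

```python
import math
import statistics


def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d' S^-1 d written out for the 2x2 case.
        distances.append(
            (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        )
    return distances


def chi2_p_df2(d2):
    """Upper-tail chi-squared p-value; this closed form holds for df = 2 only."""
    return math.exp(-d2 / 2)


# Hypothetical data: y tracks x closely, except for the last case,
# which pairs a high x with a low y.
xs = list(range(1, 31))
ys = [x + ((i % 3) - 1) * 0.5 for i, x in enumerate(xs)]
xs.append(28)
ys.append(3.0)

distances = mahalanobis_sq(xs, ys)
flagged = [i for i, d2 in enumerate(distances) if chi2_p_df2(d2) < 0.001]
print(flagged)
```

The last case is not extreme on either variable taken alone; it is the combination of scores that makes it a multivariate outlier.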

19
Q

Five methods of addressing outliers

A

Ignoring data points
Deleting individual data points
Running analysis with and without outlier/s
Modification to reduce the bias through winsorising or trimming (only univariate)
Transforming data for large data sets (univariate, normally too complex for multivariate)

20
Q

Kurtosis

A

Peakedness of distribution
Positive (Leptokurtic) - High
Negative (Platykurtic) - Flat (looks like a flat platypus)
https://img.tfd.com/mk/K/X2604-K-11.png
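
To make the sign convention concrete, here is a sketch that computes excess kurtosis with the simple moment formula m4 / m2² - 3. This formula is an assumption for illustration; statistical packages often report a bias-corrected variant, so their values can differ slightly. Both data sets are invented.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: 0 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical peaked data: most scores pile up at 5, with a few far out.
peaked = [5] * 16 + [0, 10, 4, 6]
# Hypothetical flat data: scores spread evenly from 1 to 10.
flat = list(range(1, 11))

print("peaked:", round(excess_kurtosis(peaked), 2))  # positive, leptokurtic
print("flat:  ", round(excess_kurtosis(flat), 2))    # negative, platykurtic
```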

21
Q

What tests are used for normality?

A

Kolmogorov-Smirnov (outdated)

Shapiro-Wilk

22
Q

Weakness of normality tests

A

As sample size and statistical power increase, the tests can come out statistically significant (suggesting non-normality) even though the data are close to normally distributed

23
Q

What does Field recommend when testing for normality?

A

You assess the extent of non-normality in your data using ‘converging evidence’ from multiple techniques
Box-plots, histograms, and/or normality tests

24
Q

Is non-normal data more likely to result in Type 1 or Type 2 errors?

A

Type 1 (False Positive, incorrect rejection of the H0)

25
Q

Is there a test for multivariate normality in SPSS?

A

No; however, if the Shapiro-Wilk test is non-significant, you can assume multivariate normality

26
Q

What is normality of residuals?

A

Used for ungrouped data

The differences between observed and expected values on a variable should be normally distributed

27
Q

What is linearity?

A

Straight-line relationship between variables

*Checked with a bivariate scatterplot or residual plots (multiple regression/linearity of residuals)

28
Q

Field (2018) states that linearity is one of the most important assumptions to meet for your analyses, as it underpins the process that you want to model. True or False?

A

True

29
Q

What is homoscedasticity?

A

Used for ungrouped data
Assumption in regression analysis that the residuals along the continuum of scores for the predictor variable are fairly consistent and have similar variances

30
Q

What is homogeneity of variance?

A

Same as homoscedasticity, but for grouped data
‘The assumption that the variance of one variable is stable at all levels of another variable.’ Linear relationship, not exponential etc.

31
Q

What is independence of observations?

A

Each participant only participates once in the research + no influence of participants on other participants

32
Q

What is independence of residuals/errors?

A

Errors in your model are not related to each other

The Durbin-Watson test statistic is used to check this; values between 1 and 3 are preferred
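
The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 indicating uncorrelated errors. The sketch below computes it for three invented residual series; in practice the residuals would come from your fitted regression model, in case order.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series.
unrelated = [0.5, -1.2, 0.3, 0.8, -0.4, -0.9, 1.1, 0.2, -0.6, 0.2]
trending = [-3, -2, -1, 0, 1, 2, 3]   # each error resembles the previous one
alternating = [1, -1] * 5             # each error is the opposite of the last

for name, series in [("unrelated", unrelated),
                     ("trending", trending),
                     ("alternating", alternating)]:
    dw = durbin_watson(series)
    verdict = "ok" if 1 <= dw <= 3 else "possible dependence"
    print(f"{name}: DW = {dw:.2f} ({verdict})")
```

Values well below 2 point to positive autocorrelation (the trending series) and values well above 2 to negative autocorrelation (the alternating series).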

33
Q

What are multicollinearity and singularity?

A

‘Problems with a correlation matrix that occur when variables are too highly correlated’

34
Q

Multicollinearity

A

Variables are very highly correlated (>0.8)
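
A rough screening sketch for the 0.8 rule: compute Pearson's r for every pair of predictors and flag pairs whose absolute correlation exceeds 0.8. The predictor names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical predictors: x1 and x2 move together, x3 does not.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 3, 5, 4, 6, 7, 9, 8],
    "x3": [5, 3, 6, 2, 7, 4, 5, 4],
}

LIMIT = 0.8  # threshold from the card

flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson_r(predictors[a], predictors[b])) > LIMIT
]
print(flagged)
```

Pairwise correlations only catch problems between two variables at a time; a variable that is a combination of several others (singularity) can slip through this check.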

35
Q

Singularity

A

Variables are redundant, one variable is a combination of two or more other variables

36
Q

What is Additivity?

A

The combined effect of individual predictors on an outcome variable is best represented by adding these individual effects together

37
Q

What test is used to check for homogeneity of variance?

A

Levene’s test
Non-significant = homogeneity of variance
Significant = heterogeneity of variance

38
Q

Trimming Data

A

Deletion of cases with extreme values

  • Percentage based
  • Standard deviation based
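
A minimal sketch of percentage-based trimming, assuming the same proportion of cases is removed from each tail after sorting. The scores are invented.

```python
import statistics


def trim(values, proportion):
    """Drop the given proportion of cases from each tail of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of cases cut per tail
    return ordered[k:len(ordered) - k] if k else ordered


# Hypothetical scores with one extreme value.
scores = [2, 4, 5, 5, 6, 6, 7, 7, 8, 40]

trimmed = trim(scores, 0.10)  # 10% from each end: one case per tail here
print("raw mean:    ", statistics.mean(scores))
print("trimmed mean:", statistics.mean(trimmed))
```

The trimmed mean sits much closer to the bulk of the scores because the extreme value no longer contributes.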
39
Q

Winsorising

A

Extreme scores are replaced with a value that is not as extreme

  • Next highest or lowest score
  • Replace with next highest/lowest score that is not an outlier
  • Replace with score that is ±3.29 standard deviations from the mean
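
One of the options above, replacing an outlier with the next highest or lowest score that is not an outlier, can be sketched like this. Outliers are assumed to be cases beyond ±3.29 sample standard deviations, identified in a single pass; the scores are invented.

```python
import statistics


def winsorise(values, cutoff=3.29):
    """Pull outliers in to the most extreme score that is not an outlier."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inliers = [v for v in values if abs(v - mean) / sd <= cutoff]
    low, high = min(inliers), max(inliers)
    # Clamp every score to the range spanned by the non-outlying scores.
    return [min(max(v, low), high) for v in values]


# Hypothetical scores with one extreme value at the end.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 52, 48, 120]

winsorised = winsorise(scores)
print(winsorised[-1])  # the extreme score is replaced, not deleted
```

Unlike trimming, the case stays in the data set, so the sample size is unchanged.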
40
Q

Non-parametric statistics

A

Do not rely on normally distributed data, e.g. Spearman correlation
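
To show why Spearman's correlation does not depend on the shape of the distribution, the sketch below computes it as a Pearson correlation on ranks (ties get the average rank). The data are invented: y rises with x, but along a curve, not a straight line.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical monotonic but curved relationship.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

print("Pearson: ", round(pearson_r(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
```

Because only the ordering of the scores matters, the curved relationship still yields a perfect rank correlation, while Pearson's r falls short of 1.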

41
Q

Robust methods

A

Trimmed mean/M-Estimator

Bootstrapping (estimates parameters of the sampling distribution based on the sample data)
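
A minimal sketch of a percentile bootstrap for the mean: resample the data with replacement many times, compute the mean each time, and read a confidence interval off the sorted results. The sample, the number of resamples and the fixed seed are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample)))  # resample with replacement
        for _ in range(n_boot)
    )
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical skewed sample.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 7, 21]

lower, upper = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample)}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

No normality assumption is needed here because the interval comes from the resampled estimates themselves.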