Screening Flashcards

1
Q

Ungrouped Data = ?

A

multiple regression, canonical correlation, factor analysis, or structural equation modeling

2
Q

Grouped Data = ?

A

analysis of covariance, multivariate analysis of variance or covariance, profile analysis, discriminant analysis, or multilevel modeling

3
Q

What should you check about values?

A
Out of range
plausible values
coding accuracy (frequencies table)
4
Q

Missing Data?

A

If you suspect the data are not missing at random, test for it.

5
Q

How to test missing data?

A

Create a dummy variable (missing vs. not missing) and test for between-group differences on the other variables. If less than 5% of the data are missing, almost any recovery method yields similar results, depending on the size of the data set.
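
The dummy-variable check can be sketched in a few lines of Python. This is only an illustration: the `income` and `age` values are made up, and Welch's t statistic is an assumed choice for the between-group comparison (the card does not name a specific test).

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: income has missing values (None); age is fully observed.
income = [32, None, 41, 38, None, 45, 29, None, 36, 40]
age = [25, 61, 33, 30, 58, 41, 27, 64, 35, 38]

# Dummy variable: 1 = income missing, 0 = income present.
missing = [1 if v is None else 0 for v in income]

age_missing = [a for a, m in zip(age, missing) if m == 1]
age_present = [a for a, m in zip(age, missing) if m == 0]

def welch_t(x, y):
    """Welch's t statistic for two independent groups (unequal variances)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

t = welch_t(age_missing, age_present)
print(f"mean age if missing = {mean(age_missing):.1f}, "
      f"if present = {mean(age_present):.1f}, t = {t:.2f}")
```

A large difference between the two groups suggests that missingness on `income` is related to `age`, i.e. the data are probably not MCAR.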

6
Q

Types of missing data

A

1) MCAR (missing completely at random – i.e. unpredictable, independent of the other variables in the study)
2) MAR (missing at random, ‘ignorable non-response’ – missingness is predictable from other variables – e.g. patients might miss a questionnaire)
3) MNAR (missing not at random, or ‘non-ignorable’ – missingness is related to the variable itself – serious bias of results, e.g. patients are less likely to complete a questionnaire because of their score)

7
Q

Types of tests for missing data?

A

1) SPSS MVA (Missing Value Analysis) – finds patterns and replaces values. A t-test is produced to test whether missingness is related to any other variable, for variables with >5% missing data. This helps distinguish MCAR, MAR & MNAR. Little's MCAR test: if not significant, assume MCAR; if significant and missingness is related to other IVs (not the DV), then MAR; if significant and related to the DV, then MNAR.

8
Q

how to deal with missing data ?

A

1) Omission / deletion (still report it) – OK if little data is missing (<5%), the data are MAR/MCAR, and the data set is moderate to large. Problematic if the data set is small, the design is experimental, or the data are MNAR.
2) Prior knowledge – impute or fill in using expert prior knowledge. Could also downgrade the variable to a dichotomy (high vs. low).
3) Mean substitution – easy and conservative, but reduces the variance of the variable (because the mean is closer to itself than to the missing value it replaces); the variable's correlations with other variables are reduced because of the reduced variance.
4) Regression imputation – more sophisticated. Other variables are used as IVs to write a regression equation for the variable with missing data (the DV). Sometimes the regressions are repeated, using the predicted values from the first regression as the new DV. Reduces variance (values are closer to the mean) and inflates the relationship between the variables (scores fit better than they should). Relies on a good relationship between the DV and the potential IVs. Can use SPSS MVA for this.
5) Expectation Maximisation (EM) – possible for MCAR/MAR. EM assumes a distribution (e.g. normal) for the partially missing data, forms a missing-data correlation matrix, and bases inferences about missing values on the likelihood under that distribution. Two steps, repeated iteratively: (a) expectation – find the conditional expectation of the missing data given the observed data and the current parameter estimates; (b) maximisation – maximum likelihood estimation with the imputed values. Biased because no error is added to the imputed data (biased standard errors).
6) Multiple imputation – no assumptions about randomness. Complex in SPSS. Can be applied to longitudinal and time-series data, and retains sampling variability.
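
The variance-shrinking effect of mean substitution (method 3) is easy to see on a toy example. The scores below are invented for illustration only.

```python
from statistics import mean, variance

# Hypothetical scores with two missing values.
scores = [4, 7, None, 9, 12, None, 6, 10]

observed = [s for s in scores if s is not None]
fill = mean(observed)  # the mean of the observed cases

# Mean substitution: every missing value is replaced by the same number.
imputed = [fill if s is None else s for s in scores]

var_observed = variance(observed)  # sample variance of the observed cases
var_imputed = variance(imputed)    # sample variance after substitution

print(f"variance before: {var_observed}, after: {var_imputed}")
```

The imputed cases add nothing to the sum of squared deviations but do add to the sample size, so the variance estimate necessarily shrinks.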

9
Q

Once missing data have been dealt with, what is good practice?

A

Compare the results from different methods, but do not choose the method based on the outcome.

10
Q

types of outliers?

A

univariate & multivariate

11
Q

what/how to test for univariate outliers?

A

1) Standardise the variable – cases with |z| > 3.29 are potential outliers (expected for only about 0.1% of a normal sample)
2) Histograms / boxplots
3) Outlier decision should be independent of results
4) Tackling univariate first should limit multivariate
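
A minimal sketch of the z-score screen in step 1, using a made-up sample of 29 typical scores plus one extreme score:

```python
from statistics import mean, stdev

# Hypothetical sample: 29 typical scores plus one extreme score.
scores = [50 + (i % 7) for i in range(29)] + [120]

m, sd = mean(scores), stdev(scores)  # stdev uses the n - 1 denominator
z = [(s - m) / sd for s in scores]

CUTOFF = 3.29  # the |z| criterion from the card
flagged = [(i, s) for i, (s, zi) in enumerate(zip(scores, z)) if abs(zi) > CUTOFF]
print(flagged)  # (index, score) pairs of potential univariate outliers
```

Note that in a sample of size n the largest possible |z| is (n - 1) / sqrt(n), so with very small samples no case can ever exceed 3.29 and plots become more useful than the cutoff.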

12
Q

what/how to test for multivariate outliers?

A

1) Best with formal stats methods
2) Mahalanobis Distance (MD) = distance of a case from the centroid (intersection of the variable means)
3) MD is tested using the chi-square distribution (χ²) with a conservative alpha (p

13
Q

what is Mahalanobis Distance ?

A

1) Distance of a case from the centroid (intersection of the variable means), taking the correlations among the variables into account
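
As a rough illustration, the squared Mahalanobis distance can be computed by hand for the two-variable case. The data are hypothetical, and the function name `mahalanobis_sq` is made up for this sketch. With exactly two variables the chi-square p value has the closed form exp(-D²/2), which is used here to avoid any library dependency; with more variables a chi-square table or function with df = number of variables would be needed.

```python
from math import exp
from statistics import mean

def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)  # the centroid
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # d' S^-1 d, with the 2x2 inverse written out explicitly
    return [(syy * a * a - 2 * sxy * a * b + sxx * b * b) / det
            for a, b in zip(dx, dy)]

# Hypothetical data: the last case has ordinary x and y values,
# but an unusual combination of the two.
x = [2, 3, 4, 5, 6, 7, 8, 3]
y = [3, 4, 5, 6, 7, 8, 9, 8]

d2 = mahalanobis_sq(x, y)
p_values = [exp(-d / 2) for d in d2]  # chi-square upper tail, valid for df = 2 only
worst = max(range(len(d2)), key=d2.__getitem__)
print(worst, round(d2[worst], 3), round(p_values[worst], 4))
```

The last case is not extreme on either variable alone, which is why a multivariate measure is needed to detect it.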

14
Q

what is leverage ?

A

1) Leverage = similar to MD, but it is on a different scale, so the same χ² significance tests cannot be used. Leverage is how far out a case is from the others (it can still be in line with them).

15
Q

what is discrepancy ?

A

1) Discrepancy = how far a case is out of line with the other cases.

16
Q

what is influence? and which is corresponding stat?

A

1) Influence = the product of leverage and discrepancy. It assesses the change in regression coefficients when a case is deleted. The corresponding statistic is Cook's distance: cases with influence scores larger than 1.00 are suspected of being outliers.
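
To make the "leverage times discrepancy" idea concrete, here is a sketch of Cook's distance for a simple (one-predictor) regression, written from the standard textbook formula. The data and the function name `cooks_distance` are hypothetical.

```python
from statistics import mean

def cooks_distance(xs, ys):
    """Cook's distance for each case in a simple linear regression of y on x."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    # Discrepancy: residual of each case from the fitted line
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    p = 2  # estimated parameters: intercept and slope
    mse = sum(e * e for e in resid) / (n - p)
    # Leverage: how far each case is from the mean of x
    lev = [1 / n + (x - mx) ** 2 / sxx for x in xs]
    # Influence combines the two
    return [(e * e / (p * mse)) * (h / (1 - h) ** 2)
            for e, h in zip(resid, lev)]

# Hypothetical data: the last case is far out on x AND off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [2, 3, 4, 5, 6, 7, 8, 4]

cooks = cooks_distance(x, y)
suspects = [i for i, d in enumerate(cooks) if d > 1.00]
print(suspects)
```

A case with high leverage but a small residual, or a large residual but low leverage, gets a much smaller Cook's distance than a case with both.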

17
Q

why identify multivariate outliers?

A

1) To understand why the cases are extreme: 1. Is the case properly part of the sample? 2. If you intend to modify scores, which scores should be modified? 3. It indicates which results may not generalise.

18
Q

how to properly describe multivariate outliers?

A

1) Create a dummy variable for outliers and non-outliers
2) Dummy variable for outliers is then used as DV in logistic regression to determine which IVs best predict group membership.
3) Check the means and SDs of the outliers on the variables identified.

19
Q

how to deal with outliers?

A

1) Omit with caution (if not in pop)
2) Transform variable
3) Winsorize / nearest neighbour – reduce or increase the score based on the next (surrounding) values
4) Trim a priori a % of cases from the upper & lower tails of the distribution
5) If multivariate outliers remain after dealing with the univariate ones, and N is large, it is best to omit them.
6) Compare with and without outliers output for effect of omission
7) Report any remedies or changes
8) Non-parametric tests are less sensitive (possible solution)
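
A minimal sketch of symmetric Winsorizing (remedy 3), assuming the k lowest and k highest scores are pulled in to their nearest remaining neighbours. The function name `winsorize` and the scores are made up for illustration, and the list must contain more than 2k values.

```python
def winsorize(values, k=1):
    """Replace the k lowest and k highest values with their nearest remaining neighbours."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clamp every score into the [low, high] range; original order is preserved.
    return [min(max(v, low), high) for v in values]

scores = [12, 15, 14, 13, 16, 15, 14, 48]  # hypothetical scores, 48 is extreme
adjusted = winsorize(scores, k=1)
print(adjusted)
```

Unlike trimming, this keeps every case in the data set, so N is unchanged; because the adjustment is symmetric, the lowest score is also moved.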

20
Q

Important point about normality with grouped data, and an example of one analysis type?

A

Grouped data – e.g. ANOVA – the normality assumption applies to the sampling distribution of the means, not to the raw scores – central limit theorem

21
Q

Important point about normality with ungrouped data, and an example of one analysis type?

A

Ungrouped data – e.g. regression – testing multivariate normality directly is impractical, so screen the individual variables and the residuals for normality, linearity and homoscedasticity

22
Q

what is test of distribution ?

A

Kolmogorov-Smirnov

23
Q

what is skewness ?

A

1) Degree of asymmetry of the distribution (positive / negative; normal = 0) – SPSS frequencies table

24
Q

what is kurtosis ?

A

– Peakedness or flatness: leptokurtic (positive) / platykurtic (negative) / normal = value of 0 – SPSS frequencies table

25
Q

simple way of testing distribution ?

A

Divide the skewness (or kurtosis) value by its standard error to get a z score, and test whether it differs from 0.
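
A rough sketch of this z test, using plain moment-based skewness and excess kurtosis and the common large-sample approximations SE(skewness) ≈ sqrt(6/N) and SE(kurtosis) ≈ sqrt(24/N). These are assumptions of the sketch: SPSS reports small-sample-adjusted statistics and standard errors, so its values will differ slightly. The data and the function name are hypothetical.

```python
from math import sqrt

def skew_kurtosis_z(values):
    """Return (skewness, excess kurtosis, z for skewness, z for kurtosis)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3                     # 0 for a normal distribution
    return skew, kurt, skew / sqrt(6 / n), kurt / sqrt(24 / n)

# Hypothetical right-skewed data (e.g. reaction times)
times = [1, 1, 2, 2, 2, 3, 3, 4, 5, 9, 14, 30]
skew, kurt, z_skew, z_kurt = skew_kurtosis_z(times)
print(f"skew = {skew:.2f} (z = {z_skew:.2f}), kurtosis = {kurt:.2f} (z = {z_kurt:.2f})")
```

With large samples even trivial departures from 0 become significant, so the shape of the distribution in a plot matters more than the z test.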

26
Q

Normality can be assumed if …..

A

1) Curran et al. (1996) suggest normality can be assumed if skewness does not exceed an absolute value of 2 and kurtosis does not exceed an absolute value of 7.

27
Q

plots for normality?

A

1) Plots = histogram (with normal curve overlay) // normal probability plot (observed values plotted against expected values; points should fall on a straight line) // detrended probability plot (similar, but plots the deviations from the line)

28
Q

how to address normality issues?

A

1) Transform (log for skewed / square root / inverse)
2) Winsorize / trim
3) Check normality again after the remedy / examine the impact on the analysis
4) Use non-parametric tests if the problem is recalcitrant

29
Q

what is linearity?

A

Assumption variables have linear (straight line) relations - for Pearson corr and regression.

30
Q

how to assess linearity ?

A

1) Assessed using bivariate scatterplots – should be oval-shaped
2) Some variables may inherently have non-linear relationships

31
Q

types of non-linear relationships? + alternative if non-linear?

A

1) Alternatives if non-linear = recode into dummy variables.
2) Types of relationship – quadratic, cubic

32
Q

what is Homoscedasticity?

A

The variance of one variable is roughly equal at all values of the other variable.

33
Q

How to assess Homoscedasticity Grouped?

A

1) Grouped = the variance of the DV is equal across all levels of the discrete IV – assessed with Levene's test
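
For illustration, Levene's statistic can be computed by hand as a one-way ANOVA on the absolute deviations of each score from its group mean. This is a sketch of the mean-based version; the group data and the function name `levene_w` are hypothetical. The resulting W is compared against an F distribution with k - 1 and N - k degrees of freedom.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W based on absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(v - centre) for v in g])
    z_means = [mean(zi) for zi in z]
    grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups with the same mean but very different spread
group_a = [10, 11, 12, 13, 14]
group_b = [2, 7, 12, 17, 22]

w = levene_w(group_a, group_b)
print(f"W = {w:.2f} on (1, 8) degrees of freedom")
```

A significant W indicates that the group variances differ, i.e. the homogeneity of variance assumption is questionable.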

34
Q

How to assess Homoscedasticity UN Grouped?

A

1) Ungrouped = inspect bivariate scatterplots – heteroscedasticity is often caused by non-normality in one or both of the variables

35
Q

what is Multi-collinearity ?

A

• Multi-collinearity – IVs are too highly correlated with each other (e.g. >.90)

36
Q

Factors involved with multi-collinearity ?

A

1) Correlation Matrix
2) Tolerance / VIF (accepted values I forgot)
3) Can also use that MACRO I downloaded for SPSS
4) Remove or combine
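
As a small worked example of item 2: with exactly two IVs, tolerance is 1 - r² and VIF is its reciprocal, where r is the correlation between the two IVs. With more IVs, r² would be replaced by the R² from regressing each IV on all the others. The predictor values below are invented, and the sketch deliberately applies no cutoff for tolerance or VIF.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two hypothetical IVs that measure nearly the same thing
iv1 = [1, 2, 3, 4, 5, 6, 7, 8]
iv2 = [2, 4, 5, 8, 10, 13, 14, 16]

r = pearson_r(iv1, iv2)
tolerance = 1 - r ** 2  # proportion of iv1's variance NOT shared with iv2
vif = 1 / tolerance     # variance inflation factor
print(f"r = {r:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.1f}")
```

Here r is above the .90 rule of thumb from the previous card, so one of the two IVs would be a candidate for removal or for combining into a composite.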