3 introduction to exploratory factor analysis Flashcards

1
Q

What is an exploratory factor analysis?

A

statistical technique used to uncover the underlying structure of a relatively large set of variables.

identify the number of potential latent factors that can explain the patterns of correlations among the observed variables

2
Q

What are four general issues regarding best practice in EFA?

A

component vs. factor extraction

number of factors to retain for rotation

orthogonal vs. oblique rotation

adequate sample size

3
Q

What is the best extraction method for an EFA?

A

PCA is the default extraction method in many statistical packages
factor analysis is preferable

PCA - data reduction method, no regard to latent variables
researchers rarely collect and analyse data without an a priori idea about how the variables are related

FA - the shared variance of a variable is partitioned from its unique variance and error variance to reveal the underlying factor structure

SPSS - six Factor Extraction Methods
(unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring)

if the data are relatively normally distributed, maximum likelihood is a good choice

4
Q

How many factors should be retained for rotation?

A

retain all factors with eigenvalues greater than 1.0 (Kaiser criterion)
→ among the least accurate methods for selecting the number of factors to retain

scree test
(examining the graph of the eigenvalues and looking for natural bend or break point in the data where the curve flattens out)

number of datapoints above the break is usually the number of factors to retain
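
A minimal sketch of the two retention rules above. The eigenvalues are made-up numbers for eight hypothetical items, not results from any study; the function names are illustrative. The sketch only counts eigenvalues above 1.0 and lists the drops that a scree plot would display. Judging where the curve bends remains a visual call.

```python
# Hypothetical eigenvalues of a correlation matrix for 8 items (illustrative only).
# For a correlation matrix, the eigenvalues sum to the number of items.
eigenvalues = [3.1, 2.0, 1.05, 0.6, 0.5, 0.35, 0.25, 0.15]


def kaiser_count(values, cutoff=1.0):
    """Number of factors retained by the eigenvalue-greater-than-1 rule."""
    return sum(1 for v in values if v > cutoff)


def variance_proportions(values):
    """Share of total variance tied to each eigenvalue."""
    total = sum(values)
    return [v / total for v in values]


def successive_drops(values):
    """Drop from each eigenvalue to the next; a scree plot shows these as slopes."""
    return [values[i] - values[i + 1] for i in range(len(values) - 1)]


n_kaiser = kaiser_count(eigenvalues)
proportions = variance_proportions(eigenvalues)
drops = successive_drops(eigenvalues)

print("retained by Kaiser rule:", n_kaiser)
print("variance proportions:", [round(p, 3) for p in proportions])
print("successive drops:", [round(d, 2) for d in drops])
```

Note that the third eigenvalue (1.05) passes the Kaiser rule by a hair, which is one reason the rule can retain factors that a scree plot would not clearly support.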

5
Q

What rotation technique should be used in an EFA?

A

rotation serves to simplify and clarify the data structure
it cannot improve the basic aspects of the analysis

varimax rotation, quartimax and equamax are orthogonal methods

direct oblimin, quartimin, and promax are oblique (allow for factor correlation)

oblique rotation is generally more appropriate because some correlation among factors is usually expected in the social sciences

SPSS output - rotated factor matrix (orthogonal rotation); pattern matrix and factor correlation matrix (oblique rotation)

manipulating delta or kappa values changes how much the factors are allowed to correlate
this appears to introduce unnecessary complexity

6
Q

What is the best sample size for an EFA?

A

published EFA research often uses relatively small samples (measured by subject-to-item ratio)

no strict rules
partly determined by nature of data

the stronger the data, the smaller the sample can be for an accurate analysis. “Strong data” in factor analysis means uniformly high communalities without cross loadings, plus several variables loading strongly on each factor.

7
Q

What are good results of an EFA?

A

item communalities are considered high if they are all .8 or greater
this is unlikely to occur in real data (more like .40 to .70)

.32 as minimum loading of an item, approx 10% of overlapping variance with other items

crossloading = item that loads at .32 or higher on two or more factors

a factor with fewer than three items is generally weak and unstable
5 or more strongly loading items (.50 or better) are desired
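
A small sketch of the arithmetic behind these rules of thumb. The loadings are invented for illustration, and the helper names are hypothetical. Two assumptions are made: the .32 minimum loading is also used as the cross-loading cutoff, and communality is computed as a sum of squared loadings, which only holds when the factors are uncorrelated.

```python
# Hypothetical loadings of four items on two factors (illustrative only).
loadings = {
    "item1": [0.72, 0.10],
    "item2": [0.65, 0.05],
    "item3": [0.40, 0.38],
    "item4": [0.08, 0.58],
}

MIN_LOADING = 0.32  # minimum loading named in the card above


def shared_variance(loading):
    """Squared loading: proportion of item variance shared with the factor."""
    return loading ** 2


def communality(row):
    """Sum of squared loadings; assumes orthogonal (uncorrelated) factors."""
    return sum(value ** 2 for value in row)


def is_crossloading(row, cutoff=MIN_LOADING):
    """True if the item reaches the cutoff on two or more factors."""
    return sum(1 for value in row if abs(value) >= cutoff) >= 2


# A loading of .32 corresponds to roughly 10% shared variance.
print(round(shared_variance(MIN_LOADING), 4))

for name, row in loadings.items():
    print(name, round(communality(row), 3), is_crossloading(row))
```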

8
Q

What did the study examining best practice EFA find?

A
  • Extraction Method: PCA resulted in significantly higher total variance accounted for, and item loadings were higher (PCA does not partition unique variance from shared variance)
    factor analysis accounted for an average of 59.8% of the variance and principal component analysis for 69.6%, an overestimation
  • Rotation Method: orthogonal factors
    oblique rotation is recommended
  • Sample Size: larger samples tended to produce solutions that were more accurate
    20% of samples at a 2:1 subject-to-item ratio produced correct solutions
    70% at 20:1
    the number of misclassified items was significantly affected by sample size
    almost two of thirteen items were misclassified in the smallest samples, just over one item in every two analyses in the largest samples

⇒ the norm: PCA with varimax and the Kaiser criterion

⇒ optimal: maximum likelihood, oblique rotation and scree plot plus multiple test runs for information on how meaningful factors might be in dataset

⇒ use large samples; even at 20:1 there are error rates above the .05 level

9
Q

What does the chi-square test say in an EFA?

A
  • As with CFA, the chi-square test of EFA effectively compares the model you have defined against the ‘ideal’ model for the data. As a result, you want the chi-square test to be non-significant (p>.05) as this will allow us to infer that our predefined model is not significantly different from the ideal model of our data.
  • Note: as the amount of data increases, chi-square is increasingly likely to be significant. Any interpretation of the model must therefore be based on more than the chi-square result, especially when the dataset is large.
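
As a side note on how this test is set up: for maximum likelihood EFA with p observed variables and m factors, the chi-square statistic is usually referred to a distribution with ((p - m)^2 - (p + m)) / 2 degrees of freedom. The sketch below only computes that number; the function name is hypothetical and the 13-item example is for illustration.

```python
def efa_degrees_of_freedom(n_items, n_factors):
    """Degrees of freedom for the ML EFA chi-square test.

    Uses df = ((p - m)^2 - (p + m)) / 2 with p items and m factors.
    The numerator is always even, so integer division is exact.
    """
    p, m = n_items, n_factors
    return ((p - m) ** 2 - (p + m)) // 2


# Example: 13 items with 2 or 3 factors.
df_two = efa_degrees_of_freedom(13, 2)
df_three = efa_degrees_of_freedom(13, 3)
print(df_two, df_three)
```

Extracting more factors uses up degrees of freedom, so a model with more factors has fewer left for the test.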
10
Q

What is the role of eigenvalues in an EFA?

A

eigenvalue greater than 1
-> the factor explains a substantial amount of the variance present in the data

11
Q

How should the scree plot technique be used?

A

The scree plot should be used in conjunction with the eigenvalues to determine how many factors to retain. The number of factors to retain is based on the number of factors that are above the ‘break’ (sometimes referred to as the ‘elbow’) in the scree plot. Do not include the factor at the break; only count those before the break/elbow.

12
Q

What can be said about factor loadings?

A
  • As with CFA, the strength of the relationship between the observed and the latent factor variables is expressed in terms of a factor loading regression coefficient.
  • An observed variable is said to ‘load sufficiently’ onto a factor, that is, it has a strong enough relationship to be considered a true part of the factor, if the factor loading is more extreme than +/- 0.4.
13
Q

What is the common factor model by Thurstone (1947)?

A

factors influence observed variables, accounting for their variation and covariation

factor analysis = fitting models to bivariate associations among observed variables

linear regression model

$y = \Lambda\eta + \varepsilon$

$y$ = observed variables (items)
$\Lambda$ = P × M matrix of factor loadings (P observed variables, M common factors)
$\eta$ = common factors
$\varepsilon$ = unique factors
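
One consequence of this model, written out as a sketch: if the common factors and the unique factors are uncorrelated, the model implies a particular structure for the covariance matrix of the observed variables. The symbols $\Sigma$, $\Psi$ and $\Theta$ are conventional labels assumed here, not taken from the card.

```latex
\Sigma = \Lambda \Psi \Lambda^{\top} + \Theta
```

Here $\Sigma$ is the P × P covariance matrix of the observed variables, $\Lambda$ is the P × M loading matrix, $\Psi$ is the M × M covariance matrix of the common factors, and $\Theta$ is the covariance matrix of the unique factors (usually taken to be diagonal). This is why factor analysis can work from the observed covariances alone.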

14
Q

What is the primary computational challenge in an FA?

A

estimating the regression coefficients (factor loadings)

the predictor variables (factors) are latent/unobserved

however, the covariances among the observed variables Y can be analysed
-> reverse-engineering
their interrelatedness leads to inference about the underlying factor structure

15
Q

What is the statistical distinction between EFA and CFA?

A

EFA - free

all elements of the factor loading matrix are freely estimated
M (the number of factors) is determined from the data
no a priori hypothesis about M

CFA - restricted

many elements of the factor loading matrix are fixed to zero
no cross-loadings
independent clusters solution
rotation does not apply
strong a priori hypothesis about M

16
Q

What is rotational indeterminacy?

A

there is an infinite number of factor patterns (rotations) that fit the observed data equally well
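
A minimal numerical sketch of this point, assuming two orthogonal factors and made-up loadings. Rotating the loading matrix by any angle changes the individual loadings but leaves the product of the loading matrix with its transpose (the part of the correlations explained by the factors) unchanged. The helper names are illustrative.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rotation_matrix(theta):
    """2 x 2 orthogonal rotation by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def reproduced(lam):
    """Loading matrix times its transpose: the factor-explained part."""
    return matmul(lam, transpose(lam))


# Hypothetical loadings of four items on two factors (illustrative only).
loadings = [[0.7, 0.1], [0.6, 0.2], [0.1, 0.8], [0.2, 0.6]]

rotated = matmul(loadings, rotation_matrix(math.radians(30)))

fit_original = reproduced(loadings)
fit_rotated = reproduced(rotated)

max_diff = max(
    abs(fit_original[i][j] - fit_rotated[i][j])
    for i in range(len(loadings))
    for j in range(len(loadings))
)

print("loadings changed:", rotated[0] != loadings[0])
print("largest difference in reproduced values:", max_diff)
```

Because every angle gives the same fit, the data alone cannot pick a rotation; a criterion such as simple structure has to do that.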

17
Q

What is row parsimony and column parsimony?

A

row parsimony
(the extent to which each variable loads strongly on one factor and near-zero on other factors)

column parsimony
(the extent to which each factor has large loadings for some variables and small loadings for others)

18
Q

How are the factors analysed?

A

FA frequently involves analysis of individual items from a single test or questionnaire rather than analysis of total scores from multiple tests or subscales

items - binary or ordered categorical variables

Just as a nonlinear procedure such as logistic regression is preferable to ordinary linear regression for a categorical outcome, a categorical variable methodology is often preferable for factor analyses of items.

The potential consequences of factor analysing items as continuous variables include incorrect conclusions about the number of factors (or model fit more generally), biased factor loadings and interfactor correlations, and biased parameter standard errors, leading to incorrect significance tests and confidence intervals.

19
Q

What are polychoric correlations?

A

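estimate the correlation between two continuous, normally distributed latent variables that are assumed to underlie a pair of observed ordered categorical variables

used in place of Pearson correlations when factor analysing binary or ordinal items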
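
A simulation sketch of why this matters, limited to the simplest case: two binary items created by splitting normally distributed latent variables at zero (the tetrachoric special case). The Pearson correlation of the binary items is attenuated, and for this specific setup the latent correlation can be recovered as sin(π · r / 2). Real polychoric estimation handles more categories and estimated thresholds, which this sketch does not attempt. The latent correlation of .6, the sample size and the seed are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_latent(rho, n, seed):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys


latent_x, latent_y = simulate_latent(rho=0.6, n=20000, seed=1)

# Observed binary items: 1 if the latent value is above the threshold of zero.
item_x = [1 if v > 0 else 0 for v in latent_x]
item_y = [1 if v > 0 else 0 for v in latent_y]

r_latent = pearson(latent_x, latent_y)
r_items = pearson(item_x, item_y)

# Valid only for two binary items with both thresholds at zero.
r_recovered = math.sin(math.pi * r_items / 2)

print("latent correlation:", round(r_latent, 3))
print("Pearson on binary items:", round(r_items, 3))
print("recovered latent correlation:", round(r_recovered, 3))
```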

20
Q

How should FA be applied to psychological research?

A

psychometric development and validation of individual scales

long history of contributing to theories of intelligence and structures of personality

21
Q

When should FA be used?

A

if validity evidence for new scales is needed

to check that the internal structure (the dimensionality) of the items is consistent with expectations regarding the constructs that the scale is intended to measure

EFA is preferable in the beginning because there might be unanticipated but meaningful factors influencing items (cross-loadings)

22
Q

Why does FA not necessarily provide evidence for construct validity?

A

use of factor analysis is often presented as providing evidence for:
construct validity

substantive = theory of the construct, developing a pool of test items which addresses all theoretical aspects
structural = items' empirical relations are consistent with theoretical aspects
external = related to other constructs? measures, outcomes, …

factor analysis is squarely in structural phase of construct validity

does not give sufficient information for removing an item
removing an item might be harmful to content validity

influenced by sampling error

23
Q

Should I do a FA in my study?

A

scale validation is an ongoing process; all empirical studies using a given scale contribute evidence for the validity of the use of that scale.

criterion validity is provided if a new study reports an association between the scale and an important outcome variable

differences across populations (meaning of significance)

reevaluate properties of a scale
→ measurement invariance?

24
Q

How can you determine if you should use a CFA or an EFA?

A

depends on the particularities of the research scenario

rooted in common factor model

Brown (2015)
CFA requires a strong empirical or conceptual foundation to guide the specification and evaluation of a factor model. Accordingly, CFA is typically used in later phases of scale development or construct validation after the underlying structure has been tentatively established by prior empirical analyses using EFA, as well as on theoretical grounds. (p. 41)

EFA:

Determine the nature of and the number of latent variables that account for observed variation and covariation between a set of observed indicators, without an a priori model

Latent factors are interpreted as the causes of observed values

Requires a structured covariance matrix

25
Q

How would one proceed if no CFA model adequately fits the data?

A

researchers may conclude that the hypotheses or expectations behind the model were not supported, acknowledging the nature of science.

-> move to exploratory mode

-> revising CFA
(problematic, needs strong rationale)
For instance, if a one-factor model is rejected due to positively and negatively worded items, researchers may consider freeing error covariances among negatively worded items for improved fit.

-> stick to traditional EFA

26
Q

What are the main points when replicating an EFA with a CFA?

A

do not use the same dataset

new decisions need to be made about method

CFA might not fit adequately

return to EFA

27
Q

What are common mistakes with EFA?

A
  • mistaking PCA for FA
  • using the Kaiser criterion
  • not using oblique rotation
  • neglecting categorical nature of items
  • naming fallacy
  • defaults in popular software
  • overcoming those with alternative software
28
Q

What are common mistakes with CFA?

A
  • misunderstanding model fit
  • limitations of norm statistics
  • multiple well-fitting models
  • overreliance on good fit
  • problems with model revisions
  • exploratory nature of post hoc model modification
29
Q

How would an EFA be conducted in JASP?

A
  1. no need to name variables
  2. move all variables into variables box
  3. eigenvalues
    above 0
  4. rotation
    oblique - oblimin
    principal axis factoring
  5. output options
    scree plot
  6. chi-square test → nonsignificant
  7. scree plot → where is the elbow
  8. remove cross-loadings or the removed factor items