9. Scales creation and EFA Flashcards
What is important in EFA and name the steps
- Exploratory factor analysis is a crucial tool
FOR CONSTRUCT CREATION - the selection of appropriate items/questions
- DEFINE WHAT YOU WANT TO MEASURE
- GENERATE QUESTIONS = ITEMS
- DECIDE WHAT TYPE OF MEASURE IT IS
- COLLECT DATA
- EFA
Measurement - dimensionality
Construct dimensionality
* Not every construct can be captured by one dimension; many have distinguishable facets that, combined in a composite, measure the original construct of interest
* Firm performance: operational excellence, customer relationships, revenue growth
* The sub-dimensions can be (sub)constructs themselves!
HOW DO WE DECIDE?
Structure of the entire construct and the relationships between the dimensions
* Are the sub-dimensions manifestations of the overall construct, or defining characteristics of it?
* Does the construct exist separately, on a deeper level than its sub-dimensions?
* Is change in the overall construct associated with change in all of the sub-dimensions, or is it possible that it is associated with only one/a few of them?
What is EFA?
EFA is a statistical method used to uncover the underlying structure of a relatively large set of data
- Mathematically, a factor is a linear combination of the observed variables
- It is an essentially exploratory technique: there is no single objectively best/most correct solution
- It is the researcher who needs to make the final decision on which rotation is the final one, and therefore how the factors will be defined and interpreted
- Factor loadings (the strength of association between an observed variable and the factor) help to define the factor, give it meaning, and name the pattern we have uncovered
- Interpretability and stability of the factors are therefore how we judge the final “quality” of the analysis
What does EFA do?
Exploratory factor analysis is a statistical technique used to reduce data to a smaller set of summary variables and to explore the underlying theoretical structure of a phenomenon. It is used to identify the structure of the relationships among the variables in respondents' answers.
Repackage the correlation matrix, through eigenvalues/eigenvectors, into factor loadings
* An eigenvalue is the variance collapsed from above and below the diagonal of the correlation matrix
* Through eigenvalues and eigenvectors we calculate factor loadings
* Factor loadings connect invisible factors to visible questions = items
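The repackaging step can be sketched in a few lines of NumPy. The 4-item correlation matrix below is a made-up toy example (not real survey data): the matrix is eigendecomposed, and the unrotated loadings are the eigenvectors scaled by the square roots of the eigenvalues.

```python
import numpy as np

# Hypothetical correlation matrix for 4 survey items: items 1-2 and
# items 3-4 form two correlated clusters (toy numbers for illustration).
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

# Eigendecomposition (eigh, since R is symmetric), sorted so the
# factor explaining the most variance comes first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated loadings: each eigenvector scaled by sqrt(its eigenvalue).
loadings = eigvecs * np.sqrt(eigvals)

print(eigvals)          # variance "repackaged" into each factor
print(loadings[:, :2])  # loadings of the 4 items on the first 2 factors
```

If all factors are kept, `loadings @ loadings.T` reproduces `R` exactly, which is the PCA-side claim made later in these notes; dropping the small factors leaves only an approximation.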
- Imagine each dot is a specific question in your survey = item
- A factor is the actual variable you want to use later in your analysis
- A factor (line) tries to be positioned so it has strong relationships with clusters of items
- This is difficult to do if there are no clear clusters (unsuitable data; you did the survey wrong)
- Factors can be correlated but should be distinct (they cannot all be tangled together)
- Usually you have a plan/idea about how many factors are in your data (if a new factor pops up, it could be a sign of a mistake in the planning/surveying phase of your research)
Procedure of EFA
- GET DATA AND CHECK ASSUMPTIONS: check the assumptions and the correlation matrix of the items we have selected
- EXTRACT FACTORS: the core task is to repackage the correlation matrix and find the factors and their association with each variable (the factor loadings)
- Initially we have as many factors as variables
- We extract only the interesting factors and force all items to be expressed through them
- ROTATE: rotate the factors to increase interpretability; we want to find which items belong to which factor (where the highest factor loadings are), then delete variables that had low association with the retained factors
- INTERPRET: interpret the results and save the factors for future analysis; once we end up with a clear structure (no overlap between factors), we interpret the factors (name the underlying pattern) and save the factor scores for future analysis
Performing EFA - prepare data
Sample size and study design
* Factor should be over-determined- at least 3-5 items for each factor
Sample size and missing data/suitability of the data
* Rules of thumb: 5-10 observations per item, but these are not set in stone
* The required sample size depends on whether the correlations are expected to be strong in the population and whether we expect the factors to be strong and independent; if so, a smaller sample will be OK (specific cut-off points in Fabrigar et al., 1999)
* 100 obs. and fewer: only well-determined factors
* 100-200 obs.: well-determined factors with lower common correlation between variables
* 300 obs.: medium level of common correlation
* 500 obs.: low level of correlations and a lot of poorly defined factors
Performing EFA - prepare
Check assumptions and factorability of R
Normality/case and variable outliers
* Not a necessity, especially if we simply want to reduce data, but it enhances the solution
* Normality tests in SPSS are quite sensitive for large samples: Kolmogorov-Smirnov (>2000 obs.), Shapiro-Wilk (<2000 obs.)
* Values of skewness and kurtosis should be close to 0 (+/- 2)
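The skewness/kurtosis rule of thumb is easy to check by hand. A minimal NumPy sketch, computing both standardized moments directly (the variable `item` is simulated data, not an actual survey item):

```python
import numpy as np

def skewness(x):
    """Sample skewness: third standardized moment (0 for symmetric data)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def excess_kurtosis(x):
    """Excess kurtosis: fourth standardized moment minus 3 (0 for a normal)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(42)
item = rng.normal(size=5000)   # simulated, roughly normal "survey item"
print(skewness(item), excess_kurtosis(item))  # both close to 0, well inside +/- 2
```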
Outliers
* Visual tools- histograms, q-q plots, box plots
* Variables with low squared multiple correlation and low correlation with the factors are outliers and should be removed (you’ll find them during the analysis)
Linearity, multicollinearity but NOT singularity
* Correlations (our starting point) measure linear relationships, so we should have only variables/items that relate to each other in a linear way; even very strong correlation (multicollinearity) is fine
* Singularity (i.e. extreme multicollinearity) should be avoided
Factorability of R
* Check the correlation matrix for correlations higher than 0.3
* In linear algebra, the matrix is singular if and only if its determinant is zero
* There should be patterns of correlations- so more than a pair of variables
correlated- check partial correlation matrices
* SPSS- anti-image correlation matrix- to check the partial correlations
between variables
* Bartlett’s test of sphericity tests whether the correlations in the correlation matrix are zero; it is too sensitive in large samples
* Kaiser’s measure of sampling adequacy (KMO): the ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial correlations, SSC/(SSC + SSPartialC), so if the partial correlations are small the ratio approaches 1
* Values above 0.6 are acceptable
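The KMO logic can be made concrete with a small NumPy sketch. The correlation matrix `R` is a hypothetical example; the partial correlations are derived from the inverse of R, which also lets us check singularity via the determinant:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure computed from a correlation matrix R.

    Partial correlations come from the inverse of R:
        p_ij = -inv(R)_ij / sqrt(inv(R)_ii * inv(R)_jj)
    KMO = SSC / (SSC + SSPartialC) over the off-diagonal elements,
    so small partial correlations push the ratio towards 1.
    """
    R = np.asarray(R, dtype=float)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d                    # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)   # mask for off-diagonal entries
    ssc = np.sum(R[off] ** 2)
    ssp = np.sum(partial[off] ** 2)
    return ssc / (ssc + ssp)

# Hypothetical correlation matrix of 4 items with two correlated pairs.
R = np.array([
    [1.0, 0.7, 0.3, 0.3],
    [0.7, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
])
print(np.linalg.det(R))  # determinant well away from zero: R is not singular
print(kmo(R))            # values above 0.6 are considered acceptable
```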
Performing EFA - Extract
Factor-extraction procedures (available in SPSS): different methods to estimate the parameters of the factor model, i.e. the factor loadings and unique variances
* Principal components: tries to explain all the variance in the data, without separating what items have in common from what is unique and error
* Principal axis factoring: estimates communalities (the squared multiple correlations of the items) and tries to explain just this shared variance with the factors
Crucial decision: Difference between PCA and EFA?
- Mathematically the difference is in the positive diagonal of the correlation matrix that will be analysed: for PCA it is 1, since we are interested in capturing all the variance in the data; in EFA we are interested only in capturing the variance that is shared among the variables
- In PCA each variable adds equally to the variance studied; if we retain all components of PCA we can reproduce the correlation matrix of the real data backwards perfectly
- In EFA the positive diagonal of the correlation matrix therefore holds communalities (the squared multiple correlation of each variable); in SPSS you can see them as initial communalities
- All communalities add up to the variance that is studied by EFA, which is less than the total variance of the data; therefore the factors do not retain the totality of the data and can reproduce back only an approximation of the correlation matrix
Communalities
- Initial communalities in EFA are the squared multiple correlations of each variable (its shared variance with the other variables); in PCA this is not calculated, it is set to 1 (100%), i.e. all the variance of the variable
- Extracted communalities (in EFA and PCA): the sum of squared loadings for a variable across all extracted factors (components); they show how much variance of that particular variable has been explained by the extracted factors (components)
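The extracted-communality definition is easy to verify numerically. The loading matrix below is a hypothetical toy example (4 items on 2 extracted factors):

```python
import numpy as np

# Hypothetical loading matrix: 4 items x 2 extracted factors.
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.10, 0.70],
    [0.15, 0.65],
])

# Extracted communality = sum of squared loadings across the factors
# (row-wise): the share of each item's variance that the retained
# factors explain.
communalities = np.sum(loadings ** 2, axis=1)
print(communalities)   # item 1: 0.80^2 + 0.10^2 = 0.65
```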
Crucial decision - number of factors
Selecting number of factors
* Balancing the need for parsimony with the need for accuracy- essentially we should retain factors until
additional factors account for a very small amount of variance, their added value is minimal
* Underfactoring (not extracting enough factors) usually introduces more problems than overfactoring
* When overfactoring, the main factors remain stable over several rotations, so the smaller unstable factors can
be removed
Rules of thumb?
* Kaiser criterion- we should keep all the factors that have eigenvalues higher than 1
* Scree plot- eigenvalues plotted against factors, shows how much variance is explained in each factor, usually
ordered in descending order, so we are looking for the turning point when the slope changes
* Variance explained: the retained factors together should explain about 60% of the variance or more
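All three rules of thumb can be checked mechanically from the eigenvalues. The values below are invented for illustration (a hypothetical 6-item analysis):

```python
import numpy as np

# Eigenvalues of a hypothetical 6-item correlation matrix, in
# descending order (toy numbers; they sum to 6 = number of items).
eigvals = np.array([2.9, 1.6, 0.6, 0.4, 0.3, 0.2])

# Kaiser criterion: keep factors with eigenvalue > 1.
n_kaiser = int(np.sum(eigvals > 1))

# Variance explained: each eigenvalue divided by the number of items
# is that factor's share of the total variance.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

print(n_kaiser)        # 2 factors retained by the Kaiser criterion
print(cumulative[1])   # the first two factors explain 0.75 of the variance
```

A scree plot is just `eigvals` plotted against the factor index; the "elbow" here sits between the second and third eigenvalue, agreeing with the other two rules.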
Crucial decision - rotation
- For any given solution with two or more factors, there exists an infinite number of alternative orientations of the factors in multidimensional space
- Simple structure
- Factor loadings are coordinates marking the position of the variable in relation to the factor; they therefore change during rotation
- How much variance the factors explain does not change, though
- The point of rotation is to change the factor loadings so that they fulfil the criteria of simple structure
Performing rotation
Orthogonal
* Varimax: simplifies the columns of the loading matrix by maximizing the variance of the loadings on each factor; the spread of loadings within a factor should be as wide as possible, so high loadings after extraction become higher after rotation and vice versa
* Quartimax- simplify rows of loading matrix by maximizing variance of loadings within variables
* Equimax- combination of the above that tries to simplify both rows and columns of the loading matrix
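For intuition, here is a compact sketch of the standard SVD-based varimax algorithm in NumPy (a simplified illustration, not the exact routine SPSS runs). Because the rotation is orthogonal, each item's communality is unchanged; only the distribution of loadings across factors changes:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax: orthogonal rotation maximizing the variance of squared
    loadings within each factor (column). Standard SVD-based algorithm."""
    L = np.asarray(loadings, dtype=float)
    n, k = L.shape
    R = np.eye(k)          # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        B = L @ R
        # Gradient of the varimax criterion w.r.t. the rotation
        G = L.T @ (B ** 3 - B @ np.diag(np.sum(B ** 2, axis=0)) / n)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R

# Toy unrotated loadings where both factors load on every item.
A = np.array([
    [0.6,  0.6],
    [0.6,  0.5],
    [0.6, -0.6],
    [0.5, -0.6],
])
rotated = varimax(A)
print(np.round(rotated, 2))  # each item now loads high on one factor, near 0 on the other
```

The example rotates the axes roughly 45 degrees so that items 1-2 align with one factor and items 3-4 with the other, which is exactly the "simple structure" the notes describe.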
Oblique rotation
* Oblimin: minimizes the cross-products of loadings; you can choose the level of correlation between the factors by choosing the level of Delta
* Promax: an orthogonal solution is rotated again to allow correlations among the factors; the orthogonal loadings are raised to powers (2, 3, 6)
* It is not guaranteed that the factors are correlated; although you allow for oblique rotation, if there is little correlation between the factors the solution will be very similar to an orthogonal rotation
- Rotations create new matrices
- Orthogonal rotation: interpret the rotated factor matrix
- Oblique rotation: instead of a rotated factor matrix we get a structure matrix, which is the product of the pattern matrix and the factor correlation matrix
- The pattern matrix represents the “clean” amount of unique variation that the factor explains for each variable
- The structure matrix includes the shared variance caused by the overlap between factors
- Interpret the pattern matrix because it is easier, and consider variables with loadings higher than 0.32 (best would actually be higher than 0.71)
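The relationship between the pattern and structure matrices is a single matrix product. The numbers below are a hypothetical oblique solution (4 items, 2 factors correlated at 0.4):

```python
import numpy as np

# Hypothetical pattern matrix: the "clean" unique contribution of each
# factor to each item, plus the factor correlation matrix phi.
pattern = np.array([
    [0.80, 0.05],
    [0.75, 0.00],
    [0.05, 0.70],
    [0.00, 0.72],
])
phi = np.array([
    [1.0, 0.4],
    [0.4, 1.0],
])

# Structure matrix = pattern @ phi: it mixes the shared variance from
# the factor overlap back in, so its loadings are inflated.
structure = pattern @ phi
print(np.round(structure, 2))

# Interpretation step: keep items whose pattern loading exceeds 0.32.
keep = np.abs(pattern) > 0.32
print(keep)
```

Note how item 1's structure loading on factor 2 is 0.37 even though its pattern loading is only 0.05: the difference is entirely the shared variance from the 0.4 factor correlation, which is why the pattern matrix is the cleaner one to interpret.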
Most crucial - interpretation
- Pattern matrix- clean association between factors and items
- Look at the items that have the highest association
- Marker item- higher than 0.8
- Interpret the results- give factors name
- Decide on the structure
Inference
Highest loadings:
* Factor 2: “I am immersed in my work.” (0.671)
* Absorption, as part of the engagement scale by Schaufeli et al. (2006)
* Factor 1: “There are lots of times when my job drives me up the wall.” (0.807)
* Anxiety, as part of work stress by Parker and DeCotiis (1983)
* Factor 3: “We have enough time for our work” (0.811)
* All the other items negative: time pressure, as part of work stress by Parker and DeCotiis (1983)
* The correlation between factors 1 and 3 indicates that they are part of the same underlying construct: work stress
What if formative construct?
- If we believed work stress to be a formative construct, we could use the EFA to see what the components within the index are; in this case they would be time pressure at work and anxiety
- We decide: a formative index composed of two reflective scales?
- If we decide that pressure and anxiety add up to a formative overall measure (work stress):
- The items related to having time off are connected with anxiety; they should be removed, and two items from anxiety should be removed as well, to reduce the overlap between the two as much as possible
- The levels of anxiety and time pressure can then be added up to create the stress index
- If work stress were reflective, feelings of work pressure and anxiety are just reflections of being stressed, and the overlap between them is actually welcome