9. Scales creation and EFA Flashcards
What is important in EFA and name the steps
- Exploratory factor analysis is a crucial tool
FOR CONSTRUCT CREATION - the selection of appropriate items/questions
- DEFINE WHAT YOU WANT TO MEASURE
- GENERATE QUESTIONS = ITEMS
- DECIDE WHAT TYPE OF MEASURE IT IS
- COLLECT DATA
- EFA
Measurement - dimensionality
Construct dimensionality
* Not every construct can be captured by one dimension; many have distinguishable facets that, combined in a composite, measure the original construct of interest
* Firm performance: operational excellence, customer relationships, revenue growth
* The sub-dimensions can be (sub)constructs themselves!
HOW DO WE DECIDE?
Structure of the entire construct and the relationships between the dimensions
* Are the sub-dimensions manifestations of the overall construct, or defining characteristics of it?
* Does the construct exist separately, on a deeper level than its sub-dimensions?
* Is change in the overall construct associated with change in all of the sub-dimensions, or is it possible that it is associated with only one/a few of them?
What is EFA?
EFA is a statistical method used to uncover the underlying structure of a relatively large set of data
- Mathematically, a factor is a linear combination of the observed variables
- It is an essentially exploratory technique: there is no single objectively best/most correct solution
- It is the researcher who needs to make the final decision on which rotation is the final one, and therefore how the factors will be defined and interpreted
- Factor loadings (the strength of association between an observed variable and the factor) help to define the factor, give it meaning, and name the pattern we have uncovered
- Interpretability and stability of the factors are therefore how we judge the final “quality” of the analysis
What does EFA do?
Exploratory factor analysis is a statistical technique used to reduce data to a smaller set of summary variables and to explore the underlying theoretical structure of a phenomenon. It is used to identify the structure of the relationships among the variables in respondents' answers.
Repackage the correlation matrix, through eigenvalues/eigenvectors, into factor loadings
* An eigenvalue is the variance collapsed from above and below the diagonal of the correlation matrix
* Through eigenvalues and eigenvectors we calculate factor loadings
* Factor loadings connect invisible factors to visible questions = items
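The repackaging step can be sketched in a few lines of NumPy. The 4-item correlation matrix below is a made-up toy example (not real survey data): the matrix is eigendecomposed, and the unrotated loadings are the eigenvectors scaled by the square roots of the eigenvalues.

```python
import numpy as np

# Hypothetical correlation matrix for 4 survey items: items 1-2 and
# items 3-4 form two correlated clusters (toy numbers for illustration).
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

# Eigendecomposition (eigh, since R is symmetric), sorted so the
# factor explaining the most variance comes first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated loadings: each eigenvector scaled by sqrt(its eigenvalue).
loadings = eigvecs * np.sqrt(eigvals)

print(eigvals)          # variance "repackaged" into each factor
print(loadings[:, :2])  # loadings of the 4 items on the first 2 factors
```

If all factors are kept, `loadings @ loadings.T` reproduces `R` exactly, which is the PCA-side claim made later in these notes; dropping the small factors leaves only an approximation.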
- Imagine each dot is a specific question in your survey = item
- A factor is the actual variable you want to use later in your analysis
- A factor (line) tries to be positioned so it has strong relationships with clusters of items
- This is difficult to do if there are no clear clusters (unsuitable data; you did the survey wrong)
- Factors can be correlated but should be distinct (they cannot all be tangled together)
- Usually you have a plan/idea about how many factors are in your data (if a new factor pops up, it could be a sign of a mistake in the planning/surveying phase of your research)
Procedure of EFA
- GET DATA AND CHECK ASSUMPTIONS: check the assumptions and the correlation matrix of the items we have selected
- EXTRACT FACTORS: the core task is to repackage the correlation matrix and find the factors and their association with each variable (the factor loadings)
- Initially we have as many factors as variables
- We extract only the interesting factors and force all items to be expressed through them
- ROTATE: rotate the factors to increase interpretability; we want to find which items belong to which factor (where the highest factor loadings are), then delete variables that had low association with the retained factors
- INTERPRET: interpret the results and save the factors for future analysis; once we end up with a clear structure (no overlap between factors), we interpret the factors (name the underlying pattern) and save the factor scores for future analysis
Performing EFA - prepare data
Sample size and study design
* Factor should be over-determined- at least 3-5 items for each factor
Sample size and missing data/suitability of the data
* Rules of thumb: 5-10 observations per item, but these are not set in stone
* The required sample size depends on whether the correlations are expected to be strong in the population and whether we expect the factors to be strong and independent; if so, a smaller sample will be OK (specific cut-off points in Fabrigar et al., 1999)
* 100 obs. and fewer: only well-determined factors
* 100-200 obs.: well-determined factors with lower common correlation between variables
* 300 obs.: medium level of common correlation
* 500 obs.: low level of correlations and a lot of poorly defined factors
Performing EFA - prepare
Check assumptions and factorability of R
Normality/case and variable outliers
* Not a necessity, especially if we simply want to reduce data, but it enhances the solution
* Normality tests in SPSS are quite sensitive for large samples: Kolmogorov-Smirnov (>2000 obs.), Shapiro-Wilk (<2000 obs.)
* Values of skewness and kurtosis should be close to 0 (+/- 2)
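The skewness/kurtosis rule of thumb is easy to check by hand. A minimal NumPy sketch, computing both standardized moments directly (the variable `item` is simulated data, not an actual survey item):

```python
import numpy as np

def skewness(x):
    """Sample skewness: third standardized moment (0 for symmetric data)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def excess_kurtosis(x):
    """Excess kurtosis: fourth standardized moment minus 3 (0 for a normal)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(42)
item = rng.normal(size=5000)   # simulated, roughly normal "survey item"
print(skewness(item), excess_kurtosis(item))  # both close to 0, well inside +/- 2
```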
Outliers
* Visual tools- histograms, q-q plots, box plots
* Variables with low squared multiple correlation and low correlation with the factors are outliers and should be removed (you’ll find them during the analysis)
Linearity, multicollinearity but NOT singularity
* Correlations (our starting point) measure linear relationships, so we should have only variables/items that relate to each other in a linear way; even very strong correlation (multicollinearity) is fine
* Singularity (i.e. extreme multicollinearity) should be avoided
Factorability of R
* Check the correlation matrix for correlations higher than 0.3
* In linear algebra, the matrix is singular if and only if its determinant is zero
* There should be patterns of correlations- so more than a pair of variables
correlated- check partial correlation matrices
* SPSS- anti-image correlation matrix- to check the partial correlations
between variables
* Bartlett’s test of sphericity tests whether the correlations in the correlation matrix are zero; it is too sensitive in large samples
* Kaiser’s measure of sampling adequacy (KMO): the ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial correlations, SSC/(SSC + SSPartialC), so if the partial correlations are small the ratio approaches 1
* Values above 0.6 are acceptable
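The KMO logic can be made concrete with a small NumPy sketch. The correlation matrix `R` is a hypothetical example; the partial correlations are derived from the inverse of R, which also lets us check singularity via the determinant:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure computed from a correlation matrix R.

    Partial correlations come from the inverse of R:
        p_ij = -inv(R)_ij / sqrt(inv(R)_ii * inv(R)_jj)
    KMO = SSC / (SSC + SSPartialC) over the off-diagonal elements,
    so small partial correlations push the ratio towards 1.
    """
    R = np.asarray(R, dtype=float)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d                    # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)   # mask for off-diagonal entries
    ssc = np.sum(R[off] ** 2)
    ssp = np.sum(partial[off] ** 2)
    return ssc / (ssc + ssp)

# Hypothetical correlation matrix of 4 items with two correlated pairs.
R = np.array([
    [1.0, 0.7, 0.3, 0.3],
    [0.7, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
])
print(np.linalg.det(R))  # determinant well away from zero: R is not singular
print(kmo(R))            # values above 0.6 are considered acceptable
```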
Performing EFA - Extract
Factor-extraction procedures (available in SPSS): different methods to estimate the parameters of the factor model, i.e. the factor loadings and unique variances
* Principal components: tries to explain all the variance in the data, without separating what items have in common from what is unique and error
* Principal axis factoring: estimates communalities (the squared multiple correlations of the items) and tries to explain just this shared variance with the factors
Crucial decision: Difference between PCA and EFA?
- Mathematically the difference is in the positive diagonal of the correlation matrix that will be analysed: for PCA it is 1, since we are interested in capturing all the variance in the data; in EFA we are interested only in capturing the variance that is shared among the variables
- In PCA each variable adds equally to the variance studied; if we retain all components of PCA we can reproduce the correlation matrix of the real data backwards perfectly
- In EFA the positive diagonal of the correlation matrix therefore holds communalities (the squared multiple correlation of each variable); in SPSS you can see them as initial communalities
- All communalities add up to the variance that is studied by EFA, which is less than the total variance of the data; therefore the factors do not retain the totality of the data and can reproduce back only an approximation of the correlation matrix
Communalities
- Initial communalities in EFA are the squared multiple correlations of each variable (its shared variance with the other variables); in PCA this is not calculated, it is set to 1 (100%), i.e. all the variance of the variable
- Extracted communalities (in EFA and PCA): the sum of squared loadings for a variable across all extracted factors (components); they show how much variance of that particular variable has been explained by the extracted factors (components)
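The extracted-communality definition is easy to verify numerically. The loading matrix below is a hypothetical toy example (4 items on 2 extracted factors):

```python
import numpy as np

# Hypothetical loading matrix: 4 items x 2 extracted factors.
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.10, 0.70],
    [0.15, 0.65],
])

# Extracted communality = sum of squared loadings across the factors
# (row-wise): the share of each item's variance that the retained
# factors explain.
communalities = np.sum(loadings ** 2, axis=1)
print(communalities)   # item 1: 0.80^2 + 0.10^2 = 0.65
```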
Crucial decision - number of factors
Selecting number of factors
* Balancing the need for parsimony with the need for accuracy- essentially we should retain factors until
additional factors account for a very small amount of variance, their added value is minimal
* Underfactoring (not extracting enough factors) usually introduces more problems than overfactoring
* When overfactoring, the main factors remain stable over several rotations, so the smaller unstable factors can
be removed
Rules of thumb?
* Kaiser criterion- we should keep all the factors that have eigenvalues higher than 1
* Scree plot- eigenvalues plotted against factors, shows how much variance is explained in each factor, usually
ordered in descending order, so we are looking for the turning point when the slope changes
* Variance explained: the retained factors together should explain about 60% of the variance or more
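All three rules of thumb can be checked mechanically from the eigenvalues. The values below are invented for illustration (a hypothetical 6-item analysis):

```python
import numpy as np

# Eigenvalues of a hypothetical 6-item correlation matrix, in
# descending order (toy numbers; they sum to 6 = number of items).
eigvals = np.array([2.9, 1.6, 0.6, 0.4, 0.3, 0.2])

# Kaiser criterion: keep factors with eigenvalue > 1.
n_kaiser = int(np.sum(eigvals > 1))

# Variance explained: each eigenvalue divided by the number of items
# is that factor's share of the total variance.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

print(n_kaiser)        # 2 factors retained by the Kaiser criterion
print(cumulative[1])   # the first two factors explain 0.75 of the variance
```

A scree plot is just `eigvals` plotted against the factor index; the "elbow" here sits between the second and third eigenvalue, agreeing with the other two rules.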
Crucial decision - rotation
- For any given solution with two or more factors, there exists an infinite number of alternative orientations of the factors in multidimensional space
- Simple structure
- Factor loadings are coordinates marking the position of the variable in relation to the factor; they therefore change during rotation
- How much variance the factors explain does not change, though
- The point of rotation is to change the factor loadings so that they fulfil the criteria of simple structure
Performing rotation
Orthogonal
* Varimax: simplifies the columns of the loading matrix by maximizing the variance of the loadings on each factor; the spread of loadings within a factor should be as wide as possible, so high loadings after extraction become higher after rotation and vice versa
* Quartimax- simplify rows of loading matrix by maximizing variance of loadings within variables
* Equimax- combination of the above that tries to simplify both rows and columns of the loading matrix
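For intuition, here is a compact sketch of the standard SVD-based varimax algorithm in NumPy (a simplified illustration, not the exact routine SPSS runs). Because the rotation is orthogonal, each item's communality is unchanged; only the distribution of loadings across factors changes:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax: orthogonal rotation maximizing the variance of squared
    loadings within each factor (column). Standard SVD-based algorithm."""
    L = np.asarray(loadings, dtype=float)
    n, k = L.shape
    R = np.eye(k)          # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        B = L @ R
        # Gradient of the varimax criterion w.r.t. the rotation
        G = L.T @ (B ** 3 - B @ np.diag(np.sum(B ** 2, axis=0)) / n)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R

# Toy unrotated loadings where both factors load on every item.
A = np.array([
    [0.6,  0.6],
    [0.6,  0.5],
    [0.6, -0.6],
    [0.5, -0.6],
])
rotated = varimax(A)
print(np.round(rotated, 2))  # each item now loads high on one factor, near 0 on the other
```

The example rotates the axes roughly 45 degrees so that items 1-2 align with one factor and items 3-4 with the other, which is exactly the "simple structure" the notes describe.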
Oblique rotation
* Oblimin: minimizes the cross-products of loadings; you can choose the level of correlation between the factors by choosing the level of Delta
* Promax: an orthogonal solution is rotated again to allow correlations among the factors; the orthogonal loadings are raised to powers (2, 3, 6)
* It is not guaranteed that the factors are correlated; although you allow for oblique rotation, if there is little correlation between the factors the solution will be very similar to an orthogonal rotation
- Rotations create new matrices
- Orthogonal rotation: interpret the rotated factor matrix
- Oblique rotation: instead of a rotated factor matrix we get a structure matrix, which is the product of the pattern matrix and the factor correlation matrix
- The pattern matrix represents the “clean” amount of unique variation that the factor explains for each variable
- The structure matrix includes the shared variance caused by the overlap between factors
- Interpret the pattern matrix because it is easier, and consider variables with loadings higher than 0.32 (best would actually be higher than 0.71)
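The relationship between the pattern and structure matrices is a single matrix product. The numbers below are a hypothetical oblique solution (4 items, 2 factors correlated at 0.4):

```python
import numpy as np

# Hypothetical pattern matrix: the "clean" unique contribution of each
# factor to each item, plus the factor correlation matrix phi.
pattern = np.array([
    [0.80, 0.05],
    [0.75, 0.00],
    [0.05, 0.70],
    [0.00, 0.72],
])
phi = np.array([
    [1.0, 0.4],
    [0.4, 1.0],
])

# Structure matrix = pattern @ phi: it mixes the shared variance from
# the factor overlap back in, so its loadings are inflated.
structure = pattern @ phi
print(np.round(structure, 2))

# Interpretation step: keep items whose pattern loading exceeds 0.32.
keep = np.abs(pattern) > 0.32
print(keep)
```

Note how item 1's structure loading on factor 2 is 0.37 even though its pattern loading is only 0.05: the difference is entirely the shared variance from the 0.4 factor correlation, which is why the pattern matrix is the cleaner one to interpret.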
Most crucial - interpretation
- Pattern matrix- clean association between factors and items
- Look at the items that have the highest association
- Marker item- higher than 0.8
- Interpret the results- give factors name
- Decide on the structure
Inference
Highest loadings:
* Factor 2: “I am immersed in my work.” (0.671)
* Absorption, as part of the engagement scale by Schaufeli et al. (2006)
* Factor 1: “There are lots of times when my job drives me up the wall.” (0.807)
* Anxiety, as part of work stress by Parker and DeCotiis (1983)
* Factor 3: “We have enough time for our work” (0.811)
* All the other items negative: time pressure, as part of work stress by Parker and DeCotiis (1983)
* The correlation between factors 1 and 3 indicates that they are part of the same underlying construct: work stress
What if formative construct?
- If we believed work stress to be a formative construct, we could use the EFA to see what the components within the index are; in this case they would be time pressure at work and anxiety
- We decide: a formative index composed of two reflective scales?
- If we decide that pressure and anxiety add up to a formative overall measure (work stress):
- The items related to having time off are connected with anxiety; they should be removed, and two items from anxiety should be removed as well, to reduce the overlap between the two as much as possible
- The levels of anxiety and time pressure can then be added up to create the stress index
- If work stress were reflective, feelings of work pressure and anxiety are just reflections of being stressed, and the overlap between them is actually welcome