9. Scales creation and EFA Flashcards

1
Q

What is important in EFA, and what are the steps?

A
  • Exploratory factor analysis is a crucial tool for construct creation
  • Selection of appropriate items/questions is key
Steps:
  • Define what you want to measure
  • Generate questions (= items)
  • Decide what type of measure it is
  • Collect data
  • Run EFA
2
Q

Measurement - dimensionality

A

Construct dimensionality
* Not every construct can be captured by one dimension; many have distinguishable facets that together, as a
composite, measure the original construct of interest
* Firm performance: operational excellence, customer relationships, revenue growth
* The sub-dimensions can be (sub)constructs themselves!

How do we decide?
Look at the structure of the entire construct and the relationship between the dimensions
* Are the sub-dimensions manifestations of the overall construct, or defining characteristics of it?
* Does the construct exist separately, on a deeper level than its sub-dimensions?
* Is change in the overall construct associated with change in all of the sub-dimensions, or can it be
associated with only one or a few of them?

3
Q

What is EFA?

A

EFA is a statistical method used to uncover the underlying structure of a relatively large set of data

  • Mathematically, a factor is a linear combination of the observed variables
  • It is essentially an exploratory technique: there is no single, objectively
    best/most correct solution
  • It is the researcher who makes the final decision on which
    rotation is the final one, and therefore how the factors will be defined
    and interpreted
  • Factor loadings (the strength of association between an observed variable
    and the factor) help to define the factor, give it meaning, and help to name
    the pattern we have uncovered
  • Interpretability and stability of the factors are therefore how we judge
    the final “quality” of the analysis
4
Q

What does EFA do?

A

Exploratory factor analysis is a statistical technique used to reduce data to a smaller set of summary variables and to explore the underlying theoretical structure of the phenomena. It is used to identify the structure of the relationships between the variables and the respondents.

Repackage the correlation matrix through eigenvalues/eigenvectors into factor loadings
* An eigenvalue collapses the variance from above and below the diagonal of the correlation matrix
* Through the eigenvalues and eigenvectors we calculate factor loadings
* Factor loadings connect the invisible factors to the visible questions (= items)

  • Imagine each dot is a specific question in your survey (= item)
  • A factor is the actual variable you want to use later in your analysis
  • The factor (a line) tries to be positioned so it has strong relationships with
    clusters of items
  • This is difficult if there are no clear clusters (unsuitable data, or a poorly
    designed survey)
  • Factors can be correlated but should be distinct (they cannot all be
    tangled together)
  • Usually you have a plan/idea about how many factors are in your data
    (if a new factor pops up, it could be a sign of a mistake in
    the planning/surveying phase of your research)
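The eigenvalue/eigenvector repackaging described above can be sketched in a few lines of NumPy. The 4-item correlation matrix below is a made-up toy example (not data from the deck), and two factors are extracted:

```python
import numpy as np

# Toy correlation matrix for 4 items: items 1-2 and items 3-4 correlate strongly.
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

# Repackage R through its eigenvalues and eigenvectors.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest (most variance) first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated factor loadings: eigenvector columns scaled by sqrt(eigenvalue).
# They connect the invisible factors to the visible items.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])

print(eigvals)     # the eigenvalues sum to the number of items (here 4)
print(loadings)    # association of each item with each retained factor
```

Rotation (covered later in the deck) would still be needed before interpreting these loadings.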
5
Q

Procedure of EFA

A
  • GET DATA AND CHECK ASSUMPTIONS- check the assumptions and the correlation matrix of the items we
    have selected
  • EXTRACT FACTORS- the core task is to repackage the correlation matrix and find the factors and their
    association with each variable (the factor loadings)
  • Initially we have as many factors as variables
  • We extract only the interesting factors and force all items to be expressed through them
  • ROTATE- rotate the factors to increase interpretability- we want to find which items belong to which factor
    (where the highest factor loadings are); we then delete variables that have low association with the retained
    factors
  • INTERPRET- interpret the results and save the factors for future analysis- ending up with a clear structure (no
    overlap between factors), we interpret the factors (name the underlying pattern) and save the factor scores for
    future analysis
6
Q

Performing EFA - prepare data

A

Sample size and study design
* Factors should be over-determined- at least 3-5 items for each factor
Sample size and missing data/suitability of the data
* Rules of thumb- 5 to 10 obs. per item, but these are not set in stone
* Sample size depends on whether the correlations are expected to be strong in the population and whether
we expect the factors to be strong and independent- then a smaller sample will be OK (specific cut-off points in
Fabrigar et al., 1999)
* 100 obs. or fewer- enough for well-determined factors
* 100-200 obs.- well-determined factors with lower common correlation between variables
* 300 obs.- medium level of common correlation
* 500 obs.- low level of correlations and a lot of poorly defined factors

7
Q

Performing EFA - prepare

A

Check assumptions and factorability of R
Normality/case and variable outliers
* Not a necessity, especially if we simply want to reduce data, but it enhances the solution
* Normality tests in SPSS are quite sensitive for large samples- Kolmogorov-Smirnov (>2000 obs.), Shapiro-Wilk (<2000 obs.)
* Values of skewness and kurtosis should be close to 0 (+/- 2)
Outliers
* Visual tools- histograms, q-q plots, box plots
* Variables with low squared multiple correlation and low correlation with the factors are outliers and should be removed (you’ll
find them during the analysis)
Linearity, multicollinearity but NOT singularity
* Correlation (our starting point) measures linear relationships, so we should have only variables/items that relate to each other
in a linear way; even very strong correlation- multicollinearity- is fine
* Singularity (or extreme multicollinearity) should be avoided

Factorability of R
* Check the correlation matrix for correlations higher than 0.3
* In linear algebra, a matrix is singular if and only if its determinant is zero
* There should be patterns of correlations- so more than a pair of variables
correlated- check partial correlation matrices
* SPSS- anti-image correlation matrix- to check the partial correlations
between variables
* Bartlett’s test of sphericity tests whether the correlations in the correlation
matrix are zero; it is too sensitive
* Kaiser’s measure of sampling adequacy- a ratio of sums of squared
correlations- SSC/(SSC+SSPartialC)- so if the partial correlations are small the
ratio approaches 1
* Values above 0.6 are acceptable
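As a rough sketch of the two factorability checks above, the partial correlations, KMO, and Bartlett statistic can be computed directly from a correlation matrix. The matrix and sample size below are hypothetical:

```python
import numpy as np

# Hypothetical correlation matrix of 4 items.
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.1],
    [0.5, 0.4, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
])
p = R.shape[0]
off = ~np.eye(p, dtype=bool)                # mask for off-diagonal entries

# Partial correlations from the inverse of R (what the anti-image is based on).
S = np.linalg.inv(R)
d = np.sqrt(np.diag(S))
partial = -S / np.outer(d, d)

# KMO: sum of squared correlations vs squared partial correlations.
ssc = (R[off] ** 2).sum()
sspc = (partial[off] ** 2).sum()
kmo = ssc / (ssc + sspc)
print(round(kmo, 3))        # values above ~0.6 commonly read as acceptable

# Bartlett's test of sphericity (needs the sample size n, here hypothetical).
n = 200
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
print(round(chi2, 1), df)   # compare chi2 against chi-square with df degrees
```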

8
Q

Performing EFA - Extract

A

Factor-extraction procedures (available in SPSS)- different methods to estimate the parameters of the factor model- factor
loadings and unique variances
* Principal components- tries to explain all variance in the data; it does not separate what items have in common from what is
unique and error
* Principal axis factoring- estimates communalities- the squared multiple correlations of the items- and tries to explain just this
variance with factors
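A minimal sketch of the principal-axis idea, assuming a hypothetical 4-item correlation matrix: the diagonal is replaced with communality estimates (starting from the squared multiple correlations), and extraction is iterated until the communalities stabilize:

```python
import numpy as np

# Hypothetical correlation matrix of 4 items measuring one construct.
R = np.array([
    [1.0, 0.5, 0.4, 0.3],
    [0.5, 1.0, 0.4, 0.3],
    [0.4, 0.4, 1.0, 0.3],
    [0.3, 0.3, 0.3, 1.0],
])

# Initial communalities: squared multiple correlation (SMC) of each item.
h2 = 1 - 1 / np.diag(np.linalg.inv(R))

for _ in range(100):
    Rh = R.copy()
    np.fill_diagonal(Rh, h2)                  # analyse only common variance
    eigvals, eigvecs = np.linalg.eigh(Rh)
    top = np.argsort(eigvals)[::-1][:1]       # extract one factor
    loadings = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))
    h2_new = (loadings ** 2).sum(axis=1)      # updated communalities
    if np.allclose(h2, h2_new, atol=1e-8):
        break
    h2 = h2_new

print(loadings.ravel())   # loadings explaining only the shared variance
```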

9
Q

Crucial decision: Difference between PCA and EFA?

A
  • Mathematically the difference is in the main
    diagonal of the correlation matrix that will be
    analysed: for PCA it holds 1s, since we are
    interested in capturing all the variance in the
    data; in EFA we are interested only in capturing
    the variance that is shared among the
    variables
  • In PCA each variable adds equally to the
    variance studied; if we retain all components
    of PCA we can reproduce the
    correlation matrix of the real data perfectly
  • In EFA the main diagonal of the correlation matrix therefore holds communalities (the squared multiple correlation
    of each variable)- in SPSS you can see them as initial communalities
  • All communalities add up to the variance that is studied by EFA, which is less than the total variance of the
    data; therefore the factors do not retain the totality of the data and can reproduce only an approximation of the
    correlation matrix
  • Communalities
    Initial communalities in EFA are the squared multiple correlations of each variable (shared variance with the other
    variables); in PCA they are not calculated but set to 1 (100%), i.e. all the variance of the variable
    Extracted communalities (in EFA and PCA)- the sum of squared loadings for a variable across all extracted factors
    (components); this shows how much variance of that particular variable has been explained by the extracted
    factors (components)
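The diagonal difference can be made concrete: with a hypothetical correlation matrix, the initial communalities (SMC) that EFA puts on the diagonal are all below the 1s that PCA uses, and extracted communalities are sums of squared loadings per item:

```python
import numpy as np

# Hypothetical correlation matrix of 4 items.
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

# Initial communalities in EFA: SMC = 1 - 1/diag(inv(R)); PCA would use 1.0.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
print(smc)                       # all strictly below 1

# PCA-style extraction of 2 components for illustration.
eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

# Extracted communalities: sum of squared loadings per item.
extracted = (loadings ** 2).sum(axis=1)
print(extracted)   # variance of each item explained by the 2 components
```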
10
Q

Crucial decision - number of factors

A

Selecting the number of factors
* Balancing the need for parsimony with the need for accuracy- essentially we should retain factors until
additional factors account for only a very small amount of variance, i.e. their added value is minimal
* Underfactoring (not extracting enough factors) usually introduces more problems than overfactoring
* When overfactoring, the main factors remain stable over several rotations, so the smaller unstable factors can
be removed
Rules of thumb?
* Kaiser criterion- keep all the factors that have eigenvalues higher than 1
* Scree plot- eigenvalues plotted against factors; it shows how much variance is explained by each factor, usually
in descending order, so we are looking for the turning point where the slope changes
* Variance explained- the retained factors should together explain about 60% of the variance
or more
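The rules of thumb above are easy to apply to a set of eigenvalues; the six values below are invented for illustration:

```python
import numpy as np

# Invented eigenvalues of a 6-item correlation matrix, in descending order.
eigvals = np.array([2.8, 1.6, 0.7, 0.4, 0.3, 0.2])

# Kaiser criterion: retain factors with eigenvalue > 1.
kept = eigvals[eigvals > 1]
print(len(kept))                            # 2 factors retained

# Variance explained: eigenvalues sum to the number of items, so divide by it.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
print(round(cumulative[len(kept) - 1], 3))  # 0.733, above the ~60% rule of thumb
```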

11
Q

Crucial decision - rotation

A
  • For any given solution with two or more factors, there exists
    an infinite number of alternative orientations of the factors in
    multidimensional space
  • The goal is simple structure: each variable loads highly on one
    factor and close to zero on the others
  • Factor loadings are coordinates marking the position
    of the variable in relation to the factor- they therefore
    change during rotation
  • How much variance the factors explain does not change,
    though
  • The point of rotation is to change the factor loadings so they fulfil
    the criteria of simple structure
12
Q

Performing rotation

A

Orthogonal
* Varimax- simplifies the columns of the loading matrix by maximizing the variance of loadings on each factor; the
spread of loadings within a factor should be as wide as possible, so high loadings after extraction become
higher after rotation and vice versa
* Quartimax- simplifies the rows of the loading matrix by maximizing the variance of loadings within variables
* Equimax- a combination of the above that tries to simplify both rows and columns of the loading matrix
Oblique rotation
* Oblimin- minimizes cross-products of loadings; you can choose the level of correlation between the
factors by choosing the level of Delta
* Promax- an orthogonal solution is rotated again to allow correlations among factors; the orthogonal
loadings are raised to powers (2, 4, 6)
* It is not guaranteed that the factors are correlated: although you allow for oblique rotation, if there
is little correlation between the factors, the solution will be very similar to an orthogonal rotation

  • Rotations create new matrices
  • Orthogonal rotation- interpret the rotated factor matrix
  • Oblique rotation- instead of a rotated factor matrix we get a structure matrix, which is the product of the pattern and factor
    correlation matrices
  • The pattern matrix represents the “clean” amount of unique variation that the factor explains for each
    variable
  • The structure matrix includes the shared variance caused by the overlap between factors
  • Interpret the pattern matrix because it is easier, and consider variables with loadings higher than 0.32;
    best would actually be higher than 0.71
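A compact sketch of Kaiser's varimax algorithm, in the SVD formulation most software uses; the unrotated loading matrix is invented. Note that the communalities, and hence the total variance explained, are unchanged by the orthogonal rotation:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Kaiser's varimax: orthogonal rotation toward simple structure."""
    p, k = L.shape
    T = np.eye(k)                     # accumulated rotation (orthogonal)
    crit = 0.0
    for _ in range(max_iter):
        B = L @ T
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p)
        )
        T = u @ vt
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return L @ T

# Invented unrotated loadings for 4 items on 2 factors.
L = np.array([
    [0.7,  0.3],
    [0.6,  0.4],
    [0.4, -0.6],
    [0.3, -0.7],
])
rotated = varimax(L)

# Communalities (row sums of squared loadings) survive rotation unchanged.
print((L ** 2).sum(axis=1))
print((rotated ** 2).sum(axis=1))   # same values: rotation only moves axes
```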
13
Q

Most crucial - interpretation

A
  • Pattern matrix- the clean association between factors and items
  • Look at the items that have the highest association
  • Marker item- loading higher than 0.8
  • Interpret the results- give the factors names
  • Decide on the structure
14
Q

Inference

A

Highest loadings:
* Factor 2- “I am immersed in my work.” (0.671)
* Absorption as part of the engagement scale by Schaufeli et al. (2006)
* Factor 1- “There are lots of times when my job drives me up the wall.”
(0.807)
* Anxiety as part of work stress by Parker and DeCotiis (1983)
* Factor 3- “We have enough time for our work” (0.811)
* All the other items negative- time pressure as a part of work stress by
Parker and DeCotiis (1983)
* The correlation between factors 1 and 3 indicates that they are part of
the same underlying construct- work stress

15
Q

What if formative construct?

A
  • If we believed work stress to be a formative construct, we could use the EFA to see what the
    components within the index are- in this case time pressure at work and anxiety
  • We decide- a formative index composed of two reflective scales?
  • If we decide that pressure and anxiety add up to a formative overall measure- work stress
  • The items related to having time off are connected with anxiety- they should be
    removed, and two items from anxiety should be removed as well, to reduce the overlap
    between the two as much as possible
  • The level of anxiety and the level of time pressure can then be added up to create a stress index
  • If work stress were reflective- feelings of work pressure and anxiety are just reflections of being
    stressed- the overlap between them would actually be welcome
16
Q

Save factor scores

A
  • We do EFA to find out how best to represent our constructs
  • We can select the “best items” and simply calculate sum scales or
    an average score out of them
  • Or directly save factor scores after finding the EFA solution
  • These are all estimation techniques to assign a score to every individual
    on the underlying construct, which is now a variable in its own right
  • Regression- new variable with mean of 0, variance equal to the SMC
    between factors and variables
  • Bartlett
  • Anderson-Rubin- standardized variable
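The regression method can be sketched as F = Z R⁻¹ Λ (Thurstone's estimator): standardized item scores times the inverse correlation matrix times the loadings. The data and the single-factor loadings below are simulated/hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated standardized scores of 200 respondents on 4 items (toy data).
Z = rng.standard_normal((200, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

R = np.corrcoef(Z, rowvar=False)

# Hypothetical loadings of the 4 items on one extracted factor.
lam = np.array([[0.8], [0.7], [0.6], [0.5]])

# Regression (Thurstone) factor scores: F = Z R^-1 Lambda.
F = Z @ np.linalg.inv(R) @ lam
print(round(float(F.mean()), 6))   # centered at 0 by construction
```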