Week 11 - Factor Analysis Flashcards
What are the 2 aims of factor analysis?
1) To look for the underlying (predicted) structure of a set of related variables (you are specifically looking for a structure that you expect to exist)
2) To reduce a set of variables to a single composite variable that combines the information (variance) they share (more of an exploratory approach)
What does factor analysis seek to do?
Factor analysis seeks to form linear composites (weighted sums of the measured variables) that represent the underlying structure of the correlation matrix
- Groups of highly intercorrelated items whose variance is well explained by the composite
What is the main goal of factor analysis?
To start with a set of measured variables and find the smallest number of factors that accounts for most of the variance in the measured variables and the correlations between them
Factor analysis allows us to
Make statements about the patterns of intercorrelations
Typical factor analysis set-up
A set of observed variables (e.g. questionnaire items) or conceptually related items, with no outcome (Y) variable
- Does an underlying structure link a subset of items?
- Look for subsets of variables that correlate highly with one another and load low on other factors
Factor analysis is …
Geometric, based on correlations and the pattern of correlations that suggest linked items
Factors (Composites) in factor analysis should
Explain more than one or two variables
Explain systematic variance and exclude individual error as much as possible
History of factor analysis
Used in intelligence testing (the underlying general factor, g)
What can't factor analysis do?
Test for significance
2 reasons for doing factor analysis
1) data reduction to provide composites for further study
- empirical and technical reason
- after analysis is done we can obtain a factor as a new composite variable in our dataset
2) Investigate the underlying structure of a set of measured variables
- Give names to these underlying variables using subjective judgement
Latent Variable
A variable that is not directly observed; factor analysis uncovers these
Observed Variables
Directly observed; already exist in the dataset
Two Varieties of Factor Analysis
1) Principal components factor analysis (PCA)
2) Factor analysis (common factor model), e.g. principal axis factoring (PAF)
Principal components factor analysis
Most basic form
Uses the mathematical properties of matrices (numbers)
Straight data reduction where error in the original variables is not partialled out (all variance is used, so error in individual items is not ignored)
Common factor model
First utilised for theory building: start with a set of variables and work out how many dimensions (factors) they contain
Known as exploratory factor analysis
Analyses the common variance and leaves out the variance unique to each individual variable
Differences Between PCA and PAF
PCA = components; PAF = factors
PCA uses the correlation matrix R as is (1s on the diagonal)
PAF analyses the shared variance (covariance) and modifies R (estimated communalities on the diagonal)
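The diagonal difference can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the 3-item correlation matrix is made up, and squared multiple correlations are used as the initial communality estimates (one common choice, as the communalities card describes).

```python
import numpy as np

# Hypothetical correlation matrix R for three questionnaire items.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# PCA analyses R as is: 1s on the diagonal, so all variance is used.
R_pca = R.copy()

# PAF replaces the diagonal with initial communality estimates.
# Here: squared multiple correlations, SMC_i = 1 - 1 / (R^-1)_ii.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_paf = R.copy()
np.fill_diagonal(R_paf, smc)

print("PCA diagonal:", np.diag(R_pca))
print("PAF diagonal:", np.round(np.diag(R_paf), 3))
```

Only the diagonal changes; the off-diagonal correlations are identical in both matrices.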
Confirmatory factor analysis
Can test for significance, the number of factors and model fit
Subscale
Composite of constituent items
Factor
Weighted linear composite
In factor analysis we want to use data reduction to…
Reduce the number of measured variables (fewer = less error and more reliability)
Use a group of measured variables to indicate an underlying construct that we can't measure directly
Correlation Matrix
Implicitly contains the underlying factors/components
Extraction
Extract factors, but only as few as needed to adequately summarise all the measured variables
Every time we want to extract a different number of factors we have to run another analysis
Each factor is based on the principal components of the correlation matrix
Principal components of the correlation matrix (4)
1) Ordered from first to last (as many principal components as there are variables)
2) Each accounts for as much variance as possible in the whole set of variables (the first is always the largest because it is trying to account for all the variance in the variables)
3) The first is the largest, then they become sequentially smaller
4) Each subsequent principal component accounts for as much of the remaining variance as possible and is uncorrelated with all the preceding components
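The four properties above can be checked numerically. A minimal sketch, assuming NumPy and a made-up 4-variable correlation matrix; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical correlation matrix for four measured variables.
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
])

# eigh is meant for symmetric matrices; it returns eigenvalues in
# ascending order, so reverse them to get "first = largest".
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

n_variables = R.shape[0]
proportion = eigenvalues / n_variables  # share of total variance per component

print("eigenvalues:", np.round(eigenvalues, 3))
print("proportion of variance:", np.round(proportion, 3))
print("sum of eigenvalues:", round(float(eigenvalues.sum()), 3))
```

There are as many components as variables, the eigenvalues shrink from first to last, they sum to the number of variables, and the eigenvectors are mutually orthogonal (uncorrelated components).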
What is a factor/component
Each factor is made up of a sum of all the measured variables (composite)
- weighted to reflect the strength of the correlation (Relationship with the variables)
- Higher weight = stronger relationship with the factor/component
What are loadings
Tell us how strongly each variable links up with the factor/component
The correlation of each measured variable with each factor
Range from -1 to 1; in absolute size, 0 = no loading and 1 = perfect correlation/loading
The squared loading tells us how much variance of the measured variable is shared with the factor/component
Loading sizes: .3 = minimum, .5/.6 = moderate, .7/.8 = high
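To see that a loading really is a correlation, the sketch below builds component scores as weighted sums of standardised variables and compares one loading with the directly computed correlation. It assumes NumPy and uses simulated data; the principal-components case is shown because it needs no iteration.

```python
import numpy as np

# Simulated data: 200 respondents, 4 items that share one common source.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
X = common @ np.array([[0.8, 0.7, 0.6, 0.2]]) + rng.normal(size=(200, 4))

# Standardise and build the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Component scores: weighted sums (composites) of the standardised variables.
scores = Z @ eigenvectors

# Loadings: eigenvector weights scaled by the square root of the eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)

# The loading of item 0 on component 0 equals their correlation.
r_check = np.corrcoef(Z[:, 0], scores[:, 0])[0, 1]
print(round(float(loadings[0, 0]), 4), round(float(r_check), 4))
```

The sign of a whole column of loadings is arbitrary, so it is the absolute size that is compared with the .3 / .5 / .7 guidelines.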
What are communalities
How much of each variable's variance is explained by the factors/components
PCA: initial communalities = 1
PAF: initial communalities = estimated squared multiple correlation (amount of a variable's variance accounted for by all the other variables)
After extraction, communalities (h²) = sum of squared factor loadings for each variable
We use the communalities to find how many factors there are, but need to know the number of factors to estimate the communalities (indeterminacy problem)
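A small worked example of the after-extraction communality, h² = sum of squared loadings across the extracted factors. The loadings below are invented for illustration.

```python
# Hypothetical loadings of four items on two extracted factors.
loadings = {
    "item1": [0.80, 0.10],
    "item2": [0.70, 0.20],
    "item3": [0.15, 0.75],
    "item4": [0.30, 0.30],
}

# h2 for each item = sum of its squared loadings across the factors.
communalities = {
    item: sum(value ** 2 for value in row)
    for item, row in loadings.items()
}

for item, h2 in communalities.items():
    print(item, round(h2, 3))
```

Here item4 has a low communality (about .18), meaning the two factors explain little of its variance.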
Rotation Types
Initially the principal components are all uncorrelated (factors completely independent, no overlap)
Orthogonal (Varimax) = not correlated
- Maintains strict structure
- Good for interpretation, but may not match the constructs/data
Oblique rotation (Oblimin) = allows correlation
- Relaxes the constraint
- Distinct but related factors
- Used when underlying constructs are separate but related
Rotation
Re-weight the loadings to achieve simple structure
Simple structure
Is desirable - each variable has a high loading on only 1 factor (usually this does not fully occur)
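As an illustration of re-weighting loadings towards simple structure, here is a minimal sketch of an orthogonal varimax rotation using the commonly used SVD-based iteration. It assumes NumPy, uses invented unrotated loadings, and omits Kaiser normalisation, so results may differ slightly from statistics packages.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalisation); returns rotated loadings and rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: every item loads on both factors.
unrotated = np.array([
    [0.7, 0.4],
    [0.6, 0.5],
    [0.7, -0.4],
    [0.6, -0.5],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only how that variance is spread across the factors changes.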
Eigenvector
Principal component (the set of weights that defines it)
Eigenvalue
Variance explained by each component, out of the total variance of all variables (with a correlation matrix, the total equals the number of variables)
Each eigenvector has an associated eigenvalue
A negative eigenvalue means there is a measurement or statistical issue
Steps of factor analysis (6)
1) Obtain data on a set of variables
2) Standardise and create correlation matrix
3) Check there is a point to doing the factor analysis (Screening Run)
4) Extract components/factors
5) Rotate the ones you have extracted
6) Interpret loadings of variables on the factors
KMO
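Kaiser-Meyer-Olkin measure of sampling adequacy (.6 = adequate)
Lower than .6 = more randomness than factors
Bartlett's Test of Sphericity
Tests how far the actual correlation matrix is from an identity matrix (all correlations between variables = 0)
If not significant (the matrix cannot be distinguished from an identity matrix) = can't do a meaningful factor analysis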
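A minimal sketch of how the overall KMO value can be computed, assuming NumPy and a made-up correlation matrix. It compares the squared correlations with the squared partial correlations (each pair controlling for all other variables); the function name `kmo` is just illustrative.

```python
import numpy as np

def kmo(R):
    """Overall KMO: sum of r^2 / (sum of r^2 + sum of partial r^2), off-diagonal only."""
    inv = np.linalg.inv(R)
    # Partial correlations from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diagonal] ** 2).sum()
    p2 = (partial[off_diagonal] ** 2).sum()
    return r2 / (r2 + p2)

# Hypothetical correlation matrix for three items.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

kmo_value = kmo(R)
print(round(float(kmo_value), 3))
```

When the partial correlations are small relative to the raw correlations, the shared variance is spread across several variables and KMO approaches 1.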
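The test statistic can be sketched as follows, assuming NumPy, a made-up correlation matrix and a hypothetical sample size. The standard form is chi-square = -((n - 1) - (2p + 5) / 6) * ln|R| with p(p - 1) / 2 degrees of freedom; turning it into a p-value needs a chi-square distribution (for example `scipy.stats.chi2.sf`), which is left out here to keep the example dependency-light.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Return (chi_square, df) for Bartlett's test of sphericity."""
    p = R.shape[0]
    # slogdet is numerically safer than log(det(R)).
    _sign, log_det = np.linalg.slogdet(R)
    chi_square = -((n - 1) - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical correlation matrix and sample size.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
chi_square, df = bartlett_sphericity(R, n=100)
print(round(float(chi_square), 2), df)

# An identity matrix (no correlations at all) gives a statistic of 0.
chi_identity, _ = bartlett_sphericity(np.eye(3), n=100)
```

The larger the statistic relative to its degrees of freedom, the further the matrix is from an identity matrix.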
Initial Communalities
Starting point
After-extraction communalities
Updated after the specific factors have been extracted
Choosing factors to extract
Retain factors with an eigenvalue > 1 (each accounts for more variance than one single variable)
Cattell's scree test - plot the eigenvalues; the point where the variance explained by successive factors starts to decrease less rapidly (the elbow in the plot) suggests the number of factors to extract (subjective - inter-rater reliability can be low)
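Both rules can be illustrated with a handful of invented eigenvalues for five variables (they sum to 5, as eigenvalues of a correlation matrix should). The sketch only lists the successive drops; spotting the elbow remains a judgement call.

```python
# Hypothetical eigenvalues for five variables, largest first.
eigenvalues = [2.6, 1.3, 0.6, 0.3, 0.2]

# Eigenvalue > 1 rule: keep factors that explain more than one variable's worth of variance.
n_retained = sum(1 for value in eigenvalues if value > 1)

# Scree test: look at how quickly the eigenvalues fall from one factor to the next.
drops = [
    round(eigenvalues[i] - eigenvalues[i + 1], 3)
    for i in range(len(eigenvalues) - 1)
]

print("retain by eigenvalue > 1:", n_retained)
print("successive drops:", drops)
```

In this made-up case the eigenvalue rule keeps two factors, while the drops level off after the third eigenvalue, which shows how the two rules need not agree and why the scree test is described as subjective.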