Factor Analysis Flashcards
What are you trying to do in a factor analysis?
It is like building a table - a good one is solid, dependable, widely used and easy to understand. But when you look closer, a table is hard to construct - you have to decide on the legs, what it is made of, etc.
Your first attempt won’t be very good
How do you make a questionnaire?
Items: the part people interact with - check whether they are good quality
Factors: the structure which holds up the items - you want only a few, but they provide the main support
What is the main idea?
Reduce a large number of variables to a smaller set of representative, meaningful variables while keeping as much information as possible - identify factors from a large set of correlated items
What do you get out of a FA?
A set of statistically identified factors - clusters of items which all measure the same characteristic - which you can use as variables in future analyses
What are factors?
Clusters of items which all measure the same characteristic
What is the goal?
Identify how many factors you have and what characteristic those factors each represent
What are the stages of FA?
- Identify variables and design
- Check data and assumptions
- Rotation
- Interpret the results
Identify variables and design: what are the initial checks?
Check the data
Is there any missing data?
What scale is the data measured on, and what does each end of the scale mean?
How many items and participants are there?
Remove ppts (participants) who are incomplete cases or who give invalid answers
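A minimal Python sketch of these initial checks, assuming each participant’s answers are stored as a list with None for a missing answer and a 1-5 response scale (both are assumptions, and `screen` is a hypothetical helper, not part of any package):

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 1-5 response scale


def screen(responses):
    """Return (number of items, participants before cleaning, complete and valid cases)."""
    n_items = len(responses[0])
    kept = []
    for row in responses:
        if any(answer is None for answer in row):
            continue  # incomplete case: at least one missing answer
        if any(not SCALE_MIN <= answer <= SCALE_MAX for answer in row):
            continue  # invalid case: an answer outside the scale
        kept.append(row)
    return n_items, len(responses), kept


data = [
    [1, 2, 3, 4],
    [2, None, 3, 4],  # missing answer -> removed
    [5, 5, 4, 3],
    [6, 1, 2, 3],     # 6 is not on a 1-5 scale -> removed
]
n_items, n_ppts, kept = screen(data)
print(f"{n_items} items, {n_ppts} participants, {len(kept)} kept")
# 4 items, 4 participants, 2 kept
```

Report how many participants were removed and why, alongside the number of items.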
Checking data and assumptions: what are the initial checks?
Normality and standard deviations
Check that all of the items are normally distributed
Check that the SDs are between 0.5 and 1.5
Identify the worst offenders - if all the data is skewed, you can’t throw all the items out, so identify the worst ones and consider excluding them
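As a rough sketch of the SD and skewness checks in plain Python (the helper names and toy items are hypothetical; skewness here is the simple moment-based g1, which differs slightly from the bias-adjusted value that statistics packages such as SPSS report):

```python
import statistics


def item_skew(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (returns 0 for a constant item)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def screen_items(items, sd_low=0.5, sd_high=1.5):
    """Flag items whose sample SD is outside [sd_low, sd_high]; rank all items by |skew|."""
    sds = {name: statistics.stdev(values) for name, values in items.items()}
    bad_sd = {name: sd for name, sd in sds.items() if not sd_low <= sd <= sd_high}
    worst_first = sorted(items, key=lambda name: abs(item_skew(items[name])), reverse=True)
    return bad_sd, worst_first


items = {
    "item1": [2, 3, 3, 4, 3],  # SD about 0.71, symmetric
    "item2": [3, 3, 3, 3, 4],  # SD about 0.45, skewed
    "item3": [1, 1, 5, 5, 1],  # SD about 2.19
}
bad_sd, worst_first = screen_items(items)
print(sorted(bad_sd), worst_first)
# ['item2', 'item3'] ['item2', 'item3', 'item1']
```

Ranking by absolute skew gives you the "worst offenders" list to consider first, rather than discarding every non-normal item.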
What should the SDs be?
Between 0.5 and 1.5
Checking data and assumptions: what are the second checks?
Correlations
Sphericity
Sampling adequacy
Checking the correlations
You get a large correlation matrix - in FA you expect the items to correlate, because we are looking for underlying factors that explain groups of items, so we want correlations
What are the possible problems with correlations?
- Items that don’t correlate with anything else - this might indicate that the item doesn’t measure the construct, so it is not valid
look for items with r < .3 or p > .05 (with many other items, not just one)
- Items that correlate too highly - too much overlap, they measure the same thing, so not valid
singularity: r > .9
problems with multicollinearity
check the determinant - it should be greater than 0.00001, which means there are no problems with multicollinearity
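A sketch of these correlation checks with NumPy, assuming the data is a participants x items array. `check_correlations` and the simulated items are made up for illustration, and only the r < .3 rule is implemented (no p-values):

```python
import numpy as np


def check_correlations(data, weak=0.3, strong=0.9, det_min=1e-5):
    """Count weak correlations per item, list r > strong pairs, and test the determinant."""
    R = np.corrcoef(data, rowvar=False)                 # items x items correlation matrix
    p = R.shape[0]
    off_diag = ~np.eye(p, dtype=bool)                   # ignore each item's r = 1 with itself
    weak_counts = ((np.abs(R) < weak) & off_diag).sum(axis=1)
    strong_pairs = [(i, j) for i in range(p) for j in range(i + 1, p) if abs(R[i, j]) > strong]
    det = np.linalg.det(R)
    return weak_counts, strong_pairs, det, det > det_min


# Simulated data: items 0-2 share a factor, item 3 is a near-copy of item 0, item 4 is noise
rng = np.random.default_rng(0)
n = 500
f = rng.normal(size=n)
item0 = f + 0.5 * rng.normal(size=n)
item1 = f + 0.5 * rng.normal(size=n)
item2 = f + 0.5 * rng.normal(size=n)
item3 = item0 + 0.005 * rng.normal(size=n)
item4 = rng.normal(size=n)
data = np.column_stack([item0, item1, item2, item3, item4])

weak_counts, strong_pairs, det, det_ok = check_correlations(data)
print("weak correlations per item:", weak_counts)
print("pairs with r > .9:", strong_pairs)
print(f"determinant = {det:.2e}, above 0.00001: {det_ok}")
```

Here the item at index 4 is weak with every other item (a candidate for removal), while indices 0 and 3 form a singular pair that drags the determinant below 0.00001.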
What should you do with correlations?
Identify the worst offenders - if there are lots of them you can’t discard them all, so pick the items which fail to correlate with many other items; an item that fails to correlate with just one other item is fine. Report any exclusions with a justification
At this point, run the analysis with the final set of items
How do you check for suitability of the data?
Kaiser-Meyer-Olkin measure of sampling adequacy (KMO)
Do you have a sufficient sample to extract the factors?
Values range between 0 (inappropriate) and 1 (go for it)
marvellous - .9 or above
meritorious - .8 or above
middling - .7 or above
mediocre - .6 or above
miserable - .5 or above
if below 0.5 (unacceptable), you should stop and collect more data or reconsider the analysis
Report the value and cite the source of these cut-offs
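The KMO statistic compares the raw correlations with the partial (anti-image) correlations: KMO = Σr² / (Σr² + Σpartial²), summed over all pairs of different items. A NumPy sketch, assuming you already have the item correlation matrix R (`kmo` and `kmo_label` are hypothetical helper names):

```python
import numpy as np


def kmo(R):
    """Overall KMO and per-item MSA from a correlation matrix R."""
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(scale, scale)  # partial (anti-image) correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diag, R ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + p2.sum())
    per_item = r2.sum(axis=1) / (r2.sum(axis=1) + p2.sum(axis=1))
    return overall, per_item


def kmo_label(value):
    """Kaiser's descriptive labels for KMO values."""
    for cut, label in [(0.9, "marvellous"), (0.8, "meritorious"), (0.7, "middling"),
                       (0.6, "mediocre"), (0.5, "miserable")]:
        if value >= cut:
            return label
    return "unacceptable"


# Toy example: three items that all correlate .5
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
overall, per_item = kmo(R)
print(f"KMO = {overall:.3f} ({kmo_label(overall)})")  # KMO = 0.692 (mediocre)
```

For three items all correlating .5, every partial correlation is 1/3, which is why the result sits just below the "middling" cut-off.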
How do you check for the sphericity of the data?
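Using Bartlett’s test of sphericity - it tests whether the correlations are too small for FA, i.e. whether the correlation matrix differs significantly from an identity matrix (all correlations zero)
if everything is okay, this test will be significant
Reporting: χ2(df) = chi-square value, p = p value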
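Bartlett’s statistic is χ2 = -(n - 1 - (2p + 5)/6) × ln|R|, with df = p(p - 1)/2, where n is the number of participants, p the number of items and |R| the determinant of the correlation matrix. A sketch with NumPy; to stay dependency-light, the tail probability comes from a small closed-form helper (`chi2_sf`, hypothetical) that only works for whole-number df, whereas in practice you would normally call `scipy.stats.chi2.sf`:

```python
import math

import numpy as np


def chi2_sf(x, df):
    """P(chi-square with whole-number df > x), using the closed-form incomplete gamma."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        terms = (-y + i * math.log(y) - math.lgamma(i + 1) for i in range(df // 2))
        return min(1.0, sum(math.exp(t) for t in terms))
    tail = math.erfc(math.sqrt(y))
    terms = (-y + (i - 0.5) * math.log(y) - math.lgamma(i + 0.5) for i in range(1, (df + 1) // 2))
    return min(1.0, tail + sum(math.exp(t) for t in terms))


def bartlett_sphericity(R, n):
    """Bartlett's test that R is an identity matrix; n = number of participants."""
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)  # log of the determinant, numerically safer than log(det)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return statistic, df, chi2_sf(statistic, df)


R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
stat, df, p_value = bartlett_sphericity(R, n=200)
p_text = "p < .001" if p_value < 0.001 else f"p = {p_value:.3f}"
print(f"χ2({df}) = {stat:.2f}, {p_text}")
```

A significant result (here p < .001) means the correlations are large enough to go ahead with FA.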
Interpreting the results: what do we do here?
Extraction - how many factors do we have, which items belong with each factor, and what do the factors represent?
What is extraction?
Deciding how many factors best capture our data - we want parsimony, explaining as much variance as we can with as few factors as possible - don’t want to lose much data
What is Kaiser’s criterion for extraction?
Automatically extracts the factors with eigenvalues bigger than 1
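Kaiser’s criterion sketched with NumPy: take the eigenvalues of the item correlation matrix and count those above 1. The block-structured R below is a made-up example with two separate groups of three items, each correlating .6 within its group:

```python
import numpy as np


def kaiser_criterion(R):
    """Eigenvalues of the correlation matrix R (largest first) and how many exceed 1."""
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns them in ascending order
    return eigenvalues, int((eigenvalues > 1).sum())


block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
zeros = np.zeros((3, 3))
R = np.block([[block, zeros], [zeros, block]])  # two unrelated groups of three items

eigenvalues, n_factors = kaiser_criterion(R)
print(eigenvalues.round(2), n_factors)
```

This gives two eigenvalues of 2.2 and four of 0.4, so two factors are kept. The eigenvalues of a correlation matrix add up to the number of items (6 here), so an eigenvalue above 1 means the factor accounts for more variance than a single item does.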
What is an eigenvalue?
The variance in all the variables accounted for by a particular factor
If it is low, it doesn’t explain much - can be disregarded
A measure of how useful the factor is - each factor has its own eigenvalue, which measures how much of the table’s weight that leg holds up; if it isn’t holding much up, we can get rid of it and not lose much
What are the initial eigenvalues?
They show the eigenvalue of every possible factor - before extraction there is one factor for each variable
Is Kaiser’s criterion always okay to use to extract factors?
No, it is only reliable in certain circumstances
What are the circumstances in which you can use Kaiser’s criterion?
There are fewer than 30 variables and all communalities are bigger than 0.7
or
There are more than 250 participants and the average communality is greater than or equal to 0.6
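The two circumstances translated into a small plain-Python check (`kaiser_is_reliable` is a hypothetical helper; the communalities are assumed to be the values after extraction):

```python
def kaiser_is_reliable(n_participants, communalities):
    """Return True if either circumstance for trusting Kaiser's criterion holds."""
    n_vars = len(communalities)
    few_vars_high_h2 = n_vars < 30 and all(h2 > 0.7 for h2 in communalities)
    big_sample_avg_h2 = n_participants > 250 and sum(communalities) / n_vars >= 0.6
    return few_vars_high_h2 or big_sample_avg_h2


print(kaiser_is_reliable(120, [0.75, 0.82, 0.71]))  # True: fewer than 30 variables, all > .7
print(kaiser_is_reliable(120, [0.75, 0.50]))        # False: neither circumstance holds
print(kaiser_is_reliable(300, [0.75, 0.50]))        # True: > 250 ppts, average .625
```

If this returns False, fall back on the scree plot instead.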
What is a communality?
The proportion of variance in a variable explained by all of the factors together - after extraction, some information is lost
Communality after extraction - variance in each variable explained by the remaining factors
The higher the communality, the better the factor structure explains the variance in that variable - bigger is better
e.g. a communality of .67 for item 3 means 67% of the variance of item 3 is explained by all of the factors together
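For uncorrelated factors (unrotated or orthogonally rotated), an item’s communality is the sum of its squared loadings across the factors. A plain-Python sketch with invented loadings on two factors, where item 3 is chosen to reproduce the .67 example:

```python
loadings = {  # invented loadings on two uncorrelated factors
    "item1": [0.80, 0.10],
    "item2": [0.75, 0.20],
    "item3": [0.30, 0.76],
}

# Communality = sum of squared loadings across all retained factors
communalities = {item: sum(loading ** 2 for loading in row) for item, row in loadings.items()}

for item, h2 in communalities.items():
    print(f"{item}: {h2:.2f} -> {h2:.0%} of its variance explained by the factors")
```

With oblique rotation the factors correlate, so the communality is no longer just this row sum of squared pattern loadings.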
What do we do if Kaiser’s criterion is unreliable?
If you have more than 200 ppts, you can use a scree plot to decide the number of factors
What are we looking for in a scree plot?
The inflexion point: where the slope changes
It is the point where the steep slope levels off - count the factors to the left of the point of inflexion, and don’t keep the factor at the point where it changes
Very open to interpretation - fine as long as you explain your choice
You can rerun the analysis with a fixed number of factors
How can you tell how much variance your factors explain?
Look at the Total Variance Explained table
Extraction Sums of Squared Loadings - in the row for the last extracted factor (e.g. factor 6), the cumulative % shows how much variance all the factors explain together
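Each factor’s % of variance is its sum of squared loadings divided by the number of variables, and the cumulative % is a running total. A plain-Python sketch with hypothetical values (two factors, each with a sum of squared loadings of 2.2, from six items):

```python
from itertools import accumulate


def variance_explained(sums_of_squares, n_vars):
    """Rows of (SS loadings, % of variance, cumulative %) for each retained factor."""
    percents = [100 * ss / n_vars for ss in sums_of_squares]
    return list(zip(sums_of_squares, percents, accumulate(percents)))


for factor, (ss, pct, cum) in enumerate(variance_explained([2.2, 2.2], n_vars=6), start=1):
    print(f"Factor {factor}: SS = {ss:.2f}, {pct:.2f}% of variance, cumulative {cum:.2f}%")
```

The cumulative figure in the last row (73.33% here) is the one to report as the total variance explained by the retained factors.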
What do you do after deciding the number of factors?
Rotation
What is rotation?
It optimises how the items load onto the factors - it spreads the variance explained more evenly, so that the factors explain more similar amounts
Aids and clarifies the interpretation - doesn’t change the number of factors or affect the method of extraction
What are the two types of rotation?
Orthogonal - factors are uncorrelated, independent of each other
Oblique - factors are allowed to correlate; use when there are theoretical grounds for thinking they will correlate
What do you use if you believe the factors are independent of each other?
Orthogonal rotation
Varimax
What do you use if you believe the factors are correlated?
Oblique rotation
Direct oblimin
How do you decide which rotation to use?
Look at previous research - see what other questionnaires have done
Think about your factors - do you think they will correlate?
Make a choice based on your own research and judgement - and explain why you have made this decision
What does rotation actually do?
Spreads the variance more evenly among the factors - the total variance explained stays the same, but the eigenvalues (sums of squared loadings) change so that they are better distributed: each factor explains a similar amount of variance rather than one factor explaining most of it
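A sketch of varimax rotation in NumPy, using the common SVD-based iteration without Kaiser normalisation (SPSS normalises by default, so its numbers will differ somewhat). The unrotated loadings are invented. It illustrates the point above: each item’s communality and the total are unchanged, but the factors’ sums of squared loadings become more even:

```python
import numpy as np


def varimax(loadings, max_iter=500, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalisation) via the usual SVD iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        previous = criterion
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)
        target = rotated ** 3 - rotated * col_ss / p  # gradient of the varimax criterion
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                             # nearest orthogonal rotation
        criterion = s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation


unrotated = np.array([
    [0.75, 0.40], [0.70, 0.45], [0.65, 0.35],    # items that will end up on one factor
    [0.60, -0.50], [0.70, -0.40], [0.65, -0.45],  # items that will end up on the other
])
rotated = varimax(unrotated)

print("SS loadings before:", (unrotated ** 2).sum(axis=0).round(2))
print("SS loadings after: ", (rotated ** 2).sum(axis=0).round(2))
print("Communalities unchanged:",
      np.allclose((unrotated ** 2).sum(axis=1), (rotated ** 2).sum(axis=1)))  # True
```

Because the rotation matrix is orthogonal, it can only redistribute variance between factors, never add or remove it.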
How do you identify and name the factors?
Look at the items listed that load onto each factor - the number is the loading; the higher the loading (ignoring its sign), the stronger the association with that factor
Name the factors yourself, based on what their items have in common
Negative loadings - often because some of the questions were worded negatively
Sometimes there are cross-loadings - items which load onto more than one factor - allocate the item to the factor it loads on most highly, or to the one that makes the most sense
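A sketch of allocating items to factors in plain Python, assuming a cut-off of .3 for a loading to count as a cross-loading (a common rule of thumb, but an assumption here). `allocate` and the loadings are hypothetical, and the code always picks the highest absolute loading, so the "makes the most sense" judgement is still yours:

```python
SALIENT = 0.3  # assumed cut-off for a loading to count as a cross-loading


def allocate(loadings, salient=SALIENT):
    """Map each item to (home factor, other factors it cross-loads on, negative loading?)."""
    allocation = {}
    for item, row in loadings.items():
        order = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        home = order[0]  # factor with the highest absolute loading
        cross = [j + 1 for j in order[1:] if abs(row[j]) >= salient]
        allocation[item] = (home + 1, cross, row[home] < 0)  # factors numbered from 1
    return allocation


loadings = {
    "q1": [0.72, 0.10],
    "q2": [-0.65, 0.05],  # negative: possibly a negatively worded item
    "q3": [0.45, 0.40],   # cross-loads on both factors
    "q4": [0.08, 0.81],
}
for item, (home, cross, negative) in allocate(loadings).items():
    print(item, "-> factor", home, "| cross-loads on", cross or "none", "| negative" if negative else "")
```

Items flagged as cross-loading (q3 here) are the ones to look at by hand before naming the factors.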