FA & PCA Flashcards
What is exploratory FA?
Explaining the relationships between observed variables with a smaller number of factors: the latent factors that theoretically underpin the data. The emphasis is on exploring.
Exploratory FA: an exploratory technique in which decision making during the FA is guided by the outcome of the analysis. Reduces the number of variables. Explores the number of factors in the dataset. Helps discover the meaning and importance of factors by examining the variance in observed variables that they account for.
What is PCA?
Not theoretically driven: it just reduces the data down to a smaller number of components. It does not care about latent factors and does not assume there is something latent out there driving the relationships. You just want to reduce down to a simpler, lower-dimensional set of scores. E.g. too many variables to satisfy the assumptions of a regression (too many predictors, or highly correlated predictors), so use PCA to reduce them down to a set of unrelated components. Some people label the components, but remember: they are not latent factors!
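As a rough illustration of the regression example above, here is a minimal sketch of PCA on a correlation matrix using NumPy. The function name `pca_scores` is an assumption made for illustration and is not part of the original notes.

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA on the correlation matrix of X (rows = cases, columns = variables).

    Returns (eigenvalues, weights, scores) for the first n_components.
    """
    X = np.asarray(X, dtype=float)
    # Standardise so the analysis is of the correlation matrix (1's on the diagonal).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reorder to largest first.
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Each component is a weighted aggregate (linear composite) of the items.
    scores = Z @ vecs[:, :n_components]
    return vals[:n_components], vecs[:, :n_components], scores
```

If all components are kept, the eigenvalues sum to the number of variables (all the variance), and the component scores are uncorrelated with each other, which is what makes them usable as predictors in place of the original correlated variables.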
What does having more items or variables in FA achieve?
The more items on the questionnaire, the better: the more you measure the same thing, the more accurate and reliable the measurement gets.
What is confirmatory FA?
An explicit, theory-driven model is tested by placing various constraints on the relationships between observed variables and latent factors, and between latent factors. Can derive estimates of particular parameters and of overall model fit. Typically run as SEM (AMOS etc.). Reducing the number of variables. Theory testing. Testing the replicability of factors across time.
How does exploratory FA work generally?
Takes a correlation matrix. Uses the observed relationships between variables to derive the factors that best explain the observed variance. Produces loadings between items and factors; the strength of the loadings determines which items define each factor.
How does labelling factors work in FA?
Choose clear labels for factors. FA relies on the decision making of the researcher, pragmatism and knowledge of the research area. Sometimes called a ‘last ditch attempt’ to save a dataset.
What is coverage?
Item coverage: if you don’t have proper coverage of all topics (e.g. the big 3, later expanded to 5 personality factors), then you can’t claim to have comprehensive coverage. Garbage in, garbage out.
What are the three types of variance in FA/PCA?
In FA, items are influenced by variation in three things across people. What is it that causes people to vary? 1) Shared variance between items. 2) Unique variance for one item (stable over time, meaningful, and independent of other items). 3) Error variance (random in the ideal case; your score will vary across times of measurement).
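One way to write this three-way split, assuming a standardised item (total variance of 1). The symbols h, s and e are conventional labels introduced here for illustration, not taken from the notes:

```latex
\underbrace{1}_{\text{total variance of item } i}
  = \underbrace{h_i^2}_{\text{shared (communality)}}
  + \underbrace{s_i^2}_{\text{unique}}
  + \underbrace{e_i^2}_{\text{error}}
```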
What variance does FA measure and use?
FA measures the SHARED variance, with minimal error and unique variance.
What variance does PCA measure and use?
PCA tries to account for ALL the variance (1’s on the diagonal of the correlation matrix).
What are the technical differences between PCA and FA in regard to the use of variance?
This causes a technical difference in how the analysis is set up. FA is iterative: it starts by using multiple regression to estimate each item's shared variance, replaces the 1’s on the diagonal (each item's correlation with itself) with those estimates, and then iteratively improves the solution. PCA uses the correlation matrix as it is. You can tell which is being used from the initial communalities output (in PCA they are all 1).
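The iterative set-up described here can be sketched roughly as follows. This is a minimal illustration of iterated principal axis factoring, assuming squared multiple correlations as the starting communalities; the function names `smc` and `principal_axis` and the stopping rule are assumptions for illustration, not the procedure of any particular package.

```python
import numpy as np

def smc(R):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def principal_axis(R, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R.

    Returns (loadings, communalities).
    """
    R = np.asarray(R, dtype=float)
    h2 = smc(R)  # initial communalities (PCA would keep the 1's instead)
    for _ in range(max_iter):
        reduced = R.copy()
        # Replace the 1's on the diagonal with the shared-variance estimates.
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:n_factors]
        # Loadings are eigenvectors scaled by sqrt(eigenvalue); the clip guards
        # against small negative eigenvalues of the reduced matrix.
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2
```

The only difference from a PCA of the same matrix is the diagonal: leave the 1's in place and a single eigendecomposition gives the components, with no iteration needed.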
Where are the squared multiple correlations?
In FA, the squared multiple correlations (SMCs) go on the diagonal of the correlation matrix, in place of the 1’s.
What is PCA for generally?
PCA is for reducing and extracting linear composites from data for further analysis
What does FA assume the relationship between factor and variance is?
FA assumes that the factor causes variation in the observed items.
What does PCA assume the relationship between components and variance is?
PCA is not causal; it goes the other way round. The component is driven by the items: it is a representation of them, an aggregate of the items combined in a weighted fashion. The items are not caused by it.
What practical issues are there to consider?
1) sample size
2) missing data
3) normality
4) outliers
5) Multicollinearity
6) FACTORISABILITY
7) EXTRACTION
8) ROTATION
What is the rule for sample size in FA/PCA?
Only rules of thumb. Generally, large sample sizes are needed. Comrey and Lee (1992) suggest: 50 = very poor, 100 = poor, 200 = fair, 300 = good, 500 = very good, >1000 = excellent. A smaller N is OK if the factors have high-loading marker variables.
Some suggest a ratio of participants to variables: Nunnally 10:1, Guilford 2:1; Barrett & Kline find that 2:1 replicates the structure, while 3:1 is better.
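These rules of thumb can be turned into a quick check. The sketch below is only an illustration: the function names are hypothetical, and it assumes that a sample falling between two Comrey and Lee benchmarks takes the label of the lower one and that 1000 itself counts as excellent.

```python
def comrey_lee_rating(n_cases):
    """Label a sample size using the Comrey and Lee (1992) benchmarks."""
    benchmarks = [
        (1000, "excellent"),
        (500, "very good"),
        (300, "good"),
        (200, "fair"),
        (100, "poor"),
        (50, "very poor"),
    ]
    # Assumption: use the highest benchmark the sample has reached.
    for threshold, label in benchmarks:
        if n_cases >= threshold:
            return label
    return "below the lowest benchmark"

def cases_per_variable(n_cases, n_variables):
    """Participants-to-variables ratio, for comparison with 10:1, 3:1 or 2:1."""
    return n_cases / n_variables
```

For example, 240 participants answering a 30-item questionnaire would rate as "fair" on the absolute benchmarks and give a ratio of 8:1.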
How do we deal with missing data?
Imputation methods tend to overfit the data: they essentially create data based on the data that is there, which inflates relationships and pushes correlations up. With large samples, a small amount of missing data is probably OK; casewise/listwise (diff) deletion might be OK.
What are the issues with outliers/ways of dealing with them?
Bad news. A single outlying point can have a massive effect on correlations (ellipse), so screen them out. Recall leverage, influence etc.
Options: check scatterplots / histograms; Mahalanobis distance; dummy code them out and delete; transform the variable; winsorize.
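Two of these options can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 5th and 95th percentile limits for winsorizing are arbitrary defaults, and no cut-off for the distances is built in (the squared distances are commonly compared against a chi-square critical value with degrees of freedom equal to the number of variables).

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each case from the centroid of X."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: d_i^2 = c_i' S^-1 c_i
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)

def winsorize(x, lower_pct=5, upper_pct=95):
    """Pull values beyond the chosen percentiles back to those percentiles."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)
```

Cases with the largest distances are the ones to inspect first; whether to delete, transform or winsorize them remains a judgement call.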
What is listwise deletion?
Excluding cases listwise means that, for a case's data to be included, the case must contribute a data point for every variable used in the PCA or FA. If the case has missing data on any variable, the whole case is deleted. The alternative (pairwise deletion) is to estimate the correlation matrix for the FA or PCA from the maximum amount of data available for each pair of variables.
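The two approaches can be compared with a short NumPy sketch. It assumes missing values are coded as NaN; the function names are hypothetical.

```python
import numpy as np

def listwise_corr(X):
    """Correlation matrix from complete cases only; also returns the N used."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)  # a case needs a value on every variable
    return np.corrcoef(X[complete], rowvar=False), int(complete.sum())

def pairwise_corr(X):
    """Correlation matrix where each entry uses every case that has both variables."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            R[i, j] = R[j, i] = np.corrcoef(X[both, i], X[both, j])[0, 1]
    return R
```

Pairwise deletion keeps more data, but each correlation may rest on a different subset of cases, so the resulting matrix is not guaranteed to be internally consistent.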