Week 10 Factor Analysis: Principal Components Analysis (PCA) Flashcards

To provide an overview of Principal Components Analysis (PCA), which is a type of Factor Analysis

1
Q

How does Hills define Factor Analysis?

A

“Factor Analyses … is a generic term that covers a number of different but related analysis techniques, most importantly Principal Components Analysis (PCA) and Factor Analysis”

2
Q

What do we need to bear in mind with Exploratory Factor Analysis?

A

SPSS can only undertake Exploratory Factor Analysis (EFA) – Confirmatory Factor Analysis (CFA) requires another package, e.g. LISREL, AMOS, etc.

3
Q

What are the main differences between PCA, EFA and CFA?

A

*EFA just looks at shared variance
*PCA is a simple form of Factor Analysis that analyses all the items clustered together & identifies as much variance as possible
*CFA is more complex: it looks at unexplained variance and builds the unexplained constructs into the model. We put all items in, specify which items should load onto each construct, and a program such as AMOS estimates the loadings and tests the model

4
Q

What do Tabachnick & Fidell say is the goal of research that uses PCA and Factor Analysis?

A

The goal of research using PCA or FA is to concisely describe, & perhaps understand, the relationships among observed variables or to test theory about underlying processes

5
Q

What are the uses for Principal Components Analysis (PCA) and Factor Analysis (FA)?

A

*PCA and FA have considerable utility in reducing numerous variables down to a few factors.
*Mathematically, PCA and FA produce several linear combinations of observed variables, each linear combination being a factor.

6
Q

What is the point of Exploratory Factor Analysis (EFA)?

A

*EFA is intended to describe and summarize data by grouping correlated variables together

7
Q

Tabachnick & Fidell tell us that FA and PCA differ in the variance that is analysed. How does each analyse the variance?

A
  • In PCA, all the variance in the observed variables is analysed.
  • In FA, only shared variance is analyzed; attempts are made to estimate and eliminate variance due to error & variance that is unique to each variable
8
Q

So, just to clarify, why is Confirmatory Factor Analysis (CFA) so kick-arse?

A
  • CFA tests theoretical & conceptual underpinnings of a theoretical model with items loading on specific factors.
  • It measures both the amount of variance explained & the unexplained variance not accounted for within the model.
  • CFA identifies the residuals at each level of the model
  • CFA cannot be performed through SPSS.
9
Q

Several Factor Analysis extraction methods are available in SPSS (but not CFA). What are they?

A
  • PCA – the mathematically determined solution with the common, unique and error variances mixed into the components.
  • Principal Factor Extraction (PFE) – Estimates communalities in an attempt to eliminate unique and error variances from variables – only shared variance is evaluated.
10
Q

Model Rotation is required because without rotation it would be difficult to interpret the results. What are the 2 main rotation methods used in SPSS?

A

*Orthogonal Rotation = axes are maintained at 90 degrees (orthogonal means at right angles; the most common method is Varimax)

*Oblique Rotation = axes are not maintained at 90 degrees (oblique rotations do not need to be at right angles, which is good for factors that are associated with one another, even if the correlations are not strong)

11
Q

Tell me more about Orthogonal Rotation

A

Orthogonal rotation – Varimax rotation is orthogonal rotation that simplifies the factors by setting levels on a simplicity criterion & is the default option in SPSS.
*The goal of Varimax is to maximize the variance of the factor loadings by making high loadings higher and low ones lower for each factor.
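
The "high loadings higher, low ones lower" idea can be sketched for the simple case of two components. This is an illustrative brute-force search over rotation angles, not the algorithm SPSS uses: it assumes exactly two components, uses the raw criterion (no Kaiser normalisation), and the loadings are made up for the example.

```python
import math

def rotate(loadings, degrees):
    """Orthogonally rotate a two-column loading matrix by an angle in degrees."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c - b * s, a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Variance of the squared loadings, summed over the components."""
    total = 0.0
    for column in zip(*loadings):
        squared = [x * x for x in column]
        mean = sum(squared) / len(squared)
        total += sum((v - mean) ** 2 for v in squared) / len(squared)
    return total

def varimax_two_components(loadings, step=1):
    """Try angles 0, step, ... below 90 degrees and keep the best one."""
    best = max(range(0, 90, step),
               key=lambda d: varimax_criterion(rotate(loadings, d)))
    return best, rotate(loadings, best)

# Hypothetical loadings: a clean two-factor structure turned 30 degrees away
simple = [(0.8, 0.0), (0.7, 0.0), (0.6, 0.0), (0.0, 0.8), (0.0, 0.7), (0.0, 0.6)]
unrotated = rotate(simple, 30)

angle, rotated = varimax_two_components(unrotated)
print("criterion before:", round(varimax_criterion(unrotated), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
for row in rotated:
    print([round(x, 2) for x in row])
```

After rotation each made-up item loads on one component only, which is the simple structure that makes the components easier to label.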

12
Q

Tell me more about Oblique Rotation

A

Oblique rotation – uses the orthogonally rotated solution on rescaled factor loadings; therefore the solution may be oblique with respect to the original factor loadings.
*Note that oblique rotation permits the factors to correlate but does not force them to; often the factors do not correlate.

13
Q

What should I be wary of when considering undertaking Factor Analysis?

A
  • You should have a good spread of scores to produce enough variance in the inter-correlations.
  • Beware of factors that are defined by only 2 variables. With 3 or fewer variables the model is saturated, and you should ask whether a single question would be just as representative – some packages won’t run with 2 variables.
  • Do not rely on statistical analyses alone to produce results. Remember GIGO – garbage in, garbage out – your data and measures should be theoretically driven
14
Q

What do Tabachnick and Fidell (2007) suggest a factor matrix should include?

A
  • A matrix that is factorable should include several sizeable correlations.
  • The expected size of the correlations depends, to some extent, on the sample size, but if no correlation exceeds .30 then the use of FA is questionable.
15
Q

What do we need to know about Bartlett’s test of sphericity?

A

Bartlett’s test of sphericity is very sensitive: when the sample is large (more than 5 cases per variable) it may yield significant results even if the correlations are small.
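
As a rough illustration of what the test computes, the sketch below uses the standard formula chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with df = p(p - 1)/2, where R is the correlation matrix, p the number of variables and n the number of cases. The formula is not given on the card, and the correlation matrix and sample sizes here are hypothetical.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return det

def bartlett_sphericity(corr, n_cases):
    """Return (chi-square, df) for H0: the correlation matrix is an identity matrix.

    Raises ValueError (from math.log) if the matrix is singular.
    """
    p = len(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical matrix with weak correlations (.1) among three variables
weak = [[1.0, 0.1, 0.1],
        [0.1, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
for n_cases in (15, 150, 1500):
    chi_square, df = bartlett_sphericity(weak, n_cases)
    print(n_cases, round(chi_square, 2), df)
```

The same weak correlations produce a larger chi-square as the number of cases grows, which is the sensitivity to sample size noted above.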

16
Q

What is Kaiser’s measure of sampling adequacy?

A

Kaiser’s measure of sampling adequacy (KMO) is the ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial correlations. Values >= .6 (greater than or equal to) are required for good FA.
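
The ratio can be sketched directly from a correlation matrix. This is a minimal sketch that assumes the partial correlations are taken from the inverse of the correlation matrix (the usual route, though the card does not say how they are obtained); the matrix and the helper names `invert` and `kmo` are hypothetical.

```python
import math

def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo(corr):
    """Sum of squared correlations over that sum plus squared partial correlations."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = sum_q2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # off-diagonal entries only
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_q2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_q2)

# Hypothetical correlation matrix for three items
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]
value = kmo(corr)
print(round(value, 3), "adequate" if value >= 0.6 else "questionable")
```

When the partial correlations are small relative to the ordinary correlations, the ratio approaches 1, which is why higher values indicate a matrix better suited to FA.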

17
Q

There are 3 important issues to consider in factor analysis, all based on covariance and matrices. What are they?

A
  • Estimates of Communality – if communality values equal or exceed 1, problems are indicated with the solution.
  • Adequacy in Extraction and # of Factors – Because inclusion of factors in a solution improves the fit between observed and reproduced correlation matrices, adequacy of extraction is tied to the number of factors. The more factors extracted, the better the fit and the greater the percent of variance in the data explained.
  • Interpretation of Factors – understanding the underlying dimension that unifies the group of variables
18
Q

Factor Analysis is based on correlations, so similar assumptions apply. What are the 5 assumptions of FA?

A
  • Sample size – 5 participants per variable but ideally at least 100, with preference for between 200 and 1000 (for SEM, ideally 200 plus).
  • Missing data – a real problem in this form of analysis, and a particular problem for FA.
  • Normality – each variable is assumed to be normally distributed.
  • Linearity – inspect scatter plots for each pair of variables.
  • Outliers – both univariate & multivariate outliers need to be addressed (Mahalanobis & Casewise Diagnostics). In addition, outlying variables need to be identified.
19
Q

What are the 2 distinct phases in Factor Analysis?

A
  1. Extraction, to arrive at the number of reliable factors that may be reliably interpreted.
  2. Rotation, to obtain simple structure (Thurstone’s simple structure), which will hopefully facilitate interpreting and labelling each factor.
20
Q

What does the Component Matrix represent?

A

The Component Matrix represents the loadings between the variables and components.

21
Q

What does the Rotated Component Matrix represent?

A

The Rotated Component Matrix represents the loadings between the variables and the rotated components. It helps researchers interpret the components by showing the best, simplest structure reached after rotation.

22
Q

What do we need to take into account when it comes to Factor Extraction?

A
  • Variables should be moderately or highly correlated (correlations above .3).
  • KMO, testing the sampling adequacy, should be >.6.
  • Bartlett’s Test of Sphericity should be significant; T&F want it below .001 (liberal).
  • Eigenvalues need to be 1 or greater
  • Also need to consider Communalities
23
Q

What do we need to know about Communalities when it comes to Factor Extraction?

A

Communalities – show how much of each variable’s variance is explained by the “true” components that emerge. High communalities = variables are well represented. Low communalities = an outlying variable that should be eliminated from the analysis.
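
For orthogonal components, a variable's communality is the sum of its squared loadings across the retained components. The sketch below illustrates this with made-up loadings; the item names and the .30 flagging threshold are assumptions for the example, not values from the lecture.

```python
def communalities(loadings):
    """Sum of squared loadings across the retained components, per variable."""
    return {item: sum(value ** 2 for value in row) for item, row in loadings.items()}

# Hypothetical loadings on two retained components
loadings = {
    "item1": (0.80, 0.30),
    "item2": (0.75, 0.10),
    "item3": (0.20, 0.15),
}

for item, h2 in communalities(loadings).items():
    flag = "  <- poorly represented" if h2 < 0.30 else ""  # assumed threshold
    print(f"{item}: {h2:.2f}{flag}")
```

Here the third made-up item shares little variance with either component, so it is the kind of outlying variable that might be dropped.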

24
Q

Staying with Factor Extraction, how do we estimate the proportion of variance in the set of variables accounted for by a component?

A

The eigenvalue divided by the number of variables in the set gives the proportion of variance (multiply by 100 for the % of Variance). Mostly, components with eigenvalues > 1 are kept; this is known as Kaiser’s criterion.
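
A small worked example of this arithmetic, using hypothetical eigenvalues of the kind listed in a Total Variance Explained table (for a correlation matrix they sum to the number of variables):

```python
# Hypothetical eigenvalues for six variables
eigenvalues = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]
n_variables = len(eigenvalues)

# % of variance = eigenvalue / number of variables * 100
percent = [100 * e / n_variables for e in eigenvalues]

# Kaiser's criterion: keep components with eigenvalues greater than 1
retained = [e for e in eigenvalues if e > 1]

cumulative = 0.0
for number, (e, pct) in enumerate(zip(eigenvalues, percent), start=1):
    cumulative += pct
    keep = "keep" if e > 1 else "drop"
    print(f"component {number}: eigenvalue {e:.1f}, {pct:.1f}% (cumulative {cumulative:.1f}%), {keep}")
```

With these made-up values, Kaiser's criterion keeps the first two components.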

25
Q

Still with Factor Extraction, what does the Scree Plot tell us? (scree = waste)

A

The scree plot gives a visual representation of how many factors to keep. Draw one line through the steep part of the curve and one through the flat part; where the two lines intersect indicates how many factors to keep and how many to discard (the scree – waste).

26
Q

When it comes to interpretation of the Rotated Components Matrix, why is it useful to suppress loadings that are less than .3?

A

*By suppressing the loadings that are below .3 we see only the items that load meaningfully on each component, so we are then able to make a more robust component

27
Q

What is a complex or cross loading?

A
  • a cross loading, AKA a complex item, is one that loads on more than one component
  • it might be that it’s not clear what it is tapping into; it also might be that it is tapping 2 constructs
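
Suppressing small loadings and spotting complex items can be sketched together. The loadings and item names below are invented for illustration, and the .3 cutoff follows the rule of thumb used in these cards.

```python
def suppress(loadings, cutoff=0.3):
    """Blank out (None) any loading whose absolute value is below the cutoff."""
    return {item: [value if abs(value) >= cutoff else None for value in row]
            for item, row in loadings.items()}

def cross_loading_items(loadings, cutoff=0.3):
    """Items with loadings at or above the cutoff on more than one component."""
    return [item for item, row in loadings.items()
            if sum(abs(value) >= cutoff for value in row) > 1]

# Hypothetical rotated loadings on two components
rotated_loadings = {
    "Sun": (0.81, 0.12),
    "Stars": (0.09, 0.77),
    "Moon": (0.52, 0.48),
}

for item, row in suppress(rotated_loadings).items():
    print(item, ["" if value is None else value for value in row])
print("complex items:", cross_loading_items(rotated_loadings))
```

Only the made-up "Moon" item is flagged, because it is the only one with loadings of .3 or more on both components.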
28
Q

Again, with interpretation of the components, how do we determine the best label for each factor?

A
  • carefully look at each component & review the wording of each item.
  • What does the grouping of these items tell you about the factor so that you may label it?
  • Bear in mind that where an item has high loadings on more than one component, this interpretation becomes complex.
29
Q

Labelling the components can be quite difficult. As T & F (2007) suggest, properly creating a questionnaire often takes many attempts, and many pilot studies, to carefully identify and then measure a given abstract concept in a way that is both reliable and valid. What tips can you give me to help me improve my scale?

A
  • Consider what Component 1 might indicate after reading the information on each item.
  • Do the same with Component 2.
  • Reflect on what you think would be an appropriate label for each.
  • If, on viewing the rotated components matrix, there is a complex variable (e.g. Moon) that cross loads between the other two components, look carefully at the wording and the intent of the question being posed in the questionnaire.
  • It might confuse things, so you might delete it.
30
Q

What is Data Summarization?

A

Data summarization = derives underlying dimensions that, when interpreted and understood, describe the data in a much smaller number of concepts than the original individual variables.
*NB: barely used now

31
Q

What is Data Reduction?

A

Data reduction = extends the process of data summarization by deriving an empirical value (factor score or summated scale) for each dimension (factor) and then substituting this value for the original values.
*NB: this is the popular way to go

32
Q

What does (Orthogonal and Oblique) factor rotation achieve?

A

*The ultimate effect of rotating the factor matrix is to redistribute the variance from earlier factors to later ones to achieve a simpler, theoretically more meaningful factor pattern.
Factor rotation = the reference axes of the factors are turned about the origin until some other position has been reached.
*unrotated factor solutions extract factors based on how much variance they account for, with each subsequent factor accounting for less variance (less robust).

33
Q

Okay, how do I choose between Orthogonal and Oblique factor rotation?

A
  • Orthogonal rotation methods are the most widely used rotational methods (e.g. Varimax; note that Promax is an oblique method).
  • Orthogonal rotation is the preferred method when data reduction is the research goal (to either a smaller number of variables or a set of uncorrelated measures for subsequent use in other multivariate techniques).

*Oblique rotation methods are best suited to obtaining several theoretically meaningful factors or constructs as very few constructs in the “real world” are uncorrelated.

34
Q
What are the major uses of factor analysis? (Probable Exam Question)
A

The major uses of factor analysis are to describe and summarize data (questionnaire or instrument design). This is done by grouping together variables (items) that are similar, i.e. correlated.

35
Q
What is the difference between component analysis and factor analysis? (Probable Exam Question)
A

The difference between PCA and Factor Analysis is related to variance. In PCA all variance is accounted for in the analysis, whereas in Factor analysis it is only shared variance that is explained.

36
Q
Is rotation of factors necessary?

(Probable Exam Question)

A

Yes, to find a simple structure or solution. It is often the case that the initial unrotated solution does not provide the simplest structure. Iterations occur that allow the matrix to be rotated to provide the best grouping of variables.

37
Q
How do you decide how many factors to extract?

(Probable Exam Question)

A

*Extraction: PCA – the mathematically determined solution with the common, unique and error variances mixed into the components.
*Whereas Principal Factor Extraction estimates communalities in an attempt to eliminate unique and error variances from the variables.
*The eigenvalue divided by the number of variables in the set estimates the proportion of variance accounted for by a component. Components with eigenvalues greater than 1 are kept; this is Kaiser’s Criterion.
*Alternatively, some look to the scree plot; there are a number of options suggested in this lecture.
*Different references will suggest different ways of assessing the structure of the analysis.

38
Q
What is a significant factor loading?

(Probable Exam Question)

A

Generally, loadings above .3 are considered important enough to take into account. However, given that we are looking for simple structure (Thurstone’s simple structure), an item should not have loadings above .3 on more than one factor.

39
Q
How and why do you name a factor?

(Probable Exam Question)

A

A factor is labelled not with a convenient summary but with a label that identifies the underlying structure of the group of items loading on the factor.

40
Q
Should you use factor scores or summated ratings in follow-up analyses?
    (Probable Exam Question)
A

Factor scores are standardised values that are weighted according to the component or factor loadings, whereas summated ratings or scales use only the variables/items that load highly, combined into a summative measure for all follow-up analyses.

  • A summated scale is only as good as the items used to represent the construct. While it may pass all empirical tests, it is useless without theoretical justification.
  • Never create a summated scale without first assessing its unidimensionality with exploratory or confirmatory factor analysis.
  • Creating Summated Scales – combining several individual variables into a single composite measure.
  • Dimensionality – items are unidimensional, strongly associated and measure a single scale.
  • Computing Factor Scores – uses the factor loadings of all variables on the factor, whereas the summated scale is calculated by combining only selected variables.
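
The contrast between "only selected variables" and "all variables" can be sketched for one respondent. The item names, loadings and scores are invented, and weighting standardised scores by the loadings is only a coarse stand-in: SPSS derives factor scores from score coefficients (e.g. the regression method) rather than from the raw loadings.

```python
# Hypothetical loadings of four items on one factor
factor_loadings = {"q1": 0.82, "q2": 0.75, "q3": 0.68, "q4": 0.12}
# Hypothetical standardised (z) scores for one respondent
z_scores = {"q1": 1.0, "q2": 0.5, "q3": -0.5, "q4": 2.0}

def summated_scale(scores, loadings, cutoff=0.3):
    """Average only the items that load at or above the cutoff."""
    selected = [item for item, value in loadings.items() if abs(value) >= cutoff]
    return sum(scores[item] for item in selected) / len(selected)

def loading_weighted_composite(scores, loadings):
    """Every item contributes, weighted by its loading (simplified factor score)."""
    return sum(loadings[item] * scores[item] for item in loadings)

print("summated scale:", round(summated_scale(z_scores, factor_loadings), 3))
print("weighted composite:", round(loading_weighted_composite(z_scores, factor_loadings), 3))
```

The weakly loading item q4 is ignored by the summated scale but still nudges the weighted composite, which is the practical difference between the two approaches.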
41
Q

What assumption checks are essential for Factor Analysis?

A

*Check for multivariate (MV) outliers via Regression – pg 261, Hills
*Mahalanobis Distance
*Cases are identified as MV outliers when their score exceeds the critical χ2 value. Check the df (= number of items, in this case 10) at p<.001, which gives a critical value of 29.588. Our value is 31.127, so it exceeds the critical value. Go to the saved distance column to identify the extent of MV outliers; sorting the values in this column makes them easier to identify.
*What will you do & what criteria or reference will you use?
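
The critical value quoted above can be checked with a few lines of code. This sketch uses a closed-form expression for the chi-square tail probability that only holds for an even number of degrees of freedom (true here, df = 10); the case labels and all distances other than 31.127 are made up.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square > x) for an even, positive number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles even, positive df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total

critical = 29.588  # from the card: df = 10, p < .001
print("p at the critical value:", round(chi2_sf_even_df(critical, 10), 5))

# Hypothetical Mahalanobis distances; only 31.127 comes from the class example
distances = {"case A": 31.127, "case B": 18.4, "case C": 9.2}
for case, d2 in sorted(distances.items(), key=lambda pair: pair[1], reverse=True):
    p = chi2_sf_even_df(d2, 10)
    status = "MV outlier" if d2 > critical else "ok"
    print(f"{case}: D2 = {d2}, p = {p:.4f}, {status}")
```

Sorting the distances from largest to smallest mirrors the advice to sort the column so that cases beyond the critical value are easy to find.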

42
Q

Final things to remember based on the example in class

A
  • Examine the Correlation Matrix – need correlations above .3
  • KMO check should be >.6
  • Bartlett’s test of sphericity should be sig.
    • Tests the H0 that correlations are zero.
  • Communalities – high communalities indicate good representation.
  • Check Total Variance explained.
  • The eigenvalue of a factor, divided by the number of variables, is the proportion of variance in the set of variables that the factor accounts for.
  • Kaiser’s criterion on number of components to retain.
  • Also check the scree plot. Where the lines intersect is the suggested # of components to use; or are you going with Hair et al’s criteria?
  • Hills explains about the sq’d correlations being additive for SS of communalities