L10 - Factor analysis Flashcards

1
Q

Factor analysis definition (Malhotra, 2010)

A
  • a class of procedures primarily used for data reduction and summarization.
  • takes a large number of variables or objects and searches for a small number of common factors that account for their intercorrelations.
  • an interdependence technique in that an entire set of interdependent relationships is examined.
2
Q

Applications of Factor analysis (Malhotra, 2010)

A

1) Market segmentation - identify the underlying variables on which to group the customers.
2) Product research - determine the brand attributes that influence consumer choice.
3) Advertising studies - understand the media consumption habits of the target market.
4) Pricing studies - identify the characteristics of price-sensitive consumers.
5) Data reduction, structure identification, measurement scale purification, scale development, and data transformation.

3
Q

Bartlett’s test of sphericity definition (Malhotra, 2010)

A

A test statistic used to examine the hypothesis that the variables are uncorrelated in the population.
In other words, the null hypothesis is that the population correlation matrix is an identity matrix.
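
A minimal Python sketch of the test, assuming NumPy and SciPy are available. The function name `bartlett_sphericity` and the simulated data are illustrative and do not come from Malhotra. It uses the usual chi-square approximation, chi2 = -[(n - 1) - (2p + 5)/6] * ln|R|, with p(p - 1)/2 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # log-determinant of the correlation matrix (0 for an identity matrix)
    _, logdet = np.linalg.slogdet(R)
    stat = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df, chi2.sf(stat, df)

# Simulated data (assumption): 4 variables sharing one common factor
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
X = common + rng.normal(size=(200, 4))

stat, df, p_value = bartlett_sphericity(X)
print(f"chi2={stat:.1f}, df={df}, p={p_value:.4g}")
```

A p-value below 0.05 rejects the identity-matrix hypothesis, which supports running FA on these variables.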

4
Q

Correlation matrix definition (Malhotra, 2010)

A

A lower-triangular matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis.
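
As a small illustration, the lower triangle can be obtained with NumPy. The data below are simulated and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 respondents, 4 metric variables
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]  # make variables 0 and 1 correlated

R = np.corrcoef(X, rowvar=False)  # full symmetric correlation matrix
lower = np.tril(R, k=-1)          # keep only the entries below the diagonal
print(np.round(lower, 2))
```

The diagonal (always 1) and the upper triangle (a mirror image) carry no extra information, which is why only the lower triangle is usually reported.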

5
Q

Factor loading plot definition (Malhotra, 2010)

A

A plot of the original variables using the factor loadings as coordinates. The correlation of a variable and a factor indicates the degree of correspondence between the variable and a factor.

6
Q

Factor matrix definition (Malhotra, 2010)

A

Contains the factor loadings of all variables on all the factors extracted.
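
A sketch of how an unrotated factor matrix can be computed, assuming principal components extraction from a correlation matrix. The helper `pca_loadings` and the matrix `R` are hypothetical examples, not taken from the source.

```python
import numpy as np

def pca_loadings(R, n_factors):
    """Unrotated principal-component loadings for a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    # eigh returns ascending eigenvalues; keep the largest n_factors
    order = np.argsort(eigvals)[::-1][:n_factors]
    # loading = eigenvector scaled by the square root of its eigenvalue
    return eigvecs[:, order] * np.sqrt(eigvals[order])

# Hypothetical correlation matrix for 3 variables
R = np.array([[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

factor_matrix = pca_loadings(R, n_factors=2)  # rows: variables, columns: factors
print(np.round(factor_matrix, 2))
```

The sign of each column is arbitrary, so a factor may come out with all loadings flipped without changing the solution.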

7
Q

Factor score coefficient matrix definition (Malhotra, 2010)

A

The matrix of weights (factor score coefficients) used to combine the standardized variables to obtain factor scores.
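
A sketch of one way to obtain these weights, assuming the regression method (coefficients = R^-1 x loadings) applied to principal-component loadings. The source does not say which method is used, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data (assumption): 150 cases, 4 variables with a common factor
common = rng.normal(size=(150, 1))
X = common + rng.normal(size=(150, 4))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

# Factor score coefficient matrix (variables x factors), regression method
coef = np.linalg.solve(R, loadings)
scores = Z @ coef  # one row of factor scores per case
print(np.round(coef, 3))
```

With principal-component loadings, the resulting scores are uncorrelated and have unit variance.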

8
Q

KMO (Kaiser-Meyer-Olkin) definition (Malhotra, 2010)

A

An index used to examine the appropriateness of FA. Values < 0.5 imply that FA may not be appropriate.
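
A sketch of the overall KMO index, assuming the standard formula that compares squared correlations with squared partial correlations (taken from the inverse of the correlation matrix). The function name `kmo` and the example matrix are hypothetical.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)         # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)

# Hypothetical matrix: 4 variables, every pairwise correlation equal to 0.5
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)

value = kmo(R)
print(round(value, 3))  # approximately 0.8 for this matrix
```

Small partial correlations relative to the simple correlations push the index toward 1, which indicates that the variables share common factors.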

9
Q

Residuals definition (Malhotra, 2010)

A

the differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
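
A sketch of the calculation, assuming principal-component loadings and a hypothetical correlation matrix. The 0.05 cut-off for a "large" residual is an assumption borrowed from common software output, not from Malhotra.

```python
import numpy as np

# Hypothetical correlation matrix for 4 variables
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]              # keep 2 factors
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

reproduced = loadings @ loadings.T                 # reproduced correlations
residuals = R - reproduced                         # observed minus reproduced

off_diag = ~np.eye(R.shape[0], dtype=bool)
# each pair appears twice in the symmetric matrix, hence the division by 2
n_large = int(np.sum(np.abs(residuals[off_diag]) > 0.05)) // 2
print(np.round(residuals, 3))
print("pairs with |residual| > 0.05:", n_large)
```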

10
Q

Scree plot definition (Malhotra, 2010; Field, 2013)

A
  • A plot of eigenvalues against the number of factors in order of extraction. (X-axis: factor; Y-axis: eigenvalue)
  • The point where the curve first begins to straighten out indicates the maximum number of factors to extract.
11
Q

The procedure for conducting factor analysis (Malhotra, 2010)

A

1) Formulate the problem
2) Construct the correlation matrix
3) Determine the method of factor analysis
4) Determine the number of factors
5) Rotate the factors
6) Interpret the factors
7) Calculate factor scores or select surrogate variables
8) Determine the fit of the FA model

12
Q

Step 1: Formulate the problem

A

Identify the objectives of FA and specify the appropriate sample size.

13
Q

Assumptions of the factor analysis (Malhotra, 2010)

A
  • Variables are measured on a metric scale.
  • Sample size: more than 50 cases, with more than 5 cases per variable.
  • Sufficient correlations among the variables: > 0.30
14
Q

Step 2: Construct the correlation matrix

A

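To test whether the variables are sufficiently correlated:

  • Bartlett’s test of sphericity: H0 states that the population correlation matrix is an identity matrix; sig < 0.05 rejects H0, so the variables are correlated.
  • KMO: a value < 0.5 implies that FA may not be appropriate.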
15
Q

Two methods to derive the factor score coefficients:

A

Principal components analysis and Common factor analysis.

16
Q

Principal components analysis definition (Malhotra, 2010)

A
  • Considers the total variance in the data.
  • Primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis.
17
Q

Common factor analysis definition (Malhotra, 2010)

A
  • Estimates the factors based only on the common variance.
  • Primary concern is to identify the underlying dimensions, and the common variance is of interest.

18
Q

Step 4: Determine the number of factors

A
  • Eigenvalues: retain factors with an eigenvalue > 1, or use the scree plot.
  • Percentage of variance accounted for: cumulative >= 60%
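
Both rules can be checked from the eigenvalues of the correlation matrix. The sketch below uses a hypothetical block-structured matrix chosen so that the eigenvalues are easy to verify by hand.

```python
import numpy as np

# Hypothetical correlation matrix: two uncorrelated pairs of variables
R = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.4, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # 1.6, 1.4, 0.6, 0.4
pct = 100 * eigvals / eigvals.sum()             # percentage of variance per factor
cum = np.cumsum(pct)                            # cumulative percentage

n_kaiser = int(np.sum(eigvals > 1))             # eigenvalue > 1 rule
n_cum = int(np.argmax(cum >= 60) + 1)           # first factor count reaching 60%

print("eigenvalues:", np.round(eigvals, 2))
print("cumulative %:", np.round(cum, 1))
print("eigenvalue rule:", n_kaiser, "| 60% rule:", n_cum)
```

Here both rules agree on two factors; on real data they can disagree, and the scree plot or a priori knowledge helps to decide.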

19
Q

A priori determination definition (Malhotra, 2010)

A

Prior research tells the researcher how many factors to expect, so the number of factors to be extracted can be specified beforehand.

20
Q

Step 5: Rotate the factors

A

Through rotation, the factor matrix is transformed into a simpler one that is easier to interpret.

21
Q

Two ways to rotate the factors:

A

Orthogonal rotation, and Oblique rotation

22
Q

Orthogonal rotation definition (Malhotra, 2010)

A

Rotation of factors in which the axes are maintained at right angles.

23
Q

Varimax procedure definition (Malhotra, 2010; Field, 20)

A
  • An orthogonal method of factor rotation that keeps the factors independent and uncorrelated.
  • Minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors.
  • Suppress factor loadings < 0.4 to give a clearer interpretation of the factor solution.
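
A sketch of a varimax rotation using the common SVD-based iteration, which is an assumption here because neither Malhotra nor Field prescribes an algorithm. The loading matrix is hypothetical, and the iteration may stop early on some inputs.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Return (rotated loadings, orthogonal rotation matrix T)."""
    p, k = loadings.shape
    T = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        rotated = loadings @ T
        col_ss = (rotated ** 2).sum(axis=0)  # sum of squared loadings per factor
        grad = loadings.T @ (rotated ** 3 - rotated * col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                           # nearest orthogonal matrix
        crit_new = s.sum()
        if crit != 0 and crit_new < crit * (1 + tol):
            break
        crit = crit_new
    return loadings @ T, T

# Hypothetical unrotated loadings: 4 variables, 2 factors
loadings = np.array([[0.7, 0.5],
                     [0.6, 0.5],
                     [0.6, -0.5],
                     [0.7, -0.4]])

rotated, T = varimax(loadings)
# Blank out loadings below 0.4 for display, as suggested above
display = np.where(np.abs(rotated) >= 0.4, np.round(rotated, 2), np.nan)
print(display)
```

Because T is orthogonal, the rotation changes how variance is distributed across the factors but leaves each variable's communality unchanged.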
24
Q

Oblique rotation definition (Malhotra, 2010)

A

Rotation of factors in which the axes are not maintained at right angles.

25
Q

Step 6: Interpret the factors

A
  • The factor can be interpreted in terms of the variables that load high on it.
  • Variables with higher loadings have more influence on the factor and should be reflected in its name.
26
Q

Composite measure approach definition (slide)

A

Represents the average of the variables that make up a factor.
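
A small sketch of the calculation. The item-to-factor assignment and the 7-point ratings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical ratings: 5 respondents, 4 items on a 7-point scale
X = rng.integers(1, 8, size=(5, 4)).astype(float)

# Hypothetical assignment of items (column indices) to factors
factor_items = {"factor_1": [0, 1], "factor_2": [2, 3]}

# Composite measure = mean of the items that make up each factor
composites = {name: X[:, cols].mean(axis=1) for name, cols in factor_items.items()}
for name, values in composites.items():
    print(name, values)
```

The composite stays on the original measurement scale, which is one reason it is easier to interpret than a factor score.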

27
Q

Pros and cons of composite measure (slide)

A
  • Pros: More stable results; easier to interpret.
  • Cons: Needs to be calculated; some correlations remain, but they are negligible.

28
Q

Pros and cons of the factor scores approach (slide)

A
  • Pros: Easily calculated; correlation of 0 between factors.
  • Cons: Difficult to interpret; inaccurate results.

29
Q

Selecting the surrogate variable approach (Malhotra, 2010)

A

It means singling out some of the original variables for use in subsequent analysis - analysing the original variables rather than factors

30
Q

Pros and cons of selecting the surrogate variable approach (Malhotra, 2010)

A
  • Pros: works well if one factor loading for a variable is clearly higher than all other factor loadings.
  • Cons: the choice is difficult when two or more variables have similarly high loadings.
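
A sketch of picking one surrogate per factor from a hypothetical rotated loading matrix. The "gap" measure is an illustrative device for spotting hard choices and does not come from Malhotra.

```python
import numpy as np

# Hypothetical rotated loadings: rows v1..v4, columns factor 1 and factor 2
loadings = np.array([[0.85, 0.10],
                     [0.70, 0.25],
                     [0.15, 0.80],
                     [0.20, 0.78]])
names = ["v1", "v2", "v3", "v4"]

# Surrogate = variable with the highest absolute loading on each factor
surrogate_idx = np.abs(loadings).argmax(axis=0)
surrogates = [names[i] for i in surrogate_idx]

# Gap between the two highest loadings on each factor; a small gap
# means two variables load similarly and the choice is not clear-cut
sorted_abs = np.sort(np.abs(loadings), axis=0)[::-1]
gap = sorted_abs[0] - sorted_abs[1]

print(surrogates, np.round(gap, 2))
```

In this example the choice for factor 1 is clear, while factor 2 has two near-equal candidates, so theory and measurement quality would have to decide.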
31
Q

Step 8: Determine the fit

A
  • Use residuals to determine the factor model fit.
  • Many large residuals indicate that the factor model does not fit the data well and should be reconsidered.

32
Q

Roles of factor analysis and principal component analysis (Field, 2013 and Malhotra, 2010)

A

1) To understand the structure of a set of variables, that is, to identify the underlying dimensions, or factors, that explain the correlations among a set of variables.
2) To construct a questionnaire to measure an underlying variable.
3) To reduce a data set to a smaller, more manageable set of uncorrelated variables while retaining as much of the original information as possible. The new variables replace the original set of correlated variables.
4) To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.

33
Q

KMO criterion on value (Hutcheson and Sofroniou, 1999).

A
Marvelous: values in the 0.90s
Meritorious: values in the 0.80s
Middling: values in the 0.70s
Mediocre: values in the 0.60s
Miserable: values in the 0.50s
Merde: values below 0.50
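
The bands can be written as a small lookup. The function name `kmo_label` is hypothetical, and the labels are copied from the card above.

```python
def kmo_label(value):
    """Map a KMO value to the verbal label listed on this card."""
    bands = [
        (0.90, "Marvelous"),
        (0.80, "Meritorious"),
        (0.70, "Middling"),
        (0.60, "Mediocre"),
        (0.50, "Miserable"),
    ]
    for lower, label in bands:
        if value >= lower:
            return label
    return "Merde"  # below 0.50: FA may not be appropriate

for v in (0.93, 0.80, 0.65, 0.49):
    print(v, kmo_label(v))
```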
34
Q

Factor definition (Field, 2013)

A

The explanatory constructs that are used to explain the maximum amount of common variance in the correlation matrix.

35
Q

Communality definition (Malhotra, 2010)

A
  • The amount of variance a variable shares with all other variables included in the analysis.
  • If the communality is < 0.5, remove the variable.
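
For orthogonal factors, a variable's communality is the sum of its squared loadings across the extracted factors. A sketch with a hypothetical loading matrix:

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 orthogonal factors (columns)
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.9],
                     [0.3, 0.4]])

communalities = (loadings ** 2).sum(axis=1)  # row sums of squared loadings
keep = communalities >= 0.5                  # the 0.5 cut-off from this card

print(np.round(communalities, 2))  # 0.65, 0.53, 0.82, 0.25
print(keep)                        # the last variable falls below 0.5
```

Summing the squared loadings down each column instead gives the variance explained by each factor.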
36
Q

Eigenvalue definition (Malhotra, 2010)

A
  • Represents the total variance explained by each factor.
  • Any retained factor should account for the variance of at least one variable, so only factors with an eigenvalue > 1 are included (each standardized variable has a variance of 1).

37
Q

Factor loadings definition (Malhotra, 2010)

A

Simple correlations between the variables and the factors.

38
Q

Factor score definition (Field, 2013)

A

The composite scores for each individual on a particular factor.

39
Q

Percentage of variance (Malhotra, 2010)

A

The percentage of total variance attributed to each factor.

40
Q

Total variance explained definition (Malhotra, 2010)

A
  • The number of factors is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level.
  • The factors extracted should account for >= 60% of the variance.