M18 - Factor & Cluster Analysis Flashcards
Factor and Cluster Analysis
- are … … techniques
- difference
- are DATA REDUCTION techniques
- difference:
FA - groups variables
CA - groups observations
Factor Analysis method
– classification of similar variables that measure the same thing into factors/groups
Cluster Analysis method
- classification
- principle
– classification of similar objects in groups
- principle: homogeneity within each group, heterogeneity between the groups
Factor analysis
- what is a factor?
- Factor analysis tries to identify…
- always fewer … than …
- Factor analysis is an … technique in which all the variables are … considered, each … to all others.
- -> contrast to regression analysis
- factor = a construct, a hypothetical entity or ‘latent’ variable, that is assumed to underlie the tests
- Factor analysis tries to identify a set of common underlying factors, in a group of variables
- always fewer FACTORS than VARIABLES
- Factor analysis is an INTERDEPENDENCE technique in which all the variables are SIMULTANEOUSLY considered, each RELATED to all others.
– In contrast, regressions are dependence techniques in which one variable is explicitly regarded as the dependent variable.
What is the factor loading?
the correlation coefficient a_js between a factor and a variable
Purpose of factor analysis
1-4
- test validity of a scale
- reveal interesting patterns
- solving problems of multicollinearity, if two variables are correlated and theoretically meaningful
- smaller number of variables to work with
two types of factor analysis
1.
2.
- Exploratory factor analysis
- used to identify complex interrelationships among items and to group items that are part of unified concepts
- no a priori assumptions about relationships among factors
- Confirmatory factor analysis
- tests the hypothesis that the items are associated with specific factors
Exploratory Factor Analysis
- uncovers …
- a priori assumption:
- shows … and assesses …
- needs to be done …
- uncovers underlying structures in a large set of variables
- a priori assumption: any indicator may be associated with any factor
- shows (uni)dimensionality and assesses reliability of a scale
- ALWAYS needs to be done
Confirmatory Factor Analysis
- determines
- … … are selected on the basis of … … and factor analysis is used to see if they … as ….
- shows … …
- not … if the scale has been … …
- determines the number of factors based on what is expected from previous research
- INDICATOR VARIABLES are selected on the basis of PRIOR THEORY and factor analysis is used to see if they LOAD as PREDICTED
- shows GENERAL VALIDITY of a scale
- not NECESSARY if the scale has been used before
3 steps of exploratory factor analysis
1-3
Assumptions!
- Correlation matrix
- -> examine the correlations of the variables and the KMO criterion: should be >=0.8, must be >=0.5
- Factor extraction from variables
- Factor rotation
- -> to maximize the relationship between variables and factors
Assumptions:
- variables continuous
- variables normally distributed
- but also possible for ordinal variables
Factor extraction method
1-2
Key differences
- Principal component f.a.
- -> maximize the explained variance among the underlying variables
purpose: derive a small number of linear combinations (principal components) that retain as much information from the original variables as possible
- Common f.a.
- -> maximize the underlying correlations among the variables
differences:
- assumption of principal component f.a.: all variance can be explained –> use: data reduction
- Common f.a.: aims at latent constructs
How many factors?
3 criteria
- Kaiser criterion: Eigenvalue >1
- achieving a high specified cumulative % of total variance (usually 60%)
- Elbow criterion in the scree plot
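The first two criteria can be sketched in a few lines. This is only an illustration: the eigenvalues are hypothetical, and the function names `n_factors_kaiser` and `n_factors_cumulative` are made up for this card.

```python
def n_factors_kaiser(eigenvalues):
    """Kaiser criterion: count the factors whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def n_factors_cumulative(eigenvalues, threshold=0.60):
    """Smallest number of factors whose cumulative share of the
    total variance reaches the threshold (60% by default)."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues for six standardized variables (they sum to 6).
eigenvalues = [2.7, 1.5, 0.8, 0.5, 0.3, 0.2]
kaiser_k = n_factors_kaiser(eigenvalues)          # two eigenvalues are > 1
cumulative_k = n_factors_cumulative(eigenvalues)  # 45% + 25% = 70% >= 60%
```

In this made-up case both criteria point to two factors; on real data they can disagree, which is why the scree plot is listed as a third check.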
Kaiser criterion:
Eigenvalue
Communality
Factor loadings
Eigenvalue = sum of squared factor loadings of one factor over all variables / the amount of variance explained by a factor
Communality = sum of squared factor loadings of one variable / the proportion of common variance present in a variable (rest is random variance)
Factor loadings = correlation of factor and variable
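A minimal numeric sketch of these three definitions, assuming principal component extraction from a hypothetical correlation matrix `R` (the matrix and all variable names are illustrative; NumPy is assumed to be available):

```python
import numpy as np

# Hypothetical correlation matrix of four variables: two correlated pairs.
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

# Eigen-decomposition of the symmetric matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component loadings: each eigenvector scaled by sqrt(eigenvalue).
# Rows are variables, columns are factors.
loadings = eigvecs * np.sqrt(eigvals)

# Eigenvalue of a factor = column sum of squared loadings.
eigen_from_loadings = (loadings ** 2).sum(axis=0)

# Communality of a variable = row sum of squared loadings
# over the retained factors (here: Kaiser criterion).
n_retained = int((eigvals > 1).sum())
communalities = (loadings[:, :n_retained] ** 2).sum(axis=1)
```

If all four components were kept, every communality would equal 1, which reflects the principal component assumption that all variance can be explained.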
Factor distinct
- why?
- what it does:
- so that variables only load on one factor, not many
- reference axes of the factors are turned about the origin
– Process of adjusting axes to achieve a more meaningful factor solution
Factor rotation
- why?
- how?
- result
- to discriminate between different factors
- rotate the axes such that variables are loaded maximally on only one factor
- by rotating the axes we ensure that both clusters of variables are intersected by the factor to which they relate the most
- -> after rotation the loadings of the variables are maximized on one factor
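One common orthogonal rotation is varimax. The card does not name a rotation method, so the following is only a sketch of one possible choice, using the well-known SVD-based iteration without Kaiser normalization. The loading matrix is hypothetical and NumPy is assumed.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (SVD-based iteration).
    Returns the rotated loadings and the rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the loadings.
        gradient = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ gradient)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer increases noticeably.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical simple structure: each variable loads on exactly one factor.
simple = np.array([[0.9, 0.0], [0.5, 0.0], [0.0, 0.9], [0.0, 0.5]])

# Turn the axes by 30 degrees so that every variable cross-loads.
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
```

Because the rotation is orthogonal, the communalities stay the same; only the distribution of each variable's loading across the factors changes. Convergence can be slow when all loadings within a factor are of similar size.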
Kaiser-Meyer-Olkin criterion
Exploratory Factor Analysis - Correlation matrix
is this data set suitable for factor analysis?
- -> between 0 and 1
- -> if close to 0, then the sum of partial correlations is large relative to the sum of correlations –> diffusion in the pattern of correlations –> f.a. inappropriate
–> if close to 1, then the patterns of correlations are relatively compact and f.a. should yield distinct and reliable factors
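A sketch of the overall KMO measure, computed as the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal elements only). The correlation matrices are hypothetical, the function name `kmo` is made up here, and NumPy is assumed.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    inv = np.linalg.inv(R)
    # Partial correlations come from the scaled, sign-flipped inverse.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diag] ** 2).sum()
    p2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + p2)


def equicorrelated(p, r):
    """Hypothetical p x p correlation matrix with all correlations equal to r."""
    return np.full((p, p), r) + (1 - r) * np.eye(p)


kmo_strong = kmo(equicorrelated(4, 0.7))  # compact correlation pattern
kmo_weak = kmo(equicorrelated(4, 0.2))    # more diffuse correlation pattern
```

With these made-up matrices the strongly correlated set passes the >=0.8 mark and the weakly correlated set does not.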
Bartlett’s test
tells us whether our correlation matrix is significantly different from an identity matrix
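–> in factor analysis this is Bartlett's test of sphericity; it should not be confused with Bartlett's test for homogeneity of variances, which tests whether k samples are from populations with equal variances.
Equal variances across populations are called homoscedasticity or homogeneity of variances.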
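For the comparison of a correlation matrix with an identity matrix (Bartlett's test of sphericity), the usual test statistic is -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where p is the number of variables and n the number of observations. A sketch with hypothetical inputs, assuming NumPy; the function name is made up here and the p-value lookup is left out:

```python
import numpy as np


def bartlett_sphericity(R, n_obs):
    """Chi-square statistic and degrees of freedom for testing whether
    the correlation matrix R differs from an identity matrix."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of the determinant of R
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical example: two variables correlated at 0.5, 100 observations.
R2 = np.array([[1.0, 0.5], [0.5, 1.0]])
chi2, df = bartlett_sphericity(R2, n_obs=100)
```

Here the statistic is about 28.05 with 1 degree of freedom, well above the 5% critical value of 3.84, so this correlation matrix would count as significantly different from an identity matrix.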
What to do with the results of factor rotation?
–> study the … …
- what if a variable still shows it?
- assess reliability through … …
- -> study the CROSS LOADINGS
- variable should only load on one factor
- factor loading of the variable on the factor of interest > 0.4 (< 0.4 on all others)
- drop variable and run again
- still use the factor scores
- assess reliability of a scale through CRONBACH'S ALPHA
= determines internal consistency
– Ranges between 0 and 1. Values ≥ 0.7 are good (and expected!)
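Cronbach's alpha can be computed as k/(k-1) * (1 - sum of item variances / variance of the total score), with k the number of items. A small sketch with made-up responses (three items, five respondents); the function name is illustrative:

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of item columns, each a list of responses
    (one entry per respondent, all columns of equal length)."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to three items of one scale.
item_1 = [1, 2, 3, 4, 5]
item_2 = [2, 2, 3, 5, 5]
item_3 = [1, 3, 3, 4, 4]
alpha = cronbach_alpha([item_1, item_2, item_3])
```

For these made-up data alpha is roughly 0.95, above the 0.7 threshold mentioned on the card.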
Factor score
index scale
summated scale
- estimated values of the factors for each observation
- Index scales are calculated as the average score across items, summated scales as their sum
- Summated scales are a collection of related questions that measure underlying constructs.
Difference between factor and index score?
index score: add the values up
factor score: add them up with a weight according to their factor loading
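The difference can be shown with one observation. The item values and loadings below are hypothetical, and the loading-weighted sum follows the simplified description on this card; statistical packages usually estimate factor scores from standardized variables with estimated score coefficients.

```python
def summated_score(values):
    """Summated scale: plain sum of the item values."""
    return sum(values)


def index_score(values):
    """Index scale: average of the item values."""
    return sum(values) / len(values)


def weighted_score(values, loadings):
    """Simplified factor score: items weighted by their factor loadings."""
    return sum(v * w for v, w in zip(values, loadings))


# Hypothetical answers of one respondent and hypothetical loadings.
answers = [4, 5, 3]
loadings = [0.8, 0.7, 0.5]

total = summated_score(answers)               # 12
average = index_score(answers)                # 4.0
weighted = weighted_score(answers, loadings)  # 3.2 + 3.5 + 1.5
```

Items with a high loading contribute more to the weighted score, whereas the sum and the average treat all items equally.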
Cluster Analysis
- goal
- cluster algorithms
- goal: assign observations to homogeneous clusters
- -> classification of similar objects in groups
- centroid-based / k-means: no. of clusters k is specified in advance, and the clusters are then found by mathematical optimization
– Connectivity-based or hierarchical: algorithms connect items based on their “distance” from one another.
- partly new algorithms exist in
- Big data: needs different algorithms to overcome performance issues
– Social network analysis: usually uses connectivity-based logic
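The centroid-based idea (k specified in advance, then iterative optimization) can be sketched as a bare-bones k-means. The points and starting centroids are hypothetical, and real implementations add smarter initialization and multiple restarts.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid,
    recompute the centroids, repeat until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical observations with two obvious groups; k = 2.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```

The result depends on the starting centroids, which is one reason the number of clusters and the initialization deserve a robustness check.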
Steps in hierarchical cluster analysis
- Choose a metric
- Similarity
- Dissimilarity –> Euclidean Distance
- choose linkage criteria
- -> Divisive (Top-down): start with one cluster, that is split up until no. of clusters = no. of observations
- -> Agglomerative (Bottom-up): start with no. of clusters = no. of objects, then aggregate them until one cluster is left
- Choose no. of clusters
- -> Statistically (quality measures, stopping value)
- -> Optically (elbow criterion)
- -> Conveniently (ease of use, heterogeneity)
- -> Robust (check with rules of k-means clustering)
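The agglomerative (bottom-up) variant of these steps can be sketched as follows. The choices made here are assumptions for illustration only: squared Euclidean distance as the metric, single linkage (nearest members) as the linkage criterion, and hypothetical 2-D points.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points):
    """Single-linkage agglomerative clustering.
    Returns the merge schedule as (kept cluster, absorbed cluster, distance)."""
    clusters = {i: [i] for i in range(len(points))}  # start: one cluster per object
    schedule = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the two closest members.
                d = min(squared_euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merged cluster keeps the lower id
        schedule.append((a, b, d))
    return schedule


# Hypothetical objects: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 7), (10, 0)]
schedule = agglomerate(points)
```

The merge distances in this example are 1, 4, 41 and 50. The jump from 4 to 41 is the kind of "elbow" that would suggest stopping before the third merge, i.e. at three clusters.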
Wrap-up Cluster and Factor analysis
Factor analysis reduces the no. of variables
– allows to address issues of multicollinearity
– Key method to assess reliability and validity of constructs;
Cluster analysis reduces the number of observations
– More practical than research applications
Squared Euclidean Distance
for each pair of objects, the difference values of each attribute are squared and added up
- -> large differences get a higher weight
- -> small differences indicate a cluster
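A short sketch of the calculation and of the weighting effect, using hypothetical objects with five attributes each:

```python
def squared_euclidean(a, b):
    """Square the difference on each attribute and add the squares up."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


reference = (0, 0, 0, 0, 0)
many_small = (1, 1, 1, 1, 1)  # differs by 1 on every attribute
one_large = (5, 0, 0, 0, 0)   # differs by 5 on a single attribute

d_small = squared_euclidean(reference, many_small)  # 1+1+1+1+1 = 5
d_large = squared_euclidean(reference, one_large)   # 25+0+0+0+0 = 25
```

Both objects differ from the reference by a total of 5 units, but the single large difference counts five times as much once it is squared.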
M19 -
1. Cluster combined
- Coefficients
- Stage cluster first appears and next stage
- which clusters are merged to a new cluster in the different iterations
- -> when two clusters merge, always retain the lower cluster number
- the error sum of squares corresponding to the clustering iterations
- in which iteration cases and clusters are merged with already existing clusters