M18 - Factor & Cluster Analysis Flashcards

1
Q

Factor and Cluster Analysis

  • are … … techniques
  • difference
A
  • are DATA REDUCTION techniques
  • difference:
    FA: groups variables
    CA: groups observations
2
Q

Factor Analysis method

A

– classification of similar variables measuring the same construct into factors/groups

3
Q

Cluster Analysis method

  • classification
  • principle
A

– classification of similar objects in groups

  • principle: homogeneity within each group, heterogeneity between the groups
4
Q

Factor analysis

  • what is a factor?
  • Factor analysis tries to identify…
  • always fewer … than …
  • Factor analysis is an … technique in which all the variables are … considered, each … to all others.
  • -> contrast to regression analysis
A
  • factor = a construct, a hypothetical entity, a ‘latent’ variable that is assumed to underlie the tests
  • Factor analysis tries to identify a set of common underlying factors in a group of variables
  • always fewer FACTORS than VARIABLES
  • Factor analysis is an INTERDEPENDENCE technique in which all the variables are SIMULTANEOUSLY considered, each RELATED to all others.
    – In contrast, regressions are dependence techniques in which one variable is explicitly regarded as the dependent variable.
5
Q

What is the factor loading?

A

the correlation coefficient a_js between factor and variable

6
Q

Purpose of factor analysis

1-4

A
  1. test validity of a scale
  2. reveal interesting patterns
  3. solving problems of multicollinearity, if two variables are correlated and theoretically meaningful
  4. smaller number of variables to work with
7
Q

two types of factor analysis
1.
2.

A
  1. Exploratory factor analysis
    - used to identify complex interrelationships among items and group items that are part of unified concepts
    - no a priori assumptions about relationships among factors
  2. Confirmatory factor analysis
    - tests the hypothesis that the items are associated with specific factors
8
Q

Exploratory Factor Analysis

  • uncovers …
  • a priori assumption:
  • shows … and assesses …
  • needs to be done …
A
  • uncovers underlying structures in a large set of variables
  • a priori assumption: any indicator may be associated with any factor
  • shows (uni)dimensionality and assesses reliability of a scale
  • ALWAYS needs to be done
9
Q

Confirmatory Factor Analysis

  • determines
  • … … are selected on the basis of … … and factor analysis is used to see if they … as …
  • shows … …
  • not … if the scale has been … …
A
  • determines the number of factors based on what is expected from previous research
  • INDICATOR VARIABLES are selected on the basis of PRIOR THEORY and factor analysis is used to see if they LOAD as PREDICTED
  • shows GENERAL VALIDITY of a scale
  • not NECESSARY if the scale has been used before
10
Q

3 steps of exploratory factor analysis
1-3

Assumptions!

A
  1. Correlation matrix
    - -> examine correlations of the variables and the KMO criterion: should be >= 0.8, must be >= 0.5
  2. Factor extraction from variables
  3. Factor rotation
    - -> to maximize the relationship between variables and factors

Assumptions:

  • variables are continuous
  • variables are normally distributed
  • but also possible for ordinal variables
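The three steps above can be sketched in Python on synthetic data; scikit-learn's `FactorAnalysis` with varimax rotation stands in for the extraction and rotation steps (the data and loadings are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# two hypothetical latent factors driving six observed variables
latent = rng.normal(size=(200, 2))
loadings_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                          [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
X = latent @ loadings_true.T + 0.3 * rng.normal(size=(200, 6))

# Step 1: examine the correlation matrix
R = np.corrcoef(X, rowvar=False)

# Steps 2 and 3: extract two factors and rotate them (varimax)
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
loadings = fa.components_.T  # variables x factors
print(np.round(loadings, 2))
```

After rotation, each variable should load mainly on one of the two factors.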
11
Q

Factor extraction method
1-2
Key differences

A
  1. Principal component f.a.
    - -> maximizes the explained variance among the underlying variables
    purpose: derive a small number of linear combinations (principal components) that retain as much information from the original variables as possible
  2. Common f.a.
    - -> explains the correlations among the underlying variables

differences:
- assumption of principal c.f.a.: all variance can be explained –> use: data reduction
- Common f.a.: aims at latent constructs
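A rough side-by-side of the two approaches using scikit-learn on hypothetical data; here `FactorAnalysis` plays the role of common f.a., since it models variable-specific (unique) noise that PCA does not:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))  # toy data matrix, observations x variables

# principal components: linear combinations maximizing explained variance
pca = PCA(n_components=2).fit(X)

# common factor analysis: models shared variance plus per-variable noise
fa = FactorAnalysis(n_components=2).fit(X)

print(pca.explained_variance_ratio_)
print(fa.noise_variance_)  # unique variance per variable, absent in PCA
```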

12
Q

How many factors?

3 criteria

A
  1. Kaiser criterion: Eigenvalue >1
  2. achieving a high specified cumulative % of total variance (usually 60%)
  3. Elbow criterion in the scree plot
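The first two criteria can be applied directly to the eigenvalues of the correlation matrix; a sketch on synthetic data (two planted latent factors):

```python
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 2))
W = rng.normal(size=(2, 6))
X = latent @ W + 0.5 * rng.normal(size=(300, 6))

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser criterion: keep factors with eigenvalue > 1
n_kaiser = int(np.sum(eigenvalues > 1))

# cumulative share of total variance (total = number of variables)
cum_share = np.cumsum(eigenvalues) / eigenvalues.size
n_60pct = int(np.argmax(cum_share >= 0.6) + 1)

print(n_kaiser, n_60pct)
```

Note that the eigenvalues of a correlation matrix always sum to the number of variables, which is why the cumulative share is taken over that total.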
13
Q

Kaiser criterion:
Eigenvalue
Communality
Factor loadings

A

Eigenvalue = sum of squared factor loadings of one factor over all variables / the amount of variance explained by a factor
Communality = sum of squared factor loadings of one variable / the proportion of common variance present in a variable (the rest is random variance)
Factor loadings = correlation of factor and variable
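These definitions can be checked numerically on a small hypothetical loadings matrix (values are made up):

```python
import numpy as np

# hypothetical loadings: 4 variables (rows) x 2 factors (columns)
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])

eigenvalues = (L ** 2).sum(axis=0)    # per factor: variance it explains
communalities = (L ** 2).sum(axis=1)  # per variable: common-variance share

print(eigenvalues)     # [1.18 1.22]
print(communalities)   # [0.65 0.53 0.82 0.4 ]
```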

14
Q

Making factors distinct

  • why?
  • what it does:
A
  • so that variables only load on one factor, not many
  • reference axes of the factors are turned about the origin
    – Process of adjusting axes to achieve a more meaningful factor solution
15
Q

Factor rotation

  • why?
  • how?
  • result
A
  • to discriminate between different factors
  • rotate the axes such that variables load maximally on only one factor
  • by rotating the axes we ensure that both clusters of variables are intersected by the factor to which they relate the most
  • -> after rotation the loadings of the variables are maximized on one factor
16
Q

Kaiser-Meyer-Olkin criteria

A

Exploratory Factor Analysis - Correlation matrix
is this data set suitable for factor analysis?

  • -> between 0 and 1
  • -> if close to 0, then the sum of partial correlations is large relative to the sum of correlations –> diffusion in the pattern of correlations –> f.a. inappropriate
  • -> if close to 1, then the patterns of correlation are relatively compact and f.a. should yield distinct and reliable factors
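A minimal sketch of the KMO computation: partial correlations are obtained from the inverse of the correlation matrix, and the equicorrelated toy matrix below is purely illustrative:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure from a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                      # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

# toy matrix: four variables, all pairwise correlations 0.6
R = np.full((4, 4), 0.6)
np.fill_diagonal(R, 1.0)
print(round(kmo(R), 3))  # 0.829: suitable for f.a.
```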

17
Q

Bartlett’s test

A

tells us whether our correlation matrix is significantly different from an identity matrix (i.e., whether the variables are correlated enough for factor analysis to make sense)

Note: a different test with the same name, Bartlett's test for homogeneity of variances, is used to test if k samples are from populations with equal variances; equal variances across populations is called homoscedasticity or homogeneity of variances.
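The sphericity version used in factor analysis can be sketched with the standard chi-square approximation, -(n - 1 - (2p + 5)/6) * ln|R| with p(p-1)/2 degrees of freedom (synthetic data):

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """H0: the correlation matrix is an identity matrix
    (variables uncorrelated, so factor analysis is pointless)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = base + 0.5 * rng.normal(size=(200, 4))  # four correlated variables
chi2_stat, p_value = bartlett_sphericity(X)
print(p_value)  # tiny: reject H0, the variables are correlated
```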

18
Q

What to do with the results of factor rotation?
–> study the … …

  • what if a variable still shows it?
  • assess reliability through … …
A
  • -> study the CROSS LOADINGS
  • variable should only load on one factor
  • factor loading for the variable of interest > 0.4 (< 0.4 on all others)
  • drop the variable and run again
  • still use the factor scores
  • assess reliability of a scale through CRONBACH'S ALPHA
    = determines internal consistency
    – Ranges between 0 and 1. Values ≥ 0.7 are good (and expected!)
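Cronbach's alpha follows directly from the item and total-score variances, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score); a sketch on hypothetical parallel items:

```python
import numpy as np

def cronbach_alpha(items):
    """items: observations x scale items (rows = respondents)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
trait = rng.normal(size=(300, 1))
items = trait + 0.4 * rng.normal(size=(300, 4))  # four parallel items
print(round(cronbach_alpha(items), 2))  # high: internally consistent
```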
19
Q

Factor score

index scale

summated scale

A
  • estimated values of the factors for each observation
  • Index scales are calculated as the average score across items, summated scales as their sum
  • Summated scales are a collection of related questions that measure underlying constructs.
20
Q

Difference between factor and index score?

A

index score: add the values up

factor score: add them up weighted according to their factor loadings
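The difference in one toy example (scores and loadings are made up):

```python
import numpy as np

scores = np.array([[4, 5, 3],
                   [2, 1, 2]])          # two respondents, three items
loadings = np.array([0.8, 0.6, 0.4])    # hypothetical factor loadings

index_score = scores.sum(axis=1)        # unweighted sum of the items
factor_score = scores @ loadings        # loading-weighted sum

print(index_score)    # [12  5]
print(factor_score)   # [7.4 3. ]
```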

21
Q

Cluster Analysis

  • goal
  • cluster algorithms
A
  • goal: assign observations to homogeneous clusters
  • -> classification of similar objects in groups
  • centroid-based / k-means: no. of clusters k is specified in advance, and the clusters are then found via mathematical optimization
    – Connectivity-based or hierarchical: algorithms connect items based on their “distance” from one another.
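A minimal k-means sketch with scikit-learn on two synthetic groups; as the card says, k is fixed in advance and the centroids are found by optimization:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# two hypothetical, well-separated groups of observations
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

# k = 2 specified up front; centroids found by minimizing within-cluster
# squared distances
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```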


22
Q

partly new algorithms exist in

A
  • Big data: needs different algorithms to overcome performance issues

– Social network analysis: usually uses connectivity-based logic

23
Q

Steps in hierarchical cluster analysis

A
  1. Choose a metric
    - Similarity
    - Dissimilarity –> Euclidean Distance
  2. choose linkage criteria
    - -> Divisive (top-down): start with one cluster, which is split up until no. of clusters = no. of observations
    - -> Agglomerative (bottom-up): start with no. of clusters = no. of objects, then aggregate them until one cluster is left
  3. Choose no. of clusters
    - -> Statistically (quality measures, stopping value)
    - -> Optically (elbow criteria)
    - -> Conveniently (ease of use, heterogeneity)
    - -> Robust (check with rules of k-means clustering)
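The three steps map onto SciPy's hierarchical-clustering API; a sketch on synthetic data, with Euclidean distance as the metric and Ward linkage as one possible agglomerative criterion:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(4, 0.3, size=(20, 2))])

# Step 1: metric (Euclidean); Step 2: agglomerative linkage (Ward)
Z = linkage(X, method="ward", metric="euclidean")

# Step 3: choose the number of clusters (here: cut the tree at k = 2)
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```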
24
Q

Wrap-up Cluster and Factor analysis

A

 Factor analysis reduces the no. of variables
– allows to address issues of multicollinearity
– Key method to assess reliability and validity of constructs;

 Cluster analysis reduces the number of observations
– used more in practical than in research applications

25
Q

Squared Euclidean Distance

A

for each pair of objects, the difference values of each attribute are squared and added up

  • -> large differences get a higher weight (squaring emphasizes them)
  • -> small distances indicate a cluster
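In code, on two toy attribute vectors:

```python
import numpy as np

a = np.array([2.0, 4.0, 1.0])
b = np.array([5.0, 1.0, 3.0])

# squared differences per attribute, then summed
sq_dist = ((a - b) ** 2).sum()
print(sq_dist)  # 9 + 9 + 4 = 22.0
```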
26
Q

M19 -
1. Cluster combined

  1. Coefficients
  2. Stage cluster first appears and next stage
A
  1. which clusters are merged into a new cluster in the different iterations
    - -> when two clusters merge, always retain the lower cluster number
  2. the error sum of squares corresponding to the clustering iterations
  3. in which iteration cases and clusters are merged with already existing clusters